I need your help to understand why I get random bad data in my Used Space graph.
Because a picture says more than a long explanation:
When it's OK: [image]
When the problem appears: [image]
- The problem comes and goes without explanation.
- I can't reproduce the problem on my debug platform, which has fewer active graphs.
The bad data is stored in the RRD, but the SNMP polling itself is correct.
I ran tcpdump for half a day to confirm this.
My tcpdump:
Code: Select all
tcpdump -s0 -i eth0 -n host X.X.X.X and src port 161 | grep "1.3.6.1.2.1.25.2.3.1"
17:59:13.877429 IP X.X.X.X.snmp > 10.7.79.6.33393: C=Ais35Noc GetResponse(35) .1.3.6.1.2.1.25.2.3.1.5.37=30407195
17:59:15.437682 IP X.X.X.X.snmp > 10.7.79.6.48804: C=Ais35Noc GetResponse(34) .1.3.6.1.2.1.25.2.3.1.6.37=2818065
18:04:14.913904 IP X.X.X.X.snmp > 10.7.79.6.50700: C=Ais35Noc GetResponse(35) .1.3.6.1.2.1.25.2.3.1.5.37=30407195
18:04:14.914863 IP X.X.X.X.snmp > 10.7.79.6.42538: C=Ais35Noc GetResponse(34) .1.3.6.1.2.1.25.2.3.1.6.37=2818136
18:09:13.412460 IP X.X.X.X.snmp > 10.7.79.6.59472: C=Ais35Noc GetResponse(35) .1.3.6.1.2.1.25.2.3.1.5.37=30407195
18:09:13.414845 IP X.X.X.X.snmp > 10.7.79.6.47269: C=Ais35Noc GetResponse(34) .1.3.6.1.2.1.25.2.3.1.6.37=2818205
Code: Select all
[root@cacti-04 ~]# rrdtool fetch /cacti/rra/X.X.X_hdd_total_39414.rrd AVERAGE   # index 37, /data, 4096-byte allocation units
hdd_used hdd_total
1353084900: 1,1542545531e+10 1,2454787072e+11
1353085200: 1,1542839801e+10 1,0602881503e+11
1353085500: 1,1543129334e+10 2,4860187976e+10   <- Friday 16 November 2012 18:05:00 GMT+1
1353085800: 1,1543410033e+10 1,2454787072e+11
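To make the mismatch explicit, here is a small sanity check in Python. It is only a sketch, and it assumes that hdd_total is the polled hrStorageSize value (.1.3.6.1.2.1.25.2.3.1.5.37) multiplied by the 4096-byte allocation unit noted above. The helper names and the 0.1 % tolerance are my own choices for illustration, not anything Cacti provides.

```python
# Convert the SNMP value captured by tcpdump into bytes and compare it with
# the hdd_total values returned by rrdtool fetch.
ALLOCATION_UNIT = 4096      # bytes per unit for index 37 (/data), from the note above
HR_STORAGE_SIZE = 30407195  # .1.3.6.1.2.1.25.2.3.1.5.37, identical in every captured poll

# Assumption: hdd_total = hrStorageSize * allocation unit.
expected_total = HR_STORAGE_SIZE * ALLOCATION_UNIT  # bytes

# Rows copied from the rrdtool fetch output: (timestamp, hdd_total).
rows = [
    (1353084900, "1,2454787072e+11"),
    (1353085200, "1,0602881503e+11"),
    (1353085500, "2,4860187976e+10"),
    (1353085800, "1,2454787072e+11"),
]


def parse_rrd_float(text):
    """Parse a number printed with a decimal comma (locale-dependent output)."""
    return float(text.replace(",", "."))


def bad_rows(rows, expected, tolerance=0.001):
    """Return timestamps whose value differs from expected by more than the relative tolerance."""
    return [
        ts
        for ts, raw in rows
        if abs(parse_rrd_float(raw) - expected) > tolerance * expected
    ]


print(expected_total)
print(bad_rows(rows, expected_total))
```

Under that assumption the expected total is 124547870720 bytes (1,2454787072e+11), which matches the first and last rows. The rows at 1353085200 and 1353085500 are flagged, even though every captured poll returned the same hrStorageSize.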
I welcome any ideas for debugging this problem.
Does anyone know a way to enable debug logging for a single host only? (Enabling debugging for all hosts is not possible given the large number of polled items.)
Platform:
Cacti Version 0.8.8a with spine polling
Cacti OS unix
PHP Version 5.2.16
SNMP Version NET-SNMP version: 5.3.2.2
RRDTool Version RRDTool 1.4.x
Hosts more than 1 000
Graphs more than 15 000
Data Sources more than 25 000