My graphs stop updating
I don't understand why, but sometimes my graphs stop updating. When I delete the host along with its Data Sources and then recreate them, updating starts again. But that's not the best solution (I hope)!
I think my RRD files are no longer being updated, because after I recreate the host and data sources there is a gap between the time they stopped and the time they restarted.
I hope that's clear.
Does anyone have a lead?
Thanks
i have the exact same thing, only this happened all at once, and only with Cisco CPU and memory graphs!
attached is an example...
this is what i did to try and restore the graphs, to no avail:
cleared poller cache
removed from the device the two graphs and re-installed them
checked permissions on the rrds
here is the rrd info:
filename = "snmp_alex_acca_5min_cpu_88.rrd"
rrd_version = "0001"
step = 300
last_update = 1091645648
ds[5min_cpu].type = "GAUGE"
ds[5min_cpu].minimal_heartbeat = 600
ds[5min_cpu].min = 0.0000000000e+00
ds[5min_cpu].max = 1.0000000000e+02
ds[5min_cpu].last_ds = "UNKN"
ds[5min_cpu].value = 0.0000000000e+00
ds[5min_cpu].unknown_sec = 248
rra[0].cf = "AVERAGE"
rra[0].rows = 700
rra[0].pdp_per_row = 6
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 4
rra[1].cf = "AVERAGE"
rra[1].rows = 17280
rra[1].pdp_per_row = 1
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = NaN
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = NaN
rra[2].cdp_prep[0].unknown_datapoints = 10
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 5.5556000000e+02
rra[3].cdp_prep[0].unknown_datapoints = 119
rra[4].cf = "MAX"
rra[4].rows = 700
rra[4].pdp_per_row = 6
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 4
rra[5].cf = "MAX"
rra[5].rows = 17280
rra[5].pdp_per_row = 1
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = NaN
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = NaN
rra[6].cdp_prep[0].unknown_datapoints = 10
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 1.2000000000e+01
rra[7].cdp_prep[0].unknown_datapoints = 119
rrdtool fetch shows:
1091607300: 1.0380000000e+01
1091607600: 1.1376666667e+01
1091607900: 1.2000000000e+01
1091608200: 1.2000000000e+01
1091608500: 1.2000000000e+01
1091608800: 1.2000000000e+01
1091609100: 1.2000000000e+01
1091609400: 1.2000000000e+01
1091609700: 1.2000000000e+01
1091610000: nan
1091610300: nan
1091610600: nan
1091610900: nan
1091611200: nan
1091611500: nan
1091611800: nan
1091612100: nan
this happened only on Cisco devices, on these specific graphs.
it must be something i did, but what ?
--harel
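A minimal sketch of how the fetch output above could be scanned for the point where data stops. It assumes the `timestamp: value` layout shown in the post; the function names `parse_fetch` and `first_gap` are made up for illustration and are not part of rrdtool or Cacti.

```python
import math


def parse_fetch(lines):
    """Parse 'timestamp: value' lines as printed by `rrdtool fetch`.

    Lines without a colon (blank lines, the header naming the data
    sources) and lines that do not parse as numbers are skipped.
    """
    samples = []
    for line in lines:
        if ":" not in line:
            continue
        ts_text, value_text = line.split(":", 1)
        parts = value_text.split()
        if not parts:
            continue
        try:
            ts = int(ts_text.strip())
            value = float(parts[0])  # float() accepts "nan" as well
        except ValueError:
            continue
        samples.append((ts, value))
    return samples


def first_gap(samples):
    """Return (start_timestamp, length) of the first run of NaN samples,
    or None if every sample has a value."""
    start = None
    count = 0
    for ts, value in samples:
        if math.isnan(value):
            if start is None:
                start = ts
            count += 1
        elif start is not None:
            break
    return (start, count) if start is not None else None


if __name__ == "__main__":
    # Three lines copied from the output above.
    sample = [
        "1091609700: 1.2000000000e+01",
        "1091610000: nan",
        "1091610300: nan",
    ]
    print(first_gap(parse_fetch(sample)))
```

Knowing the first NaN timestamp makes it easier to line the gap up with whatever changed on the server at that time (a package upgrade, a template edit, a cron change).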
Attachment: cisco.gif (20.31 KiB)
After clearing your poller cache, can you still find references to the "broken" Cisco templates in there? The cause of your problem really depends on whether Cacti is actually trying to poll data for these graphs or not.
If Cacti is actually re-creating the .rrd files for these templates after you delete them, that is a good sign.
Do you see any strange output when you run cmd.php/cactid?
-Ian
Hi Rax,
Ian wrote:
After clearing your poller cache, can you still find references to the "broken" Cisco templates in there? The cause of your problem really depends on whether Cacti is actually trying to poll data for these graphs or not.
after clearing the poller cache, i can see:
Data Source: SNMP - Alex - Gw - 5 Minute CPU
RRD: /var/www/htdocs/cacti-0.8.5a/rra/snmp_alex_gw_5min_cpu_42.rrd
Action: 0, OID: (Host: 172.16.0.1, Community: mycom)
Ian wrote:
If Cacti is actually re-creating the .rrd files for these templates after you delete them, that is a good sign.
seems like the rrd is updated, because it has the same time as the other rrds.
i tried creating a new router-device, to graph only cpu on that router. same results...
as for the cmd.php output: yes, definitely. this was not there before:
Missing object name
USAGE: snmpget [OPTIONS] AGENT OID [OID]... Version: 5.1.1
Web: http://www.net-snmp.org/
Email: net-snmp-coders@lists.sourceforge.net
OPTIONS:
-h, --help display this help message
-H display configuration file directives understood
-v 1|2c|3 specifies SNMP version to use
-V, --version display package version number
SNMP Version 1 or 2c specific
-c COMMUNITY set the community string
SNMP Version 3 specific
-a PROTOCOL set authentication protocol (MD5|SHA)
-A PASSPHRASE set authentication protocol pass phrase
-e ENGINE-ID set security engine ID (e.g. 800000020109840301)
-E ENGINE-ID set context engine ID (e.g. 800000020109840301)
-l LEVEL set security level (noAuthNoPriv|a
etc.
this appears every so often, right before a cisco router / switch normal output.
i tried snmpwalk and snmpget to see whether the cpu oid got corrupted or something, and i get the value just fine:
root@dublin:/var/www/htdocs/cacti# snmpwalk -v1 -c mycom 172.16.0.1:161 .1.3.6.1.4.1.9.2.1.58.0
SNMPv2-SMI::enterprises.9.2.1.58.0 = INTEGER: 4
it seems strange that all the memory and cpu graphs stopped working at the same time.
does this help?
--harel
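One possible reading of the output above: the poller cache entry shows nothing between `OID:` and `(Host:`, and "Missing object name" followed by the usage text is what snmpget prints when it is called without an OID. A small sketch for spotting such entries in copied poller cache text follows. The layout of a line that does carry an OID is an assumption (OID placed between `OID:` and `(Host:`), and `entries_missing_oid` is a made-up helper, not a Cacti function.

```python
import re

# Matches lines such as:
#   Action: 0, OID: (Host: 172.16.0.1, Community: mycom)
# and, by assumption, lines with an OID before the parenthesis.
CACHE_LINE = re.compile(
    r"OID:\s*(?P<oid>[^\s(]*)\s*"
    r"\(Host:\s*(?P<host>[^,)]+),\s*Community:\s*(?P<community>[^)]*)\)"
)


def entries_missing_oid(lines):
    """Return the hosts of poller cache lines whose OID field is empty."""
    missing = []
    for line in lines:
        match = CACHE_LINE.search(line)
        if match and not match.group("oid"):
            missing.append(match.group("host").strip())
    return missing


if __name__ == "__main__":
    cache = ["Action: 0, OID: (Host: 172.16.0.1, Community: mycom)"]
    for host in entries_missing_oid(cache):
        print("poller cache entry without an OID for host", host)
```

If the entries for the CPU and memory data sources all come back without an OID, the data templates behind them would be the next place to look.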
remm wrote:
I'm not using cactid but the crontab.
Vince22, I don't think we have the same problem, because in my case, when the graphs stop updating, they never start again. I have the same type of graphs with some of my routers, but I think it is normal.
It almost looks like you have set the graph maximum to 1 or something. If you had autoscale set, it almost always (in my experience, anyway) leaves a bit of room above the highest peak.
Dave
The same thing happened to me as to "oharel": all traffic graphs stopped collecting data.
If I delete that data collection and start it again, it collects correctly.
I've checked the poller cache: there are no records left for the stopped traffic measurements. I tried clearing the poller cache, but it didn't help.
By the way, all these graphs stopped at the same time.
Melchandra wrote:
It almost looks like you have set the graph maximum to 1 or something
nope. it is set on autoscale.
Mika wrote:
For me happened the same thing as for "oharel": all traffic graphs stopped collecting data.
nope. just my cisco cpu and memory. on the same router and switches, interfaces are polled correctly.
--harel
same problem
Hello,
I sometimes get a hole in my graphs.
1092105300: 5.2142857143e-02
1092105600: 3.1600000000e-02
1092105900: 2.6900000000e-02
1092106200: nan
1092106500: nan
1092106800: nan
1092107100: nan
1092107400: nan
1092107700: nan
1092108000: nan
1092108300: nan
1092108600: nan
1092108900: nan
1092109200: nan
Suddenly, everything stops being recorded. But if I run php cmd.php
I don't see any problem.
I have a lot of graphs (1500).
I don't know how to debug this :-(
Tks
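With this many graphs, one way to narrow things down might be to collect `rrdtool info` output for each file and check two things: whether the file is still being written to (`last_update`), and whether the last value written was unknown (`last_ds`). The sketch below assumes the `key = value` layout of the dump posted earlier in this thread; `parse_rrd_info` and `summarize` are made-up names, and the "stale after two steps" threshold is an arbitrary choice.

```python
def parse_rrd_info(text):
    """Turn `rrdtool info` output ('key = value' per line) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue
        info[key.strip()] = value.strip().strip('"')
    return info


def summarize(info, now):
    """Report whether the file looks stale and which data sources last
    received an unknown value.

    `now` is a Unix timestamp. A file counts as stale here when its last
    update is more than two steps old (an assumed threshold).
    """
    step = int(info["step"])
    age = now - int(info["last_update"])
    unknown = sorted(
        key[len("ds["):key.index("]")]
        for key, value in info.items()
        # The dump in this thread prints "UNKN"; "U" is accepted too, on
        # the assumption that other rrdtool versions print it that way.
        if key.startswith("ds[") and key.endswith("].last_ds")
        and value in ("UNKN", "U")
    )
    return {"stale": age > 2 * step, "age": age, "unknown_ds": unknown}


if __name__ == "__main__":
    import time

    dump = (
        "step = 300\n"
        "last_update = 1091645648\n"
        'ds[5min_cpu].last_ds = "UNKN"\n'
    )
    print(summarize(parse_rrd_info(dump), now=int(time.time())))
```

A file that is not stale but has an unknown `last_ds` suggests the poller is reaching the file but has no value to store, which points at the polling step rather than at cron or file permissions.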
sounds right to me, somehow.
i looked throughout the forum but did not find anything like this; if there is, please direct me to the correct discussion.
anyway:
one fine morning my cmd.php stopped updating the rrds every 5 minutes. i had to configure it to work every minute in order for it to do the job properly.
here is the old file:
* /5 * * * php /var/www/htdocs/cacti/cmd.php > /tmp/cacti_log 2>&1
and here is the new:
* * * * * php /var/www/htdocs/cacti/cmd.php > /tmp/cacti_log 2>&1
it takes a lot more than 5 minutes to run, but that was the situation even before it stopped working.
the output of cmd.php is the same whether it runs every 5 minutes or every minute...
also, if i change the cmd.php entry back to /5, the only way i can get it to run again is to reboot the machine.
i am running cacti 0.8.5a on Slackware9
any ideas, besides moving to cactid?
-harel
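A run time of more than 5 minutes may matter on its own. RRDtool treats a data source as unknown when more than `minimal_heartbeat` seconds pass between two updates, and the dump posted earlier in this thread shows a heartbeat of 600 with a step of 300. The sketch below checks a list of update times against a heartbeat; the timestamps in the example are hypothetical, and `heartbeat_gaps` is a made-up helper.

```python
def heartbeat_gaps(update_times, heartbeat):
    """Return (previous, current) pairs of update timestamps that lie
    more than `heartbeat` seconds apart."""
    ordered = sorted(update_times)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > heartbeat:
            gaps.append((previous, current))
    return gaps


if __name__ == "__main__":
    # Hypothetical update times (seconds): regular 5-minute updates,
    # then one update that arrives 15 minutes after the previous one.
    updates = [0, 300, 600, 1500, 1800]
    for previous, current in heartbeat_gaps(updates, heartbeat=600):
        print("no update for", current - previous, "seconds after", previous)
```

Update times for a real file could be gathered by recording `last_update` at intervals; if pairs show up here, the slow poller run alone would be enough to leave holes in the graph.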