My graphs stop updating

remm
Posts: 24
Joined: Wed Apr 21, 2004 5:25 am
Location: Tours

Post by remm »

I don't understand why, but sometimes my graphs stop updating. When I delete the host together with its Data Sources and then recreate them, updating starts again. But that's not the best solution (I hope)!
I think my RRD files are no longer being updated, because after recreating the host and data sources there is a gap between when updating stopped and when it restarted.
Hope I've been clear. :roll:
Does anyone have a lead?
thx
Vince22
Posts: 2
Joined: Thu May 20, 2004 9:30 am

Post by Vince22 »

I have the same problem and I don't understand it either!
Some of the graphs for a device are updated, but not all of them.
I have changed the SNMP timeout, but nothing changed.
Help us!
Attachment: state1.jpg (69.78 KiB)
raX
Lead Developer
Posts: 2243
Joined: Sat Oct 13, 2001 7:00 pm
Location: Carlisle, PA

Post by raX »

Are either of you using cactid to gather data for the graphs? If so, do you see any "stuck" cactid processes if you run 'ps ax' at the command line? Random holes in the graph usually indicate that the poller is either not running or is dying prematurely.

-Ian
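
A rough way to script the 'ps ax' check described above, as a sketch only. It assumes the usual five-column 'ps ax' layout (PID, TTY, STAT, TIME, COMMAND). The sample listing and the function name are made up for illustration and do not come from anyone's system in this thread.

```python
# Hypothetical 'ps ax' listing, used only to exercise the parser below.
SAMPLE_PS = """\
  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:04 init [3]
 4312 ?        S      0:00 /usr/sbin/crond -l10
 5120 ?        S      0:12 php /var/www/htdocs/cacti/cmd.php
 5388 ?        S      0:09 php /var/www/htdocs/cacti/cmd.php
"""


def find_poller_processes(ps_output, names=("cactid", "cmd.php")):
    """Return (pid, command) pairs whose command line mentions a poller name."""
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line or something we do not recognise
        command = fields[4].strip()
        if any(name in command for name in names):
            matches.append((int(fields[0]), command))
    return matches
```

More than one poller entry at the same moment, or one with a large TIME value, would suggest a run that has not finished. To check a live system, the text passed in could come from subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout.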

Post by remm »

I'm not using cactid, just the crontab.
Vince22, I don't think we have the same problem, because in my case, once the graphs stop updating they never start again. I have the same type of graph as yours for some of my routers, but I think that is normal.
Attachment: ex.jpg (30.52 KiB)
Lux
Cacti User
Posts: 195
Joined: Tue Nov 11, 2003 10:57 am
Location: Luxembourg

Post by Lux »

Remm, I wouldn't question the blank times on my router; however, I would question the times when you have a nearly constant 1% utilization. Does your graph show 0% at the same time every day? If you notice, your router is reporting no utilization during non-business hours (1800-0800).

Mike

Post by remm »

My graphs do not show 0% at the same time every day, even though the traffic between 18:00 and 08:00 is almost non-existent.
I have 1% CPU usage during nearly all business hours; I have had one peak since I started monitoring it.
oharel
Cacti User
Posts: 84
Joined: Wed Jan 07, 2004 11:16 am

Post by oharel »

i have the exact same thing, only it happened all at once, and only with the Cisco CPU and memory graphs!
attached is an example...
this is what i did to try and restore the graphs, to no avail:
- cleared the poller cache
- removed the two graphs from the device and re-added them
- checked permissions on the rrds
here is the rrd info:
filename = "snmp_alex_acca_5min_cpu_88.rrd"
rrd_version = "0001"
step = 300
last_update = 1091645648
ds[5min_cpu].type = "GAUGE"
ds[5min_cpu].minimal_heartbeat = 600
ds[5min_cpu].min = 0.0000000000e+00
ds[5min_cpu].max = 1.0000000000e+02
ds[5min_cpu].last_ds = "UNKN"
ds[5min_cpu].value = 0.0000000000e+00
ds[5min_cpu].unknown_sec = 248
rra[0].cf = "AVERAGE"
rra[0].rows = 700
rra[0].pdp_per_row = 6
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 4
rra[1].cf = "AVERAGE"
rra[1].rows = 17280
rra[1].pdp_per_row = 1
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = NaN
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = NaN
rra[2].cdp_prep[0].unknown_datapoints = 10
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 5.5556000000e+02
rra[3].cdp_prep[0].unknown_datapoints = 119
rra[4].cf = "MAX"
rra[4].rows = 700
rra[4].pdp_per_row = 6
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 4
rra[5].cf = "MAX"
rra[5].rows = 17280
rra[5].pdp_per_row = 1
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = NaN
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = NaN
rra[6].cdp_prep[0].unknown_datapoints = 10
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 1.2000000000e+01
rra[7].cdp_prep[0].unknown_datapoints = 119
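
In the dump above, last_ds is "UNKN" even though last_update is set. As far as I know, last_ds holds the last value handed to rrdtool update, so this would be consistent with the file still being updated but with no value collected. A small sketch for pulling those fields out of rrdtool info text follows. It assumes the "key = value" layout shown above; the function names are hypothetical, and accepting "U" as well as "UNKN" is an assumption about other rrdtool versions.

```python
import re

# A few lines copied from the dump above, used as test input.
SAMPLE_INFO = """\
filename = "snmp_alex_acca_5min_cpu_88.rrd"
step = 300
last_update = 1091645648
ds[5min_cpu].type = "GAUGE"
ds[5min_cpu].minimal_heartbeat = 600
ds[5min_cpu].last_ds = "UNKN"
"""


def parse_rrd_info(text):
    """Turn 'key = value' lines into a dict of strings (quotes removed)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # not a key/value line
        info[key.strip()] = value.strip().strip('"')
    return info


def data_sources_without_values(info):
    """Names of data sources whose last supplied value was unknown."""
    names = []
    for key, value in info.items():
        match = re.fullmatch(r"ds\[(.+)\]\.last_ds", key)
        # "U" is assumed here for versions that do not print "UNKN".
        if match and value in ("UNKN", "U"):
            names.append(match.group(1))
    return names
```

If a data source shows up in that list while last_update keeps advancing, the place to look is probably the poller's SNMP query rather than the .rrd file itself.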

rrdtool fetch shows:
1091607300: 1.0380000000e+01
1091607600: 1.1376666667e+01
1091607900: 1.2000000000e+01
1091608200: 1.2000000000e+01
1091608500: 1.2000000000e+01
1091608800: 1.2000000000e+01
1091609100: 1.2000000000e+01
1091609400: 1.2000000000e+01
1091609700: 1.2000000000e+01
1091610000: nan
1091610300: nan
1091610600: nan
1091610900: nan
1091611200: nan
1091611500: nan
1091611800: nan
1091612100: nan
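
To pin down when the data stopped, the fetch output above can be scanned for the first row that turns to nan. This is only a sketch: it assumes the "timestamp: value" layout shown above with one data source per row, and the function names are made up for illustration.

```python
import math
from datetime import datetime, timezone

# Rows copied from the fetch output above (shortened).
SAMPLE_FETCH = """\
1091609100: 1.2000000000e+01
1091609400: 1.2000000000e+01
1091609700: 1.2000000000e+01
1091610000: nan
1091610300: nan
"""


def parse_fetch(text):
    """Return (timestamp, value) pairs; nan stays a float nan."""
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not stamp.strip().isdigit() or not parts:
            continue  # header or blank line
        rows.append((int(stamp), float(parts[0])))  # first data source only
    return rows


def first_gap(rows):
    """Return (last_good_timestamp, first_nan_timestamp), or None if no gap starts."""
    previous = None
    for stamp, value in rows:
        if previous and math.isnan(value) and not math.isnan(previous[1]):
            return previous[0], stamp
        previous = (stamp, value)
    return None


def to_utc(stamp):
    """Format a Unix timestamp as a UTC date and time."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

Comparing the first nan timestamp across several .rrd files shows whether the graphs really stopped at the same moment, which may help match the break against a configuration change or a poller log entry.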

this happened only on Cisco devices, on these specific graphs.
it must be something i did, but what :-? ?

--harel
Attachment: cisco.gif (20.31 KiB)

Post by raX »

After clearing your poller cache, can you still find references to the "broken" Cisco templates in there? The cause of your problem really depends on whether Cacti is actually trying to poll data for these graphs or not.

If Cacti is actually re-creating the .rrd files for these templates after you delete them, that is a good sign.

Do you see any strange output when you run cmd.php/cactid?

-Ian

Post by oharel »

Hi Rax,
raX wrote:
After clearing your poller cache, can you still find references to the "broken" Cisco templates in there? The cause of your problem really depends on whether Cacti is actually trying to poll data for these graphs or not.
after clearing the poller cache, i can see:
Data Source: SNMP - Alex - Gw - 5 Minute CPU
RRD: /var/www/htdocs/cacti-0.8.5a/rra/snmp_alex_gw_5min_cpu_42.rrd
Action: 0, OID: (Host: 172.16.0.1, Community: mycom)
raX wrote:
If Cacti is actually re-creating the .rrd files for these templates after you delete them, that is a good sign.
it seems the rrd is being updated, because it has the same timestamp as the other rrds.

i tried creating a new router-device, to graph only cpu on that router. same results...

as for the cmd.php output: yes, definitely. this was not there before:
Missing object name
USAGE: snmpget [OPTIONS] AGENT OID [OID]...
Version: 5.1.1
Web: http://www.net-snmp.org/
Email: net-snmp-coders@lists.sourceforge.net

OPTIONS:
-h, --help display this help message
-H display configuration file directives understood
-v 1|2c|3 specifies SNMP version to use
-V, --version display package version number
SNMP Version 1 or 2c specific
-c COMMUNITY set the community string
SNMP Version 3 specific
-a PROTOCOL set authentication protocol (MD5|SHA)
-A PASSPHRASE set authentication protocol pass phrase
-e ENGINE-ID set security engine ID (e.g. 800000020109840301)
-E ENGINE-ID set context engine ID (e.g. 800000020109840301)
-l LEVEL set security level (noAuthNoPriv|a
etc.

this appears every so often, right before a cisco router / switch normal output.
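
"Missing object name" followed by the usage text is what snmpget prints when it is called without an OID argument, and the poller cache entry quoted earlier in this post shows nothing after "OID:". That may only be how the line was pasted, but it is cheap to check. The sketch below scans poller cache text for entries with an empty OID. It assumes entries look like the "Action: ..., OID: ... (Host: ..., Community: ...)" line above; the second sample line is hypothetical, built from the OID used in the snmpwalk test further down.

```python
import re

# First line is copied from the post; the second is a made-up healthy entry.
SAMPLE_CACHE = """\
Action: 0, OID: (Host: 172.16.0.1, Community: mycom)
Action: 0, OID: .1.3.6.1.4.1.9.2.1.58.0 (Host: 172.16.0.1, Community: mycom)
"""

# Group 1 is the OID (possibly empty), group 2 is the host.
ENTRY = re.compile(r"OID:\s*([^\s(]*)\s*\(Host:\s*([^,\s]+)")


def parse_poller_entries(text):
    """Return (oid, host) pairs for every poller cache entry found in the text."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(text)]


def entries_missing_oid(text):
    """Return the hosts of entries whose OID field is empty."""
    return [host for oid, host in parse_poller_entries(text) if not oid]
```

If the broken CPU and memory data sources turn up in that list, the OID field of the data template would be the first thing to look at.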

i tried snmpwalk and snmpget to see whether the cpu oid got corrupted or something, and i get the value just fine:
root@dublin:/var/www/htdocs/cacti# snmpwalk -v1 -c mycom 172.16.0.1:161 .1.3.6.1.4.1.9.2.1.58.0
SNMPv2-SMI::enterprises.9.2.1.58.0 = INTEGER: 4

it seems strange that all the memory and cpu graphs stopped working at the same time.

does this help?

--harel
melchandra
Cacti User
Posts: 311
Joined: Tue Jun 29, 2004 12:52 pm
Location: Indiana

Post by melchandra »

remm wrote:
I'm not using cactid, just the crontab.
Vince22, I don't think we have the same problem, because in my case, once the graphs stop updating they never start again. I have the same type of graph as yours for some of my routers, but I think that is normal.
It almost looks like you have set the graph maximum to 1 or something. If you had autoscale set, it almost always (in my experience anyway) leaves a bit of room above the highest peak.
Dave
Mika
Cacti User
Posts: 64
Joined: Tue Mar 23, 2004 3:01 am

Post by Mika »

The same thing happened to me as to "oharel": all traffic graphs stopped collecting data.
If I delete the data source and create it again, it collects correctly.
I've checked the poller cache: there are no records left for the stopped traffic measurements. I tried clearing the poller cache, but it didn't help.
By the way, all these graphs stopped at the same time.

Post by oharel »

Melchandra wrote:
It almost looks like you have set the graph maximum to 1 or something
nope. it is set on autoscale.

Mika wrote:
The same thing happened to me as to "oharel": all traffic graphs stopped collecting data.
nope. just my cisco cpu and memory. on the same router and switches, interfaces are polled correctly

:o

--harel
Anacronik12

same problem

Post by Anacronik12 »

Hello,

i sometimes have a hole in my graphs.

1092105300: 5.2142857143e-02
1092105600: 3.1600000000e-02
1092105900: 2.6900000000e-02
1092106200: nan
1092106500: nan
1092106800: nan
1092107100: nan
1092107400: nan
1092107700: nan
1092108000: nan
1092108300: nan
1092108600: nan
1092108900: nan
1092109200: nan


suddenly, everything stops being recorded. But if i run php cmd.php by hand,
i don't see any problem.
I have a lot of graphs (1500).
i don't know how to debug this :-(

Tks

Post by melchandra »

Hmm. 1500 graphs. I'd almost bet my cacti installation that your problem is that cmd.php takes longer than 5 minutes to run. This causes serious problems. Switch to cactid.
Dave
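
Whether a poller run really takes longer than the 5-minute interval can be measured rather than guessed. A minimal sketch, assuming a 300-second polling interval (the step shown in the rrd info earlier in the thread); the function names are hypothetical.

```python
import subprocess
import time

POLL_INTERVAL = 300  # seconds; assumed 5-minute polling interval


def time_command(argv):
    """Run a command and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, result.returncode


def overruns(elapsed, interval=POLL_INTERVAL):
    """True if one run took longer than the polling interval."""
    return elapsed > interval
```

For example, elapsed, code = time_command(["php", "/var/www/htdocs/cacti/cmd.php"]) followed by overruns(elapsed) would show whether a single run already exceeds the interval. The path is the one from the crontab lines quoted below; adjust it to your install.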

Post by oharel »

sounds right to me, somehow.
i looked throughout the forum but did not find anything like this; if there is, please direct me to the correct discussion.
anyway:
one fine morning my cmd.php stopped updating the rrds every 5 minutes. i had to configure it to work every minute in order for it to do the job properly.
here is the old file:
*/5 * * * * php /var/www/htdocs/cacti/cmd.php > /tmp/cacti_log 2>&1
and here is the new:
* * * * * php /var/www/htdocs/cacti/cmd.php > /tmp/cacti_log 2>&1
it takes a lot more than 5 minutes to run, but that was the situation even before it stopped working.
the output of cmd.php is the same whether it runs every 5 minutes or every minute...
also, if i change the cmd.php entry back to */5, the only way i can get it to run again is to reboot the machine.
i am running cacti 0.8.5a on Slackware9

any ideas, besides moving to cactid?

-harel
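
One thing that might be worth trying, though nobody in this thread has tested it: when a run takes longer than the cron interval, a new copy starts while the previous one is still working. A small wrapper can skip a run if the previous one has not finished. This is a sketch under assumptions: it relies on flock, so it is for Unix-like systems only, and the function names are made up for illustration.

```python
import fcntl
import subprocess


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail at once instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusive(argv, lock_path):
    """Run argv unless another run holds the lock; return its exit code or None."""
    lock = try_lock(lock_path)
    if lock is None:
        return None  # previous run still active, so skip this one
    try:
        return subprocess.call(argv)
    finally:
        lock.close()  # closing the file releases the lock
```

Cron would then call the wrapper instead of cmd.php directly, for example run_exclusive(["php", "/var/www/htdocs/cacti/cmd.php"], "/tmp/cacti_poller.lock"), where the lock path is a hypothetical choice. Skipped runs still leave holes in the graphs, so this only stops the runs from piling up; it does not make the poller any faster.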