Suddenly some rrd's stopped updating.
Suddenly some rrd's stopped updating.
Dear all,
We've been running Cacti for several years, but several days ago we ran into a strange problem.
Our setup:
win2003 server
cacti 0.8.8a
cmd.php
rrdtool 1.2
The problem is this: we create a new graph for a host, and Cacti starts graphing it.
Then after some time the rrd file stops getting updated. I go to the same device and see that the interface that was recently created and graphed is not greyed out and can be added again (with the same counters).
The corresponding data sources are available, but can't be found when viewing the poller cache.
Rebuilding the poller cache doesn't help.
There is nothing strange in the Cacti log (clog).
I have no idea what's going on.
The only strange thing I found: when I go to Data Templates, open "Interface - Traffic" and press the Save button, I get a 500 error after about 30 seconds.
Thanks for any piece of advice.
Re: Suddenly some rrd's stopped updating.
I'm getting this problem with some hosts.
I see the output with debug logging, and I get graphs if I use the Real Time plugin, but some hosts have stopped updating the RRAs.
I will try recreating some graphs next week and check the result.
Edited:
I see this warning on logs: PCOMMAND: Poller[0] Host[101] WARNING: Recache Event Detected for Host
I checked on the forum and it seems to be normal, but it's happening every cycle.
I already rebuilt the poller cache and it's still happening.
"If I have seen further it is by standing on the shoulders of Giants." Isaac Newton
- gandalf
- Developer
Re: Suddenly some rrd's stopped updating.
Recaching is quite a normal event.
So the graphs build well but do not get new data filled into them?
Then, please redirect the poller output to e.g. poller.log to find possible errors during rrd file updates (hints at the 2nd link of my sig).
R.
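A minimal way to capture one full poller run in a file on a Windows box might look like this (a sketch only; it assumes Cacti is installed in c:\cacti, and the log filename is just an example - adjust to your install):

Code:
cd c:\cacti
php -q cmd.php > poller.log 2>&1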
Re: Suddenly some rrd's stopped updating.
Same here.
The RRDs of half of the hosts suddenly stopped updating.
After an hour everything went back to normal.
All I caught is this message:
Before update status was=0.DEVEL: SQL Exec: "update host set status = '3', status_event_count = '0', status_fail_date = '0000-00-00 00:00:00', status_rec_date = '0000-00-00 00:00:00', status_last_error = '', min_time = '9.99999', max_time = '26.36000', cur_time = '13.16', avg_time = '13.651834627763', total_polls = '5158', failed_polls = '0', availability = '100' where hostname = <one of the stopped hosts>
So I guess next time the problem occurs we can check it like this:
Code:
mysql -e 'select status, hostname from host' cactidb
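Since Cacti stores host status as a number (3 = up, as in the SQL above), a slightly narrower check that lists only the hosts that are currently not up might look like this - just a sketch building on the same table:

Code:
mysql -e "select id, status, hostname from host where status <> 3" cactidb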
Re: Suddenly some rrd's stopped updating.
Hi, guys!
My problem is that the graphs that are not updated are not present in the poller cache.
With poller debug enabled I see no errors, just this:
Code:
08/13/2013 09:12:54 AM - WEBLOG: Poller[0] CACTI2RRD: c:/cacti/rrdtool.exe graph - --imgformat=PNG --start=1376298773 --end=1376385173 --title="Cisco_tunnel - Traffic - Tu1 (Tunnel to the host that is not being grphed)" --rigid --base=1000 --height=130 --width=700 --alt-autoscale-max --lower-limit="0" COMMENT:"From 2013/08/12 09\:12\:53 To 2013/08/13 09\:12\:53\c" COMMENT:" \n" --vertical-label="bits per second" --slope-mode --font TITLE:11: --font AXIS:7: --font LEGEND:9: --font UNIT:7: DEF:a="C\:/cacti/rra/Cisco_tunnel_traffic_in_7166.rrd":"traffic_in":AVERAGE DEF:b="C\:/cacti/rra/Cisco_tunnel_traffic_in_7166.rrd":"traffic_out":AVERAGE CDEF:cdefa="a,8,*" CDEF:cdeff="b,8,*" AREA:cdefa#00CF0033:"" LINE1:cdefa#00CF00FF:"Inbound" GPRINT:cdefa:LAST:" Current\:%8.2lf %s" GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdefa:MAX:"Maximum\:%8.2lf %s\n" AREA:cdeff#002A9733:"" LINE1:cdeff#002A97FF:"Outbound" GPRINT:cdeff:LAST:"Current\:%8.2lf %s" GPRINT:cdeff:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdeff:MAX:"Maximum\:%8.2lf %s"
- gandalf
- Developer
Re: Suddenly some rrd's stopped updating.
Please see the 2nd link of my sig to learn how to run a spine cycle for that host only. This will narrow down your issues.
R.
Re: Suddenly some rrd's stopped updating.
Sorry, but should I do it if I'm using cmd.php?
I can't understand why, when I open the device and press "create graphs", it lets me create graphs for interfaces whose data sources are already present (in "Data Sources", with the same counters). And those interfaces are the ones that occasionally stopped updating, while most of the other interfaces on the same host are still being graphed.
The interfaces on the device are present.
Thank you.
Re: Suddenly some rrd's stopped updating.
The debugging link has the command to use if you're using cmd.php.
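For reference, cmd.php takes a first and a last host ID, so passing the same value twice limits the run to a single device. A rough example (the host ID and the log name are placeholders; raise the poller logging level to DEBUG in Settings first):

Code:
cd c:\cacti
php -q cmd.php 12 12 > single_host.log 2>&1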
Re: Suddenly some rrd's stopped updating.
Appreciate your help, guys.
Here is the output of php -q cmd.php with the ID of one of the "sick" devices:
The interfaces that appear in this output are graphed normally.
The graphs that are not being updated do not show up in the cmd.php output at all.
Code:
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Time: 0.0971 s, Theads: N/A, Hosts: 1
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[359] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: cisco_mem_used, oid: .1.3.6.1.4.1.9.9.48.1.1.1.5.1, output: 53399104
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[357] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: cisco_mem_free, oid: .1.3.6.1.4.1.9.9.48.1.1.1.6.1, output: 312558204
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[25] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: 5min_cpu, oid: .1.3.6.1.4.1.9.9.109.1.1.1.1.5.1, output: 5
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[7196] Graphs['TheRouterIsSick - Traffic - Tu1'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.140, output: 160116455
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[7196] Graphs['TheRouterIsSick - Traffic - Tu1'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.140, output: 84998331
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6778] Graphs['TheRouterIsSick - Traffic - Tu2'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.137, output: 131598411
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6778] Graphs['TheRouterIsSick - Traffic - Tu2'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.137, output: 95736777
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6776] Graphs['TheRouterIsSick - Traffic - Tu3'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.135, output: 149364716
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6776] Graphs['TheRouterIsSick - Traffic - Tu3'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.135, output: 99266325
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6775] Graphs['TheRouterIsSick - Traffic - Tu4'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.134, output: 1974814392
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6775] Graphs['TheRouterIsSick - Traffic - Tu4'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.134, output: 481201355
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6774] Graphs['TheRouterIsSick - Traffic - Fa0'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.132, output: 4328533011
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6774] Graphs['TheRouterIsSick - Traffic - Fa0'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.132, output: 1599605024
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6256] Graphs['TheRouterIsSick - Traffic - Tu5'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.131, output: 697514495
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6256] Graphs['TheRouterIsSick - Traffic - Tu5'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.131, output: 1438312518
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6255] Graphs['TheRouterIsSick - Traffic - Tu6'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.130, output: 4986223394
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6255] Graphs['TheRouterIsSick - Traffic - Tu6'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.130, output: 23102558874
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6254] Graphs['TheRouterIsSick - Traffic - Se1/0'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.128, output: 5795041201
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6254] Graphs['TheRouterIsSick - Traffic - Se1/0'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.128, output: 24677871761
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6253] Graphs['TheRouterIsSick - Traffic - Se1/1'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.127, output: 96210689347
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6253] Graphs['TheRouterIsSick - Traffic - Se1/1'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.127, output: 162957049625
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE DQ[1] OID: .1.3.6.1.2.1.1.3.0, output: 290587589
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE DQ[1] OID: .1.3.6.1.2.1.1.3.0
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE: Processing 1 items in the auto reindex cache for '10.0.0.1'.
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] SNMP: Host responded to SNMP
- gandalf
- Developer
Re: Suddenly some rrd's stopped updating.
Then the issue is that the poller cache is not holding the required commands.
You may want to try rebuilding the poller cache for this very host, BUT DO THIS USING THE LATEST CACTI CODE.
We have had an issue where this rebuild would wipe the whole poller table and fill in only the host in question.
R.
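The CLI rebuild script lives in Cacti's cli directory. Whether it accepts a per-host option depends on the release, so check its usage output first; the --host-id and --debug flags below are assumptions and may not exist in every version:

Code:
cd c:\cacti
php -q cli\rebuild_poller_cache.php --help
REM if your version supports it (an assumption), limit the rebuild to one host (12 is an example ID):
php -q cli\rebuild_poller_cache.php --host-id=12 --debug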
Re: Suddenly some rrd's stopped updating.
Thank you for the help.
Unfortunately that host is not the only one experiencing problems.
I've tried rebuilding the poller cache both from the Cacti web interface and by running rebuild_poller_cache.php from the CLI - still no result.
About half of the graphs are ill:

Code:
08/15/2013 10:56:46 AM - SYSTEM STATS: Time:106.1199 Method:cmd.php Processes:10 Threads:N/A Hosts:267 HostsPerProcess:27 DataSources:4766 RRDsProcessed:2743

Switched to spine - same thing.
What can cause this:
- the graph is present in "graphs management"
- when I go to the device that graph belongs to and press "create graphs", I can create the same graph again
Re: Suddenly some rrd's stopped updating.
It seems I've found the cause. The graphs that suddenly stop updating have a wrong Output Type ID in their corresponding data sources (In/Out Errors instead of In/Out Bits (64-bit counters)). I have about 3 thousand data sources like that. Is there any way other than changing the Output Type ID manually for each data source?
- gandalf
- Developer
Re: Suddenly some rrd's stopped updating.
An SQL update could help. But you've hit one of the most complex parts of Cacti. And I DO NOT have this SQL at hand ...
R.
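For anyone who lands on the same problem, here is a rough sketch of how the stored Output Type ID could be listed per data source, assuming the stock 0.8.x tables (data_local, data_template_data, data_input_data, data_input_fields); treat the joins as a starting point, run it with the mysql client against the Cacti database, and work on a backup before attempting any UPDATE:

Code:
-- list each data source's stored Output Type ID
select dl.id          as local_data_id,
       h.description  as host,
       did.value      as output_type_id
  from data_local dl
  join host h                 on h.id = dl.host_id
  join data_template_data dtd on dtd.local_data_id = dl.id
  join data_input_data did    on did.data_template_data_id = dtd.id
  join data_input_fields dif  on dif.id = did.data_input_field_id
 where dif.data_name = 'output_type';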
Re: Suddenly some rrd's stopped updating.
This is bad news.
I've also found that some Index Type values are missing.
Gonna grab a case of beer and fix it manually.
Thank you for your help!
Re: Suddenly some rrd's stopped updating.
Hi!
Just another silly question: in the Cacti log file I get:

Code:
08/16/2013 10:47:19 AM - SYSTEM STATS: Time:139.2566 Method:cmd.php Processes:10 Threads:N/A Hosts:267 HostsPerProcess:27 DataSources:5856 RRDsProcessed:3288

But when I go to Data Sources and set 5000 rows per page, I get only one page containing about 3700 data sources.
What does it mean?
Thanks
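As far as I can tell, the DataSources figure in SYSTEM STATS counts poller items (a traffic data source contributes one item per ds name, e.g. traffic_in and traffic_out), while the Data Sources page counts local data sources, so the two numbers are not expected to match. A quick way to compare the underlying counts, assuming the standard poller_item and data_local tables:

Code:
mysql -e "select count(*) as poller_items, count(distinct local_data_id) as distinct_data_sources from poller_item; select count(*) as local_data_sources from data_local" cactidb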