Suddenly some rrd's stopped updating.


bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Suddenly some rrd's stopped updating.

Post by bichara »

Dear all,
We've been running Cacti for several years, but several days ago we ran into a strange problem.
Our setup:
Windows Server 2003
Cacti 0.8.8a
cmd.php
RRDtool 1.2

The problem is that when we create a new graph for a host, Cacti starts graphing it.
Then, after some time, the RRD file stops getting updated. When I go back to the same device, I see that the interface which was recently created and graphed is not greyed out and can be added again (with the same counters).
The corresponding data sources exist, but they can't be found when viewing the poller cache.
Rebuilding the poller cache doesn't help.
There is nothing strange in the Cacti log.
I have no idea what's going on.
The only strange thing I found: when I go to Data Templates, open Interface - Traffic, and then press the Save button, I get a 500 error after 30 seconds.
Thanks for any piece of advice.
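P.S. In case it's useful, here is how I've been checking whether a data source made it into the poller cache directly in the database. I'm guessing at the table layout, and both the local_data_id (7166, one of our stuck sources) and the database name ("cacti") are just from our setup:

Code: Select all

rem 7166 is the local_data_id of one stuck data source; "cacti" is our DB name
mysql -e "SELECT local_data_id, rrd_name, rrd_path FROM poller_item WHERE local_data_id = 7166" cacti

For the stuck data sources this returns nothing, while the healthy ones show their rows.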
imfv
Posts: 7
Joined: Wed Jan 25, 2012 11:04 am

Re: Suddenly some rrd's stopped updating.

Post by imfv »

I'm getting this problem with some hosts.
I see the output with debug logging, and I get graphs if I use the Real Time plugin, but some hosts have stopped updating their RRAs.
I will try recreating some graphs next week and check the result.
Edited:
I see this warning in the logs: PCOMMAND: Poller[0] Host[101] WARNING: Recache Event Detected for Host
I checked the forum and it seems to be normal, but it's happening every cycle.
I have already rebuilt the poller cache and it's still happening.
"If I have seen further it is by standing on the shoulders of Giants." Isaac Newton
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Suddenly some rrd's stopped updating.

Post by gandalf »

Recaching is quite a normal event.
So the graphs build well but do not get new data filled into them?

Then please redirect the poller output to e.g. poller.log to find possible errors during RRD file updates (hints at the 2nd link of my sig).
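On Windows with cmd.php, that could be as simple as running the poller once by hand and capturing everything; the paths below are only an example:

Code: Select all

rem paths are examples; --force makes poller.php run regardless of the schedule
php -q c:\cacti\poller.php --force > c:\cacti\log\poller.log 2>&1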
R.
idle
Cacti User
Posts: 77
Joined: Wed May 26, 2004 10:49 am
Location: Barcelona
Contact:

Re: Suddenly some rrd's stopped updating.

Post by idle »

Same here.
The RRDs of half of our hosts suddenly stopped updating.
After an hour everything went back to normal.
All I caught was this message:
DEVEL: SQL Exec: "update host set status = '3', status_event_count = '0', status_fail_date = '0000-00-00 00:00:00', status_rec_date = '0000-00-00 00:00:00', status_last_error = '', min_time = '9.99999', max_time = '26.36000', cur_time = '13.16', avg_time = '13.651834627763', total_polls = '5158', failed_polls = '0', availability = '100' where hostname = <one of the stopped hosts>
Before the update, status was 0.
So I guess next time the problem occurs we can check it like this:

Code: Select all

mysql -e 'select status, hostname from host' cactidb
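
If I recall the status constants right (0 = unknown, 1 = down, 2 = recovering, 3 = up), filtering for anything other than 3 should list the suspects directly:

Code: Select all

# status: 0=unknown, 1=down, 2=recovering, 3=up (if I recall the constants right)
mysql -e 'SELECT id, status, hostname FROM host WHERE status != 3' cactidb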
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

Hi, guys!
My problem is that the graphs which are not updating are not present in the poller cache.
With poller debug enabled I see no errors, just this:

Code: Select all

08/13/2013 09:12:54 AM - WEBLOG: Poller[0] CACTI2RRD: c:/cacti/rrdtool.exe graph - --imgformat=PNG --start=1376298773 --end=1376385173 --title="Cisco_tunnel - Traffic - Tu1 (Tunnel to the host that is not being grphed)" --rigid --base=1000 --height=130 --width=700 --alt-autoscale-max --lower-limit="0" COMMENT:"From 2013/08/12 09\:12\:53 To 2013/08/13 09\:12\:53\c" COMMENT:" \n" --vertical-label="bits per second" --slope-mode --font TITLE:11: --font AXIS:7: --font LEGEND:9: --font UNIT:7: DEF:a="C\:/cacti/rra/Cisco_tunnel_traffic_in_7166.rrd":"traffic_in":AVERAGE DEF:b="C\:/cacti/rra/Cisco_tunnel_traffic_in_7166.rrd":"traffic_out":AVERAGE CDEF:cdefa="a,8,*" CDEF:cdeff="b,8,*" AREA:cdefa#00CF0033:"" LINE1:cdefa#00CF00FF:"Inbound" GPRINT:cdefa:LAST:" Current\:%8.2lf %s" GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdefa:MAX:"Maximum\:%8.2lf %s\n" AREA:cdeff#002A9733:"" LINE1:cdeff#002A97FF:"Outbound" GPRINT:cdeff:LAST:"Current\:%8.2lf %s" GPRINT:cdeff:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdeff:MAX:"Maximum\:%8.2lf %s" 
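
I also tried to list which of the host's data sources have no poller cache entry by joining the two tables directly. I'm not sure this is the canonical way to query it; host_id 12 is this host and "cacti" is our database name:

Code: Select all

rem host_id 12 is this host; lists data sources with no matching poller cache row
mysql -e "SELECT dl.id FROM data_local dl LEFT JOIN poller_item pi ON pi.local_data_id = dl.id WHERE dl.host_id = 12 AND pi.local_data_id IS NULL" cacti

The "sick" data sources all show up in that list.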



gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Suddenly some rrd's stopped updating.

Post by gandalf »

Please see the 2nd link of my sig to learn how to run a spine cycle for that host only. This will narrow down your issue.
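From memory it is something like the following, but please check the link for the exact options (host ID 12 just as an example):

Code: Select all

rem from memory: -R = read-only (no DB writes), -V 3 = verbosity, 12 12 = first/last host ID
spine -R -V 3 12 12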
R.
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

Sorry, but should I do that if I'm using cmd.php?
I can't understand why, when I open the device and press "Create Graphs", it lets me create graphs for interfaces whose data sources are already present (in Data Sources, with the same counters). And those interfaces are the ones which occasionally stopped updating, while most of the other interfaces on the same host are still being graphed.
The interfaces on the device are present.
Thank you.
Last edited by bichara on Wed Aug 14, 2013 1:48 am, edited 2 times in total.
tylerc
Posts: 24
Joined: Tue Aug 13, 2013 10:59 pm

Re: Suddenly some rrd's stopped updating.

Post by tylerc »

The debugging link has the command to use if you're using cmd.php.
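If I remember the syntax right, it takes first and last host IDs, so for a single host it would be something like:

Code: Select all

rem first and last host ID; 12 12 polls just host 12
php -q cmd.php 12 12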
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

Appreciate your help, guys.
Here is the output of php -q cmd.php with the ID of one of the "sick" devices:

Code: Select all


08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Time: 0.0971 s, Theads: N/A, Hosts: 1
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[359] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: cisco_mem_used, oid: .1.3.6.1.4.1.9.9.48.1.1.1.5.1, output: 53399104 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[357] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: cisco_mem_free, oid: .1.3.6.1.4.1.9.9.48.1.1.1.6.1, output: 312558204 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[25] Graphs['TheRouterIsSick - CPU & Memory Usage'] SNMP: v2: 10.0.0.1, dsname: 5min_cpu, oid: .1.3.6.1.4.1.9.9.109.1.1.1.1.5.1, output: 5 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[7196] Graphs['TheRouterIsSick - Traffic - Tu1'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.140, output: 160116455 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[7196] Graphs['TheRouterIsSick - Traffic - Tu1'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.140, output: 84998331 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6778] Graphs['TheRouterIsSick - Traffic - Tu2'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.137, output: 131598411 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6778] Graphs['TheRouterIsSick - Traffic - Tu2'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.137, output: 95736777 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6776] Graphs['TheRouterIsSick - Traffic - Tu3'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.135, output: 149364716 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6776] Graphs['TheRouterIsSick - Traffic - Tu3'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.135, output: 99266325 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6775] Graphs['TheRouterIsSick - Traffic - Tu4'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.134, output: 1974814392
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6775] Graphs['TheRouterIsSick - Traffic - Tu4'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.134, output: 481201355 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6774] Graphs['TheRouterIsSick - Traffic - Fa0'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.132, output: 4328533011 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6774] Graphs['TheRouterIsSick - Traffic - Fa0'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.132, output: 1599605024 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6256] Graphs['TheRouterIsSick - Traffic - Tu5'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.131, output: 697514495 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6256] Graphs['TheRouterIsSick - Traffic - Tu5'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.131, output: 1438312518 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6255] Graphs['TheRouterIsSick - Traffic - Tu6'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.130, output: 4986223394 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6255] Graphs['TheRouterIsSick - Traffic - Tu6'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.130, output: 23102558874 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6254] Graphs['TheRouterIsSick - Traffic - Se1/0'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.128, output: 5795041201
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6254] Graphs['TheRouterIsSick - Traffic - Se1/0'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.128, output: 24677871761 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6253] Graphs['TheRouterIsSick - Traffic - Se1/1'] SNMP: v2: 10.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.127, output: 96210689347 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] DS[6253] Graphs['TheRouterIsSick - Traffic - Se1/1'] SNMP: v2: 10.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.127, output: 162957049625 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE DQ[1] OID: .1.3.6.1.2.1.1.3.0, output: 290587589 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE DQ[1] OID: .1.3.6.1.2.1.1.3.0 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] RECACHE: Processing 1 items in the auto reindex cache for '10.0.0.1'. 
08/14/2013 08:07:36 AM - CMDPHP: Poller[0] Host[12] Description[TheRouterIsSick] SNMP: Host responded to SNMP

These interfaces are graphed normally.
The graphs which are not being updated are not present in the cmd.php log output.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Suddenly some rrd's stopped updating.

Post by gandalf »

Then the issue is that the poller cache is not holding the required commands.
You may want to try rebuilding the poller cache for this very host. BUT DO THIS USING THE LATEST CACTI CODE.
We have had an issue where this rebuild would wipe the whole poller table and fill in only the host in question.
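With current code the per-host rebuild should look something like this from the CLI; I am quoting the options from memory, so check --help first:

Code: Select all

rem host-id 12 as an example; -d shows debug output (check --help for exact options)
php -q cli/rebuild_poller_cache.php --host-id=12 -d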
R.
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

Thank you for the help.
Unfortunately, that host is not the only one experiencing problems :(
I've tried rebuilding the poller cache both from the Cacti web interface and by running rebuild_poller_cache.php from the CLI; still no result.
About half of the graphs are ill:

Code: Select all

08/15/2013 10:56:46 AM - SYSTEM STATS: Time:106.1199 Method:cmd.php Processes:10 Threads:N/A Hosts:267 HostsPerProcess:27 DataSources:4766 RRDsProcessed:2743
I switched to spine; same thing :(

What can cause this:
- the graph is present in Graph Management
- when I go to the device that graph belongs to and press "Create Graphs", I can create the same graph again :(
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

It seems I've found the cause. The graphs that suddenly stop updating have a wrong Output Type ID in the corresponding data sources (In/Out Errors instead of In/Out Bits (64-bit counters)). I have about three thousand data sources like that. Is there any way other than changing the Output Type ID manually for each data source? :cry:
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Suddenly some rrd's stopped updating.

Post by gandalf »

An SQL update could help. But you've hit one of the most complex parts of Cacti, and I DO NOT have this SQL at hand ... :(
R.
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

This is bad news :(
I've also found that some Index Types are missing as well :(
I'm going to pick up a box of beer and fix it manually.
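Before that, I may still try an SQL update along these lines. It is untested, the output type only appears to live in data_input_data (keyed by a data_input_fields row with type_code 'output_type'), and the two snmp_query_graph IDs are placeholders I would have to look up first. Full database backup before touching anything:

Code: Select all

-- UNTESTED sketch; 43 = wrong snmp_query_graph id, 14 = correct one (placeholders!)
UPDATE data_input_data AS did
JOIN data_input_fields AS dif ON dif.id = did.data_input_field_id
SET did.value = '14'
WHERE dif.type_code = 'output_type' AND did.value = '43';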
Thank You for your help!
bichara
Posts: 11
Joined: Mon Jul 02, 2012 2:09 am

Re: Suddenly some rrd's stopped updating.

Post by bichara »

Hi!
Just another silly question: in the Cacti log file I get:

Code: Select all

08/16/2013 10:47:19 AM - SYSTEM STATS: Time:139.2566 Method:cmd.php Processes:10 Threads:N/A Hosts:267 HostsPerProcess:27 DataSources:5856 RRDsProcessed:3288 
But when I go to Data Sources and set 5000 rows per page, I get only one page containing about 3700 data sources.
What does this mean?
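My own guess (unverified) is that the stats line counts poller items, so a traffic data source with traffic_in and traffic_out counts twice, while the Data Sources page counts data sources. If so, these two numbers should explain the gap:

Code: Select all

rem compares poller items (one per ds name) with data sources (one per RRD)
mysql -e "SELECT (SELECT COUNT(*) FROM poller_item) AS poller_items, (SELECT COUNT(*) FROM data_local) AS data_sources" cacti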
Thanks