Poller interval changed - graphs stopped working.
Hi,
All of my graphs have stopped updating. The RRD files seem to get updated with values by the poller, but the graphs stopped showing new data yesterday.
I have checked and tried every single method I could find, that includes:
- Rebuilt poller cache via web and CLI.
- Stop, start poller
- Crontab and both poller timing values on web are all on 5 minutes
- RRDs are all owned by apache
- Graph debug mode shows normal stuff and OK at the end.
- rrdtool lastupdate localhost_cpu_7.rrd: "cpu 1396943102: 3" (this time is more than 12 hours after data stopped on the graphs).
- Changed spine threads from 5 to 10 and back to 5.
- Ran poller manually, everything seems fine except for weathermap complaining about the maps: "DataCenter had no valid data, according to WeatherMapDataSource_rrd"
- Boost had some log entries about not having permissions to its cached images; granting +x on the boost cache didn't help, and disabling boost didn't help either.
- Waited a whole night to try and let it fix itself.
- Realtime is working fine.
- Restarted every relevant service (mysql, apache etc)
- Rebooted the server.
- All devices are in "Up" status.
- Poller output table empty (CLI script)
- Repaired database (CLI script)
- 1.5GB free RAM
- 55GB free disk space
- CPU usage is mostly low
Prior to the failure, the cron job ran every 5 minutes and the settings page matched, except that the poller interval on the settings page was set to 1 minute.
This generated some errors that worried me. I read that best practice is to have them all match, so I set them all to 5 minutes, added 5 spine threads to the 5 already in use, and rebuilt the poller cache.
Excuse my French, but wtf am I missing/doing wrong?
Update: Rebuilt poller cache about 10 times more, set poller interval to 1 minute, disabled boost and graphs are getting data now.
I'd *really* like to know what is happening since my boss started saying stuff like "I'm not sure we can trust this software".
So please, if anyone can shed some light on what may have caused this or how exactly the relevant mechanisms work I'd be grateful as I have a similar problem on a different machine.
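The `rrdtool lastupdate` check in the list above prints a Unix epoch timestamp, which is hard to compare with the graph by eye. A minimal sketch for turning such a line into a readable UTC time follows; the function name `parse_lastupdate` is made up for illustration, and the input is assumed to be a single "TIMESTAMP: values" line as quoted in the post.

```python
from datetime import datetime, timezone

def parse_lastupdate(line: str) -> tuple[datetime, list[str]]:
    """Split a 'TIMESTAMP: v1 v2 ...' line into a UTC datetime and the raw values."""
    stamp, _, values = line.partition(":")
    when = datetime.fromtimestamp(int(stamp.strip()), tz=timezone.utc)
    return when, values.split()

# Timestamp and value taken from the lastupdate output quoted in the post.
when, values = parse_lastupdate("1396943102: 3")
print(when.isoformat(), values)
```

Comparing this time with the last point visible on the graph shows whether the RRD really is newer than what is being drawn.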
Re: Poller interval changed - graphs stopped working.
Can anyone help please? I've exhausted all of my possibilities and am now clueless.
All graphs (including localhost) have stopped updating, although data seems to be going into the RRD files successfully.
Graphs are rendered successfully, just empty (NaN). I did the poller cache dance, disabled most plugins, and removed all hosts except localhost; there are no errors in the log.
I'm afraid to run the poller manually again because last time I got the "Waiting for 1 other poller" problem.
Realtime is working.
I've just noticed that even though I removed all hosts except localhost, the log still reports 2 hosts!
Here's the log:
04/13/2014 06:10:02 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '301', Max Runtime '298', Poller Runs: '1'
04/13/2014 06:10:02 PM - POLLER: Poller[0] DEBUG: About to Spawn a Remote Process [CMD: /usr/local/spine/spine, ARGS: 0 1]
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 3
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The threads variable is 6
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The script timeout is 25
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: StartHost='0', EndHost='1', TotalPHPScripts='1'
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Required
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 10
04/13/2014 06:10:02 PM - SPINE: Poller[0] Version 0.8.8 starting
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Spine is running asroot.
04/13/2014 06:10:02 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
04/13/2014 06:10:02 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
04/13/2014 06:10:02 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server Routine Starting
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server Routine Starting
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server About to FORK Child Process
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server Child FORK Success
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: SERVER: spine
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: GETCWD: /usr/local/spine
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: DIRNAM: /var/www/html
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: FILENM: /var/www/html/script_server.php
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] PHP Script Server has Started - Parent is spine
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] Confirmed PHP Script Server running using readfd[8], writefd[7]
04/13/2014 06:10:02 PM - SPINE: Poller[0] NOTE: Spine will support multithread device polling.
04/13/2014 06:10:02 PM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g+ manner
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] DEBUG: Entering SNMP Ping
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.0013 Seconds
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] SNMP Result: Host responded to SNMP
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] RECACHE: Processing 3 items in the auto reindex cache for '127.0.0.1'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 27120705
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] Recache DataQuery[2] OID: .1.3.6.1.2.1.1.3.0, output: 27120705
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] Recache DataQuery[3] OID: .1.3.6.1.2.1.1.3.0, output: 27120705
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] NOTE: There are '26' Polling Items for this Host
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[0] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 1'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[1] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 1, output: 4154507264
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[0] RESPONSE:'4154507264'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[1] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 1'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[1] RESPONSE:'501469184'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[1] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 1, output: 501469184
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[2] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 1'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[1] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 1, output: 12.07
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[2] RESPONSE:'12.07'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[3] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 10'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[2] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 10, output: 6308225024
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[3] RESPONSE:'6308225024'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[4] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 10'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[2] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 10, output: 0
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[4] RESPONSE:'0'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[5] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 10'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[2] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 10, output: 0
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[5] RESPONSE:'0'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[6] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 3'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[3] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 3, output: 10462732288
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[6] RESPONSE:'10462732288'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[7] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 3'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[3] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 3, output: 501469184
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[7] RESPONSE:'501469184'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[7] RESPONSE:'501469184'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[8] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 3'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[3] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 3, output: 4.79
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[8] RESPONSE:'4.79'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[9] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 31'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[9] RESPONSE:'14408253440'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[4] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 31, output: 14408253440
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[10] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 31'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[4] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 31, output: 1852530688
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[10] RESPONSE:'1852530688'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[11] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 31'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[4] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 31, output: 12.86
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[11] RESPONSE:'12.86'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[11] RESPONSE:'12.86'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[12] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 36'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[5] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 36, output: 507744256
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[12] RESPONSE:'507744256'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[13] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 36'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[5] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 36, output: 32774144
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[13] RESPONSE:'32774144'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[14] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 36'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[5] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 36, output: 6.45
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[14] RESPONSE:'6.45'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[15] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 6'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[6] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get total 6, output: 4154507264
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[15] RESPONSE:'4154507264'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[16] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 6'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[6] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get used 6, output: 39636992
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[16] RESPONSE:'39636992'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[17] INC: 'ss_host_disk.php' FUNC: 'ss_host_disk' PARMS: '127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 6'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[6] SS[0] SERVER: /var/www/html/scripts/ss_host_disk.php ss_host_disk 127.0.0.1 2:161:500:public:::MD5::DES: 1 get percent 6, output: 0.95
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[17] RESPONSE:'0.95'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[18] INC: 'ss_host_cpu.php' FUNC: 'ss_host_cpu' PARMS: '127.0.0.1 1 2:161:500:1:10:public:::MD5::DES: get usage 0'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[7] SS[0] SERVER: /var/www/html/scripts/ss_host_cpu.php ss_host_cpu 127.0.0.1 1 2:161:500:1:10:public:::MD5::DES: get usage 0, output: 1
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[18] RESPONSE:'1'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[19] INC: 'ss_host_cpu.php' FUNC: 'ss_host_cpu' PARMS: '127.0.0.1 1 2:161:500:1:10:public:::MD5::DES: get usage 1'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[8] SS[0] SERVER: /var/www/html/scripts/ss_host_cpu.php ss_host_cpu 127.0.0.1 1 2:161:500:1:10:public:::MD5::DES: get usage 1, output: 1
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[19] RESPONSE:'1'
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[20] INC: 'thold_alerts.php' FUNC: 'script_thold_alerts_count' PARMS: ''
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[13] SS[0] SERVER: /var/www/html/scripts/thold_alerts.php script_thold_alerts_count, output: thresholds:0 realert:0 restored:0
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PID[26092] CTR[20] RESPONSE:'thresholds:0 realert:0 restored:0'
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[9] SNMP: v2: 127.0.0.1, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 1493533
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[9] SNMP: v2: 127.0.0.1, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 6866666
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[10] SNMP: v2: 127.0.0.1, dsname: users, oid: .1.3.6.1.2.1.25.1.5.0, value: 1
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[11] SNMP: v2: 127.0.0.1, dsname: proc, oid: .1.3.6.1.2.1.25.1.6.0, value: 114
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DS[12] SNMP: v2: 127.0.0.1, dsname: Wait, oid: .1.3.6.1.4.1.2021.11.54.0, value: 34366
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] Total Time: 0.049 Seconds
04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: SS[0] Script Server Shutdown Started
04/13/2014 06:10:02 PM - PHPSVR: Poller[0] DEBUG: PHP Script Server Shutdown request received, exiting
04/13/2014 06:10:02 PM - POLLER: Poller[0] Parsed MULTI output field 'thresholds:0' [map thresholds->thresholds]
04/13/2014 06:10:02 PM - POLLER: Poller[0] Parsed MULTI output field 'realert:0' [map realert->realert]
04/13/2014 06:10:02 PM - POLLER: Poller[0] Parsed MULTI output field 'restored:0' [map restored->restored]
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] WM poller_output: STARTING
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] WM poller_output: ENDING
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_thresholds_13.rrd --template thresholds:realert:restored 1397401802:0:0:0
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_wait_12.rrd --template Wait 1397401802:34366
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_proc_11.rrd --template proc 1397401802:114
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_users_10.rrd --template users 1397401802:1
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_traffic_in_9.rrd --template traffic_out:traffic_in 1397401802:6866666:1493533
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_cpu_8.rrd --template cpu 1397401802:1
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_cpu_7.rrd --template cpu 1397401802:1
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_6.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:0.95:39636992:4154507264
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_5.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:6.45:32774144:507744256
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_4.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:12.86:1852530688:14408253440
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_3.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:4.79:501469184:10462732288
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_2.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:0:0:6308225024
04/13/2014 06:10:02 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/rra/localhost_hdd_total_1.rrd --template hdd_percent:hdd_used:hdd_total 1397401802:12.07:501469184:4154507264
04/13/2014 06:10:02 PM - SYSTEM STATS: Time:0.3745 Method:spine Processes:1 Threads:6 Hosts:2 HostsPerProcess:2 DataSources:26 RRDsProcessed:13
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
04/13/2014 06:10:02 PM - SPINE: Poller[0] DEBUG: Net-SNMP Close Completed
04/13/2014 06:10:02 PM - SPINE: Poller[0] Time: 0.3355 s, Threads: 6, Hosts: 2
04/13/2014 06:10:02 PM - SYSTEM THOLD STATS: Time:0.0659 Tholds:6 TotalHosts:1 DownHosts:0 NewDownHosts:0
04/13/2014 06:10:02 PM - POLLER: Poller[0] DEBUG: About to Spawn a Remote Process [CMD: /usr/bin/php, ARGS: -q /var/www/html/plugins/hmib/poller_hmib.php -M]
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] DEBUG: [weathermap_memory_check@poller-common.php:11] MEM Initial: memory_get_usage() says 16.5MBytes used. Limit is 512M
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] DEBUG: [weathermap_run_maps@poller-common.php:108] Iterating all maps.
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] DEBUG: [weathermap_run_maps@poller-common.php:239] Iterated all 0 maps.
04/13/2014 06:10:02 PM - WEATHERMAP: Poller[0] DEBUG: [weathermap_memory_check@poller-common.php:11] MEM Final: memory_get_usage() says 16.5MBytes used. Limit is 512M
Could this be a bug that is preventing my graphs from updating with data?
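For the "2 hosts" question, it may help to list which host IDs actually appear in the log. The sketch below only counts the `Host[n]` tags; it does not explain what `Host[0]` stands for, since the thread does not say. The helper name `host_ids` is hypothetical, and the sample lines are copied from the log above.

```python
import re

HOST_RE = re.compile(r"Host\[(\d+)\]")

def host_ids(log_lines):
    """Return the sorted host IDs that appear as Host[n] in the given lines."""
    ids = set()
    for line in log_lines:
        match = HOST_RE.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)

sample = [
    "04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] DEBUG: Entering SNMP Ping",
    "04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.0013 Seconds",
    "04/13/2014 06:10:02 PM - SPINE: Poller[0] Host[1] TH[1] Total Time: 0.049 Seconds",
]
print(host_ids(sample))
```

On these lines the result contains both 0 and 1, which matches the `Hosts:2` figure in the SYSTEM STATS line.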
Re: Poller interval changed - graphs stopped working.
Did you by any chance update PHP? If so, is your timezone set correctly?
What do the log files say?
How did you install cacti? Straight from this web-site, or is your setup using a distribution package?
Maintainer of cacti in Debian (and Ubuntu).
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
FAQ Ubuntu and Debian differences
Generic cacti debugging
Re: Poller interval changed - graphs stopped working.
Did not upgrade PHP
Using CactiEZ 0.8.8a
I posted 1 poller run on debug level in my previous post.
Using NTP.
Re: Poller interval changed - graphs stopped working.
Please help. If this is a bug it is quite severe; if this is a configuration issue it's even worse, since everything is pretty much sterilized.
I just found out that Cacti is creating the RRD files as root. I'm pretty sure that's not the source of the problem, but it might be a symptom that points to the real cause.
I did not run the poller manually for this; I just added a device and graphs via the web GUI, and this is what I get. What does this mean?
drwxrwxr-x. 2 apache apache 4096 Apr 20 12:25 .
drwxr-xr-x. 13 root root 4096 Apr 20 01:00 ..
-rwxrwxr-x. 1 apache apache 33 Jun 5 2012 .htaccess
-rw-r--r-- 1 apache apache 95608 Apr 20 12:25 localhost_cpu_7.rrd
-rw-r--r-- 1 apache apache 95608 Apr 20 12:25 localhost_cpu_8.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_1.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_2.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_3.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_4.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_5.rrd
-rw-r--r-- 1 apache apache 282440 Apr 20 12:25 localhost_hdd_total_6.rrd
-rw-r--r-- 1 apache apache 95608 Apr 20 12:25 localhost_proc_11.rrd
-rw-r--r-- 1 apache apache 331912 Apr 20 12:25 localhost_thresholds_13.rrd
-rw-r--r-- 1 root root 189024 Apr 20 12:25 localhost_traffic_in_21.rrd
-rw-r--r-- 1 apache apache 189024 Apr 20 12:25 localhost_traffic_in_9.rrd
-rw-r--r-- 1 apache apache 95608 Apr 20 12:25 localhost_users_10.rrd
-rw-r--r-- 1 apache apache 112440 Apr 20 12:25 localhost_wait_12.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_26.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_27.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_28.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_29.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_30.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_31.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_32.rrd
-rw-r--r-- 1 root root 282440 Apr 20 12:25 test_hdd_total_33.rrd
-rw-r--r-- 1 root root 95608 Apr 20 12:25 test_proc_23.rrd
-rw-r--r-- 1 root root 331912 Apr 20 12:25 test_thresholds_25.rrd
-rw-r--r-- 1 root root 189024 Apr 20 12:25 test_traffic_in_34.rrd
-rw-r--r-- 1 root root 189024 Apr 20 12:25 test_traffic_in_35.rrd
-rw-r--r-- 1 root root 95608 Apr 20 12:25 test_users_22.rrd
-rw-r--r-- 1 root root 112440 Apr 20 12:25 test_wait_24.rrd
I've just noticed that the DEBUG log has a message saying "Spine is running asroot", not sure if relevant.
Wtf is going on here? Can anyone shed some light PLEASE?
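To pick the odd files out of a listing like the one above, the owner column can be checked mechanically. This is a rough sketch that assumes standard `ls -l` column order and that `apache` is the expected owner, as stated earlier in the thread; `wrong_owner` is an illustrative name, not part of Cacti.

```python
def wrong_owner(ls_lines, expected="apache"):
    """Return names of .rrd files whose owner column differs from `expected`."""
    flagged = []
    for line in ls_lines:
        fields = line.split()
        # ls -l columns: perms, links, owner, group, size, month, day, time, name
        if len(fields) < 9 or not fields[-1].endswith(".rrd"):
            continue
        if fields[2] != expected:
            flagged.append(fields[-1])
    return flagged

listing = [
    "drwxrwxr-x. 2 apache apache 4096 Apr 20 12:25 .",
    "-rw-r--r-- 1 apache apache 95608 Apr 20 12:25 localhost_cpu_7.rrd",
    "-rw-r--r-- 1 root root 189024 Apr 20 12:25 localhost_traffic_in_21.rrd",
    "-rw-r--r-- 1 root root 95608 Apr 20 12:25 test_users_22.rrd",
]
print(wrong_owner(listing))
```

In the listing above, every file flagged this way belongs to the newly added graphs, which suggests whatever created them was running as root.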
Re: Poller interval changed - graphs stopped working.
I think that CMDPHP may be running along with spine at the same time.
When I try to run poller.php --force I get the "Waiting on 1 poller" error a whole bunch of times.
That could also be what's creating the root owned RRD files.
Re: Poller interval changed - graphs stopped working.
This was magically solved when I decided to copy settings (that didn't make any sense to me in the first place) from my other machine that works:
Web:
Poller Interval: 5 Minutes -> 1 Minute
Cron Interval: 5 Minutes (no change)
Maximum Threads Per Process: 1 -> 5
Actual Crontab timing: 5 Minutes (no change)
Why are these the only values that work for me?
I don't believe these numbers are special. I think I ran into a bug that left the system stuck in a "bad" state, specifically something to do with the poller: the RRDs would get updated but the graphs were still empty.
I think it was resolved by me fiddling around with the above values regardless of the actual numbers that I chose.
I started debugging poller.php but as this issue was about two weeks old and I was getting zero help here on the forums, I just said f it and copied the settings.
Re: Poller interval changed - graphs stopped working.
Since you are running CactiEZ, you should NOT mess with the poller settings. Both the cron and the poller should be at 1 minute intervals (cron can be changed but not the poller interval).
Graphs don't just magically work with whatever poller interval you decide. Most templates are hard coded to work with a 5 minute interval or, as with CactiEZ, a lot of them are hard coded to work with 1 minute.
Inside every Data Template you will find a step and a heartbeat. Together these determine whether a graph will work at the interval you want. For instance, a normal graph that takes data at 5 minute intervals has a step of 300 (it accepts new data every 300 seconds / 5 minutes). Its heartbeat of 600 means two data points have to be missed before it writes a NaN.
For a graph that actually updates every 1 minute (in CactiEZ these will be your traffic graphs and a few others), the step is 60 and the heartbeat is 120. If your cron and poller are both set to 5 minutes, you will get NaNs because the poller only updates every 300 seconds, while the heartbeat causes NaNs to be written after only 120 seconds.
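To make the step/heartbeat arithmetic concrete, here is a minimal Python sketch. It is a simplified model, not RRDtool's actual consolidation logic: it only assumes that a gap between updates longer than the heartbeat is treated as unknown (NaN). The function name `is_unknown` and the case labels are made up for illustration.

```python
def is_unknown(update_gap_seconds: int, heartbeat_seconds: int) -> bool:
    """Return True when the gap between two updates exceeds the heartbeat,
    which in this simplified model means the interval is stored as NaN."""
    return update_gap_seconds > heartbeat_seconds


# (label, seconds between poller updates, data template heartbeat)
cases = [
    ("5-minute polling, 300/600 template", 300, 600),
    ("1-minute polling, 60/120 template", 60, 120),
    ("5-minute polling, 60/120 template", 300, 120),
]

for label, gap, heartbeat in cases:
    status = "NaN" if is_unknown(gap, heartbeat) else "ok"
    print(f"{label}: {status}")
```

Only the last case, a 1-minute template fed by a 5-minute poller, ends up as NaN, which matches the empty graphs described in this thread.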
The reason it works with the cron at 5 minutes and the poller at 1 minute is a feature of Cacti. The poller is launched every 5 minutes by the cron, but after it finishes a pass, it sleeps until the next 1 minute boundary and then loops through again. It keeps doing that until it's time for the cron to run again.
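The looping behaviour described above can be sketched roughly like this. It is an illustration only, not Cacti's poller code, and it assumes every polling pass finishes within the poller interval; `poller_pass_offsets` is a hypothetical helper name.

```python
from typing import List


def poller_pass_offsets(cron_interval: int, poller_interval: int) -> List[int]:
    """Seconds after the cron launch at which each polling pass starts,
    assuming every pass finishes before the next interval boundary."""
    return list(range(0, cron_interval, poller_interval))


# Cron every 5 minutes, poller interval 1 minute.
print(poller_pass_offsets(300, 60))  # [0, 60, 120, 180, 240]
```

So a single cron launch covers five 1-minute polling passes, which is why the 60/120 templates keep receiving data often enough.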
If you really wish to move to a 5 minute interval, you will need to change the settings, and then change every single data template to use the 300/600 step/heartbeat and then remove all the old rrds (or modify them yourself).
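Before changing templates or removing RRDs, it may help to check which step and heartbeat an existing file actually has. The sketch below parses text in the format printed by `rrdtool info <file>`. The sample text is made up for illustration (the file and data source names are borrowed from the first post), and `parse_rrd_info` is a hypothetical helper, not part of Cacti or RRDtool.

```python
import re


def parse_rrd_info(info_text: str):
    """Extract the step and the per-data-source heartbeats from `rrdtool info` text."""
    step = None
    heartbeats = {}
    for line in info_text.splitlines():
        line = line.strip()
        m = re.match(r"^step = (\d+)$", line)
        if m:
            step = int(m.group(1))
            continue
        m = re.match(r"^ds\[(.+?)\]\.minimal_heartbeat = (\d+)$", line)
        if m:
            heartbeats[m.group(1)] = int(m.group(2))
    return step, heartbeats


# Illustrative sample only; real output contains many more lines.
SAMPLE = """\
filename = "localhost_cpu_7.rrd"
step = 60
ds[cpu].type = "GAUGE"
ds[cpu].minimal_heartbeat = 120
"""

step, heartbeats = parse_rrd_info(SAMPLE)
print(step, heartbeats)
```

A file reporting 60/120 like this one will show NaNs if it is only updated every 300 seconds, so it would have to be recreated or modified to move to a 5 minute interval.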
Re: Poller interval changed - graphs stopped working.
Many thanks man, that was quite informative. I appreciate you taking the time to explain it.
I had no idea I wasn't supposed to mess with the poller settings when using CactiEZ.
I still don't understand the actual poller behavior in my current setup (5 min for cron, 1 min for poller) but that's ok since I'll probably change cron back to 1 minute.
Re: Poller interval changed - graphs stopped working.
I need help with this...
We recently changed the poller interval from 10 seconds to 30 seconds. I found that some data templates had a hard-coded 5-minute interval, and some of the graphs don't work.
I deleted the graph and data source, and the graph now displays fine, but I lost my old data.
I also tried deleting only the data source, leaving the graph intact, and recreating the graph so that a new RRD and data source were created. I still lose the old data once the new RRD file is generated, even if I assign it to the old graph. Is there a way to keep the old graph data and still fix the problem?
Re: Poller interval changed - graphs stopped working.
That version of Cacti did not make it easy to change the polling interval, and you would struggle without some community guidance; officially we don't support it any more. I never even used that version of Cacti, as I found the 1.x version to work far better.
Cacti Developer & Release Manager
The Cacti Group
Director
BV IT Solutions Ltd
+--------------------------------------------------------------------------+
Cacti Resources:
Cacti Website (including releases)
Cacti Issues
Cacti Development Releases
Cacti Development Documentation
Re: Poller interval changed - graphs stopped working.
Thanks netniV...
I am using the older version for a couple of reasons: Weathermap and other plugins.
We ended up deleting the current graphs, changing the template to reflect the 30-second interval, and re-creating the graphs.