Bug? Polling happens at unexpected times/rrd_next_step prob?

Post by curiousgeorge »

Hello,

I am using Cacti 0.8.8a on Linux, with a small installation of 7 hosts. I encountered strange behavior where my data sources polled at the right intervals but at the wrong times! The RRD files expect data every 5 minutes, e.g. at 11:30, 11:35, 11:40. However, data gets inserted into the files at 11:33, 11:38, 11:43, etc. These "bad" times make the stored data less precise, since rrdtool treats the value inserted at 11:33 as belonging to the 11:30 step and normalizes (smears) it across that interval instead of recording it as a single clean reading.
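
To show what I mean by the extra normalization, here is a rough Python sketch (not rrdtool itself; the timestamps and values are made up, and real rrdtool does more than this) of how a GAUGE reading that lands at 11:33 gets blended into the 11:30-11:35 data point:

Code:

STEP = 300  # the RRD step in seconds

def normalize(updates, step_start):
    # Weighted average of the update values that overlap the window
    # [step_start, step_start + STEP), which is roughly how rrdtool
    # smears off-boundary GAUGE samples into one primary data point.
    step_end = step_start + STEP
    total = 0.0
    prev_ts = None
    for ts, value in updates:              # updates sorted by timestamp
        if prev_ts is not None:
            # each value is taken to cover the time since the previous update
            overlap = max(0, min(ts, step_end) - max(prev_ts, step_start))
            total += value * overlap
        prev_ts = ts
    return total / STEP

# Updates landing at :03 offsets -- 11:28, 11:33 and 11:38, as seconds of day
updates = [(11 * 3600 + 28 * 60, 10.0),
           (11 * 3600 + 33 * 60, 10.0),
           (11 * 3600 + 38 * 60, 20.0)]

# The 11:30-11:35 data point becomes a blend of two samples instead of
# one clean reading taken at 11:30:
print(normalize(updates, 11 * 3600 + 30 * 60))   # prints 14.0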

This started after I changed the poller interval setting: I added a new RRA and one Data Source that polls every minute, so I changed the poller interval from 5 minutes to 1 minute and rebuilt the poller cache. The problem began right after that.

Ultimately, I fixed this by going into the database and running "update poller_item set rrd_next_step=0;". That synchronized all the values to 0, so the poller picked up every item on its next run. Before that, I observed that the rrd_next_step values varied from item to item, which is what caused the poller to execute a few poller items at the "bad" times on each run. (Alternatively, I could have set the poller interval back to 5 minutes and waited for the rrd_next_step values to all reset to 0 on the next run.)
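
For anyone who hits the same thing, here is a rough Python sketch of how I would check how scattered the countdowns are before resetting them. It assumes pymysql and uses placeholder credentials for the cacti database, so adjust those for your own setup; the UPDATE at the end is the same fix I ran by hand:

Code:

import pymysql

# Placeholder connection details -- substitute your own Cacti DB settings.
conn = pymysql.connect(host="localhost", user="cactiuser",
                       password="cactipass", database="cacti")
with conn.cursor() as cur:
    # How many poller items sit at each countdown value. Anything other
    # than 0 will fire partway into a polling window instead of at the start.
    cur.execute("SELECT rrd_next_step, COUNT(*) FROM poller_item "
                "GROUP BY rrd_next_step ORDER BY rrd_next_step")
    for next_step, count in cur.fetchall():
        print("rrd_next_step=%s: %s item(s)" % (next_step, count))

    # The fix I applied: force every item back in sync so they all get
    # picked up on the poller's next run.
    cur.execute("UPDATE poller_item SET rrd_next_step = 0")

conn.commit()
conn.close()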

Is this normal, expected behavior - for the 5 minute items to run between/outside of the 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 minute marks? If it is not normal behavior, how did this happen?

Here are some excerpts from my cacti.log file (DS[20] is the data source that polls every minute; the other data sources poll every 5 minutes). You can see that several items get polled at the "bad" times:

Code:

05/24/2013 12:36:03 AM - SYSTEM STATS: Time:1.1097 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:3
05/24/2013 12:36:03 AM - CMDPHP: Poller[0] Time: 1.0430 s, Theads: N/A, Hosts: 6
05/24/2013 12:36:02 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 6
05/24/2013 12:36:02 AM - CMDPHP: Poller[0] Host[1] DS[19] CMD: /opt/cacti-0.8.8a/scripts/unix_temperature.sh, output: 67.0
05/24/2013 12:36:02 AM - CMDPHP: Poller[0] Host[1] DS[3] CMD: perl /opt/cacti-0.8.8a/scripts/linux_memory.pl MemFree:, output: 5752460
05/24/2013 12:36:02 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.05 ms)
05/24/2013 12:35:04 AM - SYSTEM STATS: Time:2.4393 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:3
05/24/2013 12:35:04 AM - CMDPHP: Poller[0] Time: 2.3493 s, Theads: N/A, Hosts: 6
05/24/2013 12:35:04 AM - CMDPHP: Poller[0] Host[3] DS[18] CMD: perl /opt/cacti-0.8.8a/scripts/ssh_stats.sh server user output: 1min:0.00 5min:0.01 15min:0.05 running_threads:1 total_threads:67
05/24/2013 12:35:02 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 0
05/24/2013 12:35:02 AM - CMDPHP: Poller[0] Host[1] DS[7] CMD: perl /opt/cacti-0.8.8a/scripts/unix_processes.pl, output: 215
05/24/2013 12:35:02 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.06 ms)
05/24/2013 12:35:02 AM - POLLER: Poller[0] NOTE: Poller Int: '60', Cron Int: '300', Time Since Last: '301', Max Runtime '298', Poller Runs: '5'
05/24/2013 12:34:02 AM - SYSTEM STATS: Time:1.1060 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:2
05/24/2013 12:34:02 AM - CMDPHP: Poller[0] Time: 1.0247 s, Theads: N/A, Hosts: 6
05/24/2013 12:34:01 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 0
05/24/2013 12:34:01 AM - CMDPHP: Poller[0] Host[1] DS[6] CMD: perl /opt/cacti-0.8.8a/scripts/unix_users.pl , output: 5
05/24/2013 12:34:01 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.06 ms)
05/24/2013 12:33:02 AM - SYSTEM STATS: Time:1.1154 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:3
05/24/2013 12:33:02 AM - CMDPHP: Poller[0] Time: 1.0319 s, Theads: N/A, Hosts: 6
05/24/2013 12:33:01 AM - CMDPHP: Poller[0] Host[1] DS[32] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 9
05/24/2013 12:33:01 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 4
05/24/2013 12:33:01 AM - CMDPHP: Poller[0] Host[1] DS[5] CMD: perl /opt/cacti-0.8.8a/scripts/loadavg_multi.pl, output: 1min:0.00 5min:0.02 10min:0.00
05/24/2013 12:33:01 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.07 ms)
05/24/2013 12:32:02 AM - SYSTEM STATS: Time:1.5198 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:2
05/24/2013 12:32:02 AM - CMDPHP: Poller[0] Time: 1.4282 s, Theads: N/A, Hosts: 6
05/24/2013 12:32:02 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'192.168.1.222', and OID:'.1.3.6.1.2.1.1.3.0'
05/24/2013 12:32:01 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 7
05/24/2013 12:32:01 AM - CMDPHP: Poller[0] Host[1] DS[4] CMD: perl /opt/cacti-0.8.8a/scripts/linux_memory.pl SwapFree:, output: 13197304
05/24/2013 12:32:01 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.04 ms)
05/24/2013 12:31:02 AM - SYSTEM STATS: Time:1.1500 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:3
05/24/2013 12:31:02 AM - CMDPHP: Poller[0] Time: 1.0620 s, Theads: N/A, Hosts: 6
05/24/2013 12:31:01 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 7
05/24/2013 12:31:01 AM - CMDPHP: Poller[0] Host[1] DS[19] CMD: /opt/cacti-0.8.8a/scripts/unix_temperature.sh, output: 59.0
05/24/2013 12:31:01 AM - CMDPHP: Poller[0] Host[1] DS[3] CMD: perl /opt/cacti-0.8.8a/scripts/linux_memory.pl MemFree:, output: 5769900
05/24/2013 12:31:01 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.06 ms)
05/24/2013 12:30:03 AM - SYSTEM STATS: Time:2.6379 Method:cmd.php Processes:1 Threads:N/A Hosts:7 HostsPerProcess:7 DataSources:4 RRDsProcessed:3
05/24/2013 12:30:03 AM - CMDPHP: Poller[0] Time: 2.5658 s, Theads: N/A, Hosts: 6
05/24/2013 12:30:03 AM - CMDPHP: Poller[0] Host[3] DS[18] CMD: perl /opt/cacti-0.8.8a/scripts/ssh_stats.sh server user output: 1min:0.00 5min:0.01 15min:0.05 running_threads:1 total_threads:67
05/24/2013 12:30:02 AM - CMDPHP: Poller[0] Host[2] PING: UDP Ping Success (0.05 ms)
05/24/2013 12:30:01 AM - CMDPHP: Poller[0] Host[1] DS[20] CMD: /opt/cacti-0.8.8a/scripts/random/random.sh, output: 6
05/24/2013 12:30:01 AM - CMDPHP: Poller[0] Host[1] DS[7] CMD: perl /opt/cacti-0.8.8a/scripts/unix_processes.pl, output: 220
05/24/2013 12:30:01 AM - CMDPHP: Poller[0] Host[1] PING: UDP Ping Success (0.05 ms)
05/24/2013 12:30:01 AM - POLLER: Poller[0] NOTE: Poller Int: '60', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '5'

Re: Bug? Polling happens at unexpected times/rrd_next_step prob?

Post by curiousgeorge »

I think I know how this issue occurred. It appears to be a bug.

If I restart the computer (or even just stop Cacti) in the middle of a 5-minute polling window while using the 1 minute poller interval, the rrd_next_step countdown stops partway through. This affects the items that use 5 minute intervals, since their rrd_next_step is normally decreased by 60 on each 1-minute run.

When the machine comes back up, Cacti resumes regular polling the next time the cron task kicks off. But for the 5 minute interval items, rrd_next_step was only partly counted down during the interrupted window (say it was left at 180). When polling resumes, Cacti keeps decreasing rrd_next_step from that leftover value (by 60 for each 1-minute run), so the item fires 3 minutes into the window and then every 5 minutes from that offset - no longer aligned with the 5-minute boundaries.
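
To make the drift concrete, here is a toy model of the countdown (my assumption of roughly how the cmd.php poller treats rrd_next_step, based on what I see in the poller_item table, not the actual Cacti code):

Code:

POLLER_INTERVAL = 60   # seconds; the 1-minute poller interval
RRD_STEP = 300         # seconds; the data source's 5-minute step

def run_window(next_step, runs=5, label=""):
    # One cron window = five 1-minute poller runs (runs can be cut short
    # to model a reboot partway through the window).
    for minute in range(runs):
        if next_step <= 0:
            print("%s, minute %d: item polled" % (label, minute))
            next_step = RRD_STEP - POLLER_INTERVAL   # reload the countdown (240)
        else:
            next_step -= POLLER_INTERVAL             # just keep counting down
    return next_step

# Healthy case: the countdown starts at 0 and the item fires at minute 0
# of every window, i.e. right on the 5-minute boundary.
ns = run_window(0, label="healthy window 1")
ns = run_window(ns, label="healthy window 2")

# Interrupted case: the box is rebooted after 2 of the 5 runs, so the
# countdown is stranded at 180 instead of reaching 0.
ns = run_window(0, runs=2, label="interrupted window")

# After the reboot the countdown resumes from 180, so the item now fires
# at minute 3 of every window -- the off-boundary pattern from my log.
ns = run_window(ns, label="post-reboot window 1")
ns = run_window(ns, label="post-reboot window 2")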