HELP: rrdtool update doesn't always happen

kuma3
Posts: 25
Joined: Tue Oct 02, 2007 1:17 pm

Post by kuma3 »

Hoping you guys can help troubleshoot an rrd update problem. With debug turned on, I can see the poller doing its thing and pulling the data correctly every 5 minutes. The problem is that after polling, CACTI2RRD's rrdtool update doesn't always happen, which of course results in graphs with ugly gaps.

I am wondering what the logic is between polling and updating the rrd, and whether something is preventing 'rrdtool update' from running after a poll.

Any hints or direction are appreciated.

Thanks
MLin


A sample of the log file is below. You can see that the 11:35 poll happened with no rrd update, whereas 11:40 had both a poll and an update.
10/22/2007 11:35:13 PM - CMDPHP: Poller[0] Host[9] DS[1279] CMD: /usr/local/nagios/libexec/check_nrpe -H xxx.yyy.zzz -c ads_stats, output: tot_req:5754 tot_imp:5582 tot_imp_no:709 tot_click:0 tot_click_inv:6 tot_owed:0.00 req_rate:19.180 imp_rate:18.607 imp_no_rate:2.363 click_rate:0.000 click_inv_rate:0.027 money_rate:0.00 imp_no_prct:11.270 click_inv_prct:100.000 ctr_prct:0.000 tot_max_bid:0.00 tot_inv_max_bid:0.54 inv_max_bid_rate:100.000 err_ticket_allowedclicks:4 err_ticket_hashmismatch:2 err_ticket_hashmismatch_max_bid:0.46 err_ticket_allowedclicks_max_bid:0.08
10/22/2007 11:40:13 PM - CMDPHP: Poller[0] Host[9] DS[1279] CMD: /usr/local/nagios/libexec/check_nrpe -H xxx.yyy.zzz -c ads_stats, output: tot_req:6895 tot_imp:6208 tot_imp_no:1159 tot_click:0 tot_click_inv:2 tot_owed:0.00 req_rate:22.983 imp_rate:20.693 imp_no_rate:3.863 click_rate:0.000 click_inv_rate:2.000 money_rate:0.00 imp_no_prct:15.732 click_inv_prct:100.000 ctr_prct:0.000 tot_max_bid:0.00 tot_inv_max_bid:0.10 inv_max_bid_rate:100.000 err_ticket_allowedclicks:2 err_ticket_allowedclicks_max_bid:0.10
10/22/2007 11:40:23 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /usr/local/nagios/share/cacti/rra/xxx.yyy.zzz.err_tt_allowedclick_1279.rrd --template err_tt_allowedclick 1193096412:2

Post by kuma3 »

One more finding. Correct me if I'm wrong, but it looks like the poller first saves the output into the poller_output table, then some other process comes in, uses that data to update the RRD, and deletes the used entries afterward.

If that's correct, then this could be a problem with that second process, because my poller_output has 31978 entries, the oldest dating back to 10/18/2007.

What can I do here? How do I find out why Cacti doesn't like those entries?


Thanks,
MLin

Post by kuma3 »

I dug deeper into the code and I think I've found the reason. Here's the code snippet that turns poller output into rrd updates.

In lib/poller.php, around line 284:

Code:

      /* fallback values */
      if ((!isset($rrd_update_array{$item["rrd_path"]}["times"][$unix_time])) && ($item["rrd_name"] != "")) {
        $rrd_update_array{$item["rrd_path"]}["times"][$unix_time]{$item["rrd_name"]} = "U";
      }else if ((!isset($rrd_update_array{$item["rrd_path"]}["times"][$unix_time])) && ($item["rrd_name"] == "")) {
        unset($rrd_update_array{$item["rrd_path"]});
      }   
    }

Here's some background info:

1. A script, call it foo, pulls multiple metrics.
2. Multiple data templates use foo to gather their data.
3. One behavior of foo is that it leaves a metric out of its output when its value is 0 (bad).

And now, here's how my poller_output got filled with 37000+ rows that can never be cleaned up without truncating the table:

1. Once in a while, the output from foo doesn't match any data source of a data template, since foo doesn't output a metric when it's 0. That output entry stays in the poller_output table after process_poller_output runs, because the function couldn't match any data source to the output and there is no code to clean up such an entry.
2. Next time around, good output from foo that matches some data sources comes in and $rrd_update_array{$item['rrd_path']} gets updated for that time period. However, when the processing loop hits the row from step 1, the code pasted above unsets the whole $rrd_update_array{$item['rrd_path']}, effectively removing all the good data points saved in the array before this iteration.
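
To make the two steps above concrete, here is a minimal in-memory sketch of how I read the fallback logic. It is not the actual Cacti code: the function name build_update_array_buggy, the row layout, the example path and the timestamps are all made up for illustration, and real rows come from the poller_output table rather than a PHP array.

```php
<?php
// Simplified model of the fallback logic quoted above (assumption: this
// mirrors the behavior described in the post, not the exact Cacti source).
function build_update_array_buggy(array $rows): array {
    $rrd_update_array = [];
    foreach ($rows as $item) {
        $path = $item["rrd_path"];
        $time = $item["unix_time"];
        if ($item["rrd_name"] != "") {
            // The output matched a data source item: record the value.
            $rrd_update_array[$path]["times"][$time][$item["rrd_name"]] = $item["value"];
        } elseif (!isset($rrd_update_array[$path]["times"][$time])) {
            // No match and nothing stored for this timestamp:
            // the entry for the whole rrd file is dropped.
            unset($rrd_update_array[$path]);
        }
    }
    return $rrd_update_array;
}

// Hypothetical rows: one good value, then one stale unmatched row
// for the same rrd file with an older timestamp.
$rows = [
    ["rrd_path" => "/rra/example_1279.rrd", "unix_time" => 1193096412,
     "rrd_name" => "err_tt_allowedclick", "value" => "2"],
    ["rrd_path" => "/rra/example_1279.rrd", "unix_time" => 1192700000,
     "rrd_name" => "", "value" => ""],
];

$result = build_update_array_buggy($rows);
var_dump(isset($result["/rra/example_1279.rrd"])); // prints bool(false)
```

The good value stored by the first row is gone by the time the loop finishes, so no rrdtool update would be issued for that file.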

A couple of possible solutions for this problem:
1. Erase everything from poller_output. It should then run OK for a while, until we encounter another no-match output.
2. Make sure script foo outputs every metric every single time, even if the value is 0.
3. Update lib/poller.php so that instead of unsetting $rrd_update_array for the whole rrd_path, it just removes the bad row from the table.
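
For solution 2, one option is a small wrapper around the script output that fills in the missing metrics. This is only a sketch under assumptions: pad_metrics is a made-up helper, the list of expected metric names has to be maintained by hand, and any metric not in that list is dropped from the result.

```php
<?php
// Rebuild "name:value" output so that every expected metric is present,
// using 0 for any metric the script left out.
function pad_metrics(string $output, array $expected): string {
    $seen = [];
    // Split the output on whitespace and parse each name:value pair.
    foreach (preg_split('/\s+/', trim($output), -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        $parts = explode(":", $pair, 2);
        if (count($parts) === 2) {
            $seen[$parts[0]] = $parts[1];
        }
    }
    // Emit the metrics in the expected order, defaulting to 0.
    $padded = [];
    foreach ($expected as $name) {
        $padded[] = $name . ":" . ($seen[$name] ?? "0");
    }
    return implode(" ", $padded);
}

// Hypothetical shortened example based on the metric names in the log above.
echo pad_metrics(
    "tot_req:6895 tot_click:0",
    ["tot_req", "tot_click", "err_ticket_hashmismatch"]
), "\n";
// prints: tot_req:6895 tot_click:0 err_ticket_hashmismatch:0
```

With every metric always present, each data template should find a matching data source item on every poll, so the unmatched rows from step 1 never get created in the first place.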

I am for solution 3, but I'd like some feedback from the Cacti developers on whether my observation is correct and whether I've missed anything. If it's OK, I can modify that piece of code and submit a patch.


Thanks,
M Lin
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

What version of Cacti? I suspect it is an older version, and that you need to upgrade. This is not allowed to happen in 0.8.7. Typically it happens when "memory_limit" in the php.ini file is not set high enough. Once it melts down from a lack of memory, the table "would" simply start to grow out of control.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?

Post by kuma3 »

TheWitness,

I am running 0.8.6j, and by the look of 0.8.7's poller.php from svn I suspect it will have the same problem. I understand that if memory_limit isn't set high enough, process_poller_output would die in the middle of things and leave unprocessed data in the poller_output table.

However, this is not the case here. The memory limit has not been reached. The two problems I am seeing in the code are:

1. As it goes through each row in poller_output, it populates $rrd_update_array when the output matches the data sources. However, when it sees an output that doesn't match ANY data source item, it erases the whole array for that rrd file, which kills all the good data previously saved.

2. There is no code handling those unmatched outputs, so they are stuck in the poller_output table forever and increase the chance of problem #1 showing up.
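
Roughly what I have in mind for a fix is sketched below. It is not the attached patch or the real Cacti code: build_update_array_skipping and the row layout are hypothetical, and it assumes an unmatched row can be recognized by an empty rrd_name. Unmatched rows are skipped and collected so the caller can delete them from poller_output, instead of wiping the array entry for the whole rrd file.

```php
<?php
// Returns [updates, unmatched]: the update array built from matched rows,
// and the rows that matched no data source item (candidates for deletion).
function build_update_array_skipping(array $rows): array {
    $rrd_update_array = [];
    $unmatched = [];
    foreach ($rows as $item) {
        if ($item["rrd_name"] == "") {
            // Leave previously stored values alone; remember the bad row.
            $unmatched[] = $item;
            continue;
        }
        $rrd_update_array[$item["rrd_path"]]["times"][$item["unix_time"]][$item["rrd_name"]] = $item["value"];
    }
    return [$rrd_update_array, $unmatched];
}

// Hypothetical rows: one good value plus one stale unmatched row.
$rows = [
    ["rrd_path" => "/rra/example_1279.rrd", "unix_time" => 1193096412,
     "rrd_name" => "err_tt_allowedclick", "value" => "2"],
    ["rrd_path" => "/rra/example_1279.rrd", "unix_time" => 1192700000,
     "rrd_name" => "", "value" => ""],
];

[$updates, $stale] = build_update_array_skipping($rows);
var_dump(isset($updates["/rra/example_1279.rrd"])); // prints bool(true)
echo count($stale), "\n";                           // prints 1
```

This addresses both problems at once: the good data survives (problem 1), and the caller gets a list of rows it can remove from the table (problem 2).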

Post by TheWitness »

Hmm. We truncate the poller output table on every pass in 0.8.7, so bogus data is erased. This is why I recommend upgrading. Otherwise, I believe I am aware of an issue similar to this. Anyway, by truncating the table, this "buildup" should no longer happen. In fact, in 0.8.7 I provide a warning in the Cacti Log to let you know that something is "hosed".

TheWitness

Post by kuma3 »

cool! I will try the new version soon then.

Post by kuma3 »

Sorry, I couldn't find the code where the poller_output table is truncated. But I looked again in lib/poller.php, and it has the same fallback-value code in it, which would still remove good values when it encounters a bad output.

For whatever it's worth, I am attaching the diff against 0.8.6j for lib/poller.php.
Attachments
lib_poller.patch
(1.73 KiB) Downloaded 261 times