ALL_DATA_SOURCES_NODUPS not working with MAX?

jlg
Posts: 39
Joined: Thu Feb 16, 2006 7:39 pm


Post by jlg »

hi folks,

my graphs are breaking when i change the "Consolidation Function" from AVERAGE to MAX for the "ucd/net - CPU Usage" graph template.

i've narrowed down the problem to the "Total" graph template item. for each item that i change from AVERAGE to MAX, i lose entries in the corresponding cdef until i finally get a blank one.

let me illustrate. here is a sample starting graph (i put everything back from memory so it's possible that this does not 100% match the "as shipped" version of the graph template.)

Code:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="debops2 - CPU Usage" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--upper-limit=100 \
--lower-limit=0 \
--vertical-label="percent" \
DEF:a="/usr/share/cacti/site/rra/debops2_cpu_system_212.rrd":cpu_system:AVERAGE \
DEF:b="/usr/share/cacti/site/rra/debops2_cpu_user_213.rrd":cpu_user:AVERAGE \
DEF:c="/usr/share/cacti/site/rra/debops2_cpu_nice_211.rrd":cpu_nice:AVERAGE \
CDEF:cdefbc=TIME,1166473128,GT,a,a,UN,0,a,IF,IF,TIME,1166473128,GT,b,b,UN,0,b,IF,IF,TIME,1166473128,GT,c,c,UN,0,c,IF,IF,+,+ \
AREA:a#FF0000:"System"  \
GPRINT:a:LAST:"Current\:%8.3lf"  \
GPRINT:a:AVERAGE:"Average\:%8.3lf"  \
GPRINT:a:MAX:"Maximum\:%8.3lf\n"  \
STACK:b#0000FF:"User"  \
GPRINT:b:LAST:"  Current\:%8.3lf"  \
GPRINT:b:AVERAGE:"Average\:%8.3lf"  \
GPRINT:b:MAX:"Maximum\:%8.3lf\n"  \
STACK:c#00FF00:"Nice"  \
GPRINT:c:LAST:"  Current\:%8.3lf"  \
GPRINT:c:AVERAGE:"Average\:%8.3lf"  \
GPRINT:c:MAX:"Maximum\:%8.3lf\n"  \
LINE1:cdefbc#:"Total"  \
GPRINT:cdefbc:LAST:" Current\:%8.2lf %s"  \
GPRINT:cdefbc:AVERAGE:"Average\:%8.2lf %s"  \
GPRINT:cdefbc:MAX:"Maximum\:%8.2lf %s\n" 
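To make the expanded Total CDEF easier to read, here is a minimal Python sketch that evaluates just the RPN operators it uses (TIME, GT, UN, IF, +) for one time step. It is an illustration, not rrdtool itself: `eval_rpn` is a hypothetical helper, and NaN stands in for rrdtool's UNKNOWN.

```python
import math

UNKN = float("nan")  # stands in for rrdtool's UNKNOWN value


def eval_rpn(expr, variables, now):
    """Evaluate the RPN subset used by the "Total" CDEF for one time step."""
    stack = []
    for tok in expr.split(","):
        if tok == "TIME":
            stack.append(float(now))  # timestamp of the step being computed
        elif tok == "UN":
            # 1 if the popped value is UNKNOWN, else 0
            stack.append(1.0 if math.isnan(stack.pop()) else 0.0)
        elif tok == "GT":
            b, a = stack.pop(), stack.pop()
            stack.append(UNKN if math.isnan(a) or math.isnan(b) else float(a > b))
        elif tok == "IF":
            # A,B,C,IF means "if A then B else C"
            c, b, a = stack.pop(), stack.pop(), stack.pop()
            stack.append(b if a != 0.0 else c)
        elif tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)  # UNKNOWN (NaN) propagates through the sum
        elif tok in variables:
            stack.append(variables[tok])
        else:
            stack.append(float(tok))  # numeric literal; ValueError for anything else
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


total = ("TIME,1166473128,GT,a,a,UN,0,a,IF,IF,"
         "TIME,1166473128,GT,b,b,UN,0,b,IF,IF,"
         "TIME,1166473128,GT,c,c,UN,0,c,IF,IF,+,+")
print(eval_rpn(total, {"a": 10.0, "b": UNKN, "c": 5.0}, now=1166400000))  # 15.0
```

For each source, `TIME,<cutoff>,GT` is true only inside the most recent polling interval. Older UNKNOWN samples are turned into 0 so the total stays defined, while a missing sample in the newest interval leaves the total UNKNOWN.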

now, after changing "System" from AVERAGE to MAX:

Code:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="debops2 - CPU Usage" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--upper-limit=100 \
--lower-limit=0 \
--vertical-label="percent" \
DEF:a="/usr/share/cacti/site/rra/debops2_cpu_system_212.rrd":cpu_system:MAX \
DEF:b="/usr/share/cacti/site/rra/debops2_cpu_user_213.rrd":cpu_user:AVERAGE \
DEF:c="/usr/share/cacti/site/rra/debops2_cpu_nice_211.rrd":cpu_nice:AVERAGE \
CDEF:cdefbc=TIME,1166473436,GT,b,b,UN,0,b,IF,IF,TIME,1166473436,GT,c,c,UN,0,c,IF,IF,+ \
AREA:a#FF0000:"System"  \
GPRINT:a:LAST:"Current\:%8.3lf"  \
GPRINT:a:AVERAGE:"Average\:%8.3lf"  \
GPRINT:a:MAX:"Maximum\:%8.3lf\n"  \
STACK:b#0000FF:"User"  \
GPRINT:b:LAST:"  Current\:%8.3lf"  \
GPRINT:b:AVERAGE:"Average\:%8.3lf"  \
GPRINT:b:MAX:"Maximum\:%8.3lf\n"  \
STACK:c#00FF00:"Nice"  \
GPRINT:c:LAST:"  Current\:%8.3lf"  \
GPRINT:c:AVERAGE:"Average\:%8.3lf"  \
GPRINT:c:MAX:"Maximum\:%8.3lf\n"  \
LINE1:cdefbc#:"Total"  \
GPRINT:cdefbc:LAST:" Current\:%8.2lf %s"  \
GPRINT:cdefbc:AVERAGE:"Average\:%8.2lf %s"  \
GPRINT:cdefbc:MAX:"Maximum\:%8.2lf %s\n" 
as you can see, the cdef is now shorter: the System term (a) has dropped out, leaving only the terms for b and c.

now, after changing "User", "Nice" and "Total" from AVERAGE to MAX, i get this:

Code:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="debops2 - CPU Usage" \
--rigid \
--base=1000 \
--height=120 \
--width=500 \
--upper-limit=100 \
--lower-limit=0 \
--vertical-label="percent" \
DEF:a="/usr/share/cacti/site/rra/debops2_cpu_system_212.rrd":cpu_system:MAX \
DEF:b="/usr/share/cacti/site/rra/debops2_cpu_user_213.rrd":cpu_user:MAX \
DEF:c="/usr/share/cacti/site/rra/debops2_cpu_nice_211.rrd":cpu_nice:MAX \
CDEF:cdefbc= \
AREA:a#FF0000:"System"  \
GPRINT:a:LAST:"Current\:%8.3lf"  \
GPRINT:a:AVERAGE:"Average\:%8.3lf"  \
GPRINT:a:MAX:"Maximum\:%8.3lf\n"  \
STACK:b#0000FF:"User"  \
GPRINT:b:LAST:"  Current\:%8.3lf"  \
GPRINT:b:AVERAGE:"Average\:%8.3lf"  \
GPRINT:b:MAX:"Maximum\:%8.3lf\n"  \
STACK:c#00FF00:"Nice"  \
GPRINT:c:LAST:"  Current\:%8.3lf"  \
GPRINT:c:AVERAGE:"Average\:%8.3lf"  \
GPRINT:c:MAX:"Maximum\:%8.3lf\n"  \
LINE1:cdefbc#:"Total"  \
GPRINT:cdefbc:LAST:" Current\:%8.2lf %s"  \
GPRINT:cdefbc:AVERAGE:"Average\:%8.2lf %s"  \
GPRINT:cdefbc:MAX:"Maximum\:%8.2lf %s\n" 
which gives me this error from "Graph Debug": ERROR: can't parse CDEF 'cdefbc='

i tried to read lib/rrd.php but it was beyond my ability to quickly grasp. is ALL_DATA_SOURCES_NODUPS fundamentally incompatible with the MAX consolidation function?

thanks,
jlg

Post by jlg »

a follow up to help clarify the issue...

INITIAL PROBLEM TO SOLVE:
if you use the "ucd/net - CPU Usage" graph template and look back at historical data, you will discover that the "peaks disappear" the further back in time that you go.

for instance, let's say you noticed that cpu spiked up to 95% on 11/1/2006. if you review the cpu usage for 11/1/2006 a month later, the graph might show the highest cpu usage for that day as 45%, even though you know it peaked at 95% when you first watched the data. this is because the AVERAGE consolidation function takes many data points and consolidates them into their average...thus averaging down the peaks.
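The effect is easy to reproduce with a toy model. This Python sketch (a hypothetical `consolidate` helper that ignores UNKNOWN samples and the RRA's xff setting) folds six made-up 5-minute samples into one row with AVERAGE and with MAX:

```python
from statistics import mean


def consolidate(samples, steps, cf):
    """Collapse each run of `steps` consecutive samples into one value using `cf`.

    Simplified: ignores UNKNOWN samples and the xff setting a real RRA uses,
    and drops any incomplete trailing run.
    """
    return [cf(samples[i:i + steps]) for i in range(0, len(samples) - steps + 1, steps)]


# six 5-minute samples (one 30-minute row) containing a single 95% spike
samples = [10, 10, 95, 10, 10, 10]
print(consolidate(samples, 6, mean))  # [24.166...]: the spike is averaged away
print(consolidate(samples, 6, max))   # [95]: the spike survives
```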

you don't even need to wait a month to see this effect. simply look at today's peak (say 95%), then go to the page where the daily, weekly, monthly and yearly graphs are displayed and check the legends. you'll see that the maximum for the week does not match today's 95%, even though today is included in the weekly graph.


FIRST SHOT AT SOLVING PROBLEM:
you can change the consolidation function of each of the graph template items (System, User, Nice) to MAX. you do not even need to drop and recreate your graphs if you have the default RRAs, because the default RRAs store both the AVERAGE and MAX consolidation functions. changing the graph template only tells cacti which set of data to use.


SNAG:
however, the "ucd/net - CPU Usage" graph template has one graph template item called "Total" which graphs the sum of System, User and Nice...a very nice feature of the graph IMHO.

the magic is done by the cacti CDEF called "Total All Data Sources". this CDEF uses a special token recognized only by cacti, "ALL_DATA_SOURCES_NODUPS". the cacti code catches this token and expands it into the CDEF you can see in the graph commands in the first post.
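Judging only by the generated commands in the first post and the loop quoted later in the thread, the expansion appears to produce one unknown-to-zero term per DEF, followed by one `+` fewer than the number of terms. A hypothetical Python reconstruction (the `+` suffix is inferred from the output, not from quoted code):

```python
def expand_all_data_sources(def_names, cutoff):
    """Hypothetical reconstruction of the ALL_DATA_SOURCES_NODUPS expansion.

    One unknown-to-zero term per DEF, joined by commas, then one "+" fewer
    than the number of terms so the stack collapses to a single total.
    """
    terms = [f"TIME,{cutoff},GT,{d},{d},UN,0,{d},IF,IF" for d in def_names]
    return ",".join(terms) + ",+" * max(len(terms) - 1, 0)


print(expand_all_data_sources(["a", "b", "c"], 1166473128))
print(repr(expand_all_data_sources([], 1166473128)))  # '' -> "CDEF:cdefbc="
```

With no DEFs found, the result is an empty string, which is exactly the `CDEF:cdefbc=` that rrdtool refuses to parse.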

however, as you can see, changing the consolidation function (from AVERAGE to MAX) for the graph template items System, User and Nice progressively shortens the generated CDEF in the graphing command until it ends up as an empty string...which is wrong and breaks the graph.


RESTATED QUESTION:
can anyone who works on this section of code comment on whether this is working as intended (ie, it only works with the AVERAGE consolidation function)...or is a bug?

if it's a bug and you guys need help debugging it, can someone direct me to a simple tutorial on debugging php scripts in apache? and perhaps a simple overview of the $cf_ds_cache, $graph_items & $graph_item_types data objects?


FYI:
if you simply remove the "Total" graph item and its legend, then the graph works just fine (ie, it displays the historical MAX rather than the historical AVERAGE.)

FYI #2:
if you add an additional "cacti RRA" which stores the 5 minute snapshot going back as far as you need (ie, 1-2 years in my case), then the consolidation problem does not affect your data. so in theory, you should be able to use the "ucd/net - CPU Usage" graph template "as is" (ie, with the AVERAGE consolidation function) without destroying the peaks.
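For a rough sense of size, assuming the usual 300-second cacti polling step and taking "2 years" as 730 days, such an RRA needs the row count below. The helpers are hypothetical; the `RRA:CF:xff:steps:rows` layout is the one `rrdtool create` uses, and xff=0.5 is an assumed value. With one sample per row nothing gets consolidated, so AVERAGE and MAX would store identical values.

```python
SECONDS_PER_DAY = 86400


def rra_rows(step_seconds, steps_per_row, days):
    """Rows an RRA needs to cover `days` at the given resolution."""
    return days * SECONDS_PER_DAY // (step_seconds * steps_per_row)


def rra_spec(cf, steps_per_row, rows, xff=0.5):
    """Format an rrdtool create RRA definition: RRA:CF:xff:steps:rows."""
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


rows = rra_rows(300, 1, 730)         # 5-minute samples kept for two years
print(rows)                          # 210240
print(rra_spec("AVERAGE", 1, rows))  # RRA:AVERAGE:0.5:1:210240
```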

however, i do not yet know if the "pixel consolidation" will cause essentially the same problem. ie, when graphing over an entire year, you may have more data points than pixels and so some decision needs to be made regarding consolidating those data points into pixels. again, i would choose to use MAX to retain the history of peaks (and to know which area of the graph to drill down into.) but so far, using MAX has been incompatible with the graph item "Total".
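The pixel question can be sized the same way. Assuming the graphs' `--width=500` and a year of 5-minute samples, each pixel has to cover about 211 samples. This toy model (a hypothetical `to_pixels`; real rrdtool first picks an RRA and then resamples) shows how the reduction function decides whether a peak survives:

```python
import math
from statistics import mean


def to_pixels(points, width, cf):
    """Squeeze `points` into at most `width` columns, reducing each column with `cf`.

    Simplified model: real rrdtool first picks an RRA, then resamples it.
    """
    per_pixel = max(1, math.ceil(len(points) / width))
    return [cf(points[i:i + per_pixel]) for i in range(0, len(points), per_pixel)]


print(math.ceil(365 * 288 / 500))  # 211 five-minute samples per pixel for a year
day = [0, 0, 90, 0, 0, 0, 50, 0]
print(to_pixels(day, 2, mean))     # [22.5, 12.5]
print(to_pixels(day, 2, max))      # [90, 50]
```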
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

jlg wrote:however, i do not yet know if the "pixel consolidation" will cause essentially the same problem. ie, when graphing over an entire year, you may have more data points than pixels and so some decision needs to be made regarding consolidating those data points into pixels. again, i would choose to use MAX to retain the history of peaks (and to know which area of the graph to drill down into.) but so far, using MAX has been incompatible with the graph item "Total".
Sorry, but I'm not that familiar with this part of the code.
Your statement above aims at a problem that is often misunderstood. In my opinion, it would be better to solve this by adding the reduce=MAX option to the DEF statements, to make sure that no MAX items are "graphically" averaged.
But currently, cacti does not support this option.
Reinhard

Post by jlg »

hi Reinhard, thanks for the tip regarding DEF. i looked at the rrdgraph man page and found this:

Code:

       DEF:vname=rrd:ds-name:CF
           Define virtual name for a data source. This name can then be used in the functions explained below. The
           DEF call automatically chooses an RRA which contains CF consolidated data in a resolution appropriate for
           the size of the graph to be drawn.  Ideally this means that one data point from the RRA should be repre-
           sented by one pixel in the graph.  If the resolution of the RRA is higher than the resolution of the
           graph, the data in the RRA will be further consolidated according to the consolidation function (CF) cho-
           sen.
so this confirms my suspicion that i would need to use the MAX CF to ensure that the pixel aggregation would also honor the "MAX".

however, i do not see any reference to reduce=MAX in the man pages (debian sarge installation). i'll look around further.

also, since you didn't say that the reported behavior is expected...i'll conclude that it's a bug. perhaps i will find a fix for it but i'll need to learn how to debug php in apache first.

thanks,
jlg

Post by jlg »

regarding reduce=MAX, i found this link:
http://oss.oetiker.ch/rrdtool/doc/rrdgraph_data.en.html

which states:

Code:

DEF:<vname>=<rrdfile>:<ds-name>:<CF>[:step=<step>][:start=<time>][:end=<time>][:reduce=<CF>]

If consolidation needs to be done, the CF of the RRA specified in the DEF itself will be used to reduce the data density. This behaviour can be changed using :reduce=<CF>. This optional parameter specifies the CF to use during the data reduction phase.
so essentially, it's an override for the default behavior. the changes that i was attempting to do would cause the default behavior to retain the MAX so this override would be unnecessary. in fact, if i simply remove the "Total" graph template item, then all works as expected.
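For reference, here is a hypothetical Python helper that builds a DEF in the format from that page, optionally appending `:reduce=`. The call below would read the AVERAGE RRA but fold its rows into pixels with MAX; a DEF that already uses MAX reduces with MAX by default. Since the option is missing from the sarge man page, it presumably needs a newer rrdtool than sarge ships.

```python
def make_def(vname, rrd_file, ds_name, cf, reduce_cf=None):
    """Build an rrdtool graph DEF, optionally overriding the reduction CF."""
    spec = f'DEF:{vname}="{rrd_file}":{ds_name}:{cf}'
    if reduce_cf is not None:
        spec += f":reduce={reduce_cf}"  # only affects graph-time data reduction
    return spec


print(make_def("a", "/usr/share/cacti/site/rra/debops2_cpu_system_212.rrd",
               "cpu_system", "AVERAGE", reduce_cf="MAX"))
```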

however, i am trying not to lose "Total" since i find it useful.

jlg

Post by jlg »

hi folks, ok, i found the problem and patched it in my version to work.

against debian sarge "0.8.6c-7sarge3", this patch works for ALL_DATA_SOURCES_NODUPS.

Code:

*** /usr/share/cacti/site/lib/rrd.php-orig      Sat Apr  8 08:21:03 2006
--- /usr/share/cacti/site/lib/rrd.php   Mon Jan 22 10:34:23 2007
***************
*** 836,843 ****
                                for ($t=0;($t<count($graph_items));$t++) {
                                        if ((ereg("(AREA|STACK|LINE[123])", $graph_item_types{$graph_items[$t]["graph_type_id"]})) && (!empty($graph_items[$t]["data_template_rrd_id"]))) {
                                                /* if the user screws up CF settings, PHP will generate warnings if left unchecked */
!                                               if (isset($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$cf_id])) {
!                                                       $def_name = generate_graph_def_name(strval($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$cf_id]));
                                                        $cdef_total_ds .= ($item_count == 0 ? "" : ",") . "TIME," . (time() - $seconds_between_graph_updates) . ",GT,$def_name,$def_name,UN,0,$def_name,IF,IF"; /* convert unknowns to '0' first */
                                                        $item_count++;
                                                }
--- 836,845 ----
                                for ($t=0;($t<count($graph_items));$t++) {
                                        if ((ereg("(AREA|STACK|LINE[123])", $graph_item_types{$graph_items[$t]["graph_type_id"]})) && (!empty($graph_items[$t]["data_template_rrd_id"]))) {
                                                /* if the user screws up CF settings, PHP will generate warnings if left unchecked */
!                                                 $my_cf_ids = array_keys ($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]});
!                                                 foreach ($my_cf_ids as $my_cf_id) {
!
!                                                       $def_name = generate_graph_def_name(strval($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$my_cf_id]));
                                                        $cdef_total_ds .= ($item_count == 0 ? "" : ",") . "TIME," . (time() - $seconds_between_graph_updates) . ",GT,$def_name,$def_name,UN,0,$def_name,IF,IF"; /* convert unknowns to '0' first */
                                                        $item_count++;
                                                }

in short, i am simply looping through all the available data sources and ignoring the "faked consolidation function id" ($cf_id). i'll explain in more detail in the next post to keep this one from being too crowded.
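A Python model of the patched loop makes the change concrete. The item dicts, the made-up data source ids, and `def_name` (a stand-in for `generate_graph_def_name`) are assumptions; `cf_ds_cache` is modeled as data source id → {cf id → DEF index}, matching how the PHP indexes it. Only the comma-joined terms are built; the trailing `+` operators are left out.

```python
import re

# Same pattern the PHP passes to ereg()
GRAPHABLE = re.compile(r"(AREA|STACK|LINE[123])")


def def_name(index):
    """Hypothetical stand-in for generate_graph_def_name(); only covers 0-25."""
    return chr(ord("a") + index)


def total_terms_patched(graph_items, cf_ds_cache, cutoff):
    """Model of the patched loop: one term per CF a data source is cached under."""
    terms = []
    for item in graph_items:
        if GRAPHABLE.search(item["type"]) and item.get("rrd_id"):
            # .get() guard, because the posted patch drops the isset() check
            for index in cf_ds_cache.get(item["rrd_id"], {}).values():
                d = def_name(index)
                terms.append(f"TIME,{cutoff},GT,{d},{d},UN,0,{d},IF,IF")
    return ",".join(terms)


items = [
    {"type": "AREA", "rrd_id": 1},      # System
    {"type": "STACK", "rrd_id": 2},     # User
    {"type": "STACK", "rrd_id": 3},     # Nice
    {"type": "LINE1", "rrd_id": None},  # Total: assumed to have no data source
]
all_max = {1: {3: 0}, 2: {3: 1}, 3: {3: 2}}  # every DEF uses MAX (cf id 3)
print(total_terms_patched(items, all_max, 1166473436))
```

One side effect worth checking: because every cached CF is used, a data source DEF'd under more than one CF would contribute one term per CF to the total.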

jlg

Post by jlg »

hi folks, here's the more detailed analysis.

CORE PROBLEM:
in /usr/share/cacti/site/lib/rrd.php, we have the function rrdtool_function_graph.

the faked $cf_id, which is hardwired to "1" on line 822, causes the $cf_ds_cache check to fail for any data sources that do not use "1" (ie, AVERAGE) as their consolidation function.

ie, this check on line 839 is essentially hardwired to only check for "1" (AVERAGE) and not "3" (MAX):

Code:

isset($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$cf_id])
therefore, after adding up all of the "found" data sources, it ends up with an empty string which you see in my first post:

Code:

CDEF:cdefbc= \

WHAT DO YOU MEAN BY "FAKED $cf_id"?

in this code snippet, we fail all of the checks and hit the "last else" case on line 822.

Code:

813   if (isset($cf_ds_cache{$graph_item["data_template_rrd_id"]}[1])) {
814           $cf_id = 1; /* CF: AVERAGE */
815   }elseif (isset($cf_ds_cache{$graph_item["data_template_rrd_id"]}[3])) {
816           $cf_id = 3; /* CF: MAX */
817   }elseif (isset($cf_ds_cache{$graph_item["data_template_rrd_id"]}[2])) {
818           $cf_id = 2; /* CF: MIN */
819   }elseif (isset($cf_ds_cache{$graph_item["data_template_rrd_id"]}[4])) {
820           $cf_id = 4; /* CF: LAST */
821   }else{
822           $cf_id = 1; /* CF: AVERAGE */
823   }
as you can see, we essentially gave up and set $cf_id = 1 which corresponds to AVERAGE.
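The same selection logic as a hypothetical Python model (`pick_cf_id` and the cache layout are assumptions). It assumes the item being processed is the Total item and that it has no data source of its own, which would explain why every check fails:

```python
AVERAGE, MIN, MAX, LAST = 1, 2, 3, 4  # CF ids as labelled in the snippet's comments


def pick_cf_id(cf_ds_cache, rrd_id):
    """Model of lines 813-823: first CF found for this data source, else AVERAGE."""
    cached = cf_ds_cache.get(rrd_id, {})
    for cf_id in (AVERAGE, MAX, MIN, LAST):  # same order as the if/elseif chain
        if cf_id in cached:
            return cf_id
    return AVERAGE  # the "faked" fallback on line 822


cache = {1: {MAX: 0}, 2: {MAX: 1}, 3: {MAX: 2}}
print(pick_cf_id(cache, 1))     # 3: a data source DEF'd with MAX
print(pick_cf_id(cache, None))  # 1: nothing cached, so AVERAGE is assumed
```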


WHY DOES $cf_id=1 CAUSE A PROBLEM?

this code snippet is run when:
1. a graph item's cdef contains "ALL_DATA_SOURCES_(NO)?DUPS"
2. this is the first time we are seeing this cdef

the apparent goal here is to loop through all of the graph items whose type is "AREA|STACK|LINE[123]" and build the string $cdef_total_ds.

Code:

835   $item_count = 0;
836   for ($t=0;($t<count($graph_items));$t++) {
837           if ((ereg("(AREA|STACK|LINE[123])", $graph_item_types{$graph_items[$t]["graph_type_id"]})) && (!empty($graph_items[$t]["data_template_rrd_id"]))) {
838                   /* if the user screws up CF settings, PHP will generate warnings if left unchecked */
839                   if (isset($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$cf_id])) {
840                           $def_name = generate_graph_def_name(strval($cf_ds_cache{$graph_items[$t]["data_template_rrd_id"]}[$cf_id]));
841                           $cdef_total_ds .= ($item_count == 0 ? "" : ",") . "TIME," . (time() - $seconds_between_graph_updates) . ",GT,$def_name,$def_name,UN,0,$def_name,IF,IF"; /* convert unknowns to '0' first */
842                           $item_count++;
843                   }
844           }
845   }
the problem is that when you set your graph item "CF Type" to MAX, then the check on line 839 will fail because MAX corresponds to "3", not "1". so even though you have data sources in $cf_ds_cache, the above code fails to find them and we end up with the empty string listed above.
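A Python model of the original loop reproduces both broken CDEFs from the first post. As before, the item dicts and ids are hypothetical, `def_name` stands in for `generate_graph_def_name`, and the trailing `+` operators are omitted. With `$cf_id` stuck at 1, only data sources DEF'd with AVERAGE are kept:

```python
import re

GRAPHABLE = re.compile(r"(AREA|STACK|LINE[123])")  # pattern from the ereg() call


def def_name(index):
    """Hypothetical stand-in for generate_graph_def_name(); only covers 0-25."""
    return chr(ord("a") + index)


def total_terms_original(graph_items, cf_ds_cache, cf_id, cutoff):
    """Model of lines 835-845: keep a data source only if it was DEF'd with cf_id."""
    terms = []
    for item in graph_items:
        if GRAPHABLE.search(item["type"]) and item.get("rrd_id"):
            cached = cf_ds_cache.get(item["rrd_id"], {})
            if cf_id in cached:  # mirrors isset($cf_ds_cache{...}[$cf_id])
                d = def_name(cached[cf_id])
                terms.append(f"TIME,{cutoff},GT,{d},{d},UN,0,{d},IF,IF")
    return ",".join(terms)


items = [
    {"type": "AREA", "rrd_id": 1},      # System
    {"type": "STACK", "rrd_id": 2},     # User
    {"type": "STACK", "rrd_id": 3},     # Nice
    {"type": "LINE1", "rrd_id": None},  # Total
]
only_system_max = {1: {3: 0}, 2: {1: 1}, 3: {1: 2}}
all_max = {1: {3: 0}, 2: {3: 1}, 3: {3: 2}}
print(total_terms_original(items, only_system_max, 1, 1166473436))  # only b and c
print(repr(total_terms_original(items, all_max, 1, 1166473436)))    # '' -> CDEF:cdefbc=
```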

my patch in the earlier post ignores the "faked $cf_id" and simply loops over all of the data source "CF Type"s.



PROBLEMS WITH MY SOLUTION:
1. the first problem is that i really don't know what the "correct fix" is for the bigger picture. i only changed it for ALL_DATA_SOURCES_NODUPS...ignoring "SIMILAR_DATA_SOURCES_(NO)?DUPS".

also, since i cannot find documentation on ALL_DATA_SOURCES_NODUPS, i do not know whether there was any intention to honor the "CF Type". the original code seems to honor $cf_id, though the name ALL_DATA_SOURCES_NODUPS suggests otherwise. my first attempt to solve this did honor the "CF Type", but upon reflection it made no sense since the name says "ALL_*".


2. i'm running on what looks like an older version, so there's a possibility this has already been fixed and i am merely adding to the noise. however, i searched for ALL_DATA_SOURCES_NODUPS in the bug system (bugs.cacti.net) and did not find anything.
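For comparison, the approach from jlg's first attempt (honoring each item's own "CF Type") might look like this as a Python model. The per-item `cf_id` field and the other names are hypothetical: each graphed item is looked up under its own CF, so a data source cached under several CFs still contributes a single term.

```python
import re

GRAPHABLE = re.compile(r"(AREA|STACK|LINE[123])")  # pattern from the ereg() call


def def_name(index):
    """Hypothetical stand-in for generate_graph_def_name(); only covers 0-25."""
    return chr(ord("a") + index)


def total_terms_item_cf(graph_items, cf_ds_cache, cutoff):
    """Look each graphed item up under its own (hypothetical) "cf_id" field."""
    terms = []
    for item in graph_items:
        if GRAPHABLE.search(item["type"]) and item.get("rrd_id"):
            index = cf_ds_cache.get(item["rrd_id"], {}).get(item["cf_id"])
            if index is not None:  # DEF index 0 is valid, so test for None
                d = def_name(index)
                terms.append(f"TIME,{cutoff},GT,{d},{d},UN,0,{d},IF,IF")
    return ",".join(terms)


items = [
    {"type": "AREA", "rrd_id": 1, "cf_id": 3},      # System, MAX
    {"type": "STACK", "rrd_id": 2, "cf_id": 3},     # User, MAX
    {"type": "STACK", "rrd_id": 3, "cf_id": 3},     # Nice, MAX
    {"type": "LINE1", "rrd_id": None, "cf_id": 3},  # Total
]
# System is also cached under AVERAGE (DEF index 3), but only its MAX DEF is used
cache = {1: {1: 3, 3: 0}, 2: {3: 1}, 3: {3: 2}}
print(total_terms_item_cf(items, cache, 1166473436))
```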

jlg