[ HOWTO ] Graphing Holt-Winters Predictive Analysis

koaps · Post by **koaps** » Thu Nov 20, 2008 4:37 pm

So, I finally worked out tick support using a nice little hack JasperJ came up with for VDEFs.

The stuff I'm about to mention is for graphing HW-enabled rrd's.

I personally use external scripts to create and update rrd's, so I haven't messed with having Cacti actually create them.

First off you will need to edit include/global_arrays.php and add in 3 new CF's.

It should look like this:

Code: Select all

$consolidation_functions = array(1 =>
        "AVERAGE",
        "MIN",
        "MAX",
        "LAST",
        "HWPREDICT",
        "DEVPREDICT",
        "FAILURES",
        );

Once you have that done, you will be able to create the graph templates and CDEF's.

The CDEF's you will need are:

Upper bounds: cdef=b,c,2,*,+
Lower bounds: cdef=b,c,2,*,-
Ticks: cdef=CURRENT_DATA_SOURCE TICK:CURRENT_DATA_SOURCE#ffffa0:1.0:"Failures\n"

Things to note here: I have hardcoded the math in the bounds to use DEFs b and c, which in my case are HWPREDICT and DEVPREDICT respectively. Your setup could be different if you have more than one DS in your rrd; for network traffic, for instance, you have an in DS and an out DS. In that case you will need to create Upper_In/Lower_In and Upper_Out/Lower_Out and put the correct DEF letters in place in the CDEFs.
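For anyone not used to RPN, here is a minimal Python sketch of what the two bounds CDEFs compute. The evaluator is a toy written only for illustration (it is not how rrdtool is implemented), and the sample values for b and c are made up.

```python
def eval_rpn(expr, values):
    """Evaluate a comma-separated RPN expression supporting + - * / only."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
        "/": lambda x, y: x / y,
    }
    stack = []
    for token in expr.split(","):
        if token in ops:
            y = stack.pop()          # right operand is on top of the stack
            x = stack.pop()
            stack.append(ops[token](x, y))
        elif token in values:
            stack.append(values[token])   # a DEF letter such as b or c
        else:
            stack.append(float(token))    # a numeric constant
    return stack.pop()

# Made-up sample values: b = HWPREDICT (prediction), c = DEVPREDICT (deviation)
sample = {"b": 200.0, "c": 15.0}
upper = eval_rpn("b,c,2,*,+", sample)   # b + 2*c
lower = eval_rpn("b,c,2,*,-", sample)   # b - 2*c
print(upper, lower)
```

So the confidence band is simply the prediction plus or minus two deviations. For a second DS you would swap in that DS's DEF letters.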

The Ticks CDEF uses the graph injection hack that Jasper figured out. The hack could stop working in future versions of Cacti, since it is technically a vulnerability, though I don't think it's a severe one.

Here's some debugging output from the graph so you can see what's going on:

Code: Select all

DEF:a="my.rrd":bps:AVERAGE \
DEF:b="my.rrd":bps:HWPREDICT \
DEF:c="my.rrd":bps:DEVPREDICT \
DEF:d="my.rrd":bps:FAILURES \
CDEF:cdefc=b,c,2,*,+ \
CDEF:cdefd=b,c,2,*,- \
CDEF:cdefe=d TICK:d#ffffa0:1.0:"Failures\n" \
AREA:a#000000FF:"\n"  \
LINE3:b#FFF200B2:""  \
LINE2:cdefc#FF0000B2:""  \
LINE2:cdefd#FF0000B2:""  \
 \
LINE1:d:""

To get the DEF for FAILURES I made a LINE1 with no color; this gives me DEF d.
After that, I made a comment item with the CF set to FAILURES and the CDEF set to Ticks.

That will set CURRENT_DATA_SOURCE to equal DEF d and the tick will be graphed.
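A small Python sketch of why the injection works, assuming the CDEF text is expanded by plain string substitution of CURRENT_DATA_SOURCE. The function name expand_cdef is made up for illustration and this is not Cacti's actual code.

```python
def expand_cdef(template, def_letter):
    """Replace every CURRENT_DATA_SOURCE placeholder with the DEF letter."""
    return template.replace("CURRENT_DATA_SOURCE", def_letter)

# The Ticks CDEF exactly as entered in Cacti (the \n stays a literal backslash-n)
ticks_cdef = 'CURRENT_DATA_SOURCE TICK:CURRENT_DATA_SOURCE#ffffa0:1.0:"Failures\\n"'

# The comment item uses the FAILURES DEF, which is letter d in the debug output
expanded = "CDEF:cdefe=" + expand_cdef(ticks_cdef, "d")
print(expanded)
```

The space inside the CDEF text is what smuggles a second argument (the TICK statement) onto the rrdtool command line, which matches the CDEF:cdefe line in the debugging output.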

Currently the ticks are plotting correctly for me. I have graphs generated both within Cacti and externally, and so far they match. I have a situation right now that puts my graphs in a failure state, so I need to wait until I recover from that state to see whether the failure ticks stop when I no longer violate my confidence bands.

Let me know if you guys need some more info, I have posted stuff on how I did my confidence bands here:

http://forums.cacti.net/viewtopic.php?t ... t&start=30

When I get a chance I'll dump my graph templates and post them.

brian.nz · Post by **brian.nz** » Thu Nov 20, 2008 6:23 pm

Awesome work.

One question: you mentioned you use custom scripts to create and update rrd's for use with HW.

I have around 4800 rrd's that need to be made HW aware. Any chance of sharing those scripts or pointing me in their direction?

Regards

koaps · Post by **koaps** » Thu Nov 20, 2008 7:26 pm

There's a perl script to do something like that.

http://internap.dl.sourceforge.net/sour ... 0.1.tar.gz

http://rrfw.sourceforge.net/rrdman/rrd_ ... y.pod.html

I have never used it but it might work for ya.

My script does something completely different: it copies 2 gigs of random data to our SAN, runs an md5sum at the end to check for corruption, then saves the transfer rate into an rrd and makes a graph.

Just remember a few things. With 4800 rrds, even if each has only one data source, you will be adding the HW RRA's (HWPREDICT plus the SEASONAL, DEVSEASONAL, DEVPREDICT and FAILURES RRA's that rrdtool creates along with it) on top of any already there (Average, Max, Min or Last). This can take a long time to update, and with that many you can easily go past 5 minutes, causing updates to fail.

The other thing is, HW is for trending. If you use the default thresholds and window lengths (7 violations within the last 9 samples) at 5 minute steps, you will be looking for 35 minutes of not necessarily continuous failure in a 45 minute window, which is not exactly realtime. In my experience it also takes several days of data collection before confidence bounds even show up.

You can see this clearly in the image I posted from Cacti: there are 6 violations (values outside my confidence bounds) before it marks a failure on the 7th (35 minutes total).
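The windowing rule can be sketched in a few lines of Python. This is only an illustration of "threshold violations within the last window samples", using the defaults of 7 and 9; it is not rrdtool's code, and the function name failure_flags and the sample data are made up.

```python
from collections import deque

def failure_flags(violations, threshold=7, window=9):
    """Flag a failure whenever at least `threshold` of the last `window`
    samples were violations (values outside the confidence bounds)."""
    recent = deque(maxlen=window)   # sliding window of the latest samples
    flags = []
    for violated in violations:
        recent.append(bool(violated))
        flags.append(sum(recent) >= threshold)
    return flags

# Two good samples followed by seven violations in a row (5 minute steps)
violations = [False, False] + [True] * 7
flags = failure_flags(violations)
print(flags.index(True))   # index of the first sample marked as a failure
```

With these inputs the first six violations do not trigger anything and the failure is only flagged on the seventh. The violations do not have to be consecutive, as long as seven of them fall inside the same nine-sample window.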

I'm still working on making my script react when it detects a failure. Most likely it will tell Nagios directly, causing a page to go out.

My scripts are in perl, so I use the RRDs module directly and it's almost as easy as working with rrdtool on the command line.

To give you an idea, this is how my script creates rrd's:

Code: Select all

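        # Create the RRD only if it does not exist yet.
        if (not -e "$rrd_path/$k.rrd") {
#           print "Creating RRD $rrd_path/$k.rrd\n";
            RRDs::create ("$rrd_path/$k.rrd",
                "DS:$ds:GAUGE:1800:U:U",                # one gauge DS, 1800 s heartbeat
                "RRA:AVERAGE:0.5:1:2016",               # 2016 unconsolidated rows
                "RRA:HWPREDICT:1440:0.1:0.0035:288",    # rows:alpha:beta:seasonal period
                );
            # RRDs::create does not die on failure, so check RRDs::error.
            $err=RRDs::error;
            if ($err) {print "problem creating the RRD: $err\n";}
        }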

It first checks whether the rrd exists and, if not, creates it with default HW values.
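To put the numbers in that create call into wall-clock terms, here is a quick Python calculation. It assumes the step is rrdtool's default of 300 seconds, since the create call does not set one; adjust STEP_SECONDS if your rrd's use something else.

```python
STEP_SECONDS = 300  # assumption: rrdtool's default step, the create call sets none

def span_hours(rows, step=STEP_SECONDS):
    """How many hours a given number of rows covers at one row per step."""
    return rows * step / 3600

average_days = span_hours(2016) / 24     # RRA:AVERAGE:0.5:1:2016
hw_days = span_hours(1440) / 24          # rows kept by RRA:HWPREDICT
season_hours = span_hours(288)           # seasonal period of 288 samples
heartbeat_steps = 1800 / STEP_SECONDS    # DS heartbeat expressed in steps
print(average_days, hw_days, season_hours, heartbeat_steps)
```

Under that assumption the AVERAGE RRA holds a week of data, the HW RRA holds 5 days, and the seasonal period is exactly one day, which is why it takes a while before the predictions settle down.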

The alpha, beta and seasonal period in the RRA:HWPREDICT line (0.1:0.0035:288) are values I got from:

http://www.usenix.org/events/lisa2000/f ... index.html
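For a feel of what alpha and beta actually do, here is a rough Python sketch of the additive Holt-Winters update. It is not rrdtool's implementation: the start-up values, the function name hw_forecast, and the choice of gamma equal to alpha are assumptions made for illustration, and delta=2 is chosen to mirror the factor 2 in the bounds CDEFs.

```python
def hw_forecast(series, period, alpha=0.1, beta=0.0035, gamma=0.1, delta=2.0):
    """Return a (prediction, deviation, violation) tuple for each sample."""
    level = series[0]           # baseline, smoothed with alpha
    trend = 0.0                 # slope, smoothed with beta
    season = [0.0] * period     # one seasonal coefficient per slot in the period
    dev = [0.0] * period        # one smoothed deviation per slot in the period
    out = []
    for t, y in enumerate(series):
        slot = t % period
        pred = level + trend + season[slot]
        # A violation is a sample outside prediction +/- delta * deviation
        violation = abs(y - pred) > delta * dev[slot]
        out.append((pred, dev[slot], violation))
        prev_level = level
        level = alpha * (y - season[slot]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[slot] = gamma * (y - level) + (1 - gamma) * season[slot]
        dev[slot] = gamma * abs(y - pred) + (1 - gamma) * dev[slot]
    return out

# A flat made-up series with one spike at the end
result = hw_forecast([4.0] * 10 + [100.0], period=5)
print(result[-1])
```

A small alpha means the baseline adapts slowly, and the tiny beta means the trend adapts even more slowly, so a sudden jump like the spike above lands well outside the band and counts as a violation.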

I still have a lot of testing to do to see how the system works, but so far it's working.

We experienced a drive failure in a SAN disk tray, and HW detected the RAID rebuilding process and marked it as a failure. This is why my bps dropped from over 200MB/s to less than 100MB/s.

Let me know if that script works or not, I can probably modify my script to do something like what you need.

Post by **gandalf** » Sat Nov 22, 2008 4:43 am

I will have to do a wrap-up of everything you've mentioned here. The ugly one might be the VDEF issue. Then, there will be a good chance to get it into 088.
Reinhard

koaps · Post by **koaps** » Mon Nov 24, 2008 4:38 pm

On a side note about failure graphs, they are only applied to current data.

So if you try to go back and look for a failure, they will not be marked on the graph.

I have tested this directly with rrdtool, so it's not a side effect of the "hack" per se.

psyber · Post by **psyber** » Thu Apr 02, 2009 5:42 pm

I've been playing with this a bit in the past week to see what will need to be done
to add proper functionality in cacti to do forecasting and trending.

This is just me thinking out loud and trying to get the ball rolling in the community:

Simply adding "HWPREDICT", "SEASONAL", "DEVSEASONAL", "DEVPREDICT",
and "FAILURES" to global arrays produces somewhat buggy behavior.

MHWPREDICT should also be added to that list of CFs.

Adding the new CFs here allows you to pick them as a CF when creating or modifying a graph template,
but they also all show up in the RRAs editor. That will make rrdtool barf on create if any of the RRA periods
uses one of them as a CF, because the new CFs take different arguments. rrdtool allows for implicit creation
of HW enabled RRDs by simply passing HWPREDICT or MHWPREDICT as the final argument in create; rrdtool will then
create all the HW related CFs with some default values. For serious trending and forecasting I think
this falls way short, but implicit creation is certainly the low hanging fruit in this scenario. It
allows the masses to get their feet wet and the bleeding edge to get bloody by tuning their values
to match their situation.

So to summarize, I think we should: add the CFs to the global array; mask the new CFs from selection in RRAs except for
HWPREDICT and MHWPREDICT; and add some logic to cacti's rrdcreate function that says if HWPREDICT or MHWPREDICT
is chosen, make it the final argument and don't pass the usual xff, rows etc. with it.
(I don't think you can implicitly create both a HWPREDICT and a MHWPREDICT in one rrd, but I don't see why you couldn't have both CFs defined. I'll try and test that.)
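A rough Python sketch of that logic, just to make the proposal concrete. Everything here is hypothetical: the function name build_rra_args and the dict layout are invented for illustration, and the real change would live in cacti's PHP rrdcreate code.

```python
HW_SELECTABLE = {"HWPREDICT", "MHWPREDICT"}
HW_MASKED = {"SEASONAL", "DEVSEASONAL", "DEVPREDICT", "FAILURES"}

def build_rra_args(rras, hw=None):
    """Build the RRA arguments for rrdtool create.

    rras: list of dicts with cf, xff, steps, rows (ordinary CFs only)
    hw:   optional dict with cf, rows, alpha, beta, period
    """
    args = []
    for rra in rras:
        if rra["cf"] in HW_SELECTABLE | HW_MASKED:
            # HW CFs take different arguments, so keep them out of normal RRAs
            raise ValueError("%s is not a regular consolidation function" % rra["cf"])
        args.append("RRA:{cf}:{xff}:{steps}:{rows}".format(**rra))
    if hw is not None:
        if hw["cf"] not in HW_SELECTABLE:
            raise ValueError("only HWPREDICT or MHWPREDICT can be chosen")
        # No xff or steps here; the HW RRA goes last and rrdtool creates the rest
        args.append("RRA:{cf}:{rows}:{alpha}:{beta}:{period}".format(**hw))
    return args

rra_args = build_rra_args(
    [{"cf": "AVERAGE", "xff": 0.5, "steps": 1, "rows": 2016}],
    {"cf": "HWPREDICT", "rows": 1440, "alpha": 0.1, "beta": 0.0035, "period": 288},
)
print(rra_args)
```

The sample values are the ones from koaps' create call earlier in the thread, so the output can be compared with what his script passes to RRDs::create.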

Another option might be to just append HWPREDICT to all rrds created from now on. The downside is larger rrd files, many of them for things that will probably never be of interest for trending.

At some point there will have to be a widget to easily tune all the HW settings if they are to be of any real use.
Perhaps a HW plugin would be appropriate.

While researching rrdtool and Holt-Winters I stumbled across a recent patch by Martin Sperl that adds
2 new CDEF functions (PREDICT and PREDICTSIGMA), with a hint of more to come later. It's a simple sliding window that
predicts about a day into the future. I've run it across a couple of functions like sine and e^x, and a couple of real
world datasets, and it seems to do quite well. Very interesting and also very easy to add to cacti.
You can read the docs in the patch here:
http://oss.oetiker.ch/rrdtool-trac/changeset/1649
If you want to give it a whirl you'll have to grab the nightly trunk snapshot from SVN and build it.
Martin's announcement is here:
http://www.mail-archive.com/rrd-develop ... 02849.html

I'll see if I can throw a patch together to add the functionality, but as these CDEFs haven't even made
the rrdtool documentation yet I don't think there's any rush.
