Thanks for the RRDTool manual link. I know this sounds silly, but it only recently occurred to me that Cacti is really just a program that simplifies graphing with RRDTool by providing a PHP interface. It also provides a very good list of commonly used data gathering methods and presents them well with some pre-configured graph templates. So it's wise to understand how RRDTool works before you start with Cacti. I've spent some time learning RRDTool and have a better understanding of things now.
I'm using version 0.81 of Cacti, and my quest for the traceroute graph continues. I've run into what I think are bugs in Cacti that I'd like to report here. The first is easy to work around, but the last two don't seem to have any workaround at all. Here they are:
1. The Graph Template page changes the Template name to match a new Graph Item Input name when you create a new Graph Item Input. I can consistently duplicate this problem on any Graph Template I choose and similar problems may occur in other pages.
Steps to duplicate: Click Graph Templates and then click any template in the list. Click Add for Graph Item Inputs. Type a name and click Save (all you need to do is type the name, but you can play with the other stuff if you want). When you are returned to the main Graph Template page, notice the name of the template is now the name of the Graph Item Input you just created. Click the red X button next to the input you created and notice the Graph Template name changes back to the original.
2. Again in the Graph Template, I can't successfully create a Graph Template Item of type GPRINT. This problem may exist for other types too, although I know I have been able to use AREA successfully.
Steps to duplicate: select any Graph Template and Add a Graph Template Item. Select an appropriate Data Source and choose type GPRINT. Give it a name and choose Save. Associate the Graph Template with a Polling Host and check the graph. You should see that the graph is broken. Check the Source link and you'll see the GPRINT line. It looks something like this for the GPRINT line:
Code:
GPRINT::AVERAGE:"Gateway%8.2lf %s"
The back-to-back colons should have a letter between them that references a data source (the name assigned by a DEF line). The Graph Template page in Cacti shows that a data source is selected for the item, but the selection doesn't make it into the rrdtool graph command that Cacti generates.
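For anyone who wants to check their own graph source for the same symptom, here is a minimal Perl sketch. It assumes the GPRINT:vname:CF:format layout from the rrdtool manual; `find_empty_vnames` is just a name I made up, and the letter `a` in the second sample line is only an illustration of what a filled-in reference would look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the graph-source lines whose GPRINT element has an empty vname,
# i.e. nothing between "GPRINT:" and the next colon.
sub find_empty_vnames {
    my @lines = @_;
    return grep { /^\s*GPRINT::/ } @lines;
}

# Sample lines: the first is what Cacti generated for me; the second is
# the same element with a vname filled in ("a" is only illustrative).
my @source = (
    'GPRINT::AVERAGE:"Gateway%8.2lf %s"',
    'GPRINT:a:AVERAGE:"Gateway%8.2lf %s"',
);

print "missing vname: $_\n" for find_empty_vnames(@source);
```

Run against the two sample lines, only the first one should be reported.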
3. This one is the doozy and I have to explain a lot so here we go. As you know, with the traceroute graph I need to create a Data Input Method with as many Output Fields as I have hops that I want to trace. I have also created a Data Template with corresponding Data Source Items for each hop. Next I created a Graph Template with Graph Items of type AREA for each hop (I haven't done a legend for the graph yet, just the lines. I can't do the legend because of bug 2 and because I just want to get the graph working and then I'll pretty it up later). For kicks and giggles I added two Graph Item Inputs for each of my ten AREA Graph Items (20 total). One for the line Color and one for the Data Source for each hop. (Note: Graph Item 1 is Hop10, Graph Item 2 is Hop9, etc. because the farther hops will have higher latency and I want the lower latency graph lines to be drawn last and therefore visible in front of the higher ones so you can see which hops are jacking up the milliseconds.)
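To make the drawing order concrete, here is a rough Perl sketch of the DEF and AREA arguments I'm trying to get out of the template: farthest hop first, nearest hop last. The rrd file name, the `hopN` data source names, the colour codes, and the `build_graph_items` helper are all placeholders for illustration, not what Cacti actually produces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Emit DEF and AREA arguments with the farthest hop first, so nearer
# (lower-latency) hops are painted last and stay visible in front.
# The rrd path, the "hopN" names, and the colours are illustrative only.
sub build_graph_items {
    my ($rrd, $colors) = @_;          # $colors: { hop number => hex colour }
    my (@defs, @areas);
    my $letter = 'a';
    for my $hop (sort { $b <=> $a } keys %$colors) {
        push @defs,  sprintf('DEF:%s="%s":hop%d:AVERAGE', $letter, $rrd, $hop);
        push @areas, sprintf('AREA:%s#%s:""', $letter, $colors->{$hop});
        $letter++;                    # a, b, c, ... via string increment
    }
    return (@defs, @areas);
}

# Four hops with placeholder colours.
my %colors = (1 => 'FFFF00', 2 => 'FFA500', 3 => 'FF4500', 4 => 'FF0000');
print "$_\n" for build_graph_items('traceroute.rrd', \%colors);
```

With four hops this lists hop4 as DEF `a` and hop1 as DEF `d`, so hop1's AREA is drawn last.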
Okay, now I sicced my graph template on a polling host and waited a while for it to gather data. The polling host I was tracing was four hops away from the Cacti box. The first two hops were next to zero milliseconds so they shouldn't have shown up on the graph, the third hop jumped to 200 ms, and the last hop was averaging around ten milliseconds more than the third. Now the graph isn't labeled yet, but Hop1 was yellow, Hop2 was a sunny orange color, Hop3 was a reddish-orange color, and Hop4 was red.
The graph I was looking at was showing Hop1 (yellow) hovering at 200 ms with Hop4 (red) just above that. I looked at the source for the graph and noticed the data source for Hop1 was in the third position and the data source for Hop3 was in the first position. I blamed myself. I went to the Console and clicked Data Sources and fished out the data source for the graph (I have a big list now). I double-checked each of the Hops and made sure they were pointing to the right Internal Data Source Name. Then I double-checked the Data Template. Each of the Data Source Items was pointing to the right Output Field. I rechecked all the Output fields in the Data Input Method. Everything looked right. I did a dump of the rrd database for the graph and saw that Hop3's data was indeed being dumped into the field for Hop1. I decided that my script had a problem and redesigned it:
Code:
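#!/usr/bin/perl
use strict;
use warnings;

# Trace up to 9 hops, one probe each; sed drops the header line and
# rewrites timed-out probes (*) as a 0.0.0.0 / 0.000 ms placeholder.
my @trace = `traceroute -m 9 -w 2 -q 1 -n $ARGV[0] | sed -l0 -e '1{d}' -e 's/\*/0\.0\.0\.0\ \ 0\.000\ ms/g'`;
my $hopcount = @trace;
for (my $i = 0; $i < $hopcount; $i++) {
    my $hop = $i + 1;
    # $6 captures the whole-millisecond part of the latency; skip lines that don't match.
    next unless $trace[$i] =~ /( |)(.*[0-9])(.* )(.*[0-9]\..*[0-9]\..*[0-9]\..*[0-9])(.* )(.*[0-9])(\.[0-9][0-9][0-9] ms)/;
    print "hop0$hop:$6 ";
}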
I omitted the tenth hop (in case the collection mechanism was getting confused between hop1 and hop10) and renamed my labels hop01, hop02, etc. instead of hop1, hop2, etc. I realize the Perl code is messy; I'll fix that after I get it working.
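For what it's worth, the parsing step on its own could probably be tidied into something like the sketch below. It is not what I'm running: `parse_hops` is a made-up helper, the sample lines use documentation addresses rather than real hops, and unlike the script above it takes the hop number from the traceroute line itself instead of counting lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "traceroute -n -q 1" output lines (header already removed) into
# "hopNN:ms" pairs. A timed-out probe ("*") is reported as 0, and the
# latency is truncated to whole milliseconds, as in the script above.
sub parse_hops {
    my @lines = @_;
    my @pairs;
    for my $line (@lines) {
        # Expected shape: " 3  192.0.2.1  201.532 ms" or " 3  *"
        next unless $line =~ /^\s*(\d+)\s+(?:\*|[\d.]+\s+(\d+)\.\d+\s+ms)/;
        my ($hop, $ms) = ($1, defined $2 ? $2 : 0);
        push @pairs, sprintf('hop%02d:%d', $hop, $ms);
    }
    return @pairs;
}

# Made-up sample output for a four-hop trace.
my @sample = (
    " 1  192.0.2.1  0.412 ms\n",
    " 2  192.0.2.17  0.655 ms\n",
    " 3  *\n",
    " 4  198.51.100.9  211.034 ms\n",
);
print join(' ', parse_hops(@sample)), "\n";
# prints: hop01:0 hop02:0 hop03:0 hop04:211
```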
I scrapped my Data Input Method, Data Template, and Graph Template and started over. This time I was careful to make no mistakes with naming and to create everything in order. When I was done I again picked a Polling Host (a different one this time, but again only four hops away). I noticed a clue when I created the Data Source this time. Because the Graph Template has Graph Item Inputs of type Data Source for each hop, I was able to see the order in which the data sources would be listed. They should run from Hop09 down to Hop01 (remember, I need them in reverse order to graph properly), but instead two pairs of numbers were transposed again, so from the top down they appeared in this order: 8, 9, 7, 6, 5, 3, 4, 2, 1
I attempted to correct this by manually inputting the correct order for the four hops that were out of place. When I waited for the data to collect and viewed the graph I was disappointed again. Here's the source for the graph:
Code:
/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--title="Host 158.58.14.189 - Hop Latency" \
--base=1000 \
--height=150 \
--width=800 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="Milliseconds" \
DEF:a="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop08:AVERAGE \
DEF:b="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop09:AVERAGE \
DEF:c="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop07:AVERAGE \
DEF:d="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop06:AVERAGE \
DEF:e="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop05:AVERAGE \
DEF:f="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop03:AVERAGE \
DEF:g="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop04:AVERAGE \
DEF:h="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop02:AVERAGE \
DEF:i="/srv/www/htdocs/rra/host_158_58_14_189_hop09_103.rrd":hop01:AVERAGE \
AREA:a#55009D:"" \
AREA:b#005199:"" \
AREA:c#00A348:"" \
AREA:d#00FF00:"" \
AREA:e#FFF200:"" \
AREA:f#FFAB00:"" \
AREA:g#FF7D00:"" \
AREA:h#FF5700:"" \
AREA:i#FF0000:""
The DEF lines show that the two sets of data sources are still being transposed. I've tested the traceroute.pl script to see if it's listing the hops out of order and it's not. I'm not sure what else to try at this point. I hope you can fix this or help me out because I'm really stuck.
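In case it helps anyone reproduce this, here is a small Perl sketch that pulls the data source names out of the DEF lines and compares them with the order I expect (hop09 down to hop01). The helper names `def_order` and `mismatches` and the shortened `graph.rrd` path are mine, just for illustration; the listed order is the one from the graph source above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the data source name out of each DEF line, in the order listed.
sub def_order {
    my @lines = @_;
    my @names;
    for my $line (@lines) {
        push @names, $1 if $line =~ /^\s*DEF:\w+="[^"]*":(\w+):/;
    }
    return @names;
}

# Report positions where the listed order differs from the expected one.
sub mismatches {
    my ($got, $want) = @_;
    my @diff;
    for my $i (0 .. $#$want) {
        my $g = defined $got->[$i] ? $got->[$i] : '(none)';
        push @diff, sprintf('position %d: got %s, expected %s', $i + 1, $g, $want->[$i])
            if $g ne $want->[$i];
    }
    return @diff;
}

# Rebuild DEF lines in the order seen in the graph source (path shortened).
my @source;
my $letter = 'a';
for my $n (8, 9, 7, 6, 5, 3, 4, 2, 1) {
    push @source, sprintf('DEF:%s="graph.rrd":hop%02d:AVERAGE', $letter, $n);
    $letter++;
}
my @want = map { sprintf 'hop%02d', $_ } reverse 1 .. 9;

print "$_\n" for mismatches([def_order(@source)], \@want);
```

For the order above it flags positions 1, 2, 6, and 7, which are exactly the two swapped pairs (hop08/hop09 and hop03/hop04).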
Thanks,