Same graph on two devices doesn't behave the same

Post by **DaveClose** » Fri May 09, 2008 1:15 pm

To create a custom graph, I first create new data templates, then a new graph template, then I associate the graph template with a device, then I add the graph to that device. Usually it works just fine. Now I have one which behaves strangely and I'm stuck trying to debug it. See the two graphs attached.

The good graph is from the first device for which I followed this procedure. The bad graph is from the second device, and it looks the same on all subsequent devices I've tried. (I don't repeat the first two steps above for additional devices.) None of the debug modes reports any errors, but I'm dumbfounded as to the origin of the data in the bad graph. I don't even know what that information could possibly mean.

SNMP queries to both devices respond correctly from the command line. What I'd like to see is what query Cacti is sending to the device and what response it is getting. Perhaps I'll resort to tcpdump. But if the query and response are correct, what can account for this strange behavior?

Post by **DaveClose** » Fri May 09, 2008 1:27 pm

Additional information: tcpdump shows both devices getting exactly the same queries and responses.

Post by **buck** » Fri May 09, 2008 1:58 pm

You can check out System Utilities -> View Poller Cache to answer your question about what query info Cacti is sending.

Also, setting the log level to DEBUG will show you the data it's sending and getting back.

Post by **gandalf** » Sat May 10, 2008 2:37 pm

And please visit both graphs at Graph Management. Switch to DEBUG and compare.
Reinhard

Post by **DaveClose** » Mon May 12, 2008 11:56 am

1. I already verified the SNMP query and reply using tcpdump. That was faster and easier than Cacti's debug mode. As I wrote above, the query and response were identical for both graphs.

2. The graph management output is attached. I see no significant difference.

3. Still hoping for an explanation of the "u" and "m" units attached to the bad graph.

Post by **gandalf** » Mon May 12, 2008 12:58 pm

DaveClose wrote:
> 3. Still hoping for an explanation of the "u" and "m" units attached to the bad graph.

"u" represents micro and "m" represents milli, as per the sticky thread in this forum.
Reinhard

Post by **gandalf** » Mon May 12, 2008 1:01 pm

The graph statements are fine. I looked for CDEFs, but there are none.
The next step is to verify the data source type used by both rrd files. To do so, please run

rrdtool info ...

against the rrd files of both graphs. Only the first 30 lines are required. I suspect the failing one uses COUNTER instead of GAUGE. Change this using "rrdtool tune" and verify the data template used.
Reinhard
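
For anyone comparing two files this way, here is a minimal Python sketch that pulls the data source types out of `rrdtool info` output. It assumes the usual `ds[name].type = "TYPE"` line format. The helper name `ds_types` and both sample texts are made up for illustration and are not output from the files in this thread.

```python
import re

# Matches lines such as: ds[temp].type = "GAUGE"
DS_TYPE_RE = re.compile(r'^ds\[(?P<name>[^\]]+)\]\.type = "(?P<type>\w+)"$',
                        re.MULTILINE)

def ds_types(info_text):
    """Return {data source name: type} found in `rrdtool info` output.

    Hypothetical helper: pass it the text you get from running
    `rrdtool info` against one rrd file.
    """
    return {m.group("name"): m.group("type")
            for m in DS_TYPE_RE.finditer(info_text)}

# Made-up sample output for two files (assumption, not real data).
sample_good = '''filename = "good.rrd"
step = 300
ds[temp].type = "GAUGE"
ds[temp].minimal_heartbeat = 600
'''
sample_bad = '''filename = "bad.rrd"
step = 300
ds[temp].type = "DERIVE"
ds[temp].minimal_heartbeat = 600
'''

good = ds_types(sample_good)
bad = ds_types(sample_bad)

# Report every data source whose type differs between the two files.
mismatches = {name: (good[name], bad.get(name))
              for name in good if good[name] != bad.get(name)}
print(mismatches)
```

If the types differ, the type of an existing file can be changed with `rrdtool tune` and its `--data-source-type` option, as suggested above.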

Post by **DaveClose** » Mon May 12, 2008 1:17 pm

I can't get to the files at this moment, but I'll verify later today. However, the data for these graphs is actually produced as a string by an SNMP "exec" extension script. I had originally defined the data templates as GAUGE but that produced graphs with NaN data. I then changed the templates to DERIVE and the graphs started working. After that, I created the additional graph(s) and they produced the strange results.

If the additional graphs had also said NaN, I might have suspected the data source type. But what is causing the "u" and "m" data?

Post by **gandalf** » Mon May 12, 2008 1:30 pm

Again, "u" is "micro" = 1E-06
"m" is "milli"=1E-03
Reinhard
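
To make the prefixes concrete, here is a small Python sketch of this kind of scaling. It is only an illustration of the idea, not RRDtool's actual code; the function name `si_format`, the two-decimal format and the sample values are assumptions.

```python
import math

# Prefixes for a few powers of ten (subset only, for illustration).
PREFIXES = {-6: "u", -3: "m", 0: "", 3: "k", 6: "M"}

def si_format(value):
    """Format a number with an SI prefix, e.g. 0.0033 -> '3.30 m'.

    Hypothetical helper: picks the largest multiple-of-3 exponent not
    above the value's magnitude, clamped to the range covered by PREFIXES.
    """
    if value == 0:
        return "0.00"
    exp = int(math.floor(math.log10(abs(value)) / 3) * 3)
    exp = max(-6, min(6, exp))
    scaled = value / 10 ** exp
    return f"{scaled:.2f} {PREFIXES[exp]}".rstrip()

for v in (42, 0.0033, 0.00025):
    print(v, "->", si_format(v))
```

So a label ending in "m" or "u" only means that the plotted number itself is very small; the prefix says nothing about what is being measured.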

Post by **DaveClose** » Mon May 12, 2008 1:36 pm

Reinhard wrote:
> Again, "u" is "micro" = 1E-06, "m" is "milli"=1E-03

I understand the unit-of-measure prefixes; what I don't understand is the units. If the reported data is in "micro-units", then what are the "units"? Why does Cacti think the reported data is micro- or milli-?

As the data provided through SNMP is actually a string, NaN would make sense. But milli- or micro- does not make sense (to me).

Post by **gandalf** » Mon May 12, 2008 1:42 pm

RRDTool only stores numbers. Everything is converted to numbers. If conversion fails, rrdtool will not update.
So it's a converted number.
The fact that the second graph only shows very small numbers (in the range of 1E-03) IMHO shows that the rrd file is using a COUNTER. E.g. small changes in temperature stored as a rate (that is: divided by the rrd step = 300) will result in very small values. COUNTERs store differences only, not absolute values as GAUGEs do!
I recommend changing the DS type to GAUGE for the failing rrd. Old data will NOT be changed, but new data should work.
Reinhard
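
The arithmetic behind that explanation can be sketched in a few lines of Python. This is a simplified model only: it assumes samples arrive exactly every 300 seconds and ignores RRDtool's time normalization, heartbeat handling and COUNTER wrap-around. The readings are made-up temperatures, not data from this thread.

```python
STEP = 300  # seconds between polls (assumed, matching the step mentioned above)

def as_gauge(readings):
    """GAUGE: each reading is stored as-is."""
    return [float(r) for r in readings]

def as_derive(readings, step=STEP):
    """DERIVE: store the per-second rate of change between consecutive readings."""
    return [(cur - prev) / step for prev, cur in zip(readings, readings[1:])]

readings = [41, 42, 42, 40]  # hypothetical temperature samples

gauge_values = as_gauge(readings)
derive_values = as_derive(readings)

print("GAUGE :", gauge_values)
print("DERIVE:", derive_values)
```

A change of one degree per polling interval becomes 1/300, roughly 0.0033, which a graph would label as about 3.3 m. That matches the milli and micro values on the bad graph.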

Post by **DaveClose** » Mon May 12, 2008 1:54 pm

Ok, I find that the RRD files for the good graphs were recorded as GAUGE and those for the bad graph as DERIVE. I've changed the bad one and need to wait a while to see the effect. I've also changed the templates. I presume my change to the templates did not affect the already created RRD files, though the change to the graph seemed to occur at the same time.

But that just makes me wonder what happened originally, when I created these templates as GAUGE and the graph reported NaN, then I changed the templates to DERIVE and the graph started working. I've searched all the RRDtool documentation I can find and I don't find any reference to how RRDtool handles string data from SNMP. If it just deletes the quotation marks and accepts an otherwise valid numeric value, great. But in that case, why did I get NaN initially?
