[solved] script with intermittent NAN results


krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

hello.
running cacti version
After reading this article, http://penguinman-techtalk.blogspot.com ... thout.html, on using WGET against a custom website to pull the data source values, I wrote an app that data-mines some information and posts it to an internal website. I have had trouble understanding why Cacti doesn't 'like' some of my variable names: some variables come back with data and others do not. I finally 'dumbed down' my variable names to 'var0, var1, var2', etc., and they all work great.

The issue, though, is those variables names are non-descriptive (and my app generates the name).

What i have is an array of IPs in a data pool. My app runs through the data pool and creates the variables names and current connections to that IP - so my final output looked like this:

192_168_1_1p801:12 192_168_1_1p802:27 192_168_1_1p803:35

After my first go at it, the last two worked, but not the first. I troubleshot my code and the script for several hours, then tried:

19216811801:12 19216811802:27 19216811803:35

then my first two worked, but not the third!?

I researched the Cacti docs but couldn't find any 'limits' on what I could use (though, generally, read it has to be alphanumeric and, maybe, less than 17 characters).

Finally, for grins, I changed my output to be
var0:12 var1:27 var2:35

and it works perfectly.

But that lacks descriptors so I have no way of knowing which var is for which IP address.

My code is the same, so at this point I am convinced this is either a 'bug' in Cacti or my lack of understanding of what Cacti 'requires' of the output fields for a data input method.

Any thoughts out there?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Cacti expects scripts to return data in the format: var0:12 var1:27 var2:35 etc. You can use whatever descriptive names you want. http://docs.cacti.net/ has more details.
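
For illustration, here is a minimal Python sketch of a script that emits that format. The function name `format_cacti_output` and the sample values are made up for the example; only the one-line, space-separated `name:value` layout comes from the posts above.

```python
def format_cacti_output(values):
    """Join name/value pairs into one space-separated line of name:value fields."""
    return " ".join(f"{name}:{value}" for name, value in values.items())


if __name__ == "__main__":
    # Sample values only; a real script would collect these from the data pool.
    sample = {"var0": 12, "var1": 27, "var2": 35}
    # Emit the whole line with a single print call.
    print(format_cacti_output(sample))
```

Building the line in one place and printing it once keeps all fields on a single line, which is the shape shown in the examples in this thread.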
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

yes, that is what I thought, which is why the confusion and the post.

please tell me, then, in my example what is wrong with my descriptors (for my output fields)?

what is 'wrong' that Cacti likes this:

var0:15

but NOT this

19216811p801:15

it reports NaN to the graph in the latter example.
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

...well, to be more specific, it worked for some values but not others.
again, to use my examples, it reported data for this:

192_168_1_1p802:15

but not this one
192_168_1_207p803:27

The same code, however, when I changed my sources to

var0:15

and

var1:27

worked as expected.


So, for a recap, when my output looked like this:
192_168_1_1p802:15 192_168_1_207p803:27

it reported data for the first but not the second

and when my data looked like this:
var0:15 var1:27

both worked.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

http://docs.cacti.net/manual:087:3a_adv ... with_cacti

Sounds like a possible bug to me. Without looking into the code, my WAG would be that the underscores are throwing it off. Although, I'm not sure how you're building a Data Input method that uses dynamically changing fields (IP & port, it looks like), since they need to be hardcoded into the template....

You using spine or cmd.php? What method are you using to print out the results to cacti? Spine is picky and only wants one 'print' statement.

With the cacti.log set to debug logging, you only saw cacti multi-line parse 192_168_1_1p802:15 but not 192_168_1_207p803:27, right? The 'rrdtool update' command also lacked the "27" data?
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

well, my code dynamically builds the output, but once built, it's consistent; that is, every time i query that exact website for that same pool, the fields will always be the same.

so i query the website from my workstation to 'find out' what my data outputs will be, and then I go and build my data input method, hardcoding it into the templates. (ideally I want the field names to be the ip addresses so they are descriptive, but the output fields need to be alphanumeric, hence the underscores...)

cmd.php - spine isn't even installed.

i post strict text to the website (removed all html tags) and pull it with the wget command from the article i posted first.

sadly, i never even looked at the logs; the graph templates just showed valid data for one of the items and NaN for the other. so i tried a second, same deal, one worked, one NaN.

that was when i removed the underscores but found the same inconsistency and then, again, now that I've gone with 'var1' they all work.

is there a 'more formal process' i should take to submit a bug request? at this point my 'workaround' (rewriting the outputs to be var1, var2, etc) works, but it defeats one of the purposes of my spending time writing the custom app (to dynamically create meaningful variables). i can graph those variables, but I have to manually cross-reference them to make them meaningful.

i really appreciate the conversation and would be willing to re-create whatever i need to in an effort to get a technical solution.

i value the conversation and input. thank you.
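
A minimal Python sketch of the naming scheme described in this post, assuming the field name is the IP address with dots replaced by underscores, followed by `p` and the port. The helper names `field_name` and `is_alphanumeric_name` are hypothetical, and the actual app may build its names differently.

```python
import re


def field_name(ip, port):
    """Build a name such as 192_168_1_1p801 from an IP address and a port."""
    return f"{ip.replace('.', '_')}p{port}"


def is_alphanumeric_name(name):
    """True if the name uses only letters, digits, and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


if __name__ == "__main__":
    # Sample pool members only, taken from the examples earlier in the thread.
    for ip, port in [("192.168.1.1", 802), ("192.168.1.207", 803)]:
        name = field_name(ip, port)
        print(name, len(name), is_alphanumeric_name(name))
```

Printing the length next to each name makes it easy to compare names that work against names that return NaN.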
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

drats

Post by krypsys »

well, stink, I thought my workaround was 100%, but I just created a new graph (using the var0, var1, outputs) and am receiving a NAN.

is there a way to 'debug' the polling process? I enabled debugging on the lowest level and see this:

03/30/2010 09:24:27 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=1269872667 --end=1269959067 --title="Web Server - LTM Applications - Dallas Internet" --base=1000 --height=120 --width=500 --alt-autoscale-max --lower-limit=0 COMMENT:"From 2010/03/29 09\:24\:27 To 2010/03/30 09\:24\:27\c" COMMENT:" \n" --vertical-label="" --slope-mode --font TITLE:12: --font AXIS:6: --font LEGEND:8: --font UNIT:6: DEF:a="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":Dallas_Primary_Web:AVERAGE DEF:b="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":Dallas_Primary_Web:MAX DEF:c="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":SA_Backup_Web:AVERAGE DEF:d="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":SA_Backup_Web:MAX AREA:a#FFF200FF:"Primary Web" GPRINT:a:LAST:"Current\: %8.2lf %s" GPRINT:a:AVERAGE:"Average\: %8.2lf %s" GPRINT:b:MAX:"Maximum\: %8.2lf %s\n" AREA:c#0000FFFF:"SA Bkup Web":STACK GPRINT:c:LAST:"Current\: %8.2lf %s" GPRINT:c:AVERAGE:"Average\: %8.2lf %s" GPRINT:d:MAX:"Maximum\: %8.2lf %s\n"
03/30/2010 09:24:27 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool info /var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd

but it's not terribly meaningful.

my data input method is:
wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool

and when I run that from the command line of ubuntu I get:
root@ubuntu804beta:~# wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool
var1:19 var2:5

yet my graph looks like the attached.

I have verified my data input methods are setup correctly (var1 and var2) and the data template matches up.

Snap.
Attachment: cactI_image.png (25.04 KiB)
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Please provide timeouts and retries on the wget.
If you changed the data input method, it is advised to run cli/rebuild_poller_cache.php
Then, please find more hints at the 2nd link of my sig
R.
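
wget itself has `--timeout` and `--tries` options for this. To show what the two settings do, here is a rough Python sketch of the same idea: give each attempt a time limit and try a bounded number of times. The function names and default values are assumptions for illustration and are not part of Cacti or wget.

```python
import time
import urllib.request


def fetch_once(url, timeout):
    """Fetch the page body as text, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def fetch_with_retries(url, tries=3, timeout=5, delay=1, fetch=fetch_once):
    """Call `fetch` up to `tries` times; re-raise the last error if all fail."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return fetch(url, timeout)
        except OSError as error:  # URLError and socket timeouts subclass OSError
            last_error = error
            if attempt < tries:
                time.sleep(delay)  # short pause before the next attempt
    raise last_error
```

The `fetch` parameter exists only so the retry loop can be exercised without a network. Without a timeout, a slow web server could stall the script for the whole polling cycle.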
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

well, shoot...after bouncing the poller_cache, none of my graphs are reporting any data now! stink...

been troubleshooting that for the last hour...any advice? I read quite a few threads and, for grins, installed SPINE and see:
03/30/2010 01:45:02 PM - SYSTEM STATS: Time:1.1204 Method:spine Processes:1 Threads:1 Hosts:3 HostsPerProcess:3 DataSources:9 RRDsProcessed:9

in my cacti log, but no more detail than that...

as for the wget - where do I see timeouts and retries?

i'm going through the second link, too, but don't think any of that matters any more until my graphing starts working again....
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

ok...another update...not sure 'what' i did to stop the graphs from working but after clicking around and checking a bunch of things (including a rookie-move of restarting the entire server), i noticed that the graphs under 'graph management' started collecting again (see snapshot1 pic)

but then the graph on the main site (graphs tab) shows NAN values.

but then when I click the image, I see values!

grr...
Attachments
snapshot1.jpg (176.15 KiB): image showing graph is working
snapshot2.jpg (151.84 KiB): image with NAN values
snapshot3.JPG (153.73 KiB)
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

The cacti.log contains the details we're after for troubleshooting. That WEBLOG stuff just creates noise and IMO should be turned off.

Change the logging level to debug, then watch what happens in the log file during a polling cycle. Watch for when your wget script is called, the data gets returned, multi-value line parsing, and finally the 'rrdtool update' commands issued for each rrd file. No graphing data for some DS field is just a visible symptom of something gone wrong, which the cacti.log will show.

In your Data Input Method, you sure you checked all the boxes in the Output section to update the rrd file?
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

good base-check on the update rrd-flag...thank you for that...often these are simple things....yes, though, it is checked.

and thank you for the very clear and concise description of what to look for in the debug output - that 'description of the 4 things' that happen. it helped me resolve this 'most recent' issue....here is the recap:

Here is what is in the debug output for the one getting the NAN value:

1. WGET GETS CALLED:
03/30/2010 03:30:02 PM - CMDPHP: Poller[0] Host[2] DS[22] CMD: wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool, output: var1:10 var2:0

2. The data returned was:
output: var1:10 var2:0

3. Multi-Value Line Parsing
03/30/2010 03:30:03 PM - POLLER: Poller[0] Parsed MULTI output field 'var2:0' [map var2->SA_Backup_Web]

** (NOTE - INDEED, I DO NOT SEE ONE OF THESE FOR 'var1' which is the NAN value I am seeing on the graph!)

4. RRD-TOOL UPDATE
03/30/2010 03:30:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_22.rrd --template SA_Backup_Web 1269981002:0

**
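
The comparison in steps 1 through 4 can be sketched in Python: take the fields the script returned and list the ones with no matching 'Parsed MULTI output field' line. The regular expression is modelled only on the single log line quoted above, so treat it as an assumption about the log format; the function names are hypothetical.

```python
import re

# Modelled on: Parsed MULTI output field 'var2:0' [map var2->SA_Backup_Web]
PARSED_FIELD = re.compile(r"Parsed MULTI output field '[^']*' \[map (\w+)->(\w+)\]")


def returned_fields(output):
    """Field names in a script output line such as 'var1:10 var2:0'."""
    return [pair.split(":", 1)[0] for pair in output.split()]


def parsed_fields(log_lines):
    """Map each output field that was reported as parsed to its data source name."""
    mapping = {}
    for line in log_lines:
        match = PARSED_FIELD.search(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def unparsed_fields(output, log_lines):
    """Fields the script returned that never show up as parsed in the log."""
    parsed = parsed_fields(log_lines)
    return [name for name in returned_fields(output) if name not in parsed]
```

Fed the output `var1:10 var2:0` and the one POLLER line above, `unparsed_fields` returns `var1`, the field that shows NaN on the graph.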

So, then, i'm thinking 'why wouldn't a multi-parse value appear' for that guy? Hmmm...so I went back and looked at the 'data template' (where the cross-reference happens) and my variables are like this:

var1 is associated with Dallas_Primary_Server
var2 is associated with SA_Backup_Web

and i started counting 'characters' and remember reading something about '17 as a limit' to other fields so I changed the Dallas one to read 'Dallas_Web' then (for ease) deleted the data sources (and graphs) and rebuilt them.

And that worked - so, now, I am definitely back to 'feeling good' about the 'workaround' i put in for this original application issue (using var in place of '19216811p801' etc, etc).

so where do I stand now (overall) with this issue?
1. the graphs on the 'main' page (Graphs tab, then click the server) seem to be 'cached' with ones that are 'broken' - not showing any graph data and/or all NAN values. the graphs appear to be working OK under graph management. This is a new problem that I have not found a solution on, yet - even tried deleting and rebuilding all graphs but these exact same 'cached' images appear on the main screen...

2. original 'bug' still not identified. i'd rather my variables be descriptive (19216811p801) for example, but have had issues getting these to work...

I'm going to re-test #2 now, ensuring this same 'too many characters in the output field for the data input method' issue is not the case...
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

SOLVED

Post by krypsys »

UPDATE - after using BSOD2600's direction and discovering the output field names in the data template were too long, I rebuilt my app to use the <[IP]p[PORT]>:<VALUE> parameter (making sure the output field names were less than 17 characters) and my graphs are all reporting values now.

Perhaps this should be a check when creating the data template? Or perhaps more recent versions support longer field names....

And when I checked this morning the graphs on the main page are all updated and looking good.

Thank you for the support.

RECAP:
PROBLEM: some of my data was reporting NaN values, though I felt confident the data input method, template, and graphs were built correctly.

SOLUTION: the 'Internal Data Source Name' field cannot be more than X (maybe 17?) characters.

Cacti accepted the longer names I keyed in, which perhaps it shouldn't have? Or perhaps more recent versions support longer names (the Cacti version I am running is 0.8.7b).

I could not find this limitation in the Cacti documentation.

Thank you, community, for the support.
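
A small Python sketch of the kind of check suggested above, run over the names before they are typed into the data template. The thread only guesses at a limit of about 17. The RRDtool documentation for `rrdtool create` describes a ds-name as 1 to 19 characters from [a-zA-Z0-9_], so the default of 19 here is an assumption borrowed from that documentation and not something confirmed for this Cacti version. The function name `invalid_names` is hypothetical.

```python
import re


def invalid_names(names, max_length=19):
    """Return a dict of name -> reason for every name that fails the check."""
    problems = {}
    for name in names:
        if len(name) > max_length:
            problems[name] = f"{len(name)} characters, limit {max_length}"
        elif not re.fullmatch(r"[A-Za-z0-9_]+", name):
            problems[name] = "only letters, digits, and underscores are allowed"
    return problems


if __name__ == "__main__":
    # Names taken from this thread.
    for name, reason in invalid_names(
        ["Dallas_Primary_Server", "SA_Backup_Web", "Dallas_Web"]
    ).items():
        print(f"{name}: {reason}")
```

Passing a smaller `max_length` makes it easy to test the stricter limit guessed at in the recap.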
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Glad you figured it out.

Cacti 0.8.7b?!? dude, you really should upgrade to the latest of 0.8.7e. Lots of security holes and features added since then...
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

lol! yea, man...i'm running Ubuntu 8.04 and that's the version that installed (apt-get) so what's up with that!?

I'll look into upgrading...

:)

thanks, again.