[solved] script with intermittent NAN results
Hello.
Running Cacti version
After reading this http://penguinman-techtalk.blogspot.com ... thout.html article on using wget against a custom website to pull data source values, I wrote an app that data-mines some information and posts it to an internal website. However, I have had trouble understanding why some of my variable names are not 'liked' by Cacti: some variables come back with data and others do not. I finally 'dumbed down' my variable names to var0, var1, var2, etc., and they all work great.
The issue, though, is that those variable names are non-descriptive (and my app generates the names).
What I have is an array of IPs in a data pool. My app runs through the pool and creates the variable names and the current connections to each IP, so my final output looked like this:
192_168_1_1p801:12 192_168_1_1:802:27 192_168_1_1:803:35
After my first go at it, the last two worked, but not the first. I troubleshot my code and the script for several hours, then tried:
19216811801:12 19216811802:27 19216811803:35
Then the first two worked, but not the third!?
I researched the Cacti docs but couldn't find any limits on what I could use (though I generally read that names have to be alphanumeric and maybe less than 17 characters).
Finally, for grins, I changed my output to:
var0:12 var1:27 var3:35
And it works perfectly.
But that lacks descriptors, so I have no way of knowing which var is for which IP address.
My code is otherwise the same, so at this point I am convinced this is either a bug in Cacti or my lack of understanding of the requirements Cacti places on the output fields of a data input method.
Any thoughts out there?
Cacti expects scripts to return data in the format: var0:12 var1:27 var3:35 etc. You can use whatever descriptive names you want. http://docs.cacti.net/ has more details.
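A minimal sketch of producing that format, written in Python purely for illustration (the thread's own app is not shown). It assumes the keys passed in are exactly the output field names defined in the Data Input Method, and joins them into the single line of space-separated name:value tokens that Cacti parses. The function name is hypothetical.

```python
def format_cacti_output(fields):
    """Join name/value pairs into one line of space-separated name:value tokens.

    Assumes each key matches an output field name in the Data Input Method;
    dict insertion order (Python 3.7+) determines the token order.
    """
    return " ".join(f"{name}:{value}" for name, value in fields.items())


if __name__ == "__main__":
    # Hypothetical values; a real script would collect these from its data pool.
    print(format_cacti_output({"var0": 12, "var1": 27, "var2": 35}))
```

Printing everything with one call keeps the result on a single line, which matters if you later switch to Spine.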
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
...Well, to be more specific, it worked for some values but not others.
Again, to use my examples, it reported data for this:
192_168_1_1p802:15
but not for this one:
192_168_1_207p803:27
The same code, however, worked as expected when I changed my sources to
var0:15
and
var1:27
So, to recap, when my output looked like this:
192_168_1_1p802:15 192_168_1_207p803:27
it reported data for the first but not the second,
and when my data looked like this:
var0:15 var1:27
both worked.
http://docs.cacti.net/manual:087:3a_adv ... with_cacti
Sounds like a possible bug to me. Without looking into the code, my WAG would be that the underscores are throwing it off. That said, I'm not sure how you're building a Data Input Method that uses dynamically changing fields (IP and port, it looks like), since they need to be hardcoded into the template...
Are you using spine or cmd.php? What method are you using to print the results out to Cacti? Spine is picky and only wants one 'print' statement.
With cacti.log set to debug logging, you only saw Cacti's multi-line parse for 192_168_1_1p802:15 but not for 192_168_1_207p803:27, right? Did the 'rrdtool update' command also lack the "27" data?
Well, my code builds the output dynamically, but once built, it's consistent: every time I query that exact website for that same pool, the fields are always the same.
So I query the website from my workstation to find out what my data outputs will be, then build my data input method, hardcoding the fields into the templates. Ideally the field names would be the IP addresses (so they are descriptive), but the output fields need to be alphanumeric, hence the underscores.
cmd.php; Spine isn't even installed.
I post plain text to the website (all HTML tags removed) and pull it with the wget command from the article I linked first.
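Because cmd.php treats whatever wget prints as the script result, any leftover markup or line breaks reach the poller too, and per the reply above Spine expects a single printed line. A hypothetical helper, sketched in Python, that reduces a fetched page to one line of text. It assumes the values themselves never contain '<' or '>'; the file and function names are illustrative only.

```python
import html
import re
import sys

# Anything that looks like an HTML tag; assumes values never contain '<' or '>'.
_TAG = re.compile(r"<[^>]+>")


def to_single_line(body):
    """Strip tags, decode entities, and collapse all whitespace (newlines included) to single spaces."""
    text = html.unescape(_TAG.sub(" ", body))
    return " ".join(text.split())


if __name__ == "__main__":
    # Hypothetical usage: save the page first (wget ... -O page.txt), then
    # run: python3 one_line.py page.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(to_single_line(fh.read()))
```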
Sadly, I never even looked at the logs; the graph templates just showed valid data for one of the items and NaN for the other. I tried a second one: same deal, one worked, one NaN.
That was when I removed the underscores, but I found the same inconsistency. Now that I've gone with 'var1' and so on, they all work.
Is there a more formal process I should follow to submit a bug report? At this point my workaround (renaming the outputs to var1, var2, etc.) works, but it defeats one of the purposes of writing the custom app: dynamically creating meaningful variable names. I can graph those variables, but I have to cross-reference them manually to make them meaningful.
I really appreciate the conversation and would be willing to re-create whatever I need to in an effort to get a technical solution.
I value the conversation and input. Thank you.
Drats
Well, stink. I thought my workaround was 100%, but I just created a new graph (using the var0, var1 outputs) and am getting NaN.
Is there a way to debug the polling process? I enabled debugging at the lowest level and see this:
03/30/2010 09:24:27 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=1269872667 --end=1269959067 --title="Web Server - LTM Applications - Dallas Internet" --base=1000 --height=120 --width=500 --alt-autoscale-max --lower-limit=0 COMMENT:"From 2010/03/29 09\:24\:27 To 2010/03/30 09\:24\:27\c" COMMENT:" \n" --vertical-label="" --slope-mode --font TITLE:12: --font AXIS:6: --font LEGEND:8: --font UNIT:6: DEF:a="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":Dallas_Primary_Web:AVERAGE DEF:b="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":Dallas_Primary_Web:MAX DEF:c="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":SA_Backup_Web:AVERAGE DEF:d="/var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd":SA_Backup_Web:MAX AREA:a#FFF200FF:"Primary Web" GPRINT:a:LAST:"Current\: %8.2lf %s" GPRINT:a:AVERAGE:"Average\: %8.2lf %s" GPRINT:b:MAX:"Maximum\: %8.2lf %s\n" AREA:c#0000FFFF:"SA Bkup Web":STACK GPRINT:c:LAST:"Current\: %8.2lf %s" GPRINT:c:AVERAGE:"Average\: %8.2lf %s" GPRINT:d:MAX:"Maximum\: %8.2lf %s\n"
03/30/2010 09:24:27 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool info /var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_21.rrd
But it's not terribly meaningful.
My data input method is:
wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool
And when I run that from the Ubuntu command line, I get:
root@ubuntu804beta:~# wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool
var1:19 var2:5
Yet my graph looks like the attached image.
I have verified that my data input methods are set up correctly (var1 and var2) and that the data template matches up.
Snap.
Attachment:
- cactI_image.png
Well, shoot... after bouncing the poller_cache, none of my graphs are reporting any data now! Stink...
I've been troubleshooting that for the last hour... any advice? I read quite a few threads and, for grins, installed Spine, and now I see:
03/30/2010 01:45:02 PM - SYSTEM STATS: Time:1.1204 Method:spine Processes:1 Threads:1 Hosts:3 HostsPerProcess:3 DataSources:9 RRDsProcessed:9
in my cacti.log, but no more detail than that...
As for wget, where do I see timeouts and retries?
I'm going through the second link too, but I don't think any of that matters until my graphing starts working again...
OK... another update. I'm not sure what I did to stop the graphs from working, but after clicking around and checking a bunch of things (including the rookie move of restarting the entire server), I noticed that the graphs under Graph Management started collecting again (see snapshot1).
But the graph on the main site (Graphs tab) shows NaN values.
Yet when I click the image, I see values!
grr...
Attachments:
- snapshot1.jpg (image showing graph is working)
- snapshot2.jpg (image with NaN values)
- snapshot3.JPG
The cacti.log contains the details we're after for troubleshooting. That WEBLOG stuff just creates noise and IMO should be turned off.
Change the logging level to debug, then watch what happens in the log file during a polling cycle. Watch for when your wget script is called, the data that gets returned, the multi-value line parsing, and finally the 'rrdtool update' commands issued for each rrd file. No graph data for some DS field is just a visible symptom of something gone wrong, which cacti.log will show.
In your Data Input Method, are you sure you checked all the boxes in the Output section to update the rrd file?
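A rough way to pull just those stages out of a debug-level cacti.log is a small filter like the one below. It is a sketch, not a Cacti tool: the matched substrings are copied from the debug lines quoted later in this thread (Cacti 0.8.7b with cmd.php), and other versions or Spine may word these messages differently. The script name is hypothetical; pass whatever path your cacti.log lives at.

```python
import sys

# Substrings taken from the debug lines quoted in this thread (0.8.7b, cmd.php);
# other versions or Spine may word these messages differently.
STAGES = (
    ("script call/output", "CMD:"),
    ("multi-value parse", "Parsed MULTI output field"),
    ("rrd update", "rrdtool update"),
)


def classify(line):
    """Return the stage label for a cacti.log line, or None if it is unrelated."""
    for label, needle in STAGES:
        if needle in line:
            return label
    return None


def trace(lines):
    """Yield (stage, line) for every poller line that belongs to one of the stages."""
    for line in lines:
        stage = classify(line)
        if stage is not None:
            yield stage, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical usage: python3 trace_poller.py /path/to/cacti.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for stage, line in trace(fh):
                print(f"[{stage}] {line}")
```

WEBLOG lines such as `rrdtool graph` and `rrdtool info` fall through as unrelated, which also cuts the noise mentioned above.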
Good base check on the update-rrd flag, thank you for that; often these are simple things. Yes, though, it is checked.
And thank you for the very clear and concise description of what to look for in the debug output (the four things that happen). It helped me resolve this most recent issue. Here is the recap.
Here is what the debug output shows for the data source getting the NaN value:
1. wget gets called:
03/30/2010 03:30:02 PM - CMDPHP: Poller[0] Host[2] DS[22] CMD: wget --quiet --no-cache -O - http://certificate/ltmquery/default.asp ... https_pool, output: var1:10 var2:0
2. The data returned was:
output: var1:10 var2:0
3. Multi-value line parsing:
03/30/2010 03:30:03 PM - POLLER: Poller[0] Parsed MULTI output field 'var2:0' [map var2->SA_Backup_Web]
(NOTE: indeed, I do not see one of these for 'var1', which is the NaN value I am seeing on the graph!)
4. rrdtool update:
03/30/2010 03:30:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/lib/cacti/rra/web_server_-_ltm_applications_dallas_primary_web_22.rrd --template SA_Backup_Web 1269981002:0
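The check in step 3 can be automated by comparing the field names the script returned with the fields the poller reports mapping. A hypothetical Python helper, assuming the 'Parsed MULTI output field' wording shown above and that you feed it log lines from a single polling cycle only:

```python
import re

# Matches the debug message quoted above, e.g.
# "Parsed MULTI output field 'var2:0' [map var2->SA_Backup_Web]"
_PARSED = re.compile(r"Parsed MULTI output field '([^':]+):[^']*' \[map \1->(\w+)\]")


def returned_fields(output):
    """Field names in a one-line script result, e.g. 'var1:10 var2:0' -> ['var1', 'var2']."""
    return [tok.split(":", 1)[0] for tok in output.split() if ":" in tok]


def mapped_fields(log_lines):
    """Map field name -> data source name for every 'Parsed MULTI' line seen."""
    mapping = {}
    for line in log_lines:
        match = _PARSED.search(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def unmapped(output, log_lines):
    """Fields the script returned that the poller never mapped to a data source."""
    mapped = mapped_fields(log_lines)
    return [name for name in returned_fields(output) if name not in mapped]
```

Fed the two lines above, it reports `['var1']`, the field behind the NaN.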
So then I'm thinking: why wouldn't a multi-value parse line appear for that one? Hmm... so I went back and looked at the data template (where the cross-reference happens), and my variables are mapped like this:
var1 is associated with Dallas_Primary_Server
var2 is associated with SA_Backup_Web
I started counting characters and remembered reading something about 17 being a limit for other fields, so I changed the Dallas one to 'Dallas_Web', then (for ease) deleted the data sources and graphs and rebuilt them.
And that worked, so I am definitely back to feeling good about the workaround I put in for the original application issue (using var names in place of '19216811p801', etc.).
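A small pre-flight check for data template names, sketched in Python. The limit is a parameter because this thread never pins it down: the rrdcreate documentation states that DS names must be 1 to 19 characters from [a-zA-Z0-9_], while the poster stayed under 17 to be safe. The function name is hypothetical and this is not a Cacti feature.

```python
import re

# rrdcreate documents DS names as 1 to 19 characters from [a-zA-Z0-9_].
# Pass a smaller max_len (e.g. 16) to match the "less than 17" rule used here.
_VALID = re.compile(r"[A-Za-z0-9_]+")


def ds_name_problems(name, max_len=19):
    """Return a list of reasons why name is not a safe RRD data source name."""
    problems = []
    if not name:
        problems.append("empty")
    if len(name) > max_len:
        problems.append(f"{len(name)} characters (limit {max_len})")
    if name and not _VALID.fullmatch(name):
        problems.append("contains characters outside [A-Za-z0-9_]")
    return problems


if __name__ == "__main__":
    for ds in ("Dallas_Primary_Server", "SA_Backup_Web", "Dallas_Web"):
        print(ds, ds_name_problems(ds) or "ok")
```

Dallas_Primary_Server is 21 characters, so it is flagged even at the documented limit of 19; the renamed Dallas_Web (10) and SA_Backup_Web (13) pass.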
So where do I stand now, overall, with this issue?
1. The graphs on the main page (Graphs tab, then click the server) seem to be cached with broken ones, showing no graph data and/or all NaN values. The graphs appear to be working OK under Graph Management. This is a new problem I have not found a solution for yet; I even tried deleting and rebuilding all graphs, but the exact same cached images appear on the main screen...
2. The original bug is still not identified. I'd rather my variables be descriptive (19216811p801, for example), but I have had issues getting these to work...
I'm going to re-test #2 now, making sure the same 'too many characters in the output field of the data input method' issue isn't the cause...
SOLVED
UPDATE: after following BSOD2600's directions and discovering that the output field names in the data template were too long, I rebuilt my app to use the <[IP]p[PORT]>:<VALUE> format (making sure the output field names were less than 17 characters), and my graphs are all reporting values now.
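A sketch of generating names in that <[IP]p[PORT]> scheme, assuming IPv4 addresses. The helper names and the default limit of 16 (to honor "less than 17 characters") are assumptions for illustration, not Cacti settings. It fails loudly instead of emitting a name that might silently go unmapped.

```python
def field_name(ip, port):
    """Build an output field name like '192_168_1_1p801' from an IPv4 address and port."""
    return f"{ip.replace('.', '_')}p{port}"


def field_names(pool, max_len=16):
    """Map (ip, port) -> field name, raising ValueError for any name longer than max_len.

    max_len=16 mirrors the 'less than 17 characters' rule settled on in this thread.
    """
    names = {}
    for ip, port in pool:
        name = field_name(ip, port)
        if len(name) > max_len:
            raise ValueError(f"{name!r} is {len(name)} characters; limit is {max_len}")
        names[(ip, port)] = name
    return names
```

For example, 192.168.1.1 port 802 gives 192_168_1_1p802 (15 characters), while 192.168.1.207 port 803 gives 192_168_1_207p803 (17 characters) and is rejected with the default limit; that happens to be the field that did not report data earlier in the thread. Longer addresses or ports push names past the limit quickly, so a short alias per pool member may be needed.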
Perhaps this should be checked when the data template is created? Or perhaps more recent versions support longer field names...
And when I checked this morning, the graphs on the main page were all updated and looking good.
Thank you for the support.
RECAP:
PROBLEM: some of my data was reporting NaN values, though I was confident the data input method, template, and graphs were built correctly.
SOLUTION: the 'Internal Data Source Name' field cannot exceed a certain number of characters (maybe 17?).
Cacti accepted the longer values I keyed in, which perhaps it shouldn't have? Or perhaps more recent versions support longer names (the Cacti version I am running is 0.8.7b).
I could not find this limitation in the Cacti documentation.
Thank you, community, for the support.
Glad you figured it out.
Cacti 0.8.7b?!? Dude, you really should upgrade to the latest, 0.8.7e. Lots of security holes have been fixed and features added since then...