All graphs stopped updating, showing NAN values
Last week our Cacti graphs stopped updating. We are now getting NaN values on over 100 graphs (ALL of them). I've gone through the "Debug NaN's in your graphs" link here and still have had no luck. I did create a simple device as a test, configured it as a ucd/net node, and was able to poll its memory, CPU, etc.
http://forums.cacti.net/about15136.html
1. When I set the log file to DEBUG, I don't see anything useful in our logs. No errors. I should mention this is a custom install that a previous employee installed and configured. We are basically just monitoring License servers only. We are also writing to an NFS mount on our file server, not the local /var/www/html/cacti/rrd directory. Here is the only thing I see in the logs with DEBUG enabled:
08/18/2015 11:20:40 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=-86400 --end=-300 --title="bda_3x_BDA_TOKEN" --base=1000 --height=240 --width=1000 --alt-autoscale-max --lower-limit=0 --vertical-label="" --slope-mode --font TITLE:12: --font AXIS:8: --font LEGEND:10: --font UNIT:8: DEF:a="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":inuse:MAX DEF:b="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":capacity:MAX AREA:a#00FF00FF:"inuse" GPRINTLAST:" Current\:%8.0lf" GPRINTMAX:"Maximum\:%8.0lf\n" LINE2:b#FF0000FF:"capacity" GPRINTLAST:"Current\:%8.0lf"
08/18/2015 11:20:40 AM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=-86400 --end=-300 --title="bda_3x_BDA_TOKEN" --base=1000 --height=240 --width=1000 --alt-autoscale-max --lower-limit=0 --vertical-label="" --slope-mode --font TITLE:12: --font AXIS:8: --font LEGEND:10: --font UNIT:8: DEF:a="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":inuse:MAX DEF:b="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":capacity:MAX AREA:a#00FF00FF:"inuse" GPRINTLAST:" Current\:%8.0lf" GPRINTMAX:"Maximum\:%8.0lf\n" LINE2:b#FF0000FF:"capacity" GPRINTLAST:"Current\:%8.0lf"
2. We are not using SNMP.
3. Debug mode under Graph Management shows "RRDTool Says: OK":
RRDTool Command:
/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="bda_3x_BDA_TOKEN" \
--base=1000 \
--height=240 \
--width=1000 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="" \
--slope-mode \
--font TITLE:12: \
--font AXIS:8: \
--font LEGEND:10: \
--font UNIT:8: \
DEF:a="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":inuse:MAX \
DEF:b="/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd":capacity:MAX \
AREA:a#00FF00FF:"inuse" \
GPRINTLAST:" Current\:%8.0lf" \
GPRINTMAX:"Maximum\:%8.0lf\n" \
LINE2:b#FF0000FF:"capacity" \
GPRINTLAST:"Current\:%8.0lf"
RRDTool Says:
OK
4. poller_output in mysql shows 0
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
5. I changed the php.ini memory_limit to 256MB, 512MB, then 1024MB with no luck
6. I did an rrdtool fetch and see NaN values after the earlier data:
rrdtool fetch /tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd MAX
inuse capacity
1439838300: nan nan
1439838600: nan nan
1439838900: nan nan
1439839200: nan nan
1439839500: nan nan
1439839800: nan nan
1439840100: nan nan
1439840400: nan nan
7. I ran an RRDTOOL info and got the following after changing the maximum from 999 to 9999, then 999999 (we aren't hitting the minimum or maximum):
rrdtool info /tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd
filename = "/tools/cb/license/rrd/bda_3x_BDA_TOKEN.rrd"
rrd_version = "0003"
step = 300
last_update = 1439230566
ds[inuse].type = "GAUGE"
ds[inuse].minimal_heartbeat = 600
ds[inuse].min = 0.0000000000e+00
ds[inuse].max = 9.9900000000e+02
ds[inuse].last_ds = "24"
ds[inuse].value = 1.5931428960e+03
ds[inuse].unknown_sec = 0
ds[capacity].type = "GAUGE"
ds[capacity].minimal_heartbeat = 600
ds[capacity].min = 0.0000000000e+00
ds[capacity].max = 9.9900000000e+02
ds[capacity].last_ds = "60"
ds[capacity].value = 3.9828572400e+03
ds[capacity].unknown_sec = 0
rra[0].cf = "MAX"
rra[0].rows = 210379
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
8. I confirmed I can write to the NFS mount as the directory is 777 now (we are on private network, so not overly concerned about security of permissions)
9. This is NOT an rpm install and only one crontab exists:
crontab -l
*/5 * * * * cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1
System details:
Centos 5.5
php 5.1.6
mysql 5.0.95
cacti version 0.8.7e
rrdtool 1.2.x
Again, this has been running for years without issues. No changes were made that I am aware of. I am running out of ideas. Any help is greatly appreciated!!
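The numbers already posted above can be lined up with a little arithmetic. The following is a minimal Python sketch, assuming the values are Unix epoch seconds (which is what rrdtool prints); the constant and function names are illustrative only. It converts the `last_update` value from the `rrdtool info` output and the first timestamp from the `rrdtool fetch` output to UTC and compares the gap with the 600-second heartbeat. It only restates the posted values and does not touch the RRD file.

```python
from datetime import datetime, timezone

# Values copied from the rrdtool output in this post.
LAST_UPDATE = 1439230566      # last_update from `rrdtool info`
FIRST_NAN_ROW = 1439838300    # first row shown by `rrdtool fetch`
HEARTBEAT = 600               # minimal_heartbeat of both data sources
STEP = 300                    # step of the RRD


def to_utc(epoch):
    """Render epoch seconds as a UTC timestamp string."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


gap = FIRST_NAN_ROW - LAST_UPDATE
print("last update   :", to_utc(LAST_UPDATE), "UTC")
print("first NaN row :", to_utc(FIRST_NAN_ROW), "UTC")
print("gap           :", gap, "s =", gap // STEP, "steps")
print("gap exceeds heartbeat:", gap > HEARTBEAT)
```

Read this way, the file was last written on 10 August 2015 (UTC), about a week before the NaN rows shown by the fetch. That suggests the NaNs come from updates never reaching the file rather than from the min/max limits, although the output alone does not say why the updates stopped.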
Attachments
- Capture2.PNG (27.1 KiB): This is the one week view after it stopped working showing NaN's.
- Capture1.PNG (39.83 KiB): This is the 2 week view showing it was working.
Re: All graphs stopped updating, showing NAN values
That's quite the ancient system. Cacti 0.8.8f is the current release, btw.
Change the cacti.log logging level back down to low or medium. I'd stop logging the WEBLOG entries too. Now, is the poller running every 5 minutes? Any errors?
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Re: All graphs stopped updating, showing NAN values
Yes, I inherited this install, and it's pretty customized as well as important to the company. I didn't want to touch it unless I had to. Guess now is that time. I changed the logging to medium and disabled the WEBLOG entries. Nothing shows up in the logs unless I "su - cactiuser" and then run "/usr/bin/php /var/www/html/cacti/log/poller.php". I do see that the poller is being run every 5 minutes via cron:
Aug 18 13:35:01 cacti crond[4386]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
Aug 18 13:40:01 cacti crond[4405]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
Aug 18 13:45:01 cacti crond[4424]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
Aug 18 13:50:01 cacti crond[4443]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
Aug 18 13:55:01 cacti crond[4465]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
Aug 18 14:00:01 cacti crond[4484]: (root) CMD (cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)
I am really at a loss here. Any ideas/input is greatly appreciated! I really doubt upgrading to the latest version will fix this issue, but if that is what is recommended I'll take a snapshot and give that a try.
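One possible reading of these cron lines, offered as something to check rather than a diagnosis: crond logs each job as `(account) CMD (command line)`, so the name in the first parentheses is the account the job runs under, and everything inside `CMD (...)` is handed to the shell as the command. The Python sketch below just pulls those two parts out of one of the quoted lines; the regular expression and the function name are illustrative.

```python
import re

# One of the syslog lines quoted above.
LINE = ("Aug 18 13:35:01 cacti crond[4386]: (root) CMD "
        "(cactiuser /usr/bin/php /var/www/html/cacti/poller.php > /dev/null 2>&1)")

# crond logs "(<account>) CMD (<command line>)".
PATTERN = re.compile(r"crond\[\d+\]: \((?P<account>[^)]+)\) CMD \((?P<command>.*)\)$")


def parse_cron_line(line):
    """Return (account, command) from a crond syslog line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("account"), match.group("command")


account, command = parse_cron_line(LINE)
first_word = command.split()[0]
print("runs as       :", account)
print("command starts:", first_word)
print("starts with an absolute path:", first_word.startswith("/"))
```

If the job runs as root and the command line begins with the word cactiuser, the shell would look for a program called cactiuser, and with output sent to /dev/null a "command not found" message would never be seen. A user column belongs only in /etc/crontab and /etc/cron.d files, not in a per-user crontab as listed by crontab -l. Whether that is what is happening here is an assumption; temporarily dropping the redirect from the cron entry would be one way to find out.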
Re: All graphs stopped updating, showing NAN values
Ok, I took a snapshot of the VM and upgraded to the latest version. I am having the same problem. Here is something odd: when I click on "System Utilities" > "Technical Support", it only shows 18 data sources (I just added 2 devices with 9 data sources each). It's almost as if it's not seeing the other 1700 that exist. I'm really losing my mind on this one!!!
Re: All graphs stopped updating, showing NAN values
See if there are any DB issues by running: /var/www/html/cacti/cli/repair_database.php
SELinux permission issue?
If you manually run /usr/bin/php /var/www/html/cacti/poller.php, you should see the output on the console. Does it run? Any errors?
Re: All graphs stopped updating, showing NAN values
Thanks for the reply. SElinux is disabled.
I ran the repair db script and poller manually:
[root@cacti ~]# /usr/bin/php /var/www/html/cacti/cli/repair_database.php
Repairing All Cacti Database Tables
Repairing Table -> 'cdef' Successful
Repairing Table -> 'cdef_items' Successful
Repairing Table -> 'colors' Successful
Repairing Table -> 'data_input' Successful
Repairing Table -> 'data_input_data' Successful
Repairing Table -> 'data_input_fields' Successful
Repairing Table -> 'data_local' Successful
Repairing Table -> 'data_template' Successful
Repairing Table -> 'data_template_data' Successful
Repairing Table -> 'data_template_data_rra' Successful
Repairing Table -> 'data_template_rrd' Successful
Repairing Table -> 'graph_local' Successful
Repairing Table -> 'graph_template_input' Successful
Repairing Table -> 'graph_template_input_defs' Successful
Repairing Table -> 'graph_templates' Successful
Repairing Table -> 'graph_templates_gprint' Successful
Repairing Table -> 'graph_templates_graph' Successful
Repairing Table -> 'graph_templates_item' Successful
Repairing Table -> 'graph_tree' Successful
Repairing Table -> 'graph_tree_items' Successful
Repairing Table -> 'host' Successful
Repairing Table -> 'host_graph' Successful
Repairing Table -> 'host_snmp_cache' Successful
Repairing Table -> 'host_snmp_query' Successful
Repairing Table -> 'host_template' Successful
Repairing Table -> 'host_template_graph' Successful
Repairing Table -> 'host_template_snmp_query' Successful
Repairing Table -> 'poller' Successful
Repairing Table -> 'poller_command' Successful
Repairing Table -> 'poller_item' Successful
Repairing Table -> 'poller_output' Successful
Repairing Table -> 'poller_reindex' Successful
Repairing Table -> 'poller_time' Successful
Repairing Table -> 'rra' Successful
Repairing Table -> 'rra_cf' Successful
Repairing Table -> 'settings' Successful
Repairing Table -> 'settings_graphs' Successful
Repairing Table -> 'settings_tree' Successful
Repairing Table -> 'snmp_query' Successful
Repairing Table -> 'snmp_query_graph' Successful
Repairing Table -> 'snmp_query_graph_rrd' Successful
Repairing Table -> 'snmp_query_graph_rrd_sv' Successful
Repairing Table -> 'snmp_query_graph_sv' Successful
Repairing Table -> 'user_auth' Successful
Repairing Table -> 'user_auth_perms' Successful
Repairing Table -> 'user_auth_realm' Successful
Repairing Table -> 'user_log' Successful
Repairing Table -> 'version' Successful
[root@cacti ~]#
[root@cacti ~]# /usr/bin/php /var/www/html/cacti/poller.php
08/19/2015 09:07:57 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '176', Max Runtime '298', Poller Runs: '1'
08/19/2015 09:07:57 AM - POLLER: Poller[0] NOTE: Cron is configured to run too often! The Poller Interval is '300' seconds, with a minimum Cron period of '300' seconds, but only 176 seconds have passed since the poller last ran.
Still no updates to the RRD's.
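The "Time Since Last" figure in the NOTE can be turned back into a clock time. A small Python sketch, assuming the log timestamp and the 176-second figure refer to the same clock (the constant names are illustrative):

```python
from datetime import datetime, timedelta

# Values copied from the poller NOTE lines above.
NOTE_TIME = "08/19/2015 09:07:57 AM"
TIME_SINCE_LAST = 176   # seconds, from "Time Since Last"
POLLER_INTERVAL = 300   # seconds, from "Poller Int"

now = datetime.strptime(NOTE_TIME, "%m/%d/%Y %I:%M:%S %p")
previous_start = now - timedelta(seconds=TIME_SINCE_LAST)
interval_elapsed = previous_start + timedelta(seconds=POLLER_INTERVAL)

print("previous run recorded at :", previous_start.strftime("%H:%M:%S"))
print("full interval elapsed at :", interval_elapsed.strftime("%H:%M:%S"))
```

The previous start recorded by the poller works out to 09:05:01, one second past a five-minute boundary, like the :01 timestamps in the cron log lines earlier in the thread. The manual run came before a full 300-second interval had passed, which is what the NOTE complains about. The output alone does not say what started the earlier run or whether it wrote any data.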
Re: All graphs stopped updating, showing NAN values
Moving to the linux forum for more expert troubleshooting...