[solved]Never ending Story - NaN Values in almost all graphs

um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

[solved]Never ending Story - NaN Values in almost all graphs

Post by um3n »

Hello Guys,

I have some trouble with my Cacti server.

Yesterday I ran into the problem that the poller process was terminated for exceeding its time limit:

Code: Select all

03/13/2015 11:33:00 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.

I changed the poller to run only every 5 minutes. Since then the poller seems to run normally, but most of my graphs now show NaN.

I followed the guide from here, but everything seems fine.
I ran the scripts manually and they return values, but those values do not make it into the graphs.

Code: Select all

[root@cacti /usr/share/cacti/site] # /usr/bin/perl /usr/local/bin/a3-monitoring.pl -u rmi://lpm-04.prod.local:9100/BHT-UserServer-1-Prod-01
ClientSessionsaktiv:2 Speichernutzung:38 Benutzeraktiv:21 Speichergesamt:2800746496 Benutzergesamt:24 Speicherbenutzt:1091567616 Speicherfrei:1708130304 Speichermaximal:3816816640 ClientSessionsgesam:2 Mandanten:1 DBSessionsaktiv:5 Threads:71 DBSessionsgesamt:5 Transaktionen:627932
Here is the snippet from the log file for a graph where I see NaN:

Code: Select all

03/18/2015 09:32:17 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/lib/cacti/rra/53/3511.rrd --template Benutzergesamt:Speichernutzung:DBSessionsaktiv:Speichermaximal:Speicherfrei:ClientSessionsaktiv:Speichergesamt:Benutzeraktiv:Transaktionen:Threads:Mandanten:Speicherbenutzt:DBSessionsgesamt:ClientSessionsgesam 1426667489:25:45:5:3816816640:1336934400:2:2444230656:21:627298:63:1:1107296256:6:2
This is the output from rrdtool fetch:

Code: Select all

1426660200: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426660560: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426660920: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426661280: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426661640: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426662000: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
1426662360: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
1426662720: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426663080: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426663440: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426663800: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426664160: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426664520: 1,7000000000e+01 2,5176309760e+09 5,0000000000e+00 4,9250000000e+01 7,7187500000e-01 3,8168166400e+09 0,0000000000e+00 1,0000000000e+00 1,2623282176e+09 5,0000000000e+00 1,2542541824e+09 5,9550000000e+01 1,9000000000e+01 0,0000000000e+00
1426664880: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426665240: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426665600: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426665960: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426666320: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426666680: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426667040: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 0,0000000000e+00
1426667400: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 1,6333333333e+00
1426667760: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan 2,0000000000e+00
1426668120: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
This is the output from rrdtool info:

Code: Select all

[root@cacti /usr/share/cacti/site] # rrdtool info /var/lib/cacti/rra/53/3511.rrd
filename = "/var/lib/cacti/rra/53/3511.rrd"
rrd_version = "0003"
step = 60
last_update = 1426668003
header_size = 25856
ds[Benutzeraktiv].index = 0
ds[Benutzeraktiv].type = "GAUGE"
ds[Benutzeraktiv].minimal_heartbeat = 120
ds[Benutzeraktiv].min = 0,0000000000e+00
ds[Benutzeraktiv].max = 6,5000000000e+04
ds[Benutzeraktiv].last_ds = "22"
ds[Benutzeraktiv].value = NaN
ds[Benutzeraktiv].unknown_sec = 3
ds[Benutzergesamt].index = 12
ds[Benutzergesamt].type = "GAUGE"
ds[Benutzergesamt].minimal_heartbeat = 120
ds[Benutzergesamt].min = 0,0000000000e+00
ds[Benutzergesamt].max = 6,5000000000e+04
ds[Benutzergesamt].last_ds = "24"
ds[Benutzergesamt].value = NaN
ds[Benutzergesamt].unknown_sec = 3
I also have a bunch of

Code: Select all

03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wiki.dbh.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.32'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wiki.dbh.local', and OID:'.1.3.6.1.2.1.25.2.3.1.5.32'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wiki.dbh.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.33'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wiki.dbh.local', and OID:'.1.3.6.1.2.1.25.2.3.1.5.33'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'zhosting8.dbh.de', and OID:'.1.3.6.1.2.1.25.2.3.1.6.8'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'zhosting18.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.8'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'zhosting21-db-slave.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.8'
03/18/2015 09:14:24 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'zhosting8-db-slave.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.8'
03/18/2015 09:14:25 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wadis-bhv-db-slave.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.63'
03/18/2015 09:14:25 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wadis-bhv-db-slave.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.5.63'
03/18/2015 09:14:26 AM - CMDPHP: Poller[0] WARNING: SNMP Get Timeout for Host:'wadis-whv-db-slave.prod.local', and OID:'.1.3.6.1.2.1.25.2.3.1.6.63'
in the log file.

I also checked the poller_output table in my MySQL database:

Code: Select all

Database changed
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

General Information
Operating System: Ubuntu 14.04.1 LTS (3.13.0-34-generic)
Webserver: Apache/2.4.7 (Ubuntu), built Jul 22 2014 14:36:38
Cacti: 0.8.8b
Spine: SPINE 0.8.8b
MySQL: 5.5.38-0ubuntu0.14.04.1
PHP: 5.5.9-1ubuntu4.3
RRDTool (Cygwin): 1.4.7
Net-SNMP: NET-SNMP version: 5.7.2
Cygwin (cygwin1.dll version):
Plugin Architecture: 3.1
Hosts: 329
Graphs: 12016
Data Sources:
  Script/Command: 2187
  SNMP: 1116
  SNMP Query: 6918
  Script Query: 123
  Script - Script Server (PHP): 78
  Script Query - Script Server: 1719
  Total: 12141

Poller Information
Interval: 300
Type: SPINE 0.8.8b Copyright 2002-2013 by The Cacti Group
Items:
  Action[0]: 12117
  Action[1]: 2187
  Action[2]: 3422
  Total: 17726
Concurrent Processes: 4
Max Threads: 30
PHP Servers: 9
Script Timeout: 80
Max OID: 25
Last Run Statistics: Time:71.7355 Method:spine Processes:4 Threads:30 Hosts:323 HostsPerProcess:81 DataSources:17726 RRDsProcessed:11607

PHP Information
PHP SNMP: Installed
max_execution_time: 180
memory_limit: 512M

If you need more information, please let me know.
Attachments
cacti.zip
Last edited by um3n on Fri Mar 20, 2015 12:56 am, edited 1 time in total.
Three tomatoes are walking down the street: papa tomato, mama tomato, and a little baby tomato. Baby tomato starts lagging behind. Papa tomato gets angry, goes over to the baby tomato, and smooshes him... and says, "Catch up."
cigamit
Developer
Posts: 3369
Joined: Thu Apr 07, 2005 3:29 pm
Location: B/CS Texas
Contact:

Re: Never ending Story - NaN Values in almost all graphs

Post by cigamit »

If you were doing 1-minute polling before (and 1-minute graphs) and then changed the poller to 5 minutes, then yes, you are going to have all NaNs. Your RRDs still expect data every minute (step = 60), and when no update arrives within the heartbeat (minimal_heartbeat = 120 in your rrdtool info output), they record everything as NaN.
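
To make the heartbeat point concrete, here is a minimal Python sketch. It is not part of Cacti or RRDtool; it only models the rule that an update arriving later than the data source's heartbeat leaves the interval unknown. The step and heartbeat values are taken from the rrdtool info output in the first post; the function name is made up for illustration.

```python
# Values copied from the `rrdtool info` output in the first post.
STEP = 60                # step = 60
MINIMAL_HEARTBEAT = 120  # ds[...].minimal_heartbeat = 120


def is_update_accepted(seconds_since_last_update: int,
                       heartbeat: int = MINIMAL_HEARTBEAT) -> bool:
    """Return True if an update arriving after this gap still counts as known data.

    Simplified model: when the gap between two updates is longer than the
    heartbeat, the whole interval is treated as unknown (shown as NaN).
    """
    return seconds_since_last_update <= heartbeat


if __name__ == "__main__":
    # Compare the old 1-minute schedule with the new 5-minute schedule.
    for poll_interval in (60, 300):
        status = "stored" if is_update_accepted(poll_interval) else "unknown (NaN)"
        print(f"poll every {poll_interval:>3}s, heartbeat {MINIMAL_HEARTBEAT}s -> {status}")
```

With a 60-second poller the gap stays inside the 120-second heartbeat; with a 300-second poller every gap exceeds it, which matches the rows of -nan in the rrdtool fetch output.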

Post by um3n »

Thank you for your response :)

Okay... that sounds very simple ^^

How can I change what the RRDs expect?

Post by cigamit »

It's not a simple thing to do, and most likely means you would have to delete any RRDs associated with the data templates that expect 1-minute data, thus losing your historical data.

A much easier approach is to switch back to 1 minute, and try to fix the actual issue you were seeing (Poller running over the allotted time).
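
To see how far over the allotted time the poller is, the "Last Run Statistics" line from the first post can be compared with the 58-second maximum runtime from the log message. The following Python sketch does just that arithmetic. It assumes the 58-second limit applies again once the poller is back on a 1-minute schedule, and the helper names are hypothetical.

```python
import re

# From the log line "Maximum runtime of 58 seconds exceeded" in the first post.
MAX_RUNTIME_SECONDS = 58

# "Last Run Statistics" line copied from the first post.
STATS_LINE = (
    "Time:71.7355 Method:spine Processes:4 Threads:30 Hosts:323 "
    "HostsPerProcess:81 DataSources:17726 RRDsProcessed:11607"
)


def parse_stats(line: str) -> dict:
    """Split a 'Key:Value Key:Value ...' statistics line into a dict of strings."""
    return dict(re.findall(r"(\w+):(\S+)", line))


def overrun_seconds(line: str, limit: float = MAX_RUNTIME_SECONDS) -> float:
    """Seconds by which the reported run time exceeds the limit (negative means headroom)."""
    return float(parse_stats(line)["Time"]) - limit


if __name__ == "__main__":
    stats = parse_stats(STATS_LINE)
    print(f"run time {stats['Time']}s for {stats['DataSources']} data sources")
    print(f"over the {MAX_RUNTIME_SECONDS}s limit by {overrun_seconds(STATS_LINE):.1f}s")
```

A positive result means the run would be cut off on a 1-minute schedule, so the polling time itself has to come down before switching back.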

Post by um3n »

Hey,

at the moment everything runs fine, but I am right at the 60-second limit.
So in some cases the poller will be terminated.

How can I find the best settings for the poller?
At the moment one poller process runs with a lot of threads per process.
The script server timeout is 15 seconds.

It's only 323 hosts, and more hosts are coming. I am afraid that we will run into the poller problem again.

The old instance runs with cmd.php and 64 processes; I built the new one with spine.

It would be great if you had some tips for me.


... Maybe I should create a new topic, because this topic is solved.

Post by cigamit »

Concurrent Processes 4
Max Threads 30
PHP Servers 9
Script Timeout 80
Max OID 25
I would change the scriptserver timeout to 5.
How many processors are in this Cacti box? What is the load on the box during polling? If it's a high load, you may want to drop the number of concurrent processes down to 2.

You could also be having a Disk I/O issue. You may try out the boost plugin (use the memory tables) to see if that is the issue.

Post by um3n »

cigamit wrote:I would change the scriptserver timeout to 5.
How many processors are in this Cacti box? What is the load on the box during polling? If it's a high load, you may want to drop the number of concurrent processes down to 2.

You could also be having a Disk I/O issue. You may try out the boost plugin (use the memory tables) to see if that is the issue.

Yeah, that's true: disk I/O is a problem. I will try the boost plugin.

OK, those values are not the current ones. These are:
Concurrent Processes 1
Max Threads 60
PHP Servers 7
Script Timeout 15
Max OID 25
Last Run Statistics Time:55.7070 Method:spine Processes:1 Threads:60 Hosts:323 HostsPerProcess:323 DataSources:10347 RRDsProcessed:7543
A snapshot of the top output:

Code: Select all

top - 16:25:22 up 1 day,  2:02,  1 user,  load average: 20,02, 21,25, 21,40
Aufgaben: 262 total,  16 running, 246 sleeping,   0 stopped,   0 zombie
%Cpu0  : 77,7 be, 21,6 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,7 hi,  0,0 si,  0,0 st
%Cpu1  : 69,4 be, 30,2 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,3 hi,  0,0 si,  0,0 st
%Cpu2  : 70,0 be, 30,0 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 63,5 be, 36,5 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:   8173480 total,  5682064 used,  2491416 free,   149496 buffers
KiB Swap:  3554424 total,      416 used,  3554008 free.  4660188 cached Mem

Post by cigamit »

Ya, your load average is way too high. I would change the concurrent processes to 2, and the threads to 15.

The boost plugin will probably help more than anything. You just need to make sure you import the memory tables, not the MyISAM ones. You may also have to tweak your MySQL config to allow for bigger memory tables.
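
As a rough way to judge "too high", the load average can be set against the number of CPUs. The Python sketch below does this for the header line of the top output posted above. It assumes 4 CPUs, since top lists Cpu0 through Cpu3, and it accepts the decimal commas of the German locale. The function names are made up for illustration.

```python
import re

CPU_COUNT = 4  # assumption: top lists %Cpu0 through %Cpu3

# Header line copied from the top output posted above.
TOP_HEADER = "top - 16:25:22 up 1 day,  2:02,  1 user,  load average: 20,02, 21,25, 21,40"


def parse_load_averages(header: str) -> list:
    """Extract the 1-, 5- and 15-minute load averages; ',' or '.' may be the decimal mark."""
    tail = header.split("load average:", 1)[1]
    return [float(match.replace(",", ".")) for match in re.findall(r"\d+[.,]\d+", tail)]


def load_per_cpu(header: str, cpus: int = CPU_COUNT) -> list:
    """Divide each load average by the CPU count."""
    return [value / cpus for value in parse_load_averages(header)]


if __name__ == "__main__":
    for label, value in zip(("1 min", "5 min", "15 min"), load_per_cpu(TOP_HEADER)):
        print(f"{label:>6}: {value:.2f} runnable or waiting processes per CPU")
```

As a rule of thumb, a sustained value well above 1 per CPU suggests processes are queuing for the processors, which is why reducing processes and threads is worth trying.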

Post by um3n »

I will try the boost plugin tomorrow; I changed the poller values today.

Thank you, you will hear from me tomorrow.

Post by um3n »

Hey...

I changed the Cacti poller settings to two concurrent processes and a maximum of 15 threads.
I also installed the boost plugin.

Code: Select all

top - 09:20:31 up 1 day, 18:57,  1 user,  load average: 23,72, 21,95, 21,49
Aufgaben: 268 total,  12 running, 255 sleeping,   0 stopped,   1 zombie
%Cpu0  : 86,5 be, 12,8 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,8 hi,  0,0 si,  0,0 st
%Cpu1  : 82,7 be, 17,3 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  : 66,4 be, 33,6 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 70,2 be, 29,8 sy,  0,0 ni,  0,0 un,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:   8173480 total,  7869608 used,   303872 free,   147536 buffers
KiB Swap:  3554424 total,     2084 used,  3552340 free.  6576000 cached Mem
As you can see, the situation is the same as yesterday. The poller is running right at the time limit.

I also ran into the "pollertable is full" problem; I am working on that.

Post by cigamit »

Please post a screen shot of the Boost Status page (under Utilities).

Post by um3n »

Hey,

my team decided to put some more CPU into the machine... now it's running like a charm :(

Thank you for your help.

The boost plugin runs well now too, but I have some other issues with some self-made graphs; I will create a separate thread for that.
I played with the configuration and the MySQL settings to fix the boost plugin.

Doesn't matter anymore... :)

Thank you for your time cigamit, you are the best ;)