Problem with data gathering
Hi all.
I'm running into a strange problem with data gathering.
How does my monitoring system work? Simply: every minute, cron runs my collection scripts and writes the statistics I need into counter files (plain text files).
Custom SNMP OIDs (each OID mapped to its own counter file) then return the value read from the corresponding counter.
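To make it concrete, here is a simplified sketch of what I mean (the file names, script name and OID label are made-up examples, not my real config):

# /etc/snmp/snmpd.conf: expose one counter file via the NET-SNMP "extend" mechanism
extend myCounter /bin/cat /var/local/counters/my_counter.txt

# crontab entry: refresh the counter file every minute
* * * * * /usr/local/bin/collect_stats.sh > /var/local/counters/my_counter.txt

The resulting value is then readable at NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter".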
So it looks straightforward and should just work (leaving aside possible IP connectivity issues such as latency or packet loss for now).
Cacti is installed on a virtual machine (a Proxmox container).
When I query the data with snmpwalk/snmpget from the machine where Cacti is installed, there is no problem; the values always come back.
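For example, a manual check like this (community string, address and OID are placeholders for my real ones) always returns a value:

snmpget -v2c -c public 192.0.2.10 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter"'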
But, and it's a big but, when Cacti polls exactly the same data via the same OIDs (the ones I just checked by hand), about 80% of the attempts come back as: Value: U
I'm confident there is no problem with the network, the monitored devices, or the scripts; everything points to the Cacti server side.
I also ran the spine binary by hand from the CLI and went through its output in detail (again and again): the "U" value shows up there too, not always, but frequently.
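Roughly like this, with my host IDs 1 to 14 (the paths are just where my spine happens to live, and the exact options can differ between spine builds, so check spine --help):

/usr/local/spine/bin/spine --conf=/usr/local/spine/etc/spine.conf --verbosity=5 1 14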
---
OK, now please take a look at my Cacti configuration:

Device config:
SNMP Timeout: 5000
Maximum OIDs Per Get Request: 30

Poller config:
Maximum Concurrent Poller Processes: 1 (tried 20 as well; no change)
Maximum Threads per Process: 20 (tried 100 as well; no change)
Number of PHP Script Servers: 8 (tried 10 as well; no change)
Script and Script Server Timeout Value: 115 seconds
Maximum SNMP OIDs Per SNMP Get Request: 20 (tried 100 as well; no change)

General Cacti config:
SNMP Timeout: 5000 (shouldn't really matter, since the per-device setting is what counts)
SNMP Retries: 2 (tried 5 as well; no change)
And let me say it again: I'm confident there is no problem with snmpwalk/snmpget itself; I've checked, and it works fine.
I've also noticed that the "U" values regularly appear at the end of the polling list in the Cacti log (logging is set to HIGH detail), so the problem seems tied to the devices whose data is returned last.
Yet the time the poller needs to gather everything is only about 0.7 seconds!
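For reference, this is how I've been pulling the failed values out of the log (adjust the path to wherever your cacti.log lives; the exact wording of the message depends on the logging level):

grep -i "value: u" /var/www/cacti/log/cacti.log | tail -n 20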
I also tried the default cmd.php poller, but the problem is still there.
Spine version: 0.8.8h
Cacti version: 0.8.8a
RRDtool version: 1.4.7
P.S.: Back when spine was 0.8.8a there was no problem (though we were using cmd.php at that time).
What else should I check?
Re: Problem with data gathering
Please post your "SYSTEM STATS" data from several polls (polling times, number of hosts, data sources, ...).
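You can grab them straight from the log, for example (adjust the path to your Cacti install):

grep "SYSTEM STATS" /var/www/cacti/log/cacti.log | tail -n 10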
Greetings,
Phalek
---
Need more help ? Read the Cacti documentation or my new Cacti 1.x Book
Need on-site support ? Look here Cacti Workshop
Need professional Cacti support ? Look here CereusService
---
Plugins : CereusReporting
Re: Problem with data gathering
Of course:
05/13/2016 01:20:05 PM - SYSTEM THOLD STATS: Time:0.2689 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 01:20:04 PM - SPINE: Poller[0] Time: 3.4398 s, Threads: 80, Hosts: 14
05/13/2016 01:20:04 PM - SYSTEM STATS: Time:3.4486 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:20:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:15:05 PM - SYSTEM THOLD STATS: Time:0.2914 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 01:15:05 PM - SPINE: Poller[0] Time: 3.8765 s, Threads: 80, Hosts: 14
05/13/2016 01:15:05 PM - SYSTEM STATS: Time:3.8850 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:15:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:10:04 PM - SPINE: Poller[0] Time: 3.2832 s, Threads: 80, Hosts: 14
05/13/2016 01:10:04 PM - SYSTEM STATS: Time:3.2919 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:10:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:05:05 PM - SPINE: Poller[0] Time: 3.9796 s, Threads: 80, Hosts: 14
05/13/2016 01:05:05 PM - SYSTEM STATS: Time:3.9884 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:05:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
Re: Problem with data gathering
OK, get your "threads" down to twice the number of your CPU cores, then install a spine version that matches your Cacti version.
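With spine, that usually means a fairly small number; on a 4-core box, for example, you would set Maximum Threads per Process to 8. You can check the core count with:

nproc
# or
grep -c ^processor /proc/cpuinfo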
Greetings,
Phalek
Re: Problem with data gathering
I upgraded Cacti to version 0.8.8h just yesterday.
So now both spine and Cacti are at 0.8.8h.
What did you mean by getting my "threads" down?
I take it you mean "Maximum Threads per Process"? Or did you mean "Number of Collection Threads" in the device config?
Re: Problem with data gathering
Maximum Threads per Process
Greetings,
Phalek
Re: Problem with data gathering
I've just reduced it to 25, but nothing has changed yet.
The unknown values are still there.
Statistics:
05/13/2016 02:30:06 PM - SYSTEM THOLD STATS: Time:0.2575 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:30:06 PM - SYSTEM STATS: Time:4.0478 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:30:06 PM - SPINE: Poller[0] Time: 4.0368 s, Threads: 25, Hosts: 14
05/13/2016 02:30:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 02:25:04 PM - SYSTEM THOLD STATS: Time:0.3059 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:25:03 PM - SPINE: Poller[0] Time: 2.6084 s, Threads: 25, Hosts: 14
05/13/2016 02:25:03 PM - SYSTEM STATS: Time:2.6176 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:25:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 02:20:05 PM - SYSTEM THOLD STATS: Time:0.2525 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:20:05 PM - SYSTEM STATS: Time:4.2168 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:20:05 PM - SPINE: Poller[0] Time: 4.2072 s, Threads: 25, Hosts: 14
05/13/2016 02:20:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
What else should I check?
Re: Problem with data gathering
You could switch to cmd.php and check if there's a difference.
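You can also run it by hand for a single host and look at the raw output; if I remember right, cmd.php takes a first and last host ID as arguments (the path and host ID below are just examples):

php /var/www/cacti/cmd.php 1 1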
Greetings,
Phalek
Re: Problem with data gathering
I tried switching from spine to cmd.php, but it didn't help (Cacti had already been upgraded to the 'h' version by then).
Re: Problem with data gathering
I've reinstalled Cacti on another machine (a physical one this time, not a VM), but the problem is still there.
:-(
Re: Problem with data gathering
I've found that my problem is directly related to the CPU usage on one particular monitored device.
Fortunately that device is a VM, so I can just give it more cores.
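For anyone hitting something similar: one way to confirm it is to time the same snmpget while the device is busy and watch whether the response time creeps up towards the SNMP timeout (community, address and OID below are placeholders):

time snmpget -v2c -c public 192.0.2.20 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter"'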