Problem with data gathering
Hi all.
I'm running into a strange problem with data gathering.
How does my monitoring system work? Simply: every minute, cron runs my collection scripts and writes the statistics I need into counter files (plain text files).
Custom SNMP OIDs (each OID mapped to its own counter file) then return the value read from the corresponding counter.
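To make it concrete, here is a simplified sketch of what I mean (the file names, script name and OID label are made-up examples, not my real config):

# /etc/snmp/snmpd.conf: expose one counter file via the NET-SNMP "extend" mechanism
extend myCounter /bin/cat /var/local/counters/my_counter.txt

# crontab entry: refresh the counter file every minute
* * * * * /usr/local/bin/collect_stats.sh > /var/local/counters/my_counter.txt

The resulting value is then readable at NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter".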
So it looks straightforward and should just work (leaving aside possible IP connectivity issues such as latency or packet loss for now).
Cacti is installed on a virtual machine (a Proxmox container).
When I query the data with snmpwalk/snmpget from the machine where Cacti is installed, there is no problem; the values always come back.
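For example, a manual check like this (community string, address and OID are placeholders for my real ones) always returns a value:

snmpget -v2c -c public 192.0.2.10 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter"'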
But, and it's a big but, when Cacti polls exactly the same data via the same OIDs (the ones I just checked by hand), about 80% of the attempts come back as: Value: U
I'm confident there is no problem with the network, the monitored devices, or the scripts; everything points to the Cacti server side.
I also ran the spine binary by hand from the CLI and went through its output in detail (again and again): the "U" value shows up there too, not always, but frequently.
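Roughly like this, with my host IDs 1 to 14 (the paths are just where my spine happens to live, and the exact options can differ between spine builds, so check spine --help):

/usr/local/spine/bin/spine --conf=/usr/local/spine/etc/spine.conf --verbosity=5 1 14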
---
OK, now please take a look at my Cacti configuration:

Device config:
SNMP Timeout: 5000
Maximum OIDs Per Get Request: 30

Poller config:
Maximum Concurrent Poller Processes: 1 (tried 20 as well; no change)
Maximum Threads per Process: 20 (tried 100 as well; no change)
Number of PHP Script Servers: 8 (tried 10 as well; no change)
Script and Script Server Timeout Value: 115 seconds
Maximum SNMP OIDs Per SNMP Get Request: 20 (tried 100 as well; no change)

General Cacti config:
SNMP Timeout: 5000 (shouldn't really matter, since the per-device setting is what counts)
SNMP Retries: 2 (tried 5 as well; no change)
And let me say it again: I'm confident there is no problem with snmpwalk/snmpget itself; I've checked, and it works fine.
I've also noticed that the "U" values regularly appear at the end of the polling list in the Cacti log (logging is set to HIGH detail), so the problem seems tied to the devices whose data is returned last.
Yet the time the poller needs to gather everything is only about 0.7 seconds!
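For reference, this is how I've been pulling the failed values out of the log (adjust the path to wherever your cacti.log lives; the exact wording of the message depends on the logging level):

grep -i "value: u" /var/www/cacti/log/cacti.log | tail -n 20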
I also tried the default cmd.php poller, but the problem is still there.
Spine version: 0.8.8h
Cacti version: 0.8.8a
RRDtool version: 1.4.7
P.S.: Back when spine was 0.8.8a there was no problem (though we were using cmd.php at that time).
What else should I check?
Re: Problem with data gathering
Please post your "SYSTEM STATS" data from several polls (polling times, number of hosts, data sources, ...).
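You can grab them straight from the log, for example (adjust the path to your Cacti install):

grep "SYSTEM STATS" /var/www/cacti/log/cacti.log | tail -n 10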
Greetings,
Phalek
---
Need more help ? Read the Cacti documentation or my new Cacti 1.x Book
Need on-site support ? Look here Cacti Workshop
Need professional Cacti support ? Look here CereusService
---
Plugins : CereusReporting
Re: Problem with data gathering
Of course:
05/13/2016 01:20:05 PM - SYSTEM THOLD STATS: Time:0.2689 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 01:20:04 PM - SPINE: Poller[0] Time: 3.4398 s, Threads: 80, Hosts: 14
05/13/2016 01:20:04 PM - SYSTEM STATS: Time:3.4486 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:20:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:15:05 PM - SYSTEM THOLD STATS: Time:0.2914 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 01:15:05 PM - SPINE: Poller[0] Time: 3.8765 s, Threads: 80, Hosts: 14
05/13/2016 01:15:05 PM - SYSTEM STATS: Time:3.8850 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:15:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:10:04 PM - SPINE: Poller[0] Time: 3.2832 s, Threads: 80, Hosts: 14
05/13/2016 01:10:04 PM - SYSTEM STATS: Time:3.2919 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:10:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 01:05:05 PM - SPINE: Poller[0] Time: 3.9796 s, Threads: 80, Hosts: 14
05/13/2016 01:05:05 PM - SYSTEM STATS: Time:3.9884 Method:spine Processes:1 Threads:80 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 01:05:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
Re: Problem with data gathering
OK, get your "threads" down to twice the number of your CPU cores, then install a spine version that matches your Cacti version.
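With spine, that usually means a fairly small number; on a 4-core box, for example, you would set Maximum Threads per Process to 8. You can check the core count with:

nproc
# or
grep -c ^processor /proc/cpuinfo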
Greetings,
Phalek
Re: Problem with data gathering
I upgraded Cacti to version 0.8.8h just yesterday.
So now both spine and Cacti are at 0.8.8h.
What did you mean by getting my "threads" down?
I take it you mean "Maximum Threads per Process"? Or did you mean "Number of Collection Threads" in the device config?
Re: Problem with data gathering
Maximum Threads per Process
Greetings,
Phalek
Re: Problem with data gathering
I've just reduced it to 25, but nothing has changed yet.
The unknown values are still there.
Statistics:
05/13/2016 02:30:06 PM - SYSTEM THOLD STATS: Time:0.2575 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:30:06 PM - SYSTEM STATS: Time:4.0478 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:30:06 PM - SPINE: Poller[0] Time: 4.0368 s, Threads: 25, Hosts: 14
05/13/2016 02:30:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 02:25:04 PM - SYSTEM THOLD STATS: Time:0.3059 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:25:03 PM - SPINE: Poller[0] Time: 2.6084 s, Threads: 25, Hosts: 14
05/13/2016 02:25:03 PM - SYSTEM STATS: Time:2.6176 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:25:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
05/13/2016 02:20:05 PM - SYSTEM THOLD STATS: Time:0.2525 Tholds:23 TotalHosts:13 DownHosts:0 NewDownHosts:0
05/13/2016 02:20:05 PM - SYSTEM STATS: Time:4.2168 Method:spine Processes:1 Threads:25 Hosts:14 HostsPerProcess:14 DataSources:190 RRDsProcessed:190
05/13/2016 02:20:05 PM - SPINE: Poller[0] Time: 4.2072 s, Threads: 25, Hosts: 14
05/13/2016 02:20:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
What else should I check?
Re: Problem with data gathering
You could switch to cmd.php and check if there's a difference.
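You can also run it by hand for a single host and look at the raw output; if I remember right, cmd.php takes a first and last host ID as arguments (the path and host ID below are just examples):

php /var/www/cacti/cmd.php 1 1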
Greetings,
Phalek
Re: Problem with data gathering
I tried switching from spine to cmd.php, but it didn't help (Cacti had already been upgraded to the 'h' version by then).
Re: Problem with data gathering
I've reinstalled Cacti on another machine (a physical one this time, not a VM), but the problem is still there.
:-(
Re: Problem with data gathering
I've found that my problem is directly related to the CPU usage on one particular monitored device.
Fortunately that device is a VM, so I can just give it more cores.
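For anyone hitting something similar: one way to confirm it is to time the same snmpget while the device is busy and watch whether the response time creeps up towards the SNMP timeout (community, address and OID below are placeholders):

time snmpget -v2c -c public 192.0.2.20 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."myCounter"'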