Cacti graphs not populating on some poller cycles
Hi all,
I am running a Cacti server whose specs are listed below. Recently, after experiencing excessive CPU utilization, I started using SPINE. But now some of my graphs are not being populated on each polling cycle (refer to the attached image).
My configs are shown below. Please let me know what I should change to get a smooth curve in my graphs.
Thanks in advance,
D.
Configs
General Information
Date Fri, 16 Nov 2012 08:51:06 +0400
Cacti Version 0.8.8a
Cacti OS win32
SNMP Version NET-SNMP version: 5.6.1.1
RRDTool Version RRDTool 1.4.x
Hosts 75
Graphs 1708
Data Sources Script/Command: 45
SNMP: 334
SNMP Query: 788
Script Query: 626
Script - Script Server (PHP): 15
Script Query - Script Server: 1
Total: 1809
Poller Information
Interval 300
Type SPINE 0.8.8a Copyright 2002-2012 by The Cacti Group
Items Action[0]: 1659
Action[1]: 911
Action[2]: 14
Total: 2584
Concurrent Processes 1
Max Threads 4
PHP Servers 1
Script Timeout 25
Max OID 10
Last Run Statistics Time:78.5150 Method:spine Processes:1 Threads:4 Hosts:53 HostsPerProcess:53 DataSources:2584 RRDsProcessed:1224
PHP Information
PHP Version 5.3.10
PHP OS WINNT
PHP uname Windows NT USER-PC 6.1 build 7601 (Windows 7 Business Edition Service Pack 1) i586
PHP SNMP Installed
max_execution_time 30
memory_limit 128M
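One quick sanity check on the figures above is whether the poller finishes inside its interval. The following is a minimal Python sketch, not part of Cacti: the function name `parse_poller_stats` is hypothetical, and the input is simply the "Last Run Statistics" line and the 300-second interval quoted in this post.

```python
def parse_poller_stats(line: str) -> dict:
    """Split the 'Key:Value' tokens of a 'Last Run Statistics' line into a dict."""
    stats = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if not sep:
            continue  # skip anything that is not a Key:Value pair
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # non-numeric values such as Method:spine
    return stats


# Values copied from the post above.
LINE = ("Time:78.5150 Method:spine Processes:1 Threads:4 Hosts:53 "
        "HostsPerProcess:53 DataSources:2584 RRDsProcessed:1224")
POLLER_INTERVAL = 300  # seconds

stats = parse_poller_stats(LINE)
headroom = POLLER_INTERVAL - stats["Time"]
print(f"poller used {stats['Time']:.1f}s of {POLLER_INTERVAL}s, headroom {headroom:.1f}s")
```

With these numbers the run time is well under the interval, so an overrunning poller is probably not what produces the gaps; that is only an inference from this one run.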
Attachment: nogoodgraph.jpg (54.94 KiB)
Re: Cacti graphs not populating on some poller cycles
Did you follow the debugging guide to further troubleshoot this issue yet?
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Re: Cacti graphs not populating on some poller cycles
Hi BSOD,
Thanks for your reply.
Exactly which guide are you talking about? I've followed a few but still couldn't get it fixed. Looking at my Cacti logs yesterday, I found that I'm getting the following error for many devices:
11/17/2012 04:46:17 PM - SPINE: Poller[0] Host[49] ERROR: Empty result [aaa.bbb.ccc.ddd]: 'C:\php\php.exe -q C:\Apache2\htdocs\cacti\scripts\mikrotik_wireless_interfaces.php blahblah aaa.bbb.ccc.ddd get ifInSignal 00:00:00:00:00:00'
But when I run this script in the command line I get the result. I am pretty sure this is the root cause of my problem. Do you have any idea how I can resolve it?
Thanks.
D.
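A script that works in an interactive shell can still return nothing when the poller launches it, for example if it runs under a different account or environment, or exceeds the script timeout. The sketch below is a hedged Python illustration of how one might reproduce the "Empty result" condition outside Cacti. The helper `run_poller_script` is hypothetical and does not reflect how Spine itself launches scripts; the 25-second default only mirrors the "Script Timeout 25" setting from the first post.

```python
import subprocess
import sys


def run_poller_script(argv, timeout_s=25):
    """Run a data-input script and classify the outcome.

    Returns ("ok", stdout), ("empty", stderr) or ("timeout", "").
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    output = proc.stdout.strip()
    if not output:
        # Nothing on stdout: this is the case a poller would log as an empty result.
        return "empty", proc.stderr.strip()
    return "ok", output


# Stand-in command for illustration; replace with the real script invocation.
status, detail = run_poller_script([sys.executable, "-c", "print(-67)"])
print(status, detail)
```

Running the real command line this way, from the same account that runs the poller, would show whether the empty output comes with an error on stderr or with a timeout.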
Re: Cacti graphs not populating on some poller cycles
Ah, so it's only the mikrotik graphs which are breaking? Does that mikrotik_wireless_interfaces.php have an snmp timeout value which possibly should get increased?
Looks like you already found the debug guide - http://docs.cacti.net/manual:088:4_help.2_debugging
Re: Cacti graphs not populating on some poller cycles
Hi BSOD,
Yes, I did go through that debugging guide.
I went through mikrotik_wireless_interfaces.php looking for an SNMP timeout value but couldn't find anything related to it. I have attached the file for your reference; please take a look at it yourself.
I ran spine from the command line with verbosity=8 and got the following results for a random device I am trying to monitor:
Code:
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is1
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEVDBG: SQL:'SELECT id, hostname, snmp_community, snmp_version, snmp_username, snmp_password, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context, snmp_port, snmp_timeout, max_oids, availability_method, ping_method, ping_port, ping_timeout, ping_retries,status, status_event_count, status_fail_date, status_rec_date, status_last_error, min_time, max_time, cur_time, avg_time, total_polls, failed_polls, availability FROM host WHERE id=116'
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] DEBUG: Entering TCP Ping
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] DEBUG: TCP Host Alive, Try Count:1, Time:15.9998 ms
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] DEBUG: Entering SNMP Ping
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] PING Result: TCP: Host is Alive
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] SNMP Result: Host responded to SNMP
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEVDBG: SQL:'UPDATE host SET status='3', status_event_count='0', status_fail_date='0000-00-00 00:00:00', status_rec_date='0000-00-00 00:00:00', status_last_error='', min_time='0.000000', max_time='116.999990', cur_time='15.499945', avg_time='10.407015', total_polls='3174', failed_polls='0', availability='100.0000' WHERE id='116''
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEVDBG: SQL:'SELECT data_query_id, action, op, assert_value, arg1 FROM poller_reindex WHERE host_id=116'
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] Host has no information for recache.
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEVDBG: SQL:'SELECT snmp_port, count(snmp_port) FROM poller_item WHERE host_id=116 AND rrd_next_step < 0 GROUP BY snmp_port '
11/20/2012 10:35:51 PM - SPINE: Poller[0] DEVDBG: SQL:'SELECT action, hostname, snmp_community, snmp_version, snmp_username, snmp_password, rrd_name, rrd_path, arg1, arg2, arg3, local_data_id, rrd_num, snmp_port, snmp_timeout, snmp_auth_protocol, snmp_priv_passphrase, snmp_priv_protocol, snmp_context FROM poller_item WHERE host_id=116 and rrd_next_step <=0 ORDER by snmp_port '
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] NOTE: There are '5' Polling Items for this Host
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] SNMP: v1: 10.0.121.29, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.7, value: U
11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] SNMP: v1: 10.0.121.29, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.7, value: U
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12085] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12085] SNMP: v1: 10.0.121.29, dsname: signal_in, oid: .1.3.6.1.4.1.14988.1.1.1.1.1.4.9, value: U
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12082] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12082] SNMP: v1: 10.0.121.29, dsname: signal_in, oid: .1.3.6.1.4.1.14988.1.1.1.1.1.4.6, value: U
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12077] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12077] SNMP: v1: 10.0.121.29, dsname: signal_in, oid: .1.3.6.1.4.1.14988.1.1.1.1.1.4.1, value: U
11/20/2012 10:35:52 PM - SPINE: Poller[0] DEVDBG: SQL:'INSERT INTO poller_output(local_data_id, rrd_name, time, output) VALUES (12092,'traffic_out','2012-11-20 22:35:51','U'),(12092,'traffic_in','2012-11-20 22:35:51','U'),(12085,'signal_in','2012-11-20 22:35:51','U'),(12082,'signal_in','2012-11-20 22:35:51','U'),(12077,'signal_in','2012-11-20 22:35:51','U') ON DUPLICATE KEY UPDATE output=VALUES(output)'
11/20/2012 10:35:52 PM - SPINE: Poller[0] DEVDBG: SQL:'UPDATE poller_item SET rrd_next_step=IF((rrd_next_step-60)>=0, (rrd_next_step-60), (rrd_step-60)) WHERE host_id=116'
11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] Total Time: 0.12 Seconds
From the above output we can see that the poller_output table is filled with 'U' values. What I don't get is how it works at one time but not at another (thus giving gaps in the graph). Could you tell me which table in the Cacti database holds information about the data collected during a polling cycle?
Another thing: for the above device I've set the SNMP timeout value to 1000ms. Please correct me if I'm wrong, but the above device was polled in 0.12s, right? So that means the poller did not wait 1000ms to time out that host, is that correct? If this is the case, what other methods are available for me to change the SNMP timeout value?
Cacti is driving me crazy!!
D
Attachment: mikrotik_wireless_interfaces.txt (5.14 KiB)
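Verbose Spine output like the log in this post gets long quickly, so it can help to reduce it to "which hosts timed out, and which data sources came back as U". The following is a minimal Python sketch under stated assumptions: the two regular expressions are derived only from the log lines shown in this post, not from any Spine log format specification, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this thread (an assumption, not a spec).
TIMEOUT_RE = re.compile(r"Host\[(\d+)\].*WARNING: SNMP timeout detected \[(\d+) ms\]")
VALUE_RE = re.compile(
    r"Host\[(\d+)\].*DS\[(\d+)\] SNMP: .*dsname: (\w+), oid: ([.\d]+), value: (\S+)"
)


def summarize(lines):
    """Return (timeouts per host id, list of (host, ds, dsname) that polled as 'U')."""
    timeouts = Counter()
    unknown = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
            continue
        m = VALUE_RE.search(line)
        if m and m.group(5) == "U":
            unknown.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return timeouts, unknown


# Three lines copied from the log in this post.
SAMPLE = [
    "11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.0.121.29'",
    "11/20/2012 10:35:51 PM - SPINE: Poller[0] Host[116] TH[1] DS[12092] SNMP: v1: 10.0.121.29, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.7, value: U",
    "11/20/2012 10:35:52 PM - SPINE: Poller[0] Host[116] TH[1] DS[12085] SNMP: v1: 10.0.121.29, dsname: signal_in, oid: .1.3.6.1.4.1.14988.1.1.1.1.1.4.9, value: U",
]

timeouts, unknown = summarize(SAMPLE)
print(dict(timeouts), unknown)
```

Feeding a whole log file through `summarize` over several polling cycles would show whether the timeouts cluster on particular hosts or particular times.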
Re: Cacti graphs not populating on some poller cycles
mudmud wrote: I went through the mikrotik_wireless_interfaces.php for any snmp time out value but I couldn't find anything related to it.
Not being familiar with that script, it was more a question of whether the host's SNMP timeout value is passed into it. I'll assume yes.
mudmud wrote: Another thing is for the above device I've set the snmp time out value to 1000ms.
No, it appears the device is taking longer than 1000ms to respond to the SNMP query from Spine, which is timing out and thus returning no data. Have you tried increasing the SNMP timeout to 3-5 seconds? Are you sure there isn't any anti-DoS protection getting triggered?
mudmud wrote: Please correct me if I'm wrong but the above device was polled in 0.12s right? so that means the poller did not wait 1000ms to time out that host, is that correct?
No, it means thread 1 took 0.12s to perform all of those polling operations against Host[116].
Re: Cacti graphs not populating on some poller cycles
BSOD2600 wrote:
mudmud wrote: Please correct me if I'm wrong but the above device was polled in 0.12s right? so that means the poller did not wait 1000ms to time out that host, is that correct?
No, it means thread 1 took 0.12s to perform all of those polling operations against Host[116].
So the time it 'waits' for the SNMP request to time out is not counted as part of the polling against the same host?
BSOD, do you have any explanation for:
mudmud wrote: From the above output we can see that the poller_output table is filled with 'U' values. What I dont get is, how it is working at one time but not the other. (thus giving gaps in the graph). Could you tell me which table in the cacti database holds information about the data collected during a polling cycle?
Another thing: I increased the SNMP timeout value to 5 seconds, but it is still timing out. There is a firewall that I am going through, but I have already added exceptions to allow SNMP traffic. Do you think that it may somehow be dropping the packets? I am sure the device is within reach, as the SNMP community is configured there and an ICMP ping takes less than 3ms.
Thanks in advance for your help.
Re: Cacti graphs not populating on some poller cycles
mudmud wrote: So the time it 'waits' for the snmp to time out is not counted as part of the polling against the same host?
I'm not really sure. Spine could be parallelizing the requests so overall it only took 120ms.
mudmud wrote: BSOD, do you have any explaination for : From the above output we can see that the poller_output table is filled with 'U' values. What I dont get is, how it is working at one time but not the other. (thus giving gaps in the graph). Could you tell me which table in the cacti database holds information about the data collected during a polling cycle?
See http://docs.cacti.net/manual:088:99_reference.db_design and look in the poller_cache.
mudmud wrote: Another thing, I increased the snmp time out value to 5 seconds, but still it is getting timed out. There is a firewall that I am going through but I have already added exceptions to allow SNMP traffic. Do you think that it may be somehow dropping the packets? Because I am sure the device is within reach as SNMP community is configured there and ICMP ping takes less than 3ms.
You could run Wireshark on the Cacti server and/or a packet capture on the firewall to really check what is going on with the SNMP traffic. If you're only having this problem with a specific class of device, I'd be inclined to believe it's the device and not Cacti.
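As a complement to a packet capture, one could time a single SNMPv1 GET from the Cacti server and see whether a reply arrives inside the configured timeout. The following is a rough Python sketch using only the standard library. All function names are hypothetical, the community string "public" is a placeholder (the real one is elided in this thread), and the hand-rolled BER encoder only handles short lengths and request IDs from 0 to 127.

```python
import socket
import time


def _tlv(tag: int, payload: bytes) -> bytes:
    """BER tag-length-value, short-form length only (payload under 128 bytes)."""
    if len(payload) >= 128:
        raise ValueError("payload too long for this sketch")
    return bytes([tag, len(payload)]) + payload


def _base128(n: int) -> bytes:
    """Encode one OID arc as base-128 with continuation bits."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))


def encode_oid(oid: str) -> bytes:
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytes([40 * arcs[0] + arcs[1]])  # first two arcs share one byte
    for arc in arcs[2:]:
        body += _base128(arc)
    return _tlv(0x06, body)


def build_get(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest for a single OID."""
    if not 0 <= request_id <= 127:
        raise ValueError("request_id must fit in one byte for this sketch")
    varbind = _tlv(0x30, encode_oid(oid) + _tlv(0x05, b""))  # OID + NULL value
    pdu = _tlv(0xA0, _tlv(0x02, bytes([request_id]))         # request-id
               + _tlv(0x02, b"\x00") + _tlv(0x02, b"\x00")   # error-status, error-index
               + _tlv(0x30, varbind))                         # varbind list
    return _tlv(0x30, _tlv(0x02, b"\x00")                     # version 0 = SNMPv1
                + _tlv(0x04, community.encode()) + pdu)


def time_snmp_get(host, community, oid, timeout_s=1.0, port=161):
    """Return round-trip seconds, or None if no reply arrived within timeout_s."""
    packet = build_get(community, oid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(packet, (host, port))
            sock.recvfrom(65535)
        except OSError:  # includes socket.timeout and ICMP-triggered resets
            return None
        return time.monotonic() - start


# Example call (not executed here): time_snmp_get("10.0.121.29", "public",
#     ".1.3.6.1.4.1.14988.1.1.1.1.1.4.9", timeout_s=5.0)
print(build_get("public", ".1.3.6.1.2.1.1.1.0").hex())
```

The sketch does not parse the reply, so it only distinguishes "something came back in time" from "nothing came back". Repeating the call across a few polling intervals and comparing the results with the capture would help show whether requests are being dropped in transit or answered late by the device.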
Re: Cacti graphs not populating on some poller cycles
Thanks BSOD...I will monitor the traffic from the cacti server and get back to the forum...