[SOLVED] cacti 0.8.7a / spine 0.8.7b - graph gaps
Hello,
I was hoping someone could help point me in the right direction.
I have a recently upgraded, fairly large Cacti instance where I'm seeing huge gaps in my graphs. I'm polling 265 hosts with 8669 data sources on a 1-minute interval, using spine 0.8.7b as the poller.
Initially I had threads per process set to 16 and saw gaps in the graphs but no errors in the log. I changed the thread count to 2, and now I see "ERROR: Spine Timed Out While Processing Hosts Internal" in the log at the end of each poll cycle.
What's interesting is that I also see "PING Result: ICMP: Host is Alive" messages in the log, even though downed host detection is set to SNMP.
Any ideas what might be going on here? Thanks.
My config details below:
Date: Tue, 11 Dec 2007 10:04:39 -0600
Cacti Version: 0.8.7a
Cacti OS: win32
SNMP Version: net-snmp
RRDTool Version: RRDTool 1.2.x
Hosts: 265
Graphs: 5294
Data Sources: SNMP: 5949, SNMP Query: 2720, Total: 8669
Poller Information
Interval: 60
Type: spine
Items: Action[0]: 17501, Total: 17501
Concurrent Processes: 1
Max Threads: 2
PHP Servers: 10
Script Timeout: 30
Max OID: 30
Last Run Statistics: Time:12.8818 Method:spine Processes:1 Threads:2 Hosts:265 HostsPerProcess:265 DataSources:17501 RRDsProcessed:615
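In case it's useful for tuning threads per process, the spread of those 17501 poller items across hosts can be pulled straight from the database (a sketch only; it assumes the standard 0.8.x poller_item table with a host_id column):
Code:
-- ten heaviest hosts by poller item count; one slow, item-heavy host can hold up an entire thread
SELECT host_id, COUNT(*) AS items
FROM poller_item
GROUP BY host_id
ORDER BY items DESC
LIMIT 10;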
Re: cacti 0.8.7a / spine 0.8.7b - graph gaps
briang wrote: What's interesting is that I also see "PING Result: ICMP: Host is Alive" messages in the log, even though downed host detection is set to SNMP.
You do know it's per-device now - downed host detection is set to SNMP for that device in question, right?
As for the gaps and spine, I've noticed a lot of users reporting that. Will have to wait for TheWitness to return to comment.
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
After testing, I believe my graph gaps are due to poller wraps. I'm configured with a 5-minute scheduled task and a 1-minute poll interval, and my latest stats show 288 seconds to complete a poll, well over the interval.
Does anyone know if these metrics are typical for Windows? My server is W2K3, 2 x 2.33 GHz dual-core with 4 GB RAM. All of my data sources are SNMP...
Any ideas to improve performance?
12/11/2007 01:49:59 PM - SYSTEM STATS: Time:288.9060 Method:spine Processes:8 Threads:16 Hosts:265 HostsPerProcess:34 DataSources:17501 RRDsProcessed:8056
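As an aside, the most recent SYSTEM STATS line is also kept in the database, which makes it easy to script a check for poller wraps (a sketch; it assumes the stock 0.8.x settings table and its stats_poller entry):
Code:
-- last completed poller run; the Time: value should stay under the 60-second polling interval
SELECT value FROM settings WHERE name = 'stats_poller';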
briang wrote: Does anyone know if these metrics are typical for Windows? My server is W2K3, 2 x 2.33 GHz dual-core with 4 GB RAM. All of my data sources are SNMP...
Look through the announcement forum for the cacti metrics thread; it'll give you a good idea of what others are running.
briang wrote: Any ideas to improve performance?
Try tweaking the number of processes and threads for spine. Also enable the query cache for MySQL and make sure it's big enough to hold all your data. Lastly, you might look into using the Boost plugin.
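A minimal sketch of the query cache check and change (the 64 MB size is only an example; put the equivalent query_cache_size setting in my.cnf so it survives a restart):
Code:
-- see whether the query cache is on and how large it currently is
SHOW VARIABLES LIKE 'query_cache%';
-- turn it on and give it room at runtime; tune the size to your own data set
SET GLOBAL query_cache_type = 1;
SET GLOBAL query_cache_size = 67108864;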
I'll try all of the above - thanks. Just one more question, as I'm unclear on the inner workings of the poller. In the cacti log I see the entries below for one poll run.
Running the scheduled task as the logged-in user, I can see the cmd window for spine running, which corresponds to the first entry. What exactly is happening between the spine time entry of 46.8120 and the stats entry of 134.5982?
I'm trying to account for the ~87 secs, as the rrd updates seem to be complete by the time the spine entry is written to the log.
Thanks
12/11/2007 05:13:47 PM - SPINE: Poller[0] Time: 46.8120 s, Threads: 8, Hosts: 265
12/11/2007 05:15:14 PM - SYSTEM STATS: Time:134.5982 Method:spine Processes:1 Threads:8 Hosts:265 HostsPerProcess:265 DataSources:13658 RRDsProcessed:4589
Turn the cacti logging level to high or debug to see what cacti is doing under the covers.
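If it helps, the logging level normally lives under Settings -> General, but it can also be read or flipped directly in the settings table (a sketch; the numeric value for DEBUG is assumed from the 0.8.x level list, so verify it against the UI):
Code:
-- current poller logging level
SELECT value FROM settings WHERE name = 'log_verbosity';
-- 5 is assumed to correspond to DEBUG in 0.8.x; confirm in Settings -> General before relying on it
UPDATE settings SET value = '5' WHERE name = 'log_verbosity';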
After debugging, it turns out the actual rrd updates were taking the ~85 secs - about 60% of the poll cycle (my initial assumption was incorrect).
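That matches how the cycle is split with spine: spine writes its results into the poller_output table, and poller.php then drains that table and performs the rrdtool updates, which is where the remaining time went. A rough way to watch that second phase is to check the backlog mid-cycle (a sketch; it assumes the standard poller_output table):
Code:
-- results collected by spine but not yet written to the RRD files;
-- this should fall back to 0 by the time SYSTEM STATS is logged
SELECT COUNT(*) FROM poller_output;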
I removed some of the less critical data sources and installed the boost plugin using the MyISAM version - now I'm seeing excellent performance. The entire polling cycle now takes 35 secs, and after initial testing my graph gaps have disappeared. Even the mass boost rrd update has only a nominal hit on performance.
Bottom line is that I knocked off 80 secs per poll. Here are my stats now:
12/13/2007 10:04:26 AM - SPINE: Poller[0] Time: 25.8280 s, Threads: 8, Hosts: 265
12/13/2007 10:04:35 AM - SYSTEM STATS: Time:35.3552 Method:spine Processes:1 Threads:8 Hosts:265 HostsPerProcess:265 DataSources:13658 RRDsProcessed:0
Thanks for the help.
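For anyone trying the same approach: Boost works by parking the poll results in its own MyISAM table and flushing them to the RRD files in large timed batches, which is why RRDsProcessed shows 0 in the stats line above. A hedged way to watch how far the batch has grown between flushes (the table name is assumed from the 0.8.x Boost plugin):
Code:
-- rows queued by Boost since the last mass RRD update
SELECT COUNT(*) FROM poller_output_boost;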