[SOLVED] cacti 0.8.7a / spine 0.8.7b - graph gaps

briang · Post by **briang** » Tue Dec 11, 2007 11:08 am

Hello,

I was hoping someone could help point me in the right direction.

I have a large, recently upgraded Cacti instance where I'm seeing huge gaps in my graphs. I'm polling 265 hosts with 8669 data sources on a 1 minute interval, using spine 0.8.7b as the poller.

Initially I had threads per process set to 16 and saw gaps in the graphs, but no errors in the log. After changing the thread count to 2, I now see "ERROR: Spine Timed Out While Processing Hosts Internal" in the log at the end of a poll cycle.

What's interesting is that I also see "PING Result: ICMP: Host is Alive" messages in the log although SNMP is set for downed host detection.

Any ideas what might be going on here? Thanks.

My config details below:

- Date: Tue, 11 Dec 2007 10:04:39 -0600
- Cacti Version: 0.8.7a
- Cacti OS: win32
- SNMP Version: net-snmp
- RRDTool Version: RRDTool 1.2.x
- Hosts: 265
- Graphs: 5294
- Data Sources: SNMP: 5949, SNMP Query: 2720, Total: 8669

Poller Information:

- Interval: 60
- Type: spine
- Items: Action[0]: 17501, Total: 17501
- Concurrent Processes: 1
- Max Threads: 2
- PHP Servers: 10
- Script Timeout: 30
- Max OID: 30
- Last Run Statistics: Time:12.8818 Method:spine Processes:1 Threads:2 Hosts:265 HostsPerProcess:265 DataSources:17501 RRDsProcessed:615

BSOD2600 · Post by **BSOD2600** » Tue Dec 11, 2007 1:11 pm

> briang wrote: What's interesting is that I also see "PING Result: ICMP: Host is Alive" messages in the log although SNMP is set for downed host detection.

You do know it's per-device now, right? Is SNMP set as the detection method for the device in question?

As for the gaps with spine, I've noticed a lot of users reporting that. We'll have to wait for TheWitness to return and comment.

briang · Post by **briang** » Tue Dec 11, 2007 1:27 pm

I did not know host down detection was device-specific. I will check all hosts now.

Regarding the graph gaps - I'm testing different thread/concurrent process combinations to see if I can see improvement with a given setting.

briang · Post by **briang** » Tue Dec 11, 2007 4:31 pm

After testing, I believe my graph gaps are due to poller wraps, i.e. a poll cycle taking longer than the polling interval. I'm configured with a 5 min scheduled task and a 1 min poll interval, and my latest stats show 288 secs to complete a poll.

Does anyone know if these metrics are typical for Windows? My server is w2k3 on 2 x 2.33 GHz dual-core CPUs with 4 GB RAM. All of my tests are SNMP...

Any ideas to improve performance?

12/11/2007 01:49:59 PM - SYSTEM STATS: Time:288.9060 Method:spine Processes:8 Threads:16 Hosts:265 HostsPerProcess:34 DataSources:17501 RRDsProcessed:8056
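A rough way to spot a wrap from the log is to compare the SYSTEM STATS `Time` against the polling interval. This sketch assumes a "wrap" simply means a cycle whose elapsed time exceeds the interval, and that log lines look exactly like the one quoted above; `poll_overrun` is a hypothetical helper, not part of Cacti.

```python
import re

# Matches the total cycle time in a Cacti SYSTEM STATS log line, e.g. "Time:288.9060".
STATS_RE = re.compile(r"SYSTEM STATS: Time:([\d.]+)")


def poll_overrun(log_line: str, interval_s: float) -> float:
    """Seconds by which a poll cycle exceeded the interval (0.0 if it finished in time)."""
    match = STATS_RE.search(log_line)
    if match is None:
        raise ValueError("not a SYSTEM STATS line")
    elapsed = float(match.group(1))
    return max(0.0, elapsed - interval_s)


line = ("12/11/2007 01:49:59 PM - SYSTEM STATS: Time:288.9060 Method:spine "
        "Processes:8 Threads:16 Hosts:265 HostsPerProcess:34 "
        "DataSources:17501 RRDsProcessed:8056")

for interval in (60, 300):
    over = poll_overrun(line, interval)
    status = "wraps" if over > 0 else "fits"
    print(f"{interval:>3} s interval: {status} (over by {over:.1f} s)")
```

With the 60 s interval this run overshoots by roughly 229 s; it would only just fit inside a 300 s interval.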

BSOD2600 · Post by **BSOD2600** » Tue Dec 11, 2007 6:46 pm

> briang wrote: Does anyone know if these metrics are typical for windows? My server is a w2k3, 2 x 2.33ghz dual core with 4gb ram. All of my tests are snmp...

Look through the announcement forum for the Cacti metrics thread; it'll give you a good idea of what others are running.

> briang wrote: Any ideas to improve performance?

Try tweaking the number of processes and threads for spine. Also enable the query cache for MySQL and make sure it's big enough to hold all your data. Lastly, you might look into using the Boost plugin.
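When tweaking processes, the HostsPerProcess values in this thread's SYSTEM STATS lines (265 with Processes:1, 34 with Processes:8) are consistent with dividing the hosts evenly and rounding up. A minimal sketch under that assumption, handy for predicting the split before trying a setting; `hosts_per_process` is hypothetical and not taken from the Cacti source.

```python
import math


def hosts_per_process(total_hosts: int, processes: int) -> int:
    """Hosts assigned to each spine process, assuming an even split rounded up (an assumption)."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return math.ceil(total_hosts / processes)


# Process counts tried in this thread, with the HostsPerProcess value each run logged.
logged = {1: 265, 8: 34}

for procs, reported in logged.items():
    predicted = hosts_per_process(265, procs)
    print(f"{procs} process(es): predicted {predicted}, logged {reported}")
```

By the same rule, 2 processes would get 133 hosts each and 4 would get 67.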

briang · Post by **briang** » Wed Dec 12, 2007 9:19 am

I'll try all of the above, thanks. Just one more question, as I'm unclear on the inner workings of the poller. In the Cacti log I see the entries below for one poll run.

Running the scheduled task as the logged-in user, I can see the cmd window for spine running, which corresponds to the first entry. What exactly is happening between the spine time entry of 46.8120 and the stats entry of 134.5982?

I'm trying to account for the ~87 secs, as the RRD updates seem to be complete by the time the spine entry is written to the log.

Thanks

12/11/2007 05:13:47 PM - SPINE: Poller[0] Time: 46.8120 s, Threads: 8, Hosts: 265
12/11/2007 05:15:14 PM - SYSTEM STATS: Time:134.5982 Method:spine Processes:1 Threads:8 Hosts:265 HostsPerProcess:265 DataSources:13658 RRDsProcessed:4589
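The ~87 seconds in question is the SYSTEM STATS total minus the SPINE time. This minimal sketch pulls both figures out of the two log lines; the regular expressions assume the exact line formats shown in this thread, and `poll_timings` is a hypothetical helper (Python 3.9+ for the `tuple[...]` annotation).

```python
import re

# Patterns for the two log line formats quoted in this thread.
SPINE_RE = re.compile(r"SPINE: Poller\[\d+\] Time: ([\d.]+) s")
STATS_RE = re.compile(r"SYSTEM STATS: Time:([\d.]+)")


def poll_timings(spine_line: str, stats_line: str) -> tuple[float, float]:
    """Return (spine seconds, total cycle seconds) parsed from the two log lines."""
    spine = SPINE_RE.search(spine_line)
    stats = STATS_RE.search(stats_line)
    if spine is None or stats is None:
        raise ValueError("unrecognised log line")
    return float(spine.group(1)), float(stats.group(1))


spine_line = ("12/11/2007 05:13:47 PM - SPINE: Poller[0] Time: 46.8120 s, "
              "Threads: 8, Hosts: 265")
stats_line = ("12/11/2007 05:15:14 PM - SYSTEM STATS: Time:134.5982 Method:spine "
              "Processes:1 Threads:8 Hosts:265 HostsPerProcess:265 "
              "DataSources:13658 RRDsProcessed:4589")

spine_s, total_s = poll_timings(spine_line, stats_line)
print(f"spine {spine_s:.1f} s, total {total_s:.1f} s, "
      f"after spine {total_s - spine_s:.1f} s")
```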

BSOD2600 · Post by **BSOD2600** » Wed Dec 12, 2007 10:37 am

Turn the Cacti logging level to high or debug to see what Cacti is doing under the covers.

briang · Post by **briang** » Thu Dec 13, 2007 11:13 am

After debugging, it turns out that the RRD updates themselves were taking the ~85 secs, about 60% of the poll cycle (my initial assumption was incorrect).

I removed some of the less critical data sources and installed the Boost plugin using the MyISAM version, and now I'm seeing excellent performance. The entire polling cycle now takes 35 secs, and after initial testing my graph gaps have disappeared. Even Boost's mass RRD update has only a nominal hit on performance.

Bottom line: I knocked 80 secs off each poll. Here are my stats now:

12/13/2007 10:04:26 AM - SPINE: Poller[0] Time: 25.8280 s, Threads: 8, Hosts: 265

12/13/2007 10:04:35 AM - SYSTEM STATS: Time:35.3552 Method:spine Processes:1 Threads:8 Hosts:265 HostsPerProcess:265 DataSources:13658 RRDsProcessed:0
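Using the figures quoted in this thread, the time spent after spine finishes drops from roughly 88 s to under 10 s. The sketch below just does that arithmetic on the two runs, assuming SYSTEM STATS `Time` is the full cycle including spine; `post_spine` is a hypothetical helper. RRDsProcessed:0 in the later line suggests the RRD writes were deferred to Boost's batched update rather than done in that cycle.

```python
# (spine time, total cycle time) in seconds, copied from the log lines in this thread.
runs = {
    "12/11 05:15 PM run": (46.8120, 134.5982),
    "12/13 10:04 AM run (Boost)": (25.8280, 35.3552),
}


def post_spine(spine_s: float, total_s: float) -> tuple[float, float]:
    """Seconds spent after spine finished, and that time as a share of the whole cycle."""
    extra = total_s - spine_s
    return extra, extra / total_s


for label, (spine_s, total_s) in runs.items():
    extra, share = post_spine(spine_s, total_s)
    print(f"{label}: {extra:.1f} s after spine ({share:.0%} of {total_s:.1f} s)")
```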

Thanks for the help.
