Gaps/spikes in CPU graph, other graphs ok.


hoganw
Posts: 26
Joined: Wed Jan 26, 2005 12:54 pm
Location: Carlsbad, CA

Post by hoganw »

Hi everyone.

I'm having an issue (well, a couple, but this one has been bugging me the most). I have Cacti 0.8.6b running on a RedHat AS3.0 box configured with about 110 devices and 1100 data sources, polling with cactid. Works great! Now, the problem I'm having is with the CPU graph for another AS3.0 box. I have graphs for disk, load average, advanced memory, CPU and network traffic. All the graphs report as they should, except for the CPU. The specs of that machine are:

4x P4 Xeon 3.0 GHz (HT), 16 GB RAM, RHEL AS3.0, net-snmp 5.0.9-2.30E.3. Here's what the CPU graph looks like (PMDCDB-001) and a working graph (PMDB-004). Both machines have identical configs: AS3, the same version of net-snmp, the same CPUs, the same memory. The only difference is that the one with the broken graph gets really hammered; however, the load average and other graphs for the same server are 100% OK. Also notice that the maximum on the broken graph is 163%...

Ideas? Thanks!
Attachments
Working graph
graph_image2.php.png (7.44 KiB)
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

I would suggest either upgrading your cacti installation to 0.8.6c or applying all the patches for 0.8.6b.

You can locate full version here:

http://www.cacti.net/download_cacti.php

And patches here:

http://www.cacti.net/download_patches.php
Tony Roman
Experience is what causes a person to make new mistakes instead of old ones.
There are only 3 ways to complete a project: Good, Fast or Cheap, pick two.
With age comes wisdom, what you choose to do with it determines whether or not you are wise.

Post by hoganw »

We just put another server in place yesterday; it got beaten on heavily for an hour during the night and shows the same CPU graph issues.

I'll upgrade/apply patches for 'c' and see what happens, but I don't see anything in the patch list that seems related to the problem.

Post by rony »

Let me know the results.

Post by hoganw »

I haven't seen any gaps in the past couple of hours, but sometimes that happens anyway. However, I'm only seeing one line in the graph, for system CPU, and none for user. If you look at the broken graph, you'll see the blue "user" CPU line show up at random, but it is largely missing. That is still the case after patching.

Post by hoganw »

Upgraded to 'c' and still no good. Today's graph is better than yesterday's, but the graphs aren't always that horrid anyway, so no real change.

Applied the patches for 'c' and will keep watching. On a side note, I'm 99% sure it's a graphing problem, since other polls for the same host are fine, but is there an easy way to check that the data is actually there to graph where the gaps are?
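One way to check is `rrdtool fetch`, which dumps the consolidated rows stored in an RRD; samples that RRDtool recorded as UNKNOWN (the gaps in a graph) print as `nan`. A minimal Python sketch, assuming `rrdtool` is on the PATH and the RRD path is passed on the command line; the script and function names (`rrd_gaps.py`, `parse_fetch`, `unknown_timestamps`) are hypothetical:

```python
import math
import subprocess
import sys


def parse_fetch(text):
    """Parse `rrdtool fetch` output into (timestamp, value) pairs.

    Only the first data source column is kept (Cacti's per-DS RRDs have one).
    Unknown samples print as "nan", "-nan" or "NaN"; float() maps all to NaN.
    """
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not stamp.strip().isdigit() or not fields:
            continue  # header line with DS names, or a blank line
        rows.append((int(stamp), float(fields[0])))
    return rows


def unknown_timestamps(rows):
    """Timestamps whose value is UNKNOWN (NaN), i.e. the gaps in the graph."""
    return [ts for ts, value in rows if math.isnan(value)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 rrd_gaps.py /path/to/file.rrd
    out = subprocess.run(
        ["rrdtool", "fetch", sys.argv[1], "AVERAGE", "--start", "-1d"],
        capture_output=True, text=True, check=True,
    ).stdout
    gaps = unknown_timestamps(parse_fetch(out))
    print(f"{len(gaps)} unknown samples in the last day")
    for ts in gaps:
        print(ts)
```

If the timestamps of the gaps line up with `nan` rows, the data never made it into the RRD, so the problem is upstream of graphing.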

Post by rony »

The only thing I can suggest is writing a script that logs the retrieved value to a file, with a timestamp, and running it every 5 minutes.

This way, you can see what is happening at those times.
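A minimal sketch of such a script in Python, assuming the net-snmp `snmpget` command is installed (`-Oqv` makes it print the bare value only). The OIDs are the UCD-SNMP raw CPU counters polled in this thread; the host, the community string `public`, the log path and the function names are hypothetical placeholders:

```python
import subprocess
import time

# Hypothetical settings: substitute your own host, community and log path.
HOST = "pmdcdb-001"
COMMUNITY = "public"
LOG_PATH = "/tmp/cpu_raw.log"
# UCD-SNMP raw CPU counters, as polled by cactid in this thread.
OIDS = {
    "cpu_user": ".1.3.6.1.4.1.2021.11.50.0",
    "cpu_nice": ".1.3.6.1.4.1.2021.11.51.0",
    "cpu_system": ".1.3.6.1.4.1.2021.11.52.0",
}


def snmp_get(host, community, oid):
    """Return the bare value of one OID via net-snmp's snmpget (SNMP v1)."""
    result = subprocess.run(
        ["snmpget", "-v1", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def format_line(ts, values):
    """One log line: epoch timestamp, then name=value pairs in sorted order."""
    pairs = " ".join(f"{name}={values[name]}" for name in sorted(values))
    return f"{ts} {pairs}"


def log_once(path=LOG_PATH):
    """Poll every OID once and append a timestamped line to the log."""
    ts = int(time.time())
    values = {name: snmp_get(HOST, COMMUNITY, oid) for name, oid in OIDS.items()}
    with open(path, "a") as fh:
        fh.write(format_line(ts, values) + "\n")


if __name__ == "__main__":
    log_once()
```

Run it from cron every 5 minutes (for example `*/5 * * * * python3 /path/to/log_cpu.py`, path hypothetical) and compare the logged counters against the gaps in the graph.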
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

If you are indeed at 0.8.6c, please review your SNMP timeouts and post some of your log output. If necessary, change logging to DEBUG and review it for errors. You may have to run poller.php from the command line to get really useful errors in DEBUG mode.

Thanks,

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?

Post by hoganw »

Verified 0.8.6c; the SNMP timeout is 500 ms, and the average query time under 'Devices' is ~3 ms.

On a side note, which I'm going to check up on some more: 64-bit counters don't work for any data source. I'm trying to graph multiple gig-E interfaces, and whenever I set them to use 64-bit counters I get empty graphs.
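The arithmetic behind needing 64-bit counters on gig-E can be sketched as follows, assuming the link runs at full line rate in one direction and a 300-second poll. A 32-bit octet counter (IF-MIB `ifInOctets`/`ifOutOctets`) holds 2^32 bytes before rolling over; the 64-bit `ifHCInOctets`/`ifHCOutOctets` are Counter64 objects, which SNMPv1 cannot carry:

```python
def wrap_seconds(link_bps, counter_bits=32):
    """Seconds for an octet counter to wrap with the link saturated in one direction."""
    octets_per_second = link_bps / 8
    return 2 ** counter_bits / octets_per_second


POLL_INTERVAL = 300  # seconds; the 5-minute step used in this thread

for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    for bits in (32, 64):
        secs = wrap_seconds(bps, bits)
        if secs < POLL_INTERVAL:
            risk = "can wrap more than once per poll"
        else:
            risk = "at most one wrap per poll"
        print(f"{label}, {bits}-bit counter: wraps every {secs:,.1f} s ({risk})")
```

At 100 Mbit/s a 32-bit counter wraps roughly every 344 s, just above the poll interval, which an RRD COUNTER can still correct for; at 1 Gbit/s it wraps roughly every 34 s, so several rollovers can hide inside one 300 s step and the real rate is lost.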

Data source debug:

/usr/local/bin/rrdtool create \
/usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd \
--step 300 \
DS:cpu_system:COUNTER:600:0:100 \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MIN:0.5:1:600 \
RRA:MIN:0.5:6:700 \
RRA:MIN:0.5:24:775 \
RRA:MIN:0.5:288:797 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797 \
RRA:LAST:0.5:1:600 \
RRA:LAST:0.5:6:700 \
RRA:LAST:0.5:24:775 \
RRA:LAST:0.5:288:797

Poller DEBUG output for CPU specific to host with problem:

01/27/2005 12:03:46 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_nice, oid: .1.3.6.1.4.1.2021.11.51.0, value: 4
01/27/2005 12:03:46 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_system, oid: .1.3.6.1.4.1.2021.11.52.0, value: 449174784
01/27/2005 12:03:46 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_user, oid: .1.3.6.1.4.1.2021.11.50.0, value: 1823947020
01/27/2005 12:03:51 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_user_63.rrd --template cpu_user 1106856225:1823947020
01/27/2005 12:03:51 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd --template cpu_system 1106856225:449174784
01/27/2005 12:03:51 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_nice_61.rrd --template cpu_nice 1106856225:4
01/27/2005 12:05:02 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_nice, oid: .1.3.6.1.4.1.2021.11.51.0, value: 4
01/27/2005 12:05:02 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_system, oid: .1.3.6.1.4.1.2021.11.52.0, value: 449184063
01/27/2005 12:05:02 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_user, oid: .1.3.6.1.4.1.2021.11.50.0, value: 1823987348
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_nice_61.rrd --template cpu_nice 1106856300:4
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd --template cpu_system 1106856300:449184063
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_user_63.rrd --template cpu_user 1106856300:1823987348
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_nice_61.rrd --template cpu_nice 1106856300:4
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd --template cpu_system 1106856300:449184063
01/27/2005 12:05:03 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_user_63.rrd --template cpu_user 1106856300:1823987348
01/27/2005 12:10:04 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_nice, oid: .1.3.6.1.4.1.2021.11.51.0, value: 4
01/27/2005 12:10:04 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_system, oid: .1.3.6.1.4.1.2021.11.52.0, value: 449222000
01/27/2005 12:10:04 PM - CACTID: Poller[0] Host[9] SNMP: v1: pmdcdb-001, dsname: cpu_user, oid: .1.3.6.1.4.1.2021.11.50.0, value: 1824142214
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd --template cpu_system 1106856600:449222000
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_nice_61.rrd --template cpu_nice 1106856600:4
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_user_63.rrd --template cpu_user 1106856600:1824142214
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_system_62.rrd --template cpu_system 1106856600:449222000
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_nice_61.rrd --template cpu_nice 1106856600:4
01/27/2005 12:10:10 PM - POLLER: Poller[0] CACTI2RRD: /usr/local/bin/rrdtool update /usr/local/share/cacti/rra/pmdcdb001_cpu_user_63.rrd --template cpu_user 1106856600:1824142214
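A COUNTER data source stores the per-second rate between updates, and RRDtool records any computed rate outside the DS min/max as UNKNOWN. A minimal sketch that applies that check to the raw values in the log above, assuming the cpu_user RRD was created with the same `0:100` bounds as the cpu_system one shown (counter wraps are ignored; none occur in these samples):

```python
# Raw (timestamp, counter) samples copied from the DEBUG log above.
SAMPLES = {
    "cpu_user": [(1106856225, 1823947020), (1106856300, 1823987348), (1106856600, 1824142214)],
    "cpu_system": [(1106856225, 449174784), (1106856300, 449184063), (1106856600, 449222000)],
}
# Bounds from "DS:cpu_system:COUNTER:600:0:100"; assumed identical for cpu_user.
DS_MIN, DS_MAX = 0, 100


def counter_rates(samples):
    """Per-second rate between consecutive samples, as a COUNTER DS derives it."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


for ds, samples in SAMPLES.items():
    for ts, rate in counter_rates(samples):
        if DS_MIN <= rate <= DS_MAX:
            verdict = "stored"
        else:
            verdict = "outside min/max, stored as UNKNOWN"
        print(f"{ds} @ {ts}: {rate:.1f}/s -> {verdict}")
```

With these values every interval comes out above 100 per second (roughly 538 and 516 for cpu_user, 124 and 126 for cpu_system), so on a busy box a maximum of 100 would discard those samples and leave exactly the kind of gaps described in this thread.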
Attachments
Recent broken graph from today
graph_image3.php.png (8.06 KiB)

Post by TheWitness »

Please run poller.php from the command line in DEBUG mode, redirect the output to a file, and post that file. Sorry for the confusion: 64-bit counters only work with SNMP v2.

TheWitness

Post by hoganw »

Here are two outputs, one done a few mins after the other.

I changed to v2 and 64-bit counters. That corrected the data collected before, but I get no new data since making the change. When I go to the data sources and execute a verbose query, it works fine. I'm seeing these errors in cacti.log:

01/27/2005 03:10:02 PM - CACTID: Poller[0] Host[10] WARNING: Result from SNMP not valid. Partial Result: No Such Object avail...
01/27/2005 03:10:02 PM - CACTID: Poller[0] Host[37] WARNING: Result from SNMP not valid. Partial Result: No Such Object avail...
01/27/2005 03:10:02 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '79-cpu_nice-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '88-cpu_nice-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '1230-traffic_in-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '246-traffic_in-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '1005-traffic_in-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '1026-traffic_in-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] Host[10] WARNING: Result from SNMP not valid. Partial Result: No Such Object avail...
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '283-cpu_system-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] ERROR: Problem with MySQL: Duplicate entry '71-cpu_system-2005-01-27 15:10:00' for key 1
01/27/2005 03:10:03 PM - CACTID: Poller[0] Host[37] WARNING: Result from SNMP not valid. Partial Result: No Such Object avail...

The only change was to v2 and 64-bit counters. Only the 64-bit counter graphs are missing new data, so v2 itself is OK, since it can still poll the other data sources.
Attachments
poller-output.tar.gz
Archive contains 2 poller.php outputs.
(34.98 KiB)

Post by hoganw »

I let the updated Cacti run over the weekend using SNMP v2; still no change in the graphs.

What next? :) Any insight from the poller logs?

Post by TheWitness »

Please ensure that you only have one cron job running poller.php or any other component of Cacti. It appears that you are getting errors with updates to RRD files due to duplicate attempts to log the same data. This is a classic symptom of the job running twice at the same time. Otherwise, the output looks good.

Here is an example of the error from your second file:
ERROR: illegal attempt to update using time 1106865374 when last update time is 1106865374 (minimum one second step)
Sorry about the delay!
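The DEBUG output posted earlier already shows the pattern: at 12:05:03 and 12:10:10 each CPU RRD is updated twice with the same timestamp. A minimal Python sketch that counts such duplicates in cacti.log, assuming the `CACTI2RRD: ... rrdtool update <file> --template <ds> <time>:<value>` line format shown in this thread; the script and function names are hypothetical:

```python
import re
import sys
from collections import Counter

# Matches CACTI2RRD lines such as
# "... rrdtool update /path/x.rrd --template cpu_nice 1106856300:4".
# \s+ also tolerates the timestamp being wrapped onto the next line.
UPDATE_RE = re.compile(r"rrdtool update (\S+\.rrd) --template \S+\s+(\d+):")


def duplicate_updates(log_text):
    """Return {(rrd_path, timestamp): count} for updates issued more than once."""
    counts = Counter(UPDATE_RE.findall(log_text))
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_dup_updates.py /path/to/cacti.log
    with open(sys.argv[1]) as fh:
        dups = duplicate_updates(fh.read())
    for (rrd, ts), n in sorted(dups.items()):
        print(f"{rrd}: {n} updates for timestamp {ts}")
```

Any hit means two pollers handed RRDtool the same timestamp, and the second update fails with the "minimum one second step" error quoted above.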

TheWitness

Post by hoganw »

Yup, only one cron job is running. I have seen random periods where the 296-second timeout gets triggered; perhaps another poller is getting kicked off before the last one finishes because of the timeout?
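If a slow run really does overlap the next cron invocation, a generic guard is an exclusive, non-blocking lock around the poller, so a second copy exits instead of updating the same RRDs. This is a general Unix technique, not a Cacti 0.8.6c feature; a minimal Python sketch using `fcntl.flock`, where the lock path, the `php` binary name and the poller.php location are assumptions:

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/cacti_poller.lock"  # hypothetical lock-file location
POLLER_CMD = ["php", "/usr/local/share/cacti/poller.php"]  # assumed install path


def try_lock(path):
    """Return an open, exclusively locked file object, or None if already locked."""
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fh.close()  # another run holds the lock
        return None
    return fh


if __name__ == "__main__":
    lock = try_lock(LOCK_PATH)
    if lock is None:
        print("previous poller run still active; skipping", file=sys.stderr)
        sys.exit(1)
    with lock:  # closing the file releases the lock
        subprocess.run(POLLER_CMD, check=False)
```

Cron would then call this wrapper instead of poller.php directly; a skipped run leaves a gap for that interval rather than a double update.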

At this point the odd CPU graphs are mostly just annoying. I'm more interested in getting the 64-bit counter problem figured out, since more and more of our servers are pushing data at speeds > 100 Mbit. I suppose I'll do a search and post a new thread on the subject if I can't find anything, since it doesn't fit in this one. :)

Thanks for the help so far, I appreciate it!

-Hogan

Post by hoganw »

Just to update: I did some experiments with the CPU graphs, and part of the problem is that the busy servers have CPU usage that "exceeds" 100%. I have one graph showing CPU usage spiking at around 170%.

These are all dual or quad CPU machines with hyperthreading, so they report either 4 or 8 CPUs to the OS.
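That follows if the raw counters are summed over all logical CPUs: a box reporting 8 CPUs can legitimately produce up to 8 times the rate of a single CPU. A minimal sketch, assuming the counters advance 100 ticks per second per logical CPU (the tick rate and the function names are assumptions, not taken from the thread), showing the largest valid rate and how a rate converts to "percent of one CPU" versus "percent of the whole box":

```python
# Assumption: the raw counters advance 100 ticks per second per logical CPU.
TICKS_PER_SECOND = 100


def max_rate(logical_cpus):
    """Largest per-second rate a fully busy box can produce."""
    return TICKS_PER_SECOND * logical_cpus


def cpu_percent(rate, logical_cpus, normalize=True):
    """Turn a counter rate (ticks/s) into percent busy.

    normalize=False gives "percent of one CPU" (can exceed 100%);
    normalize=True divides by the logical CPU count (0-100%).
    """
    pct = rate / TICKS_PER_SECOND * 100
    return pct / logical_cpus if normalize else pct


for cpus in (4, 8):
    print(f"{cpus} logical CPUs: max rate {max_rate(cpus)}/s")
# ~537.7 ticks/s is the cpu_user rate implied by the 12:03-12:05 samples in the log.
print(f"{cpu_percent(537.7, 8, normalize=False):.1f}% of one CPU")
print(f"{cpu_percent(537.7, 8):.1f}% of the whole box")
```

Under this assumption a DS maximum would need to be at least `max_rate(n)` (800 for 8 logical CPUs) to keep busy samples, or the graph would need to divide by the CPU count to stay within 0-100%.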