regular gaps in cpu graphs

repudi8or
Posts: 27
Joined: Tue Sep 25, 2007 7:41 pm

Post by repudi8or »

Hi folks,

I have very regular gaps in my graphs: it seems that one polling period every hour is NaN. However, it's only for some metrics, on some hosts (out of approx. 200).

Any suggestions would be appreciated.

Following the "debug NaNs" guide, I have gathered this additional info:

-bash-3.00$ /usr/local/rrdtool-1.2.19/bin/rrdtool fetch monitorme_cpu_idle_785.rrd AVERAGE
cpu_idle


1192757400: 8.3243629406e+01
1192757700: 7.9383902625e+01
1192758000: 8.3658311111e+01
1192758300: 7.9825566729e+01
1192758600: 8.2972681469e+01
1192758900: 8.2169877778e+01
1192759200: 8.3598666667e+01
1192759500: 8.1111479376e+01
1192759800: 8.3635057723e+01
1192760100: 8.3807308970e+01
1192760400: NaN
1192760700: 8.0660000000e+01
1192761000: 8.4805181395e+01
1192761300: 8.6270015037e+01
1192761600: 8.8554983278e+01
1192761900: 8.4548146757e+01
1192762200: 8.4576943522e+01
1192762500: 8.4789502809e+01
1192762800: 8.2883310037e+01
1192763100: 7.4099006054e+01
1192763400: 8.1565544444e+01
1192763700: 8.2110000000e+01
1192764000: NaN
1192764300: 8.1607973422e+01
1192764600: 8.3575048608e+01
1192764900: 8.1753191475e+01
1192765200: 8.6133977275e+01
1192765500: 7.9706354292e+01
1192765800: 8.0761866667e+01
1192766100: 8.1509805316e+01
1192766400: 8.3751261351e+01
1192766700: 7.9394633333e+01
1192767000: 8.3153795238e+01
1192767300: 8.3428571429e+01
1192767600: NaN
1192767900: 8.0357859532e+01

-bash-3.00$

Every 12th entry is NaN.
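The spacing of the NaN rows can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming the fetch output is available as text in the "timestamp: value" layout shown above; the function names are hypothetical and only a few rows from the output are embedded as sample data.

```python
import math

# A few rows copied from the fetch output above ("timestamp: value").
FETCH_OUTPUT = """\
1192760100: 8.3807308970e+01
1192760400: NaN
1192760700: 8.0660000000e+01
1192763700: 8.2110000000e+01
1192764000: NaN
1192764300: 8.1607973422e+01
1192767300: 8.3428571429e+01
1192767600: NaN
1192767900: 8.0357859532e+01
"""


def parse_fetch(text):
    """Parse 'timestamp: value' rows, skipping headers and blank lines."""
    rows = []
    for line in text.splitlines():
        stamp, sep, value = line.partition(":")
        stamp = stamp.strip()
        if not sep or not stamp.isdigit():
            continue  # e.g. the 'cpu_idle' header line or the shell prompt
        rows.append((int(stamp), float(value)))  # float() accepts 'NaN'
    return rows


def nan_timestamps(rows):
    """Return the timestamps whose value is NaN."""
    return [stamp for stamp, value in rows if math.isnan(value)]


rows = parse_fetch(FETCH_OUTPUT)
gaps = nan_timestamps(rows)
spacing = [later - earlier for earlier, later in zip(gaps, gaps[1:])]
offsets = [stamp % 3600 for stamp in gaps]

print("NaN rows at:", gaps)
print("seconds between NaN rows:", spacing)
print("offset within the hour (seconds):", offsets)
```

With a 300 second step, a spacing of 3600 seconds corresponds to every 12th row, and an identical offset for every gap means the gap falls at the same minute of each hour.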

If I check the cactid log, I see values for every poll:
10/19/2007 02:20:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639313241
10/19/2007 02:25:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639337268
10/19/2007 02:30:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639361700
10/19/2007 02:35:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639384953
10/19/2007 02:40:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639409634
10/19/2007 02:45:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639432872
10/19/2007 02:50:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639457442
10/19/2007 02:55:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639479473
10/19/2007 03:00:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639501023
10/19/2007 03:05:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639522549
10/19/2007 03:10:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639547413
10/19/2007 03:15:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639570604
10/19/2007 03:20:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639596058
10/19/2007 03:25:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639620261
10/19/2007 03:30:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639645265
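Because the data source is a COUNTER, what ends up in the RRD is a per-second rate derived from consecutive raw readings, not the raw value itself. The sketch below is a rough approximation of that calculation in Python using the first few log entries above. It assumes the log timestamps are close enough to the actual update times, it ignores counter wrap, and it does not reproduce RRDtool's normalisation to step boundaries, so the results will only be near the stored values. The function name is hypothetical.

```python
from datetime import datetime

# (log timestamp, raw counter value) pairs copied from the cactid log above.
SAMPLES = [
    ("10/19/2007 02:20:25 PM", 639313241),
    ("10/19/2007 02:25:24 PM", 639337268),
    ("10/19/2007 02:30:26 PM", 639361700),
    ("10/19/2007 02:35:24 PM", 639384953),
]


def counter_rates(samples, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Return the per-second rate between each pair of consecutive samples.

    Simplification: no counter-wrap handling and no step normalisation.
    """
    parsed = [(datetime.strptime(stamp, fmt), value) for stamp, value in samples]
    rates = []
    for (t0, v0), (t1, v1) in zip(parsed, parsed[1:]):
        seconds = (t1 - t0).total_seconds()
        rates.append((v1 - v0) / seconds)
    return rates


rates = counter_rates(SAMPLES)
for rate in rates:
    print(f"{rate:.2f}")
```

The rates computed this way fall in the same range as the values returned by rrdtool fetch, which suggests the raw readings themselves are plausible for every poll shown.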

When I run cactid --verbosity=5 for this host, all looks fine. Values are retrieved OK from the agent and inserted OK into the database.

An rrdtool info shows:
filename = "monitorme_cpu_idle_785.rrd"
rrd_version = "0003"
step = 300
last_update = 1193005518
ds[cpu_idle].type = "COUNTER"
ds[cpu_idle].minimal_heartbeat = 600
ds[cpu_idle].min = 0.0000000000e+00
ds[cpu_idle].max = 1.0000000000e+02
ds[cpu_idle].last_ds = "659832327"
ds[cpu_idle].value = 1.4316040268e+03
ds[cpu_idle].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 4.7798848564e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 2.1612519690e+04
rra[3].cdp_prep[0].unknown_datapoints = 22

Hmmm, the min and max look OK; it's a percentage, so it should be from 0-100. However, the value shown exceeds the range. Could this be my issue? Then again, some of the other values also appear to be out of range (i.e. rra[3].cdp_prep[0].value).
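For context on the min/max question: RRDtool's documentation describes min and max as applying to the processed value of the data source, so for a COUNTER they constrain the computed per-second rate, not the raw counter, and a rate outside the range is recorded as unknown. The Python sketch below only illustrates that rule; the function name and sample numbers are hypothetical, and it makes no claim about what the ds[cpu_idle].value field itself represents.

```python
import math


def apply_range(rate, minimum=0.0, maximum=100.0):
    """Return the rate unchanged, or NaN if it lies outside [minimum, maximum].

    Mirrors the documented rule that out-of-range processed values are
    treated as unknown; the defaults match the min/max shown by rrdtool info.
    """
    if math.isnan(rate) or rate < minimum or rate > maximum:
        return math.nan
    return rate


print(apply_range(80.66))   # inside the range, kept as-is
print(apply_range(143.2))   # hypothetical out-of-range rate, becomes nan
```

If a single poll produced a rate above 100 (for example after a counter reset), this rule alone would be enough to turn that interval into a gap.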

Checking the poller_output table looks OK:
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

What else can I check?

Regards Repudi8or
Attachment: gaps.PNG
TheWitness
Developer
Posts: 16997
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Redirect your crontab's stdout and stderr to a file for a few hours and look for error messages. Post your findings.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?

Post by repudi8or »

OK, I redirected my cron job output to /tmp/poller.out

Mostly it's "OK" or "Waiting on x/y pollers". However, when I grep -v those out, I see random timeouts:

Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4902:161
Timeout: No Response from pollmep4902:161
10/22/2007 01:10:26 PM - SYSTEM STATS: Time:25.3602 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
Timeout: No Response from pollmep4903:161
Timeout: No Response from pollmep4903:161
10/22/2007 01:15:27 PM - SYSTEM STATS: Time:26.5192 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
Timeout: No Response from pollmep4900:161
Timeout: No Response from pollmep4900:161
10/22/2007 01:20:27 PM - SYSTEM STATS: Time:26.5436 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
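To see whether the timeouts cluster on particular hosts, the redirected poller output can be tallied per host. This is a minimal Python sketch, assuming the timeout lines have exactly the "Timeout: No Response from host:port" form shown above. The "OK" and "Waiting on" lines in the sample are illustrative stand-ins for the routine messages mentioned in the post, and the function name is hypothetical.

```python
import re
from collections import Counter

# Sample poller output. The timeout lines are copied from the post above;
# the 'OK' and 'Waiting on' lines are illustrative stand-ins.
POLLER_OUTPUT = """\
OK
Waiting on 3/10 pollers.
Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4902:161
Timeout: No Response from pollmep4902:161
Timeout: No Response from pollmep4903:161
Timeout: No Response from pollmep4903:161
Timeout: No Response from pollmep4900:161
Timeout: No Response from pollmep4900:161
"""

TIMEOUT_RE = re.compile(r"^Timeout: No Response from (?P<host>[^:\s]+):(?P<port>\d+)$")


def timeouts_per_host(text):
    """Count SNMP timeout lines per host, ignoring every other line."""
    counts = Counter()
    for line in text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group("host")] += 1
    return counts


counts = timeouts_per_host(POLLER_OUTPUT)
for host, count in sorted(counts.items()):
    print(host, count)
```

In the excerpt above each affected host appears exactly twice, which would fit one initial attempt plus one retry per timed-out request, though the output alone does not confirm that.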


So I will increase the timeout and poller retries values to see if that gets around these timeouts. Thanks for the hint, Witness.

Regards Rep.

Post by repudi8or »

OK, so the timeout errors have gone after I increased the timeout value and the retries. HOWEVER, I still have the regular gaps in my graphs, and I have some new messages in the poller.log:


Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12


I can't find any data source that should poll that OID. Weird!

Post by TheWitness »

Reduce your Max OIDs for the system as well. It would appear that the SNMP agent is not handling things well and is bombing (crashing, maybe?).

For Cactid, Max OIDs per get request is under Settings->Poller.

TheWitness

Post by repudi8or »

Max OIDs was set to 10; I have reduced it to 5. Still getting hourly gaps.

Post by TheWitness »

You need to send me more of your logs from around the times of the errors. Also, what about the graphs? Is it all graphs or simply some of them? The other thing it could be, and the reason I asked you to redirect your cron job's stdout and stderr to a file, is a creeping time sync. If you are on a VM, you need to get off it. Also, maybe set your driftfile a bit tighter.

Regards,

TheWitness

Post by repudi8or »

I sent the logs via email.

In answer to the other things:

It's a majority of graphs for some specific metric types (i.e. CPU and network traffic), but the gaps appear for about 80% of hosts.

Not sure how to identify a creeping time sync. Do you mean where the poller is running from? The poller process IS running in a Solaris zone (sort of like a VM). The drift file is empty.

Regards Repudi8or

Post by TheWitness »

Turn off NTP sync for a few hours and see if the problem goes away.

TheWitness

Post by TheWitness »

Also, decrease the number of processes and increase the number of threads. Processes should not be more than 2x cores.

TheWitness

Post by repudi8or »

I have decreased the number of processes to match the number of cores (the Cacti implementation is in a container on a T2000, which is quad core). Thus processes is now 4.

Increased the number of threads to 20.

Turned off NTP.

I will give it an hour or two and report the outcome.

I just wanted to mention that other metrics are graphing OK (i.e. no gaps). Things like load average, memory, swap, filesystems, users and processes are all fine. It's only affecting CPU and network traffic graphs (see attached pic).

Regards Rep
Attachment: p4901.PNG

Post by TheWitness »

Well, looking at your graphs, I would say that it is not a time issue. I would have to see the cron output to get more detail. This is quite odd. It may be a two crontab issue, but I am uncertain as to the composition of your graphs (aka Data Sources).

TheWitness

Post by repudi8or »

Hi Witness, thanks very much for your assistance thus far.

You are correct about it not being time related. NTP enabled or disabled makes no difference. None of the other changes (processes, threads) have made a difference either. I still have the gaps.

I am using "UNIX Template Set for Cacti: HP-UX, Solaris, Linux" for all but the CPU graph. The CPU one is "CPU Usage for Solaris".

The data sources for the CPU graph, for example, are:

 <name>ucd/net - CPU Usage - Idle by balint</name> 
- <ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Idle</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
- <items>
- <hash_0800029bc1889337106bea2e0ded25a868ba9f>
  <t_data_source_name /> 
  <data_source_name>cpu_idle</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800029bc1889337106bea2e0ded25a868ba9f>
  </items>
- <data>
- <item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.53.0</value> 
  </item_000>
- <item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
- <item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
- <item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
- <item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
- <item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002fde318f91908642c62c30d3eb4faa029>
- <hash_010002a0ba6b77fc8afeca43436d8ff086d319>
  <name>ucd/net - CPU Usage - System by balint</name> 
- <ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - System</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
- <items>
- <hash_080002a69c46b4d0cdbef3b4bc5f9bb9229188>
  <t_data_source_name /> 
  <data_source_name>cpu_system</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_080002a69c46b4d0cdbef3b4bc5f9bb9229188>
  </items>
- <data>
- <item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.52.0</value> 
  </item_000>
- <item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
- <item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
- <item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
- <item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
- <item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002a0ba6b77fc8afeca43436d8ff086d319>
- <hash_0100022c90edd564d5fe11496a4f563740eb9a>
  <name>ucd/net - CPU Usage - User by balint</name> 
- <ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - User</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
- <items>
- <hash_080002f6f4541d553ff538ce5a4f9b7389a685>
  <t_data_source_name /> 
  <data_source_name>cpu_user</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_080002f6f4541d553ff538ce5a4f9b7389a685>
  </items>
- <data>
- <item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.50.0</value> 
  </item_000>
- <item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
- <item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
- <item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
- <item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
- <item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_0100022c90edd564d5fe11496a4f563740eb9a>
- <hash_010002b9c2d1f751b4130aa9a61863cc8aa4f4>
  <name>ucd/net - CPU Usage - Wait by balint</name> 
- <ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Wait</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
- <items>
- <hash_0800025ad130a3e183b3e01d2a62f9488798e4>
  <t_data_source_name /> 
  <data_source_name>cpu_wait</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800025ad130a3e183b3e01d2a62f9488798e4>
  </items>
- <data>
- <item_000>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_000>
- <item_001>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.54.0</value> 
  </item_001>
- <item_002>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
- <item_003>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
- <item_004>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
- <item_005>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002b9c2d1f751b4130aa9a61863cc8aa4f4>
- <hash_010002490d340faa55c47cdcffd39d12e286b4>
  <name>ucd/net - CPU Usage - Kernel by balint</name> 
- <ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Kernel</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
- <items>
- <hash_0800029fdc17072e43c68e07cd163673443206>
  <t_data_source_name /> 
  <data_source_name>cpu_kernel</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800029fdc17072e43c68e07cd163673443206>
  </items>
- <data>
- <item_000>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_000>
- <item_001>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.55.0</value> 
  </item_001>
- <item_002>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
- <item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
- <item_004>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
- <item_005>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002490d340faa55c47cdcffd39d12e286b4>
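The fields in the template export that matter for the gap question are the data source type, the min/max range and the heartbeat. The Python sketch below pulls those out of a single item. It assumes a cleaned-up fragment (the "- " tree markers removed and a hypothetical <item> wrapper standing in for the hash-named element), and the type-id mapping is an assumption about Cacti's numbering that happens to agree with the COUNTER type reported by rrdtool info earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Assumed mapping of Cacti data_source_type_id values to RRDtool DS types.
DS_TYPES = {1: "GAUGE", 2: "COUNTER", 3: "DERIVE", 4: "ABSOLUTE"}

# Cleaned-up fragment of the cpu_idle item from the export above.
# The <item> wrapper is hypothetical; the export uses a hash-named element.
ITEM_XML = """\
<item>
  <data_source_name>cpu_idle</data_source_name>
  <rrd_minimum>0</rrd_minimum>
  <rrd_maximum>100</rrd_maximum>
  <data_source_type_id>2</data_source_type_id>
  <rrd_heartbeat>600</rrd_heartbeat>
</item>
"""


def summarize(item_xml):
    """Extract the RRD-relevant settings from one data source item."""
    item = ET.fromstring(item_xml)
    type_id = int(item.findtext("data_source_type_id"))
    return {
        "name": item.findtext("data_source_name"),
        "type": DS_TYPES.get(type_id, f"unknown({type_id})"),
        "min": float(item.findtext("rrd_minimum")),
        "max": float(item.findtext("rrd_maximum")),
        "heartbeat": int(item.findtext("rrd_heartbeat")),
    }


summary = summarize(ITEM_XML)
print(summary)
```

All five CPU data sources in the export share the same type id, range and heartbeat, so whatever affects one of them would be expected to affect the others in the same way.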
I'm not sure what you mean by "a two crontab issue"; however, there is only a single entry in the cacti user's crontab:

-bash-3.00$ crontab -l
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/php/bin/php /usr/local/apache2/htdocs/cacti/poller.php >> /tmp/poller.log 2>&1

I emailed you the output from poller.log

Regards Rep.

Post by repudi8or »

Hmmm, I increased the cacti.log debugging output to turn everything on and am trying to work my way backwards through the problem. I chose one server and one interval of null values. I went backwards through cacti.log and found the rrdtool updates putting in values of "U" for the CPU metrics. However, earlier in the log I see the SNMP polls successfully return numeric values for the related OIDs (see attached log).

So my question is: what can affect this data between the value being returned by the poller and being stored in the RRD?
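When hunting for "U" values, it can help to extract them from the rrdtool update commands instead of reading the log by eye. The Python sketch below parses the general rrdtool update syntax (an optional --template list of data source names followed by timestamp:value arguments). The sample command lines are hypothetical, constructed for illustration and not taken from the attached log, and the function name is hypothetical too. It also assumes the .rrd path contains no colon.

```python
import shlex


def unknown_fields(command):
    """Return (timestamp, ds_name) pairs whose value is 'U' in an
    `rrdtool update` command line that uses --template / -t."""
    args = shlex.split(command)
    names, updates = [], []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("--template", "-t") and i + 1 < len(args):
            names = args[i + 1].split(":")
            i += 2
            continue
        # Update arguments look like 'timestamp:value[:value...]'.
        if ":" in arg and not arg.startswith("-"):
            updates.append(arg)
        i += 1

    found = []
    for update in updates:
        stamp, *values = update.split(":")
        for name, value in zip(names, values):
            if value == "U":
                found.append((stamp, name))
    return found


# Hypothetical command lines, for illustration only.
single = "rrdtool update monitorme_cpu_idle_785.rrd --template cpu_idle 1193005518:U"
double = "rrdtool update example_traffic.rrd --template traffic_in:traffic_out 1193005518:U:12345"

print(unknown_fields(single))
print(unknown_fields(double))
```

Matching the timestamps of the "U" updates against the poll times of the same data source narrows down the interval in which the numeric value was replaced.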

Regards Rep
Attachment: myhost2.txt

Post by TheWitness »

Is this one of the hosts?

10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] RECACHE: Processing 2 items in the auto reindex cache for 'my4900'
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] ASSERT: '356242' .lt. '26221' failed. Recaching host 'my4900', data query #1
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] NOTICE: Spike Kill in Effect for 'my4900'
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] ASSERT: '356242' .lt. '26221' failed. Recaching host 'my4900', data query #8
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] NOTICE: Spike Kill in Effect for 'my4900'
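A quick arithmetic check on the ASSERT lines above, under an assumption the log itself does not confirm: that the two quoted numbers are sysUpTime readings in SNMP TimeTicks (hundredths of a second), with the first being the previously stored uptime and the second the current one. The helper name below is hypothetical.

```python
def timeticks_to_minutes(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) to minutes."""
    return ticks / 100 / 60


# Values quoted in the ASSERT lines above; their roles are an assumption.
previous_uptime = 356242
current_uptime = 26221

print(f"previous: {timeticks_to_minutes(previous_uptime):.1f} minutes")
print(f"current:  {timeticks_to_minutes(current_uptime):.1f} minutes")
print("uptime went backwards:", current_uptime < previous_uptime)
```

If the assumption holds, the stored value corresponds to just under an hour and the new reading to a few minutes. That would be consistent with the agent's uptime resetting roughly once an hour, the same period as the gaps, and with the "Spike Kill in Effect" notices that follow each failed assertion.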

TheWitness