regular gaps in cpu graphs

Post by **repudi8or** » Sun Oct 21, 2007 5:42 pm

Hi folks,

I have very regular gaps in my graphs: it seems that one polling period every hour is a NaN. However, it's only for some metrics on some hosts (out of approx. 200).

Any suggestions would be appreciated.

Following the "debug nans" guide, I have gathered this additional info :-

-bash-3.00$ /usr/local/rrdtool-1.2.19/bin/rrdtool fetch monitorme_cpu_idle_785.rrd AVERAGE
cpu_idle

1192757400: 8.3243629406e+01
1192757700: 7.9383902625e+01
1192758000: 8.3658311111e+01
1192758300: 7.9825566729e+01
1192758600: 8.2972681469e+01
1192758900: 8.2169877778e+01
1192759200: 8.3598666667e+01
1192759500: 8.1111479376e+01
1192759800: 8.3635057723e+01
1192760100: 8.3807308970e+01
1192760400: NaN
1192760700: 8.0660000000e+01
1192761000: 8.4805181395e+01
1192761300: 8.6270015037e+01
1192761600: 8.8554983278e+01
1192761900: 8.4548146757e+01
1192762200: 8.4576943522e+01
1192762500: 8.4789502809e+01
1192762800: 8.2883310037e+01
1192763100: 7.4099006054e+01
1192763400: 8.1565544444e+01
1192763700: 8.2110000000e+01
1192764000: NaN
1192764300: 8.1607973422e+01
1192764600: 8.3575048608e+01
1192764900: 8.1753191475e+01
1192765200: 8.6133977275e+01
1192765500: 7.9706354292e+01
1192765800: 8.0761866667e+01
1192766100: 8.1509805316e+01
1192766400: 8.3751261351e+01
1192766700: 7.9394633333e+01
1192767000: 8.3153795238e+01
1192767300: 8.3428571429e+01
1192767600: NaN
1192767900: 8.0357859532e+01

-bash-3.00$

Every 12th entry is a NaN.
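To confirm the "every 12th entry" pattern mechanically rather than by eye, a short sketch can pull the NaN rows out of the `rrdtool fetch` output and measure their spacing. It assumes the plain `timestamp: value` layout shown above; the helper names are hypothetical.

```python
# Hypothetical helpers; assumes the "timestamp: value" layout printed by rrdtool fetch.

def nan_timestamps(fetch_output: str) -> list[int]:
    """Return the Unix timestamps whose value is NaN."""
    stamps = []
    for line in fetch_output.splitlines():
        ts, sep, value = line.partition(":")
        ts = ts.strip()
        # Skip the DS-name header, blank lines and the shell prompt.
        if not sep or not ts.isdigit():
            continue
        if value.strip().lower() in ("nan", "-nan"):
            stamps.append(int(ts))
    return stamps


def gaps(stamps: list[int]) -> list[int]:
    """Seconds between consecutive NaN rows."""
    return [b - a for a, b in zip(stamps, stamps[1:])]


STEP = 300  # RRD step, from `rrdtool info`

sample = """cpu_idle

1192760100: 8.3807308970e+01
1192760400: NaN
1192760700: 8.0660000000e+01
1192764000: NaN
1192764300: 8.1607973422e+01
1192767600: NaN
"""

nans = nan_timestamps(sample)
print(nans)                              # [1192760400, 1192764000, 1192767600]
print([g // STEP for g in gaps(nans)])   # [12, 12]: one bad step in every 12
print([ts % 3600 for ts in nans])        # [1200, 1200, 1200]: 20 min past the hour (UTC)
```

Gaps of 3600 s at a 300 s step are exactly one missing PDP per hour, and all three NaNs sit at the same offset within the hour, which suggests something on an hourly schedule rather than random timeouts.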

If I check the cactid log, I see values for every poll:-
10/19/2007 02:20:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639313241
10/19/2007 02:25:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639337268
10/19/2007 02:30:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639361700
10/19/2007 02:35:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639384953
10/19/2007 02:40:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639409634
10/19/2007 02:45:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639432872
10/19/2007 02:50:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639457442
10/19/2007 02:55:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639479473
10/19/2007 03:00:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639501023
10/19/2007 03:05:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639522549
10/19/2007 03:10:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639547413
10/19/2007 03:15:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639570604
10/19/2007 03:20:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639596058
10/19/2007 03:25:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639620261
10/19/2007 03:30:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: v2: monitorme, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 639645265
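The raw values in this log come from OID .1.3.6.1.4.1.2021.11.53.0 (ssCpuRawIdle in the UCD-SNMP-MIB), an ever-increasing counter, and the RRD stores it as a COUNTER with min 0 and max 100, so rrdtool records the per-second rate between samples. A simplified sketch of that calculation using the first two log lines follows; it uses the log timestamps in place of the real update times and ignores step alignment, so the result is approximate. `counter_rate` is a hypothetical name.

```python
from datetime import datetime

# Simplified model of an RRD COUNTER: rate = delta / seconds, rejected if outside min/max.
# rrdtool itself uses the time of each update and then re-aligns onto 300 s steps.

LOG_FMT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "10/19/2007 02:20:25 PM"


def counter_rate(prev: int, cur: int, seconds: float,
                 rrd_min: float = 0.0, rrd_max: float = 100.0):
    """Per-second rate, or None where rrdtool would store UNKNOWN (NaN)."""
    delta = cur - prev
    if delta < 0:
        # A COUNTER that goes down is treated as a wrap; only the 32-bit case is shown.
        delta += 2 ** 32
    rate = delta / seconds
    # Rates outside the DS min/max are discarded as UNKNOWN.
    if rate < rrd_min or rate > rrd_max:
        return None
    return rate


t1 = datetime.strptime("10/19/2007 02:20:25 PM", LOG_FMT)
t2 = datetime.strptime("10/19/2007 02:25:24 PM", LOG_FMT)
elapsed = (t2 - t1).total_seconds()                 # 299.0
rate = counter_rate(639313241, 639337268, elapsed)
print(f"{rate:.2f}")                                # 80.36
```

About 80 per second is inside 0-100 and matches the fetched values, so these two samples would not be rejected by the max check; a computed rate above 100 (or below 0) would be stored as UNKNOWN.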

When I run cactid --verbosity=5 for this host, all looks fine: values are retrieved OK from the agent and inserted OK into the database.

Running rrdtool info shows :-
filename = "monitorme_cpu_idle_785.rrd"
rrd_version = "0003"
step = 300
last_update = 1193005518
ds[cpu_idle].type = "COUNTER"
ds[cpu_idle].minimal_heartbeat = 600
ds[cpu_idle].min = 0.0000000000e+00
ds[cpu_idle].max = 1.0000000000e+02
ds[cpu_idle].last_ds = "659832327"
ds[cpu_idle].value = 1.4316040268e+03
ds[cpu_idle].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 4.7798848564e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 2.1612519690e+04
rra[3].cdp_prep[0].unknown_datapoints = 22

Hmmm, the min and max look OK: it's a percentage, so it should be from 0-100. However, the value shown exceeds that range. Could this be my issue? Then again, some of the other values also appear to be out of range (e.g. rra[3].cdp_prep[0].value).
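The out-of-range numbers in `rrdtool info` are probably not rates at all. As far as I understand rrdtool's stored update state, `last_ds` is the last raw counter reading, `ds[...].value` is rate x seconds accumulated since the current step began, and an AVERAGE `cdp_prep` value is a running sum of PDPs for the row being built. Under that assumption, dividing `ds[cpu_idle].value` by the seconds elapsed in the current step should give back a plausible rate. `rate_from_scratch` is a hypothetical name.

```python
# Assumption: ds[...].value = rate x seconds accumulated since the current step began.

def rate_from_scratch(accumulated: float, last_update: int, step: int) -> float:
    """Average per-second rate implied by ds[...].value for the current partial step."""
    seconds_into_step = last_update % step
    if seconds_into_step == 0:
        raise ValueError("update landed exactly on a step boundary; nothing accumulated")
    return accumulated / seconds_into_step


# Numbers taken from the rrdtool info output above.
print(1193005518 % 300)                                                # 18
print(round(rate_from_scratch(1.4316040268e+03, 1193005518, 300), 2))  # 79.53
```

Roughly 79.5 is in line with the fetched cpu_idle values, which fits the assumption; the min/max check applies to the computed rate, not to these running totals.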

Checking table poller_output, it looks OK:-
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

What else can I check?

Regards Repudi8or

Post by **TheWitness** » Sun Oct 21, 2007 8:14 pm

Redirect your crontab's stdout and stderr to a file for a few hours and look for the error messages. Post your findings.

TheWitness

Post by **repudi8or** » Sun Oct 21, 2007 10:28 pm

OK, I redirected my cron job output to /tmp/poller.out.

Mostly it's "OK" or "Waiting on x/y pollers"... However, when I grep -v those out, I see random timeouts :-

Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmetadv:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4901:161
Timeout: No Response from pollmep4902:161
Timeout: No Response from pollmep4902:161
10/22/2007 01:10:26 PM - SYSTEM STATS: Time:25.3602 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
Timeout: No Response from pollmep4903:161
Timeout: No Response from pollmep4903:161
10/22/2007 01:15:27 PM - SYSTEM STATS: Time:26.5192 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
Timeout: No Response from pollmep4900:161
Timeout: No Response from pollmep4900:161
10/22/2007 01:20:27 PM - SYSTEM STATS: Time:26.5436 Method:cactid Processes:10 Threads:10 Hosts:51 HostsPerProcess:6 DataSources:922 RRDsProcessed:730
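The `grep -v` filtering can also be scripted so that the timeouts are tallied per host, which makes it easier to see whether the same devices keep failing. This sketch assumes the wording of the routine lines ("OK", "Waiting on x/y pollers") as quoted above; the regular expressions would need adjusting if the exact text differs, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Routine status lines to drop; exact wording is an assumption based on the post above.
NOISE = re.compile(r"^\s*(OK\b|Waiting on \d+/\d+ pollers)")
TIMEOUT = re.compile(r"^Timeout: No Response from (\S+?):(\d+)")


def interesting_lines(lines: list[str]) -> list[str]:
    """Drop blank and routine status lines, keep everything else (like grep -v)."""
    return [ln for ln in lines if ln.strip() and not NOISE.match(ln)]


def timeouts_per_host(lines: list[str]) -> Counter:
    """Count 'Timeout: No Response' lines per host name."""
    counts = Counter()
    for ln in lines:
        m = TIMEOUT.match(ln)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "OK",                          # placeholder for a routine status line
    "Waiting on 2/10 pollers.",    # placeholder, exact wording assumed
    "Timeout: No Response from pollmetadv:161",
    "Timeout: No Response from pollmetadv:161",
    "Timeout: No Response from pollmep4901:161",
    "",
]
print(timeouts_per_host(interesting_lines(sample)).most_common())
# [('pollmetadv', 2), ('pollmep4901', 1)]
```

To run it over the real capture, read the redirected output file into a list of lines and pass that in.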

So I will increase the timeout and poller retries values to see if that gets around these timeouts... Thanks for the hint, Witness.

Regards Rep.

Post by **repudi8or** » Sun Oct 21, 2007 11:29 pm

OK, so the timeout errors have gone since I increased the timeout value and the retries. HOWEVER, I still have the regular gaps in my graphs, and I have some new messages in poller.log :-

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

Error in packet.
Reason: (genError) A general failure occured
Failed object: .1.3.6.1.4.1.42.2.15.12

I can't find any data source that should poll that OID. Weird!

Post by **TheWitness** » Mon Oct 22, 2007 9:21 am

Reduce your MAXOIDs for the system as well. It would appear that the SNMP agent is not handling things well and is bombing (crashing, maybe?).

For Cactid, Max OID's per get request is under Settings->Poller.
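Max OID's per get request controls how many OIDs are packed into a single SNMP GET. The idea can be sketched as splitting a host's OID list into batches; this illustrates the concept only and is not cactid's implementation. The OIDs are the five ucd/net CPU counters used by the data templates in this thread, and `batch_oids` is a hypothetical name.

```python
# Concept sketch only: split one host's OIDs into GET requests of at most max_oids each.

def batch_oids(oids: list[str], max_oids: int) -> list[list[str]]:
    if max_oids < 1:
        raise ValueError("max_oids must be at least 1")
    return [oids[i:i + max_oids] for i in range(0, len(oids), max_oids)]


cpu_oids = [f".1.3.6.1.4.1.2021.11.{n}.0" for n in (50, 52, 53, 54, 55)]
print(len(batch_oids(cpu_oids, 10)))  # 1: all five CPU counters in one request
print(len(batch_oids(cpu_oids, 2)))   # 3: smaller requests, more round trips
```

A lower limit means smaller PDUs for a struggling agent at the cost of more requests per poll.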

TheWitness

Post by **repudi8or** » Mon Oct 22, 2007 7:15 pm

MAXOIDs was set to 10; I have reduced it to 5... still getting hourly gaps....

Post by **TheWitness** » Mon Oct 22, 2007 8:05 pm

You need to send me more of your logs around the times of the errors. Also, what about the graphs: is it all graphs or simply some of them? The other thing it could be, and the reason I asked you to redirect your cron job's stdout and stderr to a file, is a creeping time sync. If you are on a VM, you need to get off it. Also, maybe set your driftfile a bit tighter.

Regards,

TheWitness

Post by **repudi8or** » Mon Oct 22, 2007 9:26 pm

I sent the logs via email.

In answer to the other things :-

It's the majority of graphs for some specific metric types (i.e. cpu and network traffic), but for those the gaps appear on about 80% of hosts.

I'm not sure how to identify a creeping time sync. Do you mean on the host the poller is running from? The poller process IS running in a Solaris zone (sort of like a VM). The drift file is empty.
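One way to look for a creeping clock is to check how far into each 5-minute slot the polls start, using the cactid log timestamps. A steady offset is normal; an offset that keeps growing or shrinking over hours would suggest drift. A sketch, assuming the log timestamp format shown in the first post (`slot_offsets` is a hypothetical name):

```python
from datetime import datetime

LOG_FMT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "10/19/2007 02:20:25 PM"
SLOT = 300                        # poller interval in seconds


def slot_offsets(log_lines: list[str]) -> list[int]:
    """Seconds past the start of the 5-minute slot at which each logged poll ran."""
    offsets = []
    for line in log_lines:
        stamp = line.split(" - ", 1)[0]  # timestamp precedes the first " - "
        t = datetime.strptime(stamp, LOG_FMT)
        offsets.append((t.minute * 60 + t.second) % SLOT)
    return offsets


sample = [
    "10/19/2007 02:20:25 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: ...",
    "10/19/2007 02:25:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: ...",
    "10/19/2007 02:30:26 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: ...",
    "10/19/2007 02:35:24 PM - CACTID: Poller[0] Host[56] DS[785] SNMP: ...",
]
offs = slot_offsets(sample)
print(offs)                   # [25, 24, 26, 24]
print(max(offs) - min(offs))  # 2
```

Over that window the offsets stay within 2 seconds of each other, i.e. no visible creep; a longer run of log lines would be needed to say more.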

Regards Repudi8or

Post by **TheWitness** » Mon Oct 22, 2007 9:30 pm

Turn off NTP sync for a few hours and see if the problem goes away.

TheWitness

Post by **TheWitness** » Mon Oct 22, 2007 9:39 pm

Also, decrease the number of processes and increase the number of threads. Processes should not be more than 2x the number of cores.

TheWitness

Post by **repudi8or** » Mon Oct 22, 2007 9:55 pm

I have decreased the # of processes to the same as the # of cores (the cacti implementation is in a container on a T2000, which is quad core), so processes now = 4.

Increased # of threads to 20.
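The HostsPerProcess figure in the earlier SYSTEM STATS lines (51 hosts, 10 processes, 6 hosts per process) is consistent with rounding hosts / processes up. Assuming that is how the hosts are divided, the effect of going to 4 processes can be estimated as below; the formula is inferred from the numbers in this thread, not taken from the poller source.

```python
import math

# Inferred from "Hosts:51 ... Processes:10 ... HostsPerProcess:6"; not the poller's own code.

def hosts_per_process(hosts: int, processes: int) -> int:
    return math.ceil(hosts / processes)


print(hosts_per_process(51, 10))  # 6, matching the SYSTEM STATS line
print(hosts_per_process(51, 4))   # 13 with processes reduced to 4
```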

Turned off NTP.

I will give it an hour or two and report the outcome.

I just wanted to mention that other metrics are graphing OK (i.e. no gaps). Things like load average, memory, swap, filesystems, users and processes are all fine; it's only affecting cpu and network traffic graphs (see attached pic).

Regards Rep

Post by **TheWitness** » Tue Oct 23, 2007 6:15 am

Well, looking at your graphs, I would say that it is not a time issue. I would have to see the cron output to get more detail. This is quite odd. It may be a two crontab issue, but I am uncertain as to the composition of your graphs (aka Data Sources).

TheWitness

Post by **repudi8or** » Tue Oct 23, 2007 5:26 pm

Hi Witness, thanks very much for your assistance thus far.

You are correct about it not being time related. Ntp enabled or disabled makes no difference. None of the other changes (processes, threads) have made a difference either. I still have the gaps.

I am using the "UNIX Template Set for Cacti: HP-UX, Solaris, Linux" for all but the cpu graph. The cpu one is "CPU Usage for Solaris".

The data sources for the cpu graph, for example, are :-

Code:

<hash_010002fde318f91908642c62c30d3eb4faa029>
 <name>ucd/net - CPU Usage - Idle by balint</name> 
<ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Idle</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
<items>
<hash_0800029bc1889337106bea2e0ded25a868ba9f>
  <t_data_source_name /> 
  <data_source_name>cpu_idle</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800029bc1889337106bea2e0ded25a868ba9f>
  </items>
<data>
<item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.53.0</value> 
  </item_000>
<item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
<item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
<item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
<item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
<item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002fde318f91908642c62c30d3eb4faa029>
<hash_010002a0ba6b77fc8afeca43436d8ff086d319>
  <name>ucd/net - CPU Usage - System by balint</name> 
<ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - System</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
<items>
<hash_080002a69c46b4d0cdbef3b4bc5f9bb9229188>
  <t_data_source_name /> 
  <data_source_name>cpu_system</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_080002a69c46b4d0cdbef3b4bc5f9bb9229188>
  </items>
<data>
<item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.52.0</value> 
  </item_000>
<item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
<item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
<item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
<item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
<item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002a0ba6b77fc8afeca43436d8ff086d319>
<hash_0100022c90edd564d5fe11496a4f563740eb9a>
  <name>ucd/net - CPU Usage - User by balint</name> 
<ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - User</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
<items>
<hash_080002f6f4541d553ff538ce5a4f9b7389a685>
  <t_data_source_name /> 
  <data_source_name>cpu_user</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_080002f6f4541d553ff538ce5a4f9b7389a685>
  </items>
<data>
<item_000>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.50.0</value> 
  </item_000>
<item_001>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_001>
<item_002>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
<item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
<item_004>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
<item_005>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_0100022c90edd564d5fe11496a4f563740eb9a>
<hash_010002b9c2d1f751b4130aa9a61863cc8aa4f4>
  <name>ucd/net - CPU Usage - Wait by balint</name> 
<ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Wait</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
<items>
<hash_0800025ad130a3e183b3e01d2a62f9488798e4>
  <t_data_source_name /> 
  <data_source_name>cpu_wait</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800025ad130a3e183b3e01d2a62f9488798e4>
  </items>
<data>
<item_000>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_000>
<item_001>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.54.0</value> 
  </item_001>
<item_002>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
<item_003>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
<item_004>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
<item_005>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002b9c2d1f751b4130aa9a61863cc8aa4f4>
<hash_010002490d340faa55c47cdcffd39d12e286b4>
  <name>ucd/net - CPU Usage - Kernel by balint</name> 
<ds>
  <t_name>on</t_name> 
  <name>|host_description| - CPU Usage - Kernel</name> 
  <data_input_id>hash_0300023eb92bb845b9660a7445cf9740726522</data_input_id> 
  <t_rra_id /> 
  <t_rrd_step /> 
  <rrd_step>300</rrd_step> 
  <t_active /> 
  <active>on</active> 
  <rra_items>hash_150002c21df5178e5c955013591239eb0afd46|hash_1500020d9c0af8b8acdc7807943937b3208e29|hash_1500026fc2d038fb42950138b0ce3e9874cc60|hash_150002e36f3adb9f152adfa5dc50fd2b23337e</rra_items> 
  </ds>
<items>
<hash_0800029fdc17072e43c68e07cd163673443206>
  <t_data_source_name /> 
  <data_source_name>cpu_kernel</data_source_name> 
  <t_rrd_minimum /> 
  <rrd_minimum>0</rrd_minimum> 
  <t_rrd_maximum /> 
  <rrd_maximum>100</rrd_maximum> 
  <t_data_source_type_id /> 
  <data_source_type_id>2</data_source_type_id> 
  <t_rrd_heartbeat /> 
  <rrd_heartbeat>600</rrd_heartbeat> 
  <t_data_input_field_id /> 
  <data_input_field_id>0</data_input_field_id> 
  </hash_0800029fdc17072e43c68e07cd163673443206>
  </items>
<data>
<item_000>
  <data_input_field_id>hash_07000292f5906c8dc0f964b41f4253df582c38</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_000>
<item_001>
  <data_input_field_id>hash_0700024276a5ec6e3fe33995129041b1909762</data_input_field_id> 
  <t_value /> 
  <value>.1.3.6.1.4.1.2021.11.55.0</value> 
  </item_001>
<item_002>
  <data_input_field_id>hash_07000232285d5bf16e56c478f5e83f32cda9ef</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_002>
<item_003>
  <data_input_field_id>hash_070002ad14ac90641aed388139f6ba86a2e48b</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_003>
<item_004>
  <data_input_field_id>hash_0700029c55a74bd571b4f00a96fd4b793278c6</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_004>
<item_005>
  <data_input_field_id>hash_070002012ccb1d3687d3edb29c002ea66e72da</data_input_field_id> 
  <t_value /> 
  <value /> 
  </item_005>
  </data>
  </hash_010002490d340faa55c47cdcffd39d12e286b4>

I'm not sure what you mean by "a two crontab issue"; however, there is only a single entry in the cacti user's crontab :-

-bash-3.00$ crontab -l
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/php/bin/php /usr/local/apache2/htdocs/cacti/poller.php >> /tmp/poller.log 2>&1
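One way to rule out a "two crontab" problem (poller.php being started twice per interval) is to collect every crontab line that runs poller.php, for example from the cacti user's crontab and any system crontab, and count the scheduled minutes. This sketch only understands "*" and plain comma-separated minute lists; ranges and other cron syntax are not parsed, and `poller_minutes` is a hypothetical name.

```python
# Hypothetical helper; handles "*" and plain comma-separated minute lists only.

def poller_minutes(crontab_lines: list[str]) -> list[int]:
    """Every minute of the hour at which some line starts poller.php."""
    minutes = []
    for line in crontab_lines:
        line = line.strip()
        if not line or line.startswith("#") or "poller.php" not in line:
            continue
        field = line.split()[0]  # first field is the minute
        if field == "*":
            minutes.extend(range(60))
        else:
            minutes.extend(int(m) for m in field.split(","))
    return sorted(minutes)


entries = [
    "0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/php/bin/php "
    "/usr/local/apache2/htdocs/cacti/poller.php >> /tmp/poller.log 2>&1",
]
mins = poller_minutes(entries)
print(len(mins))                    # 12 runs per hour
print(len(mins) == len(set(mins)))  # True: no minute is scheduled twice
```

More than 12 entries, or any repeated minute, would point at overlapping poller runs.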

I emailed you the output from poller.log

Regards Rep.

Post by **repudi8or** » Wed Oct 24, 2007 8:19 pm

Hmmm, I increased the cacti.log debugging output to turn everything on and am trying to work my way backwards through the problem. I chose one server and one interval of null values. I went backwards through cacti.log and found the rrdtool updates putting in values of "U" for the cpu metrics. However, earlier in the log I see the SNMP polls successfully return numeric values for the related OIDs (see attached log).

So my question is: what can affect this data between the value being returned by the poller and it being stored in the RRD?

Regards Rep

Post by **TheWitness** » Wed Oct 24, 2007 8:57 pm

Is this one of the hosts?

10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] RECACHE: Processing 2 items in the auto reindex cache for 'my4900'
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] ASSERT: '356242' .lt. '26221' failed. Recaching host 'my4900', data query #1
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] NOTICE: Spike Kill in Effect for 'my4900'
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] ASSERT: '356242' .lt. '26221' failed. Recaching host 'my4900', data query #8
10/24/2007 04:15:09 PM - CACTID: Poller[0] Host[56] NOTICE: Spike Kill in Effect for 'my4900'
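Under an assumption: if the two numbers in the ASSERT lines are sysUpTime readings (SNMP TimeTicks, i.e. hundredths of a second), which is what an uptime-based reindex check compares, they convert as below. This only restates the numbers in a readable form; it does not by itself explain the gaps. `timeticks_to_hms` is a hypothetical name.

```python
# Assumes the ASSERT values are TimeTicks (hundredths of a second).

def timeticks_to_hms(ticks: int) -> str:
    seconds, hundredths = divmod(ticks, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


print(timeticks_to_hms(356242))  # 0:59:22.42
print(timeticks_to_hms(26221))   # 0:04:22.21
```

If that reading is right, one value is just under an hour of uptime and the other about four and a half minutes, which would be worth comparing against the hourly spacing of the NaNs.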

TheWitness
