Hourly spikes, possibly a time issue

Post by **ollyb** » Sat Jun 24, 2006 6:28 am

Hi there

I'm graphing data from various switches and routers using Cacti running on a VMware VPS node. Huge spikes occur on all graphs every hour (see pic below). I thought I'd found the source of the problem: the clock on the node isn't quite right and runs about 4 minutes fast per hour. ntpdate was running at 1 minute past each hour, which set the clock back to, say, 00:56. So when the clock reached 01:00 again, cron ran the poller again, and an extra 4 minutes' worth of data had passed through, which was added to the original transfer for 01:00.

Am I making any sense?

Anyway, I changed it so that ntpdate runs every 5 minutes, starting at 2 minutes past the hour, so the clock is corrected and not too far out of sync when the poller does its thing. The problem is I'm still getting the same spikes as before.

Has anyone seen this sort of thing before?

Post by **rony** » Sat Jun 24, 2006 10:10 pm

Counter roll over on one of the interfaces?

Post by **ollyb** » Sun Jun 25, 2006 3:22 am

No, it's happening on all interfaces

This started annoying me, and I've found where the problem is. I picked a random graph and did "rrdtool info" to get raw data out of it approaching the hour....

last_update = 1151168036 338 Saturday, June 24th 2006, 16:53:56
ds[traffic_in].last_ds = "3698415271" 36524593
ds[traffic_in].value = 2.5502378544e+07
ds[traffic_out].last_ds = "1409881384" 3344365
ds[traffic_out].value = 2.3351187574e+06

last_update = 1151168362 326 Saturday, June 24th 2006, 16:59:22
ds[traffic_in].last_ds = "3734949651" 36534380
ds[traffic_in].value = 2.9361986380e+07
ds[traffic_out].last_ds = "1415188106" 5306722
ds[traffic_out].value = 4.2649115460e+06

last_update = 1151168400 38 Saturday, June 24th 2006, 17:00:00
ds[traffic_in].last_ds = "3770563053" 35613402
ds[traffic_in].value = 0.0000000000e+00
ds[traffic_out].last_ds = "1427197975" 12009869
ds[traffic_out].value = 0.0000000000e+00

Next to each Unix time I've put the number of seconds since the previous update, and next to each counter the increase since the previous reading, followed by the time in a readable format.

At all other times the Unix time differences are between 310 and 330 seconds, though in reality the interval is about 300 seconds. Supposedly there are only 38 seconds between the update on the hour and the one before it, yet a comparable amount of traffic was counted. In actual fact that update also came 5 minutes after the previous one.
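To make the effect concrete, here is a rough Python sketch that turns the traffic_in samples from the rrdtool info output above into per-second rates. The names `samples` and `rates` are made up for illustration, and the calculation is a simplification: RRDtool's real COUNTER handling also normalizes values onto step boundaries and deals with counter wraps, which this ignores.

```python
# (last_update, traffic_in counter) pairs copied from the rrdtool info output above.
samples = [
    (1151168036, 3698415271),  # 16:53:56
    (1151168362, 3734949651),  # 16:59:22
    (1151168400, 3770563053),  # 17:00:00
]


def rates(samples):
    """Return (timestamp, bytes_per_second) for each consecutive pair of samples.

    Simplified: assumes the counter never wraps between two samples.
    """
    result = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        result.append((t1, (c1 - c0) / (t1 - t0)))
    return result


for timestamp, rate in rates(samples):
    print(timestamp, f"{rate:.0f} bytes/s")

# Same counter increase for the last update, but divided by the
# real polling interval of about 300 seconds instead of 38.
last_delta = samples[2][1] - samples[1][1]
print("with a 300 s interval:", f"{last_delta / 300:.0f} bytes/s")
```

The counter increase for the 17:00:00 update is about the same as for the previous one, but it is divided by 38 seconds instead of roughly 300, so the computed rate comes out around eight times higher. That would be consistent with the hourly spike on the graphs.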

Despite running "ntpdate <<server>> && hwclock --systohc" every 5 minutes, Cacti seems to be ignoring this and then tripping over itself on the hour. As I said, this is a VMware node, but I'd still have thought that Cacti gets the time from the node's system clock; it seems maybe not.

Post by **rony** » Sun Jun 25, 2006 10:46 am

Make sure that VMware isn't trying to update the guest operating system clock.

Also, Cacti uses the system clock.

Post by **mories** » Sun Jun 25, 2006 1:45 pm

ollyb,

I have had some time issues with my Cacti machine on VMware. They had nothing to do with Cacti, but with Linux and timekeeping.

Maybe you can take a look at this KB article.
I did a 'watch -n1 date' and noticed my clock was running too fast.
The KB article suggests adding 'clock=pit' as a kernel parameter in grub.conf or lilo.conf.

Clock in a Linux Guest Runs More Slowly or Quickly Than Real Time
http://www.vmware.com/support/kb/enduse ... faqid=1420

mories

Post by **ollyb** » Mon Jun 26, 2006 3:02 am

Thanks for the pointers, guys. Hopefully I'll be able to get this thing sorted.
