Question about running Cacti on a VMware ESX server

Post by **apperrault** » Wed Oct 01, 2008 1:15 pm

Hi all,
I have a question. Is it recommended to run Cacti, either on Windows or Linux, as a guest OS on a VMware ESX server? The reason I ask is that the minute I moved my Cacti server over to my ESX server, I started seeing problems with my graphs. It is as though things are timing out.

I don't want to have to rebuild this system again, but if that is what will be most effective, I will do that.

Thanks much

app

Post by **TheWitness** » Wed Oct 01, 2008 7:04 pm

Well, if it's a large system, I would "NEVER" virtualize it. There are too many I/O issues with VMs that cannot be overcome.

Secondly, you've got to be kidding me: Windows vs. Linux? After being a Windows advocate for 25+ years, hands down Linux x86_64 to the rescue. I prefer CentOS, but I'm going to get shot by someone on that one.

TheWitness

Post by **apperrault** » Thu Oct 02, 2008 2:22 pm

What do you consider large? We are probably going to have 500 or so servers that we are monitoring, with between 5 and 15 graphs per device. I have it installed on Linux, but I am noticing something very odd: I am missing poller cycles. Here is the output from my log:

Code: Select all

10/02/2008 12:10:02 PM - SYSTEM STATS: Time:1.1790 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 12:00:03 PM - SYSTEM STATS: Time:1.2300 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:55:02 AM - SYSTEM STATS: Time:1.1832 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:50:03 AM - SYSTEM STATS: Time:1.2349 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:40:03 AM - SYSTEM STATS: Time:1.1819 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:35:06 AM - SYSTEM STATS: Time:4.4800 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:30:06 AM - SYSTEM STATS: Time:4.4734 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:20:04 AM - SYSTEM STATS: Time:2.2949 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:15:06 AM - SYSTEM STATS: Time:4.4043 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:10:04 AM - SYSTEM STATS: Time:2.2819 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:00:04 AM - SYSTEM STATS: Time:2.2767 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:55:07 AM - SYSTEM STATS: Time:5.4918 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:35:03 AM - POLLER: Poller[0] ERROR: Network Discover subnet setting is not set!
10/02/2008 10:35:03 AM - POLLER: Poller[0] Network Discover is now running
10/02/2008 10:35:03 AM - SYSTEM STATS: Time:1.2203 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:30:02 AM - SYSTEM STATS: Time:1.1915 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:25:05 AM - SYSTEM STATS: Time:4.3849 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:20:08 AM - SYSTEM STATS: Time:5.5990 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:15:03 AM - SYSTEM STATS: Time:1.1861 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:10:03 AM - SYSTEM STATS: Time:2.1978 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:05:04 AM - SYSTEM STATS: Time:2.3172 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:00:04 AM - SYSTEM STATS: Time:2.3997 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 09:47:08 AM - WEBUI: Cacti Log Cleared from Web Management Interface

This doesn't make any sense to me. This is obviously causing gaps in my graphs. I want to make sure it isn't from the virtualization side of things.
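
One way to see exactly where cycles went missing is to compare the timestamps of consecutive SYSTEM STATS lines. The sketch below is only an illustration and not part of Cacti: the function name `find_missed_cycles` and the 60-second slack are assumptions, and it expects the timestamp format shown in the log above with a 300-second poller interval.

```python
from datetime import datetime

def find_missed_cycles(log_lines, interval=300, slack=60):
    """Return (previous_run, next_run) pairs with at least one skipped poll between them."""
    stamps = []
    for line in log_lines:
        if "SYSTEM STATS" not in line:
            continue  # ignore POLLER/WEBUI messages
        # The timestamp is everything before the first " - ".
        raw = line.split(" - ", 1)[0].strip()
        stamps.append(datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p"))
    stamps.sort()  # the log above is listed newest first
    return [(prev, cur) for prev, cur in zip(stamps, stamps[1:])
            if (cur - prev).total_seconds() > interval + slack]

# Three entries copied from the log above (stats fields shortened).
sample = [
    "10/02/2008 12:10:02 PM - SYSTEM STATS: Time:1.1790 Method:cmd.php",
    "10/02/2008 12:00:03 PM - SYSTEM STATS: Time:1.2300 Method:cmd.php",
    "10/02/2008 11:55:02 AM - SYSTEM STATS: Time:1.1832 Method:cmd.php",
]
for prev, cur in find_missed_cycles(sample):
    print(f"no poll between {prev:%H:%M:%S} and {cur:%H:%M:%S}")
```

Run against the full log, this would list each pair of runs that are roughly ten minutes apart instead of five, which makes it easier to see whether the misses follow a pattern.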

thanks much

app

Post by **apperrault** » Thu Oct 02, 2008 2:33 pm

OK, now I have some updated info. I set my logging to medium and I found this in the log:

Code: Select all

10/02/2008 12:30:01 PM - POLLER: Poller[0] NOTE: Cron is configured to run too often! The Poller Interval is '300' seconds, with a minimum Cron period of '300' seconds, but only 0 seconds have passed since the poller last ran.
10/02/2008 12:30:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '0', Max Runtime '298', Poller Runs: '1'

Any thoughts on correcting this?

app

Post by **TheWitness** » Fri Oct 03, 2008 10:34 am

If it's on a VM, it's due to clock drifting. For this reason alone, "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS". If you want to overcome this in Cacti, you must change your cron to the following:

Code: Select all

php -q poller.php --force

TheWitness

Post by **torstentfk** » Tue Oct 07, 2008 2:33 am

Hi,

I had the same problem here. Install VMware Tools, which syncs the guest's clock with the host. After that, the timeout problem no longer occurs.

Torsten

Post by **aut0maticdan** » Wed Nov 26, 2008 3:29 pm

TheWitness wrote: If it's on a VM, it's due to clock drifting. For this reason alone, "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS". If you want to overcome this in Cacti, you must change your cron to the following:
Code: Select all
php -q poller.php --force
TheWitness

This right here:

Cron period of '300' seconds, but only 0 seconds have passed since the poller last ran.

Makes me think you are probably wrong. If it were drift you'd expect that 200-and-change seconds have passed since the last poller. You'd also expect it to run successfully every other execution or so.
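
That reasoning can be written down as a rough rule of thumb. The sketch below is not how Cacti itself decides anything; the function name and the 5-second cutoff are made up for illustration. It only separates "almost no time has passed" (two pollers starting together) from "somewhat less than the interval has passed" (the pattern drift would be expected to produce).

```python
def classify_early_run(seconds_since_last, poller_interval=300, near_zero=5):
    """Guess why the poller reported an early run (illustrative thresholds only)."""
    if seconds_since_last >= poller_interval:
        return "on schedule"
    if seconds_since_last <= near_zero:
        # Two starts at essentially the same instant point to a second
        # poller (another cron entry or another host), not to drift.
        return "duplicate poller suspected"
    # A shortfall of some seconds or minutes is what a drifting or
    # stepped clock would be expected to look like.
    return "clock drift or over-frequent cron suspected"

print(classify_early_run(0))    # the 'Time Since Last' value from the log
print(classify_early_run(240))  # a hypothetical drift-like value
```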

I think there is a bug at play here. I recently installed and configured Cacti on two CentOS 5 machines (one x86 and one x86_64). I see the same behavior on the x86 machine after it worked flawlessly for several hours on its own.

I'm aware of the double poller scenario and have already ruled it out.

Post by **fmerrill** » Thu Nov 27, 2008 10:13 am

I am running in a VirtualBox VM on CentOS 4.7 without these issues.
The host is actually Windows Server 2003 (not my preference!), and on the non-virtualized machine, I am running a NetFlow collection package that is receiving several thousand flows per second. The host has 4 GB of RAM, with 4 Xeon 3 GHz processors.
The VM is set for 1 GB RAM with 1 MB for video and only virtualizes 1 processor. I am not using any GUI on the VM, since its purpose is a backend and frontend server for Cacti, not as a workstation.

The current VirtualBox VM installation where Cacti lives has ~300 hosts (all network routers and switches) ~1300 Data Sources, and ~700 RRDs running a 1 minute poller interval.
The poller being used is Spine, and it does take longer to complete than it would likely take on a non-virtualized machine; it averages between 18 and 26 seconds to complete a poller cycle and retrieve data from those ~1300 data sources.
I have Cacti poller settings configured for 2 poller processes, with 14 threads per process. I have it set to use 2 PHP script servers.

There are a few occasional graph gaps when a device doesn't respond in time (for whatever reason), however, since the primary purpose of Cacti is as a frontend for a tool that graphs performance data, and not an event monitoring tool, the gaps do not bother me at a technical level, because the devices themselves are not down during those times.
So, for management's sake, to avoid the questions about gaps, I have adjusted things to automatically fill the gaps, since they are not really issues anyway. I did this by tuning my existing RRDs to use a heartbeat of 600 seconds.
This may sound extreme to some, but, again, this is not being used as an event monitor, but as a performance data collection and presentation tool. I monitor (not really measure) events - up/down status, SNMP traps, etc, etc, etc - on another system.

I rarely get more than one device during a poller cycle that didn't respond, or for whatever reason data was not collected for one data source, so the actual data that was possibly missed is minimal, and makes no real difference in the bigger scheme of things.
In reality, if I am, for example, collecting latency data (I use remote latency measurement with Cisco IP SLA, not that from Cacti), and collecting once per minute, then in a 1 hour sample, if I miss 2 data collections for that device, and my average latency shows that for the other 58 polls the average was ~28ms, do I really care that much about those 2 missed collections? Do I really think they are going to be that much different than the other 58?
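
The arithmetic behind that argument is easy to check. The sketch below assumes the 58 collected polls average 28 ms and asks how far the hourly average could move depending on what the two missed polls would have been; the function name and the 100 ms worst case are made up for illustration.

```python
def hourly_mean(observed_mean, observed_n, missing_n, missing_value):
    """Mean over all polls if each missed poll had taken the value missing_value."""
    total = observed_mean * observed_n + missing_value * missing_n
    return total / (observed_n + missing_n)

# Missed polls similar to the rest: the average does not move.
print(hourly_mean(28.0, 58, 2, 28.0))
# Missed polls at a hypothetical 100 ms: the average rises by about 2.4 ms.
print(hourly_mean(28.0, 58, 2, 100.0))
```

Even if both missed polls had been several times worse than the rest, the hourly figure would shift by only a couple of milliseconds.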

So, in any case, if a device is really down, and the heartbeat gets exceeded, then the gaps will begin. I also updated my templates to use this heartbeat value, so any devices added will use the same Heartbeat without having to be tuned.
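
To make the heartbeat behaviour concrete, here is a small sketch of the rule as RRDtool documents it: when the time between two updates of a data source exceeds its heartbeat, that stretch is stored as unknown and shows up as a gap. The function name and the sample timestamps are hypothetical; this models the rule and does not call RRDtool.

```python
def unknown_stretches(update_times, heartbeat):
    """Return (start, end) pairs where the spacing between updates exceeds the heartbeat."""
    times = sorted(update_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > heartbeat]

# Hypothetical 1-minute polling (seconds since start) with the polls at 180 and 240 missed.
updates = [0, 60, 120, 300, 360]

print(unknown_stretches(updates, heartbeat=120))  # short heartbeat: the 120 -> 300 stretch is a gap
print(unknown_stretches(updates, heartbeat=600))  # 600 s heartbeat: nothing is marked unknown
```

With a 600-second heartbeat and 1-minute polling, a device has to stay silent for more than ten minutes before a gap appears.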

With all of that said, yes, I had to make sure my clock in the VM is getting updated correctly (I'm using the VirtualBox timesync addition).
So a moderate system is definitely viable in a VM, but since I have yet to scale up, I am not certain what the limitations will be if/when I do.

Now, more importantly... Larry, how are you doing the time travel part?

Post by **aut0maticdan** » Wed Dec 03, 2008 11:56 am

It was totally my bad.

I have a cold standby system ready to take over in case this particular admin machine goes down. Part of that is regular syncing of the admin web applications like Cacti and some home-grown stuff. I used yum to install Cacti on both machines, and that automatically puts the poller cron into cron.d for you.

Once I synced the Cacti stuff over the first time as part of standard backups, the second machine's poller started executing and updating the same DB, causing my collisions. Since the two servers share the same NTP server and are NTP peers of each other, their time is exactly the same, and that is why it was always 0 seconds for me and the gaps were intermittent.

Just as a note, I am not running inside a virtual machine. I just had what seemed to be the exact same problem.

Post by **YoMarK** » Wed Dec 03, 2008 1:28 pm

TheWitness wrote: If it's on a VM, it's due to clock drifting.

It's not likely that it will drift 5 minutes in 5 minutes. Besides that, the VM itself does not know that its time is drifting, so this is impossible (it can't execute things in the wrong order).

TheWitness wrote: "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS".

True; however, Cacti works fine in my testing environment, and a few seconds of "time drift" isn't a big problem for Cacti, I think.

My Cacti production environment runs on a physical machine, but this has more to do with the fact that I want to monitor ESX, SAN and VMs from a separate system. All other important systems run on VMware, including database clusters, Exchange and domain controllers (1 excluded for time synchronisation).

Post by **rootkit** » Thu Dec 04, 2008 7:01 pm

I have Cacti on ESXi, with the host connected to a NetApp (dual 4 Gb fiber), and no issues for me... That one instance monitors all hosp servers/networks.

Over 2k... running RHEL 5 64-bit with 4 GB RAM. Might be small, but it works fine for me.

Running exchange on vm, ha no thx =)

Post by **cigamit** » Fri Dec 05, 2008 2:56 pm

If you want to run Cacti as a VM (which is what I do, production and testing. ESX as my base, CentOS 4 / 5 as the Cacti server) then there are a few easy tips to fix the clock drift for CentOS at least.

Edit /etc/yum.repos.d/CentOS-Base.repo
Add this

Code: Select all

[testing]
name=CentOS Testing 
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
enabled=0
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing

You will notice that it's not enabled by default; keep it that way.

Now just run this command

Code: Select all

yum install kernel-vm --enablerepo=testing

This will install a kernel that uses a 100 Hz timer tick, versus the 1000 Hz used in the normal kernel.

You will then want to edit /etc/grub.conf to make it boot to the VM Kernel normally. Then just reboot.

You will then want to synchronize the time properly, so run these commands

Code: Select all

service ntpd stop
ntpdate -u 0.pool.ntp.org
hwclock -w
service ntpd start

------------

And Yes, with the clock drift your cron CAN miss pollings. While building my CactiEZ CD, I can easily say that I have installed Cacti in a VM at least 200 times (I got it pretty automated now, from ISO build to VM install < 5 minutes). I have seen the clock drift with the missed pollings time and time again.

Post by **bilbus** » Mon Jan 05, 2009 12:08 am

I have Cacti working fine on VMware ESX 3.5. Only about 20 devices, though.

Unless your Cacti server is I/O heavy, and I can't believe it would be, running in ESX is just fine.

I have seen Exchange 2003 / 2007 server clusters running in ESX, so I don't think Cacti will be a problem.
