Question about running Cacti on a VMware ESX server
-
- Cacti User
- Posts: 379
- Joined: Fri Feb 16, 2007 11:37 am
- Location: Emeryville, CA
- Contact:
Hi all,
I have a question: is it recommended to run Cacti, either in Windows or Linux, as a guest OS on a VMware ESX server? The reason I ask is that the minute I moved my Cacti server over to my ESX server, I started seeing problems with my graphs. It is as though things are timing out.
I don't want to have to rebuild this system again, but if that is what will be most effective, I will do that.
Thanks much
app
[b]Cacti Version[/b] - 0.8.7b
[b]Plugin Architecture[/b] - 2.2 Beta
[b]Poller Type[/b] - CMD.php
[b]Server Info[/b] - Linux 2.6.9-78.0.1.ELsmp
[b]Web Server[/b] - Apache/2.0.52 (Red Hat)
[b]PHP[/b] - 4.3.9
[b]MySQL[/b] - 4.1.22
[b]RRDTool[/b] - 1.2.23
[b]SNMP[/b] - 5.1.2
[b]Plugins[/b][list]Global Plugin Settings (settings - v0.5)
SuperLinks (superlinks - v0.72)
Host Info (hostinfo - v0.2)
Report Creator (reports - v0.3)
Update Checker (update - v0.4)
Realtime for Cacti (realtime - v0.35)
Cacti Log View (clog - v1.1)
RRD File Cleaner (rrdclean - v0.36)
Network Discovery (discovery - v0.9)
Uptime (uptime - v0.4)[/list]
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Well, if it's a large system, I would "NEVER" virtualize it. There are too many I/O issues with VMs that cannot be overcome.
Secondly, you've got to be kidding me: Windows vs. Linux? After being a Windows advocate for 25+ years, hands down Linux x86_64 to the rescue. I prefer CentOS, but I'm going to get shot by someone on that one.
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
-
- Cacti User
- Posts: 379
- Joined: Fri Feb 16, 2007 11:37 am
- Location: Emeryville, CA
- Contact:
What do you consider large? We are probably going to have 500 or so servers that we are monitoring, with between 5 and 15 graphs per device. I have it installed in Linux, but I am noticing something very odd: I am missing poller cycles. Here is the output from my log:
Code: Select all
10/02/2008 12:10:02 PM - SYSTEM STATS: Time:1.1790 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 12:00:03 PM - SYSTEM STATS: Time:1.2300 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:55:02 AM - SYSTEM STATS: Time:1.1832 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:50:03 AM - SYSTEM STATS: Time:1.2349 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:40:03 AM - SYSTEM STATS: Time:1.1819 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:35:06 AM - SYSTEM STATS: Time:4.4800 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:30:06 AM - SYSTEM STATS: Time:4.4734 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:20:04 AM - SYSTEM STATS: Time:2.2949 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:15:06 AM - SYSTEM STATS: Time:4.4043 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:10:04 AM - SYSTEM STATS: Time:2.2819 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 11:00:04 AM - SYSTEM STATS: Time:2.2767 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:55:07 AM - SYSTEM STATS: Time:5.4918 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:35:03 AM - POLLER: Poller[0] ERROR: Network Discover subnet setting is not set!
10/02/2008 10:35:03 AM - POLLER: Poller[0] Network Discover is now running
10/02/2008 10:35:03 AM - SYSTEM STATS: Time:1.2203 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:30:02 AM - SYSTEM STATS: Time:1.1915 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:25:05 AM - SYSTEM STATS: Time:4.3849 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:20:08 AM - SYSTEM STATS: Time:5.5990 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:15:03 AM - SYSTEM STATS: Time:1.1861 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:10:03 AM - SYSTEM STATS: Time:2.1978 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:05:04 AM - SYSTEM STATS: Time:2.3172 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 10:00:04 AM - SYSTEM STATS: Time:2.3997 Method:cmd.php Processes:1 Threads:N/A Hosts:3 HostsPerProcess:3 DataSources:59 RRDsProcessed:34
10/02/2008 09:47:08 AM - WEBUI: Cacti Log Cleared from Web Management Interface
This doesn't make any sense to me. This is obviously causing gaps in my graphs. I want to make sure it isn't from the virtualization side of things.
Thanks much
app
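For anyone who wants to spot the missed cycles without reading the log by eye, here is a rough sketch. It assumes the log lines look exactly like the ones above (MM/DD/YYYY date, 12-hour clock with AM/PM, one `SYSTEM STATS` line per poller run) and a 300 second poller interval. The function name `find_poller_gaps` and the log path in the usage line are made up for illustration.

```shell
#!/bin/sh
# find_poller_gaps [INTERVAL_SECONDS]
# Reads a Cacti log on stdin and reports places where consecutive
# SYSTEM STATS lines are more than 1.5 intervals apart.
# Assumes MM/DD/YYYY dates and a 12-hour clock, as in the log above.
find_poller_gaps() {
  awk -v interval="${1:-300}" '
    # Convert a date/time to a running count of seconds. Only the
    # differences matter, so this is not a real Unix epoch.
    function abs_secs(d, t, ampm,    p, q, h, y, m, days) {
      split(d, p, "/"); split(t, q, ":")
      h = q[1] % 12
      if (ampm == "PM") h += 12
      m = p[1] + 0; y = p[3] + 0
      if (m <= 2) { y--; m += 12 }
      days = 365 * y + int(y / 4) - int(y / 100) + int(y / 400) + int((153 * (m - 3) + 2) / 5) + p[2]
      return days * 86400 + h * 3600 + q[2] * 60 + q[3]
    }
    / - SYSTEM STATS: / {
      stamp = $1 " " $2 " " $3
      now = abs_secs($1, $2, $3)
      if (seen) {
        # The log may be newest-first or oldest-first, so use the absolute gap.
        diff = now - prev
        if (diff < 0) diff = -diff
        if (diff > 1.5 * interval) {
          missed = int(diff / interval + 0.5) - 1
          printf "%d missed cycle(s) between %s and %s\n", missed, prev_stamp, stamp
        }
      }
      prev = now; prev_stamp = stamp; seen = 1
    }
  '
}
```

Usage would be something like `find_poller_gaps 300 < /var/www/cacti/log/cacti.log` (the path is a guess, adjust it for your install). It only tells you where cycles are missing, not why.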
-
- Cacti User
- Posts: 379
- Joined: Fri Feb 16, 2007 11:37 am
- Location: Emeryville, CA
- Contact:
OK, now I have some updated info. I set my logging to medium and I found this in the log:
Code: Select all
10/02/2008 12:30:01 PM - POLLER: Poller[0] NOTE: Cron is configured to run too often! The Poller Interval is '300' seconds, with a minimum Cron period of '300' seconds, but only 0 seconds have passed since the poller last ran.
10/02/2008 12:30:01 PM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '0', Max Runtime '298', Poller Runs: '1'
Any thoughts on correcting this?
app
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
If it's on a VM, it's due to clock drifting. For this reason alone, "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS". If you want to overcome this in Cacti, you must change the poller command in your cron entry to the following:
Code: Select all
php -q poller.php --force
TheWitness
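If you would rather preview that change than hand-edit the cron file, something along these lines might help. This is only a sketch: the function name `add_force_flag` is invented, and the cron file location in the usage line is an assumption about a packaged install. It prints the modified lines to stdout and leaves the original file untouched.

```shell
#!/bin/sh
# add_force_flag [FILE...]
# Print the input with " --force" appended to any poller.php invocation
# that does not already have it. Reads stdin when no file is given.
# Nothing is modified in place; review the output before installing it.
add_force_flag() {
  sed -e '/poller\.php/{' \
      -e '/--force/!s/poller\.php/poller.php --force/' \
      -e '}' "$@"
}
```

For example, `add_force_flag /etc/cron.d/cacti` should show what the entry would look like with the flag added. Lines that already contain `--force` come through unchanged.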
-
- Cacti User
- Posts: 367
- Joined: Tue Apr 05, 2005 9:52 am
- Location: Munich, Germany
-
- Posts: 2
- Joined: Wed Nov 26, 2008 3:23 pm
TheWitness wrote: If it's on a VM, it's due to clock drifting. For this reason alone, "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS". If you want to overcome this in Cacti, you must change your cron to the following:
Code: Select all
php -q poller.php --force
This right here:
"Cron period of '300' seconds, but only 0 seconds have passed since the poller last ran."
makes me think you are probably wrong. If it were drift, you'd expect that 200-and-change seconds have passed since the last poller. You'd also expect it to run successfully every other execution or so.
I think there is a bug at play here. I recently installed and configured Cacti on two CentOS 5 machines (one x86 and one x86_64). I see the same behavior on the x86 machine after it worked flawlessly for several hours on its own.
I'm aware of the double poller scenario and have already ruled it out.
I am running in a VirtualBox VM on CentOS 4.7 without these issues.
The host is actually Windows Server 2003 (not my preference!), and on the non-virtualized machine, I am running a Netflow collection package that is receiving several thousand flows per second. The host has 4GB of RAM, with 4 Xeon 3GHz processors.
The VM is set for 1GB RAM with 1MB for video and only virtualizes 1 processor. I am not using any GUI on the VM, since its purpose is to be a backend and frontend server for Cacti, not a workstation.
The current VirtualBox VM installation where Cacti lives has ~300 hosts (all network routers and switches), ~1300 data sources, and ~700 RRDs, running a 1-minute poller interval.
The poller being used is Spine, and it does take longer to complete than it likely would on a non-virtualized machine: it averages between 18 and 26 seconds to complete a poller cycle and retrieve data from those ~1300 data sources.
I have Cacti poller settings configured for 2 poller processes, with 14 threads per process. I have it set to use 2 PHP script servers.
There are a few occasional graph gaps when a device doesn't respond in time (for whatever reason). However, since the primary purpose of Cacti is as a frontend for a tool that graphs performance data, and not as an event monitoring tool, the gaps do not bother me at a technical level, because the devices themselves are not down during those times.
So, for management's sake, to avoid the questions about gaps, I have adjusted things to automatically fill the gaps, since they are not really issues anyway. I did this by tuning my existing RRDs to use a heartbeat of 600 seconds.
This may sound extreme to some, but, again, this is not being used as an event monitor, but as a performance data collection and presentation tool. I monitor (not really measure) events - up/down status, SNMP traps, etc. - on another system.
I rarely get more than one device per poller cycle that didn't respond, or one data source for which data was not collected for whatever reason, so the data actually missed is minimal and makes no real difference in the bigger scheme of things.
In reality, if I am, for example, collecting latency data (I use remote latency measurement with Cisco IP SLA, not that from Cacti), and collecting once per minute, then in a 1 hour sample, if I miss 2 data collections for that device, and my average latency shows that for the other 58 polls the average was ~28ms, do I really care that much about those 2 missed collections? Do I really think they are going to be that much different than the other 58?
So, in any case, if a device is really down and the heartbeat gets exceeded, then the gaps will begin. I also updated my templates to use this heartbeat value, so any devices added will use the same heartbeat without having to be tuned.
With all of that said, yes, I had to make sure the clock in my VM is getting updated correctly (I'm using the VirtualBox timesync addition).
So a moderate system is definitely viable in a VM, but since I have yet to scale up, I am not certain what the limitations will be if/when I do.
Now, more importantly... Larry, how are you doing the time travel part?
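The heartbeat tuning mentioned in this post can probably be scripted along these lines. Treat it as a sketch under assumptions: it relies on `rrdtool info` printing one `ds[NAME].type = ...` line per data source and on `rrdtool tune FILE --heartbeat NAME:SECONDS`; the function names, the `APPLY` switch and the `rra` directory in the usage line are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# ds_names: read `rrdtool info` output on stdin and print one
# data source name per line (taken from the ds[NAME].type lines).
ds_names() {
  sed -n 's/^ds\[\([^]]*\)\]\.type = .*/\1/p'
}

# tune_heartbeat FILE SECONDS
# Print one `rrdtool tune` command per data source in FILE.
# With APPLY=1 in the environment, run the commands instead of printing them.
tune_heartbeat() {
  rrd=$1
  hb=$2
  rrdtool info "$rrd" | ds_names | while read -r ds; do
    if [ "${APPLY:-0}" = 1 ]; then
      rrdtool tune "$rrd" --heartbeat "$ds:$hb"
    else
      echo "rrdtool tune $rrd --heartbeat $ds:$hb"
    fi
  done
}
```

A dry run over a whole directory could look like `for f in /var/www/cacti/rra/*.rrd; do tune_heartbeat "$f" 600; done`. Back up the RRD files before running it with `APPLY=1`, and remember that new data sources only pick up the value if the data templates are changed as well, as the post describes.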
-
- Posts: 2
- Joined: Wed Nov 26, 2008 3:23 pm
Just to report back on my issue:
It was totally my bad.
I have a cold standby system ready to take over in case this particular admin machine goes down. Part of that is regular syncing of the admin web applications like Cacti and some homegrown stuff. I used yum to install Cacti on both machines, and that automatically puts the poller cron into cron.d for you.
Once I synced the Cacti stuff over the first time as part of standard backups, it started executing and updating the same DB, causing my collisions. Since the two servers share the same ntpd server and are NTP peers of each other, their time is exactly the same, which is why it was always 0 seconds for me and the gaps were intermittent.
Just as a note, I am not running inside a virtual machine. I just had what seemed to be the exact same problem.
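A quick way to check for this kind of stray second poller is to list every active cron line that calls poller.php, on each machine that can reach the Cacti database. The sketch below assumes the usual CentOS cron locations; the function name `list_poller_cron_lines` is invented for the example.

```shell
#!/bin/sh
# list_poller_cron_lines FILE...
# Print "file:line" for every non-comment cron line that invokes poller.php.
# /dev/null is added so grep always prefixes matches with the file name.
list_poller_cron_lines() {
  grep -E '^[[:space:]]*[^#[:space:]].*poller\.php' "$@" /dev/null 2>/dev/null
}
```

Run it on both the primary and the standby, for example `list_poller_cron_lines /etc/crontab /etc/cron.d/* /var/spool/cron/*`. Any output on the standby suggests a second poller is being started there, and if both machines point at the same database you would expect the "0 seconds have passed" message from earlier in the thread.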
TheWitness wrote: If it's on a VM, it's due to clock drifting.
Not likely it will drift 5 minutes in 5 minutes. Besides that, the VM itself does not know that its time is drifting, so this is impossible (it can't execute things in the wrong order).
TheWitness wrote: "I HATE VM'S FOR TIME SENSITIVE CALCULATIONS".
True, however Cacti works fine in my testing environment, and a few seconds of time drift isn't a big problem for Cacti, I think.
My Cacti production environment runs on a physical machine, but this has more to do with the fact that I want to monitor ESX, SAN and VMs from a separate system. All other important systems run on VMware, including database clusters, Exchange and domain controllers (one excluded for time synchronisation).
If you want to run Cacti as a VM (which is what I do, production and testing: ESX as my base, CentOS 4 / 5 as the Cacti server), then there are a few easy tips to fix the clock drift, for CentOS at least.
Edit /etc/yum.repos.d/CentOS-Base.repo
Add this
Code: Select all
[testing]
name=CentOS Testing
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
enabled=0
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing
You will notice that it's not enabled by default; keep it that way.
Now just run this command
Code: Select all
yum install kernel-vm --enablerepo=testing
This will install a kernel that uses a 100 Hz timer, versus the 1000 Hz used in the normal kernel.
You will then want to edit /etc/grub.conf to make it boot to the VM kernel normally. Then just reboot.
You will then want to synchronize the time properly, so run these commands
Code: Select all
service ntpd stop
ntpdate -u 0.pool.ntp.org
hwclock -w
service ntpd start
------------
And yes, with the clock drift your cron CAN miss pollings. While building my CactiEZ CD, I can easily say that I have installed Cacti in a VM at least 200 times (I got it pretty automated now, from ISO build to VM install < 5 minutes). I have seen the clock drift with the missed pollings time and time again.