Using SNMP & Cacti to monitor a dynamic cloud-based back-end

Post by **mwijnants81** » Wed Oct 30, 2013 5:46 am

Dear all,

We are interested in exploiting SNMP to monitor a dynamic cloud computing back-end that is implemented by means of CloudStack [1]. The back-end consists of a cluster of physical machines on which Virtual Machine (VM) instances are spawned and destroyed on an as-needed basis. Each VM in turn is capable of hosting multiple server applications, which are also dynamic in nature in the sense that the number of server applications can fluctuate over time. Combined, this yields a highly dynamic setup that is characterized by both VM and application churn. The VMs currently run Windows 7, yet might in the future be migrated to a Linux/Unix distribution.

We would like to monitor not only the spawned VMs as a whole, but also the individual server applications that they host. In addition, the monitoring should cover not only "physical" resource usage (e.g., CPU, memory, network bandwidth), but also application-layer metrics (e.g., the number of clients connected to a particular application server). Finally, we would like the monitoring to be automated as much as possible. For example, as soon as a new VM is spawned, we would like Cacti to automatically start collecting statistics about it; the same holds when a new application server is started on an existing VM.

To enable extensive SNMP-based resource monitoring of the back-end, we have turned to the net-snmp project [2]. We have found tutorials and documentation on extending the net-snmp agent to support "non-standard" metrics. We have read about writing snmpd extensions (e.g., [3]), and we have successfully experimented with the net-snmp "extend" directive to introduce new OID values (e.g., [4]). Using the latter approach, we attached a "client_count" OID to a local file on a VM's disk that is continuously updated with the current client count of a server application running on that VM; we then manually instructed Cacti to query this OID and to plot its value over time. So far so good. Unfortunately, we are struggling to generalize this approach so that it works for a varying number of application servers (and VMs). We are also unsure how to automate the process (i.e., we would like Cacti to start collecting statistics as soon as a new VM or application server spawns, without manual intervention).
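For reference, the single-application setup described above can be sketched roughly as follows. This is only an illustration under assumptions: the script name, the count-file path and the fallback value of 0 are hypothetical, and the actual setup may differ.

```python
import sys
from pathlib import Path


def read_client_count(path):
    """Return the integer stored in the count file.

    Falls back to 0 (an assumed convention) if the file is missing
    or does not contain a valid integer.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return 0


# When run as a script with the count file as its only argument,
# print the value so that snmpd can pick it up from standard output.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(read_client_count(sys.argv[1]))
```

With a line such as `extend client_count /usr/bin/python3 /path/to/client_count.py /path/to/count_file` in snmpd.conf (paths are placeholders), the printed value should become readable under NET-SNMP-EXTEND-MIB (nsExtendOutput1Line). Each `extend` line covers exactly one named script, which is why this does not scale to a changing number of application servers.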

From the information that we have collected so far, we believe our best bet is to define a custom MIB tree, with a branch for each of the (physical or application-layer) metrics that we would like to monitor on the application servers. For instance, the "client_count" branch of the MIB tree could hold one instance/row for each currently active application server, giving the current client count of that particular server. Another branch could hold the CPU usage of each application server. Perhaps we can use the application server's PID as the row index? Cacti could then do an SNMP walk of these subtrees to obtain, for example, the client count values of all application servers currently running on a particular VM. Is this approach viable? If so, how could it best be realized? If not, do any of you have suggestions about how our problem could be addressed?
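To make the proposed layout concrete, here is a minimal sketch of how such a PID-indexed subtree might be served through net-snmp's `pass_persist` mechanism. It is not a tested implementation: the base OID `.1.3.6.1.4.1.99999.1` is a placeholder rather than a registered enterprise number, the column numbers and metric names are assumptions, and `get_servers` is a hypothetical callable that returns the current metrics per PID. The sketch is read-only and does not handle `set` requests.

```python
import sys

# Hypothetical placeholder subtree; a real deployment would use its own
# registered enterprise OID.
BASE = ".1.3.6.1.4.1.99999.1"
COLUMNS = {"client_count": 1, "cpu_percent": 2}  # assumed column numbers


def oid_key(oid):
    """Turn '.1.3.6' into (1, 3, 6) so OIDs sort numerically, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def build_table(servers):
    """Map {pid: {metric: value}} to {oid: value}, one row per PID."""
    table = {}
    for pid, metrics in servers.items():
        for name, column in COLUMNS.items():
            table[f"{BASE}.{column}.{pid}"] = metrics[name]
    return table


def answer(command, oid, table):
    """Return the reply lines for one 'get' or 'getnext' request."""
    if command == "get":
        match = oid if oid in table else None
    else:  # getnext: first OID that sorts strictly after the requested one
        later = [o for o in table if oid_key(o) > oid_key(oid)]
        match = min(later, key=oid_key) if later else None
    if match is None:
        return ["NONE"]
    return [match, "gauge", str(table[match])]


def serve(get_servers, stdin=sys.stdin, stdout=sys.stdout):
    """Answer pass_persist requests until snmpd closes the pipe."""
    while True:
        command = stdin.readline()
        if not command:
            break
        command = command.strip()
        if command == "PING":
            reply = ["PONG"]
        elif command in ("get", "getnext"):
            oid = stdin.readline().strip()
            # Rebuild the table per request so rows follow server churn.
            reply = answer(command, oid, build_table(get_servers()))
        else:
            reply = ["NONE"]
        stdout.write("\n".join(reply) + "\n")
        stdout.flush()
```

A script that ends with a call to `serve(...)` could be registered in snmpd.conf with a line such as `pass_persist .1.3.6.1.4.1.99999.1 /path/to/handler.py` (path is a placeholder). Because the table is rebuilt on every request, an SNMP walk of the "client_count" column would return one value per application server running at that moment, with the PID as the last OID component.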

Many thanks in advance!
Best regards,
Maarten Wijnants

[1] http://cloudstack.apache.org/
[2] http://www.net-snmp.org/
[3] http://www.net-snmp.org/wiki/index.php/ ... MIB_Module
[4] http://docs.fedoraproject.org/en-US/Fed ... nding.html
