It sounds like you have in mind some type of daemon that will process threshold events by interrogating a database to determine what the appropriate values should be and act accordingly.

Agreed. I would like to set thresholds per Data Query and handle event management outside of ping availability, outside of the poller, in a separate package. I am still contemplating a design; your assistance would be appreciated.
Well, let me first try to explain my design evolution. aloe started as a standalone web app to view syslog messages written via ODBC to a MySQL database from Kiwi's syslogd. I was in a primarily NT shop but wanted a way to centralize system and network events. There were utilities available to forward NT event logs to syslog, and all the other devices supported it, so it made sense to use syslog to collect it all, and at a relatively low cost (~$100 for the full version of Kiwi, which supports logging via ODBC). I could now correlate system and network events, but that left me with yet another browser window open on my desktop, which I did not want, so I decided to integrate it into cacti. In keeping with cacti's spirit of supporting all OSes, I adopted syslog-ng's database schema, and due to the flexibility of Kiwi's output options it did not require much modification to the app. Then it dawned on me: if I could have cacti scripts log information to this database, either directly or indirectly through syslog, that would prove extremely useful. So I set off to attempt it.
Having fought my fair share of battles with cacti's PHP code (and don't get me wrong, Ian is a wizard in my eyes), attempting to add this functionality to cacti from the outset would have meant spending a lot of time in the PHP code instead of getting the results I wanted. I chose Perl because there is a wider array of modules available for everything from SNMPv2c support to remote syslog. I don't know if you've tried to write an event to a remote syslog via PHP, but unless that has changed in the latest version, you can't. Granted, you can record it to an NT event log, but those event messages are not formatted properly, they carry a lot of extraneous crap, and exporting them via utilities like evtsys does not work properly. Based on my research, that could only be corrected by registering event messages via a DLL, which in my opinion was not worth the extra aggravation. Yes, these events could be sent via SMTP, but that would violate my original goal of centralizing all the events. I've been refining a function library so that there are some standard calls that can be made to check and react to threshold violations, with XML input to pass script parameters and output integrated with cacti for stunning visual effects. I've also added a multi-indexed script input so that adding multiple anything isn't such a chore. The most frustrating thing has been cacti's graph-centric nature and its data collection.
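To make the remote syslog point concrete, here is a minimal sketch of a script pushing one event to a central collector over UDP. It is written in Python purely for illustration (aloe's own scripts are Perl), and the function names, the tag, and the choice of the local0 facility are all assumptions, not anything taken from aloe. The payload is a simplified BSD-style message with no timestamp or hostname header, which many collectors fill in on receipt.

```python
import socket

# Standard syslog numeric codes; only the two used below are listed.
FACILITY_LOCAL0 = 16
SEVERITY_WARNING = 4


def build_syslog_packet(facility, severity, tag, message):
    """Build a simplified BSD-style syslog payload: <PRI>tag: message.

    PRI is facility * 8 + severity, as in the BSD syslog protocol.
    """
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {message}".encode("utf-8")


def send_event(host, port, facility, severity, tag, message):
    """Send one event to a remote syslog collector over UDP (hypothetical helper)."""
    packet = build_syslog_packet(facility, severity, tag, message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet
```

A polling script could then call something like `send_event("loghost.example", 514, FACILITY_LOCAL0, SEVERITY_WARNING, "cacti-script", "packet loss 75% on router1")`, where the host name and message text are placeholders.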
My experience has been that the nature of threshold monitoring is very much dependent on the method used to monitor it. Take availability via ping: something may respond in a reasonable amount of time but suffer from 75% packet loss. I would argue that device is down, or at least I would want to be alerted that it is experiencing 75% packet loss; either way, both RTT and packet loss need to be checked. I would also rather know the health of a network interface than just its utilization. If I'm looking at a utilization graph that is currently at 4 Mbps, that's not bad on a 100 Mb+ full-duplex link, but on a 10 Mb half-duplex link there is a problem. Again, input/output utilization, speed, duplex, collisions, discards, etc. need to be monitored and, more importantly, displayed on the same page as the graph being viewed. My point is that the dynamics of these things can only be accounted for by dynamic means, and in cacti that currently means the scripts. Additionally, to allow for the appropriate level of granularity, I would think there would need to be a means to configure thresholds at a global default, a device-specific default, and a datapoint-specific level.
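As a rough sketch of the two ideas above (layered thresholds, and checking RTT and packet loss together), something like the following would do. It is Python for illustration only; every name and every threshold value here is hypothetical and not taken from cacti or aloe.

```python
def resolve_threshold(name, global_defaults, device_overrides=None,
                      datapoint_overrides=None):
    """Return the most specific value set for a threshold.

    Lookup order: datapoint-specific, then device default, then global default.
    """
    for level in (datapoint_overrides, device_overrides, global_defaults):
        if level and name in level:
            return level[name]
    raise KeyError(name)


def ping_status(rtt_ms, loss_pct, max_rtt_ms, max_loss_pct):
    """Classify a ping result using both RTT and packet loss.

    rtt_ms is None when no reply came back at all.
    """
    if rtt_ms is None or loss_pct >= 100:
        return ("down", [])
    violations = []
    if rtt_ms > max_rtt_ms:
        violations.append("rtt")
    if loss_pct > max_loss_pct:
        violations.append("loss")
    return ("alert", violations) if violations else ("ok", [])


# Illustrative values only.
GLOBAL_DEFAULTS = {"max_rtt_ms": 200.0, "max_loss_pct": 10.0}
DEVICE_OVERRIDES = {"max_loss_pct": 5.0}

limits = {
    name: resolve_threshold(name, GLOBAL_DEFAULTS, DEVICE_OVERRIDES)
    for name in GLOBAL_DEFAULTS
}
# A 20 ms reply with 75% loss is flagged on loss even though the RTT is fine.
print(ping_status(20.0, 75.0, **limits))
```

The same lookup would apply to interface checks: a utilization limit could come from the global default while a particular 10 Mb half-duplex port carries its own datapoint-specific value.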
I would also suggest caution in implementing these features. Other users appreciate the roots of cacti; the ease with which historical data collection and graphing can be accomplished is what attracted them to cacti in the first place. Benign hooks into monitoring, alerting and the like would be desirable so as not to detract from cacti's design goals.
So, in summary, aloe is an event sink, a place to store events. My intent is to have scripts exploit this functionality, because currently they are the only thing that can account for the dynamic or specific nature of what is being polled or monitored.
I will consider that a design flaw!

I did notice your aloe looked similar to HP OpenView.
Cheers! Phil