aloe event console add-on

sidewinder
Cacti User
Posts: 66
Joined: Sat Dec 06, 2003 12:44 pm
Location: Winchester, MA

Post by sidewinder »

Agreed. I would like to set thresholds for each Data Query and handle event management outside of ping availability and outside of the poller, in a separate package. I am still contemplating a design; your assistance would be appreciated.
It sounds like you have in mind some type of daemon that will process threshold events by interrogating a database to determine what the appropriate values should be, and act accordingly.

Well, let me first try to explain my design evolution. aloe started as a standalone web app to view syslog messages written via ODBC to a MySQL database by Kiwi's syslogd. I was in a primarily NT shop but wanted a way to centralize system and network events. There were utilities available to forward NT event logs to syslog, and all the other devices supported it, so it made sense to use syslog to collect everything at a relatively low cost (~$100 for the full version of Kiwi, which supports logging via ODBC). I could now correlate system and network events, but that left me with yet another browser window open on my desktop, which I did not want, so I decided to integrate it into Cacti. In keeping with Cacti's spirit of supporting all OSes, I adopted syslog-ng's database schema, and due to the flexibility of Kiwi's output options the app did not require much modification. Then it dawned on me: if I could have Cacti scripts log information to this database, either directly or indirectly through syslog, that would prove extremely useful. So I set off to attempt it.

Having fought my fair share of battles with Cacti's PHP code (and don't get me wrong, Ian is a wizard in my eyes), I knew that adding this functionality to Cacti from the outset would have meant spending a lot of time in the PHP code instead of getting the results I wanted. I chose Perl because it has a wider array of modules available for everything from SNMPv2c support to remote syslog. I don't know if you've tried to write an event to a remote syslog via PHP, but unless that has changed in the latest version, you can't. Granted, you can record it to an NT event log, but those event messages are not formatted properly, they result in a lot of extraneous crap, and exporting them via utilities like evtsys does not work properly. Based on my research, that could only be corrected by registering event messages via a DLL, which in my opinion was not worth the extra aggravation. Yes, these events could be sent via SMTP, but that would violate my original goal of centralizing all the events. I've been refining a function library so that there are some standard calls that can be made to check and react to threshold violations, XML input to pass script parameters, and output integrated with Cacti for stunning visual effects. I've also added a multi-indexed script input so that adding multiple anything isn't such a chore. The most frustrating thing has been Cacti's graph-centric nature and its data collection.
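For readers curious what remote syslog from Perl involves, here is a minimal sketch using only core modules, assuming a collector (Kiwi, syslog-ng or similar) listening on UDP port 514. The helper names `format_syslog` and `send_syslog` are hypothetical and are not taken from aloe or its function library. The priority value follows the BSD syslog convention PRI = facility * 8 + severity.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(strftime);

# A few BSD syslog (RFC 3164) facility and severity codes, not the full tables.
my %FACILITY = (user => 1, daemon => 3, local0 => 16, local7 => 23);
my %SEVERITY = (crit => 2, err => 3, warning => 4, notice => 5, info => 6);

# Build a BSD-style syslog line: "<PRI>Mmm dd hh:mm:ss HOST TAG: MSG".
sub format_syslog {
    my ($facility, $severity, $host, $tag, $msg, $epoch) = @_;
    my $pri = $FACILITY{$facility} * 8 + $SEVERITY{$severity};
    my @t = localtime(defined $epoch ? $epoch : time);
    # Day of month is space-padded to two characters, e.g. "Aug  5".
    my $stamp = strftime('%b', @t) . sprintf(' %2d ', $t[3]) . strftime('%H:%M:%S', @t);
    return "<$pri>$stamp $host $tag: $msg";
}

# Send one line to a remote syslog collector over UDP (default port 514).
sub send_syslog {
    my ($server, $line, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => $port || 514,
        Proto    => 'udp',
    ) or die "cannot create UDP socket for $server: $!";
    $sock->send($line) or die "send failed: $!";
    $sock->close;
    return 1;
}
```

For example, `send_syslog('loghost.example.com', format_syslog('local0', 'warning', 'cacti', 'ping-probe', '75% packet loss'))` would send a local0.warning message (PRI 132) to a hypothetical host named `loghost.example.com`.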

My experience has been that the nature of threshold monitoring depends very much on the method used to monitor it. Take availability via ping: a device may respond in a reasonable amount of time but suffer from 75% packet loss. I would argue that device is down, or at least I would want to be alerted that it is experiencing 75% packet loss; either way, both RTT and packet loss need to be checked. I would also rather know the health of a network interface than just its utilization. If I'm looking at a utilization graph that currently shows 4 Mbps, that's not bad on a 100 Mb+ full-duplex link, but if the link is 10 Mb half duplex, there is a problem. Again, input/output utilization, speed, duplex, collisions, discards, etc. need to be monitored and, more importantly, displayed on the same page as the graph being viewed. My point is that the dynamics of these things can only be accounted for by dynamic means, which in Cacti, to me, currently means the scripts. Additionally, to allow for the appropriate level of granularity, I think there would need to be a way to configure thresholds at a global default level, a device-specific level, and a data-point-specific level.
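The three levels of granularity could be resolved with a most-specific-wins lookup. This is a minimal sketch under assumptions: the `%thresholds` layout, the `router1` and `ping_wan` names, and the numeric limits are all hypothetical. It also checks RTT and packet loss together, since either one can indicate a problem on its own.

```perl
use strict;
use warnings;

# Hypothetical threshold store: global defaults, per-device overrides,
# and per-data-point overrides keyed as "device/data_source".
my %thresholds = (
    global => { rtt_avg => 250, pkt_loss => 20 },
    device => { router1 => { rtt_avg => 150 } },
    point  => { 'router1/ping_wan' => { pkt_loss => 5 } },
);

# Most specific level wins: data point, then device, then global default.
sub threshold_for {
    my ($device, $ds, $item) = @_;
    for my $try (['point', "$device/$ds"], ['device', $device]) {
        my ($level, $key) = @$try;
        my $v = $thresholds{$level}{$key}{$item};
        return $v if defined $v;
    }
    return $thresholds{global}{$item};
}

# Check RTT and packet loss together: a quick reply with heavy loss
# still counts as a violation.
sub check_ping {
    my ($device, $ds, %result) = @_;
    my @violations;
    for my $item (qw(rtt_avg pkt_loss)) {
        my $limit = threshold_for($device, $ds, $item);
        next unless defined $limit && defined $result{$item};
        push @violations, $item if $result{$item} > $limit;
    }
    return @violations;
}
```

With these example values, `check_ping('router1', 'ping_wan', rtt_avg => 100, pkt_loss => 75)` returns `('pkt_loss')`: the RTT is under the device-level limit of 150, but 75% loss exceeds the data-point limit of 5.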

I would also suggest caution in implementing these features. Other users appreciate the roots of Cacti; the ease with which historical data collection and graphing can be accomplished is what attracted them to Cacti in the first place. Benign hooks into monitoring, alerting and the like would be desirable so as not to detract from Cacti's design goals.

So, in summary, aloe is an event sink: a place to store events. My intent is to have scripts exploit this functionality, because currently they are the only thing that can account for the dynamic or specific nature of what is being polled or monitored.
I did notice your aloe looked similar to HP OpenView.
I will consider that a design flaw!

Cheers! Phil
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »


sidewinder wrote: I don't know if you've tried to write an event to a remote syslog via PHP, but unless that has changed in the latest version you can't. Granted, you can record it to an NT event log; however, these event messages are not formatted properly, they result in a lot of extraneous crap, and exporting them via utilities like evtsys does not work properly. Based on my research, that could only be corrected by registering event messages via a DLL, so in my opinion it was not worth the extra aggravation.
The development of a resource DLL will be done. It is not complicated. No big deal.
sidewinder wrote: I chose Perl because there is a wider array of modules available for everything from SNMPv2c support to remote syslog.
I believe both SNMP v2 and Syslog are supported in PHP 4.3.7+. I am using Syslog and PHP uses NetSNMP.
sidewinder wrote: Yes, these events could be sent via SMTP, but that would violate my original goal of centralizing all the events.
E-Mail support would be optional. Syslog/Eventlog will be in 0.8.6. Unfortunately the Syslog facilities for NT are only the Eventlog.
sidewinder wrote: I would also suggest caution in implementing these features. Other users appreciate the roots of Cacti; the ease with which historical data collection and graphing can be accomplished is what attracted them to Cacti in the first place. Benign hooks into monitoring, alerting and the like would be desirable so as not to detract from Cacti's design goals.
Agreed. Ian and I have had this conversation often. The intent is not to build a monster NMS but rather to tailor specific features to what the RRD database provides. What we put in 0.8.6 was simply a way to know the status of a host before polling it, instead of polling a downed host thousands of times and essentially killing the poller process. We have achieved that goal.

The next evolution is to add some form of optional basic notification. In doing so, we will have built a framework for event management. The remaining component of that event management is to formulate a simple "Template based" framework to apply to the data sources.

It is likely that that form of event management will not be in Cacti for some time. However, simple smtp will be an option soon, rooted in the poller (I hope).

I hope that the 0.8.6 upgrade will be a positive one for the Cacti community. I am especially fond of the PHP Script Server, which accelerates PHP scripts 20-fold. We are also working with a Cacti appreciator on developing a Perl Script Server for 0.8.7.

I appreciate the time you spent commenting on aloe and am very desirous to include some method of viewing host up/down and system events in the next version.

Oh, by the way...
sidewinder wrote: I will consider that a design flaw!
:lol: :lol:
Regards,

TheWitness

Post by sidewinder »

How does something like this sound?

Add a new table to Cacti to store current values from any event processing the poller or scripts wish to do. Something like:

device/host, data input method/query, output field/DS item, current value, last value, current time, status (new or processed entry)

device, ping, rtt, 300, now, new

Daemonized script processes each entry in this table based on the status field every x seconds/minutes
Calculates the delta of the last and current values
Checks the delta value against a threshold to determine whether this is a non-event or an event that is new, existing or cleared
Takes the appropriate logging/notification action.
Updates the status value of the record to processed and replaces the last value field of this table with the current value.
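The processing steps above might look roughly like this for a single row. It is a sketch, not an existing daemon: the row fields mirror the proposed column list, `%open_events` is a hypothetical stand-in for however open events would really be tracked, and taking the absolute difference for the delta is an assumption.

```perl
use strict;
use warnings;

# Hypothetical set of currently open (not yet cleared) events,
# keyed by "device/method/item".
my %open_events;

# One pass over a single 'new' row from the bridge table.
# Returns 'new', 'update', 'cleared', or 'none' (a non-event).
sub process_row {
    my ($row, $threshold) = @_;
    my $key  = join '/', @{$row}{qw(device method item)};
    my $last = defined $row->{last} ? $row->{last} : $row->{current};
    # Assumption: the delta is the absolute difference of last and current.
    my $delta = abs($row->{current} - $last);

    my $type;
    if ($delta > $threshold) {
        $type = $open_events{$key} ? 'update' : 'new';
        $open_events{$key} = 1;
    }
    elsif ($open_events{$key}) {
        $type = 'cleared';
        delete $open_events{$key};
    }
    else {
        $type = 'none';
    }

    # Mark the row processed and roll the current value into 'last'.
    $row->{status} = 'processed';
    $row->{last}   = $row->{current};
    return $type;
}
```

Logging and notification would then branch on the returned type. Because the check is applied to the delta, a value that jumps and then stays high reports `cleared` on the next pass; comparing `$row->{current}` itself against the threshold would avoid that.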

New - log the event to a DB (directly or via syslog); optionally send an alert via SMTP (other methods can be added)
Update - log the event to a DB (directly or via syslog); optionally send an alert via SMTP (reduces alert spam)
Cleared - log the event to a DB (directly or via syslog); optionally send an alert via SMTP

An SMTP update every x minutes/hours for events that have not cleared would probably be nice too.
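The periodic reminder for uncleared events could be driven by a per-event "last notified" timestamp. A minimal sketch, assuming a hypothetical events structure keyed by event, with `last_notified` and `cleared` fields; composing and sending the actual SMTP message is left out.

```perl
use strict;
use warnings;

# Return the keys of events that are still open and whose last
# notification is at least $interval seconds old (sorted for stable output).
sub due_reminders {
    my ($events, $now, $interval) = @_;
    return grep {
        !$events->{$_}{cleared}
            && $now - $events->{$_}{last_notified} >= $interval
    } sort keys %$events;
}

# Record that a reminder went out, so the next one waits a full interval.
sub mark_notified {
    my ($events, $now, @keys) = @_;
    $events->{$_}{last_notified} = $now for @keys;
    return;
}
```

A daemon pass would call `due_reminders`, send one digest for the returned keys, then call `mark_notified` with the same keys.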

The event database would be entirely separate from Cacti, so it could be hosted on a separate machine if need be. The same would be possible for the EMS daemon.

Impacts to cacti -

Modifications necessary to include threshold values with the DS
A table to serve as the bridge between Cacti and the EMS daemon
A console to view the event DB - aloe or an aloe-like substance
A simple way for the EMS daemon to determine the threshold for the delta value, either included in the record of this new table or through a DB query
Whatever poller modifications may be necessary
A console to indicate device state
It would also be desirable to include the EMS daemon configuration parameters, such as notification actions to take when something happens, processing interval, etc., in Cacti.


Language options could be

PHP - it is required for Cacti. Although logging via remote syslog would not currently be possible, direct inserts into an event DB could be used as a workaround.

PERL - not required for Cacti (though I highly recommend it). Most of this has already been written, although it would have to be reworked to process a DB table and be daemonized.

Python - probably not in widespread use, but it can be compiled into a standalone executable and supports threading on NT and *nix platforms. It probably has more legs in the long run, because features like retries for downed devices could be added as threads instead of being processed sequentially.

That leaves the matter of determining and displaying the state of a device. If that is left to the poller to decide, then it could simply be a matter of adding logic to this event processing daemon that basically says: if the data input method is the poller and its current value = 1, then update some table with device state = down. The information in this table, along with some PHP in Cacti, could then display the device state (along with some stunning visual effects, of course).
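The device-state rule could be as small as the following sketch. The `%device_state` table and the row fields are hypothetical, and treating any value other than 1 as "up" is an assumption the post does not spell out.

```perl
use strict;
use warnings;

# Hypothetical device-state table that a Cacti page could read.
my %device_state;

# Only rows produced by the poller's availability check set device state.
# Assumption: a current value of 1 means "down", anything else "up".
# Rows from other data input methods are ignored (returns undef).
sub update_device_state {
    my ($row) = @_;
    return unless $row->{method} eq 'poller';
    $device_state{ $row->{device} } = $row->{current} == 1 ? 'down' : 'up';
    return $device_state{ $row->{device} };
}
```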

A little bit simplistic, but I think that meets the initial goal and allows this to evolve in a low-impact way for Cacti. It shouldn't interfere with distributed Cacti, and the same distribution model could probably be applied to this: the remote poller could process any events that take place in its assigned poller tasks and update a local event DB cache, which could be pulled by a centralized EMS daemon, or something along those lines.

Cheers, Phil
oharel
Cacti User
Posts: 84
Joined: Wed Jan 07, 2004 11:16 am

Post by oharel »

Thought so...
I think I even located the place in the ping-probe.pl script where it says where to look for the info, but as I am no programmer, I don't touch things I don't know :)

Here is the info:
root@dublin:~# /bin/ping -c 4 -s 56 81.199.3.1
PING 81.199.3.1 (81.199.3.1) 56(84) bytes of data.
64 bytes from 81.199.3.1: icmp_seq=1 ttl=252 time=104 ms
64 bytes from 81.199.3.1: icmp_seq=2 ttl=252 time=105 ms
64 bytes from 81.199.3.1: icmp_seq=3 ttl=252 time=105 ms
64 bytes from 81.199.3.1: icmp_seq=4 ttl=252 time=105 ms

--- 81.199.3.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3034ms
rtt min/avg/max/mdev = 104.983/105.391/105.698/0.286 ms

-harel

Post by oharel »

By the way, I did not notice the conversation before I wrote what I wrote, so the above refers to Sidewinder's:
Could you run the ping command from the command shell and post the results?

/bin/ping -c 4 -s 56 81.199.3.1
thanks

-harel

Post by sidewinder »

It looks like there is a subtle difference in the output. The script looks for "round-trip" or nothing to determine the start of the summary information, but the output for this ping displays "rtt".

I've added that in the attached script, and hopefully that will resolve this for you. Otherwise there is a problem with the expression parsing, and that will take me a bit to sort out; I'm not a programmer either :)
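The attached script is not reproduced in the thread, so the following is not its code. It is a sketch of how one pattern could accept both the BSD-style `round-trip` prefix and the Linux iputils `rtt` prefix seen in the ping output earlier in the thread; the `parse_rtt_avg` name is hypothetical.

```perl
use strict;
use warnings;

# Return the average RTT from ping's summary line, accepting both
# "round-trip min/avg/max..." (BSD style) and "rtt min/avg/max/mdev..."
# (Linux iputils). Returns undef if no summary line is found.
sub parse_rtt_avg {
    my ($output) = @_;
    # \S* absorbs an optional "/stddev" or "/mdev" after "min/avg/max";
    # the capture is the second number, i.e. the average.
    if ($output =~ m{^(?:round-trip|rtt)\s+min/avg/max\S*\s*=\s*[\d.]+/([\d.]+)/}m) {
        return $1;
    }
    return;
}
```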

Please let me know how you make out.
Cheers, Phil
Attachment: ping-probe.pl.txt (19.51 KiB)

Post by oharel »

:) Getting there!
Now I only have the error on the packet loss.
The strange thing is, I did exactly what you did (added an rtt statement), but then, you are a better non-programmer than I am :)

Can you give me a clue as to the packet loss problem?

Thanks again for your time and patience.

-harel

Post by sidewinder »

Can you paste this into the packet loss logic and give it a try?

elsif ($ping_output =~ m@(\d+)% packet loss,\s+$@m) {
    # Redhat
    $pt{loss} = $1;
}

I think the line "4 packets transmitted, 4 received, 0% packet loss, time 3034ms" is what's causing the problem.
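The reason is that `,\s+$` requires the line to end right after "loss,", while this ping prints " time 3034ms" there. Below is a sketch of a more tolerant pattern that anchors on the words "packet loss" instead (the `parse_loss` name is hypothetical). It also accepts the decimal form, such as "0.0% packet loss", that BSD-style ping prints.

```perl
use strict;
use warnings;

# Extract the packet-loss percentage from ping's statistics line.
# Anchoring on "packet loss" (and not on what follows it) matches both
# "0% packet loss, time 3034ms" and "0.0% packet loss".
sub parse_loss {
    my ($output) = @_;
    return $1 + 0 if $output =~ m{([\d.]+)% packet loss};
    return;
}

my $linux = "4 packets transmitted, 4 received, 0% packet loss, time 3034ms\n";

# The pattern from the thread requires end-of-line right after "loss,",
# so it fails on this line because " time 3034ms" follows.
my $strict = ($linux =~ m@(\d+)% packet loss,\s+$@m) ? 'match' : 'no match';
print "strict pattern: $strict\n";
print 'parse_loss: ', parse_loss($linux), "\n";
```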

I hope this is worth your effort.

-Phil

Post by oharel »

YES!!! :D

It is working, thanks!
sidewinder wrote: elsif ($ping_output =~ m@(\d+)% packet loss,\s+$@m) {
Since that did not work, I did:
elsif ($ping_output =~ m@(\d+)% packet|loss,\s+$@m) {

(copied from the other lines ;) )

Anyway, the RTT Average Current values in the graphs themselves show NaN instead of an actual number. Any idea why?

-harel
Guest

Post by Guest »

Great, now you're a programmer! Well, that's half the battle. I'll have to brush up on my regexp skills and readdress that one.
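As a regexp brush-up: in a Perl regex, `|` has the lowest precedence, so the edited pattern is really two whole alternatives rather than one pattern with an optional part. A small demonstration using the statistics line from the ping output posted earlier:

```perl
use strict;
use warnings;

my $line = "4 packets transmitted, 4 received, 0% packet loss, time 3034ms\n";

# "|" binds loosest, so this pattern means
#   (\d+)% packet   OR   loss,\s+$
# The left branch matches this line, which is why the edit works.
my ($loss) = $line =~ m@(\d+)% packet|loss,\s+$@m;
print "loss = $loss\n";

# The right branch can also match on its own; the match then succeeds,
# but the capture group (which lives in the left branch) is undef.
my @caps = "loss, \n" =~ m@(\d+)% packet|loss,\s+$@m;
print 'right-branch match, $1 is ', (defined $caps[0] ? 'defined' : 'undef'), "\n";
```

Dropping the `|loss,\s+$` branch altogether gives the same result on this output without the extra alternative.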
oharel wrote: Anyway, the RTT Average Current values in the graphs themselves show NaN instead of an actual number. Any idea why?
Now it looks like the Cacti poller doesn't like the output from the script. Could you run the script from the command line and post the results?

Thanks, Phil

Post by oharel »

Hi Sidewinder,

running:
rrdtool fetch /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_rtt_avg_935.rrd MAX
yields:
rtt_avg pkt_loss

1091741100: nan 1.0000000000e+02
1091741400: nan 1.0000000000e+02
1091741700: nan 1.0000000000e+02
etc.

running the command from the shell:
root@dublin:/var/www/htdocs/cacti/scripts# /usr/bin/perl /var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 81.199.3.1
yields:
query -> select rrd_path from data_input_data_cache where command like '%/var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 81.199.3.1';
result -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_rtt_avg_935.rrd
ping-probe output file -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_rtt_avg_935.rrd.xml
avg:105 loss:0

Below is what I see in the graph itself.

Where does the RRD get its information from? The ping-probe.pl script?

-harel
Attachment: ping-probe NaN.gif (14.32 KiB)

Post by sidewinder »

oharel wrote: where does the rrd take the information from? the ping-probe.pl script?
Well, the RRD ultimately gets the information from the script, but it gets it through the poller. The script returns the value (or values, in this case) to the poller, and then the poller updates the RRD. The strange part is that the return values from the script look correct. However, you may want to change the debug parameter in aloe-config.xml back to zero. And as I type this, it dawns on me that I've only run this with cactid; are you running cmd.php?

Post by sidewinder »

Well, I just tried it through cmd.php with debug both on and off, and that works OK, but that's on a Windows machine. Would it be possible for you to redirect the output of your poller to a file and paste the items relevant to the script? That might contain a clue as to what is wrong.

-Phil

Post by oharel »

Here it is:

Data Source: aaa monitor - RTT Monitor
RRD: /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_rtt_avg_935.rrd
Action: 2, Script: perl /var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 81.199.3.1
Data Source: aaa monitor - yahoo.com - RTT Monitor
RRD: /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_yahoo_com_rtt_avg_946.rrd
Action: 2, Script: perl /var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 216.109.118.77
Data Source: aaa monitor - cisco.com - RTT Monitor
RRD: /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_cisco_com_rtt_avg_950.rrd
Action: 2, Script: perl /var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 198.133.219.25

the cmd.php file output shows:
_input_data_cache where command like '%/var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 198.133.219.25';
result -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_cisco_com_rtt_avg_950.rrd
ping-probe output file -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_cisco_com_rtt_avg_950.rrd.xml
avg:66 loss:0
MULTI expansion: found fieldid: 39, found rrdname: pkt_loss, value: 0

does that help?

-harel

Post by sidewinder »

oharel,
oharel wrote: the cmd.php file output shows:
_input_data_cache where command like '%/var/www/htdocs/cacti-0.8.5a/scripts/ping-probe.pl 198.133.219.25';
result -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_cisco_com_rtt_avg_950.rrd
ping-probe output file -> /var/www/htdocs/cacti-0.8.5a/rra/aaa_monitor_cisco_com_rtt_avg_950.rrd.xml
These lines are a result of the debug setting in either aloe-config.xml or ping-probe-config-template.xml, or both. Can you make sure both of them are set to 0 (zero)?
oharel wrote: avg:66 loss:0
This is output from the script that should be getting passed back to cmd.php.
oharel wrote: MULTI expansion: found fieldid: 39, found rrdname: pkt_loss, value: 0
This is the poller, which appears to be finding only the output for loss.
oharel wrote: does that help?
Yes, it does help. Not so much as to why it's happening, but at least as to what is happening.

Can you double-check the data template and make sure that the output fields are mapped properly for each of the script outputs (pkt_loss -> loss, rtt_avg -> avg)?
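Cacti scripts with more than one output print space-separated `name:value` pairs (here `avg:66 loss:0`), and each data source item in the data template is tied to one of those names. The following is a rough model of that mapping, not cmd.php's actual code; `%field_for_ds`, `parse_multi` and `values_for_rrd` are hypothetical. It shows one way a data source can end up unknown: if its output field is missing or mapped to the wrong name, the value becomes `U`, which RRDtool stores as unknown and `rrdtool fetch` shows as nan.

```perl
use strict;
use warnings;

# Hypothetical DS name => script output field assignments from the data template.
my %field_for_ds = (rtt_avg => 'avg', pkt_loss => 'loss');

# Split a multi-output result such as "avg:66 loss:0" into field => value.
sub parse_multi {
    my ($line) = @_;
    my %out;
    for my $pair (split ' ', $line) {
        my ($name, $value) = split /:/, $pair, 2;
        $out{$name} = $value if defined $value;
    }
    return %out;
}

# Resolve each DS from the parsed output. A DS whose field is missing
# from the output, or mapped to the wrong name, gets 'U' (unknown).
sub values_for_rrd {
    my ($line, $map) = @_;
    my %out = parse_multi($line);
    my %result;
    for my $item (sort keys %$map) {
        my $v = $out{ $map->{$item} };
        $result{$item} = defined $v ? $v : 'U';
    }
    return %result;
}

my %ds = values_for_rrd('avg:66 loss:0', \%field_for_ds);
print join(' ', map { "$_=$ds{$_}" } sort keys %ds), "\n";
```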

There also seem to be some subtleties in Red Hat that don't exist in FreeBSD or Windows. Can you try turning off the debug switches in the XML config files? Perhaps that is confusing the poller.

I don't have access to anything RH, but if this doesn't solve it, I'll try to get my hands on a copy and load it up.

-Phil