Feature Request: Alert Dependencies

Post by **Lazybones** » Sun Jun 13, 2010 8:07 pm

Now this is sort of a problem with both monitor and threshold.

Let's say I am monitoring a router, servers, and switches at a remote location and my WAN connection goes down.

What happens now is that I get an alert for every device that has a low threshold, when really all I need is a single alert for the router, since in terms of alerting all these devices depend on it for connectivity to the monitoring system.

The same could be said for the switches and the servers.

The simplest way I can think of to solve this would be to add a new parent attribute to each host or threshold graph.

Threshold would then do two passes before sending alerts. First it would determine the alert state of all devices, then it would go through them to send alerts. If the device has no parent, an alert is sent. If it has a parent, it checks the parent's alert state, and if the parent has an alert, no alert is sent for that device on that pass.
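A rough sketch of that two-pass idea, in Python rather than anything from the actual thold code. The `Device` class, its `parent` field, and `devices_to_notify` are made-up names for illustration only, and the set of breached devices is assumed to come from whatever threshold check already exists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Device:
    name: str
    parent: Optional[str] = None  # hypothetical "parent" attribute from the proposal
    alerting: bool = False        # filled in during pass 1


def devices_to_notify(devices: Dict[str, Device], breached: Set[str]) -> List[str]:
    # Pass 1: record the alert state of every device.
    for dev in devices.values():
        dev.alerting = dev.name in breached

    # Pass 2: notify only when there is no parent, or the parent is not alerting.
    notify = []
    for dev in devices.values():
        if not dev.alerting:
            continue
        parent = devices.get(dev.parent) if dev.parent else None
        if parent is None or not parent.alerting:
            notify.append(dev.name)
    return notify


devices = {
    "router": Device("router"),
    "switch": Device("switch", parent="router"),
    "server": Device("server", parent="switch"),
}
# WAN link down: everything at the site breaches, only the router is reported.
print(devices_to_notify(devices, {"router", "switch", "server"}))  # ['router']
```

Note that this only looks one level up, exactly as described in the paragraph above, so it relies on each intermediate device also being in alert when the top of the chain fails.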

For hosts with multiple graphs, I could see the host itself being an automatic implied parent, so when a graph goes to send an alert and the host is down, only a single host alert would be sent.

We have a lot of remote sites and equipment, so having our on-call techs hit with 20 alert notices when something like the main MPLS link goes down is a bit of a hassle and momentarily clouds the real problem.

Post by **TheWitness** » Sun Jun 13, 2010 9:47 pm

The formal noun is "Event Correlation". I'm not too sure how this would be done. How does Nagios do it?

TheWitness

Post by **Lazybones** » Sun Jun 13, 2010 9:59 pm

TheWitness wrote: The formal noun is "Event Correlation". I'm not too sure how this would be done. How does Nagios do it?

TheWitness

No idea. I am looking at it to replace WhatsUp. WhatsUp basically lets you create a dependency link to a parent device, then ignores the child alert if the parent is down.

You link a server to a switch and the switch to a router, which may be linked to a master router.

If a parent is down, none of its children or children's children send alerts.
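To illustrate the "children's children" part, here is a minimal Python sketch that walks up the dependency chain and suppresses an alert if any ancestor is down. This is not how WhatsUp or Cacti implements it; `has_down_ancestor` and the `parent_of` mapping are hypothetical names, and the `seen` set is just a guard against an accidental loop in the links.

```python
from typing import Dict, Optional, Set


def has_down_ancestor(device: str,
                      parent_of: Dict[str, Optional[str]],
                      down: Set[str]) -> bool:
    """Return True if any parent, grandparent, etc. of `device` is down."""
    seen: Set[str] = set()
    current = parent_of.get(device)
    while current is not None and current not in seen:
        if current in down:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


parent_of = {
    "server": "switch",
    "switch": "router",
    "router": "master-router",
    "master-router": None,
}
down = {"router", "server"}

# The server's alert is suppressed because its grandparent (the router) is down.
print(has_down_ancestor("server", parent_of, down))  # True
# The router still alerts because nothing above it is down.
print(has_down_ancestor("router", parent_of, down))  # False
```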

Post by **Howie** » Mon Jun 14, 2010 2:45 am

Event Correlation is more about time series. E.g. if you got a down from device A, but an up from device B, then there is no alert, but if you only get one of those two events in a given period, then there is a problem.
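One possible reading of that, as a small Python sketch: treat it as a problem when exactly one of the two expected events shows up inside the time window. The event tuple layout, `is_problem`, and its parameters are assumptions made for illustration and are not taken from any existing tool.

```python
from typing import Iterable, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, device, state)


def is_problem(events: Iterable[Event],
               first: Tuple[str, str],
               second: Tuple[str, str],
               window_start: float,
               window_seconds: float) -> bool:
    """True when only one of the two expected events occurs in the window."""
    window_end = window_start + window_seconds
    seen = {(dev, state) for ts, dev, state in events
            if window_start <= ts < window_end}
    # Both seen, or neither seen: nothing to report. Exactly one seen: problem.
    return (first in seen) != (second in seen)


a_down = ("A", "down")
b_up = ("B", "up")

# A went down and B came up within 60 seconds: no alert.
print(is_problem([(10, "A", "down"), (20, "B", "up")], a_down, b_up, 0, 60))  # False
# Only A went down in the window: alert.
print(is_problem([(10, "A", "down")], a_down, b_up, 0, 60))  # True
```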

Dependencies would even be handy in the poller: there is no point in trying to poll that remote site until the WAN router comes back up, either. The simple WUG style of "don't poll if X is down" and "don't poll if X is up" seems like it wouldn't be too hard to do...

Post by **TheWitness** » Mon Jun 14, 2010 7:18 am

That makes sense; however, you might have to go a little bit further. For example: don't poll if either X is down or X.ifIndex is down, or, in a case where you have dual entrances/routers etc., don't poll if either X is down or both X.ifIndexA and X.ifIndexB are down.

This is one class of event correlation. Can you guys think of any others? How would we handle DNS outages, for example?
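The rule above, written out as a small Python sketch. `should_poll` is a hypothetical helper, not a Cacti poller function: the remote site is skipped if the gateway X is down, or if every one of its relevant interfaces is down. With a single interface this reduces to "X is down or X.ifIndex is down".

```python
from typing import Iterable


def should_poll(host_up: bool, uplinks_up: Iterable[bool]) -> bool:
    """Decide whether devices behind gateway X are worth polling."""
    uplinks = list(uplinks_up)
    if not host_up:
        return False
    if not uplinks:
        # No interfaces configured as dependencies: only the host state matters.
        return True
    # With redundant uplinks, stop polling only when all of them are down.
    return any(uplinks)


print(should_poll(True, [False, True]))   # one of two uplinks still up -> True
print(should_poll(True, [False, False]))  # both uplinks down -> False
print(should_poll(False, [True, True]))   # X itself is down -> False
```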

TheWitness

Post by **Lazybones** » Mon Jun 14, 2010 9:41 am

If you link everything by thold graph then you leave it flexible to the user.

I.e., you link server:eth0 to switch:port4 to router:eth1, for example.

As for DNS, you could have a polled canary test where the poller tries to resolve a known-good local DNS value; if it fails, it alarms on DNS and ignores any hosts that are not IP-based.
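A minimal Python sketch of the canary idea, under some assumptions: the canary name and expected address are placeholders, `hosts_to_evaluate` is a made-up helper, and the resolver is passed in as a function so it can be swapped out (in real use it could be `socket.gethostbyname`, whose lookup failures are `OSError` subclasses).

```python
import ipaddress
from typing import Callable, Iterable, List, Tuple


def is_ip_literal(host: str) -> bool:
    """True if the host is configured by IP address rather than by name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def hosts_to_evaluate(hosts: Iterable[str],
                      canary_name: str,
                      expected_ip: str,
                      resolve: Callable[[str], str]) -> Tuple[List[str], bool]:
    """Return (hosts worth alerting on, dns_ok) based on the canary lookup."""
    try:
        dns_ok = resolve(canary_name) == expected_ip
    except OSError:
        dns_ok = False
    if dns_ok:
        return list(hosts), True
    # DNS looks broken: raise one DNS alarm and keep only IP-based hosts.
    return [h for h in hosts if is_ip_literal(h)], False


def broken_resolver(name: str) -> str:
    # Stand-in for a failing DNS lookup.
    raise OSError("lookup failed")


hosts = ["10.0.0.5", "web01.example.com", "192.168.1.1"]
print(hosts_to_evaluate(hosts, "canary.example.com", "10.0.0.53", broken_resolver))
# (['10.0.0.5', '192.168.1.1'], False)
```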

Post by **enrique.belo** » Tue Jun 29, 2010 4:49 am

I think we can use the simple "critical monitor" approach from WUG. Critical monitors are usually a PING from WUG. This means that if the PING fails on a device, WUG will no longer poll the interfaces of that device, because those polls will surely fail.

Let's start with this, since it is the simpler option. I'm pretty sure you guys can do it, since Cacti already has a STATUS check on each device. If the status check results in a DOWN state, then Cacti won't poll the DS of that device.

Thanks!

Post by **aaps** » Sun Jan 29, 2012 7:51 pm

G'day

Sorry to revive a dead thread but couldn't see any other posts regarding this.

Has this since been implemented into Cacti?

I'm also currently in the same boat as the OP: I have multiple sites, and if the link between said sites is down, flooding of my inbox ensues! The only option that I'm currently aware of is to set up multiple Cacti monitors, one at each site. However, this is a less than graceful solution, as there really should be a way to run it all from one location with the appropriate correlation occurring behind the scenes.

Please advise on whether this issue has been corrected/implemented.

Cheers,

Adrian Apps.

Post by **gandalf** » Mon Jan 30, 2012 4:08 pm

To my knowledge, this hasn't been implemented thus far.
R.

Post by **TheWitness** » Mon Jan 30, 2012 8:17 pm

No, it has not, but it's a simple option: a 'Disable Threshold Notifications When Host is Down' checkbox. Combined with the most recent Maintenance plugin options, it would be a 'snap' to implement (like <= 20 minutes).

TheWitness

Post by **cigamit** » Sun Feb 19, 2012 1:40 pm

Thresholds should already not alert if the host is down. The only real issue is if you have thresholds that alert when they go below a specific value, and they alert faster than the down host interval you have set in Cacti. For instance, your threshold alerts every minute, but in Cacti you have it set so the host doesn't count as down until 2 minutes.

It won't be easy to add in "criticals" effectively until Cacti has groups.

Post by **computer_guru** » Thu Apr 26, 2012 11:01 am

I think alert dependencies would be a nice feature. Each device could have a parent/child relationship.

Then dependency rules could be used:
- If parent device has an active thold, then do not alert on child
- If child device has an active alert then do not alert
etc....

Post by **agomezro** » Sun May 25, 2014 4:34 am

Hi Guys,
I'm new to Cacti and I'm looking for a way to manage device dependencies and/or Event Correlation.
Is there any update on this?
I've googled it but found nothing relevant for Cacti.
Many thanks in advance.
