Next Steps and My Deep Thoughts for Features - Hello!!

Post by **TheWitness** » Sat Apr 10, 2004 7:40 am

Ian,

Hope college is going well. It's been a long time since I attempted dialog with you on your development effort. Kudos!!!

Now it's time to take the next step. Graph/Host templates have evolved, so it is time to move on. Below are the next major steps for this evolving product. Several smaller cleanup items will be required along the way, but these are the major ones nonetheless.

1) Event Management - As a process/daemon
2) Summary Reporting - As a process/daemon

Both features should be delivered via database structure changes so as to enhance performance rather than through realtime calculations.

1) Event Management
Event Management has several subcomponents that include but are not limited to the following:
- Paging API via TAPI - Lower priority
- E-Mail utilizing SMTP - Deliver first
- Event Management Groups - A Must - Assignable by either device or graph tree. These are the alarm groups, including message format, group members, SNMP servers to utilize, etc.
- Availability and Uptime Statistics - A Must - This is required for every device monitored, period. If the device is not reachable because of a community string change, system crash, network issue, etc., it should be noted.
- Threshold definition as a part of each item being polled - A Must - This includes definitions by template for event active level and event clear level, with the option to apply system monitoring parameters per item (this is optional at this point; see the next item to understand).
- System Monitoring Parameters - A Must - This would be defined at the system level and potentially definable at the polling event level. It includes the following: how many failed attempts before an event is declared, and how many cleared events before returning to normal. That's all.
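
To make the threshold and System Monitoring Parameters items concrete, here is a minimal Python sketch of the "N failed attempts before an event, M cleared polls before normal" logic. It is an illustration only, not Cacti code: the class name `ThresholdMonitor`, its fields, and the default counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMonitor:
    """Hypothetical per-item threshold tracker (not part of Cacti)."""
    active_level: float    # a poll at or above this counts as a breach
    clear_level: float     # a poll at or below this counts as a clear
    fail_count: int = 3    # consecutive breaches before an event is declared
    clear_count: int = 3   # consecutive clears before returning to normal
    state: str = "normal"  # either "normal" or "event"
    _streak: int = 0       # consecutive polls counting toward a state change

    def update(self, value: float) -> str:
        """Feed one polled value and return the resulting state."""
        if self.state == "normal":
            self._streak = self._streak + 1 if value >= self.active_level else 0
            if self._streak >= self.fail_count:
                self.state, self._streak = "event", 0
        else:
            self._streak = self._streak + 1 if value <= self.clear_level else 0
            if self._streak >= self.clear_count:
                self.state, self._streak = "normal", 0
        return self.state


monitor = ThresholdMonitor(active_level=90, clear_level=80, fail_count=2, clear_count=2)
for polled in [95, 85, 95, 95, 70, 85, 75, 70]:
    print(polled, monitor.update(polled))
```

Keeping the active and clear levels separate gives hysteresis: a value hovering between 80 and 90 neither raises a new event nor clears an existing one, so a single noisy poll does not page anyone.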

2) Summary Reporting

Here is where I will ramble a little (aka thoughts not too clear). Here's the bad news: abandon the idea that the system is just there to provide graphs. Your first view into the system is something entirely different. The good news is that what you have already done is the rock, the foundation, you name it.

OK, first things first. The Summary Reporting process/daemon will scan the RRD files and uptime database for conditions. When it finds a condition, it sets an attribute for each item polled (red, green, yellow):
- Red is down or above critical
- Yellow is recovering (back on line or below threshold, but not for the required number of polls)
- Green is available or below threshold

Now second. Abandon the graphs as the primary view. I will send you a screenprint by email offline for a cool example. But what you will do is create a series of RYG bubbles by device for the following columns: Availability, Round Trip Delay, and Health. Health is the status of all polled items. In other words, if any one is above critical then health is red; if recovering, yellow; if OK, green. You can optionally provide columns such as round trip delay averages, uptime percent, down since, etc.
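
The Health roll-up described above amounts to "take the worst status of any polled item". A minimal sketch, assuming statuses are ordered green < yellow < red; the names `Status` and `health` are hypothetical and only for illustration:

```python
from enum import IntEnum


class Status(IntEnum):
    """RYG status, ordered so that a larger value means a worse condition."""
    GREEN = 0   # available or below threshold
    YELLOW = 1  # recovering
    RED = 2     # down or above critical


def health(item_statuses):
    """Return the worst status among a device's polled items.

    A device with no polled items is reported GREEN here; that default
    is an assumption of this sketch.
    """
    return max(item_statuses, default=Status.GREEN)


device_items = [Status.GREEN, Status.YELLOW, Status.GREEN]
print(health(device_items).name)
```

Because the roll-up is just a maximum, it can be stored alongside each device by the reporting process and read back cheaply when the summary page is drawn.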

When you click on any bubble, you get additional detail for each item that would include more bubbles in a form view, possibly in a separate window. When you click a polled item bubble, possibly in another frame, you get a graph, or if you click on the "Health" bubble, you get all the graphs.

OK, my creative juices are now expended. I have many more requirements to provide, but I don't want to kill you. I wish you the best in your endeavor. I hope that people are rewarding you via PayPal, and I look forward to providing advice in the future.

Kindest Regards,

Larry Adams
General Motors, Inc.
a.k.a. TheWitness

Post by **bulek** » Sun Apr 11, 2004 12:53 pm

Please, do not turn Cacti into another all-in-one, feature-rich hog. There is a lot of other software around doing all this stuff, both commercial (like Vitalsuite) and open source (like JFFNMS). This is exactly the reason I dropped Concord eHealth at some point in favour of Cacti. Its simplicity and flexibility were unbeatable. I don't have anything against improving and adding new features, but keep it simple please.

- Piotr

Post by **TheWitness** » Sun Apr 11, 2004 8:20 pm

With regard to bulk, agreed. But what is being asked adds a vital feature to an already incredibly flexible, yet not quite scalable, package. I agree that Vital is way too bulky and slow, and the graphics are hard to get to. However, event management is essential, and summary reporting gives Cacti a base component towards scalability which, once a high-speed multithreaded polling engine becomes available, will be essential.

TheWitness

Post by **youngmug** » Mon Apr 12, 2004 11:36 pm

While this sounds like a good idea, I am of the mind myself that it is best to keep it simple.

I would prefer to see Cacti continue to evolve into the best graphing software that can be found, while putting these event monitoring and overview statistics features into separate add-on packages that use the Cacti data. Too many packages today suffer feature bloat - I like the Unix idea of making a piece of software that does one thing, and does it well.

I am sure you might be thinking "chaining tools is ugly" or something similar, but you have to admit that it is very modular. You can just drop another component into the toolchain that accepts the input and produces the output of a different tool, and the chain will still work. A well-designed toolchain works well and is easy to break apart and re-assemble. Why not make Cacti a functional (and stand-alone) part of a systems-monitoring toolchain?

Post by **TheWitness** » Thu Apr 15, 2004 5:14 pm

Here is a simple, I repeat, "simple" example of what I am talking about. This is a screen shot from a simple GNU ASP application written by Ultrajones@hotmail.com. Take the same simple example and make the RGY (Red, Green, Yellow) chart include certain health parameters in addition to simple availability.

Peace,

TheWitness

Post by **raX** » Tue Apr 20, 2004 11:27 pm

I think anyone who has used Cacti for a bit will realize that I definitely strive for a simple and concise feature set. This idea is much more thought out, but I do get this kind of feature request often. The reason that I am very hesitant to make this next "step" is because that is the point where Cacti strays away from its original goal.

As you and others have pointed out, there are other applications out there that attempt to "do it all" when it comes to monitoring. I tend to think of Nagios as Cacti's complement in the monitoring and reporting realm. Hopefully one day they will interoperate better. I have no intention to turn Cacti into Nagios and I assume the same is true for Nagios.

In terms of being the best graphing tool possible, I feel there is still a lot that can be done. Until Cacti reaches a more mature state, I would find it really hard to consider such a major change in direction. You should see the size of my proposed features for the next couple of versions.

-Ian

Post by **moonman** » Wed Apr 21, 2004 1:09 am

Hi Ian
I don't think you should make Cacti a full NMS, but think about adding some of the things from your competitor projects rrfw and Cricket, like sending alerts when some value is above or below a configurable value, and using the Holt-Winters algorithm from the rrdtool beta.

http://rrfw.sourceforge.net/xmlconfig.p ... efinitions
http://cricket.sourceforge.net/support/ ... holds.html
http://cricket.sourceforge.net/aberrant/rrd_hw.htm
http://www.usenix.org/events/lisa2000/f ... index.html

Thanks

Post by **Tufqi** » Wed Apr 21, 2004 12:19 pm

I have to say that I really agree that Cacti is a graphing tool, a pretty good one that should probably inspire other NMS projects in terms of GUI and structure, but it should not mutate into a full-fledged NMS.

I think both components should be tied together, but responsibilities and focus should differ.

While an NMS should focus on stateful alarms and consolidated alert management, the graphing system can focus on data management.

In order to make some progress in that direction without compromising Cacti's integrity, I can still see a few things to be done:

* Easy integration with another poller
(in the scenario where you have the NMS polling the service and managing limits then passing the data/performances values to Cacti)

* More generic userbase or authentication methods
I think Cacti's user model with permissions, views, and so on is pretty good, but we need something to tie it together with the NMS's user model.

* More templates to help sort data by several keys.
Example 1: I want a page summarizing the CPU usage of some group of servers.
Example 2: I want a page summarizing the temperature of all equipment in one group of routers (a group being a PoP, for example).

* User-based templates for listing graphs?
I know Cacti is full of templates, but maybe it lacks some documentation on how someone could do what he used to do with MRTG (mrtg.cfg with lots of polling, and a home-crafted index.html with the wanted pointers to images).

Post by **Guest** » Wed Apr 21, 2004 1:38 pm

Hello,
Do you know JFF NMS (http://www.jffnms.org/)?
This tool is also based on RRD.

It can display graphs as Cacti does, and much more:
- zooming on graphs is possible
- ability to display different reports
- ability to have alarms (and automatic acknowledgement in some cases)
- alarms on screen or by email
- display of the time on graphs (useful when you print them, ...)
- ....

I discovered it when I was studying which free tool to choose to make an SLA for the LAN.

It is just a combination of what Cacti and Nagios do.
A great tool.

The only problem for now is that the documentation is not yet complete, but people are currently working on it.

Did

Post by **Mika** » Tue May 04, 2004 5:34 am

I've tried JFF NMS and found it kind of difficult. As far as I remember, the guys have been working on the documentation since December 2003. I saw them updating it, but I still found it difficult to use.

I never thought too much about how far Cacti should or even could go, but I still think that Cacti's disadvantage is that it's still too static. I mean, there are no thresholds and no alarms when a value goes above or below a set limit. There is also no automatic re-query of a host's interfaces when the number of interfaces on the host has changed. For now it can only be done manually.

TheWitness has started to offer features which would really move Cacti one step higher. But practice shows that very often "all in one" products are not as useful as a big part of the users expect.

I think one of the solutions could be add-ons. That is, Cacti as it is now could be the base, and further features like reports, event management, and thresholds could be add-ons to standard Cacti. Users could then apply them according to their needs.

Post by **mattyb77** » Tue May 04, 2004 10:01 am

Personally, I'd like to see some kind of convergence between Cacti and Nagios. I use both of them to monitor our systems, and they complement each other very well.

Post by **fletch** » Tue May 18, 2004 12:36 pm

First off, Cacti is by far the most useful and interesting tool in my 10+ years of system administration - it's a tribute to its author and the open source collaborative movement.

I am a big fan of having a set of modularly extensible tools in my toolbox.

One of the big improvements I would suggest is finding a way to separate the data collection and storage from the presentation.

I'll explain:
RRD files are great, but have you ever tried to add a new item to your cacti graph after the rrd file was created?
If the data (graph items in cacti) were stored independently in a relational database then we can do much more - like:
1) Show me a graph of the datacenter ambient temperature.
2) Ok, now add an overlay of the AC output temperature
3) Now add an overlay of the local weather.

RRD files, to me, are an impediment to presenting the data in new and interesting ways.
Once the data collection and storage is in the relational database, we can do some really cool statistical analysis (Cricket's aberrant events, ad hoc correlation analysis between different Datasources).
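
As a rough illustration of the overlay idea, here is a minimal Python sketch that keeps samples in a relational table (SQLite, in memory) and lines up any set of series by timestamp. This is not how Cacti or RRDtool store data; the `samples` table, the function names, and the series names are all hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per (series, timestamp)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        " series TEXT NOT NULL, ts INTEGER NOT NULL, value REAL NOT NULL,"
        " PRIMARY KEY (series, ts))"
    )
    return conn


def record(conn, series, ts, value):
    """Store one polled value for a named series."""
    conn.execute("INSERT INTO samples VALUES (?, ?, ?)", (series, ts, value))


def overlay(conn, *names):
    """Return (ts, v1, v2, ...) rows for timestamps present in every named series.

    At least one series name must be given.
    """
    columns = []
    for name in names:
        rows = conn.execute(
            "SELECT ts, value FROM samples WHERE series = ? ORDER BY ts", (name,)
        )
        columns.append(dict(rows.fetchall()))
    shared = sorted(set.intersection(*(set(c) for c in columns)))
    return [(ts, *(c[ts] for c in columns)) for ts in shared]


conn = open_store()
for ts, ambient, ac_out in [(0, 21.0, 14.0), (300, 21.5, 14.2)]:
    record(conn, "datacenter_ambient", ts, ambient)
    record(conn, "ac_output", ts, ac_out)
record(conn, "datacenter_ambient", 600, 22.0)  # no matching AC sample at ts=600

print(overlay(conn, "datacenter_ambient", "ac_output"))
```

Adding a third overlay such as local weather is then just another series name, with no file to re-create. The trade-off is that a plain table like this grows without bound, whereas an RRD file has a fixed size.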

Now I suppose someone could write the code to add a DS to an RRDTool file (in fact, I am looking at such a script from the Cricket contrib).

Just my 2 cents,
Feel free to point me at the modular tool that already does this

Post by **TheWitness** » Mon May 24, 2004 8:31 pm

When referring to a "DS" are you speaking of a "Data Source", something like ODBC?

TheWitness
