More efficient graph creation

mbainter · Post by **mbainter** » Wed Jun 15, 2005 4:27 pm

When there are a lot of data sources, creating new custom graphs can take a very long time. Right now I'm trying to create a graph that plots the total of a dozen or so data sources, which means adding about 48 items to the graph (traffic in and out, plus GPRINTs).

Each page load takes about 30 seconds because of the time it takes to build the list of data sources, so as you can imagine this is nearly unbearable. I started digging around in the database structure today to see whether I could just finish building the graph with some SQL commands. However, I can't figure out where the graph is stored, especially since it's not using a template and isn't tied to any host. I have the graph id, obviously, but figuring out from there how to add graph_items to it in the database is nontrivial (at least for me).

I could reverse engineer the code, but before doing that I thought I would post here, and see if maybe someone has come up with a hack to make this easier, or if they might recommend an alternate, better, course of action for solving this little problem. (other than buying a bigger server)

I've already hacked the graph tree stuff to allow me to add multiple graphs at a time to a tree page, but doing that here would be a lot more difficult, since customization has to be done with each added graph item.

I thought about trying to write a graph template that would allow me to select a list of interfaces, but this doesn't seem to be something supported within the current code. You can select a list of interfaces to create graphs for, and I could probably create a graph that could summarize all the interfaces by hand and use that template on any particular host, but to be able to just select individual interfaces to represent on a single graph, that doesn't seem to be supported.

If I'm wrong on this, I look forward to having it pointed out. Otherwise, I'd greatly appreciate any suggestions you might have for solving this somewhat thorny issue.

moonman · Post by **moonman** » Wed Jun 15, 2005 6:06 pm

I opened a bug about it: http://bugs.cacti.net/view.php?id=237
and I think the best way is to add filter by data_template and host_template to the custom graph

Nitzan
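To illustrate the kind of filter suggested above, here is a minimal Python sketch. It is not Cacti code: the `DataSource` record, its fields, and `filter_sources` are hypothetical names, and the template names in the sample data are made up. The point is only that narrowing the list by data template and host template before rendering it would keep the drop-down short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record for one data source; not Cacti's real schema."""
    name: str
    data_template: str
    host_template: str


def filter_sources(
    sources: List[DataSource],
    data_template: Optional[str] = None,
    host_template: Optional[str] = None,
) -> List[DataSource]:
    """Keep only sources matching the given templates; None means 'any'."""
    return [
        s
        for s in sources
        if (data_template is None or s.data_template == data_template)
        and (host_template is None or s.host_template == host_template)
    ]


# Made-up sample data for illustration only.
all_sources = [
    DataSource("rtr1 - Traffic - ge0", "Interface - Traffic", "Router"),
    DataSource("rtr1 - CPU", "CPU Usage", "Router"),
    DataSource("web1 - Traffic - eth0", "Interface - Traffic", "Linux Host"),
]

router_traffic = filter_sources(all_sources, "Interface - Traffic", "Router")
print([s.name for s in router_traffic])
```

With both filters set, only the router's traffic data source remains in the list.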

Paul Thexton · Post by **Paul Thexton** » Thu Jun 16, 2005 7:51 am

moonman wrote: and I think the best way is to add filter by data_template and host_template to the custom graph

Nitzan

That's a feature I'd like to see as well - I also have some custom graphs that total up a number of (gig/100)ethernet interfaces and it takes quite a while for the page to load each time I need to go and change or add more ...

In fact, instead of doing it that way I think it would be cool to have some kind of virtual data source where you can specify same-type input sources to "total up" - thus meaning you only need 'one' input source for the custom graph ... whether or not that would be feasible, I don't know.

mbainter · Post by **mbainter** » Thu Jun 16, 2005 8:28 am

Paul Thexton wrote: In fact, instead of doing it that way I think it would be cool to have some kind of virtual data source where you can specify same-type input sources to "total up" - thus meaning you only need 'one' input source for the custom graph ... whether or not that would be feasible, I don't know.

Not sure. You definitely wouldn't want just a 'total all data sources of this type for this device' because it's rare you really want that. On a switch for example, all the traffic on one port is going to be reflected on another port. If you want total bandwidth, most switches already provide that. Instead, you usually want to know all the bandwidth for a particular group of ports. Say, all the ports used by the webservers.

It would be nice to be able to create graphs by being able to select a group of datasources and having those summed and plotted on the graph. Similar to how you select interfaces to graph on a device. I don't think that would be simple to do with the current architecture though.
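As a rough sketch of what "select a group of datasources and sum them" could mean, here is a small Python example. Everything in it is an assumption for illustration: the port names, the sample values, and the choice to treat a missing sample as zero unless every selected source is missing at that step.

```python
from typing import Dict, List, Optional

# One value per polling interval; None marks a missing sample.
Series = List[Optional[float]]


def sum_selected(series_by_name: Dict[str, Series], selected: List[str]) -> Series:
    """Sum the selected series point by point."""
    if not selected:
        return []
    length = min(len(series_by_name[name]) for name in selected)
    total: Series = []
    for i in range(length):
        known = [
            series_by_name[name][i]
            for name in selected
            if series_by_name[name][i] is not None
        ]
        # Keep the gap if every selected source is missing at this step.
        total.append(sum(known) if known else None)
    return total


# Hypothetical inbound traffic (bits per second) on three switch ports.
traffic = {
    "web1_in": [100.0, 120.0, None],
    "web2_in": [50.0, None, None],
    "db1_in": [900.0, 950.0, 970.0],
}

# Only the ports used by the webservers are selected.
web_total = sum_selected(traffic, ["web1_in", "web2_in"])
print(web_total)  # [150.0, 120.0, None]
```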

rony · Post by **rony** » Thu Jun 16, 2005 8:31 am

Now that seems useful..

Paul Thexton · Post by **Paul Thexton** » Thu Jun 16, 2005 8:34 am

mbainter wrote: Not sure. You definitely wouldn't want just a 'total all data sources of this type for this device' because it's rare you really want that.

We want it to total ethernet traffic on router interfaces, so yeah, we do

mbainter wrote: On a switch for example, all the traffic on one port is going to be reflected on another port.

Yep.

mbainter wrote: If you want total bandwidth, most switches already provide that.

Yeah, from the trunk ports.

mbainter wrote: Instead, you usually want to know all the bandwidth for a particular group of ports. Say, all the ports used by the webservers.

Yep, although what I'm talking about is that, instead of filtering by host and having to create new graph templates for x number of interfaces, we could create simpler "addition" style virtual data sources... You already know the individual traffic per host from your original data sources, so if you want an overview of combined traffic you don't really need to know how much traffic (again) is on each separate input source.

mbainter wrote: It would be nice to be able to create graphs by being able to select a group of datasources and having those summed and plotted on the graph.

I agree with that - I've done that myself, but it is a bit of a chore to set up another template (or modify an existing one) to make it possible to add another data source to it.

mbainter wrote: Similar to how you select interfaces to graph on a device. I don't think that would be simple to do with the current architecture though.

Well, that's the problem of course ... It would be a nice feature to have, though. The filter is definitely the easier one to implement, but it still means the laborious task of setting up custom graphs (or templates) with many data input sources and defining CDEFs to correctly add up the data (none of the predefined CDEFs ever worked for me, and using the 'stack' method of graphing looked ugly and was not what we required).
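For anyone writing such CDEFs by hand, the following Python sketch generates the RRDtool RPN string that adds several inputs together. The variable names `a`, `b`, `c` and the helper `sum_cdef` are assumptions for illustration; in a real graph they would be whatever names the DEF lines use. The optional guard wraps each input in the RPN idiom `x,UN,0,x,IF`, so that one unknown sample does not turn the whole sum into unknown.

```python
from typing import List


def sum_cdef(vnames: List[str], guard_unknown: bool = True) -> str:
    """Build an RPN expression that adds the given variable names together."""
    if not vnames:
        raise ValueError("need at least one variable name")

    def term(v: str) -> str:
        # x,UN,0,x,IF evaluates to 0 when x is unknown, otherwise to x.
        return f"{v},UN,0,{v},IF" if guard_unknown else v

    parts = [term(vnames[0])]
    for v in vnames[1:]:
        parts.append(term(v))
        parts.append("+")
    return ",".join(parts)


plain = sum_cdef(["a", "b", "c"], guard_unknown=False)
guarded = sum_cdef(["a", "b"])

print("CDEF:total_in=" + plain)    # CDEF:total_in=a,b,+,c,+
print("CDEF:total_in=" + guarded)  # CDEF:total_in=a,UN,0,a,IF,b,UN,0,b,IF,+
```

Whether treating unknown as zero is appropriate depends on the graph: it keeps the total line drawn, but it can understate traffic while a source is not reporting.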

mbainter · Post by **mbainter** » Thu Jun 16, 2005 8:48 am

Paul Thexton wrote:
mbainter wrote: Not sure. You definitely wouldn't want just a 'total all data sources of this type for this device' because it's rare you really want that.
We want it to total ethernet traffic on router interfaces, so yeah, we do

Well, it's your graph, and you know what you want out of it. But if you total all the interfaces you're going to get twice the traffic you're actually handling.
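The double counting can be shown with a little arithmetic. This Python sketch assumes a single switch, unicast traffic only, and no drops; the ports and rates are made up. Each flow is counted once as input on its ingress port and once as output on its egress port, so adding up every interface counter gives twice the traffic actually handled.

```python
from typing import Dict, List, Tuple

# Hypothetical flows through one switch: (ingress port, egress port, bits per second).
flows: List[Tuple[str, str, int]] = [
    ("port1", "port2", 40_000_000),
    ("port2", "port3", 10_000_000),
    ("port3", "port1", 5_000_000),
]

# Traffic the switch is actually handling.
actual = sum(bps for _, _, bps in flows)

# What the per-port counters would report.
port_in: Dict[str, int] = {}
port_out: Dict[str, int] = {}
for src, dst, bps in flows:
    port_in[src] = port_in.get(src, 0) + bps
    port_out[dst] = port_out.get(dst, 0) + bps

# Totalling in and out across all interfaces counts every flow twice.
summed = sum(port_in.values()) + sum(port_out.values())

print(actual)  # 55000000
print(summed)  # 110000000
```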

mbainter wrote: If you want total bandwidth, most switches already provide that.
Yeah, from the trunk ports.

Actually, I was referring to bandwidth usage across the backplane. One of the more important statistics to monitor on a switch for capacity planning.

Paul Thexton wrote: Yep, although what I'm talking about is instead of filtering by host and having to create new graph templates for x number of interfaces, we could instead create more simple "addition" style virtual data sources... You already know the individual traffic per host from your original data sources, if you want an overview of combined traffic then you don't really need to know how much traffic (again) is on each separate input source.

Hrm. So a virtual datasource gets added to the list of datasources, but instead of polling a remote machine it is a pointer to a list of pre-existing datasources that you select, plus an operation to perform on them, and you can then use the result in a graph?

I'm not sure how much benefit that would bring. It's just another level of indirection between the graph and the data, for not a lot of gain. Adding them to the data source and adding them to the graph directly seems to me to be the same level of effort. I don't see a lot of gain unless you were going to use that same data result in more than one graph, which seems unlikely for what we're talking about.
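To make the idea under discussion concrete, here is a minimal Python sketch of a "virtual data source" as described: a named list of existing data sources plus an operation applied to them. The class, its fields, and the sample names are all hypothetical and do not correspond to anything in Cacti. It also shows the one case where the indirection pays off, namely reusing the same member list with a different operation or in more than one graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VirtualDataSource:
    """Hypothetical: polls nothing, only combines existing data sources."""
    name: str
    members: List[str]
    operation: Callable[[List[float]], float]

    def value_at(self, samples: Dict[str, List[float]], step: int) -> float:
        """Apply the operation to the members' values at one polling step."""
        return self.operation([samples[m][step] for m in self.members])


# Made-up samples for two router interfaces, two polling steps each.
samples = {
    "rtr1_ge0_in": [10.0, 20.0],
    "rtr2_ge0_in": [5.0, 7.0],
}

members = ["rtr1_ge0_in", "rtr2_ge0_in"]
uplinks_total = VirtualDataSource("uplinks_in_total", members, sum)
uplinks_peak = VirtualDataSource("uplinks_in_peak", members, max)

print(uplinks_total.value_at(samples, 0))  # 15.0
print(uplinks_total.value_at(samples, 1))  # 27.0
print(uplinks_peak.value_at(samples, 1))   # 20.0
```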

Paul Thexton · Post by **Paul Thexton** » Thu Jun 16, 2005 9:42 am

mbainter wrote:
Paul Thexton wrote:
mbainter wrote: Not sure. You definitely wouldn't want just a 'total all data sources of this type for this device' because it's rare you really want that.
We want it to total ethernet traffic on router interfaces, so yeah, we do
Well, it's your graph, and you know what you want out of it. But if you total all the interfaces you're going to get twice the traffic you're actually handling.

I'm not totalling data across one switch - I'm doing specific interfaces where I want it, router interfaces and switch trunk ports... It's not an isolated network that we're on.

mbainter wrote: Actually, I was referring to bandwidth usage across the backplane. One of the more important statistics to monitor on a switch for capacity planning.

I'm leaving those calculations to our network manager - I just set up the graphs he asks for; he knows what his switches are capable of transferring, traffic-wise.

mbainter wrote:
Hrm. So a virtual datasource that gets added to the list of datasources that instead of polling a remote machine is instead a pointer to a list of pre-existing datasources that you can select, and an operation to perform on those datasources, which you can then use in a graph?

Yes, as opposed to setting up loads of custom graphs that are totalling up 2 or more (as said previously, in some cases 14) different data sources (again, this is not off one switch or router, so there is no concern over 'doubled traffic' representation).

mbainter wrote: I'm not sure how much benefit that would bring. It's just another level of indirection between the graph and the data, for not a lot of gain.

You're right - but I would also like a better way of 'adding' interfaces together in graphs. The way Cacti does this at the moment is certainly more efficient than the way you would need to do it using MRTG - but it's also very laborious and could be simplified.

mbainter wrote: Adding them to the data source and adding them to the graph directly seems to me to be the same level of effort. I don't see a lot of gain unless you were going to use that same data result in more than one graph, which seems unlikely for what we're talking about.

The initial effort for what I'm talking about would be considerable, of course. But consider two network points coming in to your network (private network peering, customer connections, whatever) that you want to see a combined total for. You can set up templates that will do a lot of the work for you, but *some* way of more easily 'adding' an extra data source for totalling into a graph would be desirable in this environment, because you never know how many additional links a customer may require (or how many more interfaces you will want to monitor for, say, combined transit/peering traffic levels).
