Distributed Cacti - Ideas
I'd like to start a thread to discuss how we could use Cacti to poll extremely large enterprise environments, which would require more performance from a Cacti system than a single server can provide.
I think the problem could be addressed with one of the following two approaches:
1) Distribute the data collection: set it up so that specific devices are polled by remote pollers that do not run on the Cacti server itself. Since some of the poller's load comes from its initial MySQL calls, I would imagine that moving to a MySQL read tier (multiple replicated MySQL servers) for the many distributed pollers to hit would be mandatory here.
or
2) Integrate the data presentation: either through a rewrite of Cacti or potentially via a plugin. My thought here is a simple plugin that loads the URL of the other Cacti box when clicked. Currently this could be done by modifying the "ntop" plugin example to call a URL for exported Cacti data; however, the exported data would include some of the table decoration (wrappers around the table data, images, etc.), which would not look right. Also, for this to work, the current bugs in graph_export.php would need to be fixed.
I'd like to invite people to share their thoughts and/or experiences with this issue. Hopefully this post goes sticky.
Last edited by rcaston on Mon Apr 16, 2007 3:59 pm, edited 2 times in total.
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: Distributed Cacti Project
rcaston wrote:
I'd like to start a post to discuss how we could handle using cacti to poll extremely large enterprise environments which would require more performance from a Cacti system than a single server can provide.
My thoughts are the problem could be addressed by proceeding with one of the following two approaches:
1) Distributing out the data collection: Setting it up so certain devices are polled from a remote poller that does not run on the cacti server itself. Since some of the load of the poller is the initial mysql calls; I would imagine moving to a mysql read-tier (multiple mysql replicated servers) for the distributed pollers to hit would be mandatory here.

This does not yet solve the issue with "rrdtool update". Where do you expect the rrd files to live?
Distributed: you will have an issue with graphing, since the graphs must be built from rrd files spread across machines.
Centralized, NFS mount: in my opinion a performance bottleneck. I'm aware of at least one very big installation that currently uses this method with success, but I'm quite sure it _will_ become a bottleneck.
Centralized, using "rrdtool server": a nice one, but it may require some more features, such as security. I'm quite sure that this will work only with bulk updates (see the boost plugin). I bet it will be too slow for just-in-time updates.
Reinhard
Yes, there would be an I/O bottleneck, as the RRDs would end up existing on different systems; but given that this is for the enterprise, I don't see that being an issue for those who need it. I.e., if we have to, we'll put this on a high-end NAS.
...
Further thoughts on the "Distributed Collector" idea: a new field would be added to each device in the MySQL database, associating the device with a "poller template".
The "poller template" would carry the information needed to determine which of several pollers will query and poll that object.
Example with 2 pollers:
Each would query its local replicated copy of the Cacti MySQL database for the list of objects that match its pollerid field (i.e., "give me a list of the stuff I am supposed to poll and collect rrds for").
This would keep the rrds and their pollers (the various cactid instances) on different boxes.
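A minimal sketch of the "give me my list" query described above. It uses an in-memory SQLite database as a stand-in for the replicated MySQL copy so that it runs anywhere; the table name `host`, the column `poller_id`, and the hostnames are illustrative assumptions, not Cacti's actual schema.

```python
import sqlite3


def devices_for_poller(conn, poller_id):
    """Return the hostnames assigned to one poller, in a stable order."""
    rows = conn.execute(
        "SELECT hostname FROM host WHERE poller_id = ? ORDER BY id",
        (poller_id,),
    ).fetchall()
    return [hostname for (hostname,) in rows]


# Stand-in for a poller's local, replicated copy of the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, hostname TEXT, poller_id INTEGER)"
)
conn.executemany(
    "INSERT INTO host (hostname, poller_id) VALUES (?, ?)",
    [("router-a1", 1), ("router-a2", 1), ("router-b1", 2), ("router-b2", 2)],
)

# Each poller asks only for the devices that carry its own id.
for poller_id in (1, 2):
    print(poller_id, devices_for_poller(conn, poller_id))
```

Because each poller only reads its assignment list, the query can be served from a read replica; only the assignment itself has to be written on the primary.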
---
All this being said, it would still likely be easier to go with idea #2, which is to integrate the data presentation / web interface of several standalone Cacti boxes.
- rony
- Developer/Forum Admin
- Posts: 6022
- Joined: Mon Nov 17, 2003 6:35 pm
- Location: Michigan, USA
FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.
In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine.
Tony Roman
Experience is what causes a person to make new mistakes instead of old ones.
There are only 3 way to complete a project: Good, Fast or Cheap, pick two.
With age comes wisdom, what you choose to do with it determines whether or not you are wise.
rony wrote:
FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.
In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine.

True, that's another way of doing it; there are advantages and disadvantages to each method ...
If the rrd data collection remains local, you don't need to worry about network connectivity between pollers.
But if the rrd files and the poller are spread across different datacenters/locations, you'd need some sort of local store and delayed update of the rrd files in case of a network outage between the poller and the rrd repository.
- rony
- Developer/Forum Admin
- Posts: 6022
- Joined: Mon Nov 17, 2003 6:35 pm
- Location: Michigan, USA
With the rrdtool files in a central store and the boost plugin's functionality being included in 0.9.0, I think most of these problems will be resolved.
There are many ways to skin this cat, but I believe that we can't rely on a distributed storage system. So centralized rrdtool updates based upon access (boost) will vastly improve the polling system.
I should also note that the distributed polling system is being designed to be self-healing.
For more information, we should really get Larry (TheWitness) involved in this thread.
yeah...
The problem I am facing is that using Cacti as a true large-enterprise solution requires multiple distributed pollers.
Any solution that ends up with only a single poller server cannot scale well.
Imagine trying to manage a pair of routers at each of about 30 physical sites, with about 3,000 interfaces each.
One poller will not handle that (2 x 30 x 3,000 = 180,000 interfaces).
Assuming you are only doing traffic in/out, that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is, has anyone ever had it poll 360,000 elements in a 5-minute cycle?
Or... has anyone ever tried to open a directory with 360,000 files in it?
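To make the scale concrete, the arithmetic above can be worked through as a short script. The figures are the ones from the post; the per-poller split (one poller per site) is an added assumption for illustration only, and the script counts polled elements, not files on disk.

```python
ROUTERS_PER_SITE = 2
SITES = 30
INTERFACES_PER_ROUTER = 3000
COUNTERS_PER_INTERFACE = 2      # traffic in + traffic out
POLL_CYCLE_SECONDS = 5 * 60     # 5-minute polling cycle

interfaces = ROUTERS_PER_SITE * SITES * INTERFACES_PER_ROUTER
elements = interfaces * COUNTERS_PER_INTERFACE


def elements_per_second(total_elements, pollers=1):
    """Sustained rate each poller must reach to finish within one cycle."""
    return total_elements / pollers / POLL_CYCLE_SECONDS


print("interfaces:", interfaces)                      # 180000
print("polled elements:", elements)                   # 360000
print("single poller, elements/s:", elements_per_second(elements))        # 1200.0
# Assumption for illustration: one poller per site.
print("30 pollers, elements/s each:", elements_per_second(elements, 30))  # 40.0
```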
Last edited by rcaston on Fri Apr 13, 2007 11:19 am, edited 1 time in total.
- Howie
- Cacti Guru User
- Posts: 5508
- Joined: Thu Sep 16, 2004 5:53 am
- Location: United Kingdom
Re: yeah...
rcaston wrote:
assuming you are only doing traffic in/out; that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is; has anyone ever had it to 360,000 elements in a 5 minute cycle?

Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though
Weathermap 0.98a is out! & QuickTree 1.0. Superlinks is over there now (and built-in to Cacti 1.x).
Some Other Cacti tweaks, including strip-graphs, icons and snmp/netflow stuff.
(Let me know if you have UK DevOps or Network Ops opportunities, too!)
Re: yeah...
Howie wrote:
rcaston wrote:
assuming you are only doing traffic in/out; that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is; has anyone ever had it to 360,000 elements in a 5 minute cycle?
Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though

hah, meant to add in traffic + errors ...
But yes, if the RRAs and pollers were distributed, we could scale sideways until the web or MySQL server becomes the bottleneck, at which point a potential solution would be load-balanced web servers with replicated MySQL servers.
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: yeah...
rcaston wrote:
one poller will not handle (2 x 30 x 3000 = 180,000 interfaces)

I suppose that this amount will be handled by cacti/cactid using the boost plugin. Nevertheless, a distributed thingy would be a better solution.
Reinhard
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
Boost 1.x should scale to installations with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:
0) Associate a host with a poller.
1) Provide poller-based directory structures
2) Create host subdirectories (i.e., each host's RRDs are in a single directory)
3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well).
I may pull 0 and 3 out till 0.9. It all depends on timing. In the meantime, that will resolve the directory access issues.
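One way to picture items 1 and 2 is a path scheme with a directory per poller and a subdirectory per host. The sketch below is purely illustrative: the function `rrd_path`, the base directory, and the `poller_<id>` / `host_<id>` naming are assumptions, not the layout boost actually uses.

```python
from pathlib import PurePosixPath


def rrd_path(base_dir, poller_id, host_id, rrd_file):
    """Build <base>/poller_<id>/host_<id>/<file> so no single directory holds every RRD."""
    return PurePosixPath(base_dir) / f"poller_{poller_id}" / f"host_{host_id}" / rrd_file


# Hypothetical base directory and file name.
print(rrd_path("/var/lib/cacti/rra", 2, 1041, "traffic_in_out_57.rrd"))
```

With a layout like this, the number of files in any one directory is bounded by the data sources of a single host rather than by the size of the whole installation.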
The real issue as I see it is that there will always be a site that is just too big for Cacti. Well, that applies to every other tool as well.
With Boost 1.2, however, you can do the following:
1) Primary database server with either MyISAM- or MEMORY-based Boost
2) Primary poller with local storage (lots of it) and Boost server
3) Web farm with one to many web servers using a smaller version of Cacti with symlinks to the RRA and Boost cache folders.
4) Load balancer in front of the web farm.
With this configuration, you will have an enterprise system. For most enterprises, anyway.
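As a rough illustration of item 3, the symlinks on each web server could be created along these lines. The post does not say how the web servers reach the shared folders or what they are called; the directory names `rra` and `boost_cache` and the helper `link_shared_dirs` are assumptions, and the demo uses temporary directories in place of a real shared mount.

```python
import os
import tempfile


def link_shared_dirs(cacti_root, shared_root, names=("rra", "boost_cache")):
    """Point <cacti_root>/<name> at <shared_root>/<name> for each shared folder."""
    links = []
    for name in names:
        target = os.path.join(shared_root, name)
        link = os.path.join(cacti_root, name)
        if not os.path.islink(link):  # leave existing links alone on re-runs
            os.symlink(target, link)
        links.append(link)
    return links


# Demo with throwaway directories standing in for the shared storage
# and for one web server's trimmed-down Cacti install.
with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, "shared")
    web_cacti = os.path.join(tmp, "web1", "cacti")
    for name in ("rra", "boost_cache"):
        os.makedirs(os.path.join(shared, name))
    os.makedirs(web_cacti)

    links = link_shared_dirs(web_cacti, shared)
    print([os.path.basename(p) for p in links])
    print(all(os.path.isdir(p) for p in links))
```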
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
TheWitness wrote:
The real issue as I see it is that there will always be a site that is just to big for Cacti. Well, that applies for every other tool as well.

Well, there is truth to that; I would say that if you do manage to implement the feature roadmap you've outlined above with boost; it will go a long way to increasing cacti's ability to manage large environments.
In fact, if you do get the ability to associate a poller machine with a target host, you'd bring Cacti into the league of most commercial solutions, which would really make Cacti shine.
Consider that even a 'budget' commercial poller is going to retail for around $45,000 U.S. per server/software setup. And that is an actual quote for a poller appliance (running Linux) which only handles about 30,000 devices.
If Cacti could scale sideways, it would attract a great deal more attention from the business community.
My only concern is this: while I agree with and would support the rrds being broken out into subdirs, it may potentially create a small migration hurdle if you try to move from a non-boost to a boost setup and then back again, since the RRAs will need to be moved back and forth. However, I believe someone posted that boost may end up as a standard part of Cacti 0.9.x; if so, this point is moot.
All that being said, I really like where boost is going.
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
TheWitness wrote:
Boost 1.x should scale to installation with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:
0) Associate a host with a poller.

I would love this. IMHO, this feature would allow for different interesting structures:
a) Associate a poller with a "site" (location) to "localize" Cacti polling traffic by assigning all hosts of this site to that very poller
b) Support for multiple polling intervals (in this case, all polling intervals of this host would be the same, but I suppose this is not a big issue)

TheWitness wrote:
1) Provide poller based directory structures

? Please elaborate. I do not get the point why this would be useful, sorry.

TheWitness wrote:
2) Create host subdirectories (aka each hosts RRD's are in a single direcory)

Yep, yep. IMHO that's a must even for mid-sized installations. I did not run any tests on directory search impact using Cacti, but some months ago Tobi Oetiker published a testing script that shows some impact.

TheWitness wrote:
3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well).

But are all rrd files still in the "same" location (different subdirectories, but same file system)?
just my 2 cents
Reinhard