New Cacti Architecture (0.8.8) - RFC Response Location
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
All,
Please submit your RFC comments here. Thanks for your participation. I will attach newer versions of the RFC and provide feedback in this post.
Regards,
TheWitness
- Attachments
- Cacti Multiple Poller Design v1.0.pdf
- (338.71 KiB) Downloaded 6926 times
Last edited by TheWitness on Mon Feb 09, 2009 10:34 pm, edited 2 times in total.
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Hi Larry,
thank you for opening this long overdue discussion. Besides "automation" (yes, I know ...), this was the most deeply discussed topic of 2.CCC.eu.
But personally, I have a problem with this picture. It shows very nicely where everything's happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in picture and text. Personally, I prefer poller_item as this is the name of the real table.)
But I am not sure I fully understand where the application logic lies, in other words: what about the "workflow"?
I suppose that each Poller Group is governed by a local crontab (or a "real" daemon) that fetches data from the db server (either directly or via table replication). Output is stored in a local poller_output table and replicated to the db server? What would be the criteria to associate a host/data source with some specific Poller Group? I suppose that clock synchronization (ntp?) would be required, as the local time used for poller_output would be used for rrdtool update. What about timezones? Where should hooks like "poller_bottom" be executed?
And another part of the logic would lie with the http servers, for sure (console aka administration). But where does the rrdtool update logic lie? With the database server? Or with the RRDTool servers? Would RRDfile Update Groups equal Poller Groups? If not, what would be the criteria this time?
The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well. So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or the cache must reside on the RRDfile Servers. As there are two different pipes to the RRDfile Servers, perhaps synchronization between updates and graphing (rrdtool fetch) is necessary.
Those are my first thoughts. Surely more will follow
Reinhard
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
gandalf wrote: thank you for opening this long overdue discussion. Besides "automation" (yes, I know ...), this was the most deeply discussed topic of 2.CCC.eu.
Yes, long overdue.
gandalf wrote: But personally, I have a problem with this picture.
RFCs oftentimes start that way.
gandalf wrote: It shows very nicely where everything's happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in picture and text. Personally, I prefer poller_item as this is the name of the real table.)
I literally "slapped" this together. I will correct that in v2.
gandalf wrote: But I am not sure I fully understand where the application logic lies, in other words: what about the "workflow"?
Yes, after sending it out, I realized I left that out. Basically, the poller will, by default, use the main server's poller_item table for its list of poller items. If for some reason the main server is not reachable, it will use its local copy and store the poller_output table locally.
The same is intended for the poller_output table. Central server first. The remote pollers will be provided instructions to "update/synchronize" their local poller_items table periodically (aka when things change). Those synchronizations would not happen any more often than every 5 minutes.
If the central server is not available, then each poller will cache the updates in its poller_output table until such time as the remote connection is available; then it will dump them sequentially, by date, to the central server. By doing so, no data will be lost.
gandalf wrote: What would be the criteria to associate a host/data source with some specific Poller Group?
There would have to be modifications to the poller_output table, or another table to keep track of when it is time to poll things. That is more TBD, until I have more feedback.
gandalf wrote: I suppose that clock synchronization (ntp?) would be required, as the local time used for poller_output would be used for rrdtool update.
Of course...
gandalf wrote: What about timezones?
You tell me...
gandalf wrote: Where should hooks like "poller_bottom" be executed?
I need feedback from people like Howie and yourself to determine "where" that should be. So, see my comment above. It's an RFC you know
gandalf wrote: And another part of the logic would lie with the http servers, for sure (console aka administration). But where does the rrdtool update logic lie? With the database server? Or with the RRDTool servers?
The RRDfile Services will be asynchronous and running as daemons. They will process all items in the poller_output table as they come in and handle other requests in other threads. The main poller_output table, with some minor modifications, will be used to achieve RRDupdates.
gandalf wrote: Would RRDfile Update Groups equal Poller Groups? If not, what would be the criteria this time?
No.
gandalf wrote: The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well.
Need feedback from Plugin developers as to "how" they would like this to work.
gandalf wrote: So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or the cache must reside on the RRDfile Servers.
I expect the RRDtool Services to handle this. Each graph will know, in advance, which server it needs to talk to.
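To make the "central server first, local cache as fallback" behaviour for poller_output concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `RemotePoller` class, the `send_to_central` callback and the column layout are hypothetical, and SQLite stands in for the poller's local database. It is not Cacti code.

```python
import sqlite3


class RemotePoller:
    """Hypothetical remote poller that buffers output while the central server is down."""

    def __init__(self, send_to_central):
        # send_to_central(rows) is an assumed callback; it raises ConnectionError
        # when the central database cannot be reached.
        self.send = send_to_central
        self.local = sqlite3.connect(":memory:")
        self.local.execute(
            "CREATE TABLE poller_output "
            "(local_data_id INTEGER, rrd_name TEXT, time TEXT, output TEXT)"
        )

    def flush(self):
        """Send any locally cached rows, oldest first, then clear the cache."""
        backlog = self.local.execute(
            "SELECT local_data_id, rrd_name, time, output "
            "FROM poller_output ORDER BY time"
        ).fetchall()
        if backlog:
            self.send(backlog)  # may raise ConnectionError; cache is kept in that case
            self.local.execute("DELETE FROM poller_output")

    def record(self, row):
        """Try the central server first; fall back to the local table on failure."""
        try:
            self.flush()      # drain the backlog first so ordering by date is kept
            self.send([row])
        except ConnectionError:
            self.local.execute("INSERT INTO poller_output VALUES (?, ?, ?, ?)", row)
```

A poller built this way keeps polling during an outage and replays its backlog in date order on the first successful contact, which is the "no data will be lost" property described above.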
Regards,
TheWitness
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
gandalf wrote: It shows very nicely where everything's happening from the data point of view. (BTW: I suppose it would be better to use either poller_item _or_ poller_cache in picture and text. Personally, I prefer poller_item as this is the name of the real table.)
TheWitness wrote: I literally "slapped" this together. I will correct that in v2.
I suppose separating data logic and workflow is the better way. Otherwise I fear that the picture will become too crowded.
From the current design, the database servers seem to define the limit of this architecture. While the Poller and RRDfile Servers are scalable, as are the http servers, the database server exists only once. As I understand it, the second one is for failover only.
So, if there's a central poller_item table as well as poller_output, their update/delete performance will be crucial. I suppose you're thinking of memory tables as boost uses them. And then, like boost does with rrdtool bulk update, there's the SQL bulk insert that will gain some more performance, correct?
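As a rough illustration of the SQL bulk insert idea, the sketch below packs many poller results into one multi-row INSERT per batch instead of one statement per row. The assumptions are mine: an in-memory SQLite database stands in for a MySQL memory table, and the helper names, column layout and batch size are hypothetical.

```python
import sqlite3
from itertools import islice


def make_table():
    """Create an in-memory stand-in for a poller_output memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE poller_output "
        "(local_data_id INTEGER, rrd_name TEXT, time TEXT, output TEXT)"
    )
    return conn


def bulk_insert(conn, rows, batch_size=100):
    """Insert rows using one multi-row INSERT per batch; return the statement count."""
    it = iter(rows)
    statements = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # One "(?, ?, ?, ?)" group per row, all sent in a single statement.
        placeholders = ", ".join(["(?, ?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            "INSERT INTO poller_output (local_data_id, rrd_name, time, output) "
            "VALUES " + placeholders,
            params,
        )
        statements += 1
    conn.commit()
    return statements
```

With 250 rows and the default batch size this issues 3 statements rather than 250, which is where the saving in round trips and lock acquisitions would come from.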
Reinhard
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Yes, memory tables have an I/O rate in excess of 40k updates per second, so even though they use a table lock mechanism, we are safe. I was thinking, though, that making this a separate database altogether would help the performance of other subsystems and simplify backup.
What do you think about that?
TheWitness
- melchandra
- Cacti User
- Posts: 311
- Joined: Tue Jun 29, 2004 12:52 pm
- Location: Indiana
Forgive me if this seems to be a silly question.
Is there some reason why the pollers don't directly update the RRD Files? Does it have to go through the database?
Is there a way for the Pollers to be sent both the host and oid information to query, as well as host information for the Remote RRD Update Service so they could query, and then pass the data directly to the RRD storage devices?
Dave
- Howie
- Cacti Guru User
- Posts: 5508
- Joined: Thu Sep 16, 2004 5:53 am
- Location: United Kingdom
- Contact:
gandalf wrote: The rrdfile data pipeline surely would be used for graphing. But a lot of plugins currently require access to rrd files as well. So there would be more than graphing only. When running from load-balanced http instances, either graph caching will fail or the cache must reside on the RRDfile Servers. As there are two different pipes to the RRDfile Servers, perhaps synchronization between updates and graphing (rrdtool fetch) is necessary.
Once updates have been made scalable, is there a requirement for actual load-balancing of HTTP frontends? I can see why you might want HA, but does anyone really have so many concurrent users that a single server can't cope? All I ever see are queries about the polling limitations...
With an HA setup instead, cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway.
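The "1000 to 2" arithmetic can be checked with a small Python sketch. It assumes two frontends that each keep their own private graph cache and share nothing; the `Frontend` class and `count_renders` helper are hypothetical, and a real cache would also expire entries after each polling interval.

```python
class Frontend:
    """Hypothetical HTTP frontend with its own private graph cache."""

    def __init__(self, render):
        self._render = render   # callable that actually draws the graph
        self._cache = {}

    def graph(self, graph_id):
        # Draw the graph only on the first request seen by this frontend.
        if graph_id not in self._cache:
            self._cache[graph_id] = self._render(graph_id)
        return self._cache[graph_id]


def count_renders(requests, n_frontends, graph_ids):
    """Spread requests round-robin over the frontends; return how many graphs were drawn."""
    drawn = []

    def render(graph_id):
        drawn.append(graph_id)
        return f"image-{graph_id}"

    frontends = [Frontend(render) for _ in range(n_frontends)]
    for i in range(requests):
        frontends[i % n_frontends].graph(graph_ids[i % len(graph_ids)])
    return len(drawn)
```

Calling `count_renders(1000, 2, [42])` gives one draw per frontend, while passing 1000 distinct graph ids gives 1000 draws, which matches the point that caching only helps when users hit the same graphs.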
(I don't hit any of these limitations with my own modest needs, so I'm just curious really. Despite what it says under my name on the left, I'm just a lowly user that talks a lot )
Weathermap 0.98a is out! & QuickTree 1.0. Superlinks is over there now (and built-in to Cacti 1.x).
Some Other Cacti tweaks, including strip-graphs, icons and snmp/netflow stuff.
(Let me know if you have UK DevOps or Network Ops opportunities, too!)
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Howie wrote: With an HA setup instead, cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway.
This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional.
TheWitness
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
melchandra wrote: Forgive me if this seems to be a silly question. Is there some reason why the pollers don't directly update the RRD Files? Does it have to go through the database? Is there a way for the Pollers to be sent both the host and oid information to query, as well as host information for the Remote RRD Update Service so they could query, and then pass the data directly to the RRD storage devices?
Yes, absolutely. When you have lots of them, the disk I/O required is astounding. So, by batching them you can reduce I/O wait by 80-90% over time. So, the database provides that for us.
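A minimal sketch of what batching the updates could look like: samples collected over several polling cycles are grouped per RRD file, so each file is opened once with several `timestamp:value` arguments (which `rrdtool update` accepts in a single call). The function name and the sample tuple layout are assumptions for illustration; the sketch only builds the argument lists and does not run rrdtool.

```python
from collections import defaultdict


def build_update_commands(samples):
    """Group (rrd_path, unix_timestamp, value) samples into one command per RRD file."""
    by_file = defaultdict(list)
    for path, timestamp, value in samples:
        by_file[path].append((timestamp, value))

    commands = []
    for path in sorted(by_file):
        # rrdtool expects updates for a file in increasing time order.
        points = sorted(by_file[path])
        args = [f"{timestamp}:{value}" for timestamp, value in points]
        commands.append(["rrdtool", "update", path] + args)
    return commands
```

Each returned list could be handed to something like `subprocess.run`; the saving comes from touching every file once per batch instead of once per polling cycle.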
TheWitness
- Howie
- Cacti Guru User
- Posts: 5508
- Joined: Thu Sep 16, 2004 5:53 am
- Location: United Kingdom
- Contact:
TheWitness wrote: This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional.
Indeed. I'd say stick to the architectural stuff required to support it (or at least not break it). HA solutions are usually either platform-specific (CARP, MS NLB, ultramonkey) or external (CSS, Alteon etc) anyway.
Howie wrote: With an HA setup instead, cache-sharing isn't really necessary. Really, if you have 1000 users all hitting the same graph then you are still reducing the number of graph-drawing operations from 1000 to 2, which is probably enough. If they aren't all hitting the same graphs, then graph caching isn't going to help anyway.
TheWitness wrote: This sort of answers the "should we invent it ourselves" question on load balancing. HA for sure, but there are already technologies for that. So, I would suspect your answer would be to leave that out of the design. The capability will be there for HA, but it's always optional.
I tend to agree; built-in application-level load balancing is never an ideal situation in my opinion. If you really need it, buy some hardware to do the job properly (F5, Alteon etc).
- Howie
- Cacti Guru User
- Posts: 5508
- Joined: Thu Sep 16, 2004 5:53 am
- Location: United Kingdom
- Contact:
gandalf wrote: Where should hooks like "poller_bottom" be executed?
TheWitness wrote: I need feedback from people like Howie and yourself to determine "where" that should be. So, see my comment above. It's an RFC you know
If I understand the layout correctly, it would have to be on the 'master' poller, as the remote pollers report back in to that one. Of the 'big' plugins that I can think of that actually use cacti data * (reportit, thold, weathermap), thold and wmap can both already use poller_output instead of looking at rrd files directly. I don't really see any way around it for reportit though... if the rrdfiles were physically distributed onto different machines, then there would need to be some sort of aggregate-view or 'run this on all the rrd servers' API.
Actually thold probably would work either on the local pollers or the central location, since it works with one DS at a time, but it would be easier to maintain in the centre.
* I don't use Manage, MACTrack or Discovery, but as far as I know they don't deal with rrd data, do they?
It's a little late, even for me, so please excuse all the aberrations I'm about to write
WHY do we want multiple pollers?
a) Backup - have all the data available, even if one of the Servers is not available, or can't request data from all
the devices for some reason (network outage)
b) Speed - One server can't handle all the required devices
c) Combined (a+b)
IMHO it's a different approach for every one of the three situations listed.
I THINK that speed can be solved "local" (4 separate servers: Web, poller, DB and RRD Updater)
Facts: 10676 DS, 9742 graphs, 10669 RRD files totalling 1.3GB
Polling time: ~15 seconds
Two servers, one DB, one for the rest
Backup, on the other hand, is something else.
What if we run 2 somehow completely "independent" cacti instances, both querying the same hosts, and somehow
synchronize them?
Something like replication for MySQL (I really don't know how this works) and rsync for RRD files?
Hmm, nice shit
Error in posting
DEBUG MODE
SQL Error : 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'WHERE forum_id = 7' at line 3
UPDATE forums SET forum_posts = forum_posts + 1, forum_last_post_id = WHERE forum_id = 7
Line : 423
File : functions_post.php
http://www.x-graphs.com X-graphs :: All kind of graphs
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Yea, you get that when you sit on a post too long. Back button, copy, back button, repost, paste, post.
TheWitness