New Cacti Architecture (0.8.8) - RFC Response Location
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: New Cacti Architecture (0.8.8) - RFC Response Location
AFAIK, no consensus has been reached and no code has been produced. I do know, however, that some small steps have been made, but in non-public code.
R.
Re: New Cacti Architecture (0.8.8) - RFC Response Location
gandalf wrote:AFAIK, no consensus has been reached and no code has been produced. I do know, however, that some small steps have been made, but in non-public code.
R.
Thanks for the reply, I'll keep an eye out for this in future releases.
Cheers
Cameron.
--
"The future belongs to those who believe in the beauty of their dreams." - Eleanor Roosevelt
-
- Posts: 44
- Joined: Thu Jul 10, 2008 4:46 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
I have read over this thread. Since it was started, some amazing open source software has appeared that could make this process much easier.
First, let me mention Apache Cassandra, which is a distributed, multi-master (active/active) database. While it will take some work to redesign the Cacti MySQL schema for Cassandra, this is not impossible.
Next is the RRD store: there are already many examples of people using Cassandra for RRD-store-like applications.
http://nosql.mypopescu.com/post/3134031 ... ndra-based
However, we could also simply store raw RRD files inside Cassandra and pull them out where needed.
As for poller failover, I think that is best handled with a tool like Linux-HA. Most people would only need one active/passive poller setup per datacenter, which is easily done with Linux-HA.
I really want to see this happen. Some of the newer tools are making strides in distributed storage and are really pushing Cacti out of some use cases. Cacti was the first open source tool I installed at my job, and I love it to death. If we build it, they will come (back).
I do not want to step on anyone's toes with work that is already ongoing, so I would like to know if I can hack at it.
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: New Cacti Architecture (0.8.8) - RFC Response Location
Do you have any knowledge about performance when using Cassandra?
Could you provide some hints on how to start such a project?
R.
-
- Posts: 44
- Joined: Thu Jul 10, 2008 4:46 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
I have worked with the project extensively. Cassandra is a distributed, sharded key-value store. Sharding is based on a user-supplied key. Writes are done without reading, so writes are really fast. Reads are fast as well, because the underlying structures are sorted by key. It scales linearly and has no single point of failure (SPOF).
As I mentioned, moving the RRD store to Cassandra is really trivial.
row key : server1/graph1
column key : 123452525 (timestamp)
column value : 1234
column key : 123452530 (timestamp)
column value : 3434
Many people have used Cassandra to store this type of data (performance data).
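The row/column layout above can be illustrated with a small in-memory model. This is a sketch, not Cassandra's API: `WideRowStore`, `insert` and `slice` are hypothetical names, and a Python dict plus a sorted list stand in for a row whose columns Cassandra keeps sorted by column name (here, the timestamp).

```python
import bisect


class WideRowStore:
    """Toy model of the layout above: row key -> columns kept sorted by timestamp."""

    def __init__(self):
        # row key -> (sorted list of column keys, dict column key -> value)
        self.rows = {}

    def insert(self, row_key, timestamp, value):
        """Overwrite any existing value for the same timestamp, like a Cassandra upsert."""
        keys, values = self.rows.setdefault(row_key, ([], {}))
        if timestamp not in values:
            bisect.insort(keys, timestamp)
        values[timestamp] = value

    def slice(self, row_key, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end, oldest first."""
        keys, values = self.rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [(ts, values[ts]) for ts in keys[lo:hi]]
```

Inserting the two sample columns for `server1/graph1` in either order and slicing 123452520..123452530 returns `[(123452525, 1234), (123452530, 3434)]`. Because the columns stay sorted, a time-range read for one graph is a single contiguous slice of one row.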
The metadata store in Cacti is a bit more effort, because you have to restructure your data (possibly denormalizing it and storing it multiple times). We may not need as much horizontal scalability there, so multi-master MySQL might fit the bill, but we could certainly do that in Cassandra as well.
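To make "denormalize and store it multiple times" concrete, here is a hedged sketch using plain dicts as stand-ins for two column families. The names (`graphs_by_host`, `graphs_by_template`, `add_graph`) are hypothetical and not part of Cacti's schema; the point is that each query pattern gets its own row key, so a lookup never needs a join.

```python
from collections import defaultdict

# Hypothetical denormalized layout: the same graph metadata is written twice,
# once per query pattern, instead of being joined at read time.
graphs_by_host = defaultdict(dict)      # row key: host name -> {graph_id: title}
graphs_by_template = defaultdict(dict)  # row key: template  -> {graph_id: host}


def add_graph(graph_id, host, template, title):
    """Write one graph under both row keys so each lookup is a single-row read."""
    graphs_by_host[host][graph_id] = title
    graphs_by_template[template][graph_id] = host


def graphs_for_host(host):
    return dict(graphs_by_host.get(host, {}))


def hosts_using_template(template):
    return sorted(set(graphs_by_template.get(template, {}).values()))
```

The cost is that every change has to be written to both places, which is the extra restructuring effort mentioned above.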
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: New Cacti Architecture (0.8.8) - RFC Response Location
appodictic wrote:As I mentioned, moving the RRD store to Cassandra is really trivial.
row key : server1/graph1
column key : 123452525 (timestamp)
column value : 1234
column key : 123452530 (timestamp)
column value : 3434
Well, but you know that rrdtool does more than just store data?
- It does normalization
- It does consolidation (to keep the footprint from growing infinitely)
- It does graphing; that's the most important issue I currently see with your proposal
R.
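For readers unfamiliar with the first two points, here is a simplified sketch of what normalization and consolidation mean. It assumes GAUGE-style input, where each value holds over the interval since the previous sample, and AVERAGE consolidation only; real rrdtool also handles heartbeats, unknown values, the xff factor and COUNTER/DERIVE types, none of which are modeled here. `normalize` and `consolidate` are illustrative names, not rrdtool functions.

```python
from collections import deque


def normalize(samples, step, start):
    """Turn irregular samples into one primary data point (PDP) per fixed step.

    samples: (timestamp, value) pairs sorted by timestamp; each value is assumed
    to hold over the interval (previous timestamp, timestamp].
    start: time of the previous update, assumed to lie on a step boundary.
    """
    pdps = []
    acc = 0.0                 # value * seconds accumulated in the current step
    prev_t = start
    step_end = start + step
    for t, v in samples:
        while t >= step_end:  # this sample's interval reaches one or more step ends
            acc += v * (step_end - prev_t)
            pdps.append(acc / step)   # time-weighted average over the step
            acc = 0.0
            prev_t = step_end
            step_end += step
        acc += v * (t - prev_t)
        prev_t = t
    return pdps


def consolidate(pdps, steps_per_row, rows):
    """AVERAGE consolidation into a fixed-size round-robin archive (oldest rows drop off)."""
    archive = deque(maxlen=rows)
    for i in range(0, len(pdps) - steps_per_row + 1, steps_per_row):
        archive.append(sum(pdps[i:i + steps_per_row]) / steps_per_row)
    return list(archive)
```

With samples at t=150 (value 10) and t=450 (value 20), a 300 s step starting at 0 yields one PDP of 15.0, because the value was 10 for half the step and 20 for the other half. `consolidate` never keeps more than `rows` values, which is the fixed footprint that a plain key-value column store does not give you for free.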
-
- Posts: 44
- Joined: Thu Jul 10, 2008 4:46 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
Yes, those are challenges. I do not understand the RRD format in depth.
My thinking is that it will be much easier to use an already existing distributed data store and make RRD work with it, rather than trying to build a distributed RRD store from the ground up.
Since RRDTool works with local files, I do not believe the final distributed Cacti will use RRDTool as we know it. Is it a requirement that RRDTool be replaced with something completely transparent to Cacti?
Another, very simple way to do this relies on the fact that a column value in Cassandra is just a byte[]. We could serialize the entire RRD file into a single column (or a list of sub-columns) and just use Cassandra like a distributed file system. We could possibly break an RRD file across multiple Cassandra columns, let RRDTool work locally, and then intelligently detect which sections changed and sync only those to the distributed store.
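The "break the file into columns and sync only changed sections" idea can be sketched with fixed-size chunks and per-chunk hashes. This is a hedged illustration: a dict stands in for Cassandra (row key = file path, column name = chunk index), `CHUNK_SIZE` and `sync` are hypothetical, and nothing here talks to a real cluster. Since an RRD file keeps a fixed size after creation, an update should only change a few regions of it, so most chunks would typically be skipped.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical column size in bytes


def sync(store, digests, path, data, size=CHUNK_SIZE):
    """Push only the chunks of `data` that changed since the last sync of `path`.

    store:   dict standing in for Cassandra; row key = path,
             column name = chunk index, column value = chunk bytes.
    digests: dict path -> chunk digests recorded at the last sync.
    Returns the indices of the chunks that were written.
    """
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    new = [hashlib.sha1(c).digest() for c in chunks]
    old = digests.get(path, [])
    row = store.setdefault(path, {})
    changed = [i for i, d in enumerate(new) if i >= len(old) or old[i] != d]
    for i in changed:
        row[i] = chunks[i]
    for i in [i for i in row if i >= len(chunks)]:  # file shrank: drop stale columns
        del row[i]
    digests[path] = new
    return changed


def reassemble(store, path):
    """Rebuild the file bytes from the stored columns, in chunk order."""
    row = store[path]
    return b"".join(row[i] for i in sorted(row))
```

Keeping the digests locally means the poller does not need an old copy of the file to know what changed; the trade-off is that the digest table must stay in step with what was actually written.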
Re: New Cacti Architecture (0.8.8) - RFC Response Location
hi appo,
I'm very interested in a distributed Cacti system, but I think one thing is being overlooked with Cassandra: we have rrdtool actions happening every poller run (be it 1 minute or 5 minutes). This can be delayed a bit by the Boost plugin (AFAIK), but every RRD file is still touched and updated with poller results.
Cacti was built "around" rrdtool, which might come with inherent weaknesses but definitely a lot of strengths, too. I don't know of any other database that can store data for trend analysis as efficiently and compactly as RRD.
Do you have any data on I/O performance in Cassandra and how it will work over, e.g., WAN links? I'll google Cassandra, but I'd sure like to be pointed to the right information if you don't mind.
cheers,
jerri
edit:
Is this something I should be worried about (http://wiki.apache.org/cassandra/CassandraHardware)? Compared to the resources we currently need for Cacti, it would be a rather hefty upgrade.
-
- Posts: 44
- Joined: Thu Jul 10, 2008 4:46 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
@jerrison: The CassandraHardware page makes recommendations on hardware for using Cassandra as a very large, dedicated datastore. It describes the hardware you would need to run a large, heavily accessed multi-GB or multi-TB cluster. Most people would probably be able to get along with a single-node instance configured with about the same memory as MySQL (probably; no promises).
@Gandalf: One nice thing about Cassandra is that it has Thrift bindings for many languages (C, C++, PHP, etc.). I was actually thinking about making this work with the fewest possible upstream changes: we could fork rrdtool into rrdtool-cassandra. The path argument to the RRDTool commands would be used as the Cassandra key, for example, and none of the rrdtool commands would work with local files; they would interact directly with Cassandra. I am not a crack C coder by any stretch, and I know this is a BIG task, but the payoff is a drop-in replacement for rrdtool that would be transparent to upstream Cacti.
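The "path argument becomes the Cassandra key" idea amounts to putting a storage interface underneath the commands. Below is a minimal sketch under loud assumptions: the classes and `append_update` are hypothetical, a toy text line stands in for the real binary RRD format, and a dict stands in for Cassandra. It only shows that callers keep passing the same path string while the backend decides whether that string is a file name or a row key.

```python
import os
from abc import ABC, abstractmethod


class RrdStorage(ABC):
    """Where the bytes of an RRD 'file' live; callers only ever pass a path string."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...


class LocalFileStorage(RrdStorage):
    def read(self, path):
        with open(path, "rb") as f:
            return f.read()

    def write(self, path, data):
        with open(path, "wb") as f:
            f.write(data)

    def exists(self, path):
        return os.path.exists(path)


class KeyValueStorage(RrdStorage):
    """Stand-in for a Cassandra-backed store: the path is used verbatim as the row key."""

    def __init__(self):
        self.rows = {}

    def read(self, path):
        return self.rows[path]

    def write(self, path, data):
        self.rows[path] = data

    def exists(self, path):
        return path in self.rows


def append_update(storage: RrdStorage, path: str, timestamp: int, value: float) -> None:
    """Hypothetical 'update' command: identical call whichever backend is plugged in."""
    old = storage.read(path) if storage.exists(path) else b""
    storage.write(path, old + f"{timestamp}:{value}\n".encode())
```

The caller, here standing in for Cacti, never changes; only the object passed as `storage` does, which is the kind of transparency a drop-in replacement would need.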
Re: New Cacti Architecture (0.8.8) - RFC Response Location
One thing that has turned out to be a bottleneck in quite a few Cacti setups is the I/O performance of read/write tasks when updating RRDs in larger environments (the Boost plugin can help up to a point).
It'd be awesome if Cassandra could tackle that as well, somehow.
Just my 2c.
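A rough illustration of why batching helps with RRD I/O: instead of touching each file once per poller run, results are buffered and written as one `rrdtool update` per file carrying several `timestamp:value` arguments (rrdtool accepts multiple updates in one call, in increasing time order). This is a simplified sketch, not the Boost plugin's actual code; `UpdateBuffer` is a hypothetical name, and it only builds the argument lists rather than running rrdtool.

```python
from collections import defaultdict


class UpdateBuffer:
    """Collect poller results in memory and flush them as one update per RRD file."""

    def __init__(self):
        # rrd path -> "timestamp:value" strings, kept in arrival (poll) order
        self.pending = defaultdict(list)

    def add(self, path, timestamp, value):
        self.pending[path].append(f"{timestamp}:{value}")

    def flush(self):
        """Return one argv list per file and clear the buffer."""
        commands = [["rrdtool", "update", path, *samples]
                    for path, samples in sorted(self.pending.items())]
        self.pending.clear()
        return commands
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`. Two buffered polls of `a.rrd` then cost one file open instead of two; the price is that data sits in memory (or, with Boost, in a table) until the flush.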
-
- Posts: 44
- Joined: Thu Jul 10, 2008 4:46 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
That is one of the neat parts about the Cassandra architecture: it is a scale-out, peer-to-peer architecture. So if n nodes cannot handle the load, adding nodes divides the data and the requests among more nodes. And this can be done on the fly, with no downtime.
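The "adding nodes divides the data" property comes from hashing row keys onto a token ring. Below is a minimal consistent-hashing sketch, loosely modeled on that idea: one token per node, MD5 as the hash, and no replication or virtual nodes, all of which are simplifications compared with a real cluster. `Ring`, `token` and `node_for` are hypothetical names.

```python
import bisect
import hashlib


def token(name):
    """Position on the ring; MD5 is used here only as a simple, stable hash."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes=()):
        self.tokens = []   # sorted node tokens
        self.owner = {}    # token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        t = token(node)
        bisect.insort(self.tokens, t)
        self.owner[t] = node

    def node_for(self, key):
        """A key belongs to the first node token at or after the key's token (wrapping)."""
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]
```

If you map a set of keys such as `server1/graph1` before and after `ring.add("node4")`, every key whose owner changed now belongs to `node4`: the new node takes over one slice of the ring and the other nodes keep the rest, which is why it can join without a full reshuffle. With a single random token per node the slices are uneven, which is what token assignment or virtual nodes address in practice.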
-
- Posts: 6
- Joined: Tue Sep 20, 2011 5:35 pm
Re: New Cacti Architecture (0.8.8) - RFC Response Location
appodictic wrote:That is one of the neat parts about the Cassandra architecture: it is a scale-out, peer-to-peer architecture. So if n nodes cannot handle the load, adding nodes divides the data and the requests among more nodes. And this can be done on the fly, with no downtime.
Funny you mention this, because I have been having trouble with the new nodes trying to redirect back through the original N nodes. I wonder what I might be doing wrong. I am running pretty heavy Facebook application analytics, so that could have something to do with it. Does anybody have any good ideas?
Last edited by beenpricked on Sat Oct 01, 2011 7:14 pm, edited 1 time in total.
- rony
- Developer/Forum Admin
- Posts: 6022
- Joined: Mon Nov 17, 2003 6:35 pm
- Location: Michigan, USA
Re: New Cacti Architecture (0.8.8) - RFC Response Location
Strange discussion to be having on this thread...
But I have to add that Cassandra, while a pretty cool idea (I have researched it), still doesn't provide the same interface and data storage that RRDtool does.
Until someone writes a replacement for RRDtool that uses Cassandra, I don't see this as a viable option for Cacti.
[size=117][i][b]Tony Roman[/b][/i][/size]
[size=84][i]Experience is what causes a person to make new mistakes instead of old ones.[/i][/size]
[size=84][i]There are only 3 ways to complete a project: Good, Fast or Cheap, pick two.[/i][/size]
[size=84][i]With age comes wisdom, what you choose to do with it determines whether or not you are wise.[/i][/size]
- Howie
- Cacti Guru User
- Posts: 5508
- Joined: Thu Sep 16, 2004 5:53 am
- Location: United Kingdom
Re: New Cacti Architecture (0.8.8) - RFC Response Location
On the other side of things, with new tools, I've been wondering for a while about using some kind of message queue system between spine and cacti, both for poller_tasks and for the results - you'd get natural load-balancing across multiple pollers using the work queue, and you can either immediately consume the results or batch them up like Boost for results. You could also have other backends to spit the data out into things like Carbon/Graphite or other trendy tools.
(I found this thread again while checking to see if it had moved forwards - I've spent the last couple of days playing with Nimsoft, which has a nice distributed poller architecture (with optional SSL VPNs between pollers to get into customer networks and across NATs), central configuration, but horrible UI)
If Cacti could distribute polling the same way, if Autom8 could apply Thold templates, and if Thold could have multiple templates per DS, I think I'd just stop looking.
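The work-queue idea from the first paragraph can be sketched in-process: several pollers pull tasks from one shared queue, so load balancing falls out naturally, and results go to a second queue that can be consumed immediately or in batches. This is a hedged illustration; `queue.Queue` and threads stand in for a real message broker and separate poller machines, and the "poll" is a placeholder, not an SNMP call.

```python
import queue
import threading


def poller(name, tasks, results):
    """A hypothetical poller: take tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        host, oid = task
        value = len(host) + len(oid)   # placeholder for a real SNMP/script poll
        results.put((name, host, oid, value))
        tasks.task_done()


def run(task_list, n_pollers=3):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=poller, args=(f"poller{i}", tasks, results))
               for i in range(n_pollers)]
    for w in workers:
        w.start()
    for t in task_list:
        tasks.put(t)
    for _ in workers:
        tasks.put(None)                # one sentinel per poller, queued after all tasks
    for w in workers:
        w.join()
    out = []
    while not results.empty():         # all producers have finished, so this drains fully
        out.append(results.get())
    return out
```

No poller is assigned specific hosts: whichever one is free takes the next task. Swapping the result consumer for a batch writer, or for a Carbon/Graphite sender, would not touch the pollers at all.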
Weathermap 0.98a is out! & QuickTree 1.0. Superlinks is over there now (and built-in to Cacti 1.x).
Some Other Cacti tweaks, including strip-graphs, icons and snmp/netflow stuff.
(Let me know if you have UK DevOps or Network Ops opportunities, too!)
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
Re: New Cacti Architecture (0.8.8) - RFC Response Location
Howie wrote:On the other side of things, with new tools, I've been wondering for a while about using some kind of message queue system between spine and cacti, both for poller_tasks and for the results - you'd get natural load-balancing across multiple pollers using the work queue, and you can either immediately consume the results or batch them up like Boost for results. You could also have other backends to spit the data out into things like Carbon/Graphite or other trendy tools.
Do you see any chance to have both:
- for a straightforward approach: an easy, one-system setup featuring as few components as possible for the very beginning
- for a complex, distributed approach: a scaling setup, where you can add components (message queues, pollers, plugins and stuff) as need arises
I'm not against heading for more complex use cases. But I don't like to make things unnecessarily complex for a simple use case.
Result: if a queueing system is added, it should be an option only, not a must. And it should be able to cope with huge bulk peaks of messages...
R.
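One way to read "the queue should be an option only" is a pluggable result sink: the default hands results straight to storage, and an opt-in variant puts a bounded buffer in between that absorbs bursts and is drained in batches. This is a sketch under stated assumptions; `DirectSink`, `QueuedSink` and `make_sink` are hypothetical names, and an in-process `queue.Queue` stands in for whatever queueing system might actually be used.

```python
import queue


class DirectSink:
    """Simple one-system setup: results go straight to the storage callback."""

    def __init__(self, store):
        self.store = store

    def submit(self, result):
        self.store([result])


class QueuedSink:
    """Optional component: a bounded buffer that absorbs bursts and is drained in batches."""

    def __init__(self, store, maxsize=10000, batch=500):
        self.store = store
        # put() blocks when the queue is full, slowing producers in other threads
        self.q = queue.Queue(maxsize=maxsize)
        self.batch = batch

    def submit(self, result):
        self.q.put(result)

    def drain(self):
        """Write all queued results in batches; return the number of batches written."""
        written = 0
        while True:
            chunk = []
            while len(chunk) < self.batch:
                try:
                    chunk.append(self.q.get_nowait())
                except queue.Empty:
                    break
            if not chunk:
                return written
            self.store(chunk)
            written += 1


def make_sink(store, use_queue=False, **opts):
    """The queue is opt-in; the default path has no extra moving parts."""
    return QueuedSink(store, **opts) if use_queue else DirectSink(store)
```

The simple setup never creates a queue at all, while the complex one gets a bound on memory (`maxsize`) and back-pressure during peaks instead of unbounded growth.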