Spine Progress Reports
This message is simply to establish a folder for informing anybody who wants to know about the progress of "Spine" (or whatever cleverer name we decide to call it).
"Spine" is going to be a threaded application written in C that will replace the cmd.php program currently used to execute data source tasks.</P>
The main motivation behind Spine is to enable people like myself, drub, and other core network engineers, who have to track possibly thousands of interfaces, to do so as fast and efficiently as possible.
Currently cmd.php is not concurrent and processes the data source tasks in sequential order. For large numbers of tasks this gets to be slow, slower than the five-minute collection interval, in fact. (Though you can run two different cmd.php processes simultaneously if you establish two separate cacti directories and databases. This helps but isn't ideal.)
Although Ian did a fabulous job with cacti, I'm old-school and believe that the central program responsible for data manipulation and collection should be as fast as possible. For me this rules out running an interpreted language such as PHP. For those who disagree, who like Perl and believe it's faster than C, we would appreciate your input, either by proving that Spine can be done better this way or by helping write supportive routines.
My goal is to make Spine as fast as possible while keeping a focus on making it stable and flexible so that new features can be added after the initial task of duplicating the functionality of cmd.php is finished.
This topic is here so that I can keep everyone posted on the progress I am making and to keep me interested and on track. If you are an academic type you may also find this topic interesting, because I will expound upon certain design considerations I have made in the interests of speed and/or flexibility.
"Spine" is going to be a threaded application written in C that will replace the cmd.php program currently used to execute data source tasks.</P>
The main motivation behind Spine is to enable people like myself, drub and other Core network engineers who have to track possibly thousands of interfaces to do so as fast and efficiently as possible.</P>
Currently cmd.php is not concurrent and processes the data source tasks in sequential order. For large numbers of tasks this gets to be slow. slower than the five minute collection interval in fact. (Though you can run two different cmd.php processes simultaneously if you establish two seperate cacti directories and databases. This helps but isn't ideal.)</P>
Although Ian did a fabulous job with cacti I'm old-school and believe that the central program responsible for data manipulation and collection should be as fast as possible. For me this rules out running an interpreted language such as php. For those that disagree , like perl and believe its faster than C we would appreciate you input into either proving that Spine can be done better this way or by helping write supportive routines.</P>
My goal is to make Spine as fast as possible while keeping a focus on making it stable and flexible so that new features can be added after the initial task of duplicating the functionality of cmd.php is finished.</P>
This Topic is here so that I can keep everyone posted on the progress I am making and to keep me interested and on track. If you are an academic type person you may also find this topic interesting because I will illuminate upon certain design considerations I have made in the interests of speed and/or flexibility.</P>
2/19/02
So far I have made some good progress but haven't achieved anything that runs yet. I am spending a lot of time attempting to abstract most of the data types so that we will be able to support multiple types of databases and so the code will be easy to maintain and extend.
Spine will be a daemon (even if it only processes the task list once). This is necessary to allow cacti to gather data from sources that have differing collection periods. You will be able to collect your CPU utilization every second, and you will be able to collect your room's temperature every six hours.
To achieve this, the list of data sources will be represented as "tasks". A task will not necessarily be a command to run. We can later extend tasks to include such things as watchdog timers and network monitoring tasks. I see cacti becoming able to do things like watch a machine every five minutes and, if the task to monitor it fails, mail you about your machine being down. (Just one example.)
The list of tasks will be stored using a priority queue data structure, where tasks are prioritized by when they are due to be executed. I have completed the implementation of the priority queue and it appears to function. This will make it very simple to determine which task a thread should handle next, and will allow Spine to suspend its operations while nothing is presently due to be executed. Currently the PQueue is implemented using dynamic tables based on load factors of 1 and 1/4. Since the underlying structure is a contiguous array, Spine will be limited in the number of tasks it can handle. That limit will be equal to the largest array that can be allocated by the OS and has a size that is a power of 2. For my Linux box this sets a rather non-imposing limit of about 2^28 tasks.
So if you need more than 268 million tasks you will need to rewrite the priority queue structure. In any case, the maintenance of even very large priority queues will be O(lg(n)), so it will be very fast even for very large task lists.
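To make that concrete, here is a minimal sketch of the kind of heap-backed queue I'm describing. All names here are illustrative assumptions, not the actual Spine code; it assumes the queue was created with a small power-of-two initial capacity, and error handling is omitted:

```c
/* Sketch of a heap-backed task priority queue (hypothetical names).
 * Tasks are ordered by the time they are next due to execute. */
#include <stdlib.h>
#include <time.h>

typedef struct task {
    time_t due;              /* when this task should next run */
    /* ... command string, symbol tables, etc. ... */
} task_t;

typedef struct pqueue {
    task_t **heap;           /* contiguous array forming a binary min-heap */
    size_t   size;           /* tasks currently queued */
    size_t   cap;            /* allocated slots; always a power of two */
} pqueue_t;

/* Dynamic table: double at load factor 1, halve at load factor 1/4. */
static void pq_resize(pqueue_t *q)
{
    if (q->size == q->cap)
        q->cap *= 2;
    else if (q->cap > 8 && q->size <= q->cap / 4)
        q->cap /= 2;
    else
        return;
    q->heap = realloc(q->heap, q->cap * sizeof *q->heap);
}

/* O(lg n) insert: append, then sift the new task up toward the root. */
void pq_insert(pqueue_t *q, task_t *t)
{
    size_t i;

    pq_resize(q);
    i = q->size++;
    while (i > 0 && q->heap[(i - 1) / 2]->due > t->due) {
        q->heap[i] = q->heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    q->heap[i] = t;
}

/* O(lg n) removal of the task due soonest: take the root, move the
 * last task into its place, and sift it down. */
task_t *pq_pop(pqueue_t *q)
{
    task_t *min, *last;
    size_t i = 0;

    if (q->size == 0)
        return NULL;
    min = q->heap[0];
    last = q->heap[--q->size];
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= q->size)
            break;
        if (child + 1 < q->size && q->heap[child + 1]->due < q->heap[child]->due)
            child++;
        if (last->due <= q->heap[child]->due)
            break;
        q->heap[i] = q->heap[child];
        i = child;
    }
    if (q->size > 0)
        q->heap[i] = last;
    pq_resize(q);
    return min;
}
```

Since the root is always the task due soonest, Spine can sleep until `heap[0]->due` and wake exactly when there is work to do.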
Cacti uses variables to represent input and output items. Though I think these symbols could be substituted once at startup, I find it desirable to maintain symbol tables for each task plus a global symbol table, and to resolve the symbols at the time of task execution.
I have implemented red-black binary trees to implement a symbol table efficiently. These are now functioning and will be used to store variable <-> value mappings. This will eventually be a hash data structure, but in the interests of getting something running quickly, the "hash" currently has a single bin and that bin is a single red-black tree. For small numbers of symbols this should be sufficient for now.
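As a rough sketch of what I mean (the names are hypothetical, and rbtree_t stands in for my red-black tree type):

```c
/* Hypothetical sketch of the symbol <-> value mapping: symbols are
 * keyed by name and stored in a red-black tree; the "hash" is a
 * degenerate single bin for now. */
#include <string.h>

typedef struct rbtree rbtree_t;   /* the red-black tree type */

typedef struct symbol {
    char *name;                   /* e.g. "<ip>" or "<community>" */
    char *value;                  /* what it resolves to at execution time */
} symbol_t;

typedef struct symtab {
    rbtree_t *bins[1];            /* one bin today; a real hash later */
} symtab_t;

/* Symbols compare by name, so the tree gives O(lg n) lookup. */
int symbol_cmp(const void *a, const void *b)
{
    return strcmp(((const symbol_t *)a)->name,
                  ((const symbol_t *)b)->name);
}
```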
The next implementation obstacle is to define an abstract data type for representing a single generic task. Once I have this complete I can tie it in with the above and some MySQL calls to retrieve/build the task list. Following this I'll implement a function to process a task. Then I should be at a point where I can duplicate the functionality (and speed) of cmd.php before moving on to actually threading the application.
On a side note about development:
I know very little about CVS systems, but it would be nice to have something like that available for Spine. Currently I want to maintain authority over the core Spine code, but I would like to be able to make the code available to others in case somebody wants to help program extensions that I don't feel I have the time for.
For instance: it would be nice for somebody to extend my pseudo-hash rb-tree to a real hash with multiple bins that actually does hashing. If they could do that, it could be submitted as a patch and I could work it into the CVS tree later.
If anybody knows an easy way to set up a CVS repository and could lead me to a place where I can use it efficiently, I'd like your help with that.
Bravo for this challenging effort!
I'd like to ask you flyers (fun loving youth en-route to success) to not leave the Windows world behind when you do implement Spine. I know about the forking abilities of Linux and the obvious advantages over win32 for this sort of thing, but hopefully SOME sort of continued support will exist when the time comes...
Personally, I don't touch Windows. But I am very concerned about writing portable, extensible code. My motivation is to develop an end product that can be easily maintained and extended by others, and that will allow new features and abilities to be added to cacti so that it scales well and grows to handle new requirements. To that end, the one unusual thing Spine relies upon is POSIX threads. Everything else so far appears achievable using standard ANSI C (with the exception of inline prototyping).
There is a POSIX threads open source project for Windows, but it currently does not handle forking from threads, so it may not be usable to implement Spine, since Spine will rely on forking to gather data that it doesn't know how to obtain internally. But I do think that any competent Windows programmer will be able to take my source code and modify it to use the Win32 API for concurrency. Spine isn't going to need any super fancy synchronization: simple mutexes around the priority queue and maybe some sleep/wake-up routines, but that is it, and the implementation of these items should wind up being fairly neat, clean, and independent of the rest of the code, so the Win32 API should handle that.
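For the curious, the synchronization I have in mind is no fancier than this sketch, assuming POSIX threads and reusing the hypothetical pq_insert/pq_pop and types from the earlier priority queue sketch:

```c
/* One mutex guards the priority queue; a condition variable lets idle
 * worker threads sleep until something is queued. Illustrative only. */
#include <pthread.h>

static pthread_mutex_t pq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pq_wake = PTHREAD_COND_INITIALIZER;

/* Worker side: block until a task is available, then take it. */
task_t *next_task(pqueue_t *q)
{
    task_t *t;

    pthread_mutex_lock(&pq_lock);
    while ((t = pq_pop(q)) == NULL)
        pthread_cond_wait(&pq_wake, &pq_lock);  /* sleeps; lock released */
    pthread_mutex_unlock(&pq_lock);
    return t;
}

/* Scheduler side: queue a task and wake one sleeping worker. */
void schedule_task(pqueue_t *q, task_t *t)
{
    pthread_mutex_lock(&pq_lock);
    pq_insert(q, t);
    pthread_cond_signal(&pq_wake);
    pthread_mutex_unlock(&pq_lock);
}
```

A real worker would use pthread_cond_timedwait() so it also wakes when the next queued task actually comes due, but the structure is the same.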
Hopefully, this will mean that Spine will not leave the Windows user base in the dark.
> If anybody knows an easy way to set up a CVS repository and could lead me to a place where I can use it efficiently, I'd like your help with that.
http://sourceforge.net/ ?
Speaking of SourceForge, I currently have a project registered for cacti:
http://www.sf.net/projects/cacti/
jwiegley: If you have a SF account and want me to make you a project admin, tell me and I'll go ahead and do that.
I currently only use this SF project for downloads, but CVS would be another good thing to use it for.
-Ian
2/26/02
Progress is slower than I had hoped, mostly because of the time involved in developing abstract data types to support my long-term feature goals, but also because of a bunch of stressors in my life.
Anyhow. As of today, most of the code has been written to give Spine symbol table support. This will allow Spine to efficiently handle variable-to-value mappings such as
<ip>, <community>.
Symbol tables will also be the object used to return output values generated by either internal or external task functions. This should allow Spine to have a well-defined API for internal modules. For instance, all internal tasks will, I think, be given two symbol tables as input (one of global and one of local scope) and will return a single symbol table containing the results.
This way anybody can easily write internal modules for things like SNMP and ping.
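To illustrate the shape of that API (everything here is invented for the example; the symtab helpers are assumptions, not a real interface):

```c
/* Hypothetical sketch of the internal-module API: two symbol tables
 * in (global and local scope), one symbol table of results out. */
typedef struct symtab symtab_t;          /* opaque symbol table type */

/* Assumed helpers, invented for this example: */
symtab_t *symtab_new(void);
const char *symtab_get(symtab_t *t, const char *name);
void symtab_put(symtab_t *t, const char *name, const char *value);

/* Every internal task function has this shape: */
typedef symtab_t *(*task_fn_t)(symtab_t *global, symtab_t *local);

/* A made-up internal ping module, just to show the contract: */
symtab_t *task_ping(symtab_t *global, symtab_t *local)
{
    symtab_t *out = symtab_new();
    const char *ip = symtab_get(local, "<ip>");   /* resolved at run time */

    (void)global;   /* a real module might read global settings here */
    /* ... send an ICMP echo to ip, measure the round-trip time ... */
    (void)ip;
    symtab_put(out, "<rtt>", "0.42");             /* made-up result value */
    return out;
}
```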
Though it makes some function interfaces easy to standardize, the primary motivation for symbol tables was that they seemed the best and most extensible method of storing the "input"-to-value mappings that are specified for each data source.
The symbol table implementation is a 256-bin hash of red-black trees. The "Symbol" data type is stored by the red-black tree in the appropriate bin.
For small symbol tables this is overkill and probably less efficient than an optimal hash table. But two of Spine's goals are 1) speed and 2) scalability. This implementation is very close to O(1) even for very large symbol tables, so Spine will be able to scale like crazy.
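Extending the earlier single-bin sketch, the lookup path looks roughly like this. The hash function below is only an illustration (any byte-wise hash reduced mod 256 would do), and rbtree_find is another assumed helper:

```c
/* 256-bin hash of red-black trees, extending the single-bin sketch. */
#define SYM_BINS 256

typedef struct rbtree rbtree_t;                    /* red-black tree type */
typedef struct symbol symbol_t;                    /* name/value pair */
symbol_t *rbtree_find(rbtree_t *tree, const char *name);   /* assumed */

typedef struct symtab256 {
    rbtree_t *bins[SYM_BINS];                      /* one tree per bin */
} symtab256_t;

static unsigned sym_hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;       /* simple byte-wise hash */
    return h % SYM_BINS;
}

/* Near-O(1) for large tables: constant-time bin selection, then a
 * search of the one shallow tree in that bin. */
symbol_t *symtab_lookup(symtab256_t *t, const char *name)
{
    return rbtree_find(t->bins[sym_hash(name)], name);
}
```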
The red-black tree implementation has been extended to be usable as an unordered list as well, in which case the natural order of the binary tree is based on earlier-inserted items being considered "smaller" than items inserted later in time. This will be useful for implementing lists of things as needed elsewhere in Spine. It renders "searching" for an item an undefined operation, but the tree can be "walked" using an iterator-type concept.
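Walking such a list would look something like the following sketch; the iterator type and functions are assumptions for illustration, not the real interface:

```c
/* Hypothetical iterator API for the list use of the tree. Because
 * "comparison" is insertion order, an in-order walk visits items
 * oldest-first. */
typedef struct rbtree rbtree_t;              /* the red-black tree type */

typedef struct rbtree_iter {
    void *pos;                               /* current position (opaque) */
} rbtree_iter_t;

void  rbtree_iter_init(rbtree_iter_t *it, rbtree_t *tree);
void *rbtree_iter_next(rbtree_iter_t *it);   /* NULL when the walk ends */

void walk_list(rbtree_t *list_tree)
{
    rbtree_iter_t it;
    void *item;

    rbtree_iter_init(&it, list_tree);
    while ((item = rbtree_iter_next(&it)) != NULL) {
        /* process each item in the order it was inserted */
    }
}
```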
This gets me quite close to implementing and testing actual "task" types. Once those are done it is a matter of writing the code to process a task and then threading the application. That's still several weeks away, though, and I will probably spend a couple of days seeing what it will take to learn and set up a CVS repository for Spine on SourceForge. So that will take up some time.
Sub-goal: in two weeks, at least have Spine functioning enough to retrieve the necessary task list from the MySQL database and store it internally in a manner suitable for processing by threads.
A very interesting idea indeed. I also plan to collect a vast amount of data and I am already seeing the collection time increase with just a couple of remote SNMP polls. I have little to offer this evening (morning), but I would love to help out in any way that I can. This will definitely become something I will throw extra brain-flops toward.
Ok, after some sleep I would like to propose this. Instead of working with a programming language such as C (which in my opinion will add a great amount of overhead, software dependency, etc.), why not parse the output of cmd.php into a shell script that allows each of the data collection processes to be called independently of one another? This would allow simultaneous collection of data from different collection sources and also avoid reverse-engineering the cmd.php process and porting it to another language. Just a thought... I need to examine the structure of cmd.php and determine if this could be done on the fly, so that changes can be readily made to the data collection processes via the web interface without the user doing the backend work of parsing cmd.php into a shell script... more to come as time allows.
Yep, it is certainly possible to avoid the programming "overhead" involved in C and still get pseudo-concurrent behavior.
When Spine reaches a functional state I'll place a gentleman's bet that Spine will outperform an implementation in any other language except C++ (which is nearly identical to C and to my object-oriented C programming approach), at least for raw SNMP data collection.
Other than that, you will also find that a parsed multi-shell implementation has some pretty strict limits of its own. How many processes is a single user allowed to establish, for instance?
Lastly, The "overhead" in Spine is mostly geared towards designing Spine to be flexible as well as reliable. It will be trivial in spine to build "internal" data collection modules such as cmd.php's current snmp abilities.
Cacti is a super product aimed at the correct market: people who need to monitor and report on network usage, but who need to have something up and running quickly without a huge learning curve.
My goal is to add speed, scalability, and flexibility to this feature set. This way Cacti should be able to grow beyond its current limits.
Progress for 3/6/02
My day job has turned into a day-and-night job. Taxes are coming due and I've been pretty much swamped with household duties. The only progress I have been able to make is to obtain a decent laptop (my six-year-old doorstop was useless).
I'm hoping that this will help get me a little more mobile and give me a more relaxed environment in which to develop during my spare time.
Hopefully getting my tax work done and getting a few duties out of the way will allow me to get back on track with the spine programming.
Also, hopefully somebody will get a suitable replacement for or alternative to Spine written. The threaded shell approach seems useful, but not for those with 500+ interfaces.
It turns out I know even less about PHP than I thought... Still gonna work on this a bit more when I get the chance.
BTW - I wasn't trying to imply that the coding in C would be the overhead, just wondering how much extra CPU, memory, etc. will be needed for a whole new binary. By trade I am a network guy who can script, not a coder. I will pretty much admit up front that I know very little about the world above OSI layer 4. What I do know from my experience is that a compiled binary may get a bit tricky to make cross-platform without distributing source code. Then you have to rely on a compiler on the user end, etc. Some users of an application like this may not have enough experience to compile a binary for their system, and others may not be able to for security reasons. (I do not allow compilers to be installed on many production systems... but that is a totally different topic.)
Anyway, I hope that I did not offend, and if I did, this explanation clears things up and serves as an apology. If nothing else, I think we may find that somewhere in between lies the right answer. (Given that I still don't have a much better way to control the CPU utilization other than nice.)