I'd like to get some feedback on a possible
Spine feature. Please post and send me your thoughts.</P>
While programming the abstract data types
for Spine, I have come to believe that it
will be possible to update multiple RRA
files (or multiple DSes in a single RRA
file) without much hassle.</P>
I think the best way to handle this is
through the use of regular expressions and
extending the interface to allow multiple
"update rra" outputs. Each output that is
marked to "update rra" should have associated
with it an rra file to update.</P>
Then we add some regex specs to each data
source to inform Spine how to locate the
data in the output stream provided
by the data source.</P>
I think this would be useful because there
are lots of instances where a single command
returns more than one useful piece of
information.</P>
Take Ping for example: You get back lots of
information from a single task. If you send, say,
10 packets, you could collect three different
data sources, one each for min, average and
max time. I find it useful to compare these
items.</P>
I'm working towards Spine being able to do
some parsing of the data so that we can
give it more generic commands as data sources
and simply tell it how to locate the
information.</P>
Thus a simple specification of:<br>
/round-trip [^0-9.]*([0-9.]*)/([0-9.]*)/([0-9.]*)/ <out1> <out2> <out3></br>
would enable Spine to recognize the line:<br>
round-trip min/avg/max/mdev = 95.713/98.431/101.201/2.240 ms
<br>
and add the min, avg and max values to three
separate data sources.</P>
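To make the idea concrete, here is a rough sketch in Python of how such a specification could be applied to a command's output. It is only an illustration, not Spine's actual code: the function name extract_fields and the (pattern, field names) tuple layout are made up for this example, and the regular expression is the one from the specification above with its outer slashes dropped.

```python
import re

# Hypothetical spec layout: (regular expression, output field names).
# The pattern is the one from the post, without the enclosing slashes.
PING_SPEC = (
    r"round-trip [^0-9.]*([0-9.]*)/([0-9.]*)/([0-9.]*)/",
    ["out1", "out2", "out3"],
)


def extract_fields(spec, output):
    """Return {field name: matched text} for the first line that matches."""
    pattern, field_names = spec
    for line in output.splitlines():
        match = re.search(pattern, line)
        if match:
            # Pair each capture group with its output field, in order.
            return dict(zip(field_names, match.groups()))
    return {}  # nothing matched, so there is nothing to update


sample = (
    "10 packets transmitted, 10 packets received, 0% packet loss\n"
    "round-trip min/avg/max/mdev = 95.713/98.431/101.201/2.240 ms\n"
)
print(extract_fields(PING_SPEC, sample))
```

The values are kept as strings here; whether to convert them to numbers before the update is left open.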
I'm looking for a way for Spine to avoid
a lot of extra sed and awk processes
that are used only for the purpose of
stripping away everything but the data from
the data source.</p>
It seems natural to extend this to recognize
and collect more than a single piece of data
from a rich stream.
Multiple RRA updates from a single source proposal...
Here are some of my thoughts on the matter.
First of all I really like the idea of using builtin regexps, never thought of that. Like you said, it would eliminate a lot of extra calls to sed/awk.
Second, multiple data source support per .rrd file should be in the next version. I haven't actually coded this yet, but I think I know how I want to do it.
It will be done at the "data input source" level per se, and not for each "data source" in Cacti.
For instance, you will be able to create a ping script that returns 3 values: min/max/avg. All three of these items will have the "Update RRA" box checked. The only difference is when you go to add a data source you will have to choose which item (min/max/avg) that data source is associated with.
I will also create some method for "tying these data sources together", so Cacti knows that 'ping_uunet_min', 'ping_uunet_max', and 'ping_uunet_avg' are all really the same data source, but with different outputs. Perhaps I will implement some sort of "sub-data source" idea.
Further thoughts are greatly appreciated.
-Ian
Spine's design for handling multiple outputs is going to function like this: each task has a task handler associated with it. (I think "task" is a better term for the functional level that we currently call a data source; we should use "data source" to refer to a particular sequence of data as stored in an RRDtool database. Thus the "task" of pinging a machine, to use our example, would update three data sources as a result of the single task, one each for min, avg and max.)
Anyhow, each task handler will be passed the input field values as defined for that task and a list of regular expression/output field associations. When the task completes, the task handler will pass back a symbol table that maps output field names to the data matched by the regular expressions.
I think what you are proposing for the database side of this will interface just fine. I was thinking of adding a table with the fields "dsid", "fieldid", "rrdtoolfile" and "rrdtool ds name". Then, when the task completes, the thread can go through the list of output fields; for each field that has an entry in the output symbol table, the value of the symbol is written to the corresponding rrdtoolfile as specified by this table, the data input record and the task being executed.
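As a rough illustration of that last step, here is a small Python sketch that takes a symbol table and rows of the proposed mapping table and works out which rrdtool update calls would be needed. Everything named here is an assumption made for the example: the plan_updates function, the dictionary keys (with "ds_name" standing in for "rrdtool ds name"), and the file and DS names. It only builds the argument lists and does not run rrdtool.

```python
from collections import defaultdict

# Hypothetical rows of the proposed table; names and values are made up.
OUTPUT_MAP = [
    {"dsid": 1, "fieldid": "out1", "rrdtoolfile": "ping_uunet.rrd", "ds_name": "min"},
    {"dsid": 1, "fieldid": "out2", "rrdtoolfile": "ping_uunet.rrd", "ds_name": "avg"},
    {"dsid": 1, "fieldid": "out3", "rrdtoolfile": "ping_uunet.rrd", "ds_name": "max"},
    {"dsid": 1, "fieldid": "out4", "rrdtoolfile": "ping_uunet.rrd", "ds_name": "mdev"},
]


def plan_updates(symbols, output_map):
    """Group matched output fields by RRD file and build update commands."""
    by_file = defaultdict(list)
    for row in output_map:
        # Only fields the task handler actually matched get updated.
        if row["fieldid"] in symbols:
            by_file[row["rrdtoolfile"]].append(
                (row["ds_name"], symbols[row["fieldid"]])
            )
    commands = []
    for rrd_file, pairs in by_file.items():
        names = ":".join(name for name, _ in pairs)
        values = ":".join(str(value) for _, value in pairs)
        # "N" asks rrdtool to use the current time as the timestamp.
        commands.append(
            ["rrdtool", "update", rrd_file, "--template", names, "N:" + values]
        )
    return commands


# Symbol table as a task handler might return it for the ping example.
symbols = {"out1": "95.713", "out2": "98.431", "out3": "101.201"}
for command in plan_updates(symbols, OUTPUT_MAP):
    print(" ".join(command))
```

Grouping by file means a task that feeds several DSes in one RRD file needs only one update call per file, while fields with no match (out4 here) are simply skipped.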