Baseline monitoring information
Hi to all,
I tried to find some information about "thold baseline monitoring" by reading many messages on this forum, but I didn't find anything.
I have some questions about this feature:
1) Where can I find documentation for the thold plugin (if any exists)?
2) Can someone explain the meaning of "baseline monitoring"? I understand that this feature makes it possible to trigger alarm notifications when traffic is "strange" compared to values from the past (the typical defaults are the last 3 hours (10800 seconds) of the previous day (86400 seconds)), but I don't understand which values are considered. In particular, is it possible to configure this feature to show the values used as reference (inside the notification, for example)? Can anyone explain how this feature works?
Thank you very much in advance!
Last edited by gabar on Wed Jul 09, 2008 6:53 am, edited 1 time in total.
Any news about baseline monitoring?
Hi,
Is anyone using baseline monitoring successfully? Does it work?
Thanks
Add me to this list too
I have been experimenting a bit with the settings and have gotten some results from it. I set the baseline deviations in the settings and have had a handful of alerts based on these figures (I have no hard thresholds set), but I would like to be able to see the baseline it has calculated and the dynamic thresholds it uses.
Is it possible to get this into a future release or does anybody have a script that can do it?
- Attachments
- thold.jpg: Note the baseline deviation up / down settings. I've had some alerts from THold based on these settings.
Baselining
I've got thresholding working fine, but not baselining. I've tried dropping my windows down to 600 seconds, but nothing ever shows that a baseline has been calculated.
Should there be a line drawn on the graph?
After a bit of experimenting, here's my experience with baselining.
The first person to post in this topic was 90% there. Baselining looks back a certain amount of time in the past (by default, 24 hours), and grabs a sample of data from then (by default, 3 hours' worth) to use as a "baseline" value. In particular, I believe that it grabs the minimum and maximum values from that period. Then, if the current value is more than, say, 10% higher than the maximum of the sampled period, the threshold is breached.
That's the basic idea... I haven't worked out all the details, but it's enough to get me started with thresholds.
For an example, check out the attached graph, showing a firewall's cpu usage over the last 48 hours. You can see that at about this time yesterday, the cpu was showing no more than 1% usage. For the last few hours, it's been at almost 20% usage, which is more than the 10% threshold I had set. This triggered the threshold to fire and send me an email, which inspired me to search the forums and see if anybody had any good tips on how to use thresholds, and I found this topic.
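The idea described in this post can be put into a small sketch. This is not thold's actual code; it only mirrors the poster's understanding (take the min and max from a reference window in the past, then compare the current value against the max plus a percentage). The names `baseline_bounds`, `is_breached`, `offset`, `window` and `deviation_up_pct` are made up for illustration, and centering the window on "now minus offset" is an assumption.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp, value)


def baseline_bounds(
    samples: Iterable[Sample],
    now: int,
    offset: int = 86400,   # how far to look back (24 hours)
    window: int = 10800,   # how much data to sample (3 hours)
) -> Optional[Tuple[float, float]]:
    """Return (min, max) of the values in the reference window, or None if empty.

    Assumption: the window is centered on `now - offset`.
    """
    center = now - offset
    start, end = center - window // 2, center + window // 2
    values = [value for ts, value in samples if start <= ts <= end]
    if not values:
        return None
    return min(values), max(values)


def is_breached(current: float, bounds: Tuple[float, float], deviation_up_pct: float) -> bool:
    """True if `current` is more than `deviation_up_pct` percent above the window max."""
    _, high = bounds
    return current > high * (100 + deviation_up_pct) / 100


# The firewall example: CPU sat at 1% around this time yesterday, 10% deviation allowed.
now = 1_000_000
yesterday = [(now - 86400 + dt, 1.0) for dt in range(-5400, 5401, 300)]
bounds = baseline_bounds(yesterday, now)

print(bounds)                          # (1.0, 1.0)
print(is_breached(20.0, bounds, 10))   # True: 20 is far above 1.1
print(is_breached(1.05, bounds, 10))   # False: the limit is 1.1
```

With these numbers the limit works out to 1.1, so a reading of almost 20% breaches it easily, which matches the alert described above.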
- Attachments
- firewall_cpu.png
There's another thing to consider about baseline thresholds... they don't necessarily know anything about the data they're analyzing, which may lead to unexpected results.
Here's what I mean: If you're looking at, say, data on an interface, and you set the threshold to 20% above the baseline, then the threshold will trigger if the current traffic is 20% higher than the largest spike around this time yesterday (assuming you use the default settings). This does what you expect it to.
However, if you're monitoring something like processor usage, it's more complicated. Suppose again that the threshold is set to 20% above the baseline. If the device was running at 10% processor usage yesterday, you might expect that it needs to be at 30% today in order to trigger the threshold, because that's an extra 20%. However, the threshold will actually trigger at only 12%... because the number 12 is 20% higher than the number 10!
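The difference between the two readings of "20% above the baseline" is plain arithmetic. The sketch below is only an illustration of that arithmetic, not plugin code, and both function names are hypothetical.

```python
def relative_limit(baseline: float, deviation_pct: float) -> float:
    """Limit when the deviation is a percentage OF the baseline value."""
    return baseline * (100 + deviation_pct) / 100


def percentage_point_limit(baseline: float, deviation_points: float) -> float:
    """Limit when the deviation is added as percentage points (what you might expect for CPU)."""
    return baseline + deviation_points


# CPU was at 10% yesterday, deviation set to 20.
print(relative_limit(10.0, 20))          # 12.0
print(percentage_point_limit(10.0, 20))  # 30.0
```

The baseline threshold as described above behaves like `relative_limit`, so a value that is already a percentage trips the alert much sooner than the percentage-point reading suggests.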
I'm still trying to figure out exactly how baseline works.
But it seems to me that using baseline to monitor things that can vary a lot, like interface traffic and CPU usage, is not a good idea.
I'm studying baseline monitoring because I want to monitor disk usage with it. Disk usage, unlike interface traffic and CPU usage, does not usually vary much over a short period of time. It seems to me that this is the right situation for baseline monitoring.
As discussed above, not all values are suitable for baseline monitoring, as I understand it so far. Baseline monitoring of some values, like CPU as stated, can trigger several false positives.
Thold Baselining
My $.02... I seldom use the baseline feature because of its very nature. If I am watching CPU, disk, or networks, I can usually set a threshold. If any of these things is below a threshold, I don't normally care. So my CPU is running 30% more today than yesterday; unless it's above a certain threshold, who cares? The one thing I have found it useful for is monitoring cable modem users. The reason it is helpful is that the number of people online is constantly changing because of growth or because users are being moved around to different equipment. I do care if the number of people is less than X% of the number of people online an hour ago... It's good for tracking a deviation from a constantly changing number. FWIW, to those who might not understand it: it takes some fiddling around to get a good window of stats sampled and such. And to be honest, I don't understand it 100% myself.
That said (and I already saw a single post about this with no reply), has anyone had an issue where an upper threshold is not set (left blank) but it alarms as having been exceeded anyway? It seems like a bug, but the code apparently hasn't been touched in like 10 months, so I don't think we're likely to see anything changed or fixed anytime soon.
Znapel
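The cable modem case mentioned in this post (alert when the count falls below X% of what it was an hour ago) comes down to a one-line comparison. A rough sketch, with a hypothetical function name and made-up numbers, and no claim that thold computes it this way:

```python
def dropped_below(current: float, reference: float, min_pct_of_reference: float) -> bool:
    """True when `current` is less than `min_pct_of_reference` percent of `reference`."""
    return current < reference * min_pct_of_reference / 100


# Made-up numbers: 1000 modems online an hour ago, alert below 80% of that.
print(dropped_below(750, 1000, 80))  # True
print(dropped_below(900, 1000, 80))  # False
```

Because the reference is whatever the count was an hour ago, the limit moves with growth or equipment changes, which is why a fixed threshold fits this case poorly.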