I work for a company (with yousillygoose) that uses Cacti to monitor several thousand devices. We are currently beginning a hardware migration and software version upgrade, and have noticed a small bug that I have been unable to squash myself, with my admittedly limited knowledge.
We are finding that some thresholds are occasionally logged as breaching a high/low threshold of 0.
These thresholds are not actually alerting incorrectly; as far as I can tell, they trigger correctly in every case. However, when they report to rsyslog, the message claims that a threshold of 0 was breached.
The syslog reporting is also inconsistent. That is, the same threshold will sometimes report correctly (e.g. disk usage over 90% full) and sometimes incorrectly; this can swap back and forth as often as every polling cycle.
This is not a common issue. I would estimate that well over 90% of the time, everything works perfectly. As a result, troubleshooting has been very difficult.
We are using Cacti ver 0.8.7i, with Thold 0.4.9-3, on CentOS 5.something. We have modified thold_functions.php just a tiny bit to help our Smarts system sort our Cacti alerts correctly. This was a simple addition of a few characters in the message logged at the end of the logger functions, like so:
Code:
// Threshold has returned to normal
if (strval($breach_up) == 'ok') {
    syslog($syslog_level, "CACTIALERT " . $desc . ' restored to normal with ' . $currentval . ' at trigger ' . $trigger . ' out of ' . $triggerct);
} else {
    // Threshold breached: $threshld is the value that sometimes shows up as 0
    syslog($syslog_level, "CACTIALERT " . $desc . ' went ' . ($breach_up ? 'above' : 'below') . ' threshold of ' . $threshld . " - source: cacti31");
}
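One way to narrow this down might be to tag the suspect entries before they reach syslog, so they can be grepped and compared against the polling cycles. The following is only a minimal sketch under assumptions: the helper name thold_debug_message and the [ZERO-THRESHOLD ...] tag are hypothetical and not part of Thold, and it only mirrors the breach branch of the modified message shown above. A threshold that is legitimately configured as 0 would be tagged as well.

```php
<?php
// Hypothetical helper, NOT part of Thold: builds the same breach message as
// the modified logger branch, and tags it when the threshold looks empty or 0.
function thold_debug_message($desc, $breach_up, $threshld) {
    $message = 'CACTIALERT ' . $desc . ' went ' . ($breach_up ? 'above' : 'below')
        . ' threshold of ' . $threshld . ' - source: cacti31';

    // Treat '', null, 0, '0' and any other non-numeric value as suspect.
    if (!is_numeric($threshld) || (float) $threshld == 0.0) {
        // var_export() shows the raw value and type, so '' vs 0 vs NULL
        // can be told apart in the log.
        $message .= ' [ZERO-THRESHOLD ' . var_export($threshld, true) . ']';
    }
    return $message;
}

// Example calls: a normal threshold, then an empty one.
echo thold_debug_message('disk usage', true, 90), "\n";
echo thold_debug_message('disk usage', true, ''), "\n";
```

The returned string could then be passed to syslog() in place of the concatenation in the else branch. Knowing whether the raw value is an empty string, NULL, or a literal 0 may help show whether the value is being lost before the logger is called.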
Has anyone encountered this already? Have I simply overlooked something that has already been fixed?
Thank you very much for your help. Let me know if I can provide any other relevant information.