The thold plugin converts available swap memory from a positive to a negative value, resulting in incorrect and unnecessary alerts
(e.g., an email saying that available swap memory of -4,116,960 is below the calculated baseline threshold of -3,293,567.73).
This isn't caused by PHP's time zone being wrong.
Details:
We have two Cacti servers, both monitoring the same devices with the same settings. Each does its own polling and has its own database. Only one has this problem.
One of the servers is running Cacti version 0.8.8f and thold plugin version 0.5. It's behaving correctly. I'll refer to it below as "the old Cacti server".
The other Cacti server ("the new Cacti server") used to run those same versions, but we recently upgraded it to Cacti 1.2.24 and the thold plugin to 1.5.2. Everything seems to work correctly except that thold converts the value for available swap memory (ucd_memAvailSwap) from positive to negative. It does this for every device where ucd_memAvailSwap is monitored. It did not do this before the upgrade, and thold on the old server does not do it.
For example, for one device, the current available swap memory is 4,116,960 as shown by `free`:
Code:
# free
             total       used       free     shared    buffers     cached
Mem:       1938456    1728856     209600        312     163096    1350692
-/+ buffers/cache:     215068    1723388
Swap:      4161532      44572    4116960
On the old Cacti server, the "thold" tab shows the current value for that device as "4116960", i.e., the correct value.
However, on the new Cacti server, the "Thold" tab shows it as "-4,116,960", i.e., the right magnitude but negative.
I can't work out why this is happening and I'd love any suggestions / advice / links / theories / guesses that anyone might have. I'm happy to provide any information that might be useful for troubleshooting (with redactions for security where essential).
[EDIT: I've put the `thold_template` settings for the `ucd_memAvailSwap` template in a reply.]
I've seen other forum posts about negative values where the solution was to fix PHP's time zone, but I don't believe that's relevant here. The time zone is set correctly in the new Cacti server's php.ini file, as shown below, and php.ini on the old Cacti server has the same time zone.
Code:
[~]$ grep -i timezone /etc/php.ini | grep -v "^;"
date.timezone = "Australia/Brisbane"
[~]$ ls -l /etc/localtime
lrwxrwxrwx. 1 root root 40 Sep 14 2017 /etc/localtime -> ../usr/share/zoneinfo/Australia/Brisbane
Code:
MariaDB [cacti]> select * from plugin_thold_log where host_id=128 ;
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+
| id | time | host_id | local_graph_id | threshold_id | threshold_value | current | status | type | description |
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+
[snip lots of data]
| 243138 | 1682926434 | 128 | 4675 | 926 | 20 | -4116960 | 1 | 1 | Thold Baseline Cache Log |
| 243153 | 1682926745 | 128 | 4675 | 926 | 20 | -4116960 | 1 | 1 | Thold Baseline Cache Log |
| 243168 | 1682927033 | 128 | 4675 | 926 | 20 | -4116960 | 1 | 1 | Thold Baseline Cache Log |
+--------+------------+---------+----------------+--------------+-----------------+----------+--------+------+-------------------------------------------------------------------------------------------------------------------------------------------+
8923 rows in set (0.04 sec)
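As a quick sanity check on time handling, the epoch values in the `time` column above can be decoded in the configured zone. A minimal sketch using PHP's built-in `DateTime` and `DateTimeZone` classes; `thold_log_time()` is a hypothetical helper, not part of thold:

```php
<?php
// Hypothetical helper (not part of thold): render a plugin_thold_log.time
// epoch value as local wall-clock time in the given zone.
function thold_log_time(int $epoch, string $zone = 'Australia/Brisbane'): string
{
    $dt = new DateTime('@' . $epoch);           // '@' timestamps are parsed as UTC
    $dt->setTimezone(new DateTimeZone($zone));  // convert to the configured zone
    return $dt->format('Y-m-d H:i:s');
}

// Epoch values copied from the first and last log rows shown above.
foreach ([1682926434, 1682927033] as $epoch) {
    echo $epoch, ' => ', thold_log_time($epoch), PHP_EOL;
}
```

For those two rows this gives 2023-05-01 17:33:54 and 2023-05-01 17:43:53, shortly before the `lasttime` of 2023-05-01 17:50:02 in the `thold_data` row, so the stored timestamps look consistent with the Australia/Brisbane setting.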
Code:
MariaDB [cacti]> select * from cdef_items where id=128\G
*************************** 1. row ***************************
id: 128
hash: 2c2bf51719766ffba75900a2768570fc
cdef_id: 48
sequence: 1
type: 6
value: d
1 row in set (0.00 sec)
Code:
MariaDB [cacti]> select * from data_input_fields where id=128\G
*************************** 1. row ***************************
id: 128
hash: 5553162fceec749a281dfc315c0630ad
data_input_id: 23
name: Questions
data_name: Questions
input_output: out
update_rra: on
sequence: 0
type_code:
regexp_match:
allow_nulls:
1 row in set (0.01 sec)
Code:
MariaDB [cacti]> select * from thold_data where host_id=128 and data_source_name='ucd_memAvailSwap'\G
*************************** 1. row ***************************
id: 926
name: [redacted host name] - memAvailSwap [ucd_memAvailSwap]
name_cache: [redacted host name] - memAvailSwap [ucd_memAvailSwap]
local_data_id: 5368
data_template_rrd_id: 19762
local_graph_id: 4675
graph_template_id: 93
data_template_hash: 7fcc8ff25765979b5e1b2694c4530c21
data_template_id: 115
data_source_name: ucd_memAvailSwap
thold_hi: 0
thold_low: -3293568
thold_fail_trigger: 2
thold_fail_count: 0
time_hi:
time_low:
time_fail_trigger: 1
time_fail_length: 1
thold_warning_hi:
thold_warning_low:
thold_warning_fail_trigger: 2
thold_warning_fail_count: 0
time_warning_hi:
time_warning_low:
time_warning_fail_trigger: 1
time_warning_fail_length: 1
thold_alert: 0
prev_thold_alert: 0
thold_enabled: on
thold_type: 1
bl_ref_time_range: 86400
bl_pct_down: 20
bl_pct_up:
bl_fail_trigger: 3
bl_fail_count: 5470
bl_alert: 1
lastread: 4116960
lasttime: 2023-05-01 17:50:02
lastchanged: 2023-04-12 17:53:24
oldvalue: 4116960
repeat_alert: 48
notify_extra:
notify_warning_extra:
notify_warning: 1
notify_alert: 1
snmp_event_category: NULL
snmp_event_severity: 3
snmp_event_warning_severity: 2
thold_daemon_pid:
notes:
host_id: 128
syslog_priority: 3
syslog_facility: NULL
syslog_enabled:
data_type: 0
show_units:
cdef: 0
percent_ds: ucd_memAvailSwap
expression:
upper_ds:
thold_template_id: 22
template_enabled: on
tcheck: 1
exempt: off
acknowledgment:
thold_hrule_alert: NULL
thold_hrule_warning: NULL
restored_alert: off
reset_ack:
persist_ack:
email_body: A warning has been issued that requires your attention.[snip more text]
email_body_warn: A warning has been issued that requires your attention.[snip more text]
email_body_restoral:
trigger_cmd_high:
trigger_cmd_low:
trigger_cmd_norm:
bl_thold_valid: 1682985600
1 row in set (0.00 sec)
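The numbers in this row may explain why the alerts fire on every poll. The stored `thold_low` of -3293568 is exactly -4116960 × 0.8, i.e. the negated reading reduced by `bl_pct_down` = 20%. Assuming the baseline low bound is computed as reference × (100 − bl_pct_down) / 100 (inferred from these values, not checked against thold's source), a negative reference puts the "low" bound above the reading. A minimal sketch; `baseline_low()` and `breaches_low()` are hypothetical names:

```php
<?php
// Hypothetical helpers (not thold's API). Assumed rule, inferred from the
// thold_data row: low bound = reference * (100 - bl_pct_down) / 100.
function baseline_low(float $reference, float $pctDown): float
{
    return $reference * (100 - $pctDown) / 100;
}

// A reading breaches the baseline when it falls below the low bound.
function breaches_low(float $current, float $reference, float $pctDown): bool
{
    return $current < baseline_low($reference, $pctDown);
}

$pctDown = 20; // bl_pct_down from the thold_data row

// Positive values, as on the old server: bound 3293568, reading is above it.
var_dump(baseline_low(4116960, $pctDown), breaches_low(4116960, 4116960, $pctDown));

// Negated values, as on the new server: bound -3293568, reading is below it.
var_dump(baseline_low(-4116960, $pctDown), breaches_low(-4116960, -4116960, $pctDown));
```

With a positive reference the 20% margin sits below the reading; once the sign is flipped, the same margin sits above it, so every poll counts as a breach, which would be consistent with the large `bl_fail_count` of 5470.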
I've worked out that thold on the new Cacti server multiplies the value by -1 in the following code from thold's thold_functions.php file:
Code:
function thold_build_cdef($cdef, $value, $local_data_id, $data_template_rrd_id) {
    // [snip]
    while($cursor < $x) {
        $type = $cdef_array[$cursor]['type'];
        switch($type) {
            case 6:
                array_push($stack, $cdef_array[$cursor]);
                break;
            case 2:
                // this is a binary operation. pop two values, and then use them.
                $v1 = thold_expression_rpn_pop($stack);
                $v2 = thold_expression_rpn_pop($stack);
                ################### This next line is where the multiplication by -1 happens: ###################
                $result = thold_rpn($v2['value'], $v1['value'], $cdef_array[$cursor]['value']);
                // put the result back on the stack.
                array_push($stack, array('type' => 6, 'value' => $result));
                // [snip]
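For readers less familiar with CDEFs, the loop above is a reverse Polish notation evaluator: operands (type 6) are pushed onto a stack, and an operator (type 2) pops two values and pushes the result. The sketch below is a stripped-down illustration, not thold's code; `eval_cdef_items()` is a hypothetical name, and only operator code 3 = multiply is confirmed by the variable dump in this post (codes 1, 2 and 4 are assumptions):

```php
<?php
// Simplified RPN evaluator modelled on the loop above (hypothetical, not
// thold's code). Each item is [type, value]: type 6 pushes an operand,
// type 2 applies an operator code. Other types are ignored in this sketch.
function eval_cdef_items(array $items): float
{
    // Only 3 => multiply is confirmed by the debug output; the rest are assumed.
    $ops = [
        1 => function ($a, $b) { return $a + $b; },
        2 => function ($a, $b) { return $a - $b; },
        3 => function ($a, $b) { return $a * $b; },
        4 => function ($a, $b) { return $a / $b; },
    ];
    $stack = [];
    foreach ($items as $item) {
        $type  = (int) $item[0];
        $value = $item[1];
        if ($type === 6) {
            $stack[] = (float) $value;                  // operand: push it
        } elseif ($type === 2) {
            if (count($stack) < 2 || !isset($ops[(int) $value])) {
                throw new InvalidArgumentException("Cannot apply operator $value");
            }
            $v1 = array_pop($stack);                    // right-hand operand (top)
            $v2 = array_pop($stack);                    // left-hand operand
            $stack[] = $ops[(int) $value]($v2, $v1);    // push the result back
        }
    }
    return array_pop($stack);
}

// The values seen in the debug output: 4116960, -1, operator 3.
echo eval_cdef_items([[6, '4116960'], [6, '-1'], [2, '3']]), PHP_EOL;
```

Note that the second value popped ($v2) is the left-hand operand, matching the argument order of the thold_rpn() call above.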
Dumping the variables around the thold_rpn() call gives the following:
Code:
v2 :
(
'id' => '10',
'hash' => 'c888c9fe6b62c26c4bfe23e18991731d',
'cdef_id' => '3',
'sequence' => '1',
'type' => 6,
'value' => '4116960',
)
v1 :
(
'id' => '12',
'hash' => '4355c197998c7f8b285be7821ddc6da4',
'cdef_id' => '3',
'sequence' => '2',
'type' => '6',
'value' => '-1',
)
cdef_array[cursor] :
(
'id' => '11',
'hash' => '1e1d0b29a94e08b648c8f053715442a0',
'cdef_id' => '3',
'sequence' => '3',
'type' => '2',
'value' => '3',
)
$result :
-4116960
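The three dumped items all carry `cdef_id` 3 and, read in `sequence` order, spell out the expression 4116960,-1,* (multiply the reading by -1). The `thold_data` row above has `cdef: 0`, so the CDEF being applied apparently does not come from the threshold's own CDEF setting; that is an inference from these dumps, not verified against thold's source. A small sketch that rebuilds the expression from rows shaped like the dump; `cdef_items_to_string()` is a hypothetical helper, and the operator symbols other than `*` are assumptions:

```php
<?php
// Hypothetical helper (not part of thold): render cdef_items-style rows as a
// comma-separated RPN string, ordered by 'sequence'. Type 2 rows are shown as
// operator symbols; any other type is shown as its raw value. Only 3 => '*'
// is confirmed by the dump; the other symbols are assumptions.
function cdef_items_to_string(array $rows): string
{
    $symbols = [1 => '+', 2 => '-', 3 => '*', 4 => '/'];
    usort($rows, function ($a, $b) {
        return (int) $a['sequence'] <=> (int) $b['sequence'];
    });
    $parts = [];
    foreach ($rows as $row) {
        if ((int) $row['type'] === 2) {
            $code = (int) $row['value'];
            $parts[] = $symbols[$code] ?? "op{$code}";
        } else {
            $parts[] = (string) $row['value'];
        }
    }
    return implode(',', $parts);
}

// Rows trimmed from the dump above, deliberately listed out of order.
$rows = [
    ['id' => '11', 'cdef_id' => '3', 'sequence' => '3', 'type' => '2', 'value' => '3'],
    ['id' => '10', 'cdef_id' => '3', 'sequence' => '1', 'type' => 6,   'value' => '4116960'],
    ['id' => '12', 'cdef_id' => '3', 'sequence' => '2', 'type' => '6', 'value' => '-1'],
];
echo cdef_items_to_string($rows), PHP_EOL; // 4116960,-1,*
```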