Multiple data source templates corrupted after upgrade from 1.1.37 -> 1.2.9

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

Post Reply
eriks
Posts: 30
Joined: Wed Aug 25, 2010 4:57 pm

Multiple data source templates corrupted after upgrade from 1.1.37 -> 1.2.9

Post by eriks »

Hello,
I have a big problem here. It was noticed that a few of our graphs were acting strangely, showing odd peaks and valleys between polling runs. After confirming that the data was correct the problem led me to the maximum value allowed in the data source template for these graphs. When I looked at the template this is where my nightmare started.

There are several templates with this problem (but not all of them) the most serious of which is the "Interface - Traffic" template (ID=41) which has around 4000 data sources using it.

When an affected template is clicked on in the web GUI the following errors appear in the cacti log:

Code: Select all

2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_template_id in file: /usr/share/cacti/site/data_templates.php  on line: 607
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[607]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_template_id in file: /usr/share/cacti/site/data_templates.php  on line: 607
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[607]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_template_id in file: /usr/share/cacti/site/data_templates.php  on line: 607
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[607]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_template_id in file: /usr/share/cacti/site/data_templates.php  on line: 607
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[607]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_template_id in file: /usr/share/cacti/site/data_templates.php  on line: 607
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[607]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_sources in file: /usr/share/cacti/site/data_templates.php  on line: 661
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[661]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_sources in file: /usr/share/cacti/site/data_templates.php  on line: 661
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[661]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_input_id in file: /usr/share/cacti/site/data_templates.php  on line: 739
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[739]:CactiErrorHandler())
2020/03/12 18:39:32 - ERROR PHP NOTICE: Undefined index: data_input_id in file: /usr/share/cacti/site/data_templates.php  on line: 744
2020/03/12 18:39:32 - CMDPHP PHP ERROR NOTICE Backtrace:  (/data_templates.php[67]:template_edit(), /data_templates.php[744]:CactiErrorHandler())
When a affected data source template is viewed in the GUI it looks like this:

Image

Affected templates exhibit the same symptoms:

1. The Name field is empty
2. Data input method shows as "None" and is greyed out
3. The data source profile is wrong. Three are configured in this system and the data sources currently using this template are not using the profile shown here.
4. The data source items are correct
5. There's nothing under "Custom Data" due to #2

Any attempt to fix/change and save these values results in cacti creating a "copy" of the template under a new ID rather then updating the broken template. When the list is refreshed and the broken template opened again it is unchanged.

I spent a couple of hours digging around in the database trying to find out what was broken but I was not successful. With 4000 data sources using this one template (not counting the others) deleting them all and recreating them and the associated data/rrds is not an option for me - so I'm really hoping someone can help.

I'm not afraid to get my hands dirty in the database...

Thanks
Last edited by eriks on Fri Mar 13, 2020 5:10 pm, edited 1 time in total.
eriks
Posts: 30
Joined: Wed Aug 25, 2010 4:57 pm

Re: Multiple data source templates corrupted after upgrade from 1.1.37 -> 1.2.9

Post by eriks »

I'm still working on this but I have found some additional information. On the new version of cacti 1.2.9, running on Ubuntu 18.04 and MySQL 5.7 this query returns no rows for any broken template (including ID=41):

Code: Select all

mysql> SELECT dtd.*, SUM(CASE WHEN dl.data_template_id = '41' THEN 1 ELSE 0 END) AS data_sources FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' HAVING dtd.local_data_id=0;
Empty set (0.07 sec)
Run the same query on another installation of cacti 1.1.37 under FreeBSD 10 and MySQL 5.6 this same query returns the expected data:

Code: Select all

mysql> SELECT dtd.*, SUM(CASE WHEN dl.data_template_id = '41' THEN 1 ELSE 0 END) AS data_sources FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' HAVING dtd.local_data_id=0;
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
| id | local_data_template_data_id | local_data_id | data_template_id | data_input_id | t_name | name                         | name_cache | data_source_path | t_active | active | t_rrd_step | rrd_step | t_data_source_profile_id | data_source_profile_id | data_sources |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
| 41 |                           0 |             0 |               41 |             2 | on     | |host_description| - Traffic |            | NULL             |          | on     |            |      300 |                          |                      2 |         1304 |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
1 row in set (0.04 sec)
I have noticed, that cacti 1.1.37 does not seem to use the SQL query to sum the data sources using the template or at the very least the queries it makes are quite a bit different when the template is selected.

On the new version, if you remove the SUM(CASE WHEN dl.data_template_id = '41' THEN 1 ELSE 0 END) AS data_sources from the query you get the expected result. I do realize there's no point in the JOIN here since dl is never referenced in the query, but I was trying to make the least number of changes possible and have it still work:

Code: Select all

mysql> SELECT dtd.* FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' HAVING dtd.local_data_id=0; 
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+
| id | local_data_template_data_id | local_data_id | data_template_id | data_input_id | t_name | name                         | name_cache | data_source_path | t_active | active | t_rrd_step | rrd_step | t_data_source_profile_id | data_source_profile_id |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+
| 41 |                           0 |             0 |               41 |             2 | on     | |host_description| - Traffic |            | NULL             |          | on     |            |      300 |                          |                      3 |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+
1 row in set (0.05 sec)
Exactly why MySQL returns 0 rows so long as the SUM/CASE WHEN is present in the query is still mystery to me, but I suspect this is not intentional in the developers part?
Last edited by eriks on Fri Mar 13, 2020 5:11 pm, edited 1 time in total.
eriks
Posts: 30
Joined: Wed Aug 25, 2010 4:57 pm

Re: Multiple data source templates corrupted after upgrade from 1.1.37 -> 1.2.9

Post by eriks »

I believe I know what is going on here. I'm pretty sure this is a bug caused by a query which is not specific enough.

This query

Code: Select all

SELECT dtd.*, SUM(CASE WHEN dl.data_template_id = '41' THEN 1 ELSE 0 END) AS data_sources FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' HAVING dtd.local_data_id=0;
Can only return a single row from the data_template_data table due to the SUM(). If you remove the HAVING filter from the end and run this query:

Code: Select all

SELECT dtd.*, SUM(CASE WHEN dl.data_template_id = '41' THEN 1 ELSE 0 END) AS data_sources FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41';
On my database you get a row which does NOT have local_data_id = 0 (a data source row, rather then a template). When filtered with the HAVING query this naturally returns no results. In this circumstance MySQL seems to return the first row that was created in the database which under normal circumstances would be the template row with local_data_id = 0. But if you do something crazy like I did and export the database via mysqldump on FreeBSD with version 5.6 then import it again on Linux with version 5.7 it appears that the order in which the rows are added to the database after the import is not maintained. As such with the same database on the original MySQL 5.6 host you get this result for the query:

Code: Select all

mysql> SELECT dtd.* FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' LIMIT 1;
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
| id | local_data_template_data_id | local_data_id | data_template_id | data_input_id | t_name | name                         | name_cache | data_source_path | t_active | active | t_rrd_step | rrd_step | t_data_source_profile_id | data_source_profile_id | data_sources |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
| 41 |                           0 |             0 |               41 |             2 | on     | |host_description| - Traffic |            | NULL             |          | on     |            |      300 |                          |                      2 |         1304 |
+----+-----------------------------+---------------+------------------+---------------+--------+------------------------------+------------+------------------+----------+--------+------------+----------+--------------------------+------------------------+--------------+
1 row in set (0.04 sec)
But after the export, then import on the 5.7 Linux host the SAME query on the SAME database now returns this row instead:

Code: Select all

mysql> SELECT dtd.* FROM data_template_data AS dtd LEFT JOIN data_local AS dl ON dl.id=dtd.local_data_id WHERE dtd.data_template_id = '41' LIMIT 1;                    +-------+-----------------------------+---------------+------------------+---------------+--------+------------------------------------------------+-------------------------------------------------------+---------------------------------------------------------------------+----------+--------+------------+----------+--------------------------+------------------------+
| id    | local_data_template_data_id | local_data_id | data_template_id | data_input_id | t_name | name                                           | name_cache                                            | data_source_path                                                    | t_active | active | t_rrd_step | rrd_step | t_data_source_profile_id | data_source_profile_id |
+-------+-----------------------------+---------------+------------------+---------------+--------+------------------------------------------------+-------------------------------------------------------+---------------------------------------------------------------------+----------+--------+------------+----------+--------------------------+------------------------+
| 11665 |                          41 |         11528 |               41 |             2 | NULL   | |host_description| - Traffic - |query_ifDescr| | Resolute Bay PowerBeam02 SCIN -> CSP - Traffic - eth0 | <path_rra>/resolute_bay_powerbeam02_scin_-_csp_traffic_in_11528.rrd | NULL     | on     | NULL       |      300 |                          |                      3 |
+-------+-----------------------------+---------------+------------------+---------------+--------+------------------------------------------------+-------------------------------------------------------+---------------------------------------------------------------------+----------+--------+------------+----------+--------------------------+------------------------+
1 row in set (0.00 sec)
In the first case the HAVING filter would match and the row would be returned. In the second case the HAVING filter does not match and you get an empty set breaking the template. This is the case with ALL my broken templates. The root cause here is an incorrect assumption about the order in which the rows are actually created in the database. This order is not maintained though an import and export and i'm not sure if there's any switches to mysql/mysqldump to ensure that it is. The query needs to be more specific or even multiple queries are required.

My SQL foo so far has failed me in trying to compose a query to fix this problem and provide the same result, I can only think of ways to do it in multiple queries or patches to the code.
Post Reply

Who is online

Users browsing this forum: No registered users and 5 guests