Hi TheWitness, I think I found another interesting situation here in v1.4.
I have 'On Demand RRD Updating' enabled and 'How Often Should Boost Update All RRD's' set to 4 hours. So, RRDs will be written to disk every 4 hours.
My last BOOST update happened at about 4:10 PM:
07/14/2007 04:10:38 PM - SYSTEM BOOST STATS: Time:7.0927 RRDUpdates:7275
So, the next one should happen a few minutes after 8 PM according to my settings.
At about 7 PM, I created a new graph that uses 3 data sources. It's a normal Linux load graph: one data source for the 1-minute load, one for 5 minutes and one for 15 minutes.
Watching the logs, I can see those data sources being correctly collected and parsed:
07/14/2007 07:05:05 PM - CACTID: Poller[0] Host[16] DS[522] SNMP: v1: 192.168.1.7, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 0.37
07/14/2007 07:05:05 PM - CACTID: Poller[0] Host[16] DS[524] SNMP: v1: 192.168.1.7, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 0.30
07/14/2007 07:05:05 PM - CACTID: Poller[0] Host[16] DS[523] SNMP: v1: 192.168.1.7, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 0.20
07/14/2007 07:15:05 PM - CACTID: Poller[0] Host[16] DS[522] SNMP: v1: 192.168.1.7, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 0.13
07/14/2007 07:15:05 PM - CACTID: Poller[0] Host[16] DS[524] SNMP: v1: 192.168.1.7, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 0.20
07/14/2007 07:15:05 PM - CACTID: Poller[0] Host[16] DS[523] SNMP: v1: 192.168.1.7, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 0.18
07/14/2007 07:55:06 PM - CACTID: Poller[0] Host[16] DS[522] SNMP: v1: 192.168.1.7, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 0.14
07/14/2007 07:55:06 PM - CACTID: Poller[0] Host[16] DS[524] SNMP: v1: 192.168.1.7, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 0.17
07/14/2007 07:55:06 PM - CACTID: Poller[0] Host[16] DS[523] SNMP: v1: 192.168.1.7, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 0.17
(last one before Boost RRD Update happens)
07/14/2007 08:15:07 PM - CACTID: Poller[0] Host[16] DS[522] SNMP: v1: 192.168.1.7, dsname: load_1min, oid: .1.3.6.1.4.1.2021.10.1.3.1, value: 0.06
07/14/2007 08:15:07 PM - CACTID: Poller[0] Host[16] DS[524] SNMP: v1: 192.168.1.7, dsname: load_5min, oid: .1.3.6.1.4.1.2021.10.1.3.2, value: 0.12
07/14/2007 08:15:07 PM - CACTID: Poller[0] Host[16] DS[523] SNMP: v1: 192.168.1.7, dsname: load_15min, oid: .1.3.6.1.4.1.2021.10.1.3.3, value: 0.15
After the 8:15 PM polling run, the Boost RRD update happens!
I see in the logs the RRDs being created:
07/14/2007 08:15:34 PM - BOOST: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.23/bin/rrdtool create /home/httpd/html/cacti2/rra/mail_gateway_sp_load_1min_522.rrd --start 1184449805 --step 300 DS:load_1min:GAUGE:600:0:500 RRA:AVERAGE:0.5:1:600 RRA:AVERAGE:0.5:6:700 RRA:AVERAGE:0.5:24:775 RRA:AVERAGE:0.5:1:115200 RRA:MIN:0.5:1:600 RRA:MIN:0.5:6:700 RRA:MIN:0.5:24:775 RRA:MIN:0.5:1:115200 RRA:MAX:0.5:1:600 RRA:MAX:0.5:6:700 RRA:MAX:0.5:24:775 RRA:MAX:0.5:1:115200 RRA:LAST:0.5:1:600 RRA:LAST:0.5:6:700 RRA:LAST:0.5:24:775 RRA:LAST:0.5:1:115200
07/14/2007 08:15:35 PM - BOOST: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.23/bin/rrdtool create /home/httpd/html/cacti2/rra/mail_gateway_sp_load_5min_524.rrd --start 1184449805 --step 300 DS:load_5min:GAUGE:600:0:500 RRA:AVERAGE:0.5:1:600 RRA:AVERAGE:0.5:6:700 RRA:AVERAGE:0.5:24:775 RRA:AVERAGE:0.5:1:115200 RRA:MIN:0.5:1:600 RRA:MIN:0.5:6:700 RRA:MIN:0.5:24:775 RRA:MIN:0.5:1:115200 RRA:MAX:0.5:1:600 RRA:MAX:0.5:6:700 RRA:MAX:0.5:24:775 RRA:MAX:0.5:1:115200 RRA:LAST:0.5:1:600 RRA:LAST:0.5:6:700 RRA:LAST:0.5:24:775 RRA:LAST:0.5:1:115200
07/14/2007 08:15:35 PM - BOOST: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.23/bin/rrdtool create /home/httpd/html/cacti2/rra/mail_gateway_sp_load_15min_523.rrd --start 1184449805 --step 300 DS:load_15min:GAUGE:600:0:500 RRA:AVERAGE:0.5:1:600 RRA:AVERAGE:0.5:6:700 RRA:AVERAGE:0.5:24:775 RRA:AVERAGE:0.5:1:115200 RRA:MIN:0.5:1:600 RRA:MIN:0.5:6:700 RRA:MIN:0.5:24:775 RRA:MIN:0.5:1:115200 RRA:MAX:0.5:1:600 RRA:MAX:0.5:6:700 RRA:MAX:0.5:24:775 RRA:MAX:0.5:1:115200 RRA:LAST:0.5:1:600 RRA:LAST:0.5:6:700 RRA:LAST:0.5:24:775 RRA:LAST:0.5:1:115200
But there's no 'rrdtool update' at all! In the above case, I lost about 1 hour of data that was correctly collected, correctly parsed and correctly stored in the poller_output_boost table, but NOT written to the RRD file. Later updates didn't recover the lost data either.
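For reference, the --start value in the create commands above is a Unix timestamp. A quick sketch to decode it follows; the helper name is just for illustration, and I only show UTC because the server's timezone isn't stated in this post.

```python
from datetime import datetime, timezone


def decode_rrd_start(epoch_seconds: int) -> datetime:
    """Convert an rrdtool --start epoch value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Value copied from the 'rrdtool create' log lines above.
start = decode_rrd_start(1184449805)
print(start.isoformat())  # 2007-07-14T21:50:05+00:00
```

Comparing this against the local-time log entries needs the server's UTC offset, so treat it only as a starting point for checking whether the buffered samples fall after the RRD start time.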
I ran another test: I created the graph, let the poller run successfully 4-5 times, and then viewed the graph through the Cacti interface, which forces the update process. Nothing shows in the logs, even with DEBUG, but I know the CREATE happens because the RRD file appeared on disk. Again, it was created with no values, and the data was apparently lost. After the initial CREATE, though, things happen as expected: a forced update, triggered by viewing a graph, issues the expected UPDATEs and the graph displays with no problems.
So, from what I observed, people using boost can lose some hours of collected values if they create a graph and don't force the RRD create process.
This is completely dependent on the boost settings. A '10 minutes' setting for 'How Often....' would make the problem almost disappear. It would still exist, but the data lost would be only 1-2 polling runs. On the other hand, with 6 hours, people could potentially lose a great amount of data.
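To put rough numbers on that, here is a minimal sketch of the worst case. It assumes the 300-second step shown in the 'rrdtool create' lines above, and that a graph is created right after a boost run so every poll until the next run is affected. The function name is made up for illustration; it is not part of boost.

```python
def worst_case_lost_polls(boost_interval_s: int, poller_step_s: int = 300) -> int:
    """Upper bound on polling runs that fall between two boost runs.

    Assumes one poll every poller_step_s seconds and a data source
    created immediately after a boost run.
    """
    return boost_interval_s // poller_step_s


for label, seconds in [("10 minutes", 600), ("4 hours", 4 * 3600), ("6 hours", 6 * 3600)]:
    polls = worst_case_lost_polls(seconds)
    print(f"{label}: up to {polls} polling runs per data source")
```

With these assumptions, 10 minutes gives 2 runs, 4 hours gives 48 and 6 hours gives 72, per data source.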
Not a MAJOR bug indeed. But losing collected data is always a bad idea.