Boost upgrade to 3.0 results in gaps in graphs


elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

Hi,

I upgraded Boost from 2.4 to 3.0 last week, and since then many of my graphs have gaps.

I looked at cacti.log and boost.log and found many errors of this type:

Code: Select all

04/19/2010 02:38:47 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/1125/33830.rrd: expected 2 data source readings (got 2) from 1271677858
I tried restarting the Boost server, but that did not fix the gaps on all graphs.

The problem doesn't affect all graphs, only a seemingly random subset of the 40k graphs. :roll:

For information, I'm using spine 0.8.7e and cacti 0.8.7e.

Code: Select all

04/19/2010 02:46:09 PM - SYSTEM STATS: Time:67.9765 Method:spine Processes:1 Threads:12 Hosts:1327 HostsPerProcess:1327 DataSources:78908 RRDsProcessed:0 
Do you have any idea how to solve this problem?
Attachments
gap_example.png
gap_example.png (26.13 KiB) Viewed 4506 times
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

The problem is still present...

When I try to access the graph attached below, I get this error message in the log:

Code: Select all

2010:04:22 10:16:14 - RRD Command 'update /var/www/html/rra/1302/38806.rrd --template traffic_in:traffic_out 1271921169:23772871 1271921783:18455347 1271922082:18455347:23772871 1271922380:18455347:23772871 1271922686:18455347:23772871 1271922985:18455347:23772871 1271923287:18455347:23772871 1271923574:18455347:23772871 1271923860:18455347:23772871 1271924159:18455347:23772871'
04/22/2010 10:16:14 AM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/1302/38806.rrd: expected 2 data source readings (got 2) from 1271921169
Do you have any idea how to solve it?
Attachments
gap_example.png
gap_example.png (11.59 KiB) Viewed 4453 times
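
One detail is visible in the logged command above: the template names two data sources (traffic_in:traffic_out), but the first two samples, 1271921169:23772871 and 1271921783:18455347, carry only one value each. That may be what rrdtool is rejecting, though the thread does not confirm it. Below is a minimal Python sketch that flags such samples in a logged update command. The helper name check_update_args is hypothetical, and the command is abridged from the log line above.

```python
import shlex


def check_update_args(command):
    """Return the template field count and the samples that do not match it."""
    tokens = shlex.split(command)
    template_pos = tokens.index("--template")
    # traffic_in:traffic_out -> 2 expected values per sample
    expected = len(tokens[template_pos + 1].split(":"))
    bad = []
    for sample in tokens[template_pos + 2:]:
        timestamp, *values = sample.split(":")
        if len(values) != expected:
            bad.append((int(timestamp), len(values)))
    return expected, bad


# Abridged copy of the update command from the log (first four samples only).
command = (
    "update /var/www/html/rra/1302/38806.rrd --template traffic_in:traffic_out "
    "1271921169:23772871 1271921783:18455347 "
    "1271922082:18455347:23772871 1271922380:18455347:23772871"
)

expected, bad = check_update_args(command)
print("values expected per sample:", expected)
for timestamp, count in bad:
    print("sample at", timestamp, "has", count, "value(s)")
```

On this abridged command it reports the samples at 1271921169 and 1271921783 as carrying one value instead of two.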
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Was it doing this before 3.0? Please confirm. It seems that, for some reason, you are periodically missing some data that Boost should be rejecting or correcting for. I did not change this area of the code. However, I have been accepting quite a few contributions to the code, so it may have been changed by others lately.

Thanks,

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

Hi,

I didn't get this error message before upgrading Boost to 3.0.

Do you have any idea where this strange message comes from?

Another thing:

I tried to enable the new feature for direct population of poller_output_boost by spine.
When this option is enabled, spine stops populating anything: poller_output and poller_output_boost stayed empty for many poll cycles. So I disabled the option.

Do these two problems have something in common? :roll:

Don't hesitate to ask me for further information.

Regards.
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Well, to do the direct population you have to be running spine 0.8.7e/f (f is not released).

TheWitness
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

I'm already running spine 0.8.7e :wink:
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

1) Reduce Max OIDs and increase the SNMP timeout. However, was this happening with earlier versions of Boost? If you back off to those versions, does the problem go away?
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

I already decreased maxOIDs to 10 per request and set the timeout to 2000 ms.

This problem, and this error message in particular, appeared with the latest version of Boost. :cry:
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

What version were you using last when you had no issues?

TheWitness
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

I had been using 2.4 since July 2009, and I had some errors about "duplicate ... timestamp"; I don't remember exactly.

Now I don't have that error anymore... but another one! :wink:
BorisL
Cacti User
Posts: 131
Joined: Sat Mar 31, 2007 9:21 am
Contact:

Post by BorisL »

First, let's deal with this:

elfada wrote:
I tried to enable the new feature for direct population of poller_output_boost by spine.
When this option is enabled, spine stops populating anything: poller_output and poller_output_boost stayed empty for many poll cycles. So I disabled the option.

Enable this option, disable poller execution in cron, then manually launch the spine process with debugging. This could reveal something hidden.

Are there any errors in MySQL error log/cacti log/boost log?
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

Before re-enabling direct population, I would like to get rid of the gaps in my graphs.

Do you have any idea where this strange error comes from?
04/28/2010 03:22:21 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:22:21 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4533.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:17:19 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4534.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:17:19 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4533.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:17:19 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:12:17 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4534.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:12:17 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:07:15 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4534.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:07:14 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 03:02:12 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
04/28/2010 02:57:10 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412
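
Note that every line above refers to the same timestamp, 1272458412, although the errors were logged repeatedly between 02:57 PM and 03:22 PM. That suggests the same rejected sample is being retried on each run rather than new bad data arriving. Here is a rough Python sketch for grouping these errors by RRD file and timestamp, assuming the log format shown above. The function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Pattern written against the BOOST_SERVER error lines quoted in this thread.
ERROR_RE = re.compile(
    r"BOOST_SERVER: Poller\[\d+\] ERROR: (?P<rrd>\S+\.rrd): "
    r"expected (?P<expected>\d+) data source readings \(got (?P<got>\d+)\) "
    r"from (?P<ts>\d+)"
)


def summarize(lines):
    """Count error lines per (rrd file, timestamp) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match["rrd"], int(match["ts"]))] += 1
    return counts


# Three of the lines from the post above.
sample = [
    "04/28/2010 03:22:21 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412",
    "04/28/2010 03:22:21 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4533.rrd: expected 2 data source readings (got 2) from 1272458412",
    "04/28/2010 03:17:19 PM - BOOST_SERVER: Poller[0] ERROR: /var/www/html/rra/197/4527.rrd: expected 2 data source readings (got 2) from 1272458412",
]

counts = summarize(sample)
for (rrd, ts), n in counts.most_common():
    print(n, rrd, ts)
```

Run against the full cacti.log, a pair that keeps growing with an unchanged timestamp would point at a stuck sample for that data source.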
BorisL
Cacti User
Posts: 131
Joined: Sat Mar 31, 2007 9:21 am

Post by BorisL »

Hmm, show your cron line and the poller stats from the Cacti log.
elfada
Posts: 15
Joined: Wed Jul 01, 2009 8:25 am

Post by elfada »

Here is my /etc/crontab file:

Code: Select all

*/5 * * * * cactiuser /usr/bin/php /var/www/html/poller.php > /dev/null 2>&1
Some stats:

Code: Select all

04/29/2010 08:56:11 AM - SYSTEM STATS: Time:69.8146 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:51:09 AM - SYSTEM STATS: Time:67.7871 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:46:09 AM - SYSTEM STATS: Time:67.7201 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:41:08 AM - SYSTEM STATS: Time:67.6948 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:36:08 AM - SYSTEM STATS: Time:67.6951 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:31:07 AM - SYSTEM STATS: Time:65.7960 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:26:11 AM - SYSTEM STATS: Time:70.6108 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:21:39 AM - SYSTEM STATS: Time:97.0071 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:16:25 AM - SYSTEM STATS: Time:83.6233 Method:spine Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 DataSources:78972 RRDsProcessed:0
04/29/2010 08:15:44 AM - SYSTEM BOOST STATS: Time:276.1600 RRDUpdates:933798 
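
For reading these lines in bulk, here is a small Python sketch that splits a SYSTEM STATS line into its key:value pairs and compares the poller run time with the 300-second interval implied by the */5 cron schedule above. The helper name parse_stats and the constant POLL_INTERVAL are illustrative, not part of Cacti.

```python
def parse_stats(line):
    """Split a 'SYSTEM STATS:' log line into a dict of its key:value pairs."""
    _, _, payload = line.partition("SYSTEM STATS: ")
    stats = {}
    for pair in payload.split():
        key, _, value = pair.partition(":")
        stats[key] = value
    return stats


# Slowest run from the stats quoted above.
line = (
    "04/29/2010 08:21:39 AM - SYSTEM STATS: Time:97.0071 Method:spine "
    "Processes:1 Threads:12 Hosts:1328 HostsPerProcess:1328 "
    "DataSources:78972 RRDsProcessed:0"
)

POLL_INTERVAL = 300  # seconds, assumed from the */5 cron entry

stats = parse_stats(line)
headroom = POLL_INTERVAL - float(stats["Time"])
print("run time: %s s, headroom: %.1f s" % (stats["Time"], headroom))
```

Even the slowest run quoted here finishes well inside the polling interval, so poller overrun alone does not look like the cause of the gaps. RRDsProcessed:0 is presumably a side effect of Boost deferring the RRD writes, which the SYSTEM BOOST STATS line reports separately.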
BorisL
Cacti User
Posts: 131
Joined: Sat Mar 31, 2007 9:21 am

Post by BorisL »

Code: Select all

SELECT * FROM poller_output_boost WHERE local_data_id=4527 AND time<=FROM_UNIXTIME(1272458412+800) AND time>=FROM_UNIXTIME(1272458412-800);
Alternatively, you may use another data source or timestamp, but then I need the warnings from rrdtool with the corresponding data.
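
To see which wall-clock window the query above covers, here is a short Python sketch that converts the epoch from the error message and applies the same 800-second margin on each side. The function name query_window is hypothetical. The result is shown in UTC, whereas MySQL's FROM_UNIXTIME returns values in the session time zone, so the rows you get back will be offset accordingly.

```python
from datetime import datetime, timedelta, timezone


def query_window(epoch, slack=800):
    """Return (start, end) of the window epoch +/- slack seconds, in UTC."""
    center = datetime.fromtimestamp(epoch, tz=timezone.utc)
    margin = timedelta(seconds=slack)
    return center - margin, center + margin


# Timestamp taken from the BOOST_SERVER errors for 4527.rrd.
start, end = query_window(1272458412)
print("from", start.isoformat())
print("to  ", end.isoformat())
```

With a 5-minute polling interval, a window of 800 seconds on each side should cover a few polls before and after the sample that rrdtool complains about.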