Error Updating RDD using Boost please see.

Post by **Deviloper** » Wed Sep 16, 2009 7:15 am

I have been getting this error since I started using Boost: "illegal attempt to update using time 1253102136 when last update time is 1253102136 (minimum one second step)"
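For readers hitting the same message: RRDtool refuses an update whose timestamp is not at least one second later than the file's last update, so the error points to the same timestamp being submitted twice. A minimal sketch of filtering such rows before they reach an update call follows. The function name and the (timestamp, value) tuple layout are assumptions for illustration; this is not Boost's actual code.

```python
def drop_non_increasing(updates, last_update):
    """Keep only (timestamp, value) pairs that are strictly newer than the
    previously accepted timestamp. Hypothetical helper, not part of Boost."""
    accepted = []
    # Sort by timestamp so out-of-order rows are handled as well.
    for ts, value in sorted(updates, key=lambda u: u[0]):
        if ts > last_update:
            accepted.append((ts, value))
            last_update = ts
    return accepted


# A row carrying the same timestamp as the last update is skipped.
pending = [(1253102136, "5"), (1253102137, "6")]
print(drop_non_increasing(pending, last_update=1253102136))
```

Silently dropping a duplicate hides whatever produced it, so in practice it would be worth logging the skipped rows as well.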

This is strange.

Also, somebody should mention that Boost and Threshold don't work together.

Post by **gandalf** » Tue Sep 22, 2009 4:09 pm

Please post this to the BOOST thread, and post your Boost settings as well. I believe I solved that on our installation a few months ago.
Reinhard

Post by **Deviloper** » Wed Sep 23, 2009 8:22 am

Could you hand me a URL?

I noticed that Boost does not honor my table limit of 500,000. Boost just ignores it, so the table grows very large and cannot be flushed to file.

(As a workaround, I placed boost_poller.php -f into my crontab to run every 7 minutes.)

I also noticed, and please correct me if I am wrong, that poller_boost.php slurps x lines from the DB at once, uses them to update the RRDs, and then deletes these entries from the DB.

This is the perfect setup for deadly race conditions. It would be totally clean if you moved the "in progress" rows to a table (or any persistent medium), for example boost_output_update_in_progress, indexed by the PID of the running boost_update.rrd. If the process dies unexpectedly, this data can easily be recovered, and it also allows multiple boost_updates.php to run simultaneously without the feared race conditions.

Post by **X-dark** » Wed Sep 23, 2009 10:32 am

Deviloper wrote: Could you hand me a URL?

http://forums.cacti.net/viewtopic.php?t=31278

Post by **TheWitness** » Wed Sep 23, 2009 8:04 pm

Deviloper wrote: Could you hand me a URL?

I noticed that Boost does not honor my table limit of 500,000. Boost just ignores it, so the table grows very large and cannot be flushed to file.

(As a workaround, I placed boost_poller.php -f into my crontab to run every 7 minutes.)

I also noticed, and please correct me if I am wrong, that poller_boost.php slurps x lines from the DB at once, uses them to update the RRDs, and then deletes these entries from the DB.

This is the perfect setup for deadly race conditions. It would be totally clean if you moved the "in progress" rows to a table (or any persistent medium), for example boost_output_update_in_progress, indexed by the PID of the running boost_update.rrd. If the process dies unexpectedly, this data can easily be recovered, and it also allows multiple boost_updates.php to run simultaneously without the feared race conditions.

I think that you don't fully understand how Boost works; your comment implies as much. You have to "tune" Boost to work properly. It makes absolutely no sense to update every 7 minutes. You should be updating every 2-4 hours, with between 2M and 9M records in the Boost table.

In addition, the poller_output and poller_output_boost tables should both be MEMORY tables. However, keep in mind that if your max_heap_table_size is not large enough, your system will die. This is one important aspect of the word "tune". The second is that you should reduce the size of the "output" column in both tables so as to maximize the number of rows you can fit in system memory.

There are a number of Cacti users with over 400k Data Sources that have no problems with Boost updating almost 10M records during the update. So, education is essential to solving your problems.

TheWitness

Post by **Deviloper** » Thu Sep 24, 2009 3:28 am

Hi,

I do understand how you think that Boost should work, but what you have set up here is nothing but a gigantic recipe for race conditions between a buffer table, an enqueuer (the poller) and a dequeuer (the process that updates the RRDs).

Also, you do not understand the nature of a) your data sources and b) your data sinks. You and the whole Cacti project do not honor the technical aspects of a network with its extremely high latency and a DB with its cost-intensive operations.

The first major design flaw is trying to poll as fast as possible instead of giving each network operation a well-defined timestamp and a budget of cost (time) it may use. This is a waste of time, produces unnecessarily high load in the network and on the server, and wastes large amounts of memory and I/O. You even have the DB infrastructure (last_update), but you simply did not manage to make proper use of it.

Secondly, you (the Cacti project) do not understand how memory management in modern operating systems works. You are wasting so much I/O by loading and unloading code from HDD to memory and writing from memory to HDD that I am getting sick of it. Instead of daemonizing processes, which keeps code in memory and speeds up operations by avoiding interpreter and program startup times, you start and stop hundreds of processes that do not share information with each other; instead, they all gather the same information from the DB. There is only one major disadvantage in the use of scripting languages (the startup overhead), and you really managed to make the worst of it.

The third thing is that you have not understood the difference between serial and parallel. You are serializing the read and communication processes while handling high-latency data sources, and parallelizing the (costly) writes to low-latency data sinks. It would be brilliant the other way around.

Looking at a near-real-time CPU and network load graph would open your eyes.

To make it a little more detailed for you:
In my setup I parallelized the process of WRITING to the files on the hard drive (the RRDs) with the process of WAITING on network device responses. Get it? Can you feel now how clever this is?
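The overlap described here, waiting on devices while a separate writer handles the files, could look roughly like the sketch below. It is an illustration only: the poster's pollers are written in Perl and are not shown in this thread, the sleep merely stands in for network latency, and appending to a list stands in for an RRD update. All names are hypothetical.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def poll_device(device):
    # Stand-in for a network request; the sleep simulates device latency.
    time.sleep(0.05)
    return device, time.time()


def writer(results, written):
    # Single consumer: takes finished polls off the queue and "writes" them.
    while True:
        item = results.get()
        if item is None:  # sentinel: no more results will arrive
            break
        written.append(item)  # stand-in for updating the RRD file


def collect(devices, workers=8):
    results = queue.Queue()
    written = []
    writer_thread = threading.Thread(target=writer, args=(results, written))
    writer_thread.start()
    # Many polls wait on the network at once while the writer keeps working.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(poll_device, devices):
            results.put(result)
    results.put(None)
    writer_thread.join()
    return written
```

With 8 workers, 16 simulated devices finish in roughly two latency periods instead of sixteen, and the writer never has to wait for the whole polling run to end.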

To make it short: I am making $8000 a month. Pay me for two months and I will turn this Frankenstein's monster of script programming into the Sharon Stone of script programming.

BTW: It was my job to design and program data collectors for over 8 years.

>There are a number of Cacti users with over 400k Data Sources that
>have no problems with boost updating almost 10m records during the
>update. So, education is essential to solving your problems.

I am using only a cheap quad-core system with 4 GB of RAM and a single HDD: a system for less than $500 to collect 240,000 data sources.
I see descriptions in this forum of setups costing more than $10,000 that have performance problems. Surely I could spread the system across a dedicated web server, a dedicated DB server and a dedicated polling station, but when the system is wasting much of its time waiting, there is no need to spend money, and there is plenty of room for optimization.

The MEMORY table can only be used with the latest MySQL, there is no really stable release of it for Debian, and instability and memory tables do not mix very well.

With MyISAM tables, the table should also be kept in memory as long as it is not larger than the configured table size. I am using 7 minutes because it is a low number with a high least common multiple with 5, so the 5-minute polling and the 7-minute RRD updating collide only every 35 minutes. If I had more memory, I could use a larger heap size and go to 11-14 minutes. But this is not the case at the moment.
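The interval arithmetic can be checked quickly. The sketch below assumes both jobs start together at minute 0 and then run at strictly fixed intervals; the helper names are made up for illustration.

```python
from math import gcd


def lcm(a, b):
    # Least common multiple of two positive integers.
    return a * b // gcd(a, b)


def coinciding_minutes(poll_every, flush_every, window):
    """Minutes within `window` at which both jobs start at the same time."""
    return [m for m in range(window)
            if m % poll_every == 0 and m % flush_every == 0]


print(lcm(5, 7))                        # 35
print(coinciding_minutes(5, 7, 120))    # [0, 35, 70, 105]
```

Note that a crontab entry written as */7 restarts its count at minute 0 of every hour, so under cron the two jobs would coincide at :00 and :35 each hour.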

The side effect is: I have recently updated RRDs without wait time.
(In the 7 minutes, the Boost buffer table grows to about 500,000 entries.)

Post by **TheWitness** » Thu Sep 24, 2009 6:49 am

OK, let's square off then. Let's talk about the points one at a time, shall we:

You must keep one thing in mind first. This is "Open Source", and a Hobby for most. Although, just about everyone here uses Cacti for work purposes. Cacti has made careers for some.

Now, let's take the points one at a time.

1) Cacti processes are not daemonized - This is not a big deal IMHO. The overhead of loading information into Spine's memory does not cause significant load. Granted, we could load the poller items table once and then simply keep a memory image of what to poll and when. However, the reason that we did it this way was simple: we cannot correct for problems in dependent packages, which sometimes occur. So, by using cron, we keep ourselves isolated from these issues. In addition, we attempt to support Windows, which adds complexity that we don't have too much time for.

2) Parallel vs. Serial - No one uses cmd.php in large deployments. The serial nature of cmd.php can be partially rectified by using multiple processes. The cmd.php script was, and still is, intended for beginners, to allow them to "ramp up". Spine, on the other hand, is a highly parallelized process that is designed for 10k hosts per minute on an 8-core box. You know the code, do the math. Latency aside, that's the approximate number.

3) Performing Boost updates every 7 minutes on a 240k DS system is "insane" and misses the entire point of Boost. Its purpose is to reduce I/O wait by batch updating RRDfiles as infrequently as possible. 7 minutes is not "infrequent". If you are going to update every 7 minutes, you should not use Boost, but maybe something like an SSD or a RAM disk.

4) Separating polling from RRD updates - Yes, this is "exactly what Boost does". Sure, neither Boost nor Spine is a daemon. Again, this is by design. The race condition that you speak of between the "enqueuer" and the "dequeuer" is by your own hand, in running Boost so frequently. You are not using Boost the way it was designed to be used. Until you understand that, you will be bashing your head against a wall!!

5) Not understanding the difference between data sources and data sinks - Uhm, give me a break. I am an engineer and understand this all too well. I have multiple (unpublished) plugins that use N-Tier collection asynchronously so that the data to be graphed is always in the database and not 400ms away. Even in cases where I cannot avoid the 400ms, I have designed in MAX_OIDS so that you can gather as many OIDs as possible in one UDP frame.

6) I tend to agree with you regarding costs and high load during polling (due to fetching everything as fast as possible). However, this can be curtailed using settings. If you use the N-Tier approach and separate the database from the web server, and even the web server from the poller, which are all possible even using the old 0.8.7x, you can keep the load generated by the poller from affecting the overall performance of the system.

7) 500k rows in the Boost table is nothing. Larger customers than you who use Boost keep several million rows in the table. Since they all use SNMP for data collection, they keep the size of the output columns in both poller_output and poller_output_boost to approximately varchar(50). This allows tens of millions of rows to be kept in memory if your max_heap_table_size is approximately 2GB. It's true that the table never completely empties, but that is not an issue except for the paranoid.
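To put rough numbers on point 7: MySQL's MEMORY engine stores rows in a fixed-length format, so a VARCHAR column takes up its declared width in every row. The back-of-the-envelope sketch below assumes a single-byte character set; the 40 bytes for the remaining columns and the 16 bytes of per-row overhead are guesses for illustration, not measured values from a real Cacti schema.

```python
def estimated_row_bytes(output_width, other_columns_bytes=40, per_row_overhead=16):
    # Assumed sizes: only output_width is taken from the discussion above.
    return output_width + other_columns_bytes + per_row_overhead


def rows_that_fit(heap_bytes, output_width):
    """Approximate row capacity of a MEMORY table limited to heap_bytes."""
    return heap_bytes // estimated_row_bytes(output_width)


two_gb = 2 * 1024 ** 3
print(rows_that_fit(two_gb, 50))    # narrow output column
print(rows_that_fit(two_gb, 512))   # a much wider, hypothetical output column
```

Under these assumptions a 50-character output column fits on the order of twenty million rows into 2GB, while a column ten times wider fits only a few million, which is why shrinking the column matters.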

In summary, apart from your clear misunderstanding of the way that Boost works, and your apparent refusal to accept what I told you and change its (your) behavior, I tend to agree that in an ideal world we would daemonize both Boost and the Poller, and build in some remote agent infrastructure to allow Cacti to be N-Tier. However, go all the way back to my first, unnumbered point: this is, after all, open source. And for most, it works just fine.

TheWitness

ps. I expect someone to tell us to go get a room soon

Post by **TheWitness** » Thu Sep 24, 2009 6:53 am

Oh, one more thing. Sorry you are having a bad day

Post by **Deviloper** » Thu Sep 24, 2009 9:56 am

If you give me a chance, I am willing to improve things, even for free. But if you tell me I don't know what I am doing, I get angry, and it is the same with you. That's normal.

I run into the same race conditions whether Boost runs every 7 minutes or every 2 hours, but dumping a table with 7 minutes' worth of load to /dev/null is much less painful than dumping a table with 2 hours of data.

Some day I will hit the 8 GRows table size, and that's it.

BTW:
Do you really mean this, poller_boost.php line 124:

$rows = db_fetch_row("SELECT * FROM poller_output_boost WHERE time<='$current_time' LIMIT 1");

Currently I am at this performance level:
09/24/2009 04:58:48 PM - SYSTEM BOOST STATS: Time:1242.3508 RRDUpdates:1202001
and after switching to MEMORY
09/25/2009 12:53:50 PM - SYSTEM BOOST STATS: Time:1146.2892 RRDUpdates:657077
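For comparing the two runs, the stats lines can be turned into a throughput figure. The sketch assumes that the Time field is a duration in seconds, which the log itself does not state; the function name is hypothetical.

```python
import re

# Matches the Time and RRDUpdates fields of a SYSTEM BOOST STATS line.
STATS = re.compile(r"Time:(?P<time>[\d.]+) RRDUpdates:(?P<updates>\d+)")


def updates_per_second(line):
    """RRD updates per second, assuming Time is given in seconds."""
    match = STATS.search(line)
    if match is None:
        raise ValueError("not a boost stats line")
    return int(match["updates"]) / float(match["time"])


myisam = "09/24/2009 04:58:48 PM - SYSTEM BOOST STATS: Time:1242.3508 RRDUpdates:1202001"
memory = "09/25/2009 12:53:50 PM - SYSTEM BOOST STATS: Time:1146.2892 RRDUpdates:657077"
print(round(updates_per_second(myisam)), round(updates_per_second(memory)))
```

Read this way, the first run works out to just under a thousand updates per second and the second to somewhat under six hundred, although the two runs processed very different row counts and are not directly comparable.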

I would say that, to increase Boost performance, we could easily use a value of up to 100 rows at a time without a huge impact on the system, because of the DB and OS caches.

Post by **TheWitness** » Thu Sep 24, 2009 10:01 am

No problem, we were bashing each other. So, as you stated, this response is normal.

That specific query is simply to see if you are done yet. You don't need more than a limit of 1 as you are just trying to get a row count, nothing more. If it comes back 0 rows, then you are done.
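The distinction between the "are we done yet" check and the batch size can be sketched like this. It uses Python with an in-memory SQLite table as a stand-in for MySQL, and the drain loop with its batch_size parameter is a hypothetical illustration, not the logic of poller_boost.php.

```python
import sqlite3


def rows_pending(conn, current_time):
    """Existence check only: LIMIT 1 is enough to know whether work remains."""
    row = conn.execute(
        "SELECT 1 FROM poller_output_boost WHERE time <= ? LIMIT 1",
        (current_time,)).fetchone()
    return row is not None


def drain(conn, current_time, batch_size=100):
    """Process pending rows in batches until the existence check fails."""
    processed = 0
    while rows_pending(conn, current_time):
        batch = conn.execute(
            "SELECT rowid, local_data_id, time, output "
            "FROM poller_output_boost WHERE time <= ? LIMIT ?",
            (current_time, batch_size)).fetchall()
        # The RRD update for each row in the batch would happen here.
        conn.executemany(
            "DELETE FROM poller_output_boost WHERE rowid = ?",
            [(row[0],) for row in batch])
        conn.commit()
        processed += len(batch)
    return processed
```

The check stays cheap regardless of table size, while the amount of work per round trip is controlled separately through batch_size.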

Help is also welcome. Between you, BorisL and myself, I think we can do all sorts of damage. So, I'm all for it.

TheWitness

Post by **Deviloper** » Thu Sep 24, 2009 10:05 am

Tomorrow I will get 4 GB extra ram...

Post by **TheWitness** » Thu Sep 24, 2009 10:07 am

That will help fix your problem. Most of my big systems have 32GB and the database is on another box. That leaves lots of room to give Boost some breathing room (memory-wise). We can talk later. The three of us should meet via Skype at some time.

TheWitness

Post by **gandalf** » Thu Sep 24, 2009 1:00 pm

TheWitness wrote: That will help fix your problem. Most of my big systems have 32GB and the database is on another box. That leaves lots of room to give Boost some breathing room (memory-wise). We can talk later. The three of us should meet via Skype at some time.

TheWitness

If a moderator is needed, I will volunteer. But my Russian is bad. Nasdrovje

Reinhard

Post by **Deviloper** » Thu Sep 24, 2009 1:28 pm

I could hand you some of my pollers, but they are written in spaghetti Perl to get maximum performance (1 scope, no I/O).

But which would you rather place on a tmpfs partition to enhance performance: the RRDs or the Cacti database? (I guess the RRDs should be synced to a persistent filesystem at least once a day.)

My problem still is that the boost_output table is growing a lot faster, by 1,000,000 per hour, than boost_update can move it to the RRDs.
I tried activating low-priority updates to increase SELECT performance; this slows the growth rate but does not stop it.

Has anybody tried concurrent updates type 2, which leaves empty deleted rows in the table?

Post by **TheWitness** » Thu Sep 24, 2009 7:57 pm

Can you run the following two commands:

Code: Select all

du -hs /var/www/html/cacti/rra
ls -1 | wc -l


Also, don't use MyISAM; it won't be able to keep up. You will find MEMORY 10x faster. Also, where are the RRDfiles? Are they on NFS? That's a bad move. GFS is not good either.

TheWitness
