Okay.
The RRAs reside only in the local filesystem, and they are:
11G /var/www/cacti7/rra
49701 Files
So I can forget about placing them on a tmpfs.
But the DB is only
510M /var/lib/mysql/cacti8/
I guess I could put it completely into memory after installing the extra 4 GB.
I have another idea for enhancing DB performance.
What about rotating buffer tables, as Oracle-based SAP ERP systems use to increase performance? This would reduce collisions between INSERTs, SELECTs, and UPDATEs on the table. We would need one or two additional boost tables and a place to store which buffer is currently in use.
And I guess there is no need to put 'U' (unknown) values into the boost_buffer_table.
(This is about 10%-12% of my data.)
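A minimal sketch of this rotating-buffer idea, with sqlite3 standing in for MySQL so it runs self-contained; the table names boost_buffer_a/boost_buffer_b and the settings key active_boost_buffer are invented for illustration, they are not real Cacti tables:
[code]
import sqlite3

db = sqlite3.connect(":memory:")
for t in ("boost_buffer_a", "boost_buffer_b"):
    db.execute(f"CREATE TABLE {t} (local_data_id INTEGER, rrd_name TEXT, time TEXT, output TEXT)")
db.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO settings VALUES ('active_boost_buffer', 'boost_buffer_a')")

def active_buffer():
    return db.execute("SELECT value FROM settings WHERE name = 'active_boost_buffer'").fetchone()[0]

def poller_insert(rows):
    # Writers only touch the active buffer; 'U' (unknown) values are skipped.
    rows = [r for r in rows if r[3] != "U"]
    db.executemany(f"INSERT INTO {active_buffer()} VALUES (?, ?, ?, ?)", rows)
    db.commit()

def rotate_and_drain():
    # The flusher flips the active buffer first, then reads and empties the
    # idle one, so INSERTs and the big SELECT never collide on the same table.
    old = active_buffer()
    new = "boost_buffer_b" if old == "boost_buffer_a" else "boost_buffer_a"
    db.execute("UPDATE settings SET value = ? WHERE name = 'active_boost_buffer'", (new,))
    db.commit()
    rows = db.execute(f"SELECT * FROM {old}").fetchall()
    db.execute(f"DELETE FROM {old}")
    db.commit()
    return rows  # hand these to the RRD update stage

poller_insert([(12, "proc", "2009-09-25 12:00:11", "153"),
               (11, "users", "2009-09-25 12:00:11", "U")])
print(rotate_and_drain())  # -> only the 'proc' row
[/code]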
Would you count this error as normal behavior?
09/25/2009 12:00:12 PM - CMDPHP: Poller[0] ERROR: A DB Exec Failed!, Error:'1114', SQL:"INSERT INTO poller_output_boost (local_data_id, rrd_name, time, output) VALUES ('12','proc','2009-09-25 12:00:11','153'), ('11','users','2009-09-25 12:00:11','4'), ('10','','2009-09-25 12:00:11','1min:3.94 5min:4.38 10min:4.93'), ('9','mem_swap','2009-09-25 12:00:11','2639988'), ('8','mem_buffers','2009-09-25 12:00:11','535064') ON DUPLICATE KEY UPDATE output=VALUES(output)'
No, that is a bad thing. Bottom line, don't use InnoDB for either poller_output or poller_output_boost. The reason for this is that InnoDB grows eternally on the filesystem; without an optimize, it will not shrink. It's a nasty thing, InnoDB. Yes, it's good, but you have to balance its goodness against its badness, and for the poller output tables, it's badness.
Memory is the only way to go. MEMORY tables are fixed in size, use linked lists, and do not cause the same issues. They are very efficient, don't have all the overhead of InnoDB, and are faster than MyISAM. Since they run at memory speed, the table-level locking is not a problem.
Also, since you only have ~0.5 GB, you should not be having these problems. If you are running RRDtool 1.3.x, try going back to RRDtool 1.2.27-30; you might be pleasantly surprised. Just as an example, I have one customer with >100k RRD files and 180 GB of storage, and their boost updates all RRD files in about 15 minutes once every four hours.
I have a few hours this morning if you want to do a GotoMeeting...
TheWitness
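Since MySQL error 1114 means the table is full, and MEMORY tables are capped by max_heap_table_size, here is a rough back-of-the-envelope estimate of how large a boost buffer would have to grow between flushes. The per-row byte count and the "one boost row per RRA per poll" figure are my own guesses, not anything measured in this thread:
[code]
# Rough, illustrative sizing only: row size and row count are assumptions.
data_sources  = 49701        # assumed: roughly one boost row per RRA per poll
poll_interval = 300          # seconds
boost_period  = 4 * 3600     # flush every four hours, as in the example above
row_bytes     = 120          # guessed: ids + name + timestamp + output + MEMORY overhead

rows = data_sources * boost_period // poll_interval
print(f"{rows:,} rows, roughly {rows * row_bytes / 2**20:.0f} MiB "
      "to allow for in max_heap_table_size")
[/code]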
My standard SNMP poller design:
Storing poller data:
+-------------+
| host-id     |
| oids*       |
| interval    |
+-------------+
*To distinguish MIB table objects from single-instance MIB objects, I usually put a '.' at the end of a table OID. So the OID .1.3.6.1.2.2.1. indicates a MIB table object, which we can walk/bulkwalk/get-table, while .1.3.6.1.2.2.1.1 indicates a single MIB object we can simply snmp-get.
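A tiny sketch of that trailing-dot convention; the helper name and the example OIDs are mine, and the net-snmp command lines are only built, not executed, so the snippet runs without a reachable agent:
[code]
def snmp_command(host, oid, community="public"):
    # A trailing dot marks a MIB table: walk it. Otherwise do a plain get.
    if oid.endswith("."):
        return ["snmpbulkwalk", "-v2c", "-c", community, host, oid.rstrip(".")]
    return ["snmpget", "-v2c", "-c", community, host, oid]

print(snmp_command("router1", ".1.3.6.1.2.1.2.2.1."))   # table  -> bulkwalk
print(snmp_command("router1", ".1.3.6.1.2.1.1.3.0"))    # scalar -> get
[/code]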
To reduce the amount of data read from the DB and passed to the script, I sometimes use templates, which are common sets of OIDs. These templates are stored in an additional table. Since this template table does not change very often and is rather small, it can be cached inside the poller engine to reduce DB load.
+----------------+
| task-id        |
| host-id        |        +-------------+
| template-id    |------->| template-id |
| interval*      |        | oids        |
| last success*1 |        +-------------+
+----------------+
*We do not need to save a last or next poll time in the database; this can be handled by the master process. The master process only needs to know the interval at which data should be gathered from a host.
The master process itself creates a dynamic schedule for polling. It uses the number of hosts and OIDs to create "tasks". Each task contains the information "time to poll", "node to poll", and "OID to poll".
*1 If we want to be able to determine the time of the last successful poll, which is not necessary in every situation, we can add a "last successful poll" value to the database. I prefer to do this in the master process and output statistics (like unreachable hosts, backlogs, etc.) only on request, to reduce DB load.
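As a compact illustration of the two tables and the in-memory template cache described above; the field names follow the diagram, everything else (including the sample OIDs) is made up:
[code]
from dataclasses import dataclass

@dataclass
class Template:
    template_id: int
    oids: list            # a common set of OIDs shared by many hosts

@dataclass
class Task:
    task_id: int
    host_id: str
    template_id: int
    interval: int         # seconds; last/next poll time lives in the master process

# Small and rarely changing, so the template table is read once and cached
# inside the poller engine instead of being joined on every cycle.
template_cache = {1: Template(1, [".1.3.6.1.2.1.2.2.1.", ".1.3.6.1.2.1.1.3.0"])}

def oids_for(task):
    return template_cache[task.template_id].oids

print(oids_for(Task(42, "router1", 1, 300)))
[/code]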
The master process is a daemon which continuously checks whether a task is scheduled for now (+n seconds for fine-tuning), or is older than 'now', which means we have a backlog.
The master process spreads the tasks gently over the n seconds of an interval to prevent high system and network load. For example, with a typical 300-second interval and 30 hosts with 10 OIDs each, we could create 300 tasks, assigning every task to one accurately defined second. Knowing from a baseline that this kind of operation takes an average of n milliseconds, the master process dynamically starts as many poller threads as needed to fulfil the requested actions within the interval.
(To fine-tune this process I usually declare 10-20 seconds of an interval as spare time.) When a thread returns a host/OID/value data structure to the master process, the master process adds a timestamp and calculates the moment for the next poll (now + interval seconds).
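A stripped-down sketch of that scheduling loop, under my own simplifications: a heap of (due-time, task) pairs, a fixed worker pool instead of a truly dynamic thread count, and a stub poll() in place of real SNMP:
[code]
import heapq, time
from concurrent.futures import ThreadPoolExecutor

def poll(host, oid):
    return host, oid, "0"            # stand-in for the real SNMP get/walk

def run(tasks, interval=300, spare=20, workers=4, cycles=1):
    # Spread tasks gently over (interval - spare) seconds, re-queue each
    # result at now + interval; 'cycles' just keeps the demo finite.
    usable = interval - spare
    queue = [(time.time() + i * usable / len(tasks), t)
             for i, t in enumerate(tasks)]
    heapq.heapify(queue)
    remaining = cycles * len(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            due, (host, oid) = heapq.heappop(queue)
            if due > time.time():
                time.sleep(due - time.time())        # on schedule
            # due earlier than now would mean a backlog the daemon should report
            host, oid, value = pool.submit(poll, host, oid).result()
            print(round(time.time()), host, oid, value)
            heapq.heappush(queue, (due + interval, (host, oid)))   # next poll
            remaining -= 1

# Tiny demo: 4 tasks spread over a 4-second "interval"
run([("r1", "ifInOctets"), ("r1", "ifOutOctets"),
     ("r2", "ifInOctets"), ("r2", "ifOutOctets")], interval=4, spare=1)
[/code]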
(Usually I do not have to deal with databases myself; I store the output in FIFO files, also known as pipes, from which a DB loader/importer script reads them into the fault and performance database. These pipes work as small, high-performance buffers between the poller and the database.)
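The FIFO hand-off can be as small as this sketch; the path and the "HOST OID VALUE" line format come from the description above, the rest is assumption (and os.mkfifo makes it POSIX-only):
[code]
import os

FIFO = "/tmp/poller.fifo"            # hypothetical path
if not os.path.exists(FIFO):
    os.mkfifo(FIFO)

# Poller side: append one "HOST OID VALUE" line per result.
def write_result(host, oid, value):
    with open(FIFO, "w") as pipe:    # blocks until the importer is reading
        pipe.write(f"{host} {oid} {value}\n")

# Importer side: read lines and load them into the performance DB.
def importer():
    with open(FIFO) as pipe:
        for line in pipe:
            host, oid, value = line.split()
            # INSERT into the fault/performance database would go here
            print("loaded", host, oid, value)
[/code]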
An alternative solution I use is in-memory caching of the output data:
A master process enqueues polling tasks and creates as many threads as necessary (calculated dynamically from the number of OIDs and hosts to work on), which poll the nodes in a non-blocking fashion and return the data, usually as "HOST OID VALUE", to the master process when done.
Since DB writes and printing are expensive but not time-sensitive operations, they are performed partially (or, if possible, completely) when the master process has nothing else to do (the queue is empty and no threads are waiting to return).
In this setup the master process also handles problematic hosts. Unreachable hosts are marked as unreachable and not polled for a given number of intervals. (Polling unreachable hosts creates much more load than polling reachable hosts.) In some setups a ping test is performed before removing the unreachable mark from a host.
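The unreachable-host bookkeeping might look roughly like this; the skip count and the ping() stub are placeholders for whatever policy fits:
[code]
SKIP_INTERVALS = 5                   # assumed penalty, tune to taste
unreachable = {}                     # host -> intervals still to skip

def ping(host):
    return True                      # stand-in for a real ICMP/SNMP reachability test

def should_poll(host):
    # Skip hosts marked unreachable; after the penalty, require a ping before
    # polling them again (timeouts cost far more than a normal poll).
    left = unreachable.get(host, 0)
    if left > 0:
        unreachable[host] = left - 1
        return False
    if host in unreachable:          # penalty served: ping before re-enabling
        if not ping(host):
            unreachable[host] = SKIP_INTERVALS
            return False
        del unreachable[host]
    return True

def mark_unreachable(host):
    unreachable[host] = SKIP_INTERVALS

mark_unreachable("router9")
print([should_poll("router9") for _ in range(7)])   # five skips, then polled again
[/code]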
I have also written pollers which do delta calculations (traffic, etc.) on the fly. For this I cached the counter values of the last poll in memory, per node and per interface. (The poller is usually so accurate that you do not need to interpolate the data like RRD does. You can simply subtract the old value from the new value to create the delta, but you have to implement a continuity check to notice overflows, resets, and possible reindexes.)
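And a hedged sketch of the on-the-fly delta calculation with a simple continuity check; treating any negative delta as a single 32-bit wrap, and dropping the sample when the implied rate is implausible, are my own choices:
[code]
COUNTER32_MAX = 2**32

last = {}     # (host, ifindex) -> (timestamp, counter)

def delta(host, ifindex, now, counter):
    # Return the counter delta since the last poll, or None on the first
    # sample or when continuity looks broken (reset/reindex).
    key = (host, ifindex)
    prev = last.get(key)
    last[key] = (now, counter)
    if prev is None:
        return None
    prev_ts, prev_val = prev
    d = counter - prev_val
    if d < 0:
        d += COUNTER32_MAX                     # assume one 32-bit wrap
        if d / (now - prev_ts) > 1e9:          # crude plausibility threshold (assumption)
            return None                        # more likely a reset/reindex than a wrap
    return d

print(delta("r1", 2, 0,   4294967290))   # first sample -> None
print(delta("r1", 2, 300, 10))           # wrapped counter -> 16
[/code]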