Processes blocked.

Post support questions that directly relate to Linux/Unix operating systems.

magnetyk
Posts: 34
Joined: Tue May 10, 2005 10:04 am


Post by magnetyk »

Hi,

When the poller hits its timeout, the processes belonging to that poll are not killed, and ps shows:


root     11168  0.0  0.0  2056  936 ?        S    12:50   0:00 /USR/SBIN/CRON
www-data 11169  0.0  0.0     0    0 ?        Zs   12:50   0:00 [sh] <defunct>
www-data 11180  0.0  0.8 18412 9136 ?        S    12:50   0:00 /usr/bin/php -q /usr/share/cacti/cmd.php 0 30
www-data 11190  0.0  0.7 16720 7272 ?        S    12:50   0:00 /usr/bin/php /usr/share/cacti/script_server.php cmd
Debian-  11236  0.0  0.1  4456 1600 ?        S    12:50   0:00 /usr/sbin/sendmail -i -FCronDaemon -oem www-data
www-data 11626  0.0  0.1  4404 1956 ?        S    12:50   0:00 /usr/bin/snmpget -O fntUev -c        -v 1 -t 1 -r 2...

I don't know how to kill these processes automatically when a poll fails. Within a few days more than 250 of them pile up, and I have to reboot the computer :(
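A first step is telling the leftovers apart. Below is a minimal Python sketch (Python is used only for illustration; the Cacti scripts themselves are PHP) that sorts `ps aux` lines like the listing above into zombies and poller children. The name `find_stuck` and the marker list are assumptions, not part of Cacti, and the sketch only looks at names and the STAT column, so it cannot tell a stuck child from one that belongs to the poll currently running. Note that a zombie such as `[sh] <defunct>` cannot be killed at all: it disappears only when its parent reaps it (`ps -o pid,ppid,stat,cmd` shows which process that parent is).

```python
import subprocess

# Hypothetical marker list: command-line fragments of Cacti poller children.
POLLER_MARKERS = ("cmd.php", "script_server.php", "snmpget")


def find_stuck(ps_lines):
    """Split `ps aux` lines into (zombie_pids, poller_child_pids)."""
    zombies, children = [], []
    for line in ps_lines:
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or truncated line
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("Z"):
            zombies.append(pid)  # cannot be killed; only its parent can reap it
        elif any(marker in command for marker in POLLER_MARKERS):
            children.append(pid)
    return zombies, children


if __name__ == "__main__":
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    zombies, children = find_stuck(out.splitlines())
    print("zombies:", zombies)
    print("poller children:", children)
```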

Excuse my bad English, I'm French ;)

Thx for helping me!
Mikf
Posts: 47
Joined: Fri Aug 13, 2004 11:35 am
Location: Paris, France

Re: Processes blocked.

Post by Mikf »

Hello,
magnetyk wrote: more than 250 process
hmmm, more than 250 processes...
check with ulimit whether your "max user processes" limit is high enough; on some systems the default is 256.

#ulimit -u            (show the current limit)

#ulimit -u <limit>    (set a new limit)

then test with a higher value.
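A `ulimit -u` typed in an interactive shell only affects that shell and the processes it starts, not the poller that cron launches later, so it can help to read the limit from inside a script. A minimal Python sketch using the standard `resource` module (Unix only); `near_limit` and its 90% margin are hypothetical choices for illustration. The limit is counted per user, so all www-data processes share it.

```python
import resource


def max_user_processes():
    """Return (soft, hard) RLIMIT_NPROC, the limit behind `ulimit -u`."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def near_limit(running, soft, margin=0.9):
    """True if `running` processes reach `margin` (hypothetical 90%) of `soft`."""
    if soft == resource.RLIM_INFINITY:
        return False  # no limit configured
    return running >= margin * soft


def describe(limit):
    """Render a limit the way `ulimit` does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = max_user_processes()
    print("max user processes: soft", describe(soft), "/ hard", describe(hard))
    print("250 processes close to the limit?", near_limit(250, soft))
```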

but 250 processes is a very high number!!!
what is your poller configuration? (processes, threads, etc.)

Mikf
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Processes blocked.

Post by gandalf »

magnetyk wrote: When a timeout is reached by the poller, all processes linked to this poll are not killed
Hello,
first I'd like to address the fact that the poller reaches the timeout at all. This should not happen; if it does, please find out why. Are there some "strange scripts" running to "infinity"?
Personally, I had that problem when two pollers were running concurrently (one in /etc/crontab and one in /etc/cron.d/cacti).
Reinhard
magnetyk

Post by magnetyk »

what is your poller configuration ? (processes, threads etc)
I've tested several configurations; the current one is 12 processes for 23 hosts, i.e. 2 hosts per process. A poll takes about 44 seconds. When a poll hits the 296-second limit, 6 processes are left blocked, always the same ones:


root     25940  0.0  0.0  2056  936 ?        S    00:40   0:00 /USR/SBIN/CRON
www-data 25941  0.0  0.0     0    0 ?        Zs   00:40   0:00 [sh] <defunct>
www-data 25950  0.0  0.7 17488 8284 ?        S    00:40   0:00 /usr/bin/php -q /usr/share/cacti/cmd.php 13 14
www-data 25997  0.0  0.7 16720 7272 ?        S    00:40   0:00 /usr/bin/php /usr/share/cacti/script_server.php cmd
Debian-  26040  0.0  0.1  4456 1600 ?        S    00:40   0:00 /usr/sbin/sendmail -i -FCronDaemon -oem www-data
www-data 26476  0.0  0.1  4404 1956 ?        S    00:40   0:00 /usr/bin/snmpget -O fntUev -c        -v 1 -t 1 -r 2 ...

I had that problem when two pollers where running concurrently (one in /etc/crontab and one in /etc/cron.d/cacti)
These are my cron jobs:


#more /etc/crontab

*/5 * * * * www-data php /usr/share/cacti/poller.php 2>&1 && php /usr/share/cacti/thold/check-thold.php > /dev/null 2>&1
*/4 * * * * www-data php /usr/share/cacti/fast_poller.php 2>&1

poller.php for Cacti
check-thold.php for thresholds
fast_poller.php for monitor

There are no other cron jobs.
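To rule out the double-poller case gandalf describes, you can list every active cron entry that runs `poller.php` itself. A minimal Python sketch; `poller_entries` is a hypothetical helper, and the two file paths are simply the ones mentioned in this thread (per-user crontabs, shown by `crontab -l -u www-data`, are not covered). Matching the exact file name matters: a plain substring search for `poller.php` would also hit `fast_poller.php`.

```python
import os
import shlex


def poller_entries(crontab_text, script="poller.php"):
    """Return active crontab lines that run exactly `script`."""
    entries = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        try:
            tokens = shlex.split(line)
        except ValueError:
            tokens = line.split()  # unbalanced quotes: plain whitespace split
        # basename() keeps fast_poller.php from matching poller.php
        if any(os.path.basename(tok) == script for tok in tokens):
            entries.append(line)
    return entries


if __name__ == "__main__":
    found = []
    for path in ("/etc/crontab", "/etc/cron.d/cacti"):  # paths from this thread
        if os.path.exists(path):
            with open(path) as fh:
                found += [(path, entry) for entry in poller_entries(fh.read())]
    for path, entry in found:
        print(path, ":", entry)
    if len(found) > 1:
        print("WARNING: poller.php is scheduled more than once")
```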

There is also another network monitoring tool, Nagios. It checks more than 100 services and runs very well ;)
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

For 23 devices, running 12 processes is doing yourself a disservice. Depending on how PHP is configured, this can cause all sorts of issues; under high multiprocess load, PHP tends to behave badly. Try 2 processes and see if the problem goes away.
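The arithmetic behind the two setups, as a small Python sketch. It assumes the poller splits hosts evenly across processes and rounds up; that assumption matches the figures in this thread (2 hosts per process with 12 processes, and "Hosts/Process: 12" in the statistics line once 2 processes are used). `hosts_per_process` is a hypothetical name. With 12 processes each one handles only about 2 hosts, so most of the work is process start-up rather than polling.

```python
import math


def hosts_per_process(hosts, processes):
    """Hosts per cmd.php process, assuming an even split rounded up."""
    if processes < 1:
        raise ValueError("need at least one process")
    return math.ceil(hosts / processes)


if __name__ == "__main__":
    for procs in (12, 2):
        print(procs, "processes ->", hosts_per_process(23, procs), "hosts each")
```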

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
magnetyk

Post by magnetyk »

Try 2 processes and see if the problem fixes itself.
I tried the proposed configuration last night, but a few polls still failed, like this one:
10/06/2005 08:35:45 AM - : Poller[0] STATISTICS: Poll time: 43.1772 s, Method: cmd.php, Processes: 2, Hosts: 23, Hosts/Process: 12
10/06/2005 08:34:59 AM - POLLER: Poller[0] Maximum time of 296 seconds exceeded. Aborting.
10/06/2005 08:25:44 AM - : Poller[0] STATISTICS: Poll time: 43.0422 s, Method: cmd.php, Processes: 2, Hosts: 23, Hosts/Process: 12
And the processes belonging to these polls are left blocked :(

Always the same ones:


root     32708  0.0  0.0  2056  936 ?        S    08:30   0:00 /USR/SBIN/CRON
www-data 32709  0.0  0.0     0    0 ?        Zs   08:30   0:00 [sh] <defunct>
www-data 32713  0.0  0.8 18396 9104 ?        S    08:30   0:00 /usr/bin/php -q /usr/share/cacti/cmd.php 0 30
www-data 32719  0.0  0.6 16656 7236 ?        S    08:30   0:00 /usr/bin/php /usr/share/cacti/script_server.php cmd
Debian-  32743  0.0  0.1  4444 1456 ?        S    08:30   0:00 /usr/sbin/sendmail -i -FCronDaemon -oem www-data
www-data   590  0.0  0.1  4316 1892 ?        S    08:30   0:00 /usr/bin/snmpget -O fntUev -c        -v 1 -t 1 -r 2 10.2.1.13:1

Are there some "strange scripts" running to "infinity"?
I don't think I have any strange never-ending scripts, because these scripts run fine in 90 to 95 percent of polls... Last night, 4 out of 120 polls failed.
TheWitness
Developer

Post by TheWitness »

There may be a script that crashes or locks up and causes the problem. Cactid handles this better: it can control the script timeout and force the connection closed if the script does not return in time.
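The general idea of a poller-side script timeout can be sketched in Python. This is not Cactid's code (Cactid is written in C) and `run_with_timeout` is a hypothetical name; it only shows the technique: start each script in its own process group, kill the whole group when the time budget runs out, and reap the child so nothing is left behind as `<defunct>`. A grandchild that deliberately starts its own session would escape the group kill.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd; return its exit code, or None if it had to be killed.

    The child starts a new session, so it leads its own process group;
    on timeout the whole group (including grandchildren such as an
    snmpget the script spawned) receives SIGKILL.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
        proc.wait()  # reap the child so it does not stay <defunct>
        return None


if __name__ == "__main__":
    print(run_with_timeout(["sleep", "10"], 2))  # killed after 2 s
```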

PHP has a mechanism for this, but unfortunately it does not work in current PHP builds (I guess it is still experimental). So your scripts must time themselves out.
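On Unix, one common way for a script to time itself out is an alarm signal. A minimal Python sketch, assuming the script runs in its main thread (Python delivers signals only there) and that whole seconds are precise enough; `run_with_self_timeout` is a hypothetical name, and printing `U` (RRDtool's marker for an unknown value) on timeout is an assumption about what the poller should receive.

```python
import signal
import time


class ScriptTimeout(Exception):
    """Raised by the SIGALRM handler when the time budget is used up."""


def _on_alarm(signum, frame):
    raise ScriptTimeout()


def run_with_self_timeout(work, seconds):
    """Call work(); return its result, or None after `seconds` (Unix only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # whole seconds; 0 would disable the alarm
    try:
        return work()
    except ScriptTimeout:
        return None
    finally:
        signal.alarm(0)  # cancel any alarm still pending
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    # Hypothetical probe that hangs for 10 s; the script gives up after 2 s.
    value = run_with_self_timeout(lambda: time.sleep(10) or 42, 2)
    print("U" if value is None else value)
```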

The newest version of Cactid should be out in a few days. It is much more stable than prior releases and should be good for almost everyone. Once you have it in place, look under Settings->Poller for the script timeout value.

magnetyk

Post by magnetyk »

Thanks for the tip ;) I'll wait for the next Cactid version :)
magnetyk

Post by magnetyk »

I switched to the Cactid poller method: a few polls still fail, BUT the processes are now killed properly ;) That's good for me! Thx a lot
magnetyk

Post by magnetyk »

In fact, after a few days of testing, the problem persists...

This error


10/17/2005 06:30:01 AM - CACTID: Poller[0] ERROR: Cactid Timed Out While Processing Host Shutdown

causes the polling processes to block.
TheWitness
Developer

Post by TheWitness »

Please keep "Concurrent Processes" at no more than 2 x the number of CPUs. If you set it too high, you get excessive context switching and thread starvation, which causes processes to lock up.
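The rule of thumb above as a small Python sketch. The function names are hypothetical, and `os.cpu_count()` counts logical CPUs, which may not be exactly what "CPUs" means here (an assumption); the CPU count of the machine in this thread is not known.

```python
import os


def max_concurrent_processes(cpus=None):
    """Upper bound from the rule of thumb in this thread: 2 x CPUs."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() may return None
    return 2 * cpus


def clamp_processes(requested, cpus=None):
    """Cap a requested "Concurrent Processes" value at 2 x CPUs (minimum 1)."""
    return max(1, min(requested, max_concurrent_processes(cpus)))


if __name__ == "__main__":
    print("suggested maximum:", max_concurrent_processes())
    print("12 requested on 2 CPUs ->", clamp_processes(12, cpus=2))
```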
