2 Devices: "Polling timed out while waiting for 4 Threads to End" - Little Help on Poller Statistics?

Jeeves
Cacti User
Posts: 91
Joined: Wed Jun 12, 2013 6:25 pm

Post by Jeeves »

Afternoon,
My history of hiccups and issues aside, my poller times were regularly down around 6 seconds.

I logged in today to check on things and noticed that my poller runs were regularly timing out. However, the devices that seem to be causing the timeouts make little sense. One is a 10-OID device on my LAN that responds to an SNMP test in under 0.2s.

The other is an ICMP ping-only device, with no SNMP in use.

The intropage (which is new, and could be wrong) shows only one device over 1s, let alone the two that are timing out.
The log looks similar. Everything seems to resolve in ~3 seconds, then I get alerts about my LAN device and an ICMP ping device.

Any suggestions where I should be looking to solve this?
cacti-polling time.png
cacti-polling-devices.png
cacti-log.png
macan
Cacti Guru User
Posts: 1137
Joined: Tue Mar 18, 2008 2:30 am
Location: Czech

Post by macan »

DNS problem? The latest intropage (develop branch) has a test for DNS resolving.

It seems that you have 6 spine processes. It is better to use fewer processes and add threads (Console -> Data Collection -> Data Collectors).
Let the Cacti grow!

Post by Jeeves »

macan wrote: Wed Oct 25, 2023 1:14 am DNS problem? The latest intropage (develop branch) has a test for DNS resolving.

It seems that you have 6 spine processes. It is better to use fewer processes and add threads (Console -> Data Collection -> Data Collectors).
Not DNS; it's an IP only for ping, see below.

I can reduce the process count, but it's been 6 for quite some time without this issue.
Why does it suddenly need to be reduced?
cacti-level3.png

Post by Jeeves »

I reduced the count to 2, and it made no difference.
Now it says it's waiting on more threads, and different devices are reported as timing out.
cacti-log2.png

Post by Jeeves »

I also tried going the other direction and cranked the processes/threads ridiculously high. No change:
cacti-log3.png

Post by macan »

Try increasing the log level (Configuration -> Settings -> General tab) and check the Cacti log.
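Once the log level is raised, one quick way to see how often the timeout fires is to count the matching lines from a shell. This is only a minimal sketch: `count_timeouts` is a hypothetical helper name, the log path in the example is an assumption (use the path from your own install), and the search string is taken from the "Polling timed out" wording in this thread's title.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a Cacti log that contain the timeout text.
# The search string comes from the error quoted in this thread's title.
count_timeouts() {
  # grep -c prints the number of matching lines (0 when there are none)
  grep -c "Polling timed out" "$1"
}

# Example (the log path is an assumption, adjust for your install):
# count_timeouts /var/www/html/cacti/log/cacti.log
```

Comparing the count before and after a settings change gives a rough idea of whether the change made any difference.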

Post by Jeeves »

macan wrote: Thu Oct 26, 2023 2:12 am Try increasing the log level (Configuration -> Settings -> General tab) and check the Cacti log.
Good idea. Here's what I got out of that:

A couple of devices fill the log. The warnings are the precursor to the red "Timed Out" message. However, one of the devices is an "ICMP Ping Only" device.
I enabled device debugging and captured that as well. I don't see any obvious issues here.
cacti-log4.png
cacti-icmp-ping only.png

Post by macan »

Try using the cmd.php poller for a few poller cycles (Configuration -> Settings -> Poller -> Poller Type).

Which Cacti and spine versions?

You can try:
chmod +s /path/to/spine
Utilities -> System Utilities -> Rebuild Poller Cache
For SNMP devices, change Bulk Walk Maximum Repetitions to "auto detect/set on first re-index" and then re-index.
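To confirm that the chmod took effect, the setuid bit can be checked from a shell. A small sketch follows; `check_setuid` is a hypothetical helper name, and /path/to/spine is the same placeholder used in the chmod line above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a binary has its setuid bit set.
check_setuid() {
  # test -u is the POSIX check for the set-user-ID bit
  if [ -u "$1" ]; then
    echo "setuid bit is set on $1"
  else
    echo "setuid bit is NOT set on $1"
    return 1
  fi
}

# Example (placeholder path, as in the chmod line above):
# check_setuid /path/to/spine && ls -l /path/to/spine
```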

Post by Jeeves »

macan wrote: Fri Oct 27, 2023 3:40 am Try using the cmd.php poller for a few poller cycles (Configuration -> Settings -> Poller -> Poller Type).
Same issue:
cacti-cmd-test.png
macan wrote: Which Cacti and spine versions?
1.2.25 across the board.
macan wrote: You can try:
chmod +s /path/to/spine
This is done as part of the process of building spine.
macan wrote: Utilities -> System Utilities -> Rebuild Poller Cache
For SNMP devices, change Bulk Walk Maximum Repetitions to "auto detect/set on first re-index" and then re-index.
I have already done the first, and the second is the default; I have only changed it on one device as part of troubleshooting this issue.

Post by macan »

Any network issues? Server restart? OS/package update? I ran into a similar thing after a PHP upgrade.

Post by Jeeves »

macan wrote: Fri Oct 27, 2023 1:47 pm Any network issues? Server restart? OS/package update? I ran into a similar thing after a PHP upgrade.
No network issues. This is a large infrastructure and I've used Cacti for more than a decade.

If you can tell me how to troubleshoot the error further, I'm happy to do so.
At the moment all I see is that it says the poller limit is being overrun. If I debug the specific device, it finishes polling in under 5 seconds, always. So I can't even find what the poller is complaining about.

Let's narrow it down to one device.

This is an ICMP device. No SNMP polling occurs. The ping is always faster than 1 second, and the only script it runs for graphing is also a ping. So why does this device constantly appear in the log as having timed out?
cacti-level3.png
Here's the timeout error in the log:
cacti-lvl3-dns-timeoud.png
So, I went and performed a device specific debug on it here, which shows no issues:
cacti-icmp-ping only.png
So how do I troubleshoot this?
TheWitness
Developer
Posts: 17047
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Advanced Ping has a lot of scalability issues when you have so many hosts. The other thing is how long it takes spine to initialize with so many threads. What is your script server count? How many script server calls do you use?

One option would be to reduce the ping count. I think the default in the script is 20 samples; you might want to move it to 10, for example. That'll speed things up.

I almost think that we need to move Advanced Ping to its own plugin, as the polling times make it really difficult to complete in the 5 minute cycle. For me, it's an easy thing; just making the time to do it is the issue.

To answer some of those questions: years ago, I wrote a 'showproc' script. It's been refined over the years to do other things. Below is a copy. Root just needs super access. Put it in root's home directory and mark it executable.

Code:

#!/bin/sh
while true;do
  clear
  echo "---------------------------------------------------------"
  echo -n "Uptime:"
  uptime

  # Collect metrics
  SLEEPING=`mysql -e "show processlist" | grep Sleep | wc -l`
  RUNNING=`mysql -e "show processlist" | grep -v Sleep | wc -l`
  TOTAL=`expr $SLEEPING + $RUNNING`
  ITEMS=`mysql -e "SELECT TABLE_ROWS FROM information_schema.TABLES WHERE TABLE_NAME='poller_output'" | grep -v TABLE_ROWS`
  PROCS=`mysql -e "SELECT count(*) FROM processes WHERE tasktype='boost'" cacti | grep -v count`
  SPINE=`mysql -e "SELECT count(*) FROM poller_time WHERE end_time='0000-00-00'" cacti | grep -v count`
  HISTORY=`mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_history_list_length'" | grep -v Value | sed -e 's/Innodb_history_list_length//g' | tr -d " \t"`

  # Echo status
  echo "---------------------------------------------------------"
  echo "Host: $(hostname)"
  echo "Pending Poller Items: $ITEMS"
  echo "---------------------------------------------------------"
  echo "Running Processes:  $RUNNING"
  echo "Sleeping Processes: $SLEEPING"
  echo "---------------------------------------------------------"
  echo "Total Processes: $TOTAL"
  echo "History: $HISTORY"
  echo "Spine Processes: $SPINE"
  if [ -z "$PROCS" ]; then
          echo "Boost Processes: 0"
  else
          echo "Boost Processes: $PROCS"
  fi
  echo ""
  echo "Memory Topology"
  echo "---------------------------------------------------------"
  free -g
  echo ""

  #echo "Poller Stats"
  #ps -ef | egrep "(php|spine|poller.php)" | grep -v "script_server" | grep -v grep
  #echo "---------------------------------------------------------"
  #echo "Comm Stats"
  #echo "---------------------------------------------------------"
  #mysql -e "SHOW GLOBAL STATUS" | egrep "(Com_update|Com_select|Com_insert|Com_set_option|Com_delete|wsrep_cluster_size|Aborted_clients|Uptime)" | awk '{printf("%-25s %10i\n", $1, $2)}'

  echo "MariaDB Processes"
  mysql -e "SELECT id, user, state, ROUND(time_ms/1000,1) AS time, SUBSTRING(REPLACE(REPLACE(REPLACE(info, '\n', ' '), '  ', ' '), '\t', ' '),1,245) AS info FROM processlist WHERE info NOT LIKE '%processlist%' ORDER BY time DESC LIMIT 35" information_schema

  running=`mysql -e 'SELECT COUNT(*) AS running FROM cacti.poller_time WHERE end_time="0000-00-00"' | grep -v running`

  if [ "$running" -ne "0" ]; then
    echo ""
    echo "Running Collector Threads"
    mysql -e "SELECT * FROM poller_time WHERE end_time='0000-00-00'" cacti

    echo ""
    echo "Last Hosts Inserted"
    mysql -e "SELECT h.id, h.description, b.last_updated, COUNT(b.local_data_id) AS data_sources FROM poller_output_boost AS b INNER JOIN data_local AS dl ON b.local_data_id = dl.id INNER JOIN host AS h ON h.id=dl.host_id  WHERE b.last_updated = (SELECT max(last_updated) FROM poller_output_boost) GROUP BY h.id ORDER BY b.last_updated DESC, data_sources DESC LIMIT 5" cacti
  else
    echo ""
    echo "Poller Not running"
  fi

  sleep 2
done
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.

For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?

Post by Jeeves »

TheWitness wrote: Mon Oct 30, 2023 8:10 am Advanced Ping has a lot of scalability issues when you have so many hosts. The other thing is how long it takes spine to initialize with so many threads. What is your script server count? How many script server calls do you use?
I had mine set at 5, and it hasn't really changed.
TheWitness wrote: One option would be to reduce the ping count. I think the default in the script is 20 samples; you might want to move it to 10, for example. That'll speed things up.
I'll test this, but the real question is why it changed. When we were doing DB troubleshooting a month ago (two?), it was wrapping up all hosts in 6-7 seconds, with Advanced Ping.
TheWitness wrote: I almost think that we need to move Advanced Ping to its own plugin, as the polling times make it really difficult to complete in the 5 minute cycle. For me, it's an easy thing; just making the time to do it is the issue.
I totally understand the lack of free time, and I like that features added over the years are being moved into separately managed code bases.
TheWitness wrote: To answer some of those questions: years ago, I wrote a 'showproc' script. It's been refined over the years to do other things. Below is a copy. Root just needs super access. Put it in root's home directory and mark it executable.

Thanks for the block of code. I'll get some time myself and jump into cacti and see what it reveals.
Thanks again!

Post by Jeeves »

While running the bash script, it constantly throws an "Unknown Column" error on b.last_updated.
I recorded a gif, but I only get a single refresh once a minute that has some poller data in it, so it's a lot of nothingness. My values all seem pretty trivial:
cacti-2023-11-debugscript.png
However, the rest of the minute, I see 0 pending items but two long-running collector threads:
cacti-2023-11-debugscrip2t.png
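An "Unknown Column" error usually means the query names a column that the table does not have in the installed version. One way to check, sketched below, is to list the table's columns and look for the name. `has_column` is a hypothetical helper, and the example assumes the mysql client can log in without arguments, as the showproc script does.

```shell
#!/bin/sh
# Hypothetical helper: read `SHOW COLUMNS` output on stdin and succeed
# only if the named column appears in the first field of a data row.
has_column() {
  # NR > 1 skips the header row that the mysql client prints in batch mode
  awk -v col="$1" 'NR > 1 && $1 == col { found = 1 } END { exit !found }'
}

# Example (assumes passwordless mysql access, as the showproc script does):
# mysql -e "SHOW COLUMNS FROM poller_output_boost" cacti | has_column last_updated \
#   || echo "poller_output_boost has no last_updated column"
```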
Osiris
Cacti Guru User
Posts: 1424
Joined: Mon Jan 05, 2015 10:10 am

Post by Osiris »

For those two devices, try testing spine at the command line using the -H host_id option.
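A rough sketch of that test, timing spine against a single host ID. The flags are written from memory of spine's help output (-R read-only, -S log to stdout, -V verbosity, -H host list), so confirm them with spine --help before relying on them. `time_host`, `SPINE_BIN` and the /path/to/spine placeholder are assumptions for illustration.

```shell
#!/bin/sh
# Placeholder path; set SPINE_BIN to point at your spine binary.
SPINE_BIN="${SPINE_BIN:-/path/to/spine}"

# Hypothetical helper: run spine for one host ID and report wall-clock seconds.
time_host() {
  start=$(date +%s)
  # Flags assumed from spine's help text; verify with: spine --help
  "$SPINE_BIN" -R -S -V 3 -H "$1"
  status=$?
  end=$(date +%s)
  echo "host $1: $((end - start))s (exit $status)"
}

# Example (replace 12 with the host ID of one of the two devices):
# time_host 12
```

If a single host finishes quickly here but still shows up as timed out in a full poller run, that would point at contention between hosts rather than at the device itself.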
Before history, there was a paradise, now dust.