Upgrade 1.2.18 to 1.2.21 - Error: Maximum runtime of 298 seconds exceeded

TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

After doing that, pull a fresh copy of spine. I found one unrelated bug and also did some reorganizing while trying to isolate the root cause, which I'm still assuming is in spine.
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
warben61
Posts: 34
Joined: Mon Jan 22, 2018 9:52 pm

Post by warben61 »

Alrighty. There's no noticeable difference in running with snmp_host_cleanup commented out. I just pulled the latest spine and have that installed now.

Post by TheWitness »

Okay, thanks for that. I don't see any leaks under valgrind (a tool that checks for memory leaks). Do we know which host it's crashing on, or is it totally random?

Post by TheWitness »

It's 8677 :)

Code:

./spine -V 6 -S --mibs -f 8677 -l 8677

See if it crashes just on that.
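
Since the crash is intermittent, one clean run against a single host may not prove much. Below is a minimal Python sketch that repeats the command above and records any run that ends abnormally. The helper names `build_spine_cmd` and `run_repeatedly`, the default of 20 runs, and the assumption that a crashed spine run ends with a non-zero exit status are illustrative, not part of spine or Cacti.

```python
import subprocess


def build_spine_cmd(host_id, spine_path="./spine"):
    """Build the single-host test command quoted above for one device ID."""
    return [spine_path, "-V", "6", "-S", "--mibs",
            "-f", str(host_id), "-l", str(host_id)]


def run_repeatedly(cmd, runs=20):
    """Run cmd `runs` times and return (attempt, exit code) for failed runs.

    On POSIX, subprocess reports a process killed by a signal as a negative
    return code (for example -11 for SIGSEGV, -6 for SIGABRT).
    Assumption: a crashed run exits with a non-zero status.
    """
    failures = []
    for attempt in range(1, runs + 1):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures.append((attempt, result.returncode))
    return failures
```

Calling `run_repeatedly(build_spine_cmd(8677))` from the spine directory would return an empty list if every run exited cleanly.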

Post by warben61 »

As far as I can tell it's totally random. The number of RRDs processed is different every time, which to me sounds like it's stopping at different spots.

I ran that command... it runs just fine.

Wait, here's something. I ran this command: /usr/local/spine/bin/spine -R -S -V 3
Found that from a how-to on installing spine. https://medium.com/make-it-easy/how-to- ... bc4c75c502

Anyway, I've run that every time I update spine and it's always fine. I did it again for the heck of it and it kicked out an error.

--------------------
FATAL: Spine Encountered a Segmentation Fault
Generating backtrace...0 line(s)...
Total[33.2246] Device[8692] SNMP Ping Unknown Error
root@oss:/usr/local/spine/bin# Unable to flush stdout: Broken pipe
Unable to flush stdout: Broken pipe
Unable to flush stdout: Broken pipe
--------------------

I grepped for ping in the cacti.log and found this:
2022/06/21 20:10:21 - SPINE: Poller[1] PID[3290061] PT[140041938515712] WARNING: Invalid Response, Device[2450] HT[1] DS[37672] SCRIPT: perl /usr/share/cacti/site/scripts/ping.pl '10.10.10.10', output: U

Shoot, are we looking for the wrong thing? That ping error repeats a few times in cacti.log and it's present each time spine crashes. BUT, it's also present during polling cycles where spine doesn't crash. It also happens with various device IDs.
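
To check whether those warnings cluster on particular devices, the grep can be turned into a per-device count. This is a rough Python sketch that assumes every warning follows the format of the line quoted above; the function name and the second sample line are made up for illustration.

```python
import re
from collections import Counter

# Matches the warning format quoted above, e.g.
# "... WARNING: Invalid Response, Device[2450] HT[1] DS[37672] SCRIPT: ..."
INVALID_RE = re.compile(r"WARNING: Invalid Response, Device\[(\d+)\]")


def count_invalid_responses(lines):
    """Count 'Invalid Response' warnings per device ID."""
    counts = Counter()
    for line in lines:
        match = INVALID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# First line is the one from the post; the second is an invented non-matching line.
sample = [
    "2022/06/21 20:10:21 - SPINE: Poller[1] PID[3290061] PT[140041938515712] "
    "WARNING: Invalid Response, Device[2450] HT[1] DS[37672] SCRIPT: perl "
    "/usr/share/cacti/site/scripts/ping.pl '10.10.10.10', output: U",
    "2022/06/21 20:10:22 - SPINE: Poller[1] PID[3290061] PT[140041938515712] "
    "Device[911] Checking for System Information Update",
]
print(count_invalid_responses(sample).most_common())
```

Passing an open file handle for cacti.log instead of `sample` works the same way, since the function only iterates over lines.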

Post by warben61 »

Hmm, that may be a red herring. I've run it a few more times and this is how the latest run ended:

Total[41.7185] Device[3364] HT[1] DS[38065] TT[21.20] SCRIPT: perl /usr/share/cacti/site/scripts/ping.pl 'x.x.x.x', output: 14.1
Total[41.7197] Device[1122] Checking for System Information Update
Total[41.7197] Device[1122] Updating Full System Information Table
Total[41.7215] Device[911] Checking for System Information Update
Total[41.7215] Device[911] Updating Full System Information Table
Total[41.7229] Device[3786] Checking for System Information Update
Total[41.7229] Device[3786] Updating Full System Information Table
double free or corruption (fasttop)
FATAL: Spine Interrupted by Abort Signal
Aborted (core dumped)
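
When comparing several crashed runs, it can help to pull out the devices that were logged just before the abort. The sketch below is one way to do that in Python. The function name and the `keep` parameter are hypothetical, the fatal markers are simply the strings seen in the two crash outputs in this thread, and because spine is threaded the last device logged is only a candidate, not necessarily the culprit.

```python
import re

DEVICE_RE = re.compile(r"Device\[(\d+)\]")
# Markers taken from the two crash outputs quoted in this thread.
FATAL_MARKERS = ("FATAL:", "double free or corruption")


def devices_before_crash(lines, keep=5):
    """Return up to `keep` distinct device IDs, most recent first,
    seen before the first fatal marker."""
    seen = []
    for line in lines:
        if any(marker in line for marker in FATAL_MARKERS):
            break
        match = DEVICE_RE.search(line)
        if match:
            device = int(match.group(1))
            if device in seen:
                seen.remove(device)  # move a repeated device to the end
            seen.append(device)
    return seen[-keep:][::-1]


# Tail of the output quoted above.
tail = [
    "Total[41.7185] Device[3364] HT[1] DS[38065] TT[21.20] SCRIPT: perl "
    "/usr/share/cacti/site/scripts/ping.pl 'x.x.x.x', output: 14.1",
    "Total[41.7197] Device[1122] Checking for System Information Update",
    "Total[41.7197] Device[1122] Updating Full System Information Table",
    "Total[41.7215] Device[911] Checking for System Information Update",
    "Total[41.7215] Device[911] Updating Full System Information Table",
    "Total[41.7229] Device[3786] Checking for System Information Update",
    "Total[41.7229] Device[3786] Updating Full System Information Table",
    "double free or corruption (fasttop)",
    "FATAL: Spine Interrupted by Abort Signal",
]
print(devices_before_crash(tail))
```

Lines printed after the first marker are ignored here, so a device that only shows up after the FATAL line (as Device[8692] did in the earlier segfault output) would need a separate look.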

Post by TheWitness »

Notable. I just made another update. Download and test again.

Post by warben61 »

It didn't like that. Weird though - the test runs go fine. But as soon as the poller runs, it fails. I had to back out of that one because I wasn't getting any good poll cycles.

Post by warben61 »

I've gotta get offline for now. Tomorrow I might mass-disable a whole bunch of hosts and start adding them back to see if the problem is tied to a particular one. Actually, that brings up a question: does the poller process devices in any particular order? I think I've noticed that devices added a long time ago always seem to be processed, and those graphs are fine. It's the newer stuff that has gaps in the graphs when the poller fails to complete.

Edit: By older devices, I mean ones added years ago. It's not just a difference between devices added before and after the update to 1.2.21. It's more like things added years ago are fine, but devices added in, say, the last 6 months are affected by the crashes. I haven't really taken the time to narrow that down yet.

Post by warben61 »

I'm cautiously optimistic that I've found one single host causing the problem. I found this thread: viewtopic.php?t=53496

The last comment talks about setting collection threads to 1. I did a query on the database and found 40 hosts that were set to 5 threads. 39 of them had been added since the upgrade (I guess that's a new default?) and 1 of them had been added years ago. I went to that host and changed the number of threads to 1.

Spine hasn't crashed for 4 hours. Previously the longest run of good polls was about an hour, maybe 90 minutes at most. I'm not calling it a win yet - I want to see it go all day or maybe 24 hours.

I also went into settings and changed device defaults from 5 threads to 1 so that we don't get any more 5's in there.
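
For anyone wanting to run the same check, here is a hedged Python sketch of the kind of query described above. It uses an in-memory SQLite database as a stand-in for the real Cacti database (which is MySQL/MariaDB). Only the `host` table and its `device_threads` column come from this thread; the `id` and `description` columns and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the Cacti database (really MySQL/MariaDB).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, description TEXT, "
    "device_threads INTEGER)"
)
# Invented sample rows: two old hosts on 1 thread, two newer ones on 5.
conn.executemany(
    "INSERT INTO host (id, description, device_threads) VALUES (?, ?, ?)",
    [(1, "old-router", 1), (2, "old-switch", 1),
     (3, "new-router-a", 5), (4, "new-router-b", 5)],
)
conn.commit()


def threads_summary(conn):
    """Return (device_threads, host count) pairs, one per distinct setting."""
    return conn.execute(
        "SELECT device_threads, COUNT(*) FROM host "
        "GROUP BY device_threads ORDER BY device_threads"
    ).fetchall()


def hosts_with_extra_threads(conn):
    """List hosts configured with more than one collection thread."""
    return conn.execute(
        "SELECT id, description FROM host "
        "WHERE device_threads > 1 ORDER BY id"
    ).fetchall()


print(threads_summary(conn))
print(hosts_with_extra_threads(conn))
```

The two SELECT statements should carry over to the real database largely unchanged, provided the column names match your schema.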

Post by TheWitness »

What is the device type? What type of metrics? How many OIDs for the device?

Post by TheWitness »

Go to Device Defaults under Settings to find the default.

Post by warben61 »

It's a Mikrotik router, HapLite, board name RB941-2nD. We've got 2000+ of these being polled. After the change, this host looks identical to all the rest. The only thing I changed was the number of collection threads earlier today.

All we're monitoring is traffic rate on port 1.

SNMPv3, SHA, AES-128, port 161, timeout 500ms, Max OIDs per request is 5.

There's 1 single graph
Associated Data Queries: SNMP - Interface Statistics
Graph created for port 1 with In/Out Bits with 95th Percentile template

There are 2 OIDs polled: 1 for traffic in, 1 for traffic out. I put it in debug mode for one cycle. Output below:

Code:

2022/06/22 12:45:02 - SPINE: Poller[1] PID[3764037] PT[139726619141888] DEBUG: Device[5355] HT[1] In Poller, About to Start Polling
2022/06/22 12:45:02 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] DEBUG: Entering SNMP Ping
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] Updating Full System Information Table
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] DEBUG: Device[5355] HT[1] RECACHE: Processing 1 items in the auto reindex cache for 'x.x.x.x'
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] DQ[1] Extended Uptime Result: U, Is Numeric: 0
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, (assert: 157945200 < output: 157974700)
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] NOTE: There are '2' Polling Items for this Device
2022/06/22 12:45:03 - PCOMMAND Device[1356] NOTE: Recache Event Detected for Device
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] DS[32764] TT[115.14] SNMP: v3: x.x.x.x, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 2516712657
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] DS[32764] TT[115.22] SNMP: v3: x.x.x.x, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 3661114414
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] Device[5355] HT[1] Total Time: 1.2 Seconds
2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] DEBUG: Device[5355] HT[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
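
Two details in that debug output can be checked mechanically: the "Extended Uptime Result: U, Is Numeric: 0" line, and the RECACHE assert on .1.3.6.1.2.1.1.3.0 (sysUpTime, which counts hundredths of a second). The Python sketch below assumes the log format shown above; the function names are illustrative only.

```python
import re

UPTIME_RE = re.compile(
    r"Device\[(\d+)\].*Extended Uptime Result: ([^,]+), Is Numeric: (\d)"
)
RECACHE_RE = re.compile(
    r"Device\[(\d+)\].*RECACHE OID: ([^,]+), "
    r"\(assert: (\d+) < output: (\d+)\)"
)


def non_numeric_uptime_devices(lines):
    """Return sorted device IDs whose Extended Uptime result is non-numeric."""
    devices = set()
    for line in lines:
        match = UPTIME_RE.search(line)
        if match and match.group(3) == "0":
            devices.add(int(match.group(1)))
    return sorted(devices)


def recache_delta_seconds(line):
    """Return (device_id, seconds between cached and current value), or None."""
    match = RECACHE_RE.search(line)
    if not match:
        return None
    # sysUpTime is in TimeTicks, i.e. hundredths of a second.
    ticks = int(match.group(4)) - int(match.group(3))
    return int(match.group(1)), ticks / 100


# The two relevant lines from the debug output above.
sample = [
    "2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] "
    "Device[5355] HT[1] DQ[1] Extended Uptime Result: U, Is Numeric: 0",
    "2022/06/22 12:45:03 - SPINE: Poller[1] PID[3764037] PT[139726619141888] "
    "Device[5355] HT[1] DQ[1] RECACHE OID: .1.3.6.1.2.1.1.3.0, "
    "(assert: 157945200 < output: 157974700)",
]
print(non_numeric_uptime_devices(sample))
print(recache_delta_seconds(sample[1]))
```

For the lines above, the difference is 157974700 - 157945200 = 29500 ticks, or 295 seconds, which is consistent with a polling interval of roughly five minutes.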

Post by warben61 »

Actually, I misspoke. I didn't change just that one; I changed all of the ones that have been added as well. I thought I remembered my query only hitting that one device, but it changed all 40 rows.

Code:

update host set device_threads = '1' where device_threads != '1';

The statements in my previous post still hold, though. The only devices that have been added are Mikrotiks, all added the same way with the same graph created for port #1. There have also been a handful of SNMPv1 devices added. But from what I understand, the problem is focused on SNMPv3.
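
A bulk update like that is easy to misjudge, so one option is to compare the affected row count with what you expected before committing. The Python sketch below shows the idea against an in-memory SQLite stand-in. The `set_threads` helper, the `id` column, and the sample rows are hypothetical, and on the real database this only works if the table uses a transactional engine.

```python
import sqlite3


def set_threads(conn, new_value, expected_rows):
    """Set device_threads on every host that differs from new_value.

    Rolls back and returns False when the number of changed rows is not
    what the operator expected; commits and returns True otherwise.
    """
    cur = conn.execute(
        "UPDATE host SET device_threads = ? WHERE device_threads != ?",
        (new_value, new_value),
    )
    if cur.rowcount != expected_rows:
        conn.rollback()
        return False
    conn.commit()
    return True


# In-memory SQLite stands in for the Cacti database; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, device_threads INTEGER)"
)
conn.executemany(
    "INSERT INTO host (id, device_threads) VALUES (?, ?)",
    [(1, 1), (2, 5), (3, 5)],
)
conn.commit()

first = set_threads(conn, 1, expected_rows=1)   # two rows differ: rolled back
second = set_threads(conn, 1, expected_rows=2)  # count matches: committed
print(first, second)
```

The first call leaves the table untouched because two rows would have changed instead of one; the second call applies the change.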

Post by TheWitness »

I see the clue right there with the Extended Uptime OID failing. I'll be making another spine update after work tonight on that front.