Spine doesn't poll hosts after migrating Cacti to new server

fcorazza
Posts: 9
Joined: Fri Jul 17, 2009 6:09 am

Post by fcorazza »

Hi everyone,

I've completed a migration of Cacti from version 0.8.7a (with cactid 0.8.6i) to a brand new machine running Cacti 0.8.7e and Spine 0.8.7e. Both servers are SPARC-based and run Solaris 10.

Spine has been compiled against the net-snmp sources provided by Sun (SUNWsmaS package), which is a Sun-packaged version of net-snmp 5.0.9. The database has been dumped and loaded onto the new server, and Cacti/Spine can successfully connect to it.

A note regarding PHP: it is the version from the Sun GlassFish Webstack, which doesn't have SNMP support. However, I don't see this as a showstopper, since I've given Cacti the paths to the SNMP binaries and it should be able to snmpwalk/snmpget the hosts just fine.

The existing RRD files (about 7000 in total, roughly 3 GB of data) were first dumped to XML with rrdtool 1.2.x on the old machine and then restored to RRD format with rrdtool 1.3 on the new one.

So far so good: the graphs display correctly up to the time of the switch-over.

However, the big problem is that Spine doesn't poll any of the hosts! I've done everything I could: I compiled Spine with the Solaris 10 privileges option and gave cactiuser access to the ICMP sockets in the Solaris kernel, but I always get loads of "ICMP: Ping timed out" messages, even though it later says "Host responded to SNMP".

What's wrong with it? I'm almost tearing my hair out over this issue, simply because I cannot see any logic behind this behavior. If I run snmpwalk or ping manually against such a host I get a reply, so a network problem can be ruled out.

And then, even if the host is up, it just says "There are XX polling items for this host" and immediately "HOST COMPLETE: About to exit host polling thread function". Why? It doesn't even attempt a single poll.

Here is some Spine debug output, in case anything meaningful can be found in it:
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Basic privset is: 'basic'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Privilege PRIV_NET_ICMPACCESS is: 'Enabled'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 23
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 22
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 23
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Basic privset is: 'basic'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Privilege PRIV_NET_ICMPACCESS is: 'Enabled'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 24
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Basic privset is: 'basic'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: Privilege PRIV_NET_ICMPACCESS is: 'Enabled'.
07/17/2009 01:53:49 PM - SPINE: Poller[0] Host[514] PING Result: ICMP: Ping timed out
07/17/2009 01:53:49 PM - SPINE: Poller[0] Host[514] SNMP Result: Host did not respond to SNMP
07/17/2009 01:53:49 PM - SPINE: Poller[0] Host[501] NOTE: There are '75' Polling Items for this Host
07/17/2009 01:53:49 PM - SPINE: Poller[0] Host[501] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/17/2009 01:53:49 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 23
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 24
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Basic privset is: 'basic'.
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Privilege PRIV_NET_ICMPACCESS is: 'Enabled'.
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[505] NOTE: There are '7' Polling Items for this Host
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[505] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 23
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 24
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Basic privset is: 'basic'.
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: Privilege PRIV_NET_ICMPACCESS is: 'Enabled'.
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[503] NOTE: There are '62' Polling Items for this Host
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[503] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 23
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[504] NOTE: There are '62' Polling Items for this Host
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[504] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/17/2009 01:53:50 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 22
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[534] PING Result: ICMP: Ping timed out
07/17/2009 01:53:50 PM - SPINE: Poller[0] Host[534] SNMP Result: Host responded to SNMP

What could it be? A bug? I cannot think of anything else (no offense meant to the developers!).

I can provide any information needed if a developer would like to dig into this weird behavior. Apparently the latest version of Spine hasn't been tested very well on Solaris 10 SPARC; lurking around the forum, I see that most users rely on Linux as their main platform. Unfortunately that is not an option for us: we have received a brand new SunFire T1000 server to host Cacti standalone, and it would be a shameful waste of hardware if we had to fall back on a lousy x86 box already bloated with other services.

PS: we have quite a large Cacti configuration, around 500 hosts and 10k data sources. Getting the framework working without scrapping the current host/data source configuration is the only way to go for us. All the checks I could think of after the migration have already been made (poller/program paths, ldd against all of the binaries to make sure no libraries were missing, PHP extensions, etc.).

PS2: Spine is currently compiled as a 64-bit binary with SunStudio 12u1 CC against the sparcv9 net-snmp/mysql libraries. I've also built a 32-bit binary, but that doesn't make any difference.

Thanks in advance for any help provided.


Regards
Fabio
Last edited by fcorazza on Mon Jul 20, 2009 7:32 am, edited 1 time in total.

Post by fcorazza »

I think I've tracked the problem down to the fact that SNMP polling doesn't happen because all hosts are seen as "Down" (checked on the Devices web page). Opening a device, it says "Ping Results: ICMP ping Timed out".

I've tried changing the ping type in the Poller settings, but when I launch Spine it still uses ICMP ping to detect host availability.

I've tried both TCP Ping and SNMP, but as already said, Spine seems to ignore the settings I enter on the Cacti page.

Any hints?

UPDATE: I've finally found out what was preventing Spine from using SNMP only. What I was modifying was only the global setting, while every device has a per-device setting whose availability_mode DB parameter was set to "ICMP Ping". I changed this to SNMP and it worked!!

So it is definitely the ICMP ping function that is not working. Inside the cacti-spine source directory I've seen the source file ping.c; perhaps something in there makes it a no-go on Solaris 10 sparcv9.

A call to the devs: it is really urgent to solve this bug, or those of us running Solaris on SPARC will have a hard time modifying all of our device settings (I know it can be done with a SQL query, but I'd rather avoid that).

Looking forward to replies..


Cheers
Fabio

Post by fcorazza »

Is nobody interested in this issue? It persists, and I would encourage more people running Spine 0.8.7e on Solaris 10 to bring it up, since it is a big pain to use Cacti without working ICMP ping.
TheWitness
Developer
Posts: 17062
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Can you accommodate a "GoToMeeting" on this subject? Please PM me if you are interested. I would like to know more about your issue.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.

For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
Clipper
Posts: 12
Joined: Mon Jun 06, 2005 3:39 am
Location: Switzerland

Post by Clipper »

Hi,

I have exactly the same problem. I immediately found out that ICMP ping was the issue, but I have spent hours trying to find out what's going on.

Here is my setup :

Solaris 10 sparc
Using Coolstack Web
cacti-spine compiled with gcc and --enable-solaris-priv flag

First, out of the box there is a privilege issue, because the user does not have net_icmpaccess by default. This is clearly a gap in the cacti-spine documentation, as it is mentioned nowhere.

To find this out I had to run:

Code:

ppriv -e -D /usr/local/bin/spine -C /usr/local/etc/spine.conf

which then displays the privilege issue:

Code:

spine[13166]: missing privilege "net_icmpaccess" (euid = 104, syscall = 230) for "devpolicy" needed at so_socket+0xc8
09/04/2009 05:34:58 PM - SPINE: Poller[0] ERROR: ping_icmp: cannot open an ICMP socket (Spine thread)

So I added this privilege to the cacti user:

Code:

usermod -K defaultpriv=basic,net_icmpaccess watchmon

But it still does NOT work, and the "ppriv" command above now shows nothing special, just Spine reporting that the host is down:

Code:

09/04/2009 05:36:19 PM - SPINE: Poller[0] Host[3] PING: Result ICMP: Ping timed out

I would appreciate any help, and I'm willing to provide any information requested.

Thanks

PS: Forgot to mention that running all this as user root produces exactly the same result, i.e. Ping timed out.
--
Clipper
Gray Hat

Post by TheWitness »

Please grab ping.c from SVN. There was a recent patch for Solaris. Also, please do not use TCP ping, as it is currently unreliable on that platform. I don't know whether the patch made it to the web site on Wednesday night as expected.

TheWitness

Post by Clipper »

Still doesn't work for me. In fact, the 0.8.7e tarball plus the 3 official patches brings ping.c to exactly the same level as SVN.

I'm considering rebuilding spine without --enable-solaris-priv and running it as root, but I'm pretty sure this will work.
Rulio
Posts: 1
Joined: Fri Sep 25, 2009 2:37 am

Post by Rulio »

I had exactly the same problem.
I am running Cacti on a Solaris 10 SPARC server.

I solved it by applying a change to ping.c that I saw in the SVN changes for spine 0.8.7d (or an earlier version) and that was apparently not carried over to the 0.8.7e branch:

In function get_checksum,
I replaced
sum += *(unsigned char*)w;

by

*(unsigned char *)(&answer) = *(unsigned char *)w;
sum += answer;


I compile with the --enable-solaris-priv option, and I ran
usermod -K defaultpriv=basic,net_icmpaccess cacti
to allow the cacti user to send ICMP requests.

Hope it helps
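
For anyone wondering why that small get_checksum change could matter on SPARC, here is a minimal sketch of an RFC 1071 style Internet checksum. It is not the actual code from Spine's ping.c: the function name icmp_checksum_sketch and its layout are assumptions made for illustration only. The point it tries to show is the handling of a trailing odd byte. The full 16-bit words are read in native byte order, so on a big-endian CPU the first byte of each pair is the high-order byte. Adding the last byte as a plain number (sum += *(unsigned char*)w) puts it in the low-order position instead, so a packet with an odd length would get a wrong checksum on big-endian machines while still being right on little-endian x86. A target that receives an echo request with a bad checksum is expected to drop it, which would be consistent with "Ping timed out" while SNMP still answers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch (not Spine's get_checksum): one's-complement
 * Internet checksum over len bytes. The result is in native byte order
 * and is meant to be copied as-is into the checksum field.
 */
uint16_t icmp_checksum_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    uint16_t word;

    /* Add up all complete 16-bit words, read in native byte order. */
    while (len > 1) {
        memcpy(&word, p, sizeof word);  /* avoids unaligned access */
        sum += word;
        p += 2;
        len -= 2;
    }

    /*
     * Trailing odd byte: store it in the FIRST byte of a zeroed word,
     * so it sits where it would if the packet were padded with a zero
     * byte. This mirrors the replacement lines quoted in the post.
     */
    if (len == 1) {
        word = 0;
        *(unsigned char *)&word = *p;
        sum += word;
    }

    /* Fold the carries back into the low 16 bits, then complement. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return (uint16_t)~sum;
}
```

To use it, zero the checksum field of the ICMP header, call the function over the whole packet, and memcpy the result into the checksum field. Running it again over the finished packet should then return 0 on both big-endian and little-endian hosts.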

Post by TheWitness »

There is a patch to Spine 0.8.7e already on the patches page on the main site for this issue with Solaris.

TheWitness