Graphs with Holes

knebb
Cacti User
Posts: 138
Joined: Tue Sep 19, 2006 11:29 am

Graphs with Holes

Post by knebb »

Hi all,

Does anyone have an idea what I should look for? For a single host, my graphs have holes at more or less regular intervals (see attached picture).

I do not see any CPU spikes or I/O waits on the monitored host, nor on the Cacti host. All other hosts are fine; only this single host has missing data.

This is cacti.log:

Code:

04/03/2016 11:36:23 PM - CACTID: Poller[0] Host[175] PING Result: UDP: Host is Alive
04/03/2016 11:36:23 PM - CACTID: Poller[0] Host[175] SNMP Result: SNMP not performed due to setting or ping result
04/03/2016 11:36:23 PM - CACTID: Poller[0] DEBUG: MySQL Insert ID '159': 'update host set status='3', status_event_count='0', status_fail_date='2016-03-25 23:47:00', status_rec_date='2016-03-24 09:26:00', status_last_error='Host did not respond to SNMP, UDP: Ping timed out', min_time='0.194080', max_time='15003.350500', cur_time='0.231980', avg_time='5.386709', total_polls='4162', failed_polls='6', availability='99.8558' where id='175''
04/03/2016 11:36:23 PM - CACTID: Poller[0] DEBUG: MySQL Query ID '222': 'SELECT data_query_id, action, op, assert_value, arg1 FROM poller_reindex WHERE host_id=175'
04/03/2016 11:36:23 PM - CACTID: Poller[0] Host[175] RECACHE: Processing 2 items in the auto reindex cache for 'host.domain.com'
[...]
04/03/2016 11:36:24 PM - CACTID: Poller[0] Host[175] DS[3503] SNMP: v2: host.domain.com, dsname: cpu_idle, oid: .1.3.6.1.4.1.2021.11.53.0, value: 1608684
04/03/2016 11:36:24 PM - CACTID: Poller[0] Host[175] DS[3504] SNMP: v2: host.domain.com, dsname: cpu_interrupt, oid: .1.3.6.1.4.1.2021.11.56.0, value: 1
Any hints on where to look?

Thanks!

/KNEBB
Attachments
Example Graph
leaks.png (40.01 KiB)
tgrtjake
Posts: 6
Joined: Wed Sep 11, 2013 10:55 am

Re: Graphs with Holes

Post by tgrtjake »

Maybe increase the Ping Timeout Value and the SNMP Timeout for the monitored host?

Re: Graphs with Holes

Post by knebb »

tgrtjake wrote: Maybe increase the Ping Timeout Value and the SNMP Timeout for the monitored host?
I will do so. But if it works, that does not explain why it worked fine before!

I will report whether it helps.

Re: Graphs with Holes

Post by knebb »

Hi,

I want to give an update as promised.

I increased the timeout values by a factor of 10. It got better, but the holes did not go away. Instead, since Friday evening I have not had ANY values in my graphs.

Since noon today everything has been as smooth as before, even after I reduced the timeout values to the previous ones. :o

And now another piece of information for troubleshooting:

Since Sunday of last week my Internet connection was very unreliable: it was available for 5-20 minutes and then broke down. Reconnecting took between 1 and 5 minutes. On Friday the connection broke down completely and I had no Internet for 2.5 days! Since noon today the Internet is back again, and stable.

If you compare the two, you will notice that the Cacti graphs for this single host were broken while the Internet connection was unstable, that there was no data at all while the Internet was gone, and that everything has been fine since the Internet came back :evil:

So it has something to do with Internet access.

I verified this morning that snmpd was running on the target host and that it replied when queried with snmpwalk from the Cacti host.

So it is obvious something wants a connection to the Internet... Is it Cacti querying for some unknown MIBs? Or is it the OS (Raspbian) on the target host? And why does Cacti not record any data even though snmpwalk runs fine?

Seriously confused... :roll:


/KNEBB
micke2k
Cacti User
Posts: 261
Joined: Wed Feb 03, 2016 3:38 pm

Re: Graphs with Holes

Post by micke2k »

Hi,

Most likely it is Cacti not being able to reach hosts that are queried through the Internet connection. If you have 10 retries configured, all other hosts will fail as well, because the poller spends all its time trying to reach a downed host and has no time left to poll the rest of the hosts. Keep retries to 1 or 2.

Do you have any advanced ping/smokeping enabled for Internet checks?

What is your poller interval? Can you show the SYSTEM STATS entries from the log during these errors?

Re: Graphs with Holes

Post by knebb »

Hi,

It looks like I did not state this clearly.

The Cacti host and the target host are on the same subnet! No Internet connection is needed for them to see each other!

And my SNMP retries are set to 3 (now 2).

Well, the failing host was indeed the last one added. May I assume it is also queried last? If so, that could be the explanation, even though all other hosts were running fine.

But this does not explain why increasing the polling timeout partially helped...

I will increase the number of polling threads and cut off the Internet again; we will see what happens.

Greetings

/KNEBB
phalek
Developer
Posts: 2838
Joined: Thu Jan 31, 2008 6:39 am
Location: Kressbronn, Germany

Re: Graphs with Holes

Post by phalek »

What is the overall polling time of your Cacti server?

Also, is this happening during every polling cycle:

Code:

04/03/2016 11:36:23 PM - CACTID: Poller[0] Host[175] RECACHE: Processing 2 items in the auto reindex cache for 'host.domain.com'
Greetings,
Phalek
---
Need more help ? Read the Cacti documentation or my new Cacti 1.x Book
Need on-site support ? Look here Cacti Workshop
Need professional Cacti support ? Look here CereusService
---
Plugins : CereusReporting

Re: Graphs with Holes

Post by knebb »

Hi,
phalek wrote: What is the overall polling time of your Cacti server?
How do I find out?
phalek wrote: Also, is this happening during every polling cycle:

Code:

04/03/2016 11:36:23 PM - CACTID: Poller[0] Host[175] RECACHE: Processing 2 items in the auto reindex cache for 'host.domain.com'
What does this mean?

Thanks!

/KNEBB

Re: Graphs with Holes

Post by phalek »

Go to:

Code:

Console -> System Utilities -> Technical Support 
and check the "Last Run Statistics", e.g.:

Code:

Last Run Statistics	Time:169.9096 Method:spine Processes:1 Threads:30 Hosts:146 HostsPerProcess:146 DataSources:3626 RRDsProcessed:1236
Alternatively go to:

Code:

Console -> System Utilities -> View Cacti Log File 
Then sort/filter by "SYSTEM STATS".


For the Recache, do this:

Code:

Console -> System Utilities -> View Cacti Log File 
Then sort/filter by "RECACHE".

Re: Graphs with Holes

Post by knebb »

Code:

04/11/2016 04:02:37 AM - SYSTEM STATS: Time:156.1584 Method:spine Processes:1 Threads:4 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:805
[...]
04/09/2016 01:14:55 AM - SYSTEM STATS: Time:294.1995 Method:spine Processes:1 Threads:4 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:237
The first one is from today, when the Internet connection is alive. The second one is from the day when the Internet was down completely. As I have a polling interval of five minutes, I would say it is plausible that there was not enough time to poll the last host, the one that is affected, because the poller ran for 294 sec, which is about 5 minutes. Am I right?
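To make this check repeatable, here is a minimal Python sketch that parses SYSTEM STATS lines like the two above and flags runs that come close to the polling interval. The function names, the 300-second default interval, and the 95% margin are illustrative assumptions, not Cacti settings or APIs.

```python
import re

# Matches the fields of interest in a "SYSTEM STATS" line as shown in this thread.
STATS_RE = re.compile(
    r"SYSTEM STATS: Time:(?P<time>[\d.]+) .*?"
    r"Threads:(?P<threads>\d+) .*?"
    r"RRDsProcessed:(?P<rrds>\d+)"
)


def parse_stats(line):
    """Return run time, thread count and RRD count, or None if the line does not match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return {
        "time": float(m["time"]),
        "threads": int(m["threads"]),
        "rrds": int(m["rrds"]),
    }


def is_overrun(stats, interval_s=300.0, margin=0.95):
    """Flag a run whose duration reaches `margin` of the polling interval.

    interval_s and margin are assumptions chosen for illustration.
    """
    return stats["time"] >= interval_s * margin


if __name__ == "__main__":
    lines = [
        "04/11/2016 04:02:37 AM - SYSTEM STATS: Time:156.1584 Method:spine Processes:1 Threads:4 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:805",
        "04/09/2016 01:14:55 AM - SYSTEM STATS: Time:294.1995 Method:spine Processes:1 Threads:4 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:237",
    ]
    for line in lines:
        stats = parse_stats(line)
        print(stats, "overrun" if is_overrun(stats) else "ok")
```

With the two lines from this post, only the 294-second run is flagged, and it is also the run where RRDsProcessed drops from 805 to 237.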


Here is the head of the RECACHE entries. I am going to check the docs to find out what a RECACHE means...

Code:

# grep "RECACHE" cacti.log| head
04/11/2016 04:02:37 AM - PCOMMAND: Poller[0] Host[105] RECACHE: Recache for Host, data query #1
04/11/2016 04:02:37 AM - PCOMMAND: Poller[0] Host[105] RECACHE: Recache successful.
04/11/2016 04:02:37 AM - RECACHE STATS: RecacheTime:0.2753 HostsRecached:1
04/11/2016 04:05:01 AM - CACTID: Poller[0] Host[25] RECACHE: Processing 2 items in the auto reindex cache for '10.101.0.10'
04/11/2016 04:05:01 AM - CACTID: Poller[0] Host[47] RECACHE: Processing 2 items in the auto reindex cache for 'cacti'
04/11/2016 04:05:02 AM - CACTID: Poller[0] Host[74] RECACHE: Processing 2 items in the auto reindex cache for 'backup'
04/11/2016 04:05:02 AM - CACTID: Poller[0] Host[77] RECACHE: Processing 2 items in the auto reindex cache for 'my'
04/11/2016 04:05:02 AM - CACTID: Poller[0] Host[90] RECACHE: Processing 2 items in the auto reindex cache for 'inf'
04/11/2016 04:05:02 AM - CACTID: Poller[0] Host[105] RECACHE: Processing 1 items in the auto reindex cache for 'ab3'
04/11/2016 04:05:02 AM - CACTID: Poller[0] Host[107] RECACHE: Processing 1 items in the auto reindex cache for 'ab2'
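To answer the question of whether the re-index check shows up in every polling cycle for a given host, the grep above could be extended with a small Python sketch that counts the "Processing N items" RECACHE lines per host ID. The function name and the regular expression are assumptions based only on the log format shown in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Host[25] RECACHE: Processing 2 items in the auto reindex cache for '10.101.0.10'
RECACHE_RE = re.compile(r"Host\[(\d+)\] RECACHE: Processing (\d+) items")


def recache_counts(lines):
    """Count how many 'Processing N items' RECACHE lines each host ID has."""
    counts = Counter()
    for line in lines:
        m = RECACHE_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


if __name__ == "__main__":
    # The path is an assumption; point it at your own cacti.log.
    with open("cacti.log", encoding="utf-8", errors="replace") as fh:
        for host_id, n in sorted(recache_counts(fh).items()):
            print(f"Host[{host_id}]: {n} recache checks")
```

Comparing a host's count with the number of SYSTEM STATS lines in the same log gives a rough idea of whether the check runs on every cycle.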

Re: Graphs with Holes

Post by phalek »

knebb wrote: The first one is from today, when the Internet connection is alive. The second one is from the day when the Internet was down completely. As I have a polling interval of five minutes, I would say it is plausible that there was not enough time to poll the last host, the one that is affected, because the poller ran for 294 sec, which is about 5 minutes. Am I right?
Yes indeed.

Re: Graphs with Holes

Post by knebb »

phalek wrote:
knebb wrote: The first one is from today, when the Internet connection is alive. The second one is from the day when the Internet was down completely. As I have a polling interval of five minutes, I would say it is plausible that there was not enough time to poll the last host, the one that is affected, because the poller ran for 294 sec, which is about 5 minutes. Am I right?
Yes indeed.
Will it help to increase the number of threads for Spine?

Re: Graphs with Holes

Post by phalek »

Well, yes. Each thread polls a device and keeps doing so until it finishes or the timeout occurs. So you may actually want to decrease the timeout of some devices to a value that you know works when everything is fine. If you keep high timeouts and a device is not reachable, the spine thread will wait until that timeout is reached before proceeding with the next device.

It's a matter of playing around with different settings to find the one that fits best for you.
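As a rough way to reason about this trade-off, here is a back-of-the-envelope Python sketch. It is a deliberately simplified model, not how spine actually schedules work: it assumes each unreachable device blocks one thread for timeout times the number of attempts, and that unreachable devices are spread evenly over the threads. The function name and all numbers are hypothetical.

```python
import math


def worst_case_blocked_seconds(unreachable_devices, timeout_s, attempts, threads):
    """Estimate how long the busiest thread is stuck waiting on dead devices.

    Simplified model (assumption): one wait of timeout_s per attempt per device,
    with unreachable devices distributed evenly across threads.
    """
    per_device = timeout_s * attempts
    devices_per_thread = math.ceil(unreachable_devices / threads)
    return devices_per_thread * per_device


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for threads in (4, 8):
        blocked = worst_case_blocked_seconds(
            unreachable_devices=8, timeout_s=5.0, attempts=3, threads=threads
        )
        print(f"{threads} threads: about {blocked:.0f} s spent waiting")
```

In this model, both raising the thread count and lowering the timeout or the number of attempts shrink the time lost to an unreachable device, which matches the advice above. A device with many data sources may wait on several of them, so real numbers can be higher.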

Re: Graphs with Holes

Post by knebb »

Hi,

thanks for all your tips!

I increased the maximum number of threads for Spine from 4 to 8, and under normal conditions it looks much better:

Code:

04/11/2016 11:32:30 PM - SYSTEM STATS: Time:149.3865 Method:spine Processes:1 Threads:4 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:805
04/11/2016 11:36:37 PM - SYSTEM STATS: Time:95.5160 Method:spine Processes:1 Threads:8 Hosts:45 HostsPerProcess:45 DataSources:1309 RRDsProcessed:805
Additionally, I decreased the retry value from 3 to 2.

So I assume polling will hold up during the next connection issue.
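For anyone who wants to quantify the change, here is a small Python sketch that compares the two run times from the SYSTEM STATS lines above and reports the remaining headroom against the five-minute polling interval mentioned earlier in the thread. The function name is made up for illustration.

```python
def compare_runs(before_s, after_s, interval_s=300.0):
    """Compare two poller run times and report headroom within the interval."""
    return {
        "speedup": before_s / after_s,
        "reduction_pct": (before_s - after_s) / before_s * 100.0,
        "headroom_s": interval_s - after_s,
    }


if __name__ == "__main__":
    # Run times taken from the two SYSTEM STATS lines (4 threads vs. 8 threads).
    result = compare_runs(149.3865, 95.5160)
    print(f"speedup: {result['speedup']:.2f}x")
    print(f"reduction: {result['reduction_pct']:.1f}%")
    print(f"headroom: {result['headroom_s']:.1f} s of the 300 s interval")
```

The run time drops by roughly 36%, which leaves about 204 seconds of headroom in a 300-second cycle under normal conditions.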

Thanks!

/KNEBB