LOGS warning


LOGS warning

Post by ahmedgamil »

Dear all,

I have a big problem. I am working in a large environment containing more than 500 devices and more than 27,000 graphs.

The main problem is that I have broken images in most graphs, and when we check the Cacti logs we find this:


06/26/2012 10:57:13 AM - SYSTEM STATS: Time:1030.7991 Method:spine Processes:5 Threads:5 Hosts:517 HostsPerProcess:104 DataSources:53643 RRDsProcessed:26601



06/26/2012 11:20:02 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 38079, Data Sources: traffic_in(DS[644]), traffic_out(DS[644]), traffic_in(DS[708]), traffic_out(DS[708]), traffic_in(DS[755]), traffic_out(DS[755]), traffic_in(DS[957]), traffic_out(DS[957]), traffic_in(DS[958]), traffic_out(DS[958]), traffic_in(DS[1059]), traffic_out(DS[1059]), traffic_in(DS[1078]), traffic_out(DS[1078]), traffic_in(DS[1105]), traffic_out(DS[1105]), traffic_in(DS[1106]), traffic_out(DS[1106]), traffic_in(DS[1109]), traffic_out(DS[1109]), traffic_in(DS[1113]), Additional Issues Remain. Only showing first 20

I tried to flush the poller_output table, but the error came back again.
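
For anyone following along: "flushing" here just means emptying the table. A sketch of what I ran (the database name and user are placeholders for my setup):

    # Empty the poller cache; Cacti repopulates it on the next run.
    # 'cacti' / 'cactiuser' are placeholders for your DB name and user.
    mysql -u cactiuser -p cacti -e "TRUNCATE TABLE poller_output;"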

Also, the Max Current (ms) sometimes reaches 2000 ms, the Average (ms) reaches 700 ms, and availability varies between 1 and 100 on some devices.

The poller settings:
Poller Type spine
Maximum Concurrent Poller Processes 5
Maximum Threads per Process 5
Number of PHP Script Servers 4
Script and Script Server Timeout Value 3000
The Maximum SNMP OID's Per SNMP Get Request 10

and the SNMP Timeout is 1000 ms.
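
For reference, these values can also be read straight from the database. A sketch, assuming the Cacti 0.8.x settings table; the setting names are my best guess at the internal keys, and the credentials are placeholders:

    # Dump the poller-related settings from the Cacti database.
    mysql -u cactiuser -p cacti -e \
      "SELECT name, value FROM settings
       WHERE name IN ('poller_type', 'concurrent_processes', 'max_threads',
                      'php_servers', 'script_timeout', 'max_get_size', 'snmp_timeout');"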

Is there a relation between these logs and the broken images? Could you tell me the reason for these errors, their effect on my graphs, and how to solve them?

Re: LOGS warning

Post by artagel »

It would seem that yes, that error would be why your graphs aren't updating. Did you recently upgrade your Cacti?

Check out this post: http://forums.cacti.net/about25765.html

Let me know if any of that is relevant.

Also, I suggest googling this: site:forums.cacti.net Poller Output Table not Empty

There are quite a few posts on this topic. If none of them help, chime back in here. If they do, please update this post with the solution.
-Dan
Please mark the topic solved if this resolves your problem.

Re: LOGS warning

Post by ahmedgamil »

artagel wrote:It would seem that yes, that error would be why your graphs aren't updating. Did you recently upgrade your Cacti?

Check out this post: http://forums.cacti.net/about25765.html

Let me know if any of that is relevant.

Also, I suggest googling this:

There are quite a few posts on this topic. If none of them help, chime back in here. If they do, please update this post with the solution.
-Dan
Really, no upgrades were done.

As per the posts and my searching, I flushed the poller_output table. It then worked fine for 4 hours, but the problem came back.
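
To see how fast the backlog builds up again after a flush, something like this watches the table between poller runs. Credentials are placeholders; watch cannot answer a password prompt, so they would need to go in ~/.my.cnf:

    # Re-count the poller_output backlog once a minute.
    watch -n 60 "mysql -u cactiuser cacti -e 'SELECT COUNT(*) FROM poller_output;'"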

Also, two error log entries appear:

06/26/2012 05:31:22 PM - CMDPHP: Poller[0] ERROR: There are no RRA's assigned to local_data_id: 10802.

06/26/2012 05:02:09 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
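
For the first error, a query along these lines should list the data sources that have no RRAs attached. This assumes the Cacti 0.8.x schema, where data_template_data_rra maps each data source to its round-robin archives; credentials are placeholders:

    # Find local data sources with no RRA rows behind them.
    mysql -u cactiuser -p cacti -e "
      SELECT dtd.local_data_id, dtd.name_cache
      FROM data_template_data AS dtd
      LEFT JOIN data_template_data_rra AS dtdr
             ON dtdr.data_template_data_id = dtd.id
      WHERE dtd.local_data_id > 0
        AND dtdr.rra_id IS NULL;"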

But I can't find a solution while searching.

So, do you have any information about these logs, what leads to them, and how to solve them?

thanks

Re: LOGS warning

Post by artagel »

So it seems that at some point your poller isn't finishing a full poller cycle, which is then causing your poller_output table issue.

You need to figure out why your poller isn't finishing in 300 seconds.

Does this happen every poller cycle:
06/26/2012 10:57:13 AM - SYSTEM STATS: Time:1030.7991 Method:spine Processes:5 Threads:5 Hosts:517 HostsPerProcess:104 DataSources:53643 RRDsProcessed:26601

If it does, then you definitely have a problem. That is saying that it takes 17 minutes to perform a full poller cycle. Have you played around with optimizing the processes/threads of spine?
Are you sure your system can handle this many data sources? 53000 is a lot.
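
A quick way to check is to grep the log for the stats line. The path below assumes the default log location under the Cacti install directory; adjust it for your setup:

    # Show the last 20 poller cycle summaries.
    grep "SYSTEM STATS" /var/www/cacti/log/cacti.log | tail -n 20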

Can you post the technical support page information? Has this worked in the past or is this a relatively new installation? Can you check the cpu usage on the computer?
-Dan
Please mark the topic solved if this resolves your problem.

Re: LOGS warning

Post by ahmedgamil »

The attached file is the technical support page output.

Also, these logs appear every cycle:

06/28/2012 11:05:02 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 39120, Data Sources: traffic_in(DS[644]), traffic_out(DS[644]), traffic_in(DS[708]), traffic_out(DS[708]), traffic_in(DS[755]), traffic_out(DS[755]), traffic_in(DS[957]), traffic_in(DS[958]), traffic_out(DS[958]), traffic_in(DS[1059]), traffic_out(DS[1059]), traffic_in(DS[1078]), traffic_out(DS[1078]), traffic_in(DS[1105]), traffic_out(DS[1105]), traffic_in(DS[1106]), traffic_out(DS[1106]), traffic_in(DS[1109]), traffic_out(DS[1109]), traffic_in(DS[1113]), traffic_out(DS[1113]), Additional Issues Remain. Only showing first 20

06/28/2012 11:04:02 AM - SYSTEM STATS: Time:840.8509 Method:spine Processes:5 Threads:5 Hosts:510 HostsPerProcess:102 DataSources:52039 RRDsProcessed:26093

The time in "SYSTEM STATS: Time:840.8509" changes from cycle to cycle, but it is always over 500 seconds.
Attachments
technical support.txt
(14.34 KiB) Downloaded 64 times

Re: LOGS warning

Post by artagel »

You need the system stats time to be consistently under 300 seconds if you want to graph things every 5 minutes.
Do you have Boost installed? Since you have so many data sources, Boost could help you considerably. (http://docs.cacti.net/plugin:boost)
Earlier you said "the Average (ms) reaches 700". Did you mean that devices are averaging 700 ms a run?

The trick here is to figure out why your data collection is taking so long. You could move your SNMP timeout back down to something lower, like 300 ms, to see if it lowers your polling cycle time. You might have some items time out, but it'll help us narrow down the problem.
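
You can also time a single SNMP fetch against one of the slow devices straight from the Cacti server. Host, community string, and OID below are placeholders (the OID is sysUpTime):

    # -t 1 sets a 1-second timeout, -r 0 disables retries so the timing is clean.
    time snmpget -v 2c -c public -t 1 -r 0 10.0.0.1 .1.3.6.1.2.1.1.3.0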

You see, if you polled 517 devices serially at 700 ms each, that would take 361,900 ms, or 361.9 seconds; that's already more than 300 seconds! If some of them go all the way up to the 2000 ms timeout, you could easily be at 1000 seconds per run.

You could also decrease your overall time by increasing your threads. Try a number between 15 and 50. More threads means more devices being polled at the same time, so even if some devices are slow, they won't hold up as many others in the queue behind them. This could reduce your polling time significantly; a rough sketch of the effect is below.
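
As a back-of-envelope model only (it counts hosts, not data sources, and with 53,000+ data sources the per-data-source work is clearly dominating your real runtimes), here is the idealized effect of more threads:

    # wall time ~= hosts / total_threads * seconds_per_host (idealized lower bound)
    echo "5x5  threads: $(echo '517 / 25  * 0.7' | bc -l) s"   # ~14.5 s
    echo "5x30 threads: $(echo '517 / 150 * 0.7' | bc -l) s"   # ~2.4 s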

If you are going to try any of this, make sure you do it one at a time so you know how it's changing things.

Let us know what you find out.
-Dan
Please mark the topic solved if this resolves your problem.

Re: LOGS warning

Post by ahmedgamil »

Dear Dan
many thanks for your reply
artagel wrote: Earlier you said "the Average (ms) reaches 700". Did you mean that devices are averaging 700 ms a run?
The Average (ms) may reach 700 for some devices, not across all devices. But let me confirm: the Current (ms) value is the device response time, and it is not related to the SNMP timeout, right?

I actually increased the number of threads before, but nothing changed and the same log warnings appeared.

But I will do it again and report back.