DOCSIS Stats


jt72
Posts: 5
Joined: Mon Jul 14, 2008 6:25 pm
Location: Cork, Ireland

Answered my own question

Post by jt72 »

I was able to get docsIfDownChannelPower values above 80 graphed by changing the DOCSIS Stats data template to use a maximum of 200 and a minimum of -200.

Removing the data sources and graphs that used the old template and then adding them back did the trick.
Attachments
dspow.png (34.16 KiB) -- example graph
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Answered my own question

Post by BSOD2600 »

jt72 wrote:I was able to get docsIfDownChannelPower values above 80 graphed by changing the DOCSIS Stats data template to use a maximum of 200 and a minimum of -200.

Removing the data sources and graphs that used the old template and then adding them back did the trick.
Interesting that worked.

You only need to modify the data template, since that is what the rrd file is created from. Once the rrd file exists, you must run 'rrdtool tune' yourself to change the DS minimum/maximum; cacti does not do this for you.

No need to delete the graph template(s). Glad you figured it out.
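
For reference, the tune step looks something like this (the file path is a placeholder -- check your rra/ directory, and use 'rrdtool info' for the real DS names, since cacti truncates them):

Code: Select all

# show the current min/max for each DS in the rrd file
rrdtool info /var/www/cacti/rra/modem_docsis_stats_123.rrd | grep -E 'min|max'

# widen the limits to match the updated data template
rrdtool tune /var/www/cacti/rra/modem_docsis_stats_123.rrd \
    --minimum down_power:-200 --maximum down_power:200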
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Thread revival powers ACTIVATE!

So, I've got gapping in my DOCSIS graphs, and only my DOCSIS graphs. The gaps coincide with this poller warning more than half the time:
09/25/2008 06:08:06 PM - SPINE: Poller[0] Host[165] DS[904] SS[3] WARNING: Result from SERVER not valid. Partial Result: ...
Each polling cycle I get one or two errors like this; sometimes a cycle goes by without a problem. I just switched over to debug logging, and the request is not malformed -- just the response.

I didn't match the example graph above because I'm lazy, but you can see the gaps on one of the more commonly erroring devices in the attached picture.

The signal levels are perfectly within spec (nearly 0 forward, 44 return, great snr -- we shoot for 0/40/30+ around here), so it's not a signal issue that's causing this. Version information:

Ubuntu 8.04.1 / 2.6.24-16-server
Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.3 with Suhosin-Patch
SPINE 0.8.7c-beta1
Cacti v0.8.7b
RRDTool 1.2.19

I have a feeling it has something to do with some devices having two DOCSIS Stats data sources. I've been pruning the duplicates off (and have removed console access for several people to prevent this from happening again...), and that seems to help the situation a little. However, the host in the attached example has only two data sources total -- traffic and signal levels.

The other problem, which isn't too big of a deal, is visible on the attached graph: forward signal levels between -1 and 1 show up oddly formatted. Is there a fix for this?
Attachments
docsisgaps.jpg (35.89 KiB)
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

For the gaps, are you sure the modem is returning data? How is the returned data malformed?

As for the downstream signal level number issue, read what the m suffix means.
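
In short: rrdtool auto-scales numbers with SI prefixes, so a value of 0.062 prints as 62.00 m. If you want to pin the scale, something along these lines should do it (rrdtool 1.2 syntax; file and DS names are placeholders):

Code: Select all

# --units-exponent 0 pins the y-axis to base units instead of auto-scaling
rrdtool graph dspower.png --units-exponent 0 \
    DEF:pwr=modem.rrd:down_power:AVERAGE \
    LINE1:pwr#FF0000:"Downstream Power" \
    GPRINT:pwr:LAST:"Current\: %6.2lf dBmV\n"
# the m/k prefix in a legend comes from a %s in the GPRINT format;
# a plain %6.2lf with a fixed unit string prints the raw number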
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

BSOD2600 wrote:For the gaps, are you sure the modem is returning data? How is the returned data malformed?
The modem is online, active, pinging, etc. It just returns "U". See the snippet below.
09/25/2008 08:33:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:33:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:33:11 PM - SPINE: Poller[0] Host[108] DS[508] SS[3] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: docsIfDownChannelPower:62 docsIfSigQSignalNoise:370 docsIfSigQMicroreflections:32 docsIfCmRangingTimeout:5 docsIfCmStatusTxPower:337 docsIfCmStatusResets:636 docsIfCmStatusLostSyncs:3 docsIfCmStatusT1Timeouts:600 docsIfCmStatusT2Timeouts:0 docsIfCmStatusT3Timeouts:581 docsIfCmStatusT4Timeouts:2
09/25/2008 08:33:12 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055415558
09/25/2008 08:33:12 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105224887
09/25/2008 08:33:12 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:34:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:34:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:34:07 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] WARNING: Result from SERVER not valid. Partial Result: ...
09/25/2008 08:34:07 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: U
09/25/2008 08:34:08 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055452106
09/25/2008 08:34:08 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105227739
09/25/2008 08:34:08 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:35:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:35:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:35:11 PM - SPINE: Poller[0] Host[108] DS[508] SS[4] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: docsIfDownChannelPower:60 docsIfSigQSignalNoise:369 docsIfSigQMicroreflections:31 docsIfCmRangingTimeout:5 docsIfCmStatusTxPower:337 docsIfCmStatusResets:636 docsIfCmStatusLostSyncs:3 docsIfCmStatusT1Timeouts:600 docsIfCmStatusT2Timeouts:0 docsIfCmStatusT3Timeouts:581 docsIfCmStatusT4Timeouts:2
09/25/2008 08:35:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055480597
09/25/2008 08:35:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105230428
09/25/2008 08:35:11 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:36:05 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:36:05 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:36:08 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] WARNING: Result from SERVER not valid. Partial Result: ...
09/25/2008 08:36:08 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: U
09/25/2008 08:36:09 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055523279
09/25/2008 08:36:09 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105233108
09/25/2008 08:36:09 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:37:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:37:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:37:08 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] WARNING: Result from SERVER not valid. Partial Result: ...
09/25/2008 08:37:08 PM - SPINE: Poller[0] Host[108] DS[508] SS[0] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: U
09/25/2008 08:37:09 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055596255
09/25/2008 08:37:09 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105235049
09/25/2008 08:37:09 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:38:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:38:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:38:11 PM - SPINE: Poller[0] Host[108] DS[508] SS[5] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: docsIfDownChannelPower:59 docsIfSigQSignalNoise:368 docsIfSigQMicroreflections:31 docsIfCmRangingTimeout:5 docsIfCmStatusTxPower:337 docsIfCmStatusResets:636 docsIfCmStatusLostSyncs:3 docsIfCmStatusT1Timeouts:600 docsIfCmStatusT2Timeouts:0 docsIfCmStatusT3Timeouts:581 docsIfCmStatusT4Timeouts:2
09/25/2008 08:38:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055621686
09/25/2008 08:38:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105237965
09/25/2008 08:38:11 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:39:04 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:39:04 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:39:11 PM - SPINE: Poller[0] Host[108] DS[508] SS[3] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: docsIfDownChannelPower:56 docsIfSigQSignalNoise:371 docsIfSigQMicroreflections:32 docsIfCmRangingTimeout:5 docsIfCmStatusTxPower:337 docsIfCmStatusResets:636 docsIfCmStatusLostSyncs:3 docsIfCmStatusT1Timeouts:600 docsIfCmStatusT2Timeouts:0 docsIfCmStatusT3Timeouts:581 docsIfCmStatusT4Timeouts:2
09/25/2008 08:39:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055681073
09/25/2008 08:39:11 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105241585
09/25/2008 08:39:11 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/25/2008 08:40:03 PM - SPINE: Poller[0] Host[108] SNMP Result: Host responded to SNMP
09/25/2008 08:40:03 PM - SPINE: Poller[0] Host[108] RECACHE: Processing 1 items in the auto reindex cache for 'xx.xx.xx.xx'
09/25/2008 08:40:10 PM - SPINE: Poller[0] Host[108] DS[508] SS[3] SERVER: /usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500 , output: docsIfDownChannelPower:61 docsIfSigQSignalNoise:367 docsIfSigQMicroreflections:32 docsIfCmRangingTimeout:5 docsIfCmStatusTxPower:337 docsIfCmStatusResets:636 docsIfCmStatusLostSyncs:3 docsIfCmStatusT1Timeouts:600 docsIfCmStatusT2Timeouts:0 docsIfCmStatusT3Timeouts:581 docsIfCmStatusT4Timeouts:2
09/25/2008 08:40:10 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.2, value: 3055703631
09/25/2008 08:40:10 PM - SPINE: Poller[0] Host[108] DS[516] SNMP: v1: xx.xx.xx.xx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 105243299
09/25/2008 08:40:10 PM - SPINE: Poller[0] Host[108] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
I played with some settings. I currently have Spine at 2 max processes, 32 threads, 10 PHP script servers, and 20 OIDs per request. The results above were with the same settings but only 6 PHP script servers. When I turned that setting down to 1, I got about 15-25 of these warnings per cycle. With it at 10, the warnings have all but gone away (maybe one every 10-15 cycles).

I also have a new warning that came up:
09/25/2008 09:45:12 PM - PCOMMAND: Poller[0] Host[132] WARNING: Recache Event Detected for Host
This is with 10 php script servers.
BSOD2600 wrote:As for the downstream signal level number issue, read what the m suffix means.
I assumed as much, but is there a way to lock the units from the cacti side? I don't recall seeing one, though I haven't scoured the graph settings.


I'll play with some of the poller settings to see if I can eliminate the traces of warnings I'm still seeing.

Thanks for the input.
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Cut down to 1 process @ 32 threads with 10 PHP script servers -- no warnings at all.

Purging the logfile; I'll let it go overnight and grep for warnings.
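
For the record, the overnight check is nothing fancy -- something like this (log path assumed; adjust for your install):

Code: Select all

grep -c "WARNING: Result from SERVER not valid" /usr/share/cacti/site/log/cacti.log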

Thanks!
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Scratch that, still errors now and then. Went 50 minutes without an error though, heh.

Any ideas?
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

Offhand, my WAG would be that the modem isn't responding to the frequent requests... but if that were the case, the script server should be returning NaN for each field. hmmm

For testing, if you start up the php script server and manually run the docsis script in rapid succession, does every attempt return data?
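
If memory serves, that's just the following, using the exact command line spine logs for DS[508] (type quit to exit):

Code: Select all

php -q /usr/share/cacti/site/script_server.php
/usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500
/usr/share/cacti/site/scripts/ss_docsis_stats.php ss_docsis_stats xx.xx.xx.xx lolsnmpkey 1 161 500
quit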
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

I have other SNMP-based tools that pull the same information on request, and those can be automated at 15-second intervals or manually refreshed every 4-5 seconds.

I fired up a script server and pasted in the script query repeatedly in rapid succession without getting an error.

I then pasted in about 25 queries at once, and the script server errored out saying "Maximum runtime of 52 seconds exceeded for the Script Server. Exiting." after only 10 had replied.

Could it be a timeout issue causing this? I'm at 1-minute polling intervals, but there are only 35 entries. I'm at 1 spine process, 10 threads, 10 script servers -- the whole poll takes 8 seconds, including my other ~150 devices. Now and then it'll bump up to ~30 seconds for a couple of days (I have NO idea why) and then drop back to 8 seconds.

When I ran a spine poll in debug mode, I saw nothing about the script server timing out from taking too long.


Now, I'm not a pro at cacti or even snmp, but is there a way to eliminate the PHP script and have it query these values just as it would Interface Statistics? That would at least help narrow down the issue.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

asellus wrote:I fired up a script server and pasted in the script query repeatedly in rapid succession without getting an error.

I then pasted in about 25 queries at once, and the script server errored out saying "Maximum runtime of 52 seconds exceeded for the Script Server. Exiting." after only 10 had replied.

Could it be a timeout issue causing this?
I'm thinking you might be onto something there... although I haven't played around with my script in ages (and my current cable modem is SNMP-locked -- damn you, Comcast), so I can't even test.
asellus wrote:Now, I'm not a pro at cacti or even snmp, but is there a way to eliminate the PHP script and have it query these values just as it would Interface Statistics? That would at least help narrow down the issue.
No, because we need to fetch specific OIDs, which are not indexed the way Interface Statistics are. The OIDs for the cable modem stats do not change, whereas Interface Statistics indexes can and do.

If you open up ss_docsis_stats.php, you'll see that it merely executes the cacti_snmp_get function for each of those OIDs in an array. Many of my other scripts (like the MIB stats ones) do the exact same thing and users haven't reported problems... so I'm leaning towards an issue with your cacti installation or cable modem.
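
The pattern is roughly this -- a from-memory sketch, not the script verbatim; the OID instance suffixes are placeholders, and cacti_snmp_get's exact argument list is in lib/snmp.php:

Code: Select all

<?php
// pattern sketch only: the real script runs inside cacti's script server,
// which loads lib/snmp.php and passes host/community/version/port/timeout
// as arguments (the same ones visible in the poller log lines above)
$oids = array(
	'docsIfDownChannelPower' => '.1.3.6.1.2.1.10.127.1.1.1.1.6.3',  // instance suffix assumed
	'docsIfSigQSignalNoise'  => '.1.3.6.1.2.1.10.127.1.1.4.1.5.3',
);
$result = '';
foreach ($oids as $name => $oid) {
	// one SNMP GET per fixed OID -- argument list approximate
	$value   = cacti_snmp_get($hostname, $community, $oid, $version, '', '', $port, $timeout);
	$result .= $name . ':' . $value . ' ';
}
print trim($result);
?>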
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Makes sense. A shame that the Comcast binfiles lock out SNMP traffic on all but the cable interface...

I'm inclined to believe it's on the server end of things, since turning the setting down to one script server and running these every minute triggers the issue far more often than running 10 servers does. On the other hand, I found out the script server will only stay running for 52 seconds, period, because of the polling cycle.

I'm pretty stumped. It's happening way less now with 10 script servers, so whatever I guess.

Thanks for the help, I'll be sure to hit this thread up if something else develops.
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Figured out why it errors: it simply takes too long to return all those results, and the script server halts (max 52-second runtime or whatever), hence my desire to make it part of the regular poller. Too bad it isn't indexed.

I trimmed the docsis script to pull only the data I actually use, and it works just fine now. I still get a reindex cache warning here and there, but otherwise everything polls as it should.

One issue remains, and it was present throughout all of this: hosts will randomly fail to populate downstream power on the graph, despite the script returning a valid value for it. Manual runs show that yes, the value is indeed there. I only have it pulling downstream power, upstream power, and downstream snr.

It's truly random -- the affected hosts have nothing in common beyond the cable modem model and firmware version, and even then, the working hosts are on the same hardware and software. They're on different systems, different hubs, different nodes... it's baffling.

Any ideas to point me at what I should start breaking? :)

Thanks,
-Ian
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

I'm still happy to blame your modem for all these issues -- it shouldn't be timing out for 52+ seconds on SNMP queries. Maybe tweaking cacti's setting for how many SNMP requests it sends at once will make things better.

You should change your cacti logging level to medium and then watch whether the blank downstream spots coincide with a lack of data returned during those polling sessions. Furthermore, you could do an rrdtool fetch (or dump) and validate whether there truly is downstream data logged.
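
For example (the file path is a placeholder -- yours is under rra/):

Code: Select all

# print the last hour of stored values for every DS in the file
rrdtool fetch /var/www/cacti/rra/modem_docsis_stats_123.rrd AVERAGE --start -1h

# or dump the whole file to XML and search for the downstream DS
rrdtool dump /var/www/cacti/rra/modem_docsis_stats_123.rrd | less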
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

I figured you might want to know...

It's been going for over half an hour now, but I believe I've sort of fixed the situation.

I rewrote the PHP script to not require the script server, then ghettotastically called it with a perl wrapper, completely bypassing the script server. This fixed all the problems but unfortunately added about 7 seconds to my poll time. We'll see if that grows as more devices are added, but it's looking promising so far!
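
For the curious, the standalone version boils down to something like this (a reconstructed sketch rather than my exact script; it assumes PHP's snmp extension, and the OID instance suffixes are specific to my modems):

Code: Select all

<?php
// standalone DOCSIS poll: no cacti script server involved
// usage: php docsis_poll.php <host> <community>
snmp_set_quick_print(true);          // bare values, no "INTEGER:" prefix
$host      = $argv[1];
$community = $argv[2];
$oids = array(
	'docsIfDownChannelPower' => '.1.3.6.1.2.1.10.127.1.1.1.1.6.3',
	'docsIfCmStatusTxPower'  => '.1.3.6.1.2.1.10.127.1.2.2.1.3.2',
	'docsIfSigQSignalNoise'  => '.1.3.6.1.2.1.10.127.1.1.4.1.5.3',
);
$out = '';
foreach ($oids as $name => $oid) {
	$value = @snmpget($host, $community, $oid, 500000, 2);  // 0.5s timeout, 2 retries
	$out  .= $name . ':' . ($value === false ? 'U' : trim($value)) . ' ';
}
print trim($out) . "\n";
?>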

I still have that weird downstream power issue, but no more gaps at this time! I suspect a min/max conflict of some sort, since the data is being returned by the script but not written into the rra.
asellus
Cacti User
Posts: 50
Joined: Thu Feb 21, 2008 12:09 am

Post by asellus »

Unf.

Last night I finally sat down and fixed things. I created data templates for the specific OIDs needed (which means multiple rrd files behind the single graph, but a little overhead is worth it imo). This decreased server load immensely and sped up polling a TON as well.

Went from 14-18 seconds per poll (spiking to 40) down to 5.7 seconds.

Code: Select all

11/20/2008 02:06:07 PM - SYSTEM STATS: Time:5.6942 Method:spine Processes:1 Threads:16 Hosts:213 HostsPerProcess:213 DataSources:682 RRDsProcessed:479
11/20/2008 02:05:07 PM - SYSTEM STATS: Time:5.7220 Method:spine Processes:1 Threads:16 Hosts:213 HostsPerProcess:213 DataSources:682 RRDsProcessed:476
11/20/2008 02:04:07 PM - SYSTEM STATS: Time:5.7301 Method:spine Processes:1 Threads:16 Hosts:213 HostsPerProcess:213 DataSources:679 RRDsProcessed:476
11/20/2008 02:03:07 PM - SYSTEM STATS: Time:5.7210 Method:spine Processes:1 Threads:16 Hosts:213 HostsPerProcess:213 DataSources:679 RRDsProcessed:475
11/20/2008 02:02:07 PM - SYSTEM STATS: Time:5.7464 Method:spine Processes:1 Threads:16 Hosts:212 HostsPerProcess:212 DataSources:679 RRDsProcessed:477
11/20/2008 02:01:07 PM - SYSTEM STATS: Time:5.7352 Method:spine Processes:1 Threads:16 Hosts:212 HostsPerProcess:212 DataSources:679 RRDsProcessed:479
No more issues polling the forward signals either, and the graph only gaps when the host is actually down instead of randomly when the script server (or script in my case) times out. :)

So I'm going to have to disagree with your previous statement: yes, you CAN poll specific, un-indexed OIDs using spine. :)