Where do the CPU statistics data templates perform the math that converts the ssCpuRaw counters into percentages? The problem I am having is that FreeBSD/Net-SNMP reports 128 ticks per second, as opposed to the "standard" 100 ticks per second. Many of the Linux hosts in our network with an SMP kernel also report 100 ticks per CPU per second.
Somewhere there is an equation that takes the number of ssCpuRawIdle ticks collected and divides it by the total number of possible ticks in the 5-minute interval to determine the percentage:

ssCpuRawIdle / (100 ticks/sec * 60 sec * 5 min)

If I can find this, I should be able to modify it in a new data template that uses 128 ticks for the BSD hosts and 200 for the dual-CPU SMP Linux hosts.
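To make the mismatch concrete, here is a minimal Python sketch of that math (the function name and sample values are mine, for illustration only):

```python
# Illustrative sketch of the ticks-to-percentage math described above.
# ticks_per_sec is 100 on most Linux hosts, 128 on these FreeBSD boxes,
# and 200 for a dual-CPU SMP Linux kernel (100 per CPU).

def idle_percentage(idle_ticks_delta, ticks_per_sec=100, interval_sec=300):
    """Convert a delta of ssCpuRawIdle ticks over one polling interval
    into a percentage of the total possible ticks."""
    total_possible_ticks = ticks_per_sec * interval_sec
    return 100.0 * idle_ticks_delta / total_possible_ticks

# A host reporting 128 ticks/sec that was idle the whole 5 minutes:
print(idle_percentage(128 * 300, ticks_per_sec=128))  # 100.0
# The same delta interpreted with the assumed 100 ticks/sec reads 128%:
print(idle_percentage(128 * 300))  # 128.0
```

That 128% reading on a fully idle box is exactly the symptom I'm seeing.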
Help? Rax, can you tell me where this is? I will be happy to share my work/templates when I get this solved for anyone interested.
Monitoring CPU Statistics
Temporary? Workaround
I have a temporary workaround for anyone who is interested. What I have done is make a BSD-specific CPU graph template. I also created two new CDEFs: one divides the current data source by 1.28, and the other divides all data sources (without duplicates) by 1.28. I then applied the appropriate CDEF to all of the data sources and their GPRINTs in the new BSD graph template. I plan to repeat this process for SMP Linux hosts shortly. If anyone is interested I will attempt to export my templates and/or my CDEFs. Please post on this thread rather than PM.
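For anyone building these by hand rather than waiting on my export, each CDEF is just a two-item RPN expression. As best I recall the special item names from Cacti's CDEF editor (so treat these as an assumption, not gospel), they come out to something like:

Code: Select all
CURRENT_DATA_SOURCE,1.28,/
ALL_DATA_SOURCES_NODUPS,1.28,/

The first goes on graph templates where each item should be scaled individually; the second is for the "all data sources, no duplicates" case.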
While this works for graphing the data "accurately" I still would like to fix the actual data collection process. I mine data out of the rrd files from time to time and it can be quite difficult to remember which file needs what type of data conversion. If anyone figures out where that ticks-to-percentage math is performed please let me know.
Thank you, thank you, thank you Ian for coming up with templates! This is yet another instance where they have saved the day.
Two more things
You will also need to modify your "ucd/net - CPU Usage - type" data templates to have a max value that accommodates the 128 ticks per second (or 200, 400, etc. for SMP Linux hosts). I have set this to 130 for now.
After modifying these templates, the rrd files need to be updated with the new max. This can be accomplished in several ways, but the easiest is to remove the files and start with fresh data. If you cannot give up your old data, look into the rrdtool dump/restore commands. Not terribly hard for most users.
If you are uncomfortable with modifying the data templates you can always create duplicates and modify them. Be sure to modify your graph templates accordingly.
Once again - the thick plottens
Another setback - HyperThreading
It appears that HyperThreading CPUs are also causing issues with timeticks. I have now created an additional BSD template that uses additional CDEFs to divide everything by 2.56, to account for the 256 timeticks per second reported on a Dell 2550 with dual Pentium III 1.4GHz procs running FreeBSD 4.8.
Has anyone else had to deal with this?
Can someone also help get Ian's attention so I can get an answer to the data collection part of this thread?
I don't know what you really mean by "ticks", but looking at the dmesg of my FreeBSD 5.3 box, I found this:

Code: Select all
Timecounters tick every 10.000 msec

Isn't that nearer to 100 than 128?

Maybe what I said is irrelevant, just wondering. Please tell me how to make sure what my tick value is, and how to check whether CPU time is properly reported in Cacti.

Thanks.
Perhaps they have fixed it in 5.3. All of the boxes I have to work with are 4.8-4.10. The way I calculated the number of ticks was to walk the box at .1.3.6.1.4.1.2021.11, grep out the ssCpuRaw lines, and add their values. I then performed the same operation 60 seconds later, determined the delta, and divided by 60 to figure out the number of ticks reported per second.
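The arithmetic of that measurement, as a small Python sketch (the counter values below are made up; in practice they come from summing the ssCpuRaw* lines of two snmpwalks taken 60 seconds apart):

```python
# Sketch of the tick-rate measurement described above. The sums are
# invented sample values, not real counter readings.

def ticks_per_second(first_sum, second_sum, interval_sec=60):
    """Given the summed ssCpuRaw* counters from two walks taken
    interval_sec apart, return the host's reported ticks per second."""
    return (second_sum - first_sum) / interval_sec

# e.g. a FreeBSD 4.8 box whose summed counters grew by 7680 ticks:
print(ticks_per_second(1_000_000, 1_007_680))  # 128.0
```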
Hope this helps.
How SNMP cpu is calculated and graphed
For systems like Linux <= 2.6.x (I don't remember the exact version), there are 100 ticks/sec. For each tick that the CPU spends in a certain state (idle, system, user), the respective SNMP counter is incremented. Remember: 100 ticks/sec.
Now, the SNMP data type is a "Counter". When recorded by RRD, this data type is always calculated and stored as <value>/sec. So, suppose we snmpget and find 100,000 for cpuSystem, then probe again 5 minutes later (the default polling interval for Cacti and the resolution of our RRDs) and get 124,000. rrdtool takes 124,000 - 100,000 to get 24,000. Since this is of type Counter, it divides that by 300 (5 minutes' worth of seconds) and gets 80. It then plots 80, which represents 80%. No calculation is done in Cacti.
The maximum number of ticks that could have happened in 5 minutes is 300 * 100 = 30,000. So you can see that 24,000 / 30,000 = 80%. Voilà.
For systems that have multiple CPUs or have a different tick count, just create a custom CDEF to adjust the display.
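The worked example above, written out as a small Python sketch:

```python
# The rrdtool counter math from the explanation above: rrdtool stores a
# COUNTER data source as (delta / step), so for a 100 ticks/sec host the
# stored rate is already the percentage.

def stored_rate(prev, curr, step_sec=300):
    """What rrdtool records for a COUNTER data source (ignoring
    counter wraps) between two polls step_sec apart."""
    return (curr - prev) / step_sec

rate = stored_rate(100_000, 124_000)  # (124,000 - 100,000) / 300
print(rate)                           # 80.0 -> plotted as 80%

# Sanity check against the max possible ticks in the interval:
max_ticks = 300 * 100                 # 30,000
print(100 * (124_000 - 100_000) / max_ticks)  # 80.0
```

On a 128 ticks/sec host the stored rate and the true percentage diverge by that same 1.28 factor, which is why the CDEF workaround earlier in the thread works.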