95th percentile: how to calculate it

cornut
Posts: 16
Joined: Tue Oct 25, 2005 11:04 am

Post by cornut »

Hi,

I know it's not really a Cacti question, but I'd like to explain my problem to you.
I have Cacti graphs with IN, OUT, and a 95th percentile line, which work great (|95:bits:0:aggregate_max:2|). My question is: how is the 95th percentile calculated? With RRDFetch I'm able to get the individual IN and OUT values for my graphs, and I can calculate, for example, the average bandwidth; it gives me the same result as Cacti does. I'd like to do the same with the 95th percentile:

- What is the mathematical formula for the 95th percentile?
- Is it possible to get the 95th percentile of a period from an RRD file?
- Is the 95th percentile based on IN+OUT, or just the higher of the two values?

Thank you in advance for your help. I know it is not a Cacti problem, but I would like to calculate the 95th percentile of an RRD file from a Perl script and get the same result as Cacti.
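For reference, here is a minimal Perl sketch of turning `rrdtool fetch` output into one array per data source. It assumes the usual text layout (a header line with the data source names, a blank line, then `timestamp: value value ...` rows) and assumes the Cacti-style names `traffic_in`/`traffic_out`. The helper `parse_fetch` is hypothetical, and skipping any row that contains an unknown (`nan`) value is an assumption about how unknowns should be treated, not a statement of what Cacti does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split `rrdtool fetch` text output into one array
# per data source column. Rows containing an unknown value (nan) are
# skipped entirely so that IN and OUT stay aligned row by row.
sub parse_fetch {
    my @lines = @_;    # copy, so chomp never touches the caller's data
    my @names;
    my %columns;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if (!@names) {
            # First non-blank line holds the data source names
            @names = split ' ', $line;
            $columns{$_} = [] for @names;
            next;
        }
        my ($ts, $rest) = split /:\s*/, $line, 2;
        next unless defined $rest;
        my @values = split ' ', $rest;
        next unless @values == @names;      # ignore malformed rows
        next if grep { /nan/i } @values;    # skip rows with unknown values
        push @{ $columns{ $names[$_] } }, $values[$_] + 0 for 0 .. $#names;
    }
    return %columns;
}

# In practice the lines would come from something like:
#   open(my $fh, '-|', 'rrdtool', 'fetch', $rrd_file, 'AVERAGE') or die $!;
my @sample = (
    "                  traffic_in         traffic_out\n",
    "\n",
    "1130000000: 1.2000000000e+03 4.5000000000e+03\n",
    "1130007200: nan nan\n",
    "1130014400: 8.0000000000e+02 5.1000000000e+03\n",
);
my %ds = parse_fetch(@sample);
print "OUT values: @{ $ds{traffic_out} }\n";   # prints "OUT values: 4500 5100"
```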
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

Taken from: http://www.cacti.net/downloads/docs/htm ... _VARIABLES
aggregate_max:
Calculates the Nth percentile by selecting the highest value for each summed value of like data sources and selecting the maximum value of that set to calculate the Nth percentile value. Example, you have a graph with 5 traffic_in and 18 traffic_out data sources. The traffic_in rows are summed together, then the traffic_out rows are summed together, then for each row, the higher of the 2 values is selected. The Nth percentile is calculated from the resulting maximum values.
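The aggregate_max description above can be sketched in Perl roughly as follows. This is a hypothetical illustration of the documented steps, not Cacti's own source: the helper names `nth_percentile` and `aggregate_max_percentile` are made up, each data source is assumed to be an array reference of samples covering the same time rows, and the percentile uses the sort-and-round method described later in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nth percentile of a list of values: sort, then pick the element at
# round(count * Nth / 100), counting from 1.
sub nth_percentile {
    my ($nth, @values) = @_;
    my @sorted = sort { $a <=> $b } @values;
    my $pos = int(@sorted * $nth / 100 + 0.5);   # normal round for positive numbers
    $pos = 1 if $pos < 1;                          # guard for very small data sets
    return $sorted[$pos - 1];
}

# Hypothetical sketch of aggregate_max: $in_sources and $out_sources are
# references to lists of array references, one array per data source,
# all covering the same time rows.
sub aggregate_max_percentile {
    my ($nth, $in_sources, $out_sources) = @_;
    my $rows = @{ $in_sources->[0] };
    my @row_max;
    for my $i (0 .. $rows - 1) {
        my ($in_sum, $out_sum) = (0, 0);
        $in_sum  += $_->[$i] for @$in_sources;    # sum the "in" sources on this row
        $out_sum += $_->[$i] for @$out_sources;   # sum the "out" sources on this row
        push @row_max, $in_sum > $out_sum ? $in_sum : $out_sum;   # keep the higher one
    }
    return nth_percentile($nth, @row_max);
}

# Two hypothetical "in" sources and one "out" source over three rows.
my @in  = ([10, 50, 20], [5, 5, 5]);
my @out = ([30, 10, 40]);
print aggregate_max_percentile(95, \@in, \@out), "\n";   # per-row maxima 30, 55, 40 -> prints 55
```

With a single IN and a single OUT source, this reduces to taking the higher of IN and OUT on each row, so a calculation that uses only OUT will match whenever OUT is at least as high as IN on every row. If the RRD stores bytes per second, as Cacti's traffic data sources usually do, a `bits` result would also need the values multiplied by 8; that conversion is left out of this sketch.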
Wikipedia Entry:
http://en.wikipedia.org/wiki/Burstable_billing

If you have more questions, let me know, and I will sit down and write out how Cacti does it.
Tony Roman
Experience is what causes a person to make new mistakes instead of old ones.
There are only 3 ways to complete a project: Good, Fast or Cheap, pick two.
With age comes wisdom, what you choose to do with it determines whether or not you are wise.

Post by cornut »

Thanks for your answer, but I'm not sure I understand.
Example:
I use "Monthly (2 Hour Average)" (I know it's better to use Monthly 5 min, but this is fine for my use). So for February, with 28 days and twelve 2-hour periods per day, my RRD contains:
28*12 = 336 rows,
that is, 336 IN values and 336 OUT values.
How can I calculate the 95th percentile with these values?

Thanks in advance for any insight :D

Post by rony »

OK, for the sake of this post, we are going to talk about a generic data set called "whatchas". This data set is an ever-increasing counter in RRDTool. Why "whatchas"? Because I think it's funny, and we need to have some fun. :)

OK, so we have a series of stored data, and we want to calculate the 95th percentile for it.

1. We take the set of data; this data is an array of values retrieved from RRDtool.

```
whatchas = array: [ 2345, 3456, 1234, 5678, 7890, 5675, 3452, 56758, 2345, 234, 57788 ]
```

2. We sort the array, because we want the value that sits 95% of the way up the ordered data, not simply the largest value.

```
whatchas = array: [ 234, 1234, 2345, 2345, 3452, 3456, 5675, 5678, 7890, 56758, 57788 ]
```

3. We take the number of elements in the array (11), multiply it by the Nth percentage we want (.95), and round the result.

```
round( 11 * .95 ) = round( 10.45 ) = 10 = array element of interest
```

4. We then return the 95th percentile value, which is the 10th element of the sorted array (counting from 1).

```
[ 234, 1234, 2345, 2345, 3452, 3456, 5675, 5678, 7890, 56758, 57788 ](10) = 56758 = 95th Percentile
```

Note: this is a normal round. A fractional part below .5 rounds down, and .5 or above rounds up.
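Step 3 can be checked with a few lines of Perl. The helper names below are hypothetical; `round_half_up` follows the normal rounding from the note above and assumes positive inputs. The second call applies the same arithmetic to a month of 2-hour averages (28 * 12 = 336 rows), as in the February example earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Normal rounding: a fractional part of .5 or more rounds up
# (only valid for the positive numbers used here).
sub round_half_up {
    my ($x) = @_;
    return int($x + 0.5);
}

# 1-based position of the Nth percentile value in a sorted list of $count values.
sub percentile_position {
    my ($count, $nth) = @_;
    return round_half_up($count * $nth / 100);
}

print percentile_position(11, 95), "\n";         # 11 * 0.95 = 10.45 -> prints 10
print percentile_position(28 * 12, 95), "\n";    # 336 * 0.95 = 319.2 -> prints 319
```

For 336 samples the 95th percentile is therefore the 319th sorted value, and the 17 samples sorted above it are ignored.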

Also, this is a simplified version of the 95th percentile calculation. It is important to note that all Nth percentile calculations use this same principle; most variants just alter the data set array (for example, by summing or taking the maximum of several data sources) before selecting the Nth percentile value.

Calculating multiple Nth percentile values and then adding them together will not give the same value as adding the individual values from the same time span and then calculating the 95th percentile. That is why you see the aggregate Nth percentile variables in Cacti: these variables add the data sets together before they calculate the Nth percentile.
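A small demonstration of the point above, using two hypothetical data sources over the same 20 time rows: one ramps up while the other ramps down, so their per-row sum is flat. The `nth_percentile` helper is a hypothetical name and uses the same sort-and-round method as the steps above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sort-and-round Nth percentile as in the steps above.
sub nth_percentile {
    my ($nth, @values) = @_;
    my @sorted = sort { $a <=> $b } @values;
    my $pos = int(@sorted * $nth / 100 + 0.5);
    $pos = 1 if $pos < 1;
    return $sorted[$pos - 1];
}

my @source_a = (1 .. 20);            # ramps up over the period
my @source_b = reverse(1 .. 20);     # ramps down over the same period
my @summed   = map { $source_a[$_] + $source_b[$_] } 0 .. $#source_a;   # 21 on every row

my $sum_of_percentiles = nth_percentile(95, @source_a) + nth_percentile(95, @source_b);
my $percentile_of_sum  = nth_percentile(95, @summed);

print "Sum of the 95th percentiles: $sum_of_percentiles\n";   # 19 + 19 = 38
print "95th percentile of the sum:  $percentile_of_sum\n";    # 21
```

The two approaches disagree (38 versus 21) because the bursts of the two sources happen at different times, which is exactly what summing row by row first captures.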

Example Perl code:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Nth percentile data
my $Nth = 95;
my @data = qw(2345 3456 1234 5678 7890 5675 3452 56758 2345 234 57788);

# Sort the data numerically
my @data_sorted = sort { $a <=> $b } @data;

# Calculate the (1-based) position of the Nth percentile value
my $pos = round( scalar(@data) * ( $Nth / 100 ) );
$pos = 1 if $pos < 1;    # avoid $data_sorted[-1] on very small data sets

# Print the data set
print "Data: " . join(",", @data) . "\n";
print "Sorted: " . join(",", @data_sorted) . "\n";

# Print the Nth percentile (Perl arrays are 0-based, hence $pos - 1)
print $Nth . "th Percentile: " . $data_sorted[$pos - 1] . "\n";

exit;

# Normal rounding: a fractional part of .5 or more rounds away from zero
sub round {
        return int($_[0] + .5 * ($_[0] <=> 0));
}
```

Post by cornut »

Hi Tony,
And thank you very much for your answer! I think I've understood how to calculate it, but I have one last question for you:

Should I use the OUT values, the IN values, or both to get the result?

For now I do this, using only OUT traffic:

```perl
my @SortedValuesOut = sort { $a <=> $b } @ValuesOut;
sub roundOut {
        return int($_[0] + .5 * ($_[0] <=> 0));
}
my $posOut = roundOut( scalar(@ValuesOut) * ( $Nth / 100 ) );
$SortedValuesOut[$posOut - 1] = (((($SortedValuesOut[$posOut - 1]* 8 ) / 1024) / 1024) / 1024) / 1024;
$SortedValuesOut[$posOut - 1] = sprintf("%.2f", $SortedValuesOut[$posOut - 1]);

print $Nth . "th Percentile OUT : " . $SortedValuesOut[$posOut - 1] . "\n";
```

With that, on some graphs I'm very close to the 95th percentile value given by Cacti, but on other graphs it's not good at all. Is it chance? Is it because I use aggregate_max and the formula is not exactly the same? And what about inbound traffic (it is generally very low on my graphs)?

Thanks again for your valuable help

Post by cornut »

That's OK, I had a bug in my program. Now I get exactly the right result; I don't need to divide by 1024.
Thank you for your help, rony!

Bye