NetApp Filer: graphing Performance Stats and IO's (template)


whippy
Posts: 4
Joined: Thu Jun 18, 2009 5:50 am

Re: Not discovering objects

Post by whippy »

kkoduru wrote:
Hi Gurus,

I was able to get the ONTAP SDK and import the template. The first issue I hit was with Perl, which complained about "\N", so I had to give the entire path with double backslashes:
use lib "C:\\manage-ontap-sdk-1.6\\lib\\perl\\NetApp";

Now, when I discover the filer, it cannot find any objects and shows this message:
This data query returned 0 rows, perhaps there was a problem executing this data query. You can run this data query in debug mode to get more information.

Running it in verbose (debug) mode gives the following output:

+ Running data query [17].
+ Found type = '4 '[script query].
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system index'
+ Executing script query 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system query index'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'

If I run the script manually, it works just fine and returns the information I expect to see (the LUN info, etc.).

Could you please point out what I am doing wrong?

Thanks in advance,
KK
I'm getting exactly the same output as you had above. Any clues as to what I'm doing wrong?

Data Query Debug Information

+ Running data query [12].
+ Found type = '4 '[script query].
+ Found data query XML file at '/var/www/html/resource/script_queries/query-netapp-ontapsdk-volume.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl /var/www/html/scripts/netapp-ontapsdk-perf.pl **** "*****" "****" volume index'
+ Executing script query 'perl /var/www/html/scripts/netapp-ontapsdk-perf.pl **** "****" "****" volume query index'
+ Found data query XML file at '/var/www/html/resource/script_queries/query-netapp-ontapsdk-volume.xml'
+ Found data query XML file at '/var/www/html/resource/script_queries/query-netapp-ontapsdk-volume.xml'
+ Found data query XML file at '/var/www/html/resource/script_queries/query-netapp-ontapsdk-volume.xml'
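One thing worth checking when Cacti reports 0 rows but the script works by hand is whether the two script calls shown in the debug log agree with each other. For a script query, Cacti expects the "index" call to print one index per line and the "query index" call to print lines of the form index, delimiter, value. The sketch below is a hypothetical helper (check_script_query is not part of the template) that compares saved output of both calls. It assumes the data query XML uses ":" as its output delimiter; pass the real delimiter as the third argument if yours differs. It may also be worth capturing the output as the same user the Cacti poller runs as, since that user can have a different environment from your login shell.

```shell
#!/bin/sh
# check_script_query INDEX_FILE QUERY_FILE [DELIM]
#   INDEX_FILE: saved output of "netapp-ontapsdk-perf.pl ... <object> index"
#   QUERY_FILE: saved output of "netapp-ontapsdk-perf.pl ... <object> query index"
#   DELIM:      output delimiter of the data query XML (":" assumed by default)
# Prints every index that has no matching "index<DELIM>value" line in
# QUERY_FILE and returns non-zero if anything is missing or a file is empty.
check_script_query() {
    csq_index=$1
    csq_query=$2
    csq_delim=${3:-:}

    if [ ! -s "$csq_index" ]; then
        echo "index output is empty"
        return 1
    fi
    if [ ! -s "$csq_query" ]; then
        echo "query output is empty"
        return 1
    fi

    # First file (query output): remember the text before the delimiter.
    # Second file (index output): report indexes that were never seen.
    awk -F"$csq_delim" '
        FNR == NR { seen[$1] = 1; next }
        !($0 in seen) { print "missing in query output: " $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$csq_query" "$csq_index"
}
```

For example, redirect the two commands from the debug log into idx.txt and query.txt and run: check_script_query idx.txt query.txt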
wwwdrich
Cacti User
Posts: 91
Joined: Thu Feb 03, 2005 5:53 pm
Location: San Jose, CA

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by wwwdrich »

I have been using these scripts for a few months now and love them! I have one problem though - our storage team frequently adds new volumes and also moves things around. Does anyone have any suggestions for automating the process of managing this?

Adding new volumes is easy. I have a script that walks all of the NetApps in the database and adds new graphs using the CLI tools (host template 28 is my NetApp host template, snmp-query-id 31 is my NetApp OntapSDK volume query, and snmp query types 67 and 68 are IO and Latency):

Code: Select all

#!/bin/sh

cd /var/www/html/cacti/cli || exit 1

# Walk every host that uses host template 28 (NetApp)
for host in $(php add_graphs.php --list-hosts | awk '$3 == 28 {print $1}'); do
  echo "$host"
  # Reindex host by re-applying the NetApp host template
  php host_update_template.php --host-id="$host" --host-template=28

  # Add IO (query type 67) and Latency (query type 68) graphs for every volume index
  for vol in $(php add_graphs.php --list-snmp-values --snmp-field=index --snmp-query-id=31 --host-id="$host" | grep -Evi 'Known'); do
    php add_graphs.php --host-id="$host" --graph-type=ds --graph-template-id=163 --snmp-query-id=31 --snmp-field=index --snmp-value="$vol" --snmp-query-type-id=67
    php add_graphs.php --host-id="$host" --graph-type=ds --graph-template-id=164 --snmp-query-id=31 --snmp-field=index --snmp-value="$vol" --snmp-query-type-id=68
  done
done
However, the one thing I can't figure out is how to automate the removal of graphs whose volumes have been removed. Sure, I can search the logs and remove them by hand for the volumes that report errors, but that is painful and time-consuming. Does anyone have any ideas for how to automate it? Unfortunately there isn't an rm_graphs.php script in the CLI directory.
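Finding which graphs to remove boils down to a set difference: the indexes that have graphs minus the indexes the filer still reports. A minimal sketch of just that step is below. stale_indexes is a hypothetical helper; how you build the list of graphed indexes is left as an assumption, and so is the actual deletion of the graphs.

```shell
#!/bin/sh
# stale_indexes GRAPHED_FILE CURRENT_FILE
#   GRAPHED_FILE: indexes that currently have graphs, one per line
#                 (how you build this list is an assumption, not shown here)
#   CURRENT_FILE: indexes the filer reports now, e.g. saved output of
#                 "netapp-ontapsdk-perf.pl <filer> <user> <pass> volume index"
# Prints the indexes that are graphed but no longer reported by the filer,
# i.e. the candidates for graph removal.
stale_indexes() {
    si_dir=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way
    sort -u "$1" > "$si_dir/graphed"
    sort -u "$2" > "$si_dir/current"
    # -2 hides lines only in CURRENT, -3 hides lines in both:
    # what remains is only in GRAPHED
    comm -23 "$si_dir/graphed" "$si_dir/current"
    rm -rf "$si_dir"
}
```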
- Dan
"Step up to red alert!" "Are you sure, sir? It means changing the bulb in the sign..." - Red Dwarf
fotan
Posts: 7
Joined: Tue Dec 13, 2011 1:59 am
Location: Russia, Ryazan

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by fotan »

I am trying to use the SDK, but all my data queries show the status "Success [0 Items, 0 Rows]". I installed the SDK in Cacti following the instructions. In debug mode I get the following information:
+ Running data query [26].
+ Found type = '4 '[script query].
+ Found data query XML file at '/var/www/cacti/resource/script_queries/query-netapp-ontapsdk-lun.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl /var/www/cacti/scripts/netapp-ontapsdk-perf.pl xxx.xxx.xxx.xxx "changeto username" "changeto password" lun index'
+ Executing script query 'perl /var/www/cacti/scripts/netapp-ontapsdk-perf.pl xxx.xxx.xxx.xxx "changeto username" "changeto password" lun query index'
+ Found data query XML file at '/var/www/cacti/resource/script_queries/query-netapp-ontapsdk-lun.xml'
+ Found data query XML file at '/var/www/cacti/resource/script_queries/query-netapp-ontapsdk-lun.xml'
+ Found data query XML file at '/var/www/cacti/resource/script_queries/query-netapp-ontapsdk-lun.xml'

HTTP is enabled on the NetApp. What am I doing wrong? I'd be glad of any help...
jbossert
Posts: 3
Joined: Thu Jan 26, 2012 2:04 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by jbossert »

Can someone provide me with the Manage ONTAP SDK 3.0 Perl API files?

As the current NetApp ONTAP SDK version is now 4.1, I don't see any Perl scripts in the archive (it's .NET bindings now)...

Or is there a workaround to get stats and IOs without the SDK? (I'm able to get usage, CPU, cache age, ... out of SNMP, but not the IOs.)
yalla
Posts: 2
Joined: Tue May 19, 2009 9:44 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by yalla »

Hey guys,

First post for me! :-) We have a bunch of FAS3240 filers, and I was quite happy to find Cacti templates that seemed to work right away. I was a little annoyed that my filers' APIs report microseconds, so I added a CDEF that divides by 10^6.

However, the volume latency numbers were totally off. They showed latencies of around 2-7 seconds for read and average, but only 20 milliseconds for write. OK, this filer churns about 450 MB/s, but *anyway*, this is surely not normal. I'm also currently debugging some performance problems, so I investigated a little further.

I logged into the Solaris clients and measured the NFS latency with iostat -x. It showed latencies of around 4-5 milliseconds. I was so confused that I took it one step further and checked the nfsv3 API and the nfs_read_latency counters, which are in milliseconds, by the way. They also gave readings of about 4-5 milliseconds.

So we took a whiteboard and worked through some figures.

The original formula

[Image: the original latency formula (not preserved)]

basically does the following. You grab a counter at t0. Then the Cacti poller waits its 300 seconds and gets the next value (at time t1 = t0 + 300 seconds). The above formula is applied. But, and now comes the point, you still have to divide by the 300-second polling interval!

So now, because I really like having seconds on a logarithmic graph with SI units, I divide by 300 * 10^6 to get the correct readings.

I'm using this formula now in all my time-based graphs:

[Image: the corrected formula, including the division by dt (not preserved)]
(where dt is the polling interval in seconds; it might differ if you run scripts via data input methods, as I do most of the time)

And - abracadabra! - the values from volume-latency, nfsv3-latency and iostat from Solaris match.

EDIT: By the way, you don't need to divide by the polling interval for processor:processor_busy, because those are timeticks.

And, for fun, here is the graph where I didn't divide by the polling interval. I almost had a heart attack:

[Image: graph without the division by the polling interval (not preserved)]

What I haven't found out yet (will do that tomorrow) is whether the read and write counters have different bases. Because, to be honest, I don't believe that 20 MICROseconds is a reasonable write latency... More on that later.

Fly safe,
Alex.
Exo7
Cacti User
Posts: 136
Joined: Wed Jul 13, 2005 4:50 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by Exo7 »

When you issue a write to a NetApp filer, it is journaled in NVRAM and an ack is sent back,
so the write latency is basically 2 x network transfer time + NVRAM latency, which can be in the microsecond range.

NVRAM is flushed to disk every 10 seconds or when it is half full. Write latency typically increases when the disks cannot keep up with NVRAM flushing, but WAFL is optimized so that random writes result in sequential writes to disk, which makes writes very efficient.
ezaton
Posts: 1
Joined: Sun Jun 23, 2013 3:37 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by ezaton »

I have been using these templates/scripts for a very long while now, and I was very happy with them. However, when accessing several large NetApp devices with hundreds of objects (tens of volumes, hundreds of LUNs), the script became too slow. It consumes large amounts of CPU and rarely makes it in time (well, never, to be exact). Since I query the storage once every minute, it has become impossible to get graphs.

I am not a Perl programmer, but a shell-script one. I wrote a wrapper script around the excellent Perl script that allows only a single data query to the NetApp per minute; every subsequent query is served from local result files. It has reduced the run time from 'timed out (58 seconds) on a 4-core VM, with the CPU flat at 100% all the time' to a 6-10 second full run. I am attaching the modified version. All credit goes (as it should!) to the original author. He made it possible; I have only adjusted it a little for large-scale setups.
You can read some more about it on my blog, at http://run.tournament.org.il/cacti-neta ... ata-query/
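The idea described above (one real query per interval, with everything else served from a local file) can be sketched as a small caching function. This is not the attached wrapper; cached_run is a hypothetical name, the "stat -c %Y" call assumes GNU coreutils, and the command you pass in would have to be whatever fetches a filer's full data set in one go.

```shell
#!/bin/sh
# cached_run CACHE_FILE MAX_AGE_SECONDS COMMAND [ARGS...]
# If CACHE_FILE exists and is younger than MAX_AGE_SECONDS, print it.
# Otherwise run COMMAND, store its output in CACHE_FILE and print it.
# Assumes GNU coreutils for "stat -c %Y" (modification time in epoch seconds).
cached_run() {
    cr_cache=$1
    cr_max_age=$2
    shift 2

    if [ -f "$cr_cache" ]; then
        cr_age=$(( $(date +%s) - $(stat -c %Y "$cr_cache") ))
        if [ "$cr_age" -lt "$cr_max_age" ]; then
            cat "$cr_cache"
            return 0
        fi
    fi

    # Write to a temporary file and rename it, so a concurrent reader never
    # sees a half-written cache; keep the old cache if the command fails.
    cr_tmp="$cr_cache.$$"
    if "$@" > "$cr_tmp"; then
        mv "$cr_tmp" "$cr_cache"
        cat "$cr_cache"
    else
        rm -f "$cr_tmp"
        return 1
    fi
}
```

With a polling interval of one minute, a MAX_AGE a little under 60 seconds would let the first call in each cycle refresh the cache and the rest reuse it.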
Attachments
NetApp_OnTap-SDK_cacti-20130623.tgz: cacti template, scripts and .xml files (257.76 KiB)
saq
Posts: 1
Joined: Thu Aug 15, 2013 2:31 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by saq »

I've been loving the data I can get with this script, and I've expanded my own templates to include more of the data the Perl script can retrieve, but I have run into a problem getting at a certain bit of data. Unfortunately my programming ability is, and always has been, rather weak (basic modifications to existing code would be the best way to describe it), so I was hoping somebody familiar with this script could make some modifications.

I am trying to chart flashcache hit_percent (ext_cache_obj:ec0:hit_percent) data from my filers. I have the counter data and know how to interpret it, but unfortunately the collection script (and probably the graph's associated XML) needs to work a little differently.

Currently the netapp-ontapsdk-perf.pl script provided in this thread only gets one counter value per call. That works fine for simple counters, but for counters whose real value is derived from other counters, the data can be a little off.
ext_cache_obj:ec0:hit_percent is derived from ext_cache_obj:ec0:accesses.

To get the actual hit % number you apply this little formula, which I implemented as a CDEF:
cdef=a,b,/,100,*
where a = hit_percent and b = accesses

The problem is that for the real hit % value to be accurate, you need to read accesses and hit_percent at exactly the same time, which the script doesn't do. With the slight delay between retrieving the two counters, the data gets skewed and the number is no longer accurate.
Could someone modify the script so it returns multiple counter values in one call?

My envisioned usage:
perl /usr/share/cacti/site/scripts/netapp-ontapsdk-perf.pl <filer> <user> <pass> ext_cache_obj get accesses hit_percent
ec0:accesses:18472641920
ec0:hit_percent:8808098116
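If the script did return both counters from one call in the format proposed above (hypothetical; the script as posted does not support it), the hit rate for an interval could be computed from two such snapshots. The sketch assumes, as described above, that hit_percent is a percent-type counter whose base is accesses, so the interval value is 100 * delta(hit_percent) / delta(accesses). The function names are made up for illustration.

```shell
#!/bin/sh
# counter_value FILE COUNTER
# Print COUNTER's value from lines shaped like "ec0:accesses:18472641920",
# the multi-counter output proposed above (hypothetical format).
counter_value() {
    awk -F: -v c="$2" '$2 == c { print $3; exit }' "$1"
}

# hit_percent OLD_FILE NEW_FILE
# Hit rate over the interval between two snapshots, assuming hit_percent is a
# percent-type counter whose base counter is accesses:
#   100 * (hit_percent_new - hit_percent_old) / (accesses_new - accesses_old)
hit_percent() {
    awk -v h0="$(counter_value "$1" hit_percent)" \
        -v h1="$(counter_value "$2" hit_percent)" \
        -v a0="$(counter_value "$1" accesses)" \
        -v a1="$(counter_value "$2" accesses)" '
        BEGIN {
            da = a1 - a0
            if (da <= 0) { print 0; exit }   # no accesses in this interval
            printf "%.2f\n", 100 * (h1 - h0) / da
        }'
}
```

Because both values in each snapshot would come from the same call, the skew described above would not arise.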
um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by um3n »

Hello Guys,

I am working with your NetApp ONTAP SDK script.
Thank you very much for it... most of it works pretty well.

But when I request the read latency of a volume, I get a number of "seconds" that I don't know how to deal with...

Code: Select all

root@cacti64:/var/tmp# /usr/share/cacti/site/scripts/netapp-get-data.sh hostname "cacti" "my_secret_pass_i_dont_tell_you" volume get read_latency vol_vm_dev;
58834408331
What is the unit of this number?

When I check the same volume on the NetApp, I get these values:

Code: Select all

hostname> stats show -i 1 -n 10 volume:vol_vm_dev:read_latency
Instance read_latency
                   us
vol_vm_dev            0
vol_vm_dev      5082.75
vol_vm_dev     14706.00
vol_vm_dev            0
vol_vm_dev            0
vol_vm_dev      9291.00
vol_vm_dev      5293.00
vol_vm_dev            0
vol_vm_dev            0
vol_vm_dev            0
Can anyone help me get a clear view of this?
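The number returned by the get call appears to be a running total rather than a per-operation value, while stats show above reports read_latency per interval in microseconds (us). Assuming, as with other NetApp average counters, that read_latency's base counter is read_ops, the average latency over one interval is the change in read_latency divided by the change in read_ops between two samples. A minimal sketch with a hypothetical helper name:

```shell
#!/bin/sh
# avg_latency_us LAT_OLD LAT_NEW OPS_OLD OPS_NEW
# Average latency per operation between two raw samples, in the counter's
# own unit (microseconds for volume read_latency, per "stats show" above):
#   (LAT_NEW - LAT_OLD) / (OPS_NEW - OPS_OLD)
avg_latency_us() {
    awk -v l0="$1" -v l1="$2" -v o0="$3" -v o1="$4" 'BEGIN {
        dops = o1 - o0
        if (dops <= 0) { print 0; exit }   # no ops in the interval: report 0
        printf "%.2f\n", (l1 - l0) / dops
    }'
}
```

The samples would come from the same get command, run for read_latency and for read_ops (the read_ops name is an assumption) at the start and end of the interval; divide the result by 1000 for milliseconds.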

Thank you for the response and have a great day...

Ulli


btw:
gheppner wrote: ... Ok, after some additional investigation I've concluded the following:

1) the units returned by the API are in milliseconds, not microseconds.
2) the value returned by a call to avg_latency is not representative of the average latency per operation, but the avg latency of the total ops in a given polling period.

I added total_ops as a data source to the lun latency graph template, and then used a CDEF to divide the latency by the total ops. I now get values in the 3 - 8 ms range that are consistent with what the filer shows with lun stats -o -i 5 <lun name>.

I'm curious if anyone else using these templates has noticed what I've noticed, or if I'm way out in left field here.
Can anyone explain to me where I have to make this configuration and how that would be done?

Edit:
There is something else that drives me crazy :)

When I look at the graphs in Cacti, there are values that I don't understand.

The values don't match the values I got as a response from the Perl script. OK, I know... it's a counter, so I have to take the delta to get the right values. But what does Cacti do with the values to get this result?

Thank you for any response.
Attachments
cacti.png (49.74 KiB)
Three tomatoes are walking down the street pappa tomato, mamma tomato, and a little baby tomato.Baby tomato starts lagging behind. Poppa tomato gets angry, goes over to the baby tomato, and smooshes him... and says, "Catch up"
um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by um3n »

No one here who can help me?

What a sad story :( I'm too late to get any answer :)
um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by um3n »

*push*
eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by eschoeller »

We shut down our filers a long time ago. Now we have new ones. I am leagues behind you in getting the graphs working again. I'm getting an API error, so I have nothing to work with to begin helping you. Once I get graphing again, I can offer to help you.

If you want to help *me* with this error:

Code: Select all

Unable to find API: perf-object-instance-list-info
Then maybe I can help *you* :)

In general, I've seen problems with latency counters and graphs for all sorts of storage systems. It's something I've come to accept. It's sometimes very difficult to get the raw data the storage system provides to match the pretty pictures generated within its own UI. How the developers calculate and provide latency data seems like voodoo magic sometimes. With a lot of graphs I don't necessarily focus on the values, but on the change in trends. Has the latency (whatever it is) gone up or down over time? I am not discounting the value of having accurate data, but sometimes you just can't get there.

In the last part of your post it seems like you're interested in how Cacti uses a COUNTER to graph data.
But what does Cacti do with the values to get this result?
This is really an RRDTool question, not a Cacti question. RRD folks (Tobi, or the rrd-users mail list) could do a much better job explaining this than I could. Look at these links for a start:
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
http://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html

Search for COUNTER in both documents. Hope this helps.
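As a rough illustration of what the rrdcreate page describes, a COUNTER data source stores the per-second rate between two successive readings rather than the raw value. The sketch below shows only that core arithmetic; rrdtool also handles counter wraps, the heartbeat, min/max limits and the normalization of values onto step boundaries, none of which is reproduced here. counter_rate is a hypothetical name.

```shell
#!/bin/sh
# counter_rate OLD NEW SECONDS
# Core arithmetic of an RRDtool COUNTER data source for one update:
# the per-second rate (NEW - OLD) / SECONDS.
# Not modelled: counter wrap handling, heartbeat, min/max limits and the
# normalization onto step boundaries that rrdtool also performs.
counter_rate() {
    awk -v v0="$1" -v v1="$2" -v dt="$3" 'BEGIN { printf "%.4f\n", (v1 - v0) / dt }'
}
```

One consequence of this arithmetic: if a latency counter and its ops counter are both stored as COUNTERs over the same interval, dividing one rate by the other in a CDEF cancels the interval length. For example, growth of 9000 and 3 over a 300-second step gives rates of 30 and 0.01 per second, and their ratio is 3000, the same as 9000 / 3.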
um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by um3n »

Is the configuration of the NetApp correct?
We have to activate HTTP (OK, we are using HTTPS :) ) and create a user with the right permissions to get the data we want.

The firewall?
I am sorry for these "low-level" questions :)
eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by eschoeller »

I am getting data back for some of the commands. I can see HTTP traffic flowing just fine between the Cacti server and the NetApp. I will be sitting down with the storage admin sometime next week to hash it out and see if we have some sort of permissions problem.
um3n
Posts: 39
Joined: Thu Jul 03, 2014 1:35 am

Re: NetApp Filer: graphing Performance Stats and IO's (templ

Post by um3n »

Waiting for the storage admin... I know that one.

What happens if you send the request from the shell?
If you get a response, it must be an issue with Cacti.

Code: Select all

/usr/share/cacti/site/scripts/netapp-get-data.sh netapp.your.domain "USER" "PASSWORD" volume index
/usr/share/cacti/site/scripts/netapp-get-data.sh netapp.your.domain "USER" "PASSWORD" volume query index
Do you get any output?