NetApp Filer: graphing Performance Stats and IO's (template)
-
- Posts: 8
- Joined: Wed Aug 10, 2005 10:05 pm
- Location: Montreal, Canada
Hello,
Here are the host template and scripts I wrote to graph storage performance for NetApp Filers using the Manage ONTAP SDK 3.0 Perl API.
Graph list:
- LUN: IOPS, latency, data throughput
- Volume: IOPS, latency
- Target interfaces: IOPS
- Filer total IOPS per protocol (FC/iSCSI/NFS/CIFS/...)
See screenshot.
Combined with the Network-Appliance host template using SNMPv1 that is available on this forum, gathering NetApp SAN performance statistics with Cacti is quite complete.
Requirements:
- Manage ONTAP SDK 3.0 Perl API installed on the Cacti host
- NetApp Filer: HTTP enabled
Tested on Cacti version 0.8.7b.
- Attachments
-
- graph sample of Netapp-ontapsdk template.
- netapp_OnTap_graph.jpg (321.02 KiB) Viewed 62442 times
-
- NetApp_OnTap-SDK_cacti-20080602.tgz
- cacti template, scripts and .xml files
- (256.95 KiB) Downloaded 5669 times
Good work
I've been playing with the SDK for a few weeks and had a half-working implementation of this when I saw your post. The templates all installed with no trouble. Everything seems to be graphing correctly. I wanted to make a dig at the color scheme, but it's growing on me -=]
Awesome work, you saved me a ton of time.
Beautiful!! This was the first that I'd heard of the SDK. The installation was simple and the results are great. It is nice to have another view of what is going on inside my filer.
Just a question: would it make sense to replace the other NetApp graphs done via SNMP with similar ones done via the SDK?
Thanks for your hard work.
Problem
Hi,
Those templates look great, but I must confess that I couldn't make them work.
First, I discovered that with cactid the full perl path must be provided in the XML files (query-netapp-ontapsdk-lun.xml..).
I fixed this, but I keep getting a partial result error:
Code:
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DS[2160] WARNING: Result from SCRIPT not valid. Partial Result: ...
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DS[2160] SCRIPT: /usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_archive, output: U
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] DEBUG: The POPEN returned the following File Descriptor 16
06/11/2008 12:22:05 AM - CACTID: Poller[0] Host[42] ERROR: Empty result [10.12.2.3]: '/usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_data'
But if I run the script manually, I get the correct answer:
Code:
# /usr/bin/perl /opt/apache/php/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 10.12.2.3 "cacti" "cacti2008" volume get avg_latency voloracle_1_data
8363761585
#
Thx for any help,
Olivier
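When many data sources are involved, it can help to pull only the WARNING and ERROR entries out of the cactid log, together with their host and data source IDs. The following is a minimal Python sketch that assumes log lines shaped like the four quoted above; the function name `failing_entries` is made up for illustration and is not part of Cacti.

```python
import re

# Pattern derived from the cactid lines quoted in this thread, e.g.
#   ... - CACTID: Poller[0] Host[42] DS[2160] WARNING: Result from SCRIPT not valid. ...
# The DS[...] part is optional because some lines only carry a Host[...] tag.
LINE_RE = re.compile(
    r"CACTID: Poller\[(?P<poller>\d+)\] Host\[(?P<host>\d+)\]"
    r"(?: DS\[(?P<ds>\d+)\])? (?P<level>[A-Z]+): (?P<message>.*)"
)


def failing_entries(lines):
    """Return (host_id, ds_id_or_None, level, message) for WARNING/ERROR lines."""
    found = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a cactid line in the assumed format
        if match.group("level") not in ("WARNING", "ERROR"):
            continue  # skip SCRIPT, DEBUG and other informational entries
        ds = int(match.group("ds")) if match.group("ds") else None
        found.append((int(match.group("host")), ds,
                      match.group("level"), match.group("message")))
    return found


if __name__ == "__main__":
    import sys
    # Usage: python this_file.py < cacti.log
    for host, ds, level, message in failing_entries(sys.stdin):
        print(host, ds, level, message)
```

This only summarizes what the log already says; it does not explain why the poller sees an empty result while a manual run succeeds.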
-
- Posts: 6
- Joined: Wed Jan 25, 2006 7:06 pm
Cool NetAPP Query
Hi,
This looks cool. I have got the attached template working; thank you for this great use of the SDK (which is apparently very little known in many NetApp circles, judging from a conversation with one of their SEs). However, it does not seem to include many of the graphs I can see in your pictures. The script queries for system etc. seem to be present, but I only get the following graphs when I use the host template.
I could quite easily concede I am missing something
All Nics+
cache age
CIFS Ops
CPU % Busy
NFS Ops
Many thanks for posting this info; it's nice to be able to put NetApp performance in our common dashboard, i.e. Cacti, and not just use DFM OppMan.
Kind regards,
Mark Kaye
-
- Posts: 6
- Joined: Wed Jan 25, 2006 7:06 pm
plz Ignore previous post - user error (mine)
Sorry
Mark
-
- Posts: 8
- Joined: Wed Aug 10, 2005 10:05 pm
- Location: Montreal, Canada
Hi Mark,
Regarding the missing graphs: the protocol-specific graphs would be very easy to add, since NFS and CIFS IOPS are already provided for the "Per Protocol" graph.
As for all NICs and cache age, for now we monitor them using SNMP and another Cacti template provided elsewhere in these forums.
It could be a good idea to add these features to the SDK template and use only the SDK to gather stats... future project...
Thanks all for your comments.
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
This is great data! It seems to come with a heavy cost for us however. Is anyone else noticing severe performance issues after using this template?
I initially added over 300 data sources using this template in my development environment. It ran fine for a little while, until I noticed that performance had degraded so badly that my poller was timing out. Before adding these data sources my poller runtime was around 15s, then it was timing out after 56s.
I trimmed this down to only 60 data sources, but I'm still seeing terrible performance. I am using a 1 minute poller so I don't have much flexibility to run a polling interval longer than 45s. I attached several charts to indicate the issues I am seeing.
This is running on a Dell Precision 450 desktop. It only has one CPU and one disk, so take that into account, but please don't completely blame it on the hardware. If I had added an additional 60 SNMP data sources I would have never seen such performance loss.
I've tried tweaking #threads, #processes, #script servers but it hasn't improved anything. I am already running the latest version of spine. From the logs I can always see that the netapp-ontapsdk-perf.pl script is running towards the end of the polling cycle, so I know that's what is prolonging the runtime.
Running the netapp-ontapsdk-perf.pl by hand while my poller isn't running usually takes about half a second. Then, while the poller is running it can take as long as 4-5 seconds to run. This leads me to believe it's possibly a system issue.
But why is this data collection script so resource intensive?
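To put numbers on the "half a second by hand, 4-5 seconds during polling" observation, the same command can be timed repeatedly in both situations. This is a minimal Python sketch; `time_command` is a hypothetical helper, and the command in the demo is a stand-in that should be replaced with the real netapp-ontapsdk-perf.pl invocation.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the actual script call,
    # e.g. ["/usr/bin/perl", "<path>/netapp-ontapsdk-perf.pl", ...].
    command = [sys.executable, "-c", "pass"]
    timings = time_command(command, runs=3)
    print("min %.3fs  max %.3fs" % (min(timings), max(timings)))
```

Comparing the figures from an idle host with those taken while the poller is running would show how much of the slowdown comes from contention on the Cacti host rather than from the filer.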
I have also noticed that on many of the context menus, the Netapp graphs show up first in the list and not in alphabetical order, but this is probably an entirely different issue.
- Attachments
-
- cpu usage of cacti host before and after using this template
- cpu.png (31.34 KiB) Viewed 60477 times
-
- load of cacti host before and after using this template
- load.png (41.68 KiB) Viewed 60477 times
-
- number of objects before and after using this template
- objects.png (28.29 KiB) Viewed 60477 times
-
- poller runtime before and after using this template
- runtime.png (24.92 KiB) Viewed 60477 times
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
I also noticed that Interface traffic went through the roof as well. It seems that this script is pulling in LOTS of data and doing some sort of computational work to come up with the figures it needs. I haven't had a chance to look closely at the script to see if there is any room for optimization.
- Attachments
-
- interface traffic before and after using this template
- eth0.png (26.15 KiB) Viewed 60365 times
-
- Posts: 8
- Joined: Wed Aug 10, 2005 10:05 pm
- Location: Montreal, Canada
Hello eschoeller,
Yes, the netapp-ontapsdk-perf.pl script is not optimized! I hit a bug in the Manage ONTAP SDK while I was developing the template: the SDK was unable to return a specific value for a specific object (for example, avg_latency for a single LUN). So the API call that actually works is to query avg_latency for all LUNs and then pick out the selected one. This means that if there are 300 LUNs on your filer, the API returns 300 values to the netapp-ontapsdk-perf.pl script. That is the HUGE overhead of this template.
There is a very short thread on the NetApp forum regarding this issue: http://communities.netapp.com/thread/1405?tstart=0
So, because the "perf-object-get-instances" API wasn't working, I used the "perf-object-get-instances-iter-*" API. It is almost like querying the universe to grab a mosquito.
I hope a future release of the SDK will fix this issue, which would improve performance...
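The cost of this workaround can be sketched in a few lines of Python. The code below is an illustration only, not the Perl script from the attachment: `pick_instance` mimics "fetch everything, keep one", and `values_transferred` estimates how many values cross the wire per polling cycle, assuming one script call per data source and one data source per (instance, counter) pair.

```python
def pick_instance(all_values, wanted):
    """Emulate the workaround: scan every instance's value and keep one.

    all_values is a list of (instance_name, value) pairs, standing in for
    the full listing the filer returns for a single counter.
    """
    for name, value in all_values:
        if name == wanted:
            return value
    return None  # requested instance not present in the listing


def values_transferred(instances, counters_per_instance):
    """Values returned per polling cycle when every call lists all instances."""
    data_sources = instances * counters_per_instance  # one script call each
    return data_sources * instances                   # each call returns all instances


if __name__ == "__main__":
    listing = [("lun_a", 12), ("lun_b", 7), ("lun_c", 30)]  # made-up sample data
    print(pick_instance(listing, "lun_b"))
    print(values_transferred(300, 3))
```

Under these assumptions, 300 LUNs with three counters each means 900 script calls returning 300 values apiece, or 270,000 values per cycle, where 900 would be enough if a single instance could be queried directly.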
Regarding the graph sorting, I haven't tried to sort in alphabetical order. The current sorting is based on the index provided by the API, which is ordered by object creation date. I'm not sure whether changing the sort index in the query-netapp-ontapsdk-*.xml files would fix it or create another issue.
P-L
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
I read the short thread mentioned above. I have another member of our team looking into optimizing the code. In the meantime, we upgraded to a Dell 2950 with a quad-core 3 GHz Xeon, 8 GB of RAM, and a 4-column RAID 10 disk.
Here are the performance metrics of the cacti server before and after the upgrade in case anyone is interested.
But, long story short, these templates will work OK with a fast enough system, despite the fact that there is a lot of room for performance improvement. I still have around 750 data sources and 550 RRDs.
- Attachments
-
- CPU usage before and after upgrade
- cpu.png (32.52 KiB) Viewed 60045 times
-
- Load before and after upgrade
- load.png (33.14 KiB) Viewed 60045 times
-
- poller runtime before and after upgrade.
- runtime.png (22.68 KiB) Viewed 60045 times
Not discovering objects
Hi Gurus
I was able to get the ONTAP SDK and import the template. The first issue I faced was with Perl, which complained about "\N", so I had to give the entire path with double backslashes:
use lib "C:\\manage-ontap-sdk-1.6\\lib\\perl\\NetApp";
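In a double-quoted Perl string the backslash starts an escape sequence, and `\N` in particular is reserved for named characters, which is most likely what Perl was complaining about. Python string literals treat `\N` the same way, so the sketch below uses Python purely as an illustration of the three usual spellings of such a path: doubled backslashes, a literal that does no escape processing, and forward slashes. The helper `same_windows_path` is hypothetical.

```python
import ntpath  # Windows path rules, importable on any platform

# Three spellings of the SDK library path from the post above.
doubled = "C:\\manage-ontap-sdk-1.6\\lib\\perl\\NetApp"  # escaped backslashes
raw = r"C:\manage-ontap-sdk-1.6\lib\perl\NetApp"         # raw literal, no escapes
forward = "C:/manage-ontap-sdk-1.6/lib/perl/NetApp"      # forward slashes


def same_windows_path(a, b):
    """Compare two paths after normalizing separators the way Windows does."""
    return ntpath.normpath(a) == ntpath.normpath(b)


if __name__ == "__main__":
    print(same_windows_path(doubled, raw))
    print(same_windows_path(doubled, forward))
```

The Perl counterparts would be the double-quoted form with doubled backslashes shown above, a single-quoted string, or forward slashes; whether the last two work with this particular SDK install was not tested in the thread.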
Now, when I discover the filer, it cannot find any objects and shows the message below.
This data query returned 0 rows, perhaps there was a problem executing this data query. You can run this data query in debug mode to get more information.
Upon running in verbose mode, below is the output
+ Running data query [17].
+ Found type = '4 '[script query].
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system index'
+ Executing script query 'perl C:\Inetpub\wwwroot\cacti\scripts\netapp-ontapsdk-perf.pl nfiler2.rws.ad.ea.com "xxxx" "xxxx" system query index'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at 'C:/Inetpub/wwwroot/cacti/resource/script_queries/query-netapp-ontapsdk-system.xml'
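Verbose output like the above contains no "Found item" line, whereas the working run posted later in this thread does. A minimal Python sketch for checking a saved copy of the verbose output is shown below; `summarize_query_debug` is a hypothetical helper, and the line prefixes it looks for are taken from the output quoted in this thread.

```python
def summarize_query_debug(lines):
    """Count discovered index rows and XML lookups in verbose data query output."""
    found_items = 0
    xml_lookups = 0
    for line in lines:
        # Drop surrounding whitespace and the leading "+" marker.
        text = line.strip().lstrip("+").strip()
        if text.startswith("Found item"):
            found_items += 1
        elif text.startswith("Found data query XML file"):
            xml_lookups += 1
    return {"found_items": found_items, "xml_lookups": xml_lookups}


if __name__ == "__main__":
    import sys
    # Usage: python this_file.py < verbose_output.txt
    summary = summarize_query_debug(sys.stdin)
    print(summary)
    if summary["found_items"] == 0:
        print("No index rows were discovered by the script query.")
```

A count of zero only confirms that the index call returned nothing Cacti could use; the cause still has to be found by running the exact "Executing script" command as the web server user.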
If I run the script manually, it works just fine, returning the information I expect to see (the LUN info, etc.).
Could you please point out what I am doing wrong?
thanks in advance
KK
Last edited by kkoduru on Thu Sep 18, 2008 3:58 pm, edited 2 times in total.
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
This is what mine looks like:
Code:
+ Running data query [14].
+ Found type = '4 '[script query].
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ XML file parsed ok.
+ Executing script for list of indexes 'perl /usr/local/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 255.255.255.123 "USER" "PASSWORD" system index'
+ Executing script query 'perl /usr/local/cacti-0.8.7b/scripts/netapp-ontapsdk-perf.pl 255.255.255.123 "USER" "PASSWORD" system query index'
+ Found item [index='system'] index: system
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
+ Found data query XML file at '/usr/local/cacti-0.8.7b/resource/script_queries/query-netapp-ontapsdk-system.xml'
I have 4 lines of data query XML; you only have 3. Another thing: since you're using Windows, you may have to specify the full path to your perl binary.
Hope this helps!
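Specifying the full interpreter path, as suggested here and earlier in the thread for cactid, comes down to editing the `<script_path>` element of each query-netapp-ontapsdk-*.xml file. The Python sketch below assumes that element starts with a bare `perl`, as the "Executing script" lines in the verbose output suggest; `use_absolute_interpreter` is a made-up helper name, and the interpreter path must be adjusted to the local install.

```python
import re

# Matches a bare "perl" at the start of the <script_path> content.
SCRIPT_PATH_RE = re.compile(r"(<script_path>\s*)perl(\s)")


def use_absolute_interpreter(xml_text, interpreter_path):
    """Replace a leading bare `perl` in <script_path> with an absolute path."""
    # A function is used as the replacement so that backslashes in Windows
    # paths are inserted literally instead of being read as regex escapes.
    return SCRIPT_PATH_RE.sub(
        lambda m: m.group(1) + interpreter_path + m.group(2), xml_text
    )


if __name__ == "__main__":
    # Assumed shape of the element; check the actual XML files before editing.
    before = ("<script_path>perl |path_cacti|/scripts/"
              "netapp-ontapsdk-perf.pl</script_path>")
    print(use_absolute_interpreter(before, "/usr/bin/perl"))
```

Running it a second time leaves the text unchanged, because the element no longer begins with a bare `perl`. Keeping a backup of the XML files before rewriting them is advisable.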