I've fixed up the perl script to work with clusters. I'm also making some changes to the graphs. I don't know how much effort I'm going to invest in it though. When I'm done, I'll post the results.
@um3n:
I understand what gheppner is doing, but I had to give it some thought. It's all about how you choose to represent the data. Neither approach is technically incorrect; you just need to understand what you're looking at.

Here is some documentation coming back directly from the perf-object-counter-list-info API on the filer:

'content' => 'Average latency in microseconds for the WAFL filesystem to process read request to the volume; not including request processing or network communication time'
'name' => 'properties' 'content' => 'average',
'name' => 'unit' 'content' => 'microsec',
It's somewhat complicated to understand, but the value coming back from avg_latency, read_latency and write_latency is a COUNTER of the number of microseconds of latency that has accumulated since some arbitrary point in the past (system reboot perhaps, or counter roll-over). This is quite typical of how most storage systems report latency. To see this in real time, run a command like this:
Code: Select all
watch -n 1 ./netapp-ontapsdk-perf.pl FILER USER PASSWORD volume get write_latency VOLUME
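In case it helps to see the math, here's a minimal Perl sketch of the same idea: take two samples of the raw counter and difference them to get microseconds of latency accrued per second. The get_counter() helper here is hypothetical (it just shells out to the script above); adjust it to however you actually read the value.

Code: Select all
# Minimal sketch, not part of netapp-ontapsdk-perf.pl: difference two samples
# of the cumulative counter to get "microseconds of latency accrued per second".
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: shells out to the script and grabs the last number it
# prints.  Adjust the parsing to whatever the script's output really looks like.
sub get_counter {
    my ($counter) = @_;
    my $out = `./netapp-ontapsdk-perf.pl FILER USER PASSWORD volume get $counter VOLUME`;
    my ($val) = $out =~ /(\d+)\s*$/;
    return $val;
}

my ($t1, $c1) = (time(), get_counter('write_latency'));
sleep(10);
my ($t2, $c2) = (time(), get_counter('write_latency'));

# The counter only ever increases, so the delta is the microseconds of write
# latency accumulated during the sample window.
my $usec_per_sec = ($c2 - $c1) / ($t2 - $t1);
printf "write latency accrued: %.0f usec per second\n", $usec_per_sec;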
What gheppner did was simply modify this to produce a "microseconds of latency per second per operation" value and graph it. This may be more indicative of what the filer reports through CLI commands ... but I wouldn't know ... I don't have access to the filer's CLI.
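For what it's worth, here's my guess at the kind of delta-over-delta calculation that produces a per-operation figure. This is not gheppner's actual code: I'm assuming write_ops is the ops counter that pairs with write_latency, and it reuses the same hypothetical get_counter() helper as the previous snippet.

Code: Select all
# Sketch only -- my reading of the per-operation derivation, not gheppner's code.
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Same hypothetical helper as in the previous snippet, repeated so this one
# stands alone; adjust the parsing to the script's real output.
sub get_counter {
    my ($counter) = @_;
    my $out = `./netapp-ontapsdk-perf.pl FILER USER PASSWORD volume get $counter VOLUME`;
    my ($val) = $out =~ /(\d+)\s*$/;
    return $val;
}

my ($lat1, $ops1) = (get_counter('write_latency'), get_counter('write_ops'));
sleep(10);
my ($lat2, $ops2) = (get_counter('write_latency'), get_counter('write_ops'));

my $d_ops = $ops2 - $ops1;
# Latency accumulated divided by operations completed over the same window
# gives an average number of microseconds per write operation.
my $avg = $d_ops > 0 ? ($lat2 - $lat1) / $d_ops : 0;
printf "avg write latency: %.1f usec/op\n", $avg;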
In your particular case, what's really screwing with the graph is the other_latency. It's certainly larger than the read/write latency, and I haven't investigated what other_latency really is. I'm working with a brand-new storage system that has no production traffic on it, so it's also difficult to compare your numbers to what I'm capturing now. Looking back into ancient history at our older 3040C data, while those systems were in production as storage back-ends for a large mail system, I would typically see other_latency: 300, write_latency: 170, read_latency: 50. It's entirely possible that you really do have an application / storage system with a large amount of latency. But overall, you're still going to see trends in latency: it will go up and down, and when it goes up you should take notice.