Host Templates for HP LeftHand Storage System (P4500, DL320)
Re: Host Templates for HP LeftHand Storage System (P4500, DL
I/O latency doesn't work in PRTG either. The newest version of PRTG is our preferred product, but the I/O values are irreconcilable and the trends don't line up.
Did you follow up with HP or find a solution?
Best I can figure is they're not the same measurements, which seems very strange...
From the MIB: "The total time spent waiting for read operations to complete in the cluster."
The CMC says: "Average time, in milliseconds, to service read/write requests."
Re: Host Templates for HP LeftHand Storage System (P4500, DL
I am not familiar with the PRTG product, but gosh, I hope you can create templates or at least copy configs from one host to another. It seems like setting up sensors in that software could take years of repetitive work (way worse than developing a Cacti template!).
I/O latency is probably the absolute worst metric to try and graph on these units ... but probably one of the most useful metrics to trend. Go figure. My graphs are working, but I have a considerable number of gaps in them. That is by choice: I don't want the absurd spikes that come with monitoring these latency objects. I've had to go back and tweak individual data sources using 'rrdtool tune' after the fact numerous times to get the results I'm looking for, and this depends on the cluster as well (we have 4). Here are our maximums over the past week:
cluster 1: 2.79k / 2.41k
cluster 2: 5.34k / 4.84k
cluster 3: 2.56k / 1.89k
cluster 4: 1.30k / 1.38k
And do not be surprised by the difference between the CMC and what the MIB describes. The SNMP object is a COUNTER type, which is actually far better than what you might expect to find ... a GAUGE. With a COUNTER (like interface stats) you will never miss a spike in latency: no matter what your polling interval is, you're guaranteed to capture what is actually happening for the object and not 'miss' a spike or dip. The CMC is most likely using the same information; it just obfuscates the technical details behind the calculation.
Given their explanation, I'd want to know how that "average time in milliseconds" is calculated ... since the node came online? Over the past 5 minutes? Since the cluster was built? Furthermore, how is this latency computed across multiple nodes in the entire cluster? Is this an average of an average collected from every node? Maybe the SNMP COUNTER needs to be divided by the number of nodes in the cluster; perhaps each node tacks its own latency data onto the COUNTER?
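A toy illustration of why a COUNTER can't 'miss' a spike the way a GAUGE can (all numbers below are invented):

```python
# Per-second latency values with one large spike, "polled" every 5 seconds.
latencies = [5, 5, 5, 5, 900, 5, 5, 5, 5, 5]

# GAUGE: only the instantaneous value at each poll is visible, so the
# spike at t=4 falls between polls and simply vanishes.
gauge_samples = [latencies[t] for t in (0, 5)]            # -> [5, 5]

# COUNTER: the device accumulates every unit of wait time, so the delta
# between the same two polls still contains the spike.
counter = [sum(latencies[:t + 1]) for t in range(len(latencies))]
counter_delta = counter[5] - counter[0]                   # -> 920, spike included
print(gauge_samples, counter_delta)
```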
I am sure this all makes sense to the LeftHand engineers who designed it, but I am left asking far too many questions. I ended up moving on, realising that despite how absurd some of the values may seem, the most important part still works ... seeing a trend. I know it is far off from the CMC, but I secretly think it was in their best interests to make the latency look good in their own software.
And perhaps I am missing something else entirely; we are still on SAN/iQ 9.0.
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi,
We have a P4500 multi-site cluster.
I see differences between all graphs for the volume usage and the vSphere vCenter datastore view (please ignore the different names in the images; only the part "sanvol13" is important). In Cacti the graph for sanvol13 shows 2.15 TB as the size and 1.52 TB as used, so 0.63 TB is free. vCenter shows only 0.145 TB free for sanvol13 and 1.95 TB total. All volumes are thick/full provisioned.
We are running SAN/IQ 9.5.00.1215.0.
Any ideas what could be wrong?
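One thing worth ruling out (an assumption, not a confirmed cause): SNMP-based tools often report decimal terabytes while vCenter works in binary TiB, and 2.15 decimal TB is almost exactly the 1.95 "TB" vCenter shows as the total:

```python
# Unit-convention check (a hypothetical explanation, not a confirmed cause):
# convert 2.15 TB (decimal, 10**12 bytes) into TiB (binary, 2**40 bytes).
size_tb = 2.15
size_tib = size_tb * 10**12 / 2**40
print(round(size_tib, 2))  # -> 1.96, close to the 1.95 total vCenter reports
```

That would only account for the difference in the totals, though; the used/free gap would need another explanation.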
Attachments: vmware-sanvol13.png, cacti-sanvol13.png
Re: Host Templates for HP LeftHand Storage System (P4500, DL
The latency graphs also seem a bit different from reality: vCenter shows a maximum of 40 ms for the datastore on sanvol03.
Does anyone use the template with SAN/IQ 9.5? Any idea what to change to get the right volume usage and latency graphs?
Attachments: cacti-sanvol3.png
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Thanks for the follow-up, eschoeller.
PRTG does make it very easy to create templates and duplicate configs to different clusters, nodes, etc. I'm not endorsing one product over another, simply pointing out that the problem exists in the underlying SNMP MIBs, and perhaps we don't understand the MIBs or HP is reporting the data incorrectly.
We also have several different clusters, and my problems occur throughout. To illustrate the problem with read/write latency at the cluster level, I exported HP CMC data to .csv and did the same with the PRTG/SNMP data. Then I compared the results from a one-hour period earlier today using Excel. I'm including a couple of screenshots to show that the collected data follows a similar trend, but even after adjusting the scale by reducing the SNMP data to 1% of actual values, I still see crazy spikes and abnormalities in the SNMP data when compared to the CMC or esxtop.
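For anyone repeating this comparison outside Excel, here is a minimal pandas sketch; the file names and column names are assumptions, not the actual export format:

```python
# Align CMC and PRTG/SNMP latency CSV exports on timestamp and compare them.
# File names and column names ("timestamp", "latency") are hypothetical.
import pandas as pd

cmc = pd.read_csv("cmc_read_latency.csv", parse_dates=["timestamp"])
snmp = pd.read_csv("prtg_read_latency.csv", parse_dates=["timestamp"])

# Match each CMC sample to the nearest SNMP sample within one minute.
merged = pd.merge_asof(
    cmc.sort_values("timestamp"),
    snmp.sort_values("timestamp"),
    on="timestamp",
    suffixes=("_cmc", "_snmp"),
    tolerance=pd.Timedelta("60s"),
)

# Scale the SNMP series to 1% of actual values, as in the screenshots,
# then summarise how far the two series still diverge.
merged["snmp_scaled"] = merged["latency_snmp"] * 0.01
print((merged["snmp_scaled"] - merged["latency_cmc"]).describe())
```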
We're still on SAN/iQ 9, but I've tried the 9.5 MIBs and they have the same result.
Do you know why the values disagree? Have you compared them in this manner (or similar) and found better consistency? This is obviously too inconsistent to alert on when the latency values spike, which is one of my goals.
Have you tried comparing the values at the LUN or Node level?
Ty.
Attachments: IO Latency Read.PNG, IO Latency Write.PNG
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi, we are trying to get this to work, but we aren't getting IOPS graphs:
Other graphs are working.
Can you help us?
Thanks in advance and for the great work!
Attachments: 16-3-2012 14-16-25.png
Re: Host Templates for HP LeftHand Storage System (P4500, DL
I had to change some of the max values for the data sources because I had gaps in my graphs (rrdtool tune ...). Maybe your values are also too high and you need to tune the RRD file. Anything in the logs?
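A minimal sketch of that tuning step, driven from Python; the RRD path, data-source name, and new maximum are placeholders you would replace with your own values:

```python
# Raise the stored maximum on a latency data source so that legitimate
# high samples stop being discarded as NaN gaps. All paths and names
# below are hypothetical placeholders.
import subprocess

rrd_file = "/var/www/cacti/rra/lefthand_read_latency_123.rrd"  # placeholder
ds_name = "read_latency"                                       # placeholder DS name

# Inspect the current data-source limits first ...
subprocess.run(["rrdtool", "info", rrd_file], check=True)

# ... then raise the maximum ("U" would make it unlimited).
subprocess.run(
    ["rrdtool", "tune", rrd_file, "--maximum", f"{ds_name}:100000"],
    check=True,
)
```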
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi, we've changed the SNMP version to 2 and it started working.
That might be good to add to the documentation.
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi pirx or anyone else,
We have the same differences you described.
Do you have any news about that?
thanks and best regards,
cokoNux
Re: Host Templates for HP LeftHand Storage System (P4500, DL
I know this is an old thread, but I'm hoping people might notice my post anyways.
Does anyone know if HP changed the SNMP counter calculations in LeftHand OS 11.0? I am just wondering, because my cluster latency graph is way different on my VSA running 11.0 when compared to my P4500 G2 on OS version 10.5. I'm having trouble telling if the counters changed between OS versions or if the change is due to VSA vs. hardware.
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Also, does anyone know where inside Cacti the raw poller data for latency [3683653699] is converted to something like 1321? I can't find where it is converted.
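Presumably RRDtool itself does this rather than any Cacti code: when a data source is defined as a COUNTER, RRDtool stores the delta between consecutive raw samples divided by the polling step. A sketch of that arithmetic, with an invented previous sample:

```python
# How a raw COUNTER sample becomes a per-second rate in RRDtool.
# The previous sample and the 300 s step are invented for illustration;
# only the current raw value comes from the post above.
prev_raw = 3_683_257_399   # hypothetical reading from the previous poll
curr_raw = 3_683_653_699   # raw poller value quoted above
step = 300                 # assumed 5-minute polling interval

rate = (curr_raw - prev_raw) / step
print(rate)  # -> 1321.0, the order of magnitude seen in the graphs
```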
Re: Host Templates for HP LeftHand Storage System (P4500, DL
I believe I have found the problem with the latency counters.
You need to divide the latency by the number of I/Os to get the average latency per polling cycle.
The counter for latency is adding up each I/O's latency, so you need to divide it back down to get a proper average. This also explains why the latency can go through the roof at times when IOPS are high.
I am in the process of testing this with my data and will try to post a new graph template if I can get it working. It does mean that the latency graphs will also need to poll I/O at the same level (cluster or RAID).
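A minimal sketch of the calculation; the counter names and sample values are invented, and in a Cacti graph template this would typically end up as a CDEF dividing the two data sources:

```python
# Average per-request latency over one polling cycle, computed from two
# cumulative counters: total wait time and completed I/O count.
# All sample values are invented for illustration.

def avg_latency_ms(lat_prev, lat_curr, io_prev, io_curr):
    """delta(total wait time) / delta(completed I/Os) for the interval."""
    io_delta = io_curr - io_prev
    if io_delta <= 0:
        return float("nan")  # idle interval, or a counter reset/wrap
    return (lat_curr - lat_prev) / io_delta

# Hypothetical cluster counters sampled 300 seconds apart:
print(avg_latency_ms(9_500_000, 9_572_000, 4_100_000, 4_106_000))  # -> 12.0 ms
```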