Host Templates for HP LeftHand Storage System (P4500, DL320)
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Host Templates for HP LeftHand Storage System (P4500, DL320)
I have completed an initial version of two host templates: one for HP LeftHand Systems and one for HP LeftHand Clusters. Please see the following URL for the template homepage and more information:
http://docs.cacti.net/usertemplate:host:hp:lefthand
This is my first attempt at using the new template repository. Please keep all discussions and troubleshooting questions related to this template here, in this thread.
Thanks!
Just deployed this for my P4300 solution and it works just as advertised. Very nice to finally have a good view of the system loads on the SAN!
However, it seems I do have a bit of a problem with the CPU and FAN graphs; the scale is not in any temperature format I know of and they seem suspiciously stable...
See below.
Otto
- Attachments
- Capture1.JPG (52.81 KiB)
- Capture2.JPG (50.97 KiB)
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
YUP! Tell me about it!!
Not sure if you read through the 'Information' section on the docs page, but I mentioned that many of the graphs will be completely broken because the data HP is presenting via SNMP is completely bogus in some cases.
Originally, they were using the old Compaq CPQ HEALTH stuff which worked about 50% of the time. There was some agent running on those boxes that would die eventually and the temperature data would drop off. Since these are appliances, there was no way to restart that agent - to fix the temperature readings the whole darn appliance had to be restarted.
Now, with the newer P4300 and P4500 units, they've stopped using the CPQ objects altogether and have added some stats directly to the LeftHand object tree. It's bizarre: on my older DL320s units, both the CPQ Health and the new LeftHand objects report CPU temperatures correctly ... but the new P4500 units have no joy for CPU temp and fan speed.
What's interesting is that none of this changes with SAN iQ updates. The DL320s units continue to use CPQ, and the new P4300/4500 units don't, so there is some logic in their startup scripts that tells the appliance what software to run based on the underlying hardware being used. I can only imagine how much of a mess that is.
I gave up waiting for HP to answer my support case. When they asked if I had the most up-to-date MIBs, I realized that I was probably going to wait a long time for a real answer. I needed to move on to other things, so I did my best to wrap up the template set and get it out the door, despite the fact that a good chunk of it was broken. I should probably list all the graphs that don't work on the docs page, just so folks don't fret over it.
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Yes, I did read that section but I didn't make the connection with the fan and cpu graphs. Sorry about that.
For my implementation, the Cluster graphs seem to match what I see in the CMC, except for the Cache Hits which seems to be a different value (% in CMC, rather than Operations).
I can't tell about the Node graphs though; the RAID graphs seem pretty out of whack as you say, but they are plotting _something_, so I think they might be useful in a historical sense to keep track of increased load over time.
In any case, great job so far!
This is a significant improvement over CMC even with the current limitations.
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Yes, the cache hits graph is in # of operations, whereas the CMC reports it as a % ... namely (CACHE_HITS/(READ_IOPS + WRITE_IOPS))*100.
I thought about re-creating this but it gets tricky because there is no TOTAL_IOPS metric, so I'd have to write a CDEF to accomplish this, and a data template would end up being re-used, which would create another data source with the same information as the RAID plot. In short, it would add some overhead to the templates. I may consider doing this.
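For anyone who wants to sanity-check the CMC percentage by hand, here is a minimal sketch of the calculation described above. The function and argument names are made up for illustration, and returning 0 for an idle interval is my own assumption. In RRDtool RPN terms the same expression would be roughly `hits,reads,writes,+,/,100,*`, where the three names stand for whatever your data sources are actually called.

```python
def cache_hit_percent(cache_hits: float, read_iops: float, write_iops: float) -> float:
    """Return cache hits as a percentage of total I/O operations.

    Mirrors (CACHE_HITS / (READ_IOPS + WRITE_IOPS)) * 100 from the post.
    Returning 0.0 when there is no I/O is an assumption, chosen to avoid
    a division by zero on idle intervals.
    """
    total_iops = read_iops + write_iops
    if total_iops == 0:
        return 0.0
    return 100.0 * cache_hits / total_iops


# Illustrative numbers only, not taken from a real LeftHand node.
print(cache_hit_percent(150, 400, 100))
```

All three inputs need to be rates over the same polling interval, otherwise the ratio is meaningless.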
We have a whole team of people working on our LeftHand infrastructure right now because we've been experiencing some significant problems. I walked in today and a bunch of folks were buzzing about new patches that were released several days ago, but HP hadn't told us about them yet.
http://bizsupport1.austin.hp.com/bizsup ... ntver=true
Basically every problem I outlined in several support cases has been directly addressed by these recent patches (8/24 & 8/25). That makes me very happy. We haven't applied these patches yet, and it's not up to me to decide when to do that. We are in a change-freeze period.
I'm not sure if you're able to easily apply patches or not, but give it a shot when you can and let me know if things start working correctly. One of the patches re-enables the HP Insight Manager CPQ objects. If that is truly the case, I'll most certainly be releasing a new version of the template that adds a LOT more statistics ... all the way down to disk level.
Stay tuned ...
-
- Posts: 2
- Joined: Thu Nov 18, 2010 7:38 am
Re: Host Templates for HP LeftHand Storage System (P4500, DL
@eschoeller
Great job! I just registered to say THANKS for this work!
I would be more than happy and interested if you implemented a per-initiator (IOps/sec) option to reflect all the view capabilities from CMC. And let us know how it goes with the support call.
peter
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Thanks Peter!
I'll certainly work on per-initiator stats once we get our LH nodes patched and I get the next release posted. The additional metrics will mostly leverage the objects in the CPQ OID tree, which look specifically at the RAID controller in each of these devices. They provide *very* detailed information, all the way down to every SMART attribute for every disk ... which is very useful for predictive fault analysis. I've had these working on our DL320s units for some time now and they work great, but I didn't release them because the CPQ objects weren't available on P4500 units. The patches are supposed to address that ...
As always, stay tuned!
-
- Posts: 5
- Joined: Fri Oct 15, 2010 3:53 am
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Great template. One thing I have noticed: since upgrading the firmware on the nodes to v 9.0.00.3561.0, it fails to report the CPU temp/fan speeds.
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Re: Host Templates for HP LeftHand Storage System (P4500, DL
bummer, sorry that happened.
What are the contents of the .1.3.6.1.4.1.9804.3.1.1.2.1.15 object? An example query below ...
snmpwalk -m ALL -t 90 -v2c -c 'COMMUNITY' IPADDRESS .1.3.6.1.4.1.9804.3.1.1.2.1.15
Re: Host Templates for HP LeftHand Storage System (P4500, DL
That object is now marked obsolete in the MIB.
It refers to the object infoFanTable (.1.3.6.1.4.1.9804.3.1.1.2.1.111), but the fanspeed attribute of that table always returns 0.
Fan status seems to work correctly, though (.1.3.6.1.4.1.9804.3.1.1.2.1.111.1.91) - so you can at least trigger alerts when a fan fails.
We (LogicMonitor.com) had to modify our LeftHand SAN monitoring templates for this and a few other things that changed with the new release.
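As a rough illustration of alerting on fan status, here is a sketch that parses saved snmpwalk output and flags any fan whose status value is not in a set you consider healthy. It assumes the walk was run with numeric OIDs (Net-SNMP's `-On` option) and produces the usual `OID = TYPE: value` lines. The sample lines and the "healthy" value below are invented; check the MIB or a known-good node for the real status values before relying on this.

```python
# Status column of infoFanTable, as given in the post above.
FAN_STATUS_PREFIX = ".1.3.6.1.4.1.9804.3.1.1.2.1.111.1.91"


def parse_snmpwalk(text):
    """Parse 'OID = TYPE: value' lines into a dict of OID -> value string."""
    results = {}
    for line in text.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue  # skip blank or unparseable lines
        _type, sep2, value = rest.partition(": ")
        results[oid.strip()] = (value if sep2 else rest).strip().strip('"')
    return results


def failed_fans(values, healthy):
    """Return fan status OIDs whose value is not in the `healthy` set."""
    return sorted(
        oid
        for oid, value in values.items()
        if oid.startswith(FAN_STATUS_PREFIX + ".") and value not in healthy
    )


# Invented sample output; real nodes may report different types and values.
SAMPLE = """\
.1.3.6.1.4.1.9804.3.1.1.2.1.111.1.91.1 = INTEGER: 1
.1.3.6.1.4.1.9804.3.1.1.2.1.111.1.91.2 = INTEGER: 2
"""

# The healthy value "1" is a placeholder assumption.
print(failed_fans(parse_snmpwalk(SAMPLE), healthy={"1"}))
```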
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi guys,
I am new to Cacti and have a question about this template. I was able to successfully import the cacti_host_template_hp_lefthand_system.xml file. I also was able to create devices using this template.
How do I import (or upload) the lefthand-*.xml (clusters, raid, temp, volumes) into Cacti? What are these used for?
Thanks,
-John
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Re: Host Templates for HP LeftHand Storage System (P4500, DL
The data queries rely on those files in order to function. They need to be placed in the resource/snmp_queries subdirectory of your Cacti installation, for example /usr/local/cacti/resource/snmp_queries; the exact path depends on where you installed Cacti. You'd probably just use a file transfer tool such as scp to get the files onto your Cacti server.
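If the files are already on the Cacti server (say, unpacked into a scratch directory), the copy step could be scripted along these lines. This is only a sketch: the function name is made up, and both directory arguments are placeholders for wherever you unpacked the template and wherever Cacti actually lives on your system.

```python
import shutil
from pathlib import Path


def install_query_files(source_dir, cacti_root):
    """Copy lefthand-*.xml from source_dir into <cacti_root>/resource/snmp_queries.

    Returns the list of file names that were copied.
    """
    target = Path(cacti_root) / "resource" / "snmp_queries"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist; check the Cacti install path")
    copied = []
    for xml_file in sorted(Path(source_dir).glob("lefthand-*.xml")):
        shutil.copy2(xml_file, target / xml_file.name)
        copied.append(xml_file.name)
    return copied
```

For example, `install_query_files("/tmp/lefthand", "/usr/local/cacti")` would copy the files if those two paths match your setup. Make sure the copied files are readable by the user the Cacti poller runs as.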
-
- Cacti User
- Posts: 234
- Joined: Mon Dec 13, 2004 3:03 pm
Re: Host Templates for HP LeftHand Storage System (P4500, DL
@sfrancis
So, what's your deal - you just troll the open source Cacti templates and then roll those into some commercial product?
Re: Host Templates for HP LeftHand Storage System (P4500, DL
Hi!
First off: thanks for making these templates! They've already helped us a lot in troubleshooting some outstanding issues!
I'm having one problem though: I think my volume throughput data is incorrect.
The attached graph shows a write throughput of about 118 K (per second, I assume). (I disabled the stacking on the write throughput, because that didn't really make sense to me.)
Now, I believe the volume is reading and writing a lot more than 100 KB/sec.
If I use snmpget to retrieve the counters at 10 second intervals, calculate the difference and then divide by 10 seconds, I get something like this:
Code: Select all
Old: 25453781 MB , current: 25454170 MB, Diff: 389 MB
389 MB (38 MB/s)
Old: 25454170 MB , current: 25454177 MB, Diff: 7 MB
7 MB (0 MB/s)
Old: 25454177 MB , current: 25454269 MB, Diff: 92 MB
92 MB (9 MB/s)
Old: 25454269 MB , current: 25454500 MB, Diff: 231 MB
231 MB (23 MB/s)
Old: 25454500 MB , current: 25454861 MB, Diff: 361 MB
361 MB (36 MB/s)
iostat on the host itself also shows a large number of MB being read/written:
Code: Select all
[root@logging-prod ~]# iostat -m vdb 10
Linux 2.6.18-194.8.1.el5 (logging-prod.server.eu) 04/13/2011
avg-cpu: %user %nice %system %iowait %steal %idle
16.51 0.02 20.72 4.40 0.00 58.36
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
vdb 139.41 4.40 6.63 16911716 25473977
avg-cpu: %user %nice %system %iowait %steal %idle
10.82 0.05 17.32 41.99 0.00 29.82
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
vdb 653.10 11.13 52.96 111 529
avg-cpu: %user %nice %system %iowait %steal %idle
4.58 0.03 8.33 6.00 0.00 81.07
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
vdb 147.60 3.62 3.80 36 37
avg-cpu: %user %nice %system %iowait %steal %idle
4.82 0.02 6.12 17.80 0.00 71.23
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
vdb 470.43 23.91 8.08 239 80
avg-cpu: %user %nice %system %iowait %steal %idle
7.73 0.00 10.85 49.69 0.00 31.73
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
vdb 670.20 36.50 9.18 364 91
So I think there is something going wrong with the data query?
The values in the rrd are consistent with those in the graph. :/
Any ideas?
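The manual check described in the post (sample the counter, take the difference between consecutive readings, divide by the interval) can be sketched as follows. The function name is made up; the readings are the "Old"/"current" values quoted in the post, and the 10 second interval is the one the poster used. The sketch assumes the counter only increases between samples, so it does not handle counter wrap or a node reset.

```python
def rates_per_second(samples, interval_s):
    """Return (delta, rate) pairs for consecutive counter readings.

    `samples` are cumulative counter values taken `interval_s` seconds apart.
    No counter-wrap handling: a negative delta would indicate a wrap or reset.
    """
    results = []
    for old, new in zip(samples, samples[1:]):
        delta = new - old
        results.append((delta, delta / interval_s))
    return results


# Counter readings in MB, copied from the snmpget output quoted in the post.
readings_mb = [25453781, 25454170, 25454177, 25454269, 25454500, 25454861]

for delta, rate in rates_per_second(readings_mb, 10):
    print(f"{delta} MB ({rate:.1f} MB/s)")
```

The deltas match the ones in the quoted output; the quoted rates appear to be truncated to whole numbers, whereas this sketch keeps one decimal place.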