Problem with cluster disk partitions
-
- Cacti User
- Posts: 111
- Joined: Fri Sep 28, 2012 6:52 pm
Problem with cluster disk partitions
Hello,
We have a SQL Server Cluster that consists of 3 physical nodes, and then several virtual instances distributed across this environment.
The problem is that, for some reason I haven't figured out yet, all the graphs created under the "Used Space" Graph Template currently show the behavior displayed in the screenshots, whether the host I'm graphing is a virtual instance or a physical node.
I'm using SNMP v2.
Does anyone have an idea what might be causing this?
- Attachments
- used space problem.png (18.57 KiB)
- used space problem.png (19.49 KiB)
- phalek
- Developer
- Posts: 2838
- Joined: Thu Jan 31, 2008 6:39 am
- Location: Kressbronn, Germany
- Contact:
Re: Problem with cluster disk partitions
Out of curiosity, what is the size of the disks?
Maybe one of these may help you:
http://docs.cacti.net/usertemplate:data ... disk_usage
or this one:
http://docs.cacti.net/usertemplate:data ... disk_usage
Greetings,
Phalek
---
Need more help ? Read the Cacti documentation or my new Cacti 1.x Book
Need on-site support ? Look here Cacti Workshop
Need professional Cacti support ? Look here CereusService
---
Plugins : CereusReporting
Re: Problem with cluster disk partitions
The graphs look valid to me (neither the used nor the total counters are missing). So the question is: what is your SQL database doing with those partitions to cause such large swings in data usage? Backups?
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
-
- Cacti User
- Posts: 111
- Joined: Fri Sep 28, 2012 6:52 pm
Re: Problem with cluster disk partitions
Sorry for the long delay.
I've applied the templates from phalek's 1st link. Thanks for that.
So far, the graphs are behaving well on both the old template and the new one, so I've spent some time investigating and I seem to have discovered something interesting.
This issue only seems to occur when instances fail over from one node to another. In the vast majority of cases that happens in a planned manner, for example during Windows Updates, when the physical nodes are restarted one at a time and the instances are moved between nodes in the meantime. Other examples include other forms of planned or unplanned downtime.
In all those cases of failover, this issue seems to appear. However, when there's no failover, the graphs appear to be fine. I've yet to determine if this issue is related to something like a node-instance preference. For example: instance A's graphs are only displayed correctly when it's being hosted on node B, and so on.
And BSOD, I've monitored the usage rates and the graphs really are wrong. The disk in the second graph, for example, has 440 GB, and when the graph (wrongly) displays a smaller total, the usage also decreases. It's a standard production database; there are no data usage swings like that. The graphs are wrong.
We also have a few active/active application clusters and the problem doesn't happen there, so I'm guessing the instances get confused when they're moved to another node and can't figure out how to map their disk volumes, hence my problem.
Sorry for the long post. Ideas, anyone?
- phalek
- Developer
- Posts: 2838
- Joined: Thu Jan 31, 2008 6:39 am
- Location: Kressbronn, Germany
- Contact:
Re: Problem with cluster disk partitions
That sounds like an old issue with how SNMP may represent the disks.
Basically the disks have indexes, e.g.
Code:
Disk 1 = index.0
Disk 2 = index.1
Disk 3 = index.2
But updates or restarts "may" change this order to e.g.
Code:
Disk 1 = index.0
Disk 3 = index.1
Disk 2 = index.2
Cacti only matches the index number and nothing else, as it's unaware of the changes that happened on the system.
This should probably occur only with virtual disks, e.g. iSCSI, as physically attached ones usually keep their order.
Now to fix this, you will have to figure out something else to use as an index. I have done this a few times by writing a script that creates my own index.
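To illustrate the idea of using something other than the position as the index, here is a minimal sketch in Python. It is not one of the scripts mentioned above and not Cacti code; the function name `remap_by_description` and the dictionaries are hypothetical, and the data is just the "Disk 1/2/3" example from this post. It assumes each disk has a description that stays the same when the index order changes.

```python
def remap_by_description(old_walk, new_walk):
    """Map each old index to the new index that now holds the same description.

    old_walk / new_walk are hypothetical {index: description} dictionaries,
    e.g. built from two SNMP walks taken before and after a failover.
    """
    # Invert the new walk so a description can be looked up directly.
    new_by_descr = {descr: idx for idx, descr in new_walk.items()}
    # A description that no longer exists maps to None.
    return {old_idx: new_by_descr.get(descr) for old_idx, descr in old_walk.items()}


before = {0: "Disk 1", 1: "Disk 2", 2: "Disk 3"}
after = {0: "Disk 1", 1: "Disk 3", 2: "Disk 2"}

mapping = remap_by_description(before, after)
print(mapping)  # {0: 0, 1: 2, 2: 1}
```

Matching on the description means "Disk 2" is still found after it moves from index.1 to index.2, whereas matching on the number alone would silently graph "Disk 3" in its place.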
Greetings,
Phalek
Re: Problem with cluster disk partitions
phalek wrote: Cacti only matches the index number, nothing else as it's unaware of the changes that happened on the system.
Then the re-indexing method should be changed from Uptime to either of the two other options so the new indexes are picked up.
-
- Cacti User
- Posts: 111
- Joined: Fri Sep 28, 2012 6:52 pm
Re: Problem with cluster disk partitions
@phalek
Uhm, that makes sense.
I've never tackled scripting related to disk indexes. Do you still have some of those scripts you made? Would you be willing to share them, or part of their logic, or the resources I should look into, or at least point me in the right direction?
@BSOD
I've wondered about that, but I wasn't sure yet, so I haven't thoroughly tested that option. I'm guessing "Verify All Fields" would be most suitable in this case, right?
Aside from removing, re-adding and reloading the query, is it necessary to perform any other action? i.e. deleting .rrd files, etc?
Thanks for the input
Re: Problem with cluster disk partitions
victorantunes wrote: I'm guessing "Verify All Fields" would be most suitable in this case, right?
Aside from removing, re-adding and reloading the query, is it necessary to perform any other action? i.e. deleting .rrd files, etc?
Yeah, sounds like [Verify All Fields] is the best option for this device. Remove the query and re-add it with the different re-index method; no other action should be required except possibly a poller cache clear.
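Why the re-index method matters here can be pictured with a small conceptual sketch in Python. This is not Cacti's actual code: the three functions are hypothetical stand-ins for an uptime-based check, a check on the number of indexes (an assumption about what one of the other options compares), and a check of every field, and the uptime values are made up. The disk data is phalek's example from earlier in the thread.

```python
def uptime_went_backwards(prev_uptime, cur_uptime):
    """Uptime-style check: only fires when the device appears to have rebooted."""
    return cur_uptime < prev_uptime


def index_count_changed(cached, fresh):
    """Count-style check: only fires when entries were added or removed."""
    return len(cached) != len(fresh)


def fields_changed(cached, fresh):
    """Field-style check: return the indexes whose value differs from the cache."""
    all_indexes = set(cached) | set(fresh)
    return sorted(i for i in all_indexes if cached.get(i) != fresh.get(i))


cached = {0: "Disk 1", 1: "Disk 2", 2: "Disk 3"}
fresh = {0: "Disk 1", 1: "Disk 3", 2: "Disk 2"}

print(uptime_went_backwards(1000, 2000))   # False
print(index_count_changed(cached, fresh))  # False
print(fields_changed(cached, fresh))       # [1, 2]
```

In this sketch a pure reordering leaves both the uptime and the number of disks looking normal, so only the field-by-field comparison notices that index 1 and index 2 have swapped.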
-
- Cacti User
- Posts: 111
- Joined: Fri Sep 28, 2012 6:52 pm
Re: Problem with cluster disk partitions
I've changed the re-index method. I'm guessing we'll perform a failover within the next couple days and see how it works.
Will post updates on this as they occur.
Thanks a ton, guys.