Problem with cluster disk partitions

victorantunes
Cacti User
Posts: 111
Joined: Fri Sep 28, 2012 6:52 pm

Problem with cluster disk partitions

Post by victorantunes »

Hello,

We have a SQL Server cluster consisting of 3 physical nodes, with several virtual instances distributed across them.


The problem is that, for some reason I haven't figured out yet, all the graphs created under the "Used Space" Graph Template currently show the behavior displayed in the screenshots, whether the host I'm graphing is a virtual instance or a physical node.

I'm using SNMP v2.

Does anyone have an idea what might be causing this?
Attachments
used space problem.png (18.57 KiB)
used space problem.png (19.49 KiB)
phalek
Developer
Posts: 2838
Joined: Thu Jan 31, 2008 6:39 am
Location: Kressbronn, Germany

Re: Problem with cluster disk partitions

Post by phalek »

Out of curiosity, what is the size of the disks?

One of these may help you:

http://docs.cacti.net/usertemplate:data ... disk_usage

or this one:

http://docs.cacti.net/usertemplate:data ... disk_usage
Greetings,
Phalek
---
Need more help ? Read the Cacti documentation or my new Cacti 1.x Book
Need on-site support ? Look here Cacti Workshop
Need professional Cacti support ? Look here CereusService
---
Plugins : CereusReporting
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Problem with cluster disk partitions

Post by BSOD2600 »

The graphs look valid to me (neither the used nor the total counters are missing). So the question is: what is your SQL db doing with those partitions to cause such large swings in data usage? Backups?

Re: Problem with cluster disk partitions

Post by victorantunes »

Sorry for the long delay.

I've applied the templates from phalek's 1st link. Thanks for that.

So far, the graphs are behaving well on both the old and the new templates, so I've spent some time investigating and I seem to have discovered something interesting.

This issue only seems to occur when instances fail over from one node to another. In the vast majority of cases, that happens in a planned manner, for example during Windows Updates, when the physical nodes are restarted one at a time and the instances are moved between nodes in the meantime. Other examples include other forms of planned or unplanned downtime.

In all those cases of failover, this issue seems to appear. However, when there's no failover, the graphs appear to be fine. I've yet to determine whether this issue is related to something like a node-instance preference. For example: instance A's graphs are only displayed correctly when it's being hosted on node B, and so on.

And BSOD, I've monitored the usage rates and the graphs actually are wrong. The disk in the second graph, for example, has 440GB, and when the graph (wrongly) displays a smaller total, the usage also decreases. It's a standard production database; there are no data usage swings like that. The graphs are wrong.

We also have a few active/active application clusters and the problem doesn't happen there, so I'm guessing the instances get confused when they're moved to another node and can't figure out how to map their disk volumes, hence my problem.

Sorry for the long post. Ideas, anyone?

Re: Problem with cluster disk partitions

Post by phalek »

That sounds like an old issue with how SNMP may represent the disks.

Basically, the disks have indexes, e.g.

Code:

Disk 1 = index.0
Disk 2 = index.1
Disk 3 = index.2
But updates or restarts may change this order, e.g. to:

Code:

Disk 1 = index.0
Disk 3 = index.1
Disk 2 = index.2
Cacti only matches the index number and nothing else, as it's unaware of the changes that happened on the system.

This should probably only affect virtual disks (e.g. iSCSI), as physically attached ones usually keep their order.

To fix this, you will have to find something else to use as an index. I have sometimes done this by writing a script that builds its own index.
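
As a rough illustration of the idea (not one of my actual scripts, and not Cacti code), here is a minimal Python sketch. It assumes you already have the SNMP walk results as a mapping of index to description, and that the drive letter at the start of the description stays the same across failovers. The sample data and the function names stable_key, build_index and moved_disks are made up for the example.

```python
# Hypothetical walk results (SNMP index -> storage description),
# taken before and after a failover. The values are invented.
before = {1: "E:\\ Label:Data", 2: "F:\\ Label:Logs", 3: "G:\\ Label:Backup"}
after = {1: "E:\\ Label:Data", 2: "G:\\ Label:Backup", 3: "F:\\ Label:Logs"}


def stable_key(descr):
    """Return the part of the description assumed not to change (the drive letter)."""
    return descr.split(" ", 1)[0].upper()


def build_index(walk):
    """Map each stable key to the SNMP index it currently sits on."""
    return {stable_key(descr): idx for idx, descr in walk.items()}


def moved_disks(old_walk, new_walk):
    """List disks whose SNMP index differs between two walks."""
    old, new = build_index(old_walk), build_index(new_walk)
    return {key: (old[key], new[key])
            for key in old
            if key in new and old[key] != new[key]}


if __name__ == "__main__":
    for key, (old_idx, new_idx) in moved_disks(before, after).items():
        print(f"{key} moved from index {old_idx} to index {new_idx}")
```

A graph keyed on the drive letter would keep following the same volume, whereas one keyed on the raw index would silently switch to whatever disk now sits at that index.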
Greetings,
Phalek

Re: Problem with cluster disk partitions

Post by BSOD2600 »

phalek wrote: Cacti only matches the index number, nothing else as it's unaware of the changes that happened on the system.
Then the re-indexing method should be changed from Uptime to either of the two other options so the new indexes are picked up.

Re: Problem with cluster disk partitions

Post by victorantunes »

@phalek
Uhm, that makes sense.

I've never tackled scripting related to disk indexes. Do you still have some of those scripts you made? Would you be willing to share them, part of their logic, or the resources I should look into, or at least point me in the right direction?

@BSOD
I've wondered about that, but I wasn't sure yet, so I haven't thoroughly tested that option. I'm guessing "Verify All Fields" would be most suitable in this case, right?

Aside from removing, re-adding and reloading the query, is it necessary to perform any other action, e.g. deleting .rrd files?



Thanks for the input

Re: Problem with cluster disk partitions

Post by BSOD2600 »

victorantunes wrote: I'm guessing "Verify All Fields" would be most suitable in this case, right?

Aside from removing, re-adding and reloading the query, is it necessary to perform any other action, e.g. deleting .rrd files?
Yes, it sounds like [Verify All Fields] is the best option for this device. Remove the query and re-add it with the different re-index method; no other action should be required, except possibly clearing the poller cache.
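
To show why the choice of re-index method matters here, below is a conceptual Python sketch of the three kinds of check. This is not Cacti's actual implementation, just the idea behind each option; the function names and sample values are invented, and it assumes a failover where the reported uptime keeps increasing and the number of disks stays the same.

```python
def uptime_goes_backwards(prev_uptime, cur_uptime):
    """Re-index only if the device appears to have rebooted."""
    return cur_uptime < prev_uptime


def index_count_changed(cached, fresh):
    """Re-index only if the number of indexes differs."""
    return len(cached) != len(fresh)


def verify_all_fields(cached, fresh):
    """Re-index if any cached index/value pair no longer matches."""
    return cached != fresh


# Hypothetical cached vs. freshly polled index -> drive mapping.
cached = {1: "E:\\", 2: "F:\\", 3: "G:\\"}
fresh = {1: "E:\\", 2: "G:\\", 3: "F:\\"}

if __name__ == "__main__":
    print("uptime check:", uptime_goes_backwards(1000, 1300))
    print("count check:", index_count_changed(cached, fresh))
    print("verify all:", verify_all_fields(cached, fresh))
```

Under those assumptions, only the last check notices that indexes 2 and 3 have swapped, which is why it looks like the better fit for a clustered device.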

Re: Problem with cluster disk partitions

Post by victorantunes »

I've changed the re-index method. I expect we'll perform a failover within the next couple of days, and then I'll see how it works.

Will post updates on this as they occur.

Thanks a ton, guys.