Storage Spaces Direct: Server Maintenance

I use the following procedure to perform maintenance on Storage Spaces Direct servers.

Following the process to the letter is critical, as it’s more than just taking a server offline: portions of the storage are shared across all servers in the cluster.

Safety First

Before doing anything, check that all volumes (virtual disks) are healthy:

Get-VirtualDisk

For each volume (virtual disk), the HealthStatus must be Healthy before proceeding.
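The check can also be scripted so that nothing is missed on a cluster with many volumes. This is a minimal sketch, not a definitive implementation: Get-UnhealthyVirtualDisk is a hypothetical helper name, and it assumes the objects passed in expose FriendlyName and HealthStatus properties as Get-VirtualDisk output does.

```powershell
# Hypothetical helper: returns only the virtual disks that are not Healthy.
function Get-UnhealthyVirtualDisk {
    param(
        # Objects with a HealthStatus property, e.g. the output of Get-VirtualDisk.
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$VirtualDisk
    )

    # Compare as strings so the filter behaves the same on live objects
    # and on data that has been exported and re-imported.
    $VirtualDisk | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' }
}
```

On a cluster node, call it as Get-UnhealthyVirtualDisk -VirtualDisk @(Get-VirtualDisk) and stop if anything comes back.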

Pause & Drain

Before performing any maintenance, pause the server and drain any roles (e.g. VMs) from it:

Suspend-ClusterNode -Drain -Cluster [CLUSTER NAME] -Name [SERVER NAME]

e.g.

Suspend-ClusterNode -Drain -Cluster S2DCLUST1 -Name X500S2DP01

All virtual machines will begin live migrating to other servers in the cluster. This can take a few minutes.

Note: the command output reports the node status as Paused straight away. That is the state the node is heading for, not its current one. Use Failover Cluster Manager and don’t proceed until the status changes from Draining to Paused.
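The wait can also be scripted instead of watched in Failover Cluster Manager. The sketch below is an assumption-laden example rather than a tested procedure: Wait-NodeDrained is a hypothetical helper name, and it assumes the DrainStatus property on Get-ClusterNode output reports Completed once the drain has finished and Failed if it could not.

```powershell
# Hypothetical helper: blocks until the node is Paused and its drain has completed.
function Wait-NodeDrained {
    param(
        [Parameter(Mandatory)][string]$Cluster,
        [Parameter(Mandatory)][string]$Name,
        [int]$PollSeconds = 10
    )

    while ($true) {
        $node = Get-ClusterNode -Cluster $Cluster -Name $Name
        Write-Host "${Name}: State=$($node.State) DrainStatus=$($node.DrainStatus)"

        # Give up rather than loop forever if the drain could not finish.
        if ("$($node.DrainStatus)" -eq 'Failed') {
            throw "Drain failed on ${Name}; investigate before continuing."
        }

        # Assumption: Paused plus a Completed drain means all roles have moved.
        if ("$($node.State)" -eq 'Paused' -and "$($node.DrainStatus)" -eq 'Completed') {
            return $node
        }

        Start-Sleep -Seconds $PollSeconds
    }
}
```

e.g. Wait-NodeDrained -Cluster S2DCLUST1 -Name X500S2DP01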

Perform Maintenance

Perform whatever maintenance tasks you need to (e.g. Windows Updates).

To reboot:

Restart-Computer -Force -ComputerName [SERVER NAME]

e.g.

Restart-Computer -Force -ComputerName X500S2DP01

Resuming

Make the server operational in the cluster again with the following command. I’m using the optional -Failback parameter to move any roles that were previously running on the server back to it:

Resume-ClusterNode -Failback Immediate -Cluster S2DCLUST1 -Name X500S2DP01

Monitor Resync

Any new writes that occurred while the server was paused need to be resynced. Only changed data is resynced, so this typically takes a couple of minutes.

Check the repair (resync) jobs with the following command. Use the BytesTotal and PercentComplete values to monitor progress:

Get-StorageJob

Check the volume (virtual disk) status. It is normal to see OperationalStatus: InService and HealthStatus: Warning while the above repair jobs are running:

Get-VirtualDisk

It is critical that you wait for the repair (resync) to complete successfully before taking any other servers in the cluster offline!
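To avoid moving on too early, the two checks above can be combined into a single wait loop. This is a rough sketch under stated assumptions: Wait-StorageResync is a hypothetical helper name, it is meant to run on a cluster node, and it treats any storage job whose JobState is not Completed as still in progress. It does not handle failed jobs or timeouts.

```powershell
# Hypothetical helper: blocks until no storage jobs are pending
# and every virtual disk reports Healthy again.
function Wait-StorageResync {
    param([int]$PollSeconds = 30)

    while ($true) {
        # Jobs that have not finished yet (assumed: anything not 'Completed').
        $jobs = @(Get-StorageJob | Where-Object { "$($_.JobState)" -ne 'Completed' })

        # Virtual disks that are still in Warning or worse.
        $unhealthy = @(Get-VirtualDisk | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })

        if ($jobs.Count -eq 0 -and $unhealthy.Count -eq 0) {
            return
        }

        foreach ($job in $jobs) {
            Write-Host "$($job.Name): $($job.PercentComplete)% of $($job.BytesTotal) bytes"
        }

        Start-Sleep -Seconds $PollSeconds
    }
}
```

Run Wait-StorageResync after resuming the node, and only start on the next server once it returns.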
