I use the following procedure to perform maintenance on Storage Spaces Direct servers.
Following the process to the letter is critical, as it’s more than just taking a server offline: portions of the storage are shared across all servers in the cluster.
Before doing anything, check that all volumes (virtual disks) are healthy:
For each volume (virtual disk), the HealthStatus must be Healthy before proceeding.
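The health check above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: Test-AllVirtualDisksHealthy is a hypothetical name of my choosing, and I'm assuming the cluster can be reached by passing its name to the -CimSession parameter of Get-VirtualDisk.

```powershell
# Sketch only. Test-AllVirtualDisksHealthy is a hypothetical helper, not a built-in cmdlet.
function Test-AllVirtualDisksHealthy {
    param(
        # Cluster name, e.g. S2DCLUST1 in this post's examples.
        [Parameter(Mandatory)] [string] $Cluster
    )

    # Get-VirtualDisk (Storage module) lists the volumes (virtual disks).
    $disks = @(Get-VirtualDisk -CimSession $Cluster)

    # Print the columns that matter; Out-Host keeps the table out of the return value.
    $disks | Sort-Object FriendlyName |
        Format-Table FriendlyName, OperationalStatus, HealthStatus -AutoSize |
        Out-Host

    # Anything other than Healthy means: do not start maintenance.
    $unhealthy = @($disks | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })
    return (($disks.Count -gt 0) -and ($unhealthy.Count -eq 0))
}

# Example call (assumed cluster name from this post):
# Test-AllVirtualDisksHealthy -Cluster S2DCLUST1
```

If the function returns False, stop and investigate before pausing any server.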
Pause & Drain
Before performing any maintenance, pause the server and drain any roles (e.g. VMs) running on it:
Suspend-ClusterNode -Drain -Cluster [CLUSTER NAME] -Name [SERVER NAME]
Suspend-ClusterNode -Drain -Cluster S2DCLUST1 -Name X500S2DP01
All virtual machines will begin live migrating to other servers in the cluster. This can take a few minutes.
Note: the node status first shows Draining and then changes to Paused. Use Failover Cluster Manager to watch the status, and don’t proceed until it changes from Draining to Paused.
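If you prefer to wait from PowerShell rather than Failover Cluster Manager, the wait can be sketched as a polling loop. This is a sketch under assumptions: Wait-ClusterNodeDrained is a hypothetical helper name, and I'm assuming the State and DrainStatus properties returned by Get-ClusterNode report Paused and Completed once draining has finished.

```powershell
# Sketch only. Wait-ClusterNodeDrained is a hypothetical helper, not a built-in cmdlet.
function Wait-ClusterNodeDrained {
    param(
        [Parameter(Mandatory)] [string] $Cluster,   # e.g. S2DCLUST1
        [Parameter(Mandatory)] [string] $Name,      # e.g. X500S2DP01
        [int] $PollSeconds = 10
    )

    while ($true) {
        $node = Get-ClusterNode -Cluster $Cluster -Name $Name

        # Stop immediately if the drain itself failed.
        if ("$($node.DrainStatus)" -eq 'Failed') {
            throw "Drain failed on $Name"
        }

        # Assumed end state: node Paused and drain Completed.
        if ("$($node.State)" -eq 'Paused' -and "$($node.DrainStatus)" -eq 'Completed') {
            return $node
        }

        Write-Host "$Name state: $($node.State), drain: $($node.DrainStatus)"
        Start-Sleep -Seconds $PollSeconds
    }
}

# Example call (assumed names from this post):
# Wait-ClusterNodeDrained -Cluster S2DCLUST1 -Name X500S2DP01
```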
Perform whatever maintenance tasks you need to (e.g. Windows Updates), then restart the server if required:
Restart-Computer -Force -ComputerName [SERVER NAME]
Restart-Computer -Force -ComputerName X500S2DP01
Make the server operational in the cluster again with the following command. Note that I’m using the optional -Failback flag to move any roles that were previously running on the server back to it:
Resume-ClusterNode -Failback Immediate -Cluster S2DCLUST1 -Name X500S2DP01
Any new writes that occurred while the server was paused need to be resynced. Only changed data needs to be resynced, so this typically takes a couple of minutes.
Check the repair (resync) jobs with the following command. You can use the BytesTotal & PercentComplete values to monitor progress:
Check the volume (virtual disk) status. It is normal to see OperationalStatus: InService and HealthStatus: Warning while the above repair jobs are running:
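One way to list the repair jobs is Get-StorageJob, sketched here in a small wrapper. Get-RepairJobProgress is a hypothetical helper name, and passing the cluster name to -CimSession is my assumption about how you reach the cluster.

```powershell
# Sketch only. Get-RepairJobProgress is a hypothetical helper, not a built-in cmdlet.
function Get-RepairJobProgress {
    param(
        [Parameter(Mandatory)] [string] $Cluster   # e.g. S2DCLUST1
    )

    # Get-StorageJob (Storage module) returns the running storage jobs.
    # Keep only the columns useful for watching progress.
    Get-StorageJob -CimSession $Cluster |
        Select-Object Name, JobState, PercentComplete, BytesProcessed, BytesTotal
}

# Example call (assumed cluster name from this post):
# Get-RepairJobProgress -Cluster S2DCLUST1 | Format-Table -AutoSize
```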
It is critical that you wait for the repair (resync) to complete successfully before taking any other servers in the cluster offline!
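To make that wait harder to skip, it can be scripted. The following is a minimal sketch, assuming a resync is finished when no storage job is left in a state other than Completed and every virtual disk reports HealthStatus Healthy. Wait-StorageResync is a hypothetical helper name, and the JobState and HealthStatus values compared against are my assumptions.

```powershell
# Sketch only. Wait-StorageResync is a hypothetical helper, not a built-in cmdlet.
function Wait-StorageResync {
    param(
        [Parameter(Mandatory)] [string] $Cluster,   # e.g. S2DCLUST1
        [int] $PollSeconds = 30
    )

    while ($true) {
        # Jobs still in progress (assumption: finished jobs report 'Completed').
        $activeJobs = @(Get-StorageJob -CimSession $Cluster |
            Where-Object { "$($_.JobState)" -ne 'Completed' })

        # Volumes (virtual disks) that are not yet back to Healthy.
        $unhealthy = @(Get-VirtualDisk -CimSession $Cluster |
            Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })

        if ($activeJobs.Count -eq 0 -and $unhealthy.Count -eq 0) {
            Write-Host 'Resync complete, all virtual disks healthy.'
            return
        }

        Write-Host "Waiting: $($activeJobs.Count) active job(s), $($unhealthy.Count) unhealthy disk(s)"
        Start-Sleep -Seconds $PollSeconds
    }
}

# Example call (assumed cluster name from this post):
# Wait-StorageResync -Cluster S2DCLUST1
```

Only move on to the next server once this returns.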