I hate mdstat and Linux RAID. There, I said it. The sentiment is more likely a testament to my lack of understanding than the actual worthiness of Linux RAID.
Oftentimes I find that Linux RAID dies upon reboot. I’ve had this happen perhaps half a dozen times on different servers over the past couple of years. Every time, the issue was *not* due to a faulty drive.
Recently, upon a forced reboot (power cycle) of a Linux server, the /home folder would not mount (RAID striped with parity, 3 devices). I went through the logs and customary machinations until I stumbled upon this entry in the /var/log/messages file:
kicking non-fresh sdd1 from array
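If you would rather not scroll through the whole log, a small helper can pull out just the kicked device names. This is only a sketch: the function name `find_kicked_devices` is mine, and it assumes the message text looks like the entry above and that your distribution logs kernel messages to /var/log/messages.

```shell
# Print the names of devices that md kicked as "non-fresh".
# The log path defaults to /var/log/messages; pass another path as $1.
# (find_kicked_devices is a made-up helper name, not part of mdadm.)
# Usage: find_kicked_devices /var/log/messages
find_kicked_devices() {
    logfile="${1:-/var/log/messages}"
    # Keep only the matching phrase, then take its third word (the device).
    grep -o 'kicking non-fresh [^ ]* from array' "$logfile" \
        | awk '{print $3}' \
        | sort -u
}
```

It prints nothing if no device was kicked, so an empty result means you need to keep digging elsewhere.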
Searching duckduckgo.com (I’ve given up on Google — I value my privacy too much) I came upon the following snippet (paraphrased). Unfortunately I can no longer find the original web page, so I can’t give credit:
When /proc/mdstat is missing one (or more?) of the devices in the array, try removing then re-adding the device.
This can happen after an unclean shutdown (like a power fail).
/sbin/mdadm /dev/md0 --fail /dev/sda5 --remove /dev/sda5
/sbin/mdadm /dev/md0 --add /dev/sda5
/sbin/mdadm /dev/md1 --fail /dev/sda6 --remove /dev/sda6
/sbin/mdadm /dev/md1 --add /dev/sda6
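The device names in that snippet belong to whoever wrote it, not to my server. Here is a rough wrapper around the same two mdadm commands that takes the array and the member as arguments. The name `readd_member` and the `RUN` switch are my own inventions. By default it only prints the commands so you can read them before anything touches the array.

```shell
# Print (or, with RUN=1, execute) the remove/re-add sequence for one member.
# readd_member and RUN are hypothetical names made up for this sketch.
# The array and member are whatever your own system uses.
readd_member() {
    array="$1"
    member="$2"
    if [ -z "$array" ] || [ -z "$member" ]; then
        echo "usage: readd_member ARRAY MEMBER" >&2
        return 2
    fi
    # With RUN unset, "echo" is prepended so the commands are only printed.
    run="echo"
    [ "${RUN:-0}" = "1" ] && run=""
    $run /sbin/mdadm "$array" --fail "$member" --remove "$member"
    $run /sbin/mdadm "$array" --add "$member"
}
```

For example, `readd_member /dev/md0 /dev/sdd1` prints the two commands for that pair, and `RUN=1 readd_member /dev/md0 /dev/sdd1` (as root) actually runs them. I am assuming /dev/md0 here purely for illustration; check /proc/mdstat for the real array name first.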
If this works, /proc/mdstat will show a recovery process indicator.
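To check for that indicator without reading the whole file, something like the following should do. It is a sketch that assumes the progress line contains the word "recovery" (or "resync"), which is what I have seen in /proc/mdstat; the function name `rebuild_progress` is made up.

```shell
# Print any recovery/resync progress lines from an mdstat-format file.
# Defaults to /proc/mdstat; the exit status is non-zero when no rebuild
# is reported. (rebuild_progress is a hypothetical helper name.)
rebuild_progress() {
    grep -E 'recovery|resync' "${1:-/proc/mdstat}"
}
```

If you just want to keep an eye on it, `watch -n 60 cat /proc/mdstat` redraws the full file once a minute.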
The recovery process takes a **long** time. At least that was the case on my 2TB drives. I’m talking a full day. The array was mounted automatically when the recovery started, and users kept working on it. I was concerned that there might be problems since the array was still recovering, but I haven’t seen any, save slow performance.