Replacing A Failed Hard Drive In A Software RAID1 Array
This guide shows how to remove a failed hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.
NOTE: Use gdisk instead of fdisk to support GPT partitions.
An alternative (valid for both MBR and GPT) is the graphical "Disks" application, if you have a graphical environment, or sfdisk from the command line.
1 Preliminary Note
In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.
/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0, and /dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1:
/dev/sda1 + /dev/sdb1 = /dev/md0
/dev/sda2 + /dev/sdb2 = /dev/md1
/dev/sdb has failed, and we want to replace it.
NOTE: The same logic applies to a RAID1 built on whole disks instead of partitions (i.e. /dev/sda + /dev/sdb = /dev/md0).
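If you want to confirm this layout on your own system before touching anything, a quick sanity check (assuming lsblk is available, which it is on any modern distribution) is:
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
cat /proc/mdstat
lsblk shows each partition with the md device stacked on top of it, so you can see exactly which partitions belong to which array before marking anything as failed.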
2 How can I know if a Disk has failed?
If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run
cat /proc/mdstat
and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
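If the smartmontools package is installed (an assumption; it is not part of every default install), you can also query the drive's own health data and ask mdadm which member dropped out:
smartctl -H /dev/sdb
mdadm --detail /dev/md0
smartctl -H prints the drive's overall SMART self-assessment, and mdadm --detail lists each member device together with its state (active sync, faulty, removed).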
3 Removing The Failed Disk
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
mdadm --manage /dev/md0 --fail /dev/sdb1
The output of cat /proc/mdstat should look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
Then we remove /dev/sdb1 from /dev/md0:
mdadm --manage /dev/md0 --remove /dev/sdb1
The output should be like this:
server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
And cat /proc/mdstat should show this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
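As a side note, mdadm accepts --fail and --remove in a single invocation, so the two steps above can also be written as one command (shown here for /dev/sdb1; it is only a shorthand, not a different procedure):
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1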
Now we do the same steps again for /dev/sdb2 (which is part of /dev/md1):
mdadm --manage /dev/md1 --fail /dev/sdb2
cat /proc/mdstat
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[2](F)
24418688 blocks [2/1] [U_]
unused devices: <none>
mdadm --manage /dev/md1 --remove /dev/sdb2
server1:~# mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
cat /proc/mdstat
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0]
24418688 blocks [2/1] [U_]
unused devices: <none>
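Before shutting down, it can help to note the serial number of /dev/sdb so you pull the correct physical drive from the chassis. One way to read it (assuming hdparm or smartmontools is installed) is:
hdparm -I /dev/sdb | grep -i serial
smartctl -i /dev/sdb shows the same information if you prefer smartmontools.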
Then power down the system:
shutdown -h now
and replace the old /dev/sdb hard drive with a new one.
NOTE: The new disk must be at least as large as the old one; if it is even a few MB smaller, rebuilding the arrays will fail.
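Once the system has been booted with the replacement drive (next section), you can verify the sizes with a quick check; blockdev is part of util-linux and should be present everywhere:
blockdev --getsize64 /dev/sda
blockdev --getsize64 /dev/sdb
The value reported for /dev/sdb must be greater than or equal to the one for /dev/sda.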
4 Adding The New Hard Disk
After you have changed the hard disk /dev/sdb, boot the system.
The first thing we must do now is to create the exact same partitioning as on /dev/sda.
We can do this with one simple command:
sfdisk -d /dev/sda | sfdisk /dev/sdb
NOTE: For GPT disks use sgdisk (from the gdisk/gptfdisk package) instead, as sketched below, or, in general, the graphical "Disks" application.
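A minimal sketch for the GPT case, assuming sgdisk is installed (sfdisk also handles GPT on reasonably recent versions of util-linux, so the pipe above may work for you as well):
sgdisk -R=/dev/sdb /dev/sda
sgdisk -G /dev/sdb
The first command replicates the partition table from /dev/sda onto /dev/sdb; the second randomizes the disk and partition GUIDs on /dev/sdb so they do not collide with those on /dev/sda.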
You can run
fdisk -l
to check if both hard drives have the same partitioning now.
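If you prefer something more precise than eyeballing fdisk -l, you can dump both partition tables with sfdisk and compare them:
sfdisk -d /dev/sda
sfdisk -d /dev/sdb
The partition start sectors, sizes, and types should match; the disk identifiers and partition UUIDs will of course differ between the two disks.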
Next we add /dev/sdb1 to /dev/md0 and /dev/sdb2 to /dev/md1:
mdadm --manage /dev/md0 --add /dev/sdb1
server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2
Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run
cat /proc/mdstat
(or mdadm -D /dev/md1)
to see when it's finished.
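If you would rather have the progress refresh automatically than re-run the command by hand, a small convenience (watch is part of the procps package and available practically everywhere):
watch -n 5 cat /proc/mdstat
The kernel throttles the rebuild according to /proc/sys/dev/raid/speed_limit_min and speed_limit_max; raising the minimum can speed up a rebuild on an otherwise idle system, but only do so if you are comfortable with the extra I/O load.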
During the synchronization the output will look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/1] [U_]
[=>...................] recovery = 9.9% (2423168/24418688) finish=2.8min speed=127535K/sec
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/1] [U_]
[=>...................] recovery = 6.4% (1572096/24418688) finish=1.9min speed=196512K/sec
unused devices: <none>
When the synchronization is finished, the output will look like this:
server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
24418688 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
24418688 blocks [2/2] [UU]
unused devices: <none>
That's it, you have successfully replaced /dev/sdb!
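As a final check, you can ask mdadm for the detailed state of each array; on a healthy, fully rebuilt RAID1 the State line should read clean (or active) and both member devices should be listed as active sync:
mdadm --detail /dev/md0
mdadm --detail /dev/md1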