
Failing/Replacing a drive in Linux Software Raid

Let's assume this is the drive setup:

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.
/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.



Check the current array status:

# cat /proc/mdstat

If a RAID1 array is degraded, you will see [U_] instead of the string [UU].
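The degraded check can also be scripted. A minimal sketch that scans mdstat-style output for a missing mirror; it reads a sample string here so it runs anywhere, but on a real system you would read /proc/mdstat instead:

```shell
# Minimal sketch: detect a degraded RAID1 array in mdstat-style output.
# On a real system, replace the printf with: cat /proc/mdstat
sample='md0 : active raid1 sda1[0] sdb1[2](F)
      24418688 blocks [2/1] [U_]'

if printf '%s\n' "$sample" | grep -Eq '\[U_\]|\[_U\]'; then
    echo "degraded array detected"
fi
```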

Even if the array is not showing as degraded, if you need to remove /dev/sdb you must first mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).

First mark /dev/sdb1 as failed:

# mdadm --manage /dev/md0 --fail /dev/sdb1


Check the status again:

# cat /proc/mdstat

The output should look something like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
      24418688 blocks [2/1] [U_]
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]
unused devices: <none>

Then remove /dev/sdb1 from /dev/md0 with:

# mdadm --manage /dev/md0 --remove /dev/sdb1

Now we do the same steps again for /dev/sdb2:

# mdadm --manage /dev/md1 --fail /dev/sdb2
# mdadm --manage /dev/md1 --remove /dev/sdb2
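The fail-and-remove steps above can be sketched as a loop. Shown as a dry run that only prints the commands; drop the echo to execute for real (requires root and mdadm):

```shell
# Dry run: print the mdadm fail/remove commands for each array/partition
# pair in the example layout (md0 <- sdb1, md1 <- sdb2).
for pair in "md0 sdb1" "md1 sdb2"; do
    set -- $pair   # split into array ($1) and partition ($2)
    echo mdadm --manage "/dev/$1" --fail "/dev/$2"
    echo mdadm --manage "/dev/$1" --remove "/dev/$2"
done
```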

NOTE: If you were removing the sda drive, for example, remember to install GRUB on the sdb device (e.g. grub-install /dev/sdb) prior to rebooting, or you're going to have a bad time.

Shutdown the server:

# shutdown -h now

and replace the old /dev/sdb hard drive with a new one.

Again, if you removed the sda drive, you might need to change the boot order in the BIOS to boot from the working drive.

After adding the new hard disk, copy the partition layout from the surviving drive to the new one:

# sfdisk -d /dev/sda | sfdisk /dev/sdb
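Note that sfdisk -d on older versions only handles MBR partition tables. If the drives use GPT, the equivalent copy can be done with sgdisk from the gdisk package (shown here as a dry run with echo, since it needs real block devices and root):

```shell
# GPT alternative to the sfdisk copy above (dry run; remove the echos to run).
echo sgdisk -R /dev/sdb /dev/sda   # replicate sda's partition table onto sdb
echo sgdisk -G /dev/sdb            # give the copy new random GUIDs
```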

Check the partitioning on both drives:

# fdisk -l

Then add the new partitions back to their respective arrays:

server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1
server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2

Now both arrays (md0 and md1) will re-sync. Run

# cat /proc/mdstat

to see the progress (or watch -n1 cat /proc/mdstat for a continuously refreshing view).

During the sync the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/1] [U_]
      [=>...................] recovery = 9.9% (2423168/24418688) finish=2.8min speed=127535K/sec
md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/1] [U_]
      [=>...................] recovery = 6.4% (1572096/24418688) finish=1.9min speed=196512K/sec
unused devices: <none>

Recovering Linux software raid

Encountered a drive failure in a server the other day and proceeded to swap it. The server was set up with software RAID, and going by how all the servers I work on are configured, it should be something along the lines of:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0] sda1[1]
      104320 blocks [2/2] [UU]
md2 : active raid1 sdb5[0] sda5[1]
      226532416 blocks [2/2] [UU]
md1 : active raid1 sdb2[0] sda2[1]
      10241344 blocks [2/2] [UU]
unused devices: <none>

Unfortunately the server kept throwing the following error on the screen:

iscsi 0:0:1:0: rejecting I/O to dead device

which would not allow me to log in to fail the device, if needed, and remove it from the mdadm array.

Shutting it down and swapping the drive should have worked, so I proceeded to do so. The outcome: a kernel panic.

I had no idea what caused it; single-user mode was no luck either. I figured it might be the system looking for the partitions on the new HDD to add it to the mdadm configuration and boot from it, but I was sceptical about that.

Next step: boot into recovery and try to access mdadm. So I placed the rescue CD in the drive and booted.

OK, I am still pretty new to this, but my hopes were high. Booted from the CD:

# linux rescue

After the standard set of screens to set up the keyboard layout and locale, it went looking for the root file system: not found.

press any key to go to bash

I'm here, what now? I tried mounting sda2 (which I knew was root from the standard setup): no luck. Ha, I knew it wouldn't be this easy; it is Linux Software Raid after all. I have to mount the md1 device then. Now the question is: how?

After a consultation with Google 😀 I had the steps.

First I edited/created /etc/mdadm.conf:

# vi /etc/mdadm.conf

And added the devices that were used to create the md devices:

DEVICE /dev/sd[ab]1
DEVICE /dev/sd[ab]2
DEVICE /dev/sd[ab]5


# mdadm --examine --scan >> /etc/mdadm.conf

This appends ARRAY entries for any RAID devices it finds on the disks specified by the DEVICE lines.
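After the examine step, /etc/mdadm.conf should look roughly like this. The ARRAY lines come from --examine --scan; the UUIDs below are made-up placeholders, yours will differ:

```
DEVICE /dev/sd[ab]1
DEVICE /dev/sd[ab]2
DEVICE /dev/sd[ab]5
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=eeeeeeee:ffffffff:00000000:11111111
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=22222222:33333333:44444444:55555555
```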

# mdadm --assemble --scan /dev/md0
# mdadm --assemble --scan /dev/md1

This will assemble the raid devices to be used.
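Once the arrays are assembled, the root file system can be mounted from the rescue shell. A sketch, assuming root lives on /dev/md1 as in my layout (dry run with echo, since it needs the actual devices):

```shell
# Dry run: mount the assembled RAID device and chroot into it for repairs.
echo mount /dev/md1 /mnt
echo chroot /mnt /bin/bash
```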

In my case the / partition was on md1 as suspected; however, it was set up as RAID 0, so it ended up being rebuilt from scratch.

Some good info along the way though 😀