Sunday, January 03, 2010

A system recovery report

A Linux server was unable to boot after someone powered it off while it was unresponsive. The boot process got stuck at this error:

REDHAT nash version 5.1.19.6 Starting
Reading all volumes, This make take a while....
Found volume "volgroup01" using metadata type lvm2
VFS: can't find ext3 file system on dev dm-0
Mount: error mounting /dev/root on /sysroot as ext3: invalid argument
Setuproot: moving /DEV failed: no such file or directory
Setuproot: error mounting /proc: no such file or directory
Setuproot: error mounting /sys: no such file or directory
Switchroot: mount failed: no such file or directory
Kernel panic - not syncing attempted to kill init !


1) I booted into rescue mode by inserting the original RHEL DVD and entering the following at the boot prompt:
linux rescue

2) The system booted from the DVD into rescue mode but failed to discover any existing installations.

3) I then manually scanned for and activated the logical volumes, but mounting the logical volume that held the root (/) partition failed. The other partition, on Logical Volume 2, mounted cleanly without any errors.

lvm pvscan
lvm vgscan
lvm vgchange -ay
lvm lvscan

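# rescue mode has no /etc/fstab, so a mount point must be given (/mnt/sysimage is an example)
mount /dev/mapper/Vol--storage-Vol00 /mnt/sysimage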

4) This pointed to corruption of the filesystem metadata on the root partition (most likely its primary superblock), which made the system unbootable.
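A note on the device path used in the mount command above: device-mapper doubles every hyphen inside the volume group and logical volume names and joins the two with a single hyphen. A minimal sketch of that naming rule follows. The function name dm_path is hypothetical, and the names "Vol-storage" and "Vol00" are only inferred from the path in this report.

```shell
#!/bin/sh
# Build the /dev/mapper path for a volume group / logical volume pair.
# device-mapper doubles each hyphen inside the VG and LV names and joins
# the two escaped names with a single hyphen.
# dm_path is a hypothetical helper name used for illustration only.
dm_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# VG "Vol-storage" and LV "Vol00" are inferred from the path in this report.
dm_path Vol-storage Vol00
```

Knowing the rule helps when the names shown by lvm lvscan do not seem to match what appears under /dev/mapper.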

Corrective Actions
1) As no backup was available at the time of this recovery, and the data on the disk was completely inaccessible, the only way forward was to proceed with data recovery procedures.
2) Since the partition held an ext3 filesystem, which keeps backup copies of its superblock, it was scanned for the locations of those backup superblocks.

dumpe2fs /dev/mapper/Vol--storage-Vol00 | grep superblock
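To feed the block numbers to another command, the dumpe2fs listing can be reduced to bare numbers. The sketch below assumes the usual "Backup superblock at N, ..." line format. The function name backup_superblocks is hypothetical, and the sample text is illustrative only, not output from the affected server.

```shell
#!/bin/sh
# Print only the block numbers of backup superblocks found in dumpe2fs
# output read from standard input.
# backup_superblocks is a hypothetical helper name.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Illustrative sample in the dumpe2fs format (not from the real server).
sample='Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-1
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32769
Group 3: (Blocks 98304-131071)
  Backup superblock at 98304, Group descriptors at 98305-98305'

printf '%s\n' "$sample" | backup_superblocks
```

On the real system the input would come from the dumpe2fs command above instead of the sample text.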

3) The listed backup superblocks were then passed to the e2fsck utility one by one until a non-corrupted superblock was found.

e2fsck -f -b 32768 /dev/mapper/Vol--storage-Vol00
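Trying the candidates one by one can be scripted. The sketch below is an assumption about how that might look, not the exact procedure used here: it runs e2fsck read-only (-n) against each block number and treats an exit status below 8 (no operational error) as "this superblock is readable". The names first_good_superblock and CHECKER are hypothetical.

```shell
#!/bin/sh
# Print the first candidate superblock that the checker can open.
# Usage: first_good_superblock DEVICE BLOCK [BLOCK...]
# CHECKER (hypothetical knob) defaults to a read-only e2fsck so that
# nothing on disk is changed while probing.
first_good_superblock() {
    dev=$1
    shift
    for sb in "$@"; do
        ${CHECKER:-e2fsck -n -f} -b "$sb" "$dev" >/dev/null 2>&1 \
            && status=0 || status=$?
        # e2fsck reports operational errors (such as an unreadable
        # superblock) with exit status 8 or higher.
        if [ "$status" -lt 8 ]; then
            printf '%s\n' "$sb"
            return 0
        fi
    done
    return 1
}

# Example call (only 32768 appears in this report; the other block
# numbers are illustrative):
# first_good_superblock /dev/mapper/Vol--storage-Vol00 32768 98304 163840
```

Once a usable block number is printed, the actual repair is still done with the e2fsck -f -b command shown above.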

4) The filesystem was then mounted using this superblock and its metadata was corrected. At this point the journal was dropped automatically, leaving the filesystem as ext2, to avoid conflicting superblocks.
5) The system was rebooted at this point but failed as soon as it attempted to read the data needed for booting from the root (/) partition.
6) The system was again booted from the installation DVD into rescue mode, and the partition was scanned for filesystem errors. All diagnosed errors on the filesystem were fixed.
7) The filesystem scan resulted in the entire existing filesystem being moved to the 'lost+found' location, because the top-level directory structure was lost, although ALL data within it was intact.
8) The data available in the partition at this stage was then copied to the other, empty partition on Logical Volume 2 as a backup before proceeding further.
9) The scan utility had renamed the top-level directories to numerical names such as '#294942343'. Based on the files contained in each of these directories, their original names were identified correctly (as later verified) and the directories were moved back accordingly.
10) The entire top-level directory structure was thus recreated and verified against the fstab entries in the system. The permissions of all files were preserved throughout the procedure by using the corresponding options of the rsync utility.
11) A journal was then added back to the partition to return it to ext3.

cd /lost+found
tune2fs -j /dev/mapper/Vol--storage-Vol00

12) The system then booted normally, and all services configured to start at boot time came up without errors.
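The identification of the renamed directories in step 9 was done by hand, but the idea can be sketched as a small script. The marker files chosen below (fstab and passwd for etc, log and spool for var, grub for boot) are assumptions for illustration, and the function names are hypothetical. Anything reported as "unknown" still needs manual inspection.

```shell
#!/bin/sh
# Guess the original name of a directory recovered into lost+found by
# looking for well-known entries inside it.
# The marker entries are illustrative assumptions, not a complete list.
guess_dir_name() {
    dir=$1
    if [ -e "$dir/fstab" ] && [ -e "$dir/passwd" ]; then
        echo etc
    elif [ -d "$dir/log" ] && [ -d "$dir/spool" ]; then
        echo var
    elif [ -d "$dir/grub" ]; then
        echo boot
    else
        echo unknown
    fi
}

# Print "<recovered name> -> <guess>" for every '#<number>' directory
# found directly under the given lost+found directory.
report_lost_found() {
    for entry in "$1"/\#*; do
        [ -d "$entry" ] || continue
        printf '%s -> %s\n' "$(basename "$entry")" "$(guess_dir_name "$entry")"
    done
}

# Example call (the path is an assumption):
# report_lost_found /mnt/sysimage/lost+found
```

The script only reports guesses and moves nothing. When the directories are copied back into place, rsync in archive mode (rsync -a) is one way to keep permissions and ownership intact.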

1 comment:

Anonymous said...

mount cannot read /etc/fstab
after lvm vgchange -ay