Complete online backup of a Linux system with LVM on RAID over the network, and restore to new bare-metal hardware.
a step-by-step howto guide
Foreword
There are many guides in the wild, but none that covers a full backup of a system residing on LVM, let alone on RAID. On top of that, we want to do it online (a hot backup) without any downtime. And the cherry on the cake is that we don't even need physical access to the machine: we'll do it over the network through ssh remote access.
My system is Fedora on RAID 1 with only 2 drives (sda, sdb), each divided into 2 partitions. This should be usable for any Linux distribution.
The 500 MiB partitions sda1 and sdb1 form md0 for the ext2 /boot partition, and the rest of each drive, sda2 and sdb2, forms md1 for the LVM volume group vg00 with logical volumes for / (slash or root), /home, /var, and so on. It doesn't matter how many you have.
Preparation
We will put the backup in the directory /home/backup/:
BACKUPTO=/home/backup
Let's take it from the bottom. Back up the hard drive partition layout:
sfdisk -d /dev/sda >$BACKUPTO/sfdisk.sda.out
sfdisk -d /dev/sdb >$BACKUPTO/sfdisk.sdb.out
I've backed up the RAID info as well, although it's not necessary:
cp /proc/mdstat $BACKUPTO/proc.mdstat
cp /etc/mdadm.conf $BACKUPTO/mdadm.conf
cp /dev/md/md-device-map $BACKUPTO/md-device-map
But the UUIDs of all drives will be useful:
blkid >$BACKUPTO/blkid.out
Next we back up the LVM configuration with:
vgcfgbackup -f $BACKUPTO/vgcfg.out
And for completeness you can save info about mountpoints too:
cp /etc/fstab $BACKUPTO/etc.fstab
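Before starting the long tar run, it may be worth confirming that every metadata file above was actually written. The following is only a minimal sketch and not part of the original procedure: the function name check_backup_files is made up for illustration, and its file list simply mirrors the commands above.

```shell
# Hypothetical helper: report any expected metadata file that is
# missing or empty in the given backup directory.
check_backup_files() {
    dir=$1
    status=0
    for f in sfdisk.sda.out sfdisk.sdb.out proc.mdstat mdadm.conf \
             md-device-map blkid.out vgcfg.out etc.fstab; do
        # -s is true only for files that exist and are non-empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

It could be called as: check_backup_files "$BACKUPTO" && echo "metadata complete"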
The backup
date;time nice tar -cpjf $BACKUPTO/fullsysbkp.tbz \
--directory=/ --exclude=home/backup/fullsysbkp.tbz \
--exclude=proc --exclude=sys --exclude=dev/pts --exclude=media \
--exclude=home/samba/public/incoming --checkpoint=10000 /;date
I like to know how long a long-running task took, so that's what date and time are for.
Because we are doing this on a live system it would be nice to be nice, so that's what nice is for.
tar will archive everything that we need. I have excluded one huge samba folder that nobody should miss; incoming should be treated like /tmp, in my opinion. And of course the archive itself (fullsysbkp.tbz) has to be excluded too.
The checkpoint option just prints a progress message after every 10000th record that tar processes.
I've used bzip2 compression (option -j) for its compression ratio, because the archive will be transferred over a slow ADSL connection.
There will be some errors about ignored sockets. That's fine.
You can transfer the archive beforehand, or stream it during the restore, to the prepared and mounted hard drives. That's what I've done. Look for the scp command below.
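Since the archive travels over a slow link, a check that it is readable and a checksum to compare on the receiving side can save a wasted week. This is a sketch under my own assumptions, not something the original procedure did; verify_archive is a hypothetical name, and it relies on GNU tar and sha256sum being installed.

```shell
# Hypothetical helper: confirm the archive is readable end to end and
# print its SHA-256 checksum for comparison after the transfer.
verify_archive() {
    archive=$1
    # Listing forces tar to read (and decompress) the whole archive;
    # GNU tar detects the compression type by itself when reading.
    tar -tf "$archive" >/dev/null || return 1
    sha256sum "$archive" | cut -d' ' -f1
}
```

Run it once on the server (prefixed with nice, as above) and once on the copy, then compare the two printed checksums.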
The restore
We will restore the system to completely different hardware. I think the only condition is to use the same CPU architecture. Mine is x86_64. Of course your new hardware should have all the resources of the old one, e.g. the same number of network cards or more, the same number and capacity of hard drives or more, and so on. For the restore you'll need a LiveCD. Any will do, but some require you to install the lvm2 package before you start.
If the restoration hard drives aren't new, don't forget to zero their RAID superblocks:
mdadm --zero-superblock /dev/sd[ab][12]
To prevent confusion for utilities like mdadm, I've zeroed at least the beginning of the drives (the first 500 MiB) as well:
dd if=/dev/zero of=/dev/sda bs=512 count=1024000
dd if=/dev/zero of=/dev/sdb bs=512 count=1024000
Now we can begin the restoration (for new drives, you start from here).
Restore the partition layout (if you have drives of the same capacity; if you don't, you have to do it manually, or the rest of the bigger drive will remain unallocated):
cat sfdisk.sda.out|sfdisk /dev/sda
cat sfdisk.sdb.out|sfdisk /dev/sdb
And build a new RAID 1 array:
mdadm -Cve0 /dev/md0 -l1 -n2 /dev/sda1 /dev/sdb1
mdadm -Cve0 /dev/md1 -l1 -n2 /dev/sda2 /dev/sdb2
I'm using metadata version 0.90 (that is what -e0 selects) because I like GRUB legacy (0.97) and it cannot boot off RAID with metadata 1.0+.
There is no need to wait for the array to finish syncing before creating the physical volume:
pvcreate /dev/md1
Maybe there is a better way to restore the RAID configuration so that the UUIDs won't change, but because I don't know how to do this, we have to update the UUIDs in our backed-up vgcfg.out file for every device we have created manually.
vgcreate vg00 /dev/md1
vgdisplay|grep UUID
pvdisplay|grep UUID
Then edit the vg00 id and the pv0 id (pv0=md1) in the file:
vi vgcfg.out
And restore from backup:
vgcfgrestore -f vgcfg.out vg00
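Editing the ids by hand in vi works, but the old values can also be located and swapped with a couple of small helpers. This is a sketch only: list_vgcfg_ids and replace_vgcfg_id are hypothetical names, the awk logic assumes the usual layout of an LVM metadata backup (each block opens with "name {" and its id line follows), and sed -i assumes GNU sed.

```shell
# Hypothetical helper: print "<section> <id>" for every id line in an
# LVM metadata backup, e.g. "vg00 <uuid>" and "pv0 <uuid>".
list_vgcfg_ids() {
    awk '
        /[{][ \t]*$/ { section = $1 }      # remember the block just opened
        $1 == "id" && $2 == "=" {
            gsub(/"/, "", $3)              # strip the surrounding quotes
            print section, $3
        }
    ' "$1"
}

# Hypothetical helper: replace one id with another, in place (GNU sed).
# Arguments: file, old id, new id.
replace_vgcfg_id() {
    sed -i "s/$2/$3/" "$1"
}
```

For example, list_vgcfg_ids vgcfg.out shows the old ids, and replace_vgcfg_id vgcfg.out OLD NEW swaps in the values reported by vgdisplay and pvdisplay. As an alternative that I have not tried in this guide, pvcreate accepts --uuid together with --restorefile, which is meant to recreate a physical volume with its old UUID and might make the editing unnecessary.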
Now we can create filesystems:
mkfs.ext2 -L boot /dev/md0
mkfs.ext4 -L root /dev/vg00/root
mkfs.ext4 -L home /dev/vg00/home
mkfs.ext4 -L var /dev/vg00/var
mkswap -L swap /dev/vg00/swap
... and so on.
If you have changed filesystem type for any mountpoint, don't forget to update /etc/fstab to reflect the change.
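The same goes for UUIDs: mkfs and mkswap give every new filesystem a fresh UUID, so any fstab entry of the form UUID=... will no longer match. Whether your fstab uses UUIDs at all is an assumption on my part; the hypothetical helper below just lists such entries from the saved copy so you can see what needs attention.

```shell
# Hypothetical helper: print "<mount point> <uuid>" for every fstab
# entry that identifies its filesystem by UUID.
fstab_uuids() {
    awk '
        /^[ \t]*#/ { next }                # skip comment lines
        $1 ~ /^UUID=/ {
            sub(/^UUID=/, "", $1)          # drop the UUID= prefix
            gsub(/"/, "", $1)              # and optional quotes
            print $2, $1
        }
    ' "$1"
}
```

Run it as fstab_uuids etc.fstab against the backed-up file. For each line it prints you can either edit the restored /etc/fstab later, or recreate the filesystem with its old UUID using the -U option of mkfs.ext2, mkfs.ext4 or mkswap.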
We need to mount / (root) at some mount point in the LiveCD's filesystem and prepare folders for mounting the other filesystems. You need to create the folders you excluded from the backup as well, so the system won't miss them:
mount /dev/vg00/root /media
cd /media
mkdir boot var home proc sys run dev
... and so on if you have more.
Let's mount them all:
mount /dev/md0 boot/
mount /dev/vg00/home home/
If you have excluded any subdirectories, create them right after mounting their parent directory, because the system could miss them.
mkdir -p home/samba/public/incoming
mount /dev/vg00/var var/
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
mount -t devpts pts dev/pts/
... and so on.
Once the filesystems are prepared and mounted, we can copy the backup here and extract it:
date;time scp -c arcfour -l 256 user@server:/home/backup/fullsysbkp.tbz .;date
date;time tar -xjf fullsysbkp.tbz --checkpoint=10000 --numeric-owner .;date
Or without storing the archive on the new drives:
date;time ssh user@server 'cat /home/backup/fullsysbkp.tbz'|tar -xjf - \
--checkpoint=10000 --numeric-owner .;date
I have used the lightest cipher available (arcfour) to go easy on the CPU (remember, the system is fully loaded and we don't want to disturb its services). Note that OpenSSH 7.6 and later no longer support arcfour; ssh -Q cipher lists what your version offers. Using half of the ADSL upload speed constantly is way too much, but we have no choice if the upload should take no more than a week. If time is not an issue and server performance is the priority, -l 128 is a safe option here. If you use ssh instead of scp for the transfer, you can't throttle the bandwidth.
(If you ask: "Why on earth do you run a server on ADSL?" Don't ask. You don't know where I live.)
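To judge which -l value is bearable, a rough estimate of the transfer time helps. The sketch below is plain arithmetic under stated assumptions: the helper name is hypothetical, scp's -l unit is taken as 1 Kbit = 1000 bits, and protocol overhead is ignored.

```shell
# Hypothetical helper: rough transfer time in whole hours for a file of
# SIZE bytes at an scp -l limit of KBITS (Kbit/s), assuming
# 1 Kbit = 1000 bits and ignoring protocol overhead.
estimate_transfer_hours() {
    size=$1
    kbits=$2
    echo $(( size * 8 / (kbits * 1000) / 3600 ))
}
```

As an illustration (the size is made up, not the size of my archive), a 20 GiB file is 21474836480 bytes: estimate_transfer_hours 21474836480 256 prints 186, just under eight days, and with 128 it prints 372.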
We're almost there. Now chroot into your "new" system:
chroot . /bin/bash
Create mdadm.conf with the new UUIDs by hand or with:
mdadm -Es >/etc/mdadm.conf
To boot on the new hardware we need to generate a new initial ramdisk image:
mkinitrd /boot/initramfs-`uname -r`.img `uname -r`
If you have a Debian-based distro it will be:
mkinitramfs -o /boot/initrd.img-`uname -r` `uname -r`
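Just look in the /boot folder to see how your initial ramdisk files are named. Keep in mind that inside the chroot uname -r reports the kernel of the running LiveCD, so if it differs from the kernel installed on the restored system, type the installed version (the directory name under /lib/modules) instead.
Alternatively, you can just reinstall the kernel and your package manager will build the initial ramdisk for you. For yum-based distros it's just a matter of: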
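Because the chroot still runs on the LiveCD's kernel, the version to pass to mkinitrd or mkinitramfs is best read from the restored system itself. A minimal sketch, with a hypothetical function name, that lists the kernel versions having a module tree:

```shell
# Hypothetical helper: list kernel versions that have a module tree
# under the given root (default: /), one per line.
list_installed_kernels() {
    root=${1:-/}
    for d in "$root"/lib/modules/*/; do
        # the glob stays literal when nothing matches, hence the -d test
        [ -d "$d" ] && basename "$d"
    done
    return 0
}
```

Inside the chroot, run list_installed_kernels, pick the version you want to boot, and use it in place of `uname -r` in the commands above.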
yum reinstall kernel
Finally install a bootloader and you'll be ready to boot into your "new" system:
grub-install /dev/md0
or
grub-install /dev/sda && grub-install /dev/sdb
Sync it to be sure everything has been written down:
sync
Exit the chroot environment:
exit
Unmount all mounted filesystems:
umount {proc,sys,dev/pts,dev,boot,home,var}
Step out of the / (root) folder:
cd ..
And unmount it:
umount /media
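If you mounted more filesystems than listed here, the unmount order can also be derived from the kernel's mount table instead of being typed by hand. This is a sketch with a hypothetical helper name; it reads /proc/mounts by default and sorts the paths in reverse so that nested mount points come before their parents.

```shell
# Hypothetical helper: list mount points at or below PREFIX, deepest
# first, reading a mounts table (default: /proc/mounts).
mounts_under() {
    prefix=$1
    table=${2:-/proc/mounts}
    awk -v p="$prefix" '
        $2 == p || index($2, p "/") == 1 { print $2 }
    ' "$table" | LC_ALL=C sort -r
}
```

After leaving the chroot and the /media directory, something like: for m in $(mounts_under /media); do umount "$m"; done would unmount everything including /media itself. Mount points containing spaces appear escaped in /proc/mounts and are not handled by this sketch.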
Reboot and pray:
reboot
Afterword
If there is something unclear, wrong, or worth improving, let me know in the comments below or drop me a gmail to aasami.
I'm not a native speaker, so forgive any mistakes or drop me a private message so I can correct what's wrong.
If you find this guide useful, please say "thank you" in the comments below so I know that it makes sense to write more of these guides.
Believe it or not, this is my first guide, so your comments will improve it over time.
Thank you for all your contributions.