I've had so many hard drives fail over the years that I'm sadly accustomed to running these checks to test them. I now even run them on new and replacement drives, because a couple of times a year I receive a faulty drive that has probably been dropped by the courier. My longest-serving drive is only around 3 years old! I also just purchased a 4GB M2 card for my Sony Ericsson K800i that had errors straight out of the box!
New drives
For a new drive, it's fine to run a destructive test, which writes patterns of data over the whole drive and does a thorough "soak" test (this may take over 24 hours on a 350GB drive, though). It erases everything on the target, so double-check the device name:
# badblocks -b 4096 -c 512 -s -v -w /dev/sdd1
^ This tests 512 blocks of 4096 bytes each (2 MiB) at a time, instead of the defaults of 64 blocks of 1024 bytes (64 KiB), so badblocks issues far fewer, larger I/O requests and can finish considerably faster. The -w flag makes it write the patterns 0xaa, 0x55, 0xff and 0x00 over every block of the drive, reading each one back and comparing it. -s shows the progress of the scan and -v turns on verbose output, so we know how far it has got.
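To see where the "over 24 hours" comes from, here is a rough back-of-the-envelope sketch. The function name bb_estimate is hypothetical, and the sustained throughput (30 MB/s below) is an assumption you have to supply yourself; badblocks does not report it. It counts 8 full passes because -w writes each of its 4 patterns across the device and then reads it back.

```shell
# Requires bash. Rough arithmetic for a badblocks -w run (a sketch, not a benchmark).
# Usage: bb_estimate SIZE_BYTES BLOCK_SIZE BLOCKS_PER_BATCH ASSUMED_MB_PER_SEC
bb_estimate() {
    local size=$1 bsize=$2 count=$3 mbps=$4
    local blocks=$(( size / bsize ))                    # whole blocks on the device
    local batches=$(( (blocks + count - 1) / count ))   # ceil(blocks / count) I/O batches
    # 4 patterns x (write pass + read pass) = 8 passes over the whole device
    local seconds=$(( 8 * size / (mbps * 1000000) ))
    echo "blocks=$blocks batches=$batches hours=$(( seconds / 3600 ))"
}

# bb_estimate 350000000000 4096 512 30   # -> blocks=85449218 batches=166894 hours=25
# bb_estimate 350000000000 1024 64 30    # -> blocks=341796875 batches=5340577 hours=25
```

The throughput term is the same in both cases, so the estimate does not model the speed-up from larger batches; what it shows is that the default settings need roughly 32 times as many read/write requests to cover the same 350GB.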
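Drives with valid data
If, like me, you also need to check filesystems holding valid data that you can't overwrite, you need a test that preserves the contents of every block. The filesystem must be unmounted while it is tested, and unfortunately no distros come with a RAM root disc that includes the necessary utilities, so you'll need to boot from a Live disc; either Knoppix or a standard Ubuntu disc will do. The -n flag runs a non-destructive read-write test: badblocks saves each block, writes test patterns, checks them and then restores the original data. (Leave out both -n and -w for a plain read-only scan.) Run this command on the drive: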
# badblocks -b 4096 -c 512 -s -v -n /dev/sdd1
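If you want the results recorded in the filesystem, badblocks can write its findings to a file with -o and e2fsck can import that file with -l. The block numbers are only meaningful if badblocks used the filesystem's own block size, so this sketch reads "Block size" from dumpe2fs -h output and prints (does not run) a matching pair of commands. The function name plan_nondestructive and the file name bad.txt are placeholders. badblocks itself normally refuses to run a -n or -w test on a mounted device unless forced with -f.

```shell
# Requires bash. Prints the commands only; bad.txt is a placeholder file name.
# Usage: dumpe2fs -h /dev/sdd1 2>/dev/null | plan_nondestructive /dev/sdd1
plan_nondestructive() {
    local dev=$1 bsize
    # dumpe2fs -h prints a line like "Block size:               4096"
    bsize=$(awk -F: '$1 == "Block size" { print $2 + 0 }')
    if [ -z "$bsize" ]; then
        echo "no 'Block size' line found" >&2
        return 1
    fi
    echo "badblocks -b $bsize -c 512 -s -v -n -o bad.txt $dev"
    echo "e2fsck -l bad.txt $dev"
}
```

Alternatively, e2fsck -c runs badblocks for you (read-only), and giving -c twice makes it use the non-destructive read-write test, adding any bad blocks it finds automatically.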
You can also display the blocks that have been recorded as bad on an ext2/ext3 filesystem with:
# dumpe2fs -b /dev/sda3
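dumpe2fs -b prints one bad block number per line, and nothing at all when none are recorded (its version banner goes to stderr). A small filter, sketched here under the hypothetical name summarise_badblocks, turns that into a one-line answer:

```shell
# Requires bash + awk. Counts the non-empty lines of `dumpe2fs -b` output.
# Usage: dumpe2fs -b /dev/sda3 2>/dev/null | summarise_badblocks
summarise_badblocks() {
    awk 'NF { n++ }
         END {
             if (n) printf "%d bad block(s) recorded\n", n
             else   print "no bad blocks recorded"
         }'
}
```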
Memory cards
Old-style DOS filesystems are still around on some memory cards (they haven't hit the hard-coded limits of FAT32 yet!). So if, like me, you've got a Sony Ericsson K800i, plug in the USB cable, unmount the card and run an interactive check: -r repairs interactively, -t marks unreadable clusters as bad, -v is verbose and -V adds a verification pass:
# umount /dev/sdd1
# fsck.vfat -rtvV /dev/sdd1
Mount count checks
Make sure your max-mount-counts are set to something reasonable with tune2fs -c: if you reboot 3 times a day set it higher, but if you only boot twice a month it might be worth setting it to 1. Stagger your drives with different counts (primes, eh!?) so their checks rarely overlap: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, etc.
Set a time interval between checks with tune2fs -i, for example tune2fs -i 1m /dev/sda3 for a monthly check.
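Primes work because two drives with counts p and q, starting from the same count, are only due on the same boot every p × q mounts (23 and 29 coincide once every 667 mounts). This sketch hands out consecutive primes from a starting value and prints the tune2fs -c commands rather than running them; the function names and the device names in the comment are illustrative.

```shell
# Requires bash. Prints tune2fs commands; nothing is changed on disk.
is_prime() {
    local n=$1 d
    [ "$n" -lt 2 ] && return 1
    for (( d = 2; d * d <= n; d++ )); do
        (( n % d == 0 )) && return 1   # found a divisor
    done
    return 0
}

# Usage: stagger_counts START DEVICE...
stagger_counts() {
    local n=$1 dev
    shift
    for dev in "$@"; do
        until is_prime "$n"; do n=$(( n + 1 )); done   # next prime >= n
        echo "tune2fs -c $n $dev"
        n=$(( n + 1 ))                                 # never reuse a count
    done
}

# stagger_counts 20 /dev/sda1 /dev/sdb1 /dev/sdc1   # -> counts 23, 29 and 31
```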
Feature wish list
fsck.vfat should include a % complete indicator that updates as it progresses, so we know it's still working! When your ext2/ext3 drives reach their mount count, the boot-time check shows a progress bar, so fsck.vfat could do the same, like this:
|====================............|*
fsck.vfat should provide a way to view the bad blocks on a drive.
fsck.ext3 should provide a way to get the list of bad blocks out of the filesystem. (Currently we have to run dumpe2fs -b.)
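For the first wish, e2fsck already draws a completion bar when given -C 0. The bar itself is simple to produce; here is a sketch under a hypothetical function name, with the 32-character width of the example bar above as the default:

```shell
# Requires bash. Renders PERCENT (0-100) as a fixed-width text bar.
# Usage: render_bar PERCENT [WIDTH]
render_bar() {
    local pct=$1 width=${2:-32}
    local filled=$(( pct * width / 100 ))   # integer division rounds down
    local bar="" i
    for (( i = 0; i < width; i++ )); do
        if (( i < filled )); then bar+="="; else bar+="."; fi
    done
    printf '|%s|\n' "$bar"
}

# render_bar 50 8   # -> |====....|
```

A real fsck.vfat implementation would redraw this in place (for example with a carriage return instead of a newline) as each cluster range is checked.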
dumpe2fs is a very useful utility, mostly for developers, although it also handles the bad block listing above.
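tune2fs lets us set the mount count, which can force a check when the system reboots (why can't distros also run other checks in this read-only mounted state?). Set the mount count with tune2fs -C 4096 to force a check on the next boot, as that number is higher than the max-mount-counts. This only works when max-mount-counts is a positive value; newer versions of mke2fs create filesystems with count-based checks disabled (a maximum mount count of -1).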
e2image is useful for dumping a filesystem's metadata to a file for analysis, and debugfs is an interactive filesystem debugger.
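The tune2fs -C trick depends on the "Mount count" and "Maximum mount count" fields, which tune2fs -l and dumpe2fs -h both print. This sketch (hypothetical name check_due) reads them and roughly mirrors e2fsck's count-based test; it ignores the separate time-based interval set with -i.

```shell
# Requires bash + awk. Only looks at the mount-count fields, not the -i interval.
# Usage: tune2fs -l /dev/sda3 | check_due
check_due() {
    awk -F: '
        $1 == "Mount count"         { cur = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        END {
            if (max <= 0)        print "count-based checks disabled"
            else if (cur >= max) print "check due (" cur "/" max ")"
            else                 print "not due (" cur "/" max ")"
        }'
}
```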
Tips
Got a failed drive that you want to recover files from? Sometimes it's not possible to mount it, so the trick is to copy it to a new drive (at least as large, and everything on it will be overwritten). As the drive may have unreadable sectors, add conv=noerror,sync: noerror makes dd carry on past read errors, and sync pads each block it can't read with zeros, so the rest of the copy stays at the right offsets:
# dd if=/dev/sda of=/dev/sdb bs=512 conv=noerror,sync
Then you can run recovery tools on the copy without damaging the faulty drive further. There are also tricks like writing the data back to the same drive, which makes the drive remap the bad sectors (drives keep reserved engineering tracks, and every drive ships with some bad sectors already mapped out to a pool of spare sectors), and that may let you mount the faulty drive directly.
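You can see the zero padding from conv=sync safely on ordinary files, with no disks involved. In this sketch (hypothetical function name) a 1000-byte input is not a multiple of the 512-byte block size, so dd pads the short final block with zero bytes, just as it would pad a block it failed to read:

```shell
# Requires bash, dd, head, tr, wc and mktemp. Works only on temporary files.
demo_dd_sync() {
    local dir size zeros
    dir=$(mktemp -d)
    # 1000 bytes of 'x': deliberately not a multiple of 512
    head -c 1000 /dev/zero | tr '\0' 'x' > "$dir/in"
    # same conv flags as the recovery command above
    dd if="$dir/in" of="$dir/out" bs=512 conv=noerror,sync 2>/dev/null
    size=$(( $(wc -c < "$dir/out") ))
    zeros=$(( $(tr -cd '\0' < "$dir/out" | wc -c) ))   # count NUL bytes
    rm -rf "$dir"
    echo "size=$size zero_bytes=$zeros"
}

# demo_dd_sync   # -> size=1024 zero_bytes=24
```

The padding is per block, which is one reason to keep bs small when imaging a failing drive: a read error then blanks only one 512-byte block rather than a large chunk.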
Give your partitions a name (an ext2/ext3 filesystem label) with the tune2fs -L command, to make your drives simpler to identify. You can also set the label when creating the filesystem by passing it to mke2fs -L. (GNU+Linux distributions still aren't setting meaningful labels like "boot", "root" and "home", doh!) The command e2label /dev/sda1 Root_FS achieves the same result as tune2fs -L.
Check the label of a partition by calling dumpe2fs -h /dev/sdxx and looking for the "Filesystem volume name" line.
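Running e2label /dev/sdxx with no new label prints just the label. If you would rather stay with dumpe2fs, a one-line filter does the same; dumpe2fs shows "<none>" when no label is set. The function name fs_label is hypothetical.

```shell
# Requires bash + sed. Extracts the value of the "Filesystem volume name" line.
# Usage: dumpe2fs -h /dev/sdxx 2>/dev/null | fs_label
fs_label() {
    sed -n 's/^Filesystem volume name:[[:space:]]*//p'
}
```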
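You can also label your swap partition: disable swap on it, recreate the swap area with a label, then enable it again by label (mkswap also gives the area a new UUID, so update /etc/fstab if it refers to the swap by UUID):
# swapoff /dev/sdb1
# mkswap -L SWAP_DRV /dev/sdb1
# swapon -L SWAP_DRV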
You can also list the labels of all your partitions with the blkid command, whereas fdisk -l only shows the partition types.
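blkid prints one line per device, such as /dev/sda1: LABEL="Root_FS" UUID="..." TYPE="ext3". This sketch (hypothetical name list_labels) reduces that to a two-column device/label table, showing "-" for unlabelled filesystems. For a single device, blkid -o value -s LABEL /dev/sdb1 prints just the label.

```shell
# Requires bash + awk. The leading space in the regex skips PARTLABEL="...".
# Usage: blkid | list_labels
list_labels() {
    awk '{
        dev = $1; sub(/:$/, "", dev)          # "/dev/sda1:" -> "/dev/sda1"
        label = "-"
        if (match($0, / LABEL="[^"]*"/))
            label = substr($0, RSTART + 8, RLENGTH - 9)   # text between the quotes
        print dev, label
    }'
}
```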
Finally, be really careful when using these commands, as you could destroy your data! If you ever need it, you can make the kernel sync and remount all filesystems read-only by pressing Alt + SysRq + s, followed by Alt + SysRq + u, then Alt + SysRq + b to reboot the system. This only works if the magic SysRq key is enabled (see /proc/sys/kernel/sysrq).
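/proc/sys/kernel/sysrq holds 0 (disabled), 1 (everything allowed) or a bitmask in which, according to the kernel's SysRq documentation, 16 allows sync, 32 allows remount read-only and 128 allows reboot/poweroff. This sketch (hypothetical name sysrq_allows) reports which of the three keys used above are permitted:

```shell
# Requires bash. Decodes the kernel.sysrq value for the s, u and b keys only.
# Usage: sysrq_allows "$(cat /proc/sys/kernel/sysrq)"
sysrq_allows() {
    local v=$1 key bit
    local allowed=()
    for key in s:16 u:32 b:128; do
        bit=${key#*:}
        if [ "$v" -eq 1 ] || (( v & bit )); then
            allowed+=("${key%%:*}")
        fi
    done
    if (( ${#allowed[@]} )); then echo "${allowed[*]}"; else echo none; fi
}

# sysrq_allows 176   # 16 + 32 + 128 -> s u b
```

The bitmask only restricts the keyboard; as root you can also trigger the same actions with, for example, echo s > /proc/sysrq-trigger.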