next up previous contents
Next: Disks without filesystems Up: Filesystems Previous: Mounting and unmounting

Keeping filesystems healthy

Filesystems are complex creatures, and as such, they tend to be somewhat error-prone. A filesystem's correctness and validity can be checked using the fsck(8) command. It can be instructed to repair any minor problems it finds, and to alert the user if there any unrepairable problems. Fortunately, the code to implement filesystems is debugged quite effectively, so there are seldom any problems at all, and they are usually caused by power failures, failing hardware, or operator errors; for example, by not shutting down the system properly.

Most systems are setup to run fsck automatically at boot time, so that any errors are detected (and hopefully corrected) before the system is used. Use of a corrupted filesystem tends to make things worse: if the data structures are messed up, using the filesystem will probably mess them up even more, resulting in more data loss. However, fsck can take a while to run on big filesystems, and since errors almost never occur if the system has been shut down properly, a couple of tricks are used to avoid doing the checks in such cases. The first is that if the file /etc/fastboot exists, no checks are made. The second is that the ext2 filesystem has a special marker in its superblock that tells whether the filesystem was unmounted properly after the previous mount. This allows e2fsck (the version of fsck for the ext2 filesystem) to avoid checking the filesystem if the flag indicates that the unmount was done (the assumption being that a proper unmount indicates no problems). Whether the /etc/fastboot trick works on your system depends on your startup scripts, but the ext2 trick works every time you use e2fsck--it has to be explicitly bypassed with an option to e2fsck to be avoided. (See the e2fsck(8) man page for details on how.)

The automatic checking only works for the filesystems that are mounted automatically at boot time. Use fsck manually to check other filesystems, e.g., floppies.

If fsck finds unrepairable problems, you need either in-depth knowlege of how filesystems work in general, and the type of the corrupt filesystem in particular, or good backups. The latter is easy (although sometimes tedious) to arrange, the former can sometimes be arranged via a friend, the Linux newsgroups and mailing lists, or some other source of support, if you don't have the know-how yourself. I'd like to tell you more about it, but my lack of education and experience in this regard hinders me. The debugfs(8) program by Theodore T'so should be useful.

fsck must only be run on unmounted filesystems, never on mounted filesystems (with the exception of the read-only root during startup). This is because it accesses the raw disk, and can therefore modify the filesystem without the operating system realizing it. There will be trouble, if the operating system is confused.

It can be a good idea to periodically check for bad blocks. This is done with the badblocks command. It outputs a list of the numbers of all bad blocks it can find. This list can be fed to fsck to be recorded in the filesystem data structures so that the operating system won't try to use the bad blocks for storing data. The following example will show how this could be done.


next up previous contents
Next: Disks without filesystems Up: Filesystems Previous: Mounting and unmounting

Andrew Anderson
Thu Mar 7 22:36:29 EST 1996