The Superblock
The superblock stores much of the information about the file system. A few of the more important things contained in a superblock are
- Size and status of the file system
- Label (file system name and volume name)
- Size of the file system's logical block
- Date and time of the last update
- Cylinder group size
- Number of data blocks in a cylinder group
- Summary data block
- File system state: clean, stable, or active
- Pathname of the last mount point
Without a superblock, the file system becomes unreadable. The superblock is located at the beginning of the disk slice and is replicated in each cylinder group. Because the superblock contains critical data, multiple superblocks are made when the file system is created. A copy of the superblock for each file system is kept up-to-date in memory. If the system gets halted before a disk copy of the superblock gets updated, the most recent changes to the superblock are lost and the file system becomes inconsistent. The sync command forces every superblock in memory to write its data to disk. The file system check program fsck can fix problems that occur when the sync command hasn't been used before a shutdown.
A summary information block is kept with the superblock. It is not replicated but is grouped with the first superblock, usually in cylinder group 0. The summary block records changes that take place as the file system is used, listing the number of inodes, directories, fragments, and storage blocks within the file system.
Inodes
An inode contains all of the information about a file except its name, which is kept in a directory. An inode is 128 bytes. The inode information is kept in the cylinder information block and contains the following:
- The type of the file (regular, directory, block special, character, link, and so on)
- The mode of the file (the set of read/write/execute permissions)
- The number of hard links to the file
- The user-id of the owner of the file
- The group-id to which the file belongs
- The number of bytes in the file
- An array of 15 disk-block addresses
- The date and time the file was last accessed
- The date and time the file was last modified
- The date and time the inode itself was last changed
The maximum number of files per UFS file system is determined by the number of inodes allocated for a file system. The number of inodes depends on how much disk space is allocated for each inode and the total size of the file system. By default, one inode is allocated for each 2KB of data space. You can change the default allocation by using the -i option of the newfs command.
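To see how an array of only 15 disk-block addresses can describe very large files, the following sketch estimates how much data each kind of address can reach. It assumes the conventional UFS layout of 12 direct addresses plus one single, one double, and one triple indirect address, 4-byte block addresses, and the default 8KB logical block. None of these figures is stated above, so treat them as assumptions.

```python
# Sketch: data reachable through an inode's 15 block addresses.
# Assumptions (not from the text): 12 direct + 3 indirect addresses,
# 4-byte addresses, 8KB logical blocks.

BLOCK_SIZE = 8192      # bytes per logical block
ADDR_SIZE = 4          # bytes per disk-block address (assumed)
DIRECT_ADDRS = 12      # direct addresses held in the inode (assumed)


def reachable_bytes(block_size=BLOCK_SIZE, addr_size=ADDR_SIZE,
                    direct=DIRECT_ADDRS):
    """Return bytes addressable at each level of indirection."""
    per_block = block_size // addr_size   # addresses held by one indirect block
    return {
        "direct": direct * block_size,
        "single": per_block * block_size,
        "double": per_block ** 2 * block_size,
        "triple": per_block ** 3 * block_size,
    }


if __name__ == "__main__":
    for level, size in reachable_bytes().items():
        print(f"{level:>6}: {size:,} bytes")
```

Other limits, such as a maximum file size imposed by the operating system, can apply well before this addressing limit is reached.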
Storage Blocks
The rest of the space allocated to the file system is occupied by storage blocks, also called data blocks. The size of these storage blocks is determined at the time a file system is created. Storage blocks are allocated, by default, in two sizes: an 8KB logical block size and a 1KB fragment size.
For a regular file, the storage blocks contain the contents of the file. For a directory, the storage blocks contain entries that give the inode number and the filename of the files in the directory.
Free Blocks
Blocks not currently being used as inodes, indirect address blocks, or storage blocks are marked as free in the cylinder group map. This map also keeps track of fragments to prevent fragmentation from degrading disk performance.
How to Create a UFS File System
Use the newfs command to create UFS file systems. newfs is a convenient front-end to the mkfs command, the program that creates the new file system on a disk slice. On Solaris 2.x systems, information used to set some of the parameter defaults, such as number of tracks per cylinder and number of sectors per track, is read from the disk label. newfs determines the file system parameters to use, based on the options you specify and information provided in the disk label. Parameters are then passed to the mkfs (make file system) command, which builds the file system. Although you can use the mkfs command directly, it's more difficult to use and you must supply many of the parameters manually. The use of the newfs command is discussed later in this chapter.
The disk must be formatted and divided into slices before you can create a UFS file system on it. newfs removes any data on the disk slice and creates the skeleton of a directory structure, including the directory named lost+found. After you run newfs successfully, the slice is ready to be mounted as a file system.
To create a UFS file system on a formatted disk that has already been divided into slices, you need to know the raw device filename of the slice that will contain the file system. If you are re-creating or modifying an existing UFS file system, back up and unmount the file system before performing these steps:
1. Become superuser.
2. Type newfs /dev/rdsk/device-name and press Enter. You are asked if you want to proceed. The newfs command requires the use of the raw device name and not the buffered device name.
CAUTION! Be sure you have specified the correct device name for the slice before performing the next step. You will erase the contents of the slice when the new file system is created, and you don't want to erase the wrong slice.
3. Type y to confirm.
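Because newfs erases the slice and insists on the raw device name, a script that wraps it can check the name before anything is run. The sketch below only builds the argument list and does not execute it. build_newfs_command is a hypothetical helper, not a Solaris utility, and the only option it uses is -i (bytes per inode).

```python
# Sketch: refuse to build a newfs command for a buffered (block) device.
# build_newfs_command is a hypothetical helper, not part of Solaris.

def build_newfs_command(device, bytes_per_inode=None):
    """Return the argv list for newfs, insisting on a raw device name."""
    if not device.startswith("/dev/rdsk/"):
        raise ValueError(
            f"newfs needs the raw device (/dev/rdsk/...), got {device!r}")
    argv = ["newfs"]
    if bytes_per_inode is not None:
        # -i sets the number of bytes per inode
        argv += ["-i", str(bytes_per_inode)]
    argv.append(device)
    return argv


if __name__ == "__main__":
    print(" ".join(build_newfs_command("/dev/rdsk/c0t3d0s7")))
```

Printing the command and confirming it by eye before running it mirrors the confirmation prompt that newfs itself displays.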
The following example creates a file system on /dev/rdsk/c0t3d0s7:
1. Become superuser: type su and enter the root password.
2. Type newfs /dev/rdsk/c0t3d0s7. The system responds with
newfs: construct a new file system /dev/rdsk/c0t3d0s7 (y/n)? y
/dev/rdsk/c0t3d0s7: 163944 sectors in 506 cylinders of 9 tracks, 36 sectors
83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
super-block backups (for fsck -b #) at:
32, 5264, 10496, 15728, 20960, 26192, 31424, 36656,
41888, 47120, 52352, 57584, 62816, 68048, 73280, 78512,
82976, 88208, 93440, 98672, 103904, 109136, 114368, 119600,
124832, 130064, 135296, 140528, 145760, 150992, 156224, 161456,
The newfs command uses optimized default values to create the file system. The default parameters used by the newfs command are
- File system block size = 8,192 bytes.
- File system fragment size (the smallest allocatable unit of disk space) = 1,024 bytes.
- Percentage of free space = 10%.
- Number of bytes per inode = 2,048. This controls how many inodes are created for the file system (one inode for each 2KB of disk space).
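The figures in the sample newfs output can be cross-checked with simple arithmetic. The sketch below assumes 512-byte sectors and decimal megabytes. It compares the inode count implied by the 2,048 bytes-per-inode default with the count implied by the "32 cyl groups" and "1216 i/g" figures.

```python
# Sketch: cross-check the sample newfs output with simple arithmetic.
# Assumption: 512-byte sectors and decimal megabytes (1MB = 1,000,000 bytes).

SECTOR_SIZE = 512          # bytes per sector (assumed)

sectors = 163944           # "163944 sectors"
cyl_groups = 32            # "32 cyl groups"
inodes_per_group = 1216    # "1216 i/g"
bytes_per_inode = 2048     # newfs default

size_bytes = sectors * SECTOR_SIZE
size_mb = size_bytes / 1_000_000

# One inode per 2KB of space gives an upper estimate.
estimated_inodes = size_bytes // bytes_per_inode
# The output itself reports inodes per cylinder group.
actual_inodes = cyl_groups * inodes_per_group

print(f"size:             {size_mb:.1f}MB")
print(f"estimated inodes: {estimated_inodes}")
print(f"actual inodes:    {actual_inodes}")
```

The reported total comes out somewhat lower than the simple estimate. A plausible reason, though not one the output confirms, is that part of each cylinder group is taken up by file system overhead and that the per-group count is rounded.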
Understanding Custom File System Parameters
Before you choose to alter the default file system parameters assigned by the newfs command, you need to understand them. This section describes each of these parameters:
- Block size
- Fragment size
- Minimum free space
- Rotational delay
- Optimization type
- Number of inodes
Logical Block Size
The logical block size is the size of the blocks the UNIX kernel uses to read or write files. The logical block size is usually different from the physical block size (usually 512 bytes), which is the size of the smallest block the disk controller can read or write.
You can specify the logical block size of the file system. After the file system is created, you cannot change this parameter without rebuilding the file system. You can have file systems with different logical block sizes on the same disk.
By default, the logical block size is 8,192 bytes (8KB) for UFS file systems. The UFS file system supports block sizes of 4,096 or 8,192 bytes (4 or 8KB, with 8KB the recommended logical block size).
To choose the best logical block size for your system, consider both the performance desired and the available space. For most UFS systems, an 8KB file system provides the best performance, offering a good balance between disk performance and use of space in primary memory and on disk.
As a general rule, a larger logical block size increases efficiency for file systems in which most of the files are very large. Use a smaller logical block size for file systems in which most of the files are very small. You can use the quot -c filesystem command to display a complete report on the distribution of files by block size.
Fragment Size
As files are created or expanded, they are allocated disk space in either full logical blocks or portions of logical blocks called fragments. When disk space is needed to hold data for a file, full blocks are allocated first and then one or more fragments of a block are allocated for the remainder. For small files, allocation begins with fragments.
The ability to allocate fragments of blocks to files, rather than just whole blocks, saves space by reducing fragmentation of disk space resulting from unused holes in blocks.
You define the fragment size when you create a UFS file system. The default fragment size is 1KB. Each block can be divided into 1, 2, 4, or 8 fragments, resulting in fragment sizes from 8,192 bytes to 512 bytes (for 4KB file systems only). The lower bound is actually tied to the disk sector size, typically 512 bytes.
NOTE. The upper bound might equal the full block size, in which case the fragment is not a fragment at all. This configuration might be optimal for file systems with very large files when you are more concerned with speed than with space.
When choosing a fragment size, look at the trade-off between time and space: a small fragment size saves space but requires more time to allocate. As a general rule, a larger fragment size increases efficiency for file systems in which most of the files are large. Use a smaller fragment size for file systems in which most of the files are small.
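The relationship between block size, fragment size, and the space a file actually occupies can be sketched as follows. This is a simplified model of the allocation rule described in this section (full blocks first, then fragments for the remainder), not the kernel's allocator. The function names are illustrative.

```python
# Sketch: fragment sizes and block/fragment allocation for a single file.
# Simplified model; function names are illustrative only.

SECTOR_SIZE = 512   # lower bound on the fragment size (typical sector size)


def fragment_sizes(block_size):
    """Fragment sizes obtained by dividing a block into 1, 2, 4, or 8 parts."""
    sizes = [block_size // parts for parts in (1, 2, 4, 8)]
    return [size for size in sizes if size >= SECTOR_SIZE]


def allocate(file_size, block_size=8192, frag_size=1024):
    """Return (full_blocks, fragments, allocated_bytes) for a file."""
    full_blocks, remainder = divmod(file_size, block_size)
    fragments = -(-remainder // frag_size)       # ceiling division
    if fragments * frag_size == block_size:      # tail fills a whole block
        full_blocks, fragments = full_blocks + 1, 0
    allocated = full_blocks * block_size + fragments * frag_size
    return full_blocks, fragments, allocated


if __name__ == "__main__":
    print(fragment_sizes(8192), fragment_sizes(4096))
    print(allocate(20000))                    # default 1KB fragments
    print(allocate(20000, frag_size=8192))    # fragment equals block size
```

Comparing the allocated bytes for the same file under different fragment sizes shows the space side of the trade-off. The time cost of managing many small fragments is not modeled here.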
Minimum Free Space
The minimum free space is the percentage of the total disk space held in reserve when you create the file system. The default reserve is 10%. Free space is important because file access becomes less and less efficient as a file system gets full. As long as there is an adequate amount of free space, UFS file systems operate efficiently. When a file system becomes full, using up the available user space, only root can access the reserved free space.
Commands such as df report the percentage of space available to users, excluding the percentage allocated as the minimum free space. When the command reports that more than 100% of the disk space in the file system is in use, some of the reserve has been used by root.
If you impose quotas on users, the amount of space available to the users does not include the free space reserve. You can change the value of the minimum free space for an existing file system by using the tunefs command.
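A small calculation shows how a reported capacity can exceed 100%. The sketch below is a simplified model that measures use against the space left after the reserve is set aside. It is not the exact formula used by df, and the function name is illustrative.

```python
# Sketch: percent of user-available space in use, given a free space reserve.
# Simplified model, not the exact df calculation.

def user_capacity_percent(total_kb, used_kb, minfree_percent=10):
    """Return use as a percentage of the space available to ordinary users."""
    reserve_kb = total_kb * minfree_percent / 100
    user_space_kb = total_kb - reserve_kb
    return 100 * used_kb / user_space_kb


if __name__ == "__main__":
    # Half of the user space is in use.
    print(user_capacity_percent(1000, 450))
    # root has written into the reserve, so the figure passes 100%.
    print(round(user_capacity_percent(1000, 950), 1))
```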
Rotational Delay (Gap)
The rotational delay is the expected minimum time (in milliseconds) it takes the CPU to complete a data transfer and initiate a new data transfer on the same disk cylinder. The default delay depends on the type of disk and is usually optimized for each disk type.
When writing a file, the UFS allocation routines try to position new blocks on the same disk cylinder as the previous block in the same file. The allocation routines also try to optimally position new blocks within tracks to minimize the disk rotation needed to access them.
To position file blocks so they are "rotationally well behaved," the allocation routines must know how fast the CPU can service transfers and how long it takes the disk to skip over a block. By using options to the mkfs command, you can indicate how fast the disk rotates and how many disk blocks (sectors) it has per track. The allocation routines use this information to figure out how many milliseconds the disk takes to skip a block. Then, by using the expected transfer time (rotational delay), the allocation routines can position or place blocks so the next block is just coming under the disk head when the system is ready to read it.
NOTE. It is not necessary to specify the rotational delay (-d option to newfs) for some devices.
Place blocks consecutively only if your system is fast enough to read them on the same disk rotation. If the system is too slow, the disk spins past the beginning of the next block in the file and must complete a full rotation before the block can be read, which takes a lot of time. You should try to specify an appropriate value for the gap so the head is located over the appropriate block when the next disk request occurs.
You can change the value of this parameter for an existing file system by using the tunefs command. The change applies only to subsequent block allocation, not to blocks already allocated.
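The arithmetic behind rotational positioning can be sketched as below. It works out how long one logical block takes to pass under the head and how many blocks would have to be skipped to cover a given rotational delay. The 3,600 rpm figure in the example is an assumption, the 36 sectors per track come from the sample newfs output, and the real allocation routines are more involved than this model.

```python
# Sketch: time for a block to pass under the head, and the gap needed
# to cover a rotational delay. Simplified model with assumed disk speed.
import math


def block_pass_time_ms(rpm, sectors_per_track, block_size=8192,
                       sector_size=512):
    """Milliseconds for one logical block to pass under the head."""
    rotation_ms = 60_000 / rpm                    # one full rotation
    sector_ms = rotation_ms / sectors_per_track   # one sector
    return (block_size // sector_size) * sector_ms


def blocks_to_skip(rotdelay_ms, rpm, sectors_per_track, block_size=8192):
    """Whole blocks to leave as a gap so the system is ready for the next one."""
    if rotdelay_ms <= 0:
        return 0
    per_block = block_pass_time_ms(rpm, sectors_per_track, block_size)
    return math.ceil(rotdelay_ms / per_block)


if __name__ == "__main__":
    print(round(block_pass_time_ms(3600, 36), 3), "ms per 8KB block")
    print(blocks_to_skip(4, 3600, 36), "block(s) skipped for a 4 ms delay")
```

A delay of zero places blocks consecutively, which suits a system fast enough to read them on the same rotation.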
Optimization Type
The optimization type is either space or time.
When you select space optimization, disk blocks are allocated to minimize fragmentation and optimize disk use. Space is the default when you set the minimum free space to less than 10%.
When you select time optimization, disk blocks are allocated as quickly as possible, with less emphasis on their placement. Time is the default when you set the minimum free space to 10% or greater. With enough free space, the disk blocks can be allocated effectively with minimal fragmentation.
You can change the value of the optimization type parameter for an existing file system by using the tunefs command.
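The default described in this section depends only on the minimum free space setting, which the following minimal sketch makes explicit. The function name is illustrative.

```python
# Sketch: default optimization type implied by the minimum free space.

def default_optimization(minfree_percent):
    """Return 'space' below a 10% reserve, otherwise 'time'."""
    return "space" if minfree_percent < 10 else "time"


if __name__ == "__main__":
    for minfree in (5, 10, 15):
        print(minfree, default_optimization(minfree))
```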
Number of Bytes per Inode
The number of inodes determines the number of files you can have in the file system: one inode for each file. The number of bytes per inode determines the total number of inodes created when the file system is made: the total size of the file system divided by the number of bytes per inode. After the inodes are allocated, you cannot change the number without re-creating the file system.
The default number of bytes per inode is 2,048 bytes (2KB), which assumes the average size of each file is 2KB or greater. Most files are larger than 2KB. A file system with many symbolic links will have a lower average file size. If your file system is going to have many small files, you can give this parameter a lower value. Note, however, that having too many inodes is much better than running out of them. If you have too few inodes, you could reach the maximum number of files on a disk slice that is practically empty.
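The risk of running out of inodes on a nearly empty slice can be illustrated with a rough model. The sketch below ignores file system overhead and the free space reserve, and assumes every file occupies at least one 1KB fragment. The function name and the sample sizes are illustrative.

```python
# Sketch: does a file system run out of inodes or of space first?
# Rough model: ignores metadata overhead and the minimum free space reserve.

def first_to_run_out(fs_bytes, bytes_per_inode, avg_file_bytes,
                     frag_size=1024):
    """Return ('inodes' or 'space', maximum number of files)."""
    inodes = fs_bytes // bytes_per_inode
    # Round each file up to whole fragments; a file needs at least one.
    frags_per_file = max(1, -(-avg_file_bytes // frag_size))
    files_by_space = fs_bytes // (frags_per_file * frag_size)
    if inodes < files_by_space:
        return "inodes", inodes
    return "space", files_by_space


if __name__ == "__main__":
    size = 100 * 1024 * 1024    # a 100MB file system
    print(first_to_run_out(size, 2048, 500))    # many tiny files
    print(first_to_run_out(size, 1024, 500))    # lower bytes-per-inode value
    print(first_to_run_out(size, 2048, 8000))   # larger files
```

With the default of 2,048 bytes per inode and files averaging 500 bytes, the model runs out of inodes with about half the space still unused. Lowering the bytes-per-inode value removes that bottleneck.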
File System Operations
This section describes the Solaris utilities used for creating, checking, repairing, and mounting file systems. Use these utilities to make file systems available to the user and to ensure their reliability.
Synchronizing a File System
The UFS file system relies on an internal set of tables to keep track of inodes and used and available blocks. When a user performs an operation that requires data to be written out to the disk, the data to be written is first copied into a buffer in the kernel. Normally, the disk update is not handled until long after the write operation has returned. At any given time, the file system, as it resides on the disk, might lag behind the state of the file system represented by the buffers located in physical memory. The internal tables finally get updated when the buffer is required for another use or when the kernel automatically runs the fsflush daemon (at 30-second intervals).
If the system is halted without writing out the memory-resident information, the file system on the disk will be in an inconsistent state. If the internal tables are not properly synchronized with data on a disk, inconsistencies result and file systems need repairing.
File systems can be damaged or become inconsistent because of abrupt termination of the operating system in these ways:
- Power failure
- Accidental unplugging of the system
- Turning off the system without the proper shutdown procedure
- A software error in the kernel
To prevent unclean halts, the current state of the file system must be written to disk (that is, "synchronized") before you halt the CPU or take a disk offline.
Repairing File Systems
During normal operation, files are created, modified, and removed. Each time a file is modified, the operating system performs a series of file system updates. When a system is booted, a file system consistency check is automatically performed. Most of the time, this file system check repairs any problems it encounters. File systems are checked with the fsck (file system check) program.
The Solaris fsck command uses a state flag, which is stored in the superblock, to record the condition of the file system. This flag is used by the fsck command to determine whether a file system needs to be checked for consistency. The flag is used by the /etc/bcheckrc script during booting and by the fsck command when run from a command line using the -m option. The possible state values are
- FSCLEAN--If the file system was unmounted properly, the state flag is set to FSCLEAN. Any file system with an FSCLEAN state flag is not checked when the system is booted.
- FSSTABLE--The file system is (or was) mounted but has not changed since the last check point--sync or fsflush--which normally occurs every 30 seconds. For example, the kernel periodically checks to see if a file system is idle and, if so, flushes the information in the superblock back to the disk and marks it FSSTABLE. If the system crashes, the file system structure is stable, but users might lose a small amount of data. File systems marked FSSTABLE can skip the checking before mounting.
- FSACTIVE--When a file system is mounted and then modified, the state flag is set to FSACTIVE and the file system might contain inconsistencies. A file system is marked as FSACTIVE before any modified data is written to the disk. When a file system is unmounted gracefully, the state flag is set to FSCLEAN. A file system with the FSACTIVE flag must be checked by fsck because it might be inconsistent. The system does not mount a file system for read/write unless the file system state is FSCLEAN or FSSTABLE.
- FSBAD--If the root file system is mounted when its state is not FSCLEAN or FSSTABLE, the state flag is set to FSBAD. The kernel does not change this file system state to FSCLEAN or FSSTABLE. A root file system flagged FSBAD as part of the boot process is mounted read-only. You can run fsck on the raw root device and then remount the root file system as read/write.
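The way the four state flags drive the check and mount decisions can be summarized in a few lines. The flag names come from the list above. The functions are hypothetical and only model the rules as described here, not the actual fsck or kernel logic.

```python
# Sketch: decisions implied by the superblock state flag.
# Flag names are from the text; the functions are illustrative only.

SKIP_CHECK = {"FSCLEAN", "FSSTABLE"}   # may be mounted without checking
MUST_CHECK = {"FSACTIVE", "FSBAD"}     # might contain inconsistencies


def needs_fsck(state_flag):
    """Return True if a file system in this state should be checked."""
    if state_flag in SKIP_CHECK:
        return False
    if state_flag in MUST_CHECK:
        return True
    raise ValueError(f"unknown state flag: {state_flag!r}")


def can_mount_read_write(state_flag):
    """Read/write mounting requires a clean or stable file system."""
    return state_flag in SKIP_CHECK


if __name__ == "__main__":
    for flag in ("FSCLEAN", "FSSTABLE", "FSACTIVE", "FSBAD"):
        print(flag, needs_fsck(flag), can_mount_read_write(flag))
```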
fsck is a multipass file system check program that performs successive passes over each file system, checking blocks and sizes, pathnames, connectivity, reference counts, and the map of free blocks (possibly rebuilding it) and performing some cleanup. The phases (passes) performed by the UFS version of fsck are
- Initialization
- Phase 1--Check blocks and sizes.
- Phase 2--Check pathnames.
- Phase 3--Check connectivity.
- Phase 4--Check reference counts.
- Phase 5--Check cylinder groups.
Normally, fsck is run noninteractively at bootup to preen the file systems after an abrupt system halt in which the latest file system changes were not written to disk. Preening automatically fixes any basic file system inconsistencies and does not try to repair more serious errors. While preening a file system, fsck fixes the inconsistencies it expects from such an abrupt halt. For more serious conditions, the command reports the error and terminates. It then gives the operator a message to run fsck manually.
How to Determine If a File System Needs Checking
File systems must be checked periodically for inconsistencies to avoid unexpected loss of data. As stated in the previous section, checking the state of a file system is automatically done at bootup; however, it is not necessary to reboot a system to check if the file systems are stable. The following procedure outlines a method for determining the current state of the file systems and whether they need to be fixed.
1. Become superuser.
2. Type fsck -m /dev/rdsk/cntndnsn and press Enter. The state flag in the superblock of the file system you specify is checked to see whether the file system is clean or requires checking. If you omit the device argument, all the UFS file systems listed in /etc/vfstab with a fsck pass value of greater than 0 are checked.
In this example, the first file system needs checking; the second file system does not:
fsck -m /dev/rdsk/c0t0d0s6
** /dev/rdsk/c0t0d0s6
ufs fsck: sanity check: /dev/rdsk/c0t0d0s6 needs checking
fsck -m /dev/rdsk/c0t0d0s7
** /dev/rdsk/c0t0d0s7
ufs fsck: sanity check: /dev/rdsk/c0t0d0s7 okay
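When many slices are checked at once, the sanity-check lines can be summarized by a short script. The sketch below assumes each result appears on its own line in the form shown in the example. The function name and the embedded sample are illustrative.

```python
# Sketch: summarize "sanity check" lines from fsck -m output.
# Assumes one result per line, in the form shown in the example.

def parse_sanity_checks(output):
    """Map each device to True if fsck -m reported that it needs checking."""
    results = {}
    marker = "sanity check:"
    for line in output.splitlines():
        if marker not in line:
            continue                      # skip banner lines such as "** device"
        rest = line.split(marker, 1)[1].strip()
        device, _, status = rest.partition(" ")
        results[device] = status.strip() == "needs checking"
    return results


if __name__ == "__main__":
    sample = (
        "** /dev/rdsk/c0t0d0s6\n"
        "ufs fsck: sanity check: /dev/rdsk/c0t0d0s6 needs checking\n"
        "** /dev/rdsk/c0t0d0s7\n"
        "ufs fsck: sanity check: /dev/rdsk/c0t0d0s7 okay\n"
    )
    for device, dirty in parse_sanity_checks(sample).items():
        print(device, "needs checking" if dirty else "okay")
```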
To Run fsck Manually
You might need to manually check file systems when they cannot be mounted or when you've determined that the state of a file system is unclean. Good indications that a file system might need to be checked are error messages displayed in the console window or system crashes for no reason.
When you run fsck manually, fsck reports each inconsistency found and fixes innocuous errors. For more serious errors, the command reports the inconsistency and prompts you to choose a response. Sometimes corrective actions performed by fsck result in some loss of data. The amount and severity of data loss can be determined from the fsck diagnostic output.
To check a file system manually, follow these steps:
1. Become superuser.
2. Unmount the file system.
3. Type fsck and press Enter. All file systems in the /etc/vfstab file with entries greater than zero in the fsck pass field are checked. You can also specify the mount point directory or /dev/rdsk/cntndnsn as arguments to fsck. The fsck command requires the raw device filename. Any inconsistency messages are displayed. The only way to successfully change the file system and correct the problem is to answer "yes" to these messages.
NOTE. The fsck command has an option -y that will automatically answer yes to every question. But be careful: if fsck asks to delete a file, it will answer yes and you will have no control over it. If it doesn't delete the file, however, the file system remains unclean and cannot be mounted.
4. If you corrected any errors, type fsck and press Enter. fsck might not be able to fix all errors in one execution. If you see the message FILE SYSTEM STATE NOT SET TO OKAY, run the command again and continue to run fsck until it runs clean with no errors.
5. Rename and move any files put in lost+found. Individual files put in the lost+found directory by fsck are renamed with their inode numbers, and figuring out what they were named originally can be difficult. If possible, rename the files and move them where they belong. You might be able to use the grep command to match phrases with individual files and the file command to identify file types, ownership, and so on. When whole directories are dumped into lost+found, it is easier to figure out where they belong and move them back.
Mounting File Systems
After you create a file system, you need to make it available. You make file systems available by mounting them. Using the mount command, you attach a file system to the system directory tree at the specified mount point and it becomes available to the system. The root file system is mounted at boot time and cannot be unmounted. Any other file system can be mounted or unmounted from the root file system at any time.
The various methods used to mount a file system are described in the next sections.
Creating an Entry in the /etc/vfstab File to Mount File Systems
The /etc/vfstab (virtual file system table) file contains a list of file systems to be automatically mounted when the system is booted to the multiuser state. The system administrator places entries in the file, specifying what file systems are to be mounted at bootup. The following is an example of the /etc/vfstab file:
#device            device              mount   FS     fsck  mount    mount
#to mount          to fsck             point   type   pass  at boot  options
/dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /       ufs    1     no       -
/proc              -                   /proc   proc   -     no       -
/dev/dsk/c0t0d0s1  -                   -       swap   -     no       -
swap               -                   /tmp    tmpfs  -     yes      -
/dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6  /usr    ufs    2     no       -
/dev/dsk/c0t3d0s7  /dev/rdsk/c0t3d0s7  /data   ufs    2     no       -
Each column of information follows this format:
- device to mount--The buffered (block) special device, or other resource name, that identifies the file system being mounted.
- device to fsck--The raw (character) special device that corresponds to the file system being mounted. This determines the raw interface used by fsck. Use a dash (-) when there is no applicable device, such as for a read-only file system or a network-based file system.
- mount point--The default mount point directory.
- FS type--The type of file system.
- fsck pass--The pass number used by fsck to decide whether to check a file system. When the field contains a dash (-), the file system is not checked. When the field contains a value of 1, the file system is checked sequentially. When fsck is run on multiple UFS file systems that have fsck pass values greater than 1, fsck automatically checks the file systems on different disks in parallel to maximize efficiency. Otherwise, the value of the pass number does not have any effect.
NOTE. In SunOS system software, the fsck pass field does not specify the order in which file systems are to be checked. During bootup, a preliminary check is run on each file system to be mounted from a hard disk, using the boot script /sbin/rcS, which checks the /, /usr, and /usr/kvm file systems. The other rc shell scripts then use the fsck command to check each additional file system sequentially. They do not check file systems in parallel. File systems are checked sequentially during booting even if the fsck pass numbers are greater than 1. The values can be any number greater than 1.
- mount at boot--Specify whether the file system should be automatically mounted when the system is booted.
- mount options--A list of comma-separated options (with no spaces) used when mounting the file system. Use a dash (-) to show no options.
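The seven-column layout described above is easy to read programmatically. The following sketch splits each non-comment line into named fields and applies the fsck pass rule. It assumes whitespace-separated fields with a dash for empty values. The type and function names are illustrative, and the sample entries are adapted from the example file.

```python
# Sketch: parse vfstab-style lines into named fields.
# Assumes seven whitespace-separated fields per entry; names are illustrative.
from collections import namedtuple

VfstabEntry = namedtuple("VfstabEntry", [
    "device_to_mount", "device_to_fsck", "mount_point", "fs_type",
    "fsck_pass", "mount_at_boot", "mount_options",
])


def parse_vfstab(text):
    """Return a list of VfstabEntry tuples, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        entries.append(VfstabEntry(*fields))
    return entries


def checked_by_fsck(entry):
    """A dash (or zero) in the fsck pass field means no check."""
    return entry.fsck_pass.isdigit() and int(entry.fsck_pass) > 0


if __name__ == "__main__":
    sample = (
        "#device device mount FS fsck mount mount\n"
        "/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -\n"
        "swap - /tmp tmpfs - yes -\n"
        "/dev/dsk/c0t3d0s7 /dev/rdsk/c0t3d0s7 /data ufs 2 no -\n"
    )
    for entry in parse_vfstab(sample):
        print(entry.mount_point, entry.fs_type, checked_by_fsck(entry))
```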
Using the Command Line to Mount File Systems
File systems can be mounted from the command line by using the mount command. The following commands are used from the command line to mount and unmount file systems:
- mount--Mounts specified file systems and remote resources
- mountall--Mounts all file systems specified in a file system table (vfstab)
- umount--Unmounts specified file systems and remote resources
- umountall--Unmounts all file systems specified in a file system table
CD-ROMs containing file systems are automatically mounted when the CD-ROM is inserted. Diskettes containing file systems are mounted by running the volcheck command.
As a general rule, local disk slices should be included in the /etc/vfstab file so they automatically mount at bootup.
Unmounting a file system removes it from the file system mount point. Some file system administration tasks cannot be performed on mounted file systems. You should unmount a file system when
- It is no longer needed
- You check and repair it by using the fsck command
- You are about to do a complete backup of it
NOTE. File systems are automatically unmounted as part of the system shutdown procedure.
Displaying Mounted File Systems
Whenever you mount or unmount a file system, the /etc/mnttab (mount table) file is modified to show the list of currently mounted file systems. You can display the contents of the mount table by using the cat or more commands, but you cannot edit it as you would the /etc/vfstab file. Here is an example of a mount table file:
/dev/dsk/c0t3d0s0 / ufs rw,suid 693186371
/dev/dsk/c0t1d0s6 /usr ufs rw,suid 693186371
/proc /proc proc rw,suid 693186371
swap /tmp tmpfs,dev=0 693186373
You can also view a mounted file system by typing /etc/mount from the command line. The system displays the following:
/ on /dev/dsk/c0t3d0s0 read/write/setuid/largefiles on ...
/usr on /dev/dsk/c0t3d0s6 read/write/setuid/largefiles on ...
/proc on /proc read/write/setuid on Fri May 16 11:39:05 1997
/dev/fd on fd read/write/setuid on Fri May 16 11:39:05 1997
/export on /dev/dsk/c0t3d0s3 setuid/read/write/largefiles on ...
/export/home on /dev/dsk/c0t3d0s7 setuid/read/write/largefiles on ...
/export/swap on /dev/dsk/c0t3d0s4 setuid/read/write/largefiles on ...
/opt on /dev/dsk/c0t3d0s5 setuid/read/write/largefiles on ...
/tmp on swap read/write on Fri May 16 11:39:07 1997
How to Mount a File System with Large Files
The new largefiles mount option enables users to mount a file system containing files larger than 2GB. The largefiles mount option is the default state for the Solaris 2.6 environment. The largefiles option means a file system mounted with this option may contain one or more files larger than 2GB.
You must explicitly use the nolargefiles mount option to disable this behavior. The nolargefiles option provides total compatibility with previous file system behavior, enforcing the 2GB maximum file size limit.
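The 2GB boundary can be expressed as a simple size test. The sketch below assumes the limit is the largest value of a signed 32-bit file offset (2^31 - 1 bytes), which is the usual meaning of a 2GB file size limit but is not spelled out above. The function name is illustrative.

```python
# Sketch: does a file of a given size need the largefiles option?
# Assumption: the 2GB limit is 2**31 - 1 bytes (signed 32-bit offset).

MAX_SMALL_FILE = 2 ** 31 - 1   # largest size under the 2GB limit, in bytes


def requires_largefiles(file_size_bytes):
    """Return True if the file exceeds the 2GB limit."""
    return file_size_bytes > MAX_SMALL_FILE


if __name__ == "__main__":
    for size in (1_000_000, 2 ** 31 - 1, 2 ** 31, 3 * 2 ** 30):
        print(size, requires_largefiles(size))
```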
Displaying a File System's Disk Space Use
Use the df command and its options to see the capacity of each disk mounted on a system, the amount available, and the percentage of space already in use. Use the du (disk usage) command to report the disk space used by directories and files.
NOTE. File systems at or above 90% of capacity should be cleared of unnecessary files. You can do this by moving them to a disk, or you can remove them after obtaining the user's permission.
The following is an example of how to use the df command to display disk space information. The command syntax is
$ df directory -F fstype -g -k -t
The following is an explanation of the df command and its options:
- df--The df command with no options lists all mounted file systems and their device names. It also lists the number of total 512-byte blocks used and the number of files.
- directory--Directory whose file system you want to check. The device name, blocks used, and number of files are displayed.
- -F fstype--Displays a list of unmounted file systems, their device names, the number of 512-byte blocks used, and the number of files on file systems of type fstype.
- -g--Displays the statvfs structure for all mounted file systems.
- -k--Displays a list of file systems, kilobytes used, free kilobytes, percent capacity used, and mount points.
- -t--Displays total blocks as well as blocks used for all mounted file systems.
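The figures that df -k reports are derived from the statvfs structure that df -g displays. As a rough illustration, the sketch below uses Python's os.statvfs (available on UNIX-like systems) to compute similar numbers for one file system. It approximates the report and does not reproduce the Solaris df output. The function name is illustrative.

```python
# Sketch: compute df -k style figures from the statvfs structure.
# Approximation only; uses os.statvfs, which is available on UNIX-like systems.
import os


def df_k(path="/"):
    """Return (total_kb, used_kb, avail_kb, capacity_percent) for path."""
    st = os.statvfs(path)
    unit = st.f_frsize or st.f_bsize           # fundamental block size, bytes
    total_kb = st.f_blocks * unit // 1024
    free_kb = st.f_bfree * unit // 1024        # free, including the reserve
    avail_kb = st.f_bavail * unit // 1024      # free to ordinary users
    used_kb = total_kb - free_kb
    denominator = used_kb + avail_kb
    capacity = round(100 * used_kb / denominator) if denominator else 0
    return total_kb, used_kb, avail_kb, capacity


if __name__ == "__main__":
    total, used, avail, capacity = df_k("/")
    print(f"kbytes={total} used={used} avail={avail} capacity={capacity}%")
```

The gap between the free and available figures corresponds to the space held back for root by the minimum free space setting.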
Displaying Directory Size Information
By using the df command, you displayed file system disk use. You can use the du command to display the disk use of a directory and all of its subdirectories in 512-byte blocks.
The du command shows you the disk use of each subdirectory. To get a list of subdirectories in a file system, cd to the pathname associated with that file system and run the following pipeline:
$ du | sort -r -n
This pipeline, which uses the reverse and numeric options of the sort command, pinpoints large directories. Use ls -l to examine the size (in bytes) and modification times of files within each directory. Old files or text files over 100KB often warrant storage offline.
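The same ranking can be done in a script when the du output has been captured to a file or a variable. The sketch below assumes each line holds a block count followed by a path, as du prints them. The function name and the sample figures are illustrative.

```python
# Sketch: rank directories from captured du output, largest first.
# Assumes each line is "<blocks> <path>", as printed by du.

def largest_directories(du_output, limit=5):
    """Return up to `limit` (blocks, path) pairs sorted by block count."""
    rows = []
    for line in du_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue                      # skip blank or malformed lines
        rows.append((int(parts[0]), parts[1].strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    sample = "8\t./a\n120\t./b\n34\t./a/c\n162\t.\n"
    for blocks, path in largest_directories(sample):
        print(blocks, path)
```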
Controlling User Disk Space Use
Quotas enable system administrators to control the size of UFS file systems by limiting the amount of disk space individual users can acquire. Quotas are especially useful on the file systems where user home directories reside. After the quotas are in place, they can be changed to adjust the amount of disk space or number of inodes users can consume, and they can be added or removed as system needs change. Quota status can also be monitored: quota commands enable administrators to display information about quotas on a file system or to search for users who have exceeded their quotas.
After you have set up and turned on disk and inode quotas, you can check for users who exceed their quotas. You can also check quota information for entire file systems by using the following commands:
- quota--Displays the quotas and disk use within a file system for individual users on which quotas have been activated.
- repquota--Displays the quotas and disk use for all users on one or more file systems.
You won't see quotas in use much today, as the cost of disk space continues to fall. In most cases, the system administrator simply watches disk space to identify users who might be using more than their fair share. As we saw in this section, you can easily do this by using the du command. On a large system with many users, disk quotas can be an effective way to control disk space use.
This chapter described disk file systems and how they are managed. The system administrator will spend a great deal of time managing and fine-tuning file systems to improve system efficiency.