Chapter 3: Introduction to File Systems
Solaris 2.6 Administrator Certification Training Guide, Part I
The following are the test objectives for this chapter:
- Defining and understanding the Solaris 2.x file system structure, parameters, and utilities
- Identifying utilities used to create, check, mount, and display file systems
- Comparing the Logical Volume Manager to standard Solaris file systems
- Understanding disk geometry and disk slicing
- Managing and controlling disk-space use
- Defining Volume Manager
All disk-based computer systems have a file system. In UNIX, file systems have two basic components: files and directories. A file is the actual information as it is stored on the disk, and a directory is a listing of the filenames. In addition to keeping track of filenames, the file system must also keep track of files' access dates and of file ownership. Managing the UNIX file systems is one of the system administrator's most important tasks. Administration of the file system involves:
- Ensuring users have access to data. This means that systems are up and operational, file permissions are set up properly, and data is accessible.
- Protecting file systems against file corruption and hardware failures. This is accomplished by checking the file system regularly and maintaining proper system backups.
- Securing file systems against unauthorized access. Only authorized users should have access to them. The data must be protected from intruders.
- Providing users with adequate space for their files.
- Keeping the file system clean. In other words, data in the file system must be relevant and not wasteful of disk space. Procedures are needed to make sure users follow proper naming conventions and data is stored in an organized manner.
This chapter discusses the basic structures that make up the file system, the utility that creates file systems, and how Solaris accesses the file system.
A File System Defined
A file system is a structure of directories used to organize and store files on disk. It is a collection of files and directories stored on disk in a standard UNIX file system format. You'll see the term "file system" used in several ways. Usually file system describes a particular type of file system (disk-based, network-based, or pseudo file system). It might also describe the entire file tree from the root directory downward. In another context, the term "file system" might be used to describe the structure of a disk slice, described later in this chapter.
The Solaris system software uses the virtual file system (VFS) architecture, which provides a standard interface for different file system types. The VFS architecture enables the kernel to handle basic operations, such as reading, writing, and listing files, without requiring the user or program to know about the underlying file system type. Furthermore, Solaris provides file-system administrative commands that enable you to maintain file systems.
Defining a Disk's Geometry
Before creating a file system on a disk, you need to understand the basic geometry of a disk drive. Disks come in many shapes and sizes. The number of heads, tracks, and sectors and the disk capacity vary from one model to another.
A hard disk consists of several separate disks mounted on a common spindle. Data stored on each disk surface is written and read by disk heads. The circular path a disk head traces over a spinning disk is called a track.
Each track is made up of a number of sectors laid end to end. A sector consists of a header, a trailer, and 512 bytes of data. The header and trailer contain error-checking information to help ensure the accuracy of the data. Taken together, the set of tracks traced across all of the individual disk surfaces for a single position of the heads is called a cylinder.
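The relationship between heads, tracks, sectors, and cylinders can be checked with a little arithmetic. The following is a minimal sketch, not a Solaris tool: the function name is hypothetical, and the geometry figures (1,254 accessible cylinders, 9 heads, 36 sectors per track) are borrowed from the disk used in this chapter's examples. It assumes one track per head in each cylinder and 512 data bytes per sector.

```python
SECTOR_BYTES = 512  # data bytes per sector, as described above


def disk_capacity(cylinders, heads, sectors_per_track):
    """Return (sectors per cylinder, total sectors, total data bytes)."""
    # One cylinder is the set of tracks under all heads at one head position.
    sectors_per_cylinder = heads * sectors_per_track
    total_sectors = cylinders * sectors_per_cylinder
    return sectors_per_cylinder, total_sectors, total_sectors * SECTOR_BYTES


# Hypothetical usage with the example disk's geometry.
per_cylinder, total_sectors, total_bytes = disk_capacity(1254, 9, 36)
print(per_cylinder, total_sectors, total_bytes)
```

The same three numbers (sectors per track, tracks per cylinder, and cylinder count) are what the disk label records, so the result should agree with the sector counts reported for the whole-disk slice.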
Disk Controller
Associated with every disk is a controller, an intelligent device responsible for organizing data on the disk. Some disk controllers are located on a separate circuit board and some are embedded in the disk drive.
Defect List
Disks might contain areas where data cannot be written and retrieved reliably. These areas are called defects. The controller uses the error-checking information in each disk block's trailer to determine whether a defect is present in that block. When a block is found to be defective, the controller can be instructed to add it to a defect list and avoid using that block in the future. The last two cylinders are set aside for diagnostic use and for storing the disk defect list.
Disk Labels
A special area of every disk is set aside for storing information about the disk's controller, geometry, and slices. This information is called the disk's label or Volume Table of Contents (VTOC). To label a disk means to write slice information onto the disk. You usually label a disk after defining its slices. If you fail to label a disk after creating slices, the slices will be unavailable because the operating system has no way of "knowing" about the slices.
Partition Table
An important part of the disk label is the partition table that identifies a disk's slices, the slice boundaries (in cylinders), and the total size of the slices. A disk's partition table can be displayed by using the format utility.
Solaris File System Types
Solaris file systems can be put into three categories: disk-based, network-based, and pseudo.
Disk-Based File Systems
Disk-based file systems reside on the system's local disk. The four types of disk file systems are
- UFS--The UNIX file system, which is based on the BSD Fast File System (the traditional UNIX file system). The UFS file system is the default disk-based file system used in Solaris.
- HSFS--The High Sierra and ISO 9660 file system. The HSFS file system is used on CD-ROMs and is a read-only file system.
- PCFS--The PC file system, which allows read/write access to data and programs on DOS-formatted disks written for DOS-based personal computers.
- S5--The System V file system, which is seldom used. It is supported for backward compatibility purposes only.
Network-Based File Systems
Network-based file systems are file systems accessed over the network. Typically, network-based file systems reside on one system and are accessed by other systems across the network.
The Network File System (NFS) makes file systems that reside on remote systems available over the network. NFS is the only available network-based file system.
Pseudo File Systems
Pseudo file systems are virtual or memory-based file systems that provide access to special kernel information and facilities. Most pseudo file systems do not use file-system disk space, although a few exceptions exist. Cache File Systems, for example, use a file system to contain the cache. Some pseudo file systems, such as the temporary file system, might use the swap space on a physical disk.
- SWAPFS--A file system used by the kernel for swapping.
- PROCFS--The Process File System resides in memory. It contains a list of active processes, by process number, in the /proc directory. Information in the /proc directory is used by commands such as ps. Debuggers and other development tools can also access the address space of the processes by using file system calls.
- LOFS--The Loopback File System enables you to create a new virtual file system so that you can access files by using an alternative pathname. For example, if you create a loopback mount of root (/) on /tmp/newroot, the entire file system hierarchy looks as though it is duplicated under /tmp/newroot, including any file systems mounted from NFS servers. All files are accessible with either a pathname starting from / or a pathname starting from /tmp/newroot.
- CacheFS--The Cache File System enables you to use disk drives on local workstations to store frequently used data from a remote file system or CD-ROM. The data stored on the local disk is the cache.
- TMPFS--The temporary file system uses local memory for file system reads and writes. Because TMPFS uses physical memory and not the disk, access to files in a TMPFS file system is typically much faster than to files in a UFS file system. Files in the temporary file system are not permanent; they are deleted when the file system is unmounted and when the system is shut down or rebooted. TMPFS is the default file system type for the /tmp directory in the SunOS system software. You can copy or move files into or out of the /tmp directory just as you would in a UFS /tmp file system. The TMPFS file system uses swap space as a temporary backing store as long as adequate swap space is present.
Disk Slices
Disks are divided into regions called disk slices or disk partitions. This book attempts to use the term slice whenever possible; however, certain interfaces, such as the format utility, refer to slices as partitions. A slice is composed of a single range of contiguous blocks. It is a physical subset of the disk (except for slice 2, which represents the entire disk). A UNIX file system is built within these disk slices. The boundaries of a disk slice are defined when a disk is formatted by using the Solaris format utility. Each disk slice appears to the operating system (and to the system administrator) as though it is a separate disk drive.
NOTE. Solaris device names use the term "slice" (and the letter "s" in the device name) to refer to the slice number. Slices were called "partitions" in SunOS 4.x.
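To make the naming convention concrete, here is a small sketch that splits a device name such as /dev/rdsk/c0t3d0s7 into its parts. The function and pattern names are hypothetical. The reading of c, t, and d as controller, target, and disk is the usual SCSI convention and is given here as background; the sketch assumes that form of name and rejects anything else.

```python
import re

# Assumed SCSI-style form: /dev/dsk or /dev/rdsk, then c<N>t<N>d<N>s<N>.
DEVICE_RE = re.compile(r"^/dev/(r?dsk)/c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_device(path):
    """Split a Solaris-style disk device name into its numbered fields."""
    match = DEVICE_RE.match(path)
    if match is None:
        raise ValueError("not a cNtNdNsN device name: " + path)
    kind, controller, target, disk, slice_number = match.groups()
    return {
        "raw": kind == "rdsk",          # rdsk = raw device, dsk = buffered device
        "controller": int(controller),
        "target": int(target),
        "disk": int(disk),
        "slice": int(slice_number),     # the "s" field is the slice number
    }


example = parse_device("/dev/rdsk/c0t3d0s7")
print(example)
```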
A physical disk consists of a stack of circular platters. Data is stored on these platters in a cylindrical pattern. Cylinders can be grouped and isolated from one another. A group of cylinders is referred to as a slice. A slice is defined with start and end points, defined from the center of the stack of platters, which is called the spindle. To define a slice, the administrator provides a starting cylinder and an ending cylinder. A disk can have up to eight slices, named 0-7. See Chapter 2, "Installing the Solaris 2.x Software," for a discussion of disk-storage systems and sizing partitions.
When setting up slices, remember these rules:
- Each disk slice holds only one file system.
- No file system can span multiple slices.
- After a file system is created, its size cannot be increased or decreased without repartitioning the entire disk and restoring all data from a backup.
- Slices cannot span multiple disks; however, multiple swap slices on separate disks are allowed.
Also follow these guidelines when planning the layout of file systems:
- Distribute the workload as evenly as possible among different I/O systems and disk drives. Distribute /home and swap directories evenly across disks.
- Keep projects or groups within the same file system.
- Use as few file systems per disk as possible. On the system (or boot) disk, you usually have three slices: /, /usr, and a swap area. On other disks, create one or--at most--two slices. Fewer, roomier slices cause less file fragmentation than many small, overcrowded slices. Higher-capacity tape drives and the capability of ufsdump to handle multiple volumes facilitate backing up larger file systems.
- It is not important for most sites to be concerned about keeping similar types of user files in the same file system.
- Infrequently, you might have some users who consistently create very small or very large files. You might consider creating a separate file system with more inodes for users who consistently create very small files. See the sections on inodes and changing the number of bytes per inode later in this chapter.
Displaying Disk Configuration Information
As described earlier, disk configuration information is stored in the disk label. If you know the disk and slice number, you can display information for a disk by using the prtvtoc (print volume table of contents) command. You can specify the volume by specifying any non-zero-size slice defined on the disk (for example, /dev/rdsk/c0t3d0s2 for all of disk 3 or /dev/rdsk/c0t3d0s5 for the sixth slice of disk 3). If you know the target number of the disk but do not know how it is divided into slices, you can show information for the entire disk by specifying either slice 2 or slice 0. The following steps show how you can examine information stored on a disk's label by using the prtvtoc command.
1. Become superuser.
2. Type prtvtoc /dev/rdsk/cntndnsn and press Enter.
Information for the disk and slice you specify is displayed. In the following steps, information is displayed for all of disk 3:
1. Become superuser.
2. Type prtvtoc /dev/rdsk/c0t3d0s2 and press Enter. The system responds with:
    * /dev/rdsk/c0t3d0s2 (volume "") partition map
    *
    * Dimensions:
    *     512 bytes/sector
    *      36 sectors/track
    *       9 tracks/cylinder
    *     324 sectors/cylinder
    *    1272 cylinders
    *    1254 accessible cylinders
    *
    * Flags:
    *   1: unmountable
    *  10: read-only
    *
    *                          First     Sector      Last
    * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
           2      5    01           0    406296    406295
           6      4    00           0    242352    242351
           7      0    00      242352    163944    406295  /files7
The prtvtoc command shows the number of cylinders and heads, as well as how the disk's slices are arranged.
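The columns in the partition map are related: for each slice, the last sector equals the first sector plus the sector count, minus one, and the sector count is a whole number of cylinders. The following sketch checks that relationship on the three partition rows from the example. The parser is a hypothetical helper written for this illustration, not part of Solaris, and it assumes the six- or seven-column row layout shown in the example output.

```python
SECTORS_PER_CYLINDER = 324  # from the "Dimensions" header of the example

# Partition rows copied from the example prtvtoc output.
PRTVTOC_ROWS = """\
2 5 01 0 406296 406295
6 4 00 0 242352 242351
7 0 00 242352 163944 406295 /files7
"""


def parse_rows(text):
    """Turn prtvtoc partition rows into dictionaries, skipping '*' comments."""
    slices = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("*"):
            continue
        fields = line.split()
        first, count, last = (int(value) for value in fields[3:6])
        slices.append({
            "slice": int(fields[0]),
            "first": first,
            "count": count,
            "last": last,
            "cylinders": count // SECTORS_PER_CYLINDER,
            "mount": fields[6] if len(fields) > 6 else None,
        })
    return slices


slices = parse_rows(PRTVTOC_ROWS)
for entry in slices:
    consistent = entry["first"] + entry["count"] - 1 == entry["last"]
    print(entry["slice"], entry["cylinders"], entry["mount"], consistent)
```

In this example, slices 6 and 7 together cover the same sectors as slice 2, which is what you would expect when slice 2 represents the entire disk.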
Using the format Utility to Create Slices
Before you can create a file system on a disk, the disk must be formatted and you must divide it into slices by using the Solaris format utility. Formatting involves two separate processes:
- Formatting--Writing format information to the disk
- Surface analysis--Compiling an up-to-date list of disk defects
When a disk is formatted, header and trailer information is superimposed on the disk. When the format utility runs a surface analysis, the controller scans the disk for defects. Defects and formatting information reduce the total disk space available for data, which is why a new disk usually holds only 90-95% of its capacity after formatting. This percentage varies according to disk geometry and decreases as the disk ages and develops more defects.
The need for performing a surface analysis on a disk drive has dropped as more manufacturers ship their disk drives formatted and partitioned. You should not need to use the format utility when adding a disk drive to an existing system unless you think disk defects are causing problems or you want to change the partitioning scheme.
CAUTION! Formatting and creating slices is a destructive process, so make sure user data is backed up before you start.
The format utility searches your system for all attached disk drives and reports the following information about the disk drives it finds:
- Target location
- Disk geometry
- Whether the disk is formatted
- Whether the disk has mounted partitions
In addition, the format utility is used in disk repair operations to do the following:
- Retrieve disk labels
- Repair defective sectors
- Format and analyze disks
- Partition disks
- Label disks (write disk name and configuration information to the disk for future retrieval)
The Solaris installation program partitions and labels disk drives as part of installing the Solaris release. However, you might need to use the format utility when
- Displaying slice information
- Dividing a disk into slices
- Adding a disk drive to an existing system
- Formatting a disk drive
- Repairing a disk drive
The main reason a system administrator uses the format utility is to divide a disk into disk slices. The process of creating slices is as follows:
1. Become superuser.
2. Type format. The system responds with
    AVAILABLE DISK SELECTIONS:
        0. c0t0d0 at scsibus0 slave 24
           sd0: <SUN0207 cyl 1254 alt 2 hd 9 sec 36>
        1. c0t3d0 at scsibus0 slave 0: test
           sd3: <SUN0207 cyl 1254 alt 2 hd 9 sec 36>
3. Specify the disk (enter its number). The system responds with
    FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        quit
4. Type partition at the format prompt and the partition menu is displayed.
    format> partition
    PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        quit
5. Type print to display the current partition map. The system responds with
    partition> print
    Volume: test
    Current partition table (original sd3):
    Part      Tag    Flag   Cylinders      Size       Blocks
      0       root    wm      0 -   39    14.06MB    (40/0/0)
      1       swap    wu     40 -  199    56.25MB    (160/0/0)
      2     backup    wm      0 - 1150   404.65MB    (1151/0/0)
      3 unassigned    wm      0            0         (0/0/0)
      4 unassigned    wm      0            0         (0/0/0)
      5          -    wm      0           10.20MB    (29/0/0)
      6        usr    wm    200 -  228   121.29MB    (345/0/0)
      7       home    wm    574 - 1150   202.85MB    (577/0/0)
6. After partitioning the disk, label it by typing label at the partition prompt.
    partition> label
7. After labeling the disk, type quit to exit the partition menu.
    partition> quit
8. Type quit again to exit the format utility.
    format> quit
Logical Volumes
On a large server with many disk drives, standard methods of disk slicing are inadequate and inefficient. Standard file systems impose several limitations. A file cannot be larger than the file system that holds it, and because a file system cannot span multiple disks, the size of the file system is limited to the size of the disk. Another problem is that a standard file system cannot be increased in size without destroying the data on it. Sun has addressed these issues with two unbundled packages: Solstice DiskSuite and Sun Enterprise Volume Manager. Both packages allow file systems to span multiple disks and provide improved I/O and reliability compared to the standard Solaris file system. We refer to these types of file systems as logical volumes (LVMs). Both packages are purchased separately and are not part of the standard Solaris operating system distribution. Typically, DiskSuite is used on Sun's multipacks and the Enterprise Volume Manager package is used on SPARCstorage Arrays.
The following is an overview of the primary elements of a logical volume:
- Concatenation and striping--Concatenation works much the way the UNIX cat(1) command concatenates two or more files to create one larger file. When slices are concatenated, the component blocks are addressed sequentially, one component after another, and the file system can use the entire concatenation. Striping is similar to concatenation, except that the addressing of the component blocks is interlaced across the slices rather than sequential. Striping is used to gain performance: when data is striped across disks, multiple controllers can access data simultaneously.
- Mirroring (mirrors and submirrors)--Mirroring replicates all writes made to a single logical device (the mirror) onto multiple devices (the submirrors), and distributes read operations among them. This provides redundancy of data in the event of a disk or hardware failure.
- UFS logging (journaled file system)--UFS logging records UNIX file system (UFS) updates in a log (the logging device) before the updates are applied to the UNIX file system. With this technique, the risk of file system corruption due to a power failure or unsafe shutdown is greatly reduced.
- Hot spare pools--A hot spare pool is a group of spare disk drives that automatically replace failed components.
- Disksets--A diskset is an association of two hosts (servers) and a group of disk drives in which all of the drives are accessible by each host. If one host fails, the other takes over. This scenario is used where high availability is critical.
- RAID devices--RAID is an acronym for "redundant arrays of inexpensive disks." Many disks are housed in a cabinet to provide large amounts of disk space.
- Ability to expand mounted file systems--On a server requiring high availability, disk space can be added to a file system without shutting down the system or unmounting the file system.
Logical volumes can be made up of one or more component slices. You can configure the component slices of one disk or use slices from multiple disks. After you create the logical volumes, they are used like physical disk slices. Logical volumes provide increased capacity, higher availability, and better performance. To gain increased capacity, you create logical volumes that are concatenations, stripes, or RAID devices. Disk concatenations and stripes can help performance and address capacity issues. Mirroring, UFS logging, and RAID devices provide higher availability. Logical volumes are transparent to the application software and to the hardware.
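The difference between sequential (concatenated) and interlaced (striped) addressing is easier to see with a small model. The sketch below is illustrative only and is not how DiskSuite or Volume Manager is implemented; the function names, the four-block component sizes, and the two-block interlace are all assumptions chosen to keep the output short.

```python
def concat_map(block, sizes):
    """Map a logical block to (component, offset) in a concatenation."""
    # Components are filled one after another, in order.
    for index, size in enumerate(sizes):
        if block < size:
            return index, block
        block -= size
    raise ValueError("logical block lies beyond the end of the volume")


def stripe_map(block, components, interlace):
    """Map a logical block to (component, offset) in a stripe."""
    unit, within = divmod(block, interlace)  # interlace unit, position inside it
    index = unit % components                # units rotate across the components
    offset = (unit // components) * interlace + within
    return index, offset


# Two components of four blocks each; a stripe interlace of two blocks.
concat_layout = [concat_map(b, [4, 4]) for b in range(8)]
stripe_layout = [stripe_map(b, 2, 2) for b in range(8)]
print(concat_layout)
print(stripe_layout)
```

In the concatenated layout, the first four logical blocks all land on component 0, so a sequential read keeps one disk busy. In the striped layout, consecutive interlace units alternate between the two components, which is what lets more than one disk or controller work on the same request.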
Parts of a UFS File System
UFS is the default disk-based file system used in the Solaris system software. It provides the following features:
- State flags--Shows the state of the file system as clean, stable, active, or unknown. These flags eliminate unnecessary file system checks. If the file system is "clean" or "stable," fsck (file system check) is not run when the system boots.
- Extended fundamental types (EFT)--32-bit user ID (UID), group ID (GID), and device numbers.
- Large file systems--A UFS file system can be as large as 1 terabyte (1TB) and can have regular files up to 2 gigabytes (2GB). By default, the Solaris system software does not provide striping, which is required to make a logical slice large enough for a 1TB file system. Optional software packages, such as Solstice DiskSuite, provide this capability.
During the installation of the Solaris software, several UFS file systems are created on the system disk. These default file systems and their contents are described in Table 3-1.
TABLE 3-1 Solaris Default File Systems
Slice | File System | Description
0 | root | Root (/) is the top of the hierarchical file tree. Root holds files and directories that make up the operating system. The root directory contains the directories and files critical for system operation, such as the kernel, the device drivers, and the programs used to boot the system. It also contains the mount point directories, in which local and remote file systems can be attached to the file tree. The root (/) file system is always in slice 0.
1 | swap | Provides virtual memory or swap space. Swap space is used when running programs too large to fit in a computer's memory. The Solaris operating environment then "swaps" programs from memory to the disk and back, as needed. The swap slice is always located in slice 1 unless /var is set up as a file system. If /var is set up, /var uses slice 1 and swap is put on slice 4. The /var file system is for files and directories likely to change or grow over the life of the local system. These include system logs, vi and ex backup files, and uucp files. On a server, a good idea is to have these files in a separate file system.
2 | | Refers to the entire disk and is defined automatically by Sun's format utility and the Solaris installation programs. The size of this slice should not be changed.
3 | /export | Holds alternate versions of the operating system. These alternate versions are required by client systems whose architectures differ from that of the server. Clients with the same architecture type as the server obtain executables from the /usr file system, usually slice 6.
4 | /export/swap | Provides virtual memory space for client systems if the system is set up for client support.
5 | /opt | Holds optional third-party software added to a system. If a slice is not allocated for this file system during installation, the /opt directory is put in slice 0.
6 | /usr | Holds operating system commands--also known as executables--designed to be run by users. This slice also holds documentation, system programs (init and syslogd, for example), and library routines. The /usr file system also includes system files and directories that can be shared with other users. Files (such as man pages) that can be used on all types of systems are in /usr/share.
7 | /home | Holds files created by users (also named /export/home).
You need to create (or re-create) a UFS file system only when you
- Add or replace disks
- Change the slices of an existing disk
- Do a full restore on a file system
- Change the parameters of a file system, such as block size or free space
When you create a UFS file system, the disk slice is divided into cylinder groups. The slice is then divided into blocks to control and organize the structure of the files within the cylinder group. A UFS file system has the following four types of blocks, with each performing a specific function in the file system:
- Boot block--Stores information used when booting the system
- Superblock--Stores much of the information about the file system
- Inode--Stores all information about a file except its name
- Storage or data block--Stores data for each file
The Boot Block
The boot block stores the procedures used in booting the system. Without a boot block, the system does not boot. If a file system is not to be used for booting, the boot block is left blank. The boot block appears only in the first cylinder group (cylinder group 0) and is the first 8KB in a slice.
The Superblock
The superblock stores much of the information about the file system. A few of the more important things contained in a superblock are
- Size and status of the file system
- Label (file system name and volume name)
- Size of the file system's logical block
- Date and time of the last update
- Cylinder group size
- Number of data blocks in a cylinder group
- Summary data block
- File system state: clean, stable, or active
- Pathname of the last mount point
Without a superblock, the file system becomes unreadable. The superblock is located at the beginning of the disk slice and is replicated in each cylinder group. Because the superblock contains critical data, multiple superblocks are made when the file system is created. A copy of the superblock for each file system is kept up-to-date in memory. If the system gets halted before a disk copy of the superblock gets updated, the most recent changes to the superblock are lost and the file system becomes inconsistent. The sync command forces every superblock in memory to write its data to disk. The file system check program fsck can fix problems that occur when the sync command hasn't been used before a shutdown.
A summary information block is kept with the superblock. It is not replicated but is grouped with the first superblock, usually in cylinder group 0. The summary block records changes that take place as the file system is used, listing the number of inodes, directories, fragments, and storage blocks within the file system.
Inodes
An inode contains all of the information about a file except its name, which is kept in a directory. An inode is 128 bytes. The inode information is kept in the cylinder information block and contains the following:
- The type of the file (regular, directory, block special, character, link, and so on)
- The mode of the file (the set of read/write/execute permissions)
- The number of hard links to the file
- The user-id of the owner of the file
- The group-id to which the file belongs
- The number of bytes in the file
- An array of 15 disk-block addresses
- The date and time the file was last accessed
- The date and time the file was last modified
- The date and time the inode was last changed (often described, loosely, as the time the file was created)
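Most of the fields in this list can be inspected from a program through the stat interface. The sketch below uses Python's standard os.stat call on a scratch file; the helper name and the file names are hypothetical. It also creates a second hard link to show that the filename lives in the directory, not in the inode: both names report the same inode number and a link count of 2.

```python
import os
import stat
import tempfile


def inode_summary(path):
    """Collect the inode fields that os.stat exposes for a path."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "regular": stat.S_ISREG(info.st_mode),   # file type
        "mode": stat.filemode(info.st_mode),     # permission bits as text
        "links": info.st_nlink,                  # number of hard links
        "uid": info.st_uid,
        "gid": info.st_gid,
        "size": info.st_size,                    # number of bytes in the file
        "accessed": info.st_atime,
        "modified": info.st_mtime,
        "changed": info.st_ctime,                # last inode change
    }


with tempfile.TemporaryDirectory() as scratch:
    first_name = os.path.join(scratch, "report.txt")
    second_name = os.path.join(scratch, "alias.txt")
    with open(first_name, "w") as handle:
        handle.write("hello")
    os.link(first_name, second_name)  # second directory entry, same inode
    first = inode_summary(first_name)
    second = inode_summary(second_name)

print(first["mode"], first["links"], first["size"])
```

Note that nothing in the summary contains the filename itself, which matches the description above. The array of disk-block addresses is not exposed by this interface.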
The maximum number of files per UFS file system is determined by the number of inodes allocated for a file system. The number of inodes depends on how much disk space is allocated for each inode and the total size of the file system. By default, one inode is allocated for each 2KB of data space. You can change the default allocation by using the -i option of the newfs command.
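As a rough worked example, the number of inodes can be estimated by dividing the size of the slice by the number of bytes per inode. The sketch below applies that to the 163,944-sector slice used in this chapter's examples. The function name is hypothetical, and the result is only an estimate: newfs allocates inodes per cylinder group, so the figure it reports will differ somewhat.

```python
SECTOR_BYTES = 512


def estimated_inodes(sectors, bytes_per_inode=2048):
    """Estimate the inode count for a slice of the given size."""
    return (sectors * SECTOR_BYTES) // bytes_per_inode


default_estimate = estimated_inodes(163944)        # default: one inode per 2KB
dense_estimate = estimated_inodes(163944, 1024)    # as if newfs -i 1024 were used

# For comparison, the newfs example in this chapter reports
# 32 cylinder groups with 1216 inodes per group.
reported = 32 * 1216

print(default_estimate, dense_estimate, reported)
```

Halving the bytes-per-inode value roughly doubles the number of files the file system can hold, at the cost of more space spent on inodes. That trade-off is why a smaller value suits file systems that hold many very small files.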
Storage Blocks
The rest of the space allocated to the file system is occupied by storage blocks, also called data blocks. The size of these storage blocks is determined at the time a file system is created. Storage blocks are allocated, by default, in two sizes: an 8KB logical block size and a 1KB fragment size.
For a regular file, the storage blocks contain the contents of the file. For a directory, the storage blocks contain entries that give the inode number and the filename of the files in the directory.
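The two allocation sizes work together: the bulk of a file occupies full logical blocks, and the leftover tail can be stored in fragments instead of wasting a whole block. The following is a simplified model of that idea, assuming the default 8KB block and 1KB fragment sizes. The function name is hypothetical, and the model ignores indirect address blocks and the finer points of how UFS actually places fragments.

```python
def blocks_and_fragments(size, block=8192, fragment=1024):
    """Return (full blocks, fragments) needed to hold `size` bytes."""
    full_blocks, tail = divmod(size, block)
    fragments = -(-tail // fragment)      # round the tail up to whole fragments
    if fragments == block // fragment:    # a tail that fills every fragment
        full_blocks, fragments = full_blocks + 1, 0   # is simply one more block
    return full_blocks, fragments


small = blocks_and_fragments(500)      # tiny file: one fragment, no full block
medium = blocks_and_fragments(20000)   # two full blocks plus a four-fragment tail
print(small, medium)
```

Under this model a 500-byte file consumes 1KB rather than 8KB, which is the reason the fragment size matters on file systems that hold many small files.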
Free Blocks
Blocks not currently being used as inodes, indirect address blocks, or storage blocks are marked as free in the cylinder group map. This map also keeps track of fragments to prevent fragmentation from degrading disk performance.
How to Create a UFS File System
Use the newfs command to create UFS file systems. newfs is a convenient front-end to the mkfs command, the program that creates the new file system on a disk slice. On Solaris 2.x systems, information used to set some of the parameter defaults, such as number of tracks per cylinder and number of sectors per track, is read from the disk label. newfs determines the file system parameters to use, based on the options you specify and information provided in the disk label. Parameters are then passed to the mkfs (make file system) command, which builds the file system. Although you can use the mkfs command directly, it's more difficult to use and you must supply many of the parameters manually. The use of the newfs command is discussed later in this chapter.
The disk must be formatted and divided into slices before you can create a UFS file system on it. newfs removes any data on the disk slice and creates the skeleton of a directory structure, including the directory named lost+found. After you run newfs successfully, the slice is ready to be mounted as a file system.
To create a UFS file system on a formatted disk that has already been divided into slices, you need to know the raw device filename of the slice that will contain the file system. If you are re-creating or modifying an existing UFS file system, back up and unmount the file system before performing these steps:
1. Become superuser.
2. Type newfs /dev/rdsk/device-name and press Enter. You are asked if you want to proceed. The newfs command requires the use of the raw device name and not the buffered device name.
CAUTION! Be sure you have specified the correct device name for the slice before performing the next step. You will erase the contents of the slice when the new file system is created, and you don't want to erase the wrong slice.
3. Type y to confirm.
The following example creates a file system on /dev/rdsk/c0t3d0s7:
1. Become superuser: type su and enter the root password.
2. Type newfs /dev/rdsk/c0t3d0s7. The system responds with
    newfs: construct a new file system /dev/rdsk/c0t3d0s7 (y/n)? y
    /dev/rdsk/c0t3d0s7: 163944 sectors in 506 cylinders of 9 tracks, 36 sectors
            83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
    super-block backups (for fsck -b #) at:
     32, 5264, 10496, 15728, 20960, 26192, 31424, 36656, 41888, 47120, 52352,
     57584, 62816, 68048, 73280, 78512, 82976, 88208, 93440, 98672, 103904,
     109136, 114368, 119600, 124832, 130064, 135296, 140528, 145760, 150992,
     156224, 161456,
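The figures in the newfs summary are consistent with one another, and it can be instructive to check them. The sketch below parses the example output and verifies three things: the sector count equals cylinders times tracks times sectors per track, the number of cylinder groups matches the cylinder count divided by the cylinders per group (rounded up), and one backup superblock is listed per cylinder group. The parsing patterns are assumptions based only on the example text shown in this chapter, not on a documented output format.

```python
import math
import re

# Summary lines copied from the example newfs run.
NEWFS_OUTPUT = """\
/dev/rdsk/c0t3d0s7: 163944 sectors in 506 cylinders of 9 tracks, 36 sectors
        83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
super-block backups (for fsck -b #) at:
 32, 5264, 10496, 15728, 20960, 26192, 31424, 36656, 41888, 47120, 52352,
 57584, 62816, 68048, 73280, 78512, 82976, 88208, 93440, 98672, 103904,
 109136, 114368, 119600, 124832, 130064, 135296, 140528, 145760, 150992,
 156224, 161456,
"""

geometry = re.search(
    r"(\d+) sectors in (\d+) cylinders of (\d+) tracks, (\d+) sectors",
    NEWFS_OUTPUT,
)
sectors, cylinders, tracks, per_track = (int(v) for v in geometry.groups())

groups_match = re.search(r"in (\d+) cyl groups \((\d+) c/g", NEWFS_OUTPUT)
cyl_groups, cyl_per_group = (int(v) for v in groups_match.groups())

# Everything after "at:" is the list of backup superblock locations.
backups = [int(v) for v in re.findall(r"\d+", NEWFS_OUTPUT.split("at:", 1)[1])]

print(sectors == cylinders * tracks * per_track)
print(cyl_groups == math.ceil(cylinders / cyl_per_group))
print(len(backups) == cyl_groups, backups[0])
```

Keeping a copy of the backup list is worthwhile, because any of those locations can be supplied to fsck -b if the primary superblock becomes unreadable.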
The newfs command uses optimized default values to create the file system. The default parameters used by the newfs command are
- File system block size = 8,192.