JOURNALING

As previously mentioned, the journal increases the likelihood that the filesystem will be in a consistent state. As with most things Linux, journaling behavior is highly configurable. By default, only metadata (not data blocks) is written through the journal. This is done for performance reasons. The default can be changed with the data mount option. For example, data=journal causes all data blocks to be written through the journal. There are other options as well. See the mount man page for details.
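When examining a live system or a captured copy of /proc/mounts, it can be handy to check which journaling mode a filesystem was mounted with. The following is a minimal sketch: the function name journaling_mode is hypothetical, and the fallback value "ordered" is an assumption about the ext4 default that should be confirmed against the mount man page for the system in question.

```python
def journaling_mode(mount_line, default="ordered"):
    """Return the data= mode named in one /proc/mounts-style line.

    'default' is what is reported when no data= option is listed;
    "ordered" is an assumption, not something read from the line.
    """
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("expected device, mount point, type, and options")
    for option in fields[3].split(","):
        if option.startswith("data="):
            return option[len("data="):]
    return default


print(journaling_mode("/dev/sda1 / ext4 rw,relatime,data=journal 0 0"))
print(journaling_mode("/dev/sda1 / ext4 rw,relatime 0 0"))
```

The first call reports journal because the option is present; the second falls back to the assumed default.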

The journal causes data to be written twice. The first write goes to the journal and is performed as quickly as possible. To accomplish this, the journal is stored in one block group and is often the only thing stored in that group, which minimizes disk seek times. Later, after the data has been committed to the journal, the operating system writes the data to its correct location on the disk and then erases the commit record. This not only improves data integrity but also improves performance by caching many small writes before writing everything to its final location.
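The two-step write described above can be illustrated with a toy model. This is only a sketch of the idea and has nothing to do with the actual on-disk journal format; the class ToyJournal and its methods are hypothetical names invented for the illustration.

```python
class ToyJournal:
    """Toy write-ahead journal; not the real jbd2 format."""

    def __init__(self):
        self.disk = {}      # final locations: block number -> contents
        self.journal = []   # committed transactions not yet written home

    def commit(self, writes):
        # First write: the whole transaction lands in the journal.
        self.journal.append(dict(writes))

    def checkpoint(self):
        # Second write: copy each block to its final location,
        # then drop the commit record for that transaction.
        while self.journal:
            self.disk.update(self.journal[0])
            del self.journal[0]

    # After a crash, replaying committed transactions is the same walk.
    recover = checkpoint


j = ToyJournal()
j.commit({100: b"inode table", 205: b"bitmap"})
j.commit({100: b"inode table v2"})
# A crash here loses nothing: both transactions are still in the journal.
j.recover()
print(j.disk[100])  # b'inode table v2'
```

Because transactions are replayed in commit order, the later version of block 100 wins, which is the behavior one would want after an unclean shutdown.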

The journal is normally stored in inode 8, but it may optionally be stored on an external device. The latter does not seem to be very common. Regardless of where it is stored, the journal contains its own superblock, which describes the journal. When examining the journal directly, it is important to realize that the journal stores information in big endian format.
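The byte order matters in practice. The short sketch below unpacks the same four bytes (the jbd2 magic number as it would appear on disk) both ways using Python's struct module; only the big endian interpretation yields the expected value. The variable names are purely illustrative.

```python
import struct

# First four bytes of a journal block, exactly as stored on disk.
raw = bytes.fromhex("c03b3998")

big = struct.unpack(">I", raw)[0]     # big endian: correct for the journal
little = struct.unpack("<I", raw)[0]  # little endian: wrong for the journal

print(hex(big))     # 0xc03b3998
print(hex(little))  # 0x98393bc0
```

A tool that reuses little endian parsing code on journal blocks will therefore fail to recognize the magic number at all.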

The journal superblock is summarized in Table 7.25.

Table 7.25. The journal superblock.

Offset  Type      Name                 Description
0x0     be32      h_magic              Jbd2 magic number, 0xC03B3998.
0x4     be32      h_blocktype          Should be 4, journal superblock v2.
0x8     be32      h_sequence           Transaction ID for this block.
0xC     be32      s_blocksize          Journal device block size.
0x10    be32      s_maxlen             Total number of blocks in this journal.
0x14    be32      s_first              First block of log information.
0x18    be32      s_sequence           First commit ID expected in log.
0x1C    be32      s_start              Block number of the start of log.
0x20    be32      s_errno              Error value, as set by jbd2_journal_abort().
0x24    be32      s_feature_compat     Compatible features. 0x1 = Journal maintains checksums.
0x28    be32      s_feature_incompat   Incompatible feature set.
0x2C    be32      s_feature_ro_compat  Read-only compatible feature set. There aren’t any of these currently.
0x30    u8        s_uuid[16]           128-bit UUID for journal. This is compared against the copy in the ext4 superblock at mount time.
0x40    be32      s_nr_users           Number of file systems sharing this journal.
0x44    be32      s_dynsuper           Location of dynamic superblock copy.
0x48    be32      s_max_transaction    Limit of journal blocks per transaction.
0x4C    be32      s_max_trans_data     Limit of data blocks per transaction.
0x50    u8        s_checksum_type      Checksum algorithm used for the journal. Probably 1=crc32 or 4=crc32c.
0x51    u8[0xAB]  (padding)            0xAB (171) bytes of padding.
0xFC    be32      s_checksum           Checksum of the entire superblock, with this field set to zero.
0x100   u8        s_users[16*48]       IDs of all file systems sharing the log.
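The layout in Table 7.25 maps directly onto a struct format string. The following is a minimal sketch of a parser, assuming the journal contents have already been extracted into a bytes object; the function name parse_journal_superblock and the constants SB_FORMAT and SB_FIELDS are hypothetical, and the format simply mirrors the offsets in the table (including the 171 padding bytes) rather than any particular kernel version.

```python
import struct

JBD2_MAGIC = 0xC03B3998

# ">" selects big endian with no alignment padding.
# 3 header words, 9 words, 16-byte UUID, 4 words, checksum type,
# 171 padding bytes (skipped), checksum, 768 bytes of user IDs.
SB_FORMAT = ">3I9I16s4IB171xI768s"
SB_FIELDS = (
    "h_magic", "h_blocktype", "h_sequence",
    "s_blocksize", "s_maxlen", "s_first", "s_sequence", "s_start",
    "s_errno", "s_feature_compat", "s_feature_incompat",
    "s_feature_ro_compat", "s_uuid",
    "s_nr_users", "s_dynsuper", "s_max_transaction", "s_max_trans_data",
    "s_checksum_type", "s_checksum", "s_users",
)


def parse_journal_superblock(buf):
    """Return the journal superblock fields from buf as a dictionary."""
    if len(buf) < struct.calcsize(SB_FORMAT):
        raise ValueError("buffer too short for a journal superblock")
    sb = dict(zip(SB_FIELDS, struct.unpack_from(SB_FORMAT, buf, 0)))
    if sb["h_magic"] != JBD2_MAGIC:
        raise ValueError("bad magic number: not a jbd2 superblock")
    return sb
```

Typical use would be to read the first 1024 bytes of the extracted journal and print fields such as s_blocksize, s_maxlen, and s_sequence. The parser does not verify s_checksum; that would require knowing which algorithm s_checksum_type selects.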

The general format for a transaction in the journal is a descriptor block, followed by one or more data or revocation blocks, and a commit block that completes the transaction. The descriptor block starts with a header (which has the same layout as the first twelve bytes of the journal superblock) and then has an array of journal block tags that describe the transaction. Data blocks are normally identical to the blocks to be written to disk. Revocation blocks contain a list of blocks that were journaled in the past but should no longer be replayed from the journal. The most common reason for a revocation is a metadata block being changed to a regular file data block. The commit block indicates the end of a journal transaction.
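Since every descriptor, revocation, and commit block begins with the same twelve-byte header, a journal can be walked block by block and each block classified from its header alone. The sketch below assumes the block type values 1 (descriptor), 2 (commit), 3 (superblock v1), and 5 (revocation) from the kernel's jbd2 header file; only the value 4 appears in the text above. The function name classify_journal_block is hypothetical, and a block without the magic number is assumed to be journaled data.

```python
import struct

JBD2_MAGIC = 0xC03B3998

# Values other than 4 are taken from the kernel jbd2 header,
# not from the table in this chapter.
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock v1",
    4: "superblock v2",
    5: "revocation",
}


def classify_journal_block(block):
    """Return (type name, transaction ID), or None for a data block."""
    if len(block) < 12:
        return None
    magic, blocktype, sequence = struct.unpack_from(">3I", block, 0)
    if magic != JBD2_MAGIC:
        return None  # no journal header: assumed to be journaled data
    return BLOCK_TYPES.get(blocktype, "unknown"), sequence
```

Grouping consecutive blocks by the returned transaction ID gives a rough picture of where each transaction starts (descriptor) and ends (commit), without decoding the block tags.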

I will not provide the internal structures for the journal blocks here for a couple of reasons. First, the journal block structures can differ significantly based on the version of journaling and the selected options. The journal is an internal structure that was never really meant to be read by humans. Microsoft, for example, has released nothing publicly about its NTFS journaling internals. The only reason we know about the Linux journaling internals is that the code is open source.

Second, there are filesystem utilities in Linux, such as fsck, that can properly read the journal and make any required changes. It is likely a better idea to use the built-in utilities than to try to fix a filesystem by hand. If you do want to delve into the journaling internals, there is no better source than the header and C files themselves. The wiki at kernel.org may also be helpful.
