JOURNALING

As previously mentioned, the journal increases the likelihood that the filesystem is in a consistent state. As with most things Linux, journaling behavior is highly configurable. By default, only metadata (not data blocks) is written through the journal. This is done for performance reasons. The default can be changed with the data mount option. For example, data=journal causes all data blocks to be written through the journal as well. There are other options; see the mount man page for details.
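To make the data option concrete, here is a minimal Python sketch that pulls the journaling mode out of a comma-separated mount option string such as the one found in /proc/mounts. The function name data_mode is hypothetical, and the fallback value "ordered" is an assumption about the usual ext4 default, not something stated above; check your own system before relying on it.

```python
def data_mode(mount_options: str, default: str = "ordered") -> str:
    """Return the value of the data= option in a comma-separated option string.

    `default` is an assumption (the usual ext4 default), used when no
    data= option is present in the string.
    """
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "data" and value:
            return value
    return default


# Example option strings (made up for illustration)
print(data_mode("rw,relatime,data=journal"))
print(data_mode("rw,relatime"))
```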

The journal causes data to be written twice. The first write goes to the journal and is done as quickly as possible. To accomplish this, the journal is stored in one block group and is often the only thing stored in that group, which minimizes disk seek times. Later, after the data has been committed to the journal, the operating system writes the data to its correct location on the disk and then erases the commit record. This not only improves data integrity but also improves performance by gathering many small writes before writing everything to disk.
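The write-twice idea can be illustrated with a toy in-memory model. This is only a sketch of the general technique, not how ext4 or jbd2 is implemented; the class ToyJournal and its methods are hypothetical names invented for this example.

```python
class ToyJournal:
    """Toy model: updates go to a log first, then to their final location."""

    def __init__(self):
        self.log = []    # stands in for the contiguous journal area
        self.disk = {}   # stands in for the final block locations

    def log_transaction(self, updates, commit=True):
        # First write: append the new block contents to the journal.
        # commit=False models a crash before the commit record was written.
        self.log.append({"updates": dict(updates), "committed": commit})

    def replay(self):
        # Second write: copy committed transactions to their final
        # locations. Transactions without a commit record are discarded.
        for txn in self.log:
            if txn["committed"]:
                self.disk.update(txn["updates"])
        # Once everything is in place the journal entries can be erased.
        self.log.clear()


j = ToyJournal()
j.log_transaction({10: b"inode table"})
j.log_transaction({11: b"half-written"}, commit=False)
j.replay()
print(j.disk)
```

Only the committed transaction reaches the final locations, which is why a filesystem replayed from its journal is more likely to be consistent.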

The journal is normally stored in inode 8, but it may optionally be stored on an external device. The latter does not seem to be very common. Regardless of where it is stored, the journal contains a special superblock that describes itself. When examining the journal directly, it is important to realize that the journal stores information in big-endian format, unlike the rest of the ext4 on-disk structures, which are little endian.
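The effect of byte order is easy to see with Python's struct module. The sketch below reads the same four bytes both ways, using the jbd2 magic number from the journal superblock as sample input. The helper names read_be32 and read_le32 are made up for this example.

```python
import struct


def read_be32(buf: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit big-endian integer, as the journal stores them."""
    return struct.unpack_from(">I", buf, offset)[0]


def read_le32(buf: bytes, offset: int = 0) -> int:
    """Read the same four bytes as little endian, for comparison."""
    return struct.unpack_from("<I", buf, offset)[0]


raw = bytes.fromhex("c03b3998")  # first four bytes of a journal superblock
print(hex(read_be32(raw)))  # 0xc03b3998
print(hex(read_le32(raw)))  # 0x98393bc0
```

Reading journal fields with the wrong byte order produces values that look plausible but are wrong, so it pays to check the magic number first.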

The journal superblock is summarized in Table 7.25.

Table 7.25. The journal superblock.

Offset	Type	Name	Description
0x0	be32	h_magic	Jbd2 magic number, 0xC03B3998
0x4	be32	h_blocktype	Should be 4, journal superblock v2
0x8	be32	h_sequence	Transaction ID for this block
0xC	be32	s_blocksize	Journal device block size.
0x10	be32	s_maxlen	Total number of blocks in this journal.
0x14	be32	s_first	First block of log information.
0x18	be32	s_sequence	First commit ID expected in log.
0x1C	be32	s_start	Block number of the start of log.
0x20	be32	s_errno	Error value, as set by jbd2_journal_abort().
0x24	be32	s_feature_compat	Compatible features. 0x1 = Journal maintains checksums
0x28	be32	s_feature_incompat	Incompatible feature set.
0x2C	be32	s_feature_ro_compat	Read-only compatible feature set. There aren’t any of these currently.
0x30	u8	s_uuid[16]	128-bit uuid for journal. This is compared against the copy in the ext4 super block at mount time.
0x40	be32	s_nr_users	Number of file systems sharing this journal.
0x44	be32	s_dynsuper	Location of dynamic super block copy.
0x48	be32	s_max_transaction	Limit of journal blocks per transaction.
0x4C	be32	s_max_trans_data	Limit of data blocks per transaction.
0x50	u8	s_checksum_type	Checksum algorithm used for the journal. Probably 1=crc32 or 4=crc32c.
0x51	u8[0xAB]	Padding	0xAB (171) bytes of padding
0xFC	be32	s_checksum	Checksum of the entire superblock, with this field set to zero.
0x100	u8	s_users[16*48]	IDs of all file systems sharing the log.

The general format for a transaction in the journal is a descriptor block, followed by one or more data or revocation blocks, followed by a commit block that completes the transaction. The descriptor block starts with a header (the same as the first twelve bytes of the journal superblock) and then has an array of journal block tags that describe the transaction. Data blocks are normally identical to the blocks to be written to disk. Revocation blocks contain a list of blocks that were journaled in the past but should no longer be journaled in the future. The most common reason for a revocation is that a metadata block has been changed to a regular file data block. The commit block indicates the end of a journal transaction.
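Because every descriptor, revocation, and commit block begins with the same twelve-byte header, a raw journal dump can be roughly mapped by checking each block for the magic number. The Python sketch below does this. The block type values other than 4 do not appear in the text above; they are taken from the kernel's jbd2 header as I understand it and should be verified there. The function name classify_blocks is hypothetical. Treat the result as a rough guide only: data blocks carry no header, and blocks left over from old transactions may still be present in the journal.

```python
import struct

JBD2_MAGIC = 0xC03B3998

# Assumed values from the kernel jbd2 header; only 4 is given in the text.
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock v1",
    4: "superblock v2",
    5: "revocation",
}


def classify_blocks(journal: bytes, blocksize: int):
    """Yield (block_number, kind, transaction_id) for each block in a dump."""
    for offset in range(0, len(journal) - 11, blocksize):
        # The common header: h_magic, h_blocktype, h_sequence (all be32).
        magic, blocktype, sequence = struct.unpack_from(">3I", journal, offset)
        if magic == JBD2_MAGIC:
            kind = BLOCK_TYPES.get(blocktype, f"unknown type {blocktype}")
            yield offset // blocksize, kind, sequence
        else:
            # No header found, so this is presumably a journaled data block.
            yield offset // blocksize, "data (no header)", None
```

Applied to a dump, a complete transaction would show up as a descriptor entry, one or more data or revocation entries, and a commit entry with the same transaction ID as the descriptor.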

I will not provide the internal structures for the journal blocks here for a couple of reasons. First, the journal block structures can differ significantly based on the version of journaling and the selected options. The journal is an internal structure that was never really meant to be read by humans. Microsoft has released nothing publicly about its NTFS journaling internals. The only reason we can know about the Linux journaling internals is that Linux is open source.

Second, there are filesystem utilities in Linux, such as fsck, that can properly read the journal and make any required changes. It is likely a better idea to use the built-in utilities than to try to fix a filesystem by hand. If you do want to delve into the journaling internals, there is no better source than the header and C files themselves. The wiki at kernel.org may also be helpful.
