Extents

The system for storing lists of data blocks described in the previous section is fine when most files are small. When file sizes are measured in megabytes, gigabytes, or terabytes, performance quickly becomes an issue, however. To be fair, the Linux extended filesystem has been around for decades and yet is perfectly acceptable as is for many users. The elegant solution to improve performance for large files is the use of extents.

Extents store lists of contiguous blocks in a tree structure. Those familiar with NTFS will know that it uses a list of data runs (contiguous clusters) in a similar manner. As with most things in Linux, storing block lists in a tree is much more efficient than the simple list used in Windows. Three types of structures are required to build an extent tree. Every extent tree node must have a header that indicates what is stored in the node. The top node of a tree is often referred to as the root node. The root node header is always stored at the start of the inode block array. All non-empty extent trees must have at least one leaf node which is called an extent (which is essentially equivalent to the NTFS data run). There may be middle nodes between the root node and leaf nodes which are called indexes. All three types of structures have a length of 12 bytes which allows the root node header plus four entries to be stored in the inode block array.

The extent header is summarized in Table 7.12. The header begins with a magic number which is a double-check that what follows is not a traditional direct data block entry. If the depth is zero, the entries that follow the header in this node are leaf nodes (extents). Otherwise the entries are for index (middle) nodes. If you work out the math, storing the largest file supported by ext4 should never require more than five levels in the tree.

Table 7.12. Extent header format.

Offset	Size	Name	Description
0x0	2	Magic	Magic number 0xF30A
0x2	2	Entries	Entries that follow the header
0x4	2	Max Entries	Maximum number of entries that might follow header
0x6	2	Depth	0=this node points to data blocks 1-5=this node points to other other extents
0x8	4	Generation	Generation of the tree

The extent index node entry is summarized in Table 7.13. The block in the first field is a logical block number (the file is contained in logical block zero through total blocks minus one). For those familiar with NTFS, this is similar to a Virtual Cluster Number (VCN). The only real data in an index node is a 48-bit block address for the node one level lower in the tree.

Table 7.13. Extent index format.

Offset	Size	Name	Description
0x0	4	Block	This node covers block x and following
0x4	4	Leaf lo	Lower 32 bits of block containing the node (leaf or another index) one level lower in tree
0x8	2	Leaf hi	Upper 16 bits of the block described above
0xA	2	Unused	Padding to 12 byte size

The extent (leaf) node entry is summarized in Table 7.14. As with the index node, the first field is a logical block number. This is followed by a length. This length is normally less than 32,768. However, if uninitialized blocks are supported on this filesystem (say with the lazy block group feature), a value greater than 32,768 indicates that the extent consists of (32,768 – length) uninitialized data blocks. This is not common. The extent ends with a 48-bit data block address for the first block in this extent.

Table 7.14. Extent node format.

Offset	Size	Name	Description
0x0	4	Block	First block covered by this extent
0x4	2	Len	If <= 32768 initialized blocks in extent If > 32768 extents is (len-32768) uninit blocks
0x6	2	Start hi	Upper 16 bits of the starting block
0x8	4	Start lo	Lower 32 bits of the starting block

Based on what we have learned about extents, we can update our extfs.py and istat.py files to more accurately report what is stored in the inode block array when extents are being used. I should point out that if there are multiple levels in the extent tree that the entire tree will not be printed out, only the four entries from the inode block array will be included. The reasons for this is that parsing a multi-level tree requires data blocks to be read. This is not as big of an issue as it might seem at first. Four leaf nodes storing 32,768 block extents permit files as large as 32,768 4 4096 or 536,870,912 bytes (512 MB) to be stored entirely within the inode block array. The following code, containing three small classes and a helper function, needs to be added to our extfs.py file.

“”” Class ExtentHeader. Parses the 12-byte extent header that is present in every extent node in a Linux ext4 filesystem inode with the extent flag set. Accepts a 12-byte packed string.

Usage: eh = ExtentHeader(data) eh.prettyPrint()

“”” class ExtentHeader():

def init(self, data): self.magic = getU16(data) self.entries = getU16(data, 0x2) self.max = getU16(data, 0x4) self.depth = getU16(data, 0x6) self.generation = getU32(data, 0x8) def prettyPrint(self):

print(“Extent depth: %s entries: %s max-entries: %s generation: %s” \

% (self.depth, self.entries, self.max, self.generation))

“”” Class ExtentIndex. Represents a middle or index node in an extent tree. Accepts a 12-byte packed string containing the index. Usage: ei = ExtentIndex(data) ei.prettyPrint()

“”” class ExtentIndex():

def init(self, data): self.block = getU32(data) self.leafLo = getU32(data, 0x4) self.leafHi = getU16(data, 0x8) def prettyPrint(self):

print(“Index block: %s leaf: %s” \

% (self.block, self.leafHi * pow(2, 32) + self.leafLo))

“”” Class Extent. Represents a leaf node or extent in an extent tree. Accepts a 12-byte packed string containing the extent. Usage: ext = Extent(data) ext.prettyPrint()

“”” class Extent():

def init(self, data): self.block = getU32(data) self.len = getU16(data, 0x4) self.startHi = getU16(data, 0x6) self.startLo = getU32(data, 0x8) def prettyPrint(self):

print(“Extent block: %s data blocks: %s - %s” \

% (self.block, self.startHi pow(2, 32) + self.startLo, \ self.len + self.startHi pow(2, 32) + self.startLo - 1))

“”” Function getExtentTree(data). This function will get an extent tree from the passed in packed string. Note 1: Only the data passed in to the function is parsed. If the nodes are index nodes the tree is not traversed. Note 2: In the most common case the data passed in will be the 60 bytes from the inode block array. This permits files up to 512 MB to be stored. “”” def getExtentTree(data):

first entry must be a header retVal = [] retVal.append(ExtentHeader(data)) if retVal[0].depth == 0:

leaf node for i in range(0, retVal[0].entries):

retVal.append(Extent(data[(i + 1) * 12 : ])) else:

index nodes for i in range(0, retVal[0].entries):

retVal.append(ExtentIndex(data[(i + 1) * 12 : ]))

return retVal

This code uses no techniques that have not been covered in this book thus far. We must also modify the Inode class in order to handle the extents properly. The updated Inode class is shown in the following code. New code is shown in bold italics.

“””

Class Inode. This class stores extended filesystem inode information in an orderly manner and provides a helper function for pretty printing. The constructor accepts a packed string containing the inode data and inode size which is defaulted to 128 bytes as used by ext2 and ext3. For inodes > 128 bytes the extra fields are decoded.

Usage inode = Inode(dataInPackedString, inodeSize)

inode.prettyPrint()

“”” class Inode():

def init(self, data, inodeSize=128):

store both the raw mode bitvector and interpretation self.mode = getU16(data) self.modeList = getInodeModes(self.mode) self.fileType = getInodeFileType(self.mode) self.ownerID = getU16(data, 0x2) self.fileSize = getU32(data, 0x4) # get times in seconds since epoch

note: these will rollover in 2038 without extra # bits stored in the extra inode fields below self.accessTime = time.gmtime(getU32(data, 0x8)) self.changeTime = time.gmtime(getU32(data, 0xC)) self.modifyTime = time.gmtime(getU32(data, 0x10)) self.deleteTime = time.gmtime(getU32(data, 0x14)) self.groupID = getU16(data, 0x18) self.links = getU16(data, 0x1a) self.blocks = getU32(data, 0x1c) # store both the raw flags bitvector and interpretation self.flags = getU32(data, 0x20) self.flagList = getInodeFlags(self.flags) # high 32-bits of generation for Linux self.osd1 = getU32(data, 0x24) # store the 15 values from the block array

note: these may not be actual blocks if # certain features like extents are enabled self.block = [] self.extents = [] if self.flags & 0x80000:

self.extents = getExtentTree(data[0x28 : ]) else: for i in range(0, 15):

self.block.append(getU32(data, 0x28 + i * 4)) self.generation = getU32(data, 0x64) # the most common extened attributes are ACLs # but other EAs are possible self.extendAttribs = getU32(data, 0x68) self.fileSize += pow(2, 32) getU32(data, 0x6c) # these are technically only correct for Linux ext4 filesystems # should probably verify that that is the case self.blocks += getU16(data, 0x74) pow(2, 32) self.extendAttribs += getU16(data, 0x76) pow(2, 32) self.ownerID += getU16(data, 0x78) pow(2, 32) self.groupID += getU16(data, 0x7a) * pow(2, 32) self.checksum = getU16(data, 0x7c) # ext4 uses 256 byte inodes on the disk

as of July 2015 the logical size is 156 bytes # the first word is the size of the extra info if inodeSize > 128:

self.inodeSize = 128 + getU16(data, 0x80)

this is the upper word of the checksum

if self.inodeSize > 0x82:

self.checksum += getU16(data, 0x82) * pow(2, 16)

these extra bits give nanosecond accuracy of timestamps # the lower 2 bits are used to extend the 32-bit seconds

since the epoch counter to 34 bits # if you are still using this script in 2038 you should # adjust this script accordingly :-) if self.inodeSize > 0x84: self.changeTimeNanosecs = getU32(data, 0x84) >> 2 if self.inodeSize > 0x88: self.modifyTimeNanosecs = getU32(data, 0x88) >> 2 if self.inodeSize > 0x8c: self.accessTimeNanosecs = getU32(data, 0x8c) >> 2 if self.inodeSize > 0x90:

self.createTime = time.gmtime(getU32(data, 0x90)) self.createTimeNanosecs = getU32(data, 0x94) >> 2 else:

self.createTime = time.gmtime(0) def prettyPrint(self):

for k, v in sorted(self.dict.iteritems()) :

if k == ‘extents’ and self.extents: v[0].prettyPrint() # print header for i in range(1, v[0].entries + 1):

v[i].prettyPrint() elif k == ‘changeTime’ or k == ‘modifyTime’ or \ k == ‘accessTime’ \ or k == ‘createTime’:

print k+”:”, time.asctime(v) elif k == ‘deleteTime’:

if calendar.timegm(v) == 0:

print ‘Deleted: no’ else: print k+”:”, time.asctime(v) else:

print k+”:”, v

The results of running istat.py with the updated extfs.py file are shown in Figure 7.22. The highlighted lines shown what has been added.

FIGURE 7.22

Results of running istat.py against an inode associated with a rootkit on the PFE subject system. The highlighted lines show information about the extent used to store this file.

Now that we can read the block information for both traditional blocks and extents, a script to retrieve a file from its inode is easily created. The new script will be named icat.py. The Sleuth Kit provides a similar utility named icat. We will begin by adding two new helper functions to extfs.py. The new code follows.

get a datablock from an image def getDataBlock(imageFilename, offset, blockNo, blockSize=4096): with open(str(imageFilename), ‘rb’) as f:

f.seek(blockSize blockNo + offset 512) data = str(f.read(blockSize)) return data

“”” function getBlockList This function will return a list of data blocks if extents are being used this should be simple assuming there is a single level to the tree. For extents with multiple levels and for indirect blocks additional “disk access” is required. Usage: bl = getBlockList(inode, imageFilename, offset, blockSize) where inode is the inode object, imageFilename is the name of a raw image file, offset is the offset in 512 byte sectors to the start of the filesystem, and blockSize (default 4096) is the size of a data block. “”” def getBlockList(inode, imageFilename, offset, blockSize=4096):

now get the data blocks and output them datablocks = [] if inode.extents:

great we are using extents

extent zero has the header # check for depth of zero which is most common if inode.extents[0].depth == 0:

for i in range(1, inode.extents[0].entries + 1): sb = inode.extents[i].startHi * pow(2, 32) + \ inode.extents[i].startLo eb = sb + inode.extents[i].len # really ends in this minus 1 for j in range(sb, eb):

datablocks.append(j) else: # load this level of the tree currentLevel = inode.extents leafNode = [] while currentLevel[0].depth != 0: # read the current level nextLevel = [] for i in range(1, currentLevel[0].entries + 1):

blockNo = currentLevel[i].leafLo + \

currentLevel[i].leafHi * pow(2, 32) currnode = getExtentTree(getDataBlock(imageFilename, \

offset, blockNo, blockSize)) nextLevel.append(currnode) if currnode[0].depth == 0:

if there are leaves add them to the end leafNode.append(currnode[1: ]) currentLevel = nextLevel

now sort the list by logical block number leafNode.sort(key=lambda x: x.block) for leaf in leafNode: sb = leaf.startHi * pow(2, 32) + leaf.startLo eb = sb + leaf.len for j in range(sb, eb):

datablocks.append(j) else:

we have the old school blocks blocks = inode.fileSize / blockSize # get the direct blocks for i in range(0, 12): datablocks.append(inode.block[i]) if i >= blocks:

break

now do indirect blocks if blocks > 12:

iddata = getDataBlock(imageFilename, offset, \ inode.block[12], blockSize) for i in range(0, blockSize / 4): idblock = getU32(iddata, i * 4) if idblock == 0:

break else:

datablocks.append(idblock) # now double indirect blocks if blocks > (12 + blockSize / 4):

diddata = getDataBlock(imageFilename, offset, \ inode.block[13], blockSize) for i in range(0, blockSize / 4):

didblock = getU32(diddata, i * 4) if didblock == 0:

break else:

iddata = getDataBlock(imageFilename, offset, \ didblock, blockSize)

for j in range(0, blockSize / 4): idblock = getU32(iddata, j * 4) if idblock == 0:

break

else:

datablocks.append(idblock)

now triple indirect blocks if blocks > (12 + blockSize / 4 + blockSize * blockSize / 16):

tiddata = getDataBlock(imageFilename, offset, \ inode.block[14], blockSize) for i in range(0, blockSize / 4): tidblock = getU32(tiddata, i * 4) if tidblock == 0:

break else:

diddata = getDataBlock(imageFilename, offset, \ tidblock, blockSize) for j in range(0, blockSize / 4): didblock = getU32(diddata, j * 4) if didblock == 0:

break else:

iddata = getDataBlock(imageFilename, offset, \ didblock, blockSize) for k in range(0, blockSize / 4): idblock = getU32(iddata, k * 4) if idblock == 0:

break

else:

datablocks.append(idblock)

return datablocks

The first helper function, getDataBlock, simply seeks to the correct place in the file image and then reads a block of data. The second function, getBlockList, is a bit more involved. It begins with a check to see if extents are in use. If extents are being used, most files have nothing but leaf nodes in the four entries from the inode block array. We do a quick check to see if the tree depth is zero and, if this is the case, simply read the entries from the inode block array. We do this not just to simplify the code for the most common case, but also because no further disk access is required to generate a block list.

If we have a multi-level tree, we save the current level of inodes and create an empty list of leaf nodes. We then begin a while loop on the line while currentLevel[0].depth != 0:. This loop will execute until the lowest level (leaf nodes) of the tree has been found. Any leaf nodes encountered while walking through the tree are appended to the leafNode list.

After exiting the while loop the leafNode list is sorted by logical block number on the line leafNode.sort(key=lambda x: x.block). Python has the ability to sort a list in place. In order to sort the list we require a sorting function that is passed into the sort method as the parameter named key. This is where the lambda comes in. In Python a lambda is an anonymous function. The construct key=lambda x: x.block is essentially the same as saying key=f(x) where f(x) is defined as follows:

def f(x):

return x.block

You can easily see why you wouldn’t want to define a named function like this every time you wanted to perform a sort or other operation requiring a single use function. The lambda keyword makes your Python code much more compact and easier to understand once you know how it works.

The code to handle old school blocks is straightforward, but somewhat cumbersome, thanks to the nested loops to handle the indirect blocks. First we calculate the number of blocks required. Then we read in the direct blocks. If the file has any singly indirect blocks we use getDataBlock to read the block and then iterate over the list of up to 1024 blocks. We keep going until we hit the end of the list or an address of zero (which is invalid). If the address is zero, the break command is executed. This command will exit the current loop. If the current loop is nested inside another loop, only the innermost loop is exited. The doubly and triply indirect block handling code is similar, but with extra levels of nested loops.

The icat.py script follows. It is similar to the istat.py file with the biggest difference being a call to getBlockList followed by a loop that prints (writes) every block to standard out.

!/usr/bin/python

icat.py

This is a simple Python script that will # print out file for in an inode from an ext2/3/4 filesystem inside # of an image file.

Developed for PentesterAcademy # by Dr. Phil Polstra (@ppolstra) import extfs import sys import os.path

import subprocess import struct import time from math import log def usage():

print(“usage “ + sys.argv[0] + \

“ <offset> \n”\

“Displays file for an inode from an image file”) exit(1) def main():

if len(sys.argv) < 3: usage()

read first sector if not os.path.isfile(sys.argv[1]):

print(“File “ + sys.argv[1] + “ cannot be openned for reading”) exit(1) emd = extfs.ExtMetadata(sys.argv[1], sys.argv[2]) # get inode location

inodeLoc = extfs.getInodeLoc(sys.argv[3], \ emd.superblock.inodesPerGroup) offset = emd.bgdList[inodeLoc[0]].inodeTable emd.superblock.blockSize + \ inodeLoc[1] emd.superblock.inodeSize with open(str(sys.argv[1]), ‘rb’) as f:

f.seek(offset + int(sys.argv[2]) * 512) data = str(f.read(emd.superblock.inodeSize)) inode = extfs.Inode(data, emd.superblock.inodeSize) datablock = extfs.getBlockList(inode, sys.argv[1], sys.argv[2], \ emd.superblock.blockSize) for db in datablock:

sys.stdout.write(extfs.getDataBlock(sys.argv[1], long(sys.argv[2]), \

db, emd.superblock.blockSize))

if name == “main”:

main()

Partial results from running icat.py against an inode associated with a rootkit are shown in Figure 7.23. The output from the script has been piped to xxd in order to properly display the hex values inside this program. The screenshot shows several embedded strings which contain error messages and even the reverse shell password of “sw0rdm4n”.

FIGURE 7.23

Partial results from icat.py against an inode associated with a rootkit on the PFE subject system. The output has been piped to xxd. Note that several error messages and the remote login password are visible in this screenshot.

Extents

Extents

first entry must be a header retVal = [] retVal.append(ExtentHeader(data)) if retVal[0].depth == 0:

leaf node for i in range(0, retVal[0].entries):

index nodes for i in range(0, retVal[0].entries):

store both the raw mode bitvector and interpretation self.mode = getU16(data) self.modeList = getInodeModes(self.mode) self.fileType = getInodeFileType(self.mode) self.ownerID = getU16(data, 0x2) self.fileSize = getU32(data, 0x4) # get times in seconds since epoch

note: these may not be actual blocks if # certain features like extents are enabled self.block = [] self.extents = [] if self.flags & 0x80000:

as of July 2015 the logical size is 156 bytes # the first word is the size of the extra info if inodeSize > 128:

this is the upper word of the checksum

these extra bits give nanosecond accuracy of timestamps # the lower 2 bits are used to extend the 32-bit seconds

FIGURE 7.22

get a datablock from an image def getDataBlock(imageFilename, offset, blockNo, blockSize=4096): with open(str(imageFilename), ‘rb’) as f:

now get the data blocks and output them datablocks = [] if inode.extents:

great we are using extents

extent zero has the header # check for depth of zero which is most common if inode.extents[0].depth == 0:

if there are leaves add them to the end leafNode.append(currnode[1: ]) currentLevel = nextLevel

now sort the list by logical block number leafNode.sort(key=lambda x: x.block) for leaf in leafNode: sb = leaf.startHi * pow(2, 32) + leaf.startLo eb = sb + leaf.len for j in range(sb, eb):

we have the old school blocks blocks = inode.fileSize / blockSize # get the direct blocks for i in range(0, 12): datablocks.append(inode.block[i]) if i >= blocks:

now do indirect blocks if blocks > 12:

now triple indirect blocks if blocks > (12 + blockSize / 4 + blockSize * blockSize / 16):

!/usr/bin/python

icat.py

This is a simple Python script that will # print out file for in an inode from an ext2/3/4 filesystem inside # of an image file.

Developed for PentesterAcademy # by Dr. Phil Polstra (@ppolstra) import extfs import sys import os.path

read first sector if not os.path.isfile(sys.argv[1]):

FIGURE 7.23

results matching ""

No results matching ""