The file command

Unlike Windows, which stupidly uses file extensions to determine file type, Linux is smart enough to determine what a file is by looking at its file signature, the identifying bytes stored in the file itself. The Linux file command displays the file type to the user. Its output goes way beyond what a file extension could tell you.

The results of running file on some of the files associated with the Xing Yi Quan rootkit from the PAS subject system are shown in Figure 10.1. Notice that file is able to distinguish between the install, README, and xingyi_addr.c files, which are an executable Perl script in ASCII text, a plain ASCII text file, and a C source file in ASCII text, respectively. Compare this to Windows, which cannot distinguish install from README, because there is no file extension (which normally indicates a directory for Windows).

FIGURE 10.1

Results of running the file command on rootkit files saved from the PAS subject system.

The information provided for binary files is much more specific. Here we see that the binaries are 64-bit Executable and Linkable Format (ELF) files for little-endian (LSB) systems based on the System V branch of UNIX (Linux is such a system). They are also dynamically linked, meaning they use shared libraries, as opposed to statically linking all the code they need in the executable (which would make it huge). The files are compatible with kernel versions 2.6.24 and higher. The files have not been stripped of the debugging symbol information. A Secure Hash Algorithm (SHA) hash that serves as the build identifier for each binary is also given.
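To make the source of these fields concrete, here is a minimal Python sketch that reads the first 18 bytes of a file and decodes the ELF identification bytes that file reports on. It is not how file itself is implemented (file consults a much larger signature database); the function names and the small lookup tables are illustrative assumptions, and only a handful of common values are listed.

```python
import struct

ELF_MAGIC = b"\x7fELF"                      # first four bytes of every ELF file
EI_CLASS = {1: "32-bit", 2: "64-bit"}       # byte 4 of the header
EI_DATA = {1: "LSB", 2: "MSB"}              # byte 5: little or big endian
EI_OSABI = {0: "SYSV", 3: "GNU/Linux"}      # byte 7: only two common values listed
E_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}


def describe_elf(header):
    """Return a dict of ELF identification fields, or None if not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data, ei_osabi = header[4], header[5], header[7]
    # e_type is a 16-bit field at offset 16, stored in the file's own byte order.
    fmt = ">H" if ei_data == 2 else "<H"
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return {
        "class": EI_CLASS.get(ei_class, "unknown"),
        "data": EI_DATA.get(ei_data, "unknown"),
        "osabi": EI_OSABI.get(ei_osabi, "other"),
        "type": E_TYPE.get(e_type, "other"),
    }


def describe_path(path):
    """Hypothetical helper: decode the header of a file on disk."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(18))
```

Calling describe_path on a script or text file returns None, because the file does not begin with the ELF signature. Note that position-independent executables carry the "shared object" type value in their header, so a result of "shared object" does not rule out a runnable program.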

As you can see, this one command provides quite a bit of information. If file doesn’t identify the suspicious file as an executable binary or a script, it is probably some sort of data file, or a component that was used to build some malware. For files that are not executable, there are ways of telling if executables use a file. These methods will be discussed later in this chapter (although there are also ways of obscuring this from would-be reverse engineers).

Is it a known-bad file?

A number of organizations maintain databases of known malware MD5 and SHA hashes. Naturally, most of these hashes pertain to Windows, the king of malware, but many databases list Linux malware as well. Some of these databases must be downloaded and others must be accessed online. One of the online databases that is accessible via multiple services is the Malware Hash Registry (MHR) maintained by Team Cymru (http://teamcymru.org/MHR.html).

One of the nice things about MHR is that it uses both MD5 and SHA hashes. The SHA hash is easily calculated using sha1sum <filename>. Perhaps the easiest way to use the MHR is via the whois command. The whois service is normally used to look up information on a web domain. The syntax for using this service to check a binary is whois -h hash.cymru.com <hash>. If the file is known, the UNIX timestamp for the last seen time, along with the anti-virus detection percentage, is returned. The results of running this command against one of the Xing Yi Quan files are shown in Figure 10.2.
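When many files need checking, the hash-then-query steps can be scripted. The following is a rough Python sketch, assuming a whois client is installed and the MHR whois service is reachable; the function names file_digest, mhr_command, and query_mhr are made up for illustration, and the reply is returned as raw text because its exact layout is not specified here.

```python
import hashlib
import subprocess


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks, like sha1sum or md5sum, and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mhr_command(hex_digest):
    """Build the whois command line described in the text."""
    return ["whois", "-h", "hash.cymru.com", hex_digest]


def query_mhr(path, algorithm="sha1"):
    """Run the lookup and return the raw reply (needs whois and network access)."""
    result = subprocess.run(
        mhr_command(file_digest(path, algorithm)),
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip()
```

Passing algorithm="md5" to file_digest gives the same value as md5sum, so either hash type accepted by MHR can be submitted.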

FIGURE 10.2

Checking a binary hash against the Malware Hash Registry.

Why didn’t this return a hit? Recall that hash functions are designed such that changing a single bit radically changes the hash value. Many pieces of Linux malware are built on the victim machine, which makes it easy to hard-code values such as passwords and addresses. Every such change also changes the hash values, so the resulting binaries will not match any sample already in a database.
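The effect of a one-bit change is easy to see for yourself. The short Python sketch below flips a single bit in a byte string and compares the SHA-1 digests before and after; the sample bytes are a made-up stand-in for a hard-coded password and address, not content from the rootkit.

```python
import hashlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(changed)


def differing_bits(hex_a, hex_b):
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical stand-in for values compiled into a binary.
original = b"password=secret; address=192.0.2.1\n"
modified = flip_bit(original, 0)

sha1_a = hashlib.sha1(original).hexdigest()
sha1_b = hashlib.sha1(modified).hexdigest()
print(sha1_a)
print(sha1_b)
print(differing_bits(sha1_a, sha1_b), "of 160 digest bits differ")
```

Although the inputs differ in only one bit, the two digests printed share no obvious relationship, which is why a rebuilt binary will not match a database entry for a different build.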

The National Institute of Standards and Technology (NIST) maintains a large database of hashes known as the National Software Reference Library (NSRL). At present there are over 40 million MD5 hashes in this database. Updates to NSRL are released four times a year. In order to avoid the need to download this massive 6 GB database and get more frequent updates, a query server has been set up.

Using the NSRL query server requires the installation of a program known as nsrllookup. This program can be obtained from GitHub (https://github.com/rjhansen/nsrllookup/archive/v1.2.3.tar.gz). This program is very easy to use. Simply pipe an MD5 hash to it like so: md5sum <filename> | nsrllookup. If the hash is unknown, it will be echoed back to you. If you prefer to see only known hashes, add the -k flag. Running nsrllookup against a set of files is shown in Figure 10.3.
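The same pipeline can be driven from a script when a whole directory of recovered files needs checking. This is only a sketch, assuming nsrllookup is installed, is on the PATH, and can reach its default query server; md5sum_line and nsrl_lookup are hypothetical helper names, and the only nsrllookup option used is the -k flag mentioned above.

```python
import hashlib
import subprocess


def md5sum_line(path):
    """Produce one line in md5sum's output format: digest, two spaces, file name."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "{}  {}".format(digest.hexdigest(), path)


def nsrl_lookup(paths, known_only=False):
    """Pipe md5sum-style lines to nsrllookup and return its output lines.

    By default the unknown hashes are echoed back; known_only adds -k.
    """
    command = ["nsrllookup"] + (["-k"] if known_only else [])
    stdin_text = "\n".join(md5sum_line(p) for p in paths) + "\n"
    result = subprocess.run(command, input=stdin_text,
                            capture_output=True, text=True)
    return result.stdout.splitlines()
```

An empty list from nsrl_lookup(paths, known_only=True) corresponds to the situation in Figure 10.3, where none of the files are in the NSRL.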

FIGURE 10.3

Running nsrllookup against a list of files. None of the files in this directory are in the NSRL, as can be seen by running nsrllookup with the -k switch.

The NSRL contains known files, both good and bad. If you do get a hit, you will need to get the details from the NSRL website to decide if a file is malicious or benign. These queries can be performed at http://www.hashsets.com/nsrl/search. An example known-good Linux binary is shown in Figure 10.4.

FIGURE 10.4

An entry in the NSRL Reference Data Set (RDS) for a known Linux binary.
