IMPORTING DATA INTO MySQL
Importing our data into a spreadsheet is an easy process, but performance suffers when the subject filesystem is large. This method has another limitation as well: it is not easy to build a true timeline in which modification, access, and creation times are all presented together. You could create a body file for use with Autopsy, but I have found that performance is still lacking, and this is not nearly as flexible as having everything in a proper database.
If you do not already have MySQL installed on your forensics workstation, it is quite simple to add. For Debian- and Ubuntu-based systems, sudo apt-get install mysql-server should be all you need. Once MySQL has been installed you will want to create a database. To keep things clean, I recommend creating a new database for each case. The command to create a new database is simply mysqladmin -u <user> -p create <database>.
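For example, the database used throughout this chapter could be created as follows (the root account is used here, as in the figures):

mysqladmin -u root -p create case-pfe1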
Once a database has been created it is time to add some tables. The easiest way to do this is to start the MySQL client using mysql -u <user> -p, e.g. mysql -u root -p. At this point you are not yet connected to your database. To remedy that situation, issue the command connect <database> in the MySQL client shell. For example, in my case I would type connect case-pfe1. Logging in to MySQL and connecting to a database is shown in Figure 6.5.
FIGURE 6.5
Logging in with the MySQL client and connecting to a database.
The following SQL code will create a database table that can be used to import the semicolon-separated values in the file generated by our shell script. This script may be saved to a file and executed in the MySQL client, but it is just as easy to cut and paste it into the client directly.
create table files (
AccessDate date not null,
AccessTime time not null,
ModifyDate date not null,
ModifyTime time not null,
CreateDate date not null,
CreateTime time not null,
Permissions smallint not null,
UserId smallint not null,
GroupId smallint not null,
FileSize bigint not null,
Filename varchar(2048) not null,
recno bigint not null auto_increment,
primary key(recno)
);
We can see that this is a fairly simple table. All of the columns are declared not null, meaning that they cannot be empty. For readers not familiar with MySQL, the last two lines might require some explanation. The first creates a column, recno, which is a long integer that automatically increments an internal counter every time a new row is inserted. On the next line recno is set as the primary key. The primary key is used for sorting and quickly retrieving information in the table.
Creating this table is shown in Figure 6.6. Notice that MySQL reports 0 rows affected which is correct. Adding a table does not create any rows (records) in it.
FIGURE 6.6
Creating a table to store file metadata in MySQL.
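If you would like to verify that the table was created as intended, the MySQL describe command will print the column definitions:

describe files;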
Now that there is a place for data to go, the information from our shell script can be imported. MySQL has a load data infile command for exactly this purpose. There is a small complication that must be worked out before running this command: the date strings in the file must be converted to proper MySQL date objects before insertion into the database. This is what is happening in the set clause of the following script. There is also a line that reads ignore 1 rows, which tells MySQL to skip the header row at the top of our file (it exists to make spreadsheet imports easier).
load data infile '/tmp/case-pfe1.csv' into table files
fields terminated by ';' enclosed by '"'
lines terminated by '\n'
ignore 1 rows
(@AccessDate, AccessTime, @ModifyDate, ModifyTime, @CreateDate,
CreateTime, Permissions, UserId, GroupId, FileSize, Filename)
set AccessDate=str_to_date(@AccessDate, '%m/%d/%Y'),
ModifyDate=str_to_date(@ModifyDate, '%m/%d/%Y'),
CreateDate=str_to_date(@CreateDate, '%m/%d/%Y');
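If the str_to_date function is new to you, it simply parses a string against a format specification and returns a proper date object. You can try it interactively before running the import; the date below is just an illustration:

select str_to_date('04/09/2015', '%m/%d/%Y');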
The file to be imported must be in an approved directory or MySQL will refuse to read it. This is a security measure: it prevents, among other things, an attacker who exploits a SQL injection vulnerability on a website from loading an arbitrary file, as long as MySQL is not configured to accept files from directories accessible by the webserver. You could change the set of allowed directories in the MySQL configuration files, but it is probably simpler to just copy your file to /tmp for the import, as I have done.
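Depending on your MySQL version and configuration, the permitted import location is governed by the secure_file_priv server variable. You can check what your server allows with:

show variables like 'secure_file_priv';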
Loading file metadata from the PFE subject system is shown in Figure 6.7. Notice that my laptop was able to insert 184,601 rows in only 5.29 seconds. The warnings concern the date imports. As we shall see, all of the dates were properly imported.
FIGURE 6.7
Loading file metadata into MySQL.
Once the data is imported, you are free to query your one-table database to your heart's content. For example, to get these 184,601 files sorted by access time in descending order (so the latest activity is at the top), simply run the query select * from files order by accessdate desc, accesstime desc;. The results of running this query are shown in Figure 6.8. Note that retrieving 184,601 sorted rows required a mere 0.71 seconds on my laptop. If you prefer an ascending sort, just omit desc in the SQL query above.
FIGURE 6.8
Using MySQL to sort file metadata by access time.
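When only the latest activity is of interest, adding a limit clause keeps the output manageable. For example, to see the twenty most recently accessed files:

select * from files order by accessdate desc, accesstime desc limit 20;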
Astute readers may have noticed that our script imported user and group IDs, not names. This was intentional. Why did we do it this way? When you list files on your Linux machine using ls, the ls program uses the password file to translate the user and group IDs stored in the inodes (more about these later in the book) into names. Because you have mounted a filesystem from another computer on your forensics workstation, ls will use your /etc/passwd and /etc/group files to translate IDs to names, not the correct files from the subject system. This issue is shown in Figure 6.9, where all of the john user's files report that they belong to the phil user, because the user ID for john on the subject system is the same as the user ID for phil on my forensics workstation. In cases where the user ID is not found on the forensics workstation, the raw user ID is displayed rather than an incorrect name.
FIGURE 6.9
User names incorrectly displayed for a mounted subject filesystem.
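Before any name translation, it can be useful to survey which raw IDs actually own files on the subject filesystem. A simple aggregate query over our files table does the job:

select userid, groupid, count(*) as cnt from files group by userid, groupid;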
We can easily display the correct user and group names in queries of our database if we create two new tables and import the /etc/passwd and /etc/group files from the subject system. This is straightforward because these files are already colon-delimited. Importing this information is as simple as copying the subject's passwd and group files to /tmp (or another directory MySQL has been configured to use for imports) and then running the following SQL script.
create table users (
username varchar(255) not null,
passwordHash varchar(255) not null,
uid int not null,
gid int not null,
userInfo varchar(255) not null,
homeDir varchar(255) not null,
shell varchar(2048) not null,
primary key (username)
);

load data infile '/tmp/passwd' into table users
fields terminated by ':' enclosed by '"'
lines terminated by '\n';

create table groups (
groupname varchar(255) not null,
passwordHash varchar(255) not null,
gid int not null,
userlist varchar(2048)
);

load data infile '/tmp/group' into table groups
fields terminated by ':' enclosed by '"'
lines terminated by '\n';
This code is a bit simpler than the import for our metadata file, primarily because there are no dates or other complex objects to convert. You will note that I have used the username, and not the user ID, as the primary key for the users table. If the user ID were the primary key and an attacker had added an account with a duplicate ID, the import would fail, as primary keys must be unique. It is not unusual for an attacker to create a new account that shares an ID with an existing one, especially ID 0 for the root user. Executing the script above to load these two tables is shown in Figure 6.10.
FIGURE 6.10
Importing user and group information into MySQL.
Now that the user information has been imported, I can perform some simple queries. It might be useful to see which shell each user has been assigned. An attacker might change the shell of a system account to allow logins; such accounts normally have a non-login shell of /usr/sbin/nologin or /bin/false. The results of executing the query select * from users order by uid; are shown in Figure 6.11. The results show that an attacker has created a bogus johnn account.
FIGURE 6.11
Selecting users from the database. Note the bogus johnn account that has been created by an attacker.
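To focus on accounts that are able to log in, you can filter out the usual non-login shells. A query along these lines works; adjust the shell paths to match your distribution:

select username, uid, shell from users
where shell not in ('/usr/sbin/nologin', '/bin/false')
order by uid;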
If an attacker has reused a user ID, that is easily detected. In addition to scanning the results of the previous query, which lists everything in the users table sorted by user ID, another query will instantly tell you whether duplicate IDs exist. The query select distinct uid from users; should return the same number of rows as the previous query (38 in the case of the subject system). If it returns fewer rows, then duplicates exist.
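A more direct check reports any duplicated IDs outright, with no row counting required:

select uid, count(*) as accounts from users group by uid having count(*) > 1;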
Is there any value in viewing the group file information? Yes. The group file contains a list of users who belong to each group. If an attacker has been added to a group, it will show up here. New users in the sudo, adm, or wheel groups, which are used to determine who gets root privileges (the exact group and mechanism vary from one Linux distribution to the next), are particularly interesting. Even knowing which legitimate users are in these groups can be helpful if you think an attacker has gained root access. The results of running the query select * from groups order by groupname; are shown in Figure 6.12. It would appear from this information that the john account has administrative privileges. The query select distinct gid from groups; should return the same number of rows as the full listing if there are no duplicate group IDs.
FIGURE 6.12
Examining group file information.
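To zero in on the privileged groups mentioned above, something like the following lists their members; adjust the group names for your distribution:

select groupname, userlist from groups
where groupname in ('sudo', 'adm', 'wheel');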
Let’s return to the files table. After all, we said our motivation for importing users and groups was to display correct information. In order to display usernames in our queries we must perform a database join. If you are not a SQL expert, fear not: the kind of join needed here is simple. We need only select from more than one table and give the condition that joins (associates) rows in the various tables.
If we wish to add usernames to the previous query of the files table, something like the following will work: select accessdate, accesstime, filename, permissions, username from files, users where files.userid=users.uid order by accessdate desc, accesstime desc;. What have we changed? We have gone from the catchall select * to an explicit list of columns. A second table has been added to the from clause. Finally, we have a join condition, where files.userid=users.uid, that determines which row in the users table is used to retrieve the username. If a column name in the list exists in both tables, you must prefix it with <table>. to tell MySQL which table to use.
The results of running this query are shown in Figure 6.13.
FIGURE 6.13
Results of running a query on the files table with usernames from the users table.
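The join extends naturally to group names as well. Adding the groups table to the from clause and a second join condition yields both names in a single query; a sketch:

select accessdate, accesstime, filename, permissions, username, groupname
from files, users, groups
where files.userid=users.uid and files.groupid=groups.gid
order by accessdate desc, accesstime desc;

Bear in mind that this is an inner join: any file owned by an ID that appears in neither the users nor the groups table will be silently omitted from the results.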
Notice that Figure 6.13 shows a text file in gedit. Here we have used a handy feature of the MySQL client, the tee <logfile> command. This command is similar to the shell command of the same name in that it sends output both to the screen and to a specified file, allowing all query output to be captured. This can be a useful thing to store in your case directory. When you no longer want to capture output, the notee command will close the file and stop sending information. You might wish to tee everything to one big log file for all your queries or store each query in its own file; the choice is yours. MySQL has shortcuts for many commands, including \T and \t for tee and notee, respectively.
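For example, to capture a session to a log file in your case directory (the path below is just an illustration):

tee /cases/pfe1/mysql-session.log
select * from users order by uid;
notee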
You may have noticed that I primarily like to use command line tools. I realize that not everyone shares my passion for the command line. There is absolutely nothing stopping you from using the powerful MySQL techniques described in this book within phpMyAdmin, MySQL Workbench, or any other Graphical User Interface (GUI) tool.
Could you still do lots of forensics without using a database? Yes, you certainly could and people do. However, if you look at available tools such as Autopsy you will notice that they are relatively slow when you put them up against querying a proper database. There is another reason I prefer to import data into a database. Doing so is infinitely more flexible. See the sidebar for a perfect example.