Hello
world, it’s been a while since my last series of blog posts! But now I am ready
to share with you the results of my recent research.
I face many
different challenges in my daily work as a digital forensics analyst, who deals
mainly with mobile devices. All modern smartphones are encrypted (usually with
file-based encryption (FBE)), so obtaining or cracking the passcode is required to gain
access to all the data stored on the device. And even if we know the passcode
(or the user has not set the passcode, which is increasingly rare these days),
we still need an exploit to gain “root” access to the device to read and copy
all the data and get our “best acquisition”, usually a full file system (FFS).
And then
what?
Then you have an enormous number of bytes stored in hundreds of thousands
of files in which to search for relevant information for the case.
In simple
terms, you have a box and you need to find a small piece of information in that
box. In the box there is some information that is completely useless for your
investigation: for example, think about the operating system data or the known
files installed by (native or third-party) applications. The other big
challenge is that every single application, as well as every single setting,
log, or database at the operating system level, can store its data in different
formats. Sometimes it is an XML-style plist file or an SQLite database with
only one table, but in other cases, it may be an SEGB file embedded in a binary
plist file and stored as BLOB in an SQLite database with more than 200 tables.
There is
only one way to do this on a large scale: using tools to process the
extraction, find relevant files, process them with an appropriate parser, and
then analyze the results. Once the data has been processed by the tools, we can
begin the investigation. We usually approach the analysis by selecting one or
more pivot points, such as a timestamp, a location, a phone number, an email
address, or a specific action performed by the user.
Then it is
important to have a whole arsenal of different tools and, in most cases, to
perform data processing with all of them. It’s hard for tool developers and
analysts to constantly keep up with the latest and greatest features introduced
by a new device, a new version of OS, or the latest update to an app. A simple
change in a database schema can bring a tool to a complete halt and cause our
investigation to be incomplete and perhaps even lead to incorrect conclusions.
Sometimes
one tool will analyze a particular app or log file while another tool does not
and the opposite is true. I have had cases where a new artifact was discovered
months later, and I started thinking: Ouch, that might have been critical in
this case!
We walk on
thin ice every day…
Over the
past few years, I have been involved in several projects and research efforts
to standardize digital evidence to facilitate validation and comparison of
parsing capabilities. I have always looked at the technical validation reportsfrom NIST, and they remain one of the best methods we can use to validate
tools. But in real cases, the differences are in the nuances, and we need to
know exactly what artifact might be helpful to us and what tool can easily
analyze that specific artifact for you.
To improve
our internal investigation and analysis capabilities, about six months ago I started
creating a huge and complex Excel file to track and compare the features and
parsing capabilities of the different tools we use on a day-to-day basis. Based
on this, I decided to take it a step further and create a comparison matrix for
tools based on a known image. By using a known image, it is easier to compare
and validate the results generated by different tools with the known
information. A simple example: If I know that a particular SMS message was sent
from a device to a recipient and with text at a particular time, I can verify
that a tool can find and process the information correctly.
As a known
image, I chose Josh Hickman’s public image of iOS 15 available on DigitalCorpora. Thank you, Josh, for your hard work: the amount of information you
were able to generate on this iPhone is incredible!
I created a
list of 5 commercial tools (in alphabetical order: Belkasoft X, Cellebrite PA,
Magnet AXIOM, MSAB XAMN, and Oxygen Forensic Detective), 1 freeware tool
(ArtEx), and 2 open source tools (Apollo and iLEAPP).
Based on
Josh’s documentation, I created a huge Excel file where I entered line by line
every action for every application mentioned in the document (application
installation, application usage, messages sent and received, websites visited,
etc.).
Then I
looked at the results from all the tools and reviewed the following:
- What information
was analyzed by each tool? - Whether the
information was analyzed as expected or whether a tool made an
incorrect interpretation - What
specific details are available for each app and/or artifact? (e.g., for a
particular messaging app, which tool can analyze app contacts or app calls)
The
comparison really goes in-depth, not only on data generated by the user (like
messages) but also on OS logs and databases that are useful to track the user’s “pattern of life”.
Over the
next few weeks, I will publish a series of articles on this comparison.
The
work will be organized in this way:
- Experiment setup: acquisition description and processing procedure
- General
device information: hardware and software information, accounts, and installed
applications - Stock
applications: Address book, call log, SMS/iMessage, calendar, notes and so on - Third-party
apps: these are organized into categories and various sections based on
application types such as messaging, social networking, browser, cloud storage,
etc - Operating
system logs and databases: these are also organized into categories and
different posts based on the specific information source (e.g. KnowledgeC,
InteractionC. PowerLog, Biome, etc.) or the type of information (i.e.,
location, app usage, device interaction, etc..)
Before I
begin, I want to clarify the goals: to support the community and help tool
developers improve their parsing capabilities. This research is in no way
intended to be a competition for the “best tool”; rather, I want to
provide a clear, verifiable, and objective view of the pros and cons of each
tool.
I can
already anticipate one of my conclusions. You will see that each tool has a
specific parser that can be helpful for a particular case. So we need the
support of tool developers to make this comparison easier, faster, and more
accurate than manual validation.
See you
soon with some real data!