Revision as of 12:32, 20 November 2017

This page outlines version 7.2 of the TSK database schema. It is used in Autopsy 4.6.0 and beyond. The database is made by using the tsk_loaddb command line tool or the equivalent library-level methods.

The major addition in this release was the addition of communications and account related tables.

Some general notes on this schema:

Every type of data is assigned a unique ID, called the Object ID
Data sources are grouped by devices (to allow a computer or phone to have multiple drives in it)
Data in a disk image has a hierarchy. Images are the root, with volume or file systems below it, followed by volumes and files.
The tsk_objects table is used to keep track of what object IDs have been used and to map the parent and child relationship.
This schema has been designed to store more than what TSK initially imports. It has been designed to support carved files and a folder full of local files, etc.
This schema supports the blackboard so that modules in Autopsy can communicate and save results.
Virtual files are made of unallocated space with the naming format of:
- Unalloc_[PARENT-OBJECT-ID]_[BYTE-START]_[BYTE-END]

NOTE: This maybe a bit out of date. The code is the best reference. See the initialize() method in db_sqlite.cpp.

General Information Tables

tsk_db_info

Metadata about the database.

schema_ver - Version of the database schema used to create database
tsk_ver - Version of TSK used to create database
"schema_minor_Ver" - Minor release version of the schema to allow for backward compatible changes.

Object Tables

tsk_objects

Every object (image, volume system, file, etc.) has an entry in this table. This table allows you to find the parent of a given object.

obj_id - Unique id
par_obj_id - The object id of the parent object (null for root objects). The parent of a volume system is an image, the parent of a directory is a directory or filesystem, the parent of a filesystem is a volume or an image, etc.
type - Object type (as TSK_DB_OBJECT_TYPE_ENUM enum).

Data Source/Device Tables

data_source_info

Contains information about a data source, which could be an image. This is where we group data sources into devices (based on device ID)

obj_id - Id of image/data source in tsk_objects
device_id - Unique ID (GUID) for the device that contains the data source.
time_zone - Timezone that the data source was originally located in.

tsk_image_info

Contains information about each set of images that is stored in the database.

obj_id - Id of image in tsk_objects
type - Type of disk image format (as TSK_IMG_TYPE_ENUM)
ssize - Sector size of device in bytes
tzone - Timezone where image is from (the same format that TSK tools want as input)
size - Size of the original image (in bytes)
md5 - Hash of the image. Currently, this is populated only if the input image is E01.
display_name - display name of the image.

tsk_image_names

Stores path(s) to file(s) on disk that make up an image set.

obj_id - Id of image in tsk_objects
name - Path to location of image file on disk
sequence - Position in sequence of image parts

Volume System Tables

tsk_vs_info

Contains one row for every volume system found in the images.

obj_id - Id of volume system in tsk_objects
vs_type - Type of volume system / media management (as TSK_VS_TYPE_ENUM)
img_offset - Byte offset where VS starts in disk image
block_size - Size of blocks in bytes

tsk_vs_parts

Contains one row for every volume / partition in the images.

obj_id - Id of volume in tsk_objects
addr - Address of this partition
start - Sector offset of start of partition
length - Number of sectors in partition
desc - Description of partition (volume system type-specific)
flags - Flags for partition (as TSK_VS_PART_FLAG_ENUM)

File System Tables

tsk_fs_info

Contains one for for every file system in the images.

obj_id - Id of filesystem in tsk_objects
img_offset - Byte offset that filesystem starts at
fs_type - Type of file system (as TSK_FS_TYPE_ENUM)
block_size - Size of each block (in bytes)
block_count - Number of blocks in filesystem
root_inum - Metadata address of root directory
first_inum - First valid metadata address
last_inum - Last valid metadata address
display_name - Display name of file system (could be volume label) (New in V3)

tsk_files

Contains one for for every file found in the images. Has the basic metadata for the file.

obj_id - Id of file in tsk_objects
fs_obj_id - Id of filesystem in tsk_objects (NULL if file is not located in a file system -- carved in unpartitioned space, etc.)
type - Type of file: filesystem, carved, etc. (as TSK_DB_FILES_TYPE_ENUM enum)
attr_type - Type of attribute (as TSK_FS_ATTR_TYPE_ENUM)
attr_id - Id of attribute
name - Name of attribute. Will be NULL if attribute doesn't have a name. Must not have any slashes in it.
meta_addr - Address of the metadata structure that the name points to.
meta_seq - Sequence of the metadata address - New in V3
has_layout - True if file has an entry in tsk_file_layout
has_path - True if file has an entry in tsk_files_path
dir_type - File type information: directory, file, etc. (as TSK_FS_NAME_TYPE_ENUM)
meta_type - File type (as TSK_FS_META_TYPE_ENUM)
dir_flags - Flags that describe allocation status etc. (as TSK_FS_NAME_FLAG_ENUM)
meta_flags - Flags for this file for its allocation status etc. (as TSK_FS_META_FLAG_ENUM)
size - File size in bytes
ctime - Last file / metadata status change time (stored in number of seconds since Jan 1, 1970 UTC)
crtime - Created time
atime - Last file content accessed time
mtime - Last file content modification time
mode - Unix-style permissions (as TSK_FS_META_MODE_ENUM)
uid - Owner id
gid - Group id
md5 - MD5 hash of file contents
known - Known status of file (as TSK_DB_FILES_KNOWN_ENUM)
parent_path - full path of parent folder. Must begin and end with a '/' (Note that a single '/' is valid).
mime_type - MIME type of the file content, if it has been detected.

tsk_file_layout

Stores the layout of a file within the image. A file will have one or more rows in this table depending on how fragmented it was. All file types use this table (file system, carved, unallocated blocks, etc.).

obj_id - Id of file in tsk_objects
sequence - Position of this run in the file (0-based and the obj_id and sequence pair will be unique in the table)
byte_start - Byte offset of fragment relative to the start of the image file
byte_len - Length of fragment in bytes

tsk_files_path

If a "locally-stored" file has been imported into the database for analysis, then this table stores its path. Used for derived files and other files that are not directly in the image file.

obj_id - Id of file in tsk_objects
path - Path to where the file is locally stored in a file system.
encoding_type - Method used to store the file on the disk.

file_encoding_types

Methods that can be used to store files on local disks to prevent them from being quarantined by antivirus

encoding_type - ID of method used to store data. See EncodingType enum.
name - Display name of technique.

tsk_files_derived_method

Derived files are those that result from analyzing another file. For example, files that are extracted from a ZIP file will be considered derived. This table keeps track of the derivation techniques that were used to make the derived files.

derived_id - Unique id for this derivation method.
tool_name - Name of derivation method/tool
tool_version - Version of tool used in derivation method
other - Other details

tsk_files_derived

Each derived file has a row that captures the information needed to re-derive it

obj_id - Id of file in tsk_objects
derived_id - Id of derivation method in tsk_files_derived_method
rederive - Details needed to re-derive file (will be specific to the derivation method)

Blackboard Tables

The blackboard is used to store results from analysis modules.

blackboard_artifacts

Stores artifacts associated with objects.

artifact_id - Id of the artifact (assigned by the database)
obj_id - Id of the associated object
artifact_type_id - Id for the type of artifact (can be looked up in the blackboard_artifact_types table)

blackboard_attributes

Stores name value pairs associated with an artifact. Only one of the value columns should be populated

artifact_id - Id of the associated artifact.
source - Source string, should be module name that created the entry.
context - Additional context string
attribute_type_id - Id for the type of attribute (can be looked up in the blackboard_attribute_types)
value_type - The type of value (0 for string, 1 for int, 2 for long, 3 for double, 4 for byte array)
value_byte - A blob of binary data (should be empty unless the value type is byte)
value_text - A string of text (should be empty unless the value type is string)
value_int32 - An integer (should be 0 unless the value type is int)
value_int64 - A long integer (should be 0 unless the value type is long)
value_double - A double (should be 0.0 unless the value type is double)

blackboard_artifact_types

Types of artifacts

artifact_type_id - Id for the type (this is used by the blackboard_artifacts table)
type_name - A string identifier for the type (unique)
display_name - A display name for the type (not unique, should be human readable)

Types of attribute

blackboard_attribute_types

attribute_type_id - Id for the type (this is used by the blackboard_attributes table)
type_name - A string identifier for the type (unique)
display_name - A display name for the type (not unique, should be human readable)

Communications / Accounts

These tables keep track of which accounts were found and who communicated with who.

Currently in DRAFT form (Nov 20, 2017)

accounts

A row is created in this table for each unique account in the _case_. Each account has a type and identifier (as assigned by the account provider). For example, EMAIL is an account type and the unique identifier is jdoe@gmail.com.

account_id - Assigned by the database
account_type_id - Type of account
account_unique_identifier - Unique identifier assigned by the provider.
UNIQUE: account_typeId, account_unique_identifier

account_types

A row is created in this table for each type of Account. Module writers in the future will be able to make their own types (no create API exists as of Nov 2017). Some types are predefined: http://sleuthkit.org/sleuthkit/docs/jni-docs/4.4.1/enumorg_1_1sleuthkit_1_1datamodel_1_1_account_1_1_type.html. (Need to update link to future version)

account_type_id - Assigned by the database
type_name - Short, enum name for the type
display_name - Display name (spaces, etc.)
UNIQUE: type_name

account_to_instances_map

A row is created each time an account is referenced in an artifact (or a file if no artifact was created).

account_id - Foreign key reference to the accounts table.
account_instance_id- Artifact ID associated with the instance.
UNIQUE: account_id, account_instance_id

relationships

A row is created each time there is a relationship between two accounts. For emails with several recipients, a relationship is added between all recipients and sender. For example, a message from A to B and C would have A<->B, A<->B, B<->C. This is a non-directional relationship. NOTE: Even though this relationship is between Accounts, it is assumed that an "Account Instance" (i.e. a blackboard artifact) is created for each account in the relationship. The relationship source file and the account source file should be the same. The TSK API enforces this. for the instance that was mentioned in this relationship and therefore there should be an entry for (account1_id,relationship_source_obj_id) and (account2_id,relationship_source_obj_id).

relationship_id - Assigned by database
account1_id - Foreign key to accounts table of one account in relationship
account2_id -Foreign key to accounts table of other account in relationship
relationship_source_obj_id - Object ID of the artifact (or file if no artifact was created) where relationship was found.
date_time - Date associated with relationship (Unix epoch).
relationship_type - Currently an artifact type.
data_source_obj_id - Data source that relationship was found in. This is denormalized storage to make filtering queries faster and simpler.
UNIQUE: account1_id, account2_id, relationship_source_obj_id

Ingest Module Status

These tables keep track in Autopsy which modules were run on the data sources.

ingest_module_types table

Defines the types of ingest modules supported.

type_id INTEGER PRIMARY KEY
type_name TEXT NOT NULL)",

ingest_modules

Defines which modules were installed. One row for each module.

ingest_module_id INTEGER PRIMARY KEY
display_name TEXT NOT NULL
unique_name TEXT UNIQUE NOT NULL
type_id INTEGER NOT NULL
version TEXT NOT NULL,

ingest_job_status_types table

type_id INTEGER PRIMARY KEY
type_name TEXT NOT NULL

ingest_jobs

One row is created each time ingest is started, which is a set of modules in a pipeline.

ingest_job_id INTEGER PRIMARY KEY
obj_id INTEGER NOT NULL
host_name TEXT NOT NULL
start_date_time INTEGER NOT NULL
end_date_time INTEGER NOT NULL
status_id INTEGER NOT NULL
settings_dir TEXT

ingest_job_modules

Defines the order of the modules in a given pipeline (i.e. ingest_job)

ingest_job_id INTEGER
ingest_module_id INTEGER
pipeline_position INTEGER,

Indexes

parObjId

Index to speed up the process of finding parent objects.

artifactID ON blackboard_artifacts(artifact_id)
artifact_objID ON blackboard_artifacts(obj_id)
attrsArtifactID ON blackboard_attributes(artifact_id)
layout_objID ON tsk_file_layout(obj_id)

@@ Line 223: / Line 223: @@
 * type_name - Short, enum name for the type
 * display_name - Display name (spaces, etc.)
+* UNIQUE: type_name
 == account_to_instances_map ==

Difference between revisions of "Database v7.2 Schema"