YAFFS2 Implementation Notes

From SleuthKitWiki

Revision as of 05:32, 5 February 2013

See YAFFS2 for general information about the YAFFS2 file system.

General File System Reconstruction

Determining the format of the spare area

The YAFFS2 specification does not specify exactly where the data in the header should be stored or exactly how the extra fields should fit in. In practice we've seen this format:

  • 4 byte sequence number
  • 4 byte object ID (with the object type in the high four bits if the header flag is set in the next entry)
  • 4 byte chunk ID OR high bit set (header flag) and the parent ID in the rest
  • 4 byte number of bytes
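
A minimal sketch of decoding these 16 bytes, assuming little-endian fields (the byte order is not stated in these notes). The names `SpareFields` and `parse_spare` are hypothetical and are not TSK functions.

```python
import struct
from typing import NamedTuple, Optional

HEADER_FLAG = 0x80000000  # high bit of the third field marks a header chunk


class SpareFields(NamedTuple):
    seq_num: int
    obj_id: int
    obj_type: Optional[int]   # only set for header chunks
    chunk_id: Optional[int]   # only set for data chunks
    parent_id: Optional[int]  # only set for header chunks
    is_header: bool
    n_bytes: int


def parse_spare(spare: bytes, offset: int = 0) -> SpareFields:
    """Decode the four 4-byte fields at `offset` (little-endian is an assumption)."""
    seq_num, raw_obj, raw_chunk, n_bytes = struct.unpack_from("<4I", spare, offset)
    if raw_chunk & HEADER_FLAG:
        # Header chunk: object type sits in the high four bits of the object ID,
        # and the parent ID fills the rest of the third field.
        return SpareFields(seq_num, raw_obj & 0x0FFFFFFF, raw_obj >> 28,
                           None, raw_chunk & 0x7FFFFFFF, True, n_bytes)
    # Data chunk: the third field is simply the chunk ID.
    return SpareFields(seq_num, raw_obj, None, raw_chunk, None, False, n_bytes)
```

The `offset` argument is where the 16 bytes begin inside the spare area, which varies between images.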

So far we've seen these 16 bytes start at offset 0, offset 2, or offset 30 of the spare area. TSK attempts to determine where the fields are by reading in a reasonable number of initialized spare areas and then running some tests on each possible offset:

  • Sequence numbers should be the same for all chunks in a block
  • Sequence number shouldn't be zero or 0xffffffff
  • Object ID can't be zero
  • If we have other options, we'd prefer the sequence number not to start with 0xff

These tests could certainly be improved upon.
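
The checks above could be sketched roughly as follows. This is not TSK's code: the function names are hypothetical, little-endian fields are assumed, and "start with 0xff" is read here as the most significant byte of the sequence number being 0xff, which is only one possible interpretation.

```python
import struct
from typing import Dict, List, Optional, Sequence

CANDIDATE_OFFSETS = (0, 2, 30)  # the offsets reported in the text above


def score_offset(blocks: Dict[int, List[bytes]], offset: int) -> Optional[int]:
    """Return None if `offset` fails a hard test, else a penalty (lower is better)."""
    penalty = 0
    for spares in blocks.values():
        seq_nums = set()
        for spare in spares:
            if len(spare) < offset + 16:
                return None  # the 16 bytes would not fit at this offset
            seq_num, obj_id = struct.unpack_from("<2I", spare, offset)
            if seq_num in (0, 0xFFFFFFFF) or obj_id == 0:
                return None
            if seq_num >> 24 == 0xFF:
                penalty += 1  # soft preference only, not a rejection
            seq_nums.add(seq_num)
        if len(seq_nums) > 1:
            return None  # sequence numbers must match within a block
    return penalty


def choose_offset(blocks: Dict[int, List[bytes]],
                  candidates: Sequence[int] = CANDIDATE_OFFSETS) -> Optional[int]:
    """Pick the passing offset with the lowest penalty; ties go to the lowest offset."""
    scored = [(score_offset(blocks, off), off) for off in candidates]
    valid = [(p, off) for p, off in scored if p is not None]
    return min(valid)[1] if valid else None
```

Here `blocks` maps a block number to the initialized spare areas read from that block. More than one offset can pass on a small sample, so the result is only as reliable as the sample is varied.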

Constructing the current version of each object

  1. Read in the sequence number, object ID, and chunk ID from the spare area of each chunk and record the offset of the chunk
  2. Make a list of chunks for each object ID and sort it by sequence number and then offset, resulting in a chronological list of chunks for each object
  3. The current header and data chunks can then be found by reading backwards through the list
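
The three steps above might look like this in outline. The `Chunk` record and function names are hypothetical, and treating the first copy of each chunk ID met while reading backwards as the current one is an assumption about how "reading backwards" resolves rewritten data chunks.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    obj_id: int
    seq_num: int
    offset: int
    is_header: bool
    chunk_id: int  # 0 for header chunks in this sketch


def build_chunk_lists(chunks: Iterable[Chunk]) -> Dict[int, List[Chunk]]:
    """Group chunks by object ID, then sort each group chronologically."""
    by_obj = defaultdict(list)
    for c in chunks:
        by_obj[c.obj_id].append(c)
    for chunk_list in by_obj.values():
        chunk_list.sort(key=lambda c: (c.seq_num, c.offset))
    return dict(by_obj)


def current_version(chunk_list: List[Chunk]) -> Tuple[Optional[Chunk], Dict[int, Chunk]]:
    """Read backwards: keep the newest header and the newest copy of each data chunk."""
    header = None
    data: Dict[int, Chunk] = {}
    for c in reversed(chunk_list):
        if c.is_header:
            if header is None:
                header = c
        elif c.chunk_id not in data:
            data[c.chunk_id] = c
    return header, data
```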

The inode for the current version of an object is its object ID.

Constructing the file hierarchy

  1. The root directory always has object ID 1
  2. To find the children of a directory, search over all objects to find those with the appropriate parent ID
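
As an illustration, the sketch below builds a parent-to-children index in one pass instead of searching all objects for every directory; that is a substitution made for brevity, not the method described above. The inputs `parent_of` and `names` are hypothetical maps from object ID to parent ID and to file name.

```python
from collections import defaultdict
from typing import Dict, Iterator, List

ROOT_OBJ_ID = 1  # the root directory always has object ID 1


def index_children(parent_of: Dict[int, int]) -> Dict[int, List[int]]:
    """Map each parent ID to the sorted object IDs that name it as parent."""
    children = defaultdict(list)
    for obj_id, parent_id in parent_of.items():
        if obj_id != ROOT_OBJ_ID:  # guard so the root is never its own child
            children[parent_id].append(obj_id)
    return {pid: sorted(kids) for pid, kids in children.items()}


def walk(children: Dict[int, List[int]], names: Dict[int, str],
         obj_id: int = ROOT_OBJ_ID, prefix: str = "") -> Iterator[str]:
    """Yield the path of every object reachable from `obj_id`, depth first."""
    for child in children.get(obj_id, []):
        path = f"{prefix}/{names[child]}"
        yield path
        yield from walk(children, names, child, path)
```

This sketch does not protect against parent loops in corrupted images.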


Finding Deleted/Older Versions of Objects

Object versions

As described in the previous section, for each object we create a chronologically ordered list of chunks with that object ID. The current version of an object is created by starting at the end of the list and reading backwards, but we can start at any point in the list and read backwards to create an older version of the object. Object IDs tend to get reused frequently, so in addition to multiple versions of the same file we also expect to see old files that have been deleted.

Creating versions

The question is, where do we start these versions? Starting at every chunk in the list would produce way too many files. Even only starting at header chunks seems like it would create more versions than could reasonably be worked with. For now, we base versions around changes in the sequence number and create them as follows:

  • A version has a number, a pointer to its most recent header, and a pointer to its most recent chunk
  • Go through the sorted list of chunks adding each to the current version (i.e., updating the most recent header and chunk pointers) until the sequence number changes
    • Note that if we find an unlinked/deleted header we don't record it in the header chunk pointer unless we have no previous header. These headers give us no information and can cause us to entirely miss files created and deleted in one block, as in the following example:
Obj id   Seq num  Offset   Type Parent   Name
000005f9 00001b3c 07229a00 file 00000004 deleted       
000005f9 00001b3d 07240d40 file 0000032e im.db-journal
000005f9 00001b3d 07245780 file 0000032e im.db-journal
000005f9 00001b3d 07246800 file 0000032e im.db-journal
000005f9 00001b3d 0724b240 file 00000003 unlinked     
000005f9 00001b3d 0724ba80 file 00000004 deleted
  • Save that version and create a new one, incrementing the version number. This new version will start with the older header pointer as its most recent header. A few exceptions:
    • If we're looking at a directory (which will only have a header chunk) and its name hasn't changed, don't start a new version. Multiple copies of the same directory name don't give us much information.
    • If we never found a header for the previous version, don't start a new version. We can't do anything with a version with no header.
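
A rough sketch of the procedure above, under several assumptions: the record types and `build_versions` are hypothetical names, parent IDs 3 and 4 are taken to mean unlinked and deleted (as in the example listing), "name hasn't changed" is compared against the current version's header, and the final version is saved once the list ends.

```python
from dataclasses import dataclass
from typing import List, Optional

UNLINKED_PARENT = 3  # parent IDs as they appear in the example listing
DELETED_PARENT = 4


@dataclass
class Chunk:
    seq_num: int
    offset: int
    is_header: bool = False
    parent_id: int = 0
    name: str = ""
    is_dir: bool = False


@dataclass
class Version:
    number: int
    header: Optional[Chunk] = None
    last_chunk: Optional[Chunk] = None


def is_tombstone(c: Chunk) -> bool:
    return c.is_header and c.parent_id in (UNLINKED_PARENT, DELETED_PARENT)


def build_versions(chunks: List[Chunk]) -> List[Version]:
    """`chunks` must already be sorted by sequence number, then offset."""
    versions: List[Version] = []
    cur = Version(number=1)
    for c in chunks:
        seq_changed = cur.last_chunk is not None and c.seq_num != cur.last_chunk.seq_num
        # Exception 2: never close a version that has no header yet.
        if seq_changed and cur.header is not None:
            # Exception 1: a directory header with an unchanged name.
            same_dir = c.is_header and c.is_dir and c.name == cur.header.name
            if not same_dir:
                versions.append(cur)
                # The new version inherits the older header pointer.
                cur = Version(number=cur.number + 1, header=cur.header)
        cur.last_chunk = c
        # Unlinked/deleted headers are recorded only if there is no header yet.
        if c.is_header and (not is_tombstone(c) or cur.header is None):
            cur.header = c
    if cur.header is not None:
        versions.append(cur)
    return versions
```

Run on the six rows of the example listing, this keeps the last im.db-journal header as the header of the second version rather than the unlinked or deleted headers that follow it.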

Inodes and filenames

Inodes for older versions are created using the object ID and version number. A version number of zero always returns the current version of the object. To avoid name conflicts, non-current versions have their object ID and version number appended to the file name, in the form name#objectID,version:

r/r 764:        pvcodec.txt
r/r * 2360060:  pvcodec.txt#764,9
r/r * 2097916:  pvcodec.txt#764,8
r/r * 1835772:  pvcodec.txt#764,7
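
The sample inodes above are consistent with the version number being placed above the low 18 bits and the object ID below them, for example 9 × 2^18 + 764 = 2360060. That layout is inferred from these three values only and is not stated in these notes; the helper names below are hypothetical.

```python
VERSION_SHIFT = 18  # inferred from the sample listing; an assumption
OBJ_ID_MASK = (1 << VERSION_SHIFT) - 1


def make_inode(obj_id: int, version: int) -> int:
    """Combine object ID and version; version 0 yields the plain object ID."""
    return (version << VERSION_SHIFT) | obj_id


def split_inode(inode: int) -> tuple:
    """Return (object ID, version number) for an inode built by make_inode."""
    return inode & OBJ_ID_MASK, inode >> VERSION_SHIFT


def versioned_name(name: str, obj_id: int, version: int) -> str:
    """Append '#objectID,version' to the name of a non-current version."""
    return name if version == 0 else f"{name}#{obj_id},{version}"
```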

Determining allocated/unallocated status of versions and chunks

We consider a version of an object to be unallocated if it is not the most recent version or if its most recent header chunk is a deleted header. We consider a chunk to be allocated if it is part of an allocated version of an object. Since each chunk contains the object ID it belongs to, linking chunks with objects to determine their allocated status is fairly simple.