YAFFS2
YAFFS2 Overview
This page provides a quick overview of the YAFFS2 file system. For a more complete description, see How Yaffs Works.
Details on how TSK implements YAFFS2 can be found in YAFFS2 Implementation Notes. The description below should be enough to understand the basic implementation.
YAFFS2 Terms
- Chunk : Data unit consisting of a page and spare area (you can think of a chunk as a cluster in NTFS/FAT -- with some extra spare area that is not storing content)
- Block : Group of chunks (a block is the unit of erasure)
- Object : A YAFFS2 file/directory/etc
- Object ID : Unique identifier for each object (you can think of this as the metadata address, but we use a different metadata address to better deal with different versions of an object)
- Chunk ID : Position of this chunk in the file (0 = header, 1 = first chunk with content, 2 = second chunk with content, etc.)
- Sequence Number : Incremented with each new block written and stored in every chunk of that block (used to order blocks chronologically)
YAFFS2 Objects
A YAFFS2 Object (file, directory, etc.) consists of a header chunk, storing all metadata for the object, and zero or more data chunks. The spare area of each chunk will contain an object ID, sequence number, chunk ID, and file size, and possibly the type of object and the object ID of its parent (the type and parent object ID will also be in the data portion of the header chunk).
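The spare-area fields listed above can be modelled as a small record. This is a minimal sketch for illustration only: the class name and field names are hypothetical and do not reflect the on-flash layout or the structures TSK uses internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpareInfo:
    """Hypothetical in-memory view of one chunk's spare-area fields."""
    object_id: int
    seq_number: int
    chunk_id: int
    file_size: int
    # These two fields may be missing from the spare area; they are also
    # stored in the data portion of the object's header chunk.
    obj_type: Optional[str] = None
    parent_id: Optional[int] = None

    def is_header(self) -> bool:
        # Chunk ID 0 identifies the header chunk of an object.
        return self.chunk_id == 0
```

A parser would create one such record per chunk and group the records by object ID before deciding which versions are current.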
A YAFFS2 file system consists entirely of these objects - there is no master record of files or directory structure. The parent object ID field in each object is the only source for reconstructing the file hierarchy.
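Because the parent object ID is the only link between objects, rebuilding the directory tree amounts to following parent IDs upward until the root is reached. The sketch below assumes the current header of each object has already been selected; `build_paths`, the `headers` mapping, and the `root_id` parameter are hypothetical names chosen for this example, and the object IDs in the usage note are made up.

```python
def build_paths(headers, root_id):
    """Return {object ID: path} for a mapping of
    object ID -> (name, parent object ID)."""
    paths = {}
    for obj_id in headers:
        parts = []
        current = obj_id
        seen = set()
        # Walk up the parent chain until the root object is reached.
        while current != root_id:
            if current not in headers or current in seen:
                # Parent header is missing, or the chain loops back on itself.
                parts.append("<orphan>")
                break
            seen.add(current)
            name, parent = headers[current]
            parts.append(name)
            current = parent
        paths[obj_id] = "/" + "/".join(reversed(parts))
    return paths
```

For example, `build_paths({300: ("docs", 1), 500: ("temp.txt", 300)}, root_id=1)` maps object 500 to `/docs/temp.txt`. Objects whose parent header cannot be found are placed under an `<orphan>` marker rather than dropped.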
Basic YAFFS2 Operation
YAFFS2 is a log-structured file system that writes only once to each chunk. It does not use deletion markers; instead it stores enough information to reconstruct the chronological order of the chunks and then uses the most recent version of each. The primary tool for this is the sequence number stored in each chunk. The sequence number is incremented with each new block written, so ordering blocks by sequence number yields a chronological list regardless of where the blocks are physically located in flash memory. Chunks are written sequentially within each block, so chunks early in a block are older than chunks that occur later.
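The ordering rule described above can be sketched as a sort on (sequence number, offset within the image), keeping the last chunk seen for each (object ID, chunk ID) pair. This is an illustrative sketch, not how TSK implements it; `Chunk` and `latest_versions` are hypothetical names, and the sample values are made up.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    seq: int        # sequence number of the block holding the chunk
    offset: int     # byte offset of the chunk in the image
    object_id: int
    chunk_id: int


def latest_versions(chunks):
    """Return {(object ID, chunk ID): newest Chunk}."""
    latest = {}
    # Blocks are ordered by sequence number; inside one block a higher
    # offset means the chunk was written later.
    for c in sorted(chunks, key=lambda c: (c.seq, c.offset)):
        latest[(c.object_id, c.chunk_id)] = c
    return latest


chunks = [
    Chunk(seq=7, offset=0x0840, object_id=42, chunk_id=1),   # newer block, low offset
    Chunk(seq=3, offset=0x10800, object_id=42, chunk_id=1),  # older block, high offset
    Chunk(seq=3, offset=0x11040, object_id=42, chunk_id=1),  # same old block, written later
]
current = latest_versions(chunks)
```

Here `current[(42, 1)]` is the chunk at offset 0x0840: its block has the highest sequence number even though it sits at the lowest offset.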
For those not familiar with the workings of flash memory, an entire block is erased at a time. Once a chunk is written to, it cannot be changed without resetting the entire block that it belongs to. When the block is reset, it gets a new sequence number.
As an example, if we create a file temp.txt with 2 chunks worth of data, and then the first chunk of data is changed, we could see the following:
Sequence number | Offset | Object ID | Chunk ID | Notes |
1000 | 0x29400 | 500 | 0 | Object header containing file name "temp.txt" and other metadata |
1000 | 0x29c40 | 500 | 1 | First chunk of "temp.txt" |
1000 | 0x2a480 | 500 | 2 | Second chunk of "temp.txt" |
1000 | 0x2acc0 | 500 | 1 | First chunk of "temp.txt" (updated version) |
The first version of chunk 1 is still there, but since we have a newer one it will now be ignored.
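The offsets in the table advance by 0x840 (2112) bytes per chunk, which would be consistent with a 2048-byte page followed by a 64-byte spare area. That geometry is an inference from the example values, not something stated on this page, so the constants below are assumptions; `chunk_index` is a hypothetical helper.

```python
PAGE_SIZE = 2048    # assumed page size in bytes
SPARE_SIZE = 64     # assumed spare-area size in bytes
CHUNK_SIZE = PAGE_SIZE + SPARE_SIZE  # 2112 bytes, i.e. 0x840


def chunk_index(offset: int) -> int:
    """Map a byte offset in the image to a zero-based chunk number."""
    if offset % CHUNK_SIZE != 0:
        raise ValueError(f"offset {offset:#x} is not chunk-aligned")
    return offset // CHUNK_SIZE


# Offsets taken from the table above.
offsets = [0x29400, 0x29c40, 0x2a480, 0x2acc0]
indices = [chunk_index(o) for o in offsets]
```

Under these assumptions the four rows occupy consecutive chunks, which matches the statement that chunks are written sequentially within a block.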
If we then delete the file, it gets two new header chunks with the file name changed to "unlinked" or "deleted", the size set to zero, and the parent ID set to the unlinked or deleted folder.
Sequence number | Offset | Object ID | Chunk ID | Notes |
1000 | 0x29400 | 500 | 0 | Object header containing file name "temp.txt" and other metadata |
1000 | 0x29c40 | 500 | 1 | First chunk of "temp.txt" |
1000 | 0x2a480 | 500 | 2 | Second chunk of "temp.txt" |
1000 | 0x2acc0 | 500 | 1 | First chunk of "temp.txt" (updated version) |
1006 | 0x02940 | 500 | 0 | Unlinked header |
1006 | 0x03180 | 500 | 0 | Deleted header |
Again, all the old data is still present (though at some point it may be garbage collected) but it will be ignored since we have a new header. Also note how the deleted header has a lower offset than the older data but a higher sequence number.
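To make the last point concrete, the rows of the table can be replayed to find the header that is currently in effect for object 500. This is a simplified sketch: `Row`, `newest_header`, and `superseded_headers` are hypothetical names, and the `name` field stands in for the file name stored in the data portion of each header chunk.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seq: int
    offset: int
    object_id: int
    chunk_id: int
    name: str  # name stored in a header chunk; empty for data chunks


ROWS = [
    Row(1000, 0x29400, 500, 0, "temp.txt"),
    Row(1000, 0x29c40, 500, 1, ""),
    Row(1000, 0x2a480, 500, 2, ""),
    Row(1000, 0x2acc0, 500, 1, ""),
    Row(1006, 0x02940, 500, 0, "unlinked"),
    Row(1006, 0x03180, 500, 0, "deleted"),
]


def headers_in_order(rows, object_id):
    """Header chunks (chunk ID 0) of one object, oldest first."""
    headers = [r for r in rows if r.object_id == object_id and r.chunk_id == 0]
    return sorted(headers, key=lambda r: (r.seq, r.offset))


def newest_header(rows, object_id):
    return headers_in_order(rows, object_id)[-1]


def superseded_headers(rows, object_id):
    # Older headers are ignored by the file system but remain on flash
    # until the block holding them is garbage collected.
    return headers_in_order(rows, object_id)[:-1]
```

`newest_header(ROWS, 500)` selects the "deleted" header at offset 0x03180, while `superseded_headers(ROWS, 500)` still contains the original "temp.txt" header, which is what makes earlier versions of the file recoverable.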