Windows Implementation Notes

From SleuthKitWiki
Revision as of 17:32, 26 October 2008 by Carrier (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

NOTE: This was created from skins_windows.txt. It may need updating and wiki'ing help.


Introduction

Version 2.06 of The Sleuth Kit included support for Microsoft Windows. There were several design changes that needed to occur so that TSK could run on both Windows and Unix systems. The biggest change, and the focus of this document, was how Unicode and non-English characters were dealt with.


Problem

Unicode characters can be stored in multiple formats. Unix systems use UTF-8, which stores the characters in 1, 2, 3, or 4 bytes. Windows users UTF-16, which stores characters in 2 or 4 bytes. Because of this difference, the input to and output of TSK is different on Windows versus Unix.


Solution

The solution to this problem was to create many C #defines that map a general name to the specific function or type that is used on each platform. Internally, all code uses the UTF-8 encoding. This means that the input and output may need to be converted on Windows.

The input data consists of image file names, image and file system types, and addresses. There is no need to convert the file names because the native system calls need the same format as the input. For the image, volume, and file system types, I assume that they will always be in English and therefore they are easily converted to ASCII on Windows. Lastly, addresses in a string form are easy to convert to an integer and this is done using either UTF-8 or UTF-16 atoi-type functions.

For output, the printf and fprintf functions were wrapped with TSK-specific versions. The wrappers will convert the UTF-8 code to UTF-16, if needed, and then print the resulting data.

Therefore, few changes occurred to the volume and file system code except that the printf wrappers were used. The command line tools needed to be changed to handle the 2-byte TCHAR values as input and to use the T* functions, which map to either UTF-8 or UTF-16 functions.

Update: When support was added for the mingw cross-compiler, some of the things had to be changed. Specifically, the biggest change was that the command line arguments in the tools have to be obtained via GetCommandLineW() instead of using wmain() because mingw does not support wmain().