CryptoTE / Enctain File Format v1.0

Introduction

This HTML document describes the file format used by CryptoTE. The editor saves text documents in an encrypted container file using the libenctain library. enctain is short for ENCrypted conTAINer. An encrypted container can hold a set of enumerated binary subfiles with associated metadata.

Each encrypted container file consists of the sections described in the Overview below.

All binary numbers are stored in little-endian encoding.

The file format contains encrypted sections which, unless stated otherwise, are encrypted using the Serpent cipher. Serpent is a block cipher with a 16-byte block size and is used here with a 256-bit key. The 256 key bits are derived from the user's encryption key string: the key string may be of any length and is digested by the SHA-256 hash algorithm into a 256-bit key. The Serpent cipher (like any other cipher used) operates in CBC mode, with different initialization vectors for different parts of the file.
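The key derivation step can be sketched in Python with the standard hashlib module. This is a minimal illustration, assuming the key string is hashed as its UTF-8 bytes; the text encoding libenctain actually uses is not stated here. Serpent is not part of the Python standard library, so the CBC encryption step itself is not shown, and `derive_key` is a hypothetical name.

```python
import hashlib


def derive_key(key_string: str) -> bytes:
    """Digest a key string of any length into a 256-bit (32-byte) Serpent key.

    Assumption: the string is hashed as its UTF-8 bytes; the text encoding
    used by libenctain is not specified in this document.
    """
    return hashlib.sha256(key_string.encode("utf-8")).digest()


key = derive_key("my secret passphrase")
assert len(key) * 8 == 256  # always 256 key bits, whatever the input length
```

Because SHA-256 always yields 32 bytes, short and very long key strings both map onto a full-size key.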

Furthermore, some parts of the file are compressed using zlib. These parts are usually encrypted afterwards and are therefore padded to whole 16-byte blocks. zlib ignores the padding during decompression, because the compressed stream marks its own end, so no extra length fields are required.
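The padding behaviour can be illustrated with Python's zlib module. This sketch assumes zero bytes are used as padding (the actual padding bytes written by libenctain are not specified here) and leaves out the encryption step that would sit between padding and storage; the function names are hypothetical.

```python
import zlib

BLOCK = 16  # Serpent block size in bytes


def compress_and_pad(data: bytes) -> bytes:
    """Compress with zlib, then pad to a whole number of 16-byte cipher blocks.

    Assumption: zero bytes are used as padding.
    """
    packed = zlib.compress(data)
    return packed + b"\x00" * (-len(packed) % BLOCK)


def unpad_and_decompress(padded: bytes) -> bytes:
    """Decompress a padded zlib stream; bytes after the end of the stream are ignored."""
    d = zlib.decompressobj()
    out = d.decompress(padded)
    if not d.eof:
        raise ValueError("truncated zlib stream")
    # d.unused_data now holds the padding, which is simply discarded.
    return out


blob = compress_and_pad(b"hello " * 100)
assert len(blob) % BLOCK == 0
assert unpad_and_decompress(blob) == b"hello " * 100
```

Using `decompressobj` makes the end-of-stream detection explicit: `eof` confirms the stream was complete, and whatever follows it lands in `unused_data`.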

Detailed Format

Overview

The whole file follows the structure in the table below:

Size       Section               Description
16 bytes   Header1               Unencrypted header to identify file format and version.
variable   Unencrypted Metadata  Global metadata which can be displayed, e.g. in file listings, without requiring the encryption key.
16 bytes   Encrypted Header2     Encrypted header specifying the subfiles and the encrypted metadata.
variable   Encrypted Metadata    Encrypted metadata containing two main parts: global metadata properties and (local) subfile metadata properties.
variable   SubFile Data          Possibly encrypted and compressed data. No separators are needed between the subfiles.

Header1

Header1 is an unencrypted header at offset 0 to identify the file format and version.

Size         Field                        Description
8 bytes      Signature                    An eight-byte string or binary magic signature to identify the file's type. The CryptoTE editor uses "CryptoTE" (without NULL string termination). The value used by libenctain can be changed using the function SetSignature().
32-bit uint  Version                      Currently 0x00010000, which means v1.0.
32-bit uint  Unencrypted Metadata Length  Length of the following unencrypted metadata in bytes.
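Header1 maps directly onto a fixed little-endian layout, so Python's struct module can read and write it. This is a minimal sketch with hypothetical function names. Splitting the version into a 16-bit major and minor number is an interpretation of "0x00010000 means v1.0", not something the format states explicitly.

```python
import struct

# signature (8 bytes), version (uint32), unencrypted metadata length (uint32)
HEADER1 = struct.Struct("<8sII")


def pack_header1(metadata_len: int, signature: bytes = b"CryptoTE",
                 version: int = 0x00010000) -> bytes:
    if len(signature) != 8:
        raise ValueError("signature must be exactly 8 bytes")
    return HEADER1.pack(signature, version, metadata_len)


def parse_header1(data: bytes):
    """Return (signature, (major, minor), unencrypted_metadata_length)."""
    signature, version, meta_len = HEADER1.unpack_from(data, 0)
    # Assumption: high 16 bits = major, low 16 bits = minor.
    return signature, (version >> 16, version & 0xFFFF), meta_len
```

The first 16 bytes of the example dump in the next subsection decode to `(b"CryptoTE", (1, 0), 65)`.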

Unencrypted Metadata

The first variable-length section of the file contains application-defined unencrypted metadata properties. These key-value pairs can be set, retrieved, enumerated and erased through libenctain's global unencrypted property functions, such as SetGlobalUnencryptedProperty() and EraseGlobalUnencryptedProperty(). This section can also be omitted (zero metadata length in Header1) if the application defines no properties.

libenctain stores variable properties in the following format. Each property is a key-value pair consisting of two (possibly binary) strings, the key and the value. Each string's binary data is prefixed with its length, encoded in a single byte:

"string" is encoded into the hexbytes 06 73 74 72 69 6e 67. Note the 06 string length at the beginning.

This encoding makes the functions that read and write variable-length strings very simple. Furthermore, the strings may contain binary zero bytes, as they are not NULL-terminated.

If a string is 255 bytes or longer, its length does not fit into the prefix byte, because the value 0xFF is reserved as an "escape length" meaning "long string". The 0xFF is followed by a 32-bit unsigned integer specifying the full length of the string. Therefore all strings of 255 bytes or longer have 5 prefix bytes specifying their length.

Example:
A string containing 'a' 1022 (0x3FE) times is encoded as: FF FE 03 00 00 61 61 (1018 more 'a's) 61 61.
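The length-prefix scheme, including the 0xFF escape, fits in a few lines of Python. This is an illustrative sketch with hypothetical function names, not libenctain's own code.

```python
import struct


def encode_string(s: bytes) -> bytes:
    """Length-prefix a (possibly binary) string."""
    if len(s) < 0xFF:
        return bytes([len(s)]) + s
    # 0xFF escape, then the full length as a little-endian 32-bit uint
    return b"\xff" + struct.pack("<I", len(s)) + s


def decode_string(buf: bytes, pos: int = 0):
    """Read one string starting at pos; return (string, position after it)."""
    n = buf[pos]
    pos += 1
    if n == 0xFF:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    if pos + n > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[pos:pos + n], pos + n
```

Note the boundary: a 254-byte string still uses a single prefix byte, while a 255-byte string already needs the escape, because 0xFF can never be a literal length.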

The variable-length unencrypted metadata contains a list of concatenated key-value pairs, which represent the global properties of the container. The number of key-value pairs is stored as a 32-bit uint at the beginning of the structure, followed by the pairs themselves. The total length of the structure is given by Header1.UnencryptedMetadataLength. The actual key-value pairs used are left completely up to the application.
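Serializing a property set into this structure could look like the following Python sketch (hypothetical names). It writes pairs in the dictionary's insertion order. The example further below happens to list its keys alphabetically; whether libenctain always sorts keys is not stated here.

```python
import struct


def encode_string(s: bytes) -> bytes:
    # One length byte, or 0xFF followed by a 32-bit little-endian length.
    if len(s) < 0xFF:
        return bytes([len(s)]) + s
    return b"\xff" + struct.pack("<I", len(s)) + s


def encode_properties(props: dict) -> bytes:
    """Serialize key-value pairs: 32-bit pair count, then concatenated key/value strings."""
    out = bytearray(struct.pack("<I", len(props)))
    for key, value in props.items():
        out += encode_string(key) + encode_string(value)
    return bytes(out)


block = encode_properties({b"Author": b"TB",
                           b"Description": b"Some longer text.",
                           b"Subject": b"Test Example"})
```

The resulting `block` is 65 (0x41) bytes long, which is the value that would go into Header1's Unencrypted Metadata Length field.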

CryptoTE currently uses the following unencrypted key-value properties:

Subject User-defined subject text string from the container properties dialog.
Author User-defined author text string from the container properties dialog. Initialized with the user's login name for new containers.
Description User-defined description multi-line string from the container properties dialog.

Example: a container with the unencrypted properties

Subject      Test Example
Author       TB
Description  Some longer text.

begins with the following bytes (offsets in decimal):

0000000000 43 72 79 70 74 6f 54 45 00 00 01 00 41 00 00 00 CryptoTE....A...
0000000016 03 00 00 00 06 41 75 74 68 6f 72 02 54 42 0b 44 .....Author.TB.D
0000000032 65 73 63 72 69 70 74 69 6f 6e 11 53 6f 6d 65 20 escription.Some
0000000048 6c 6f 6e 67 65 72 20 74 65 78 74 2e 07 53 75 62 longer text..Sub
0000000064 6a 65 63 74 0c 54 65 73 74 20 45 78 61 6d 70 6c ject.Test Exampl
0000000080 65                                              e
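Putting Header1 and the property encoding together, a reader for the unencrypted part of a container could be sketched in Python as follows (hypothetical names). It returns the offset at which the encrypted Header2 starts, and it treats a zero metadata length as an omitted section with no pair count.

```python
import struct


def read_string(buf: bytes, pos: int):
    n = buf[pos]
    pos += 1
    if n == 0xFF:  # escape: full length follows as a 32-bit LE uint
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    return buf[pos:pos + n], pos + n


def read_unencrypted(data: bytes):
    """Parse Header1 and the unencrypted metadata.

    Returns (signature, version, properties, offset of Header2).
    """
    signature, version, meta_len = struct.unpack_from("<8sII", data, 0)
    end = 16 + meta_len
    props = {}
    if meta_len:  # zero length means the section is omitted entirely
        (count,) = struct.unpack_from("<I", data, 16)
        pos = 20
        for _ in range(count):
            key, pos = read_string(data, pos)
            value, pos = read_string(data, pos)
            props[key] = value
        if pos != end:
            raise ValueError("metadata length does not match Header1")
    return signature, version, props, end
```

Applied to the 81 bytes of the dump above, it yields the three properties and reports that Header2 begins at offset 81.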

Encrypted Header2

Following the variable unencrypted metadata is the first encrypted header. This header is 16 bytes long, exactly the block size of the Serpent256 cipher that encrypts it. Thus, to read this block, the user must be asked for the encryption key, and the Serpent cipher must be initialized in CBC mode with the correct initialization vector. This IV is stored in enctain.cpp and will not be repeated here.

Size         Field                       Description
32-bit uint  test123                     Always 0x12345678. This field can be used to quickly test whether header decryption was successful; if it was not, the user probably entered a wrong encryption key. I am not aware of any way this field makes a known-plaintext attack easier.
32-bit uint  Metadata Compressed Length  Length of the following variable metadata block. Because this metadata block is compressed using zlib, this value specifies the compressed length.
32-bit uint  Metadata CRC32              CRC32 of the following variable metadata block. zlib already stores a checksum (Adler-32) of the uncompressed data at the end of the compressed stream, so this field is largely redundant and may be removed in a future version.
32-bit uint  SubFileNum                  Number of subfiles in the container.
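Once the 16 bytes have been decrypted, interpreting Header2 and performing the test123 key check is straightforward. The Python sketch below takes already-decrypted bytes, since Serpent-CBC is not available in the standard library; the function name and dictionary keys are hypothetical.

```python
import struct

# test123, metadata compressed length, metadata CRC32, SubFileNum
HEADER2 = struct.Struct("<IIII")
TEST123 = 0x12345678


def parse_header2(plain: bytes) -> dict:
    """Interpret the 16 bytes of Header2 *after* Serpent-CBC decryption."""
    test123, meta_len, meta_crc, subfile_num = HEADER2.unpack(plain[:16])
    if test123 != TEST123:
        # Garbage here almost certainly means a wrong encryption key.
        raise ValueError("wrong encryption key (test123 mismatch)")
    return {"metadata_compressed_length": meta_len,
            "metadata_crc32": meta_crc,
            "subfile_num": subfile_num}
```

Checking the constant first lets an application report a wrong key immediately, before attempting to decompress the metadata.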

Encrypted Metadata

Following the encrypted header is a variable-length block of properties. Its length is given by the Metadata Compressed Length field of Header2. This section is compressed using zlib and then encrypted using the Serpent256 cipher. The CBC-IV context continues from Header2.

The compressed metadata contains two main parts: global metadata properties and (local) subfile metadata properties. These are combined into one section to make compression more efficient. Furthermore, it should be possible to read and display the metadata properties of all subfiles without reading the complete subfile data.

A notable feature of this section is that fixed-length and variable-length fields are interleaved.

Size           Field                         Description
32-bit uint    Number of Global Properties   Number of global property key-value pairs in the following variable-length section.
variable       Global Metadata               Section holding all global encrypted properties. They are simply concatenated, and their number is known from the previous field. These properties are completely application-defined.
Repeated SubFileNum times (once per subfile):
  32-bit uint  Storage Size                  Size of the SubFile as stored in the container, including any encryption padding. Used to read the concatenated subfiles.
  32-bit uint  Real Size                     Size of the SubFile after decryption and decompression.
  32-bit uint  Flags                         Compound field holding the subfile's compression algorithm and encryption cipher number:
                 8-bit uint   Compression Algorithm: 0 = none, 1 = ZLib, 2 = BZ2
                 8-bit uint   Encryption Algorithm: 0 = none, 1 = Serpent
                 16-bit uint  reserved
  32-bit uint  CRC32                         CRC32 value of the SubFile's real data. Used to verify decryption and decompression.
  16 bytes     CBC-IV                        Randomized CBC initialization vector of the subfile's encryption cipher context.
  32-bit uint  Number of SubFile Properties  Number of (local) properties.
  variable     SubFile Properties            Variable-length block holding all subfile properties. They are simply concatenated. Again, these properties are completely application-defined.
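Because fixed and variable fields alternate, the section has to be parsed sequentially. The following Python sketch parses it after decryption and decompression; the class and function names are hypothetical. It assumes that, within the little-endian Flags word, the compression byte is the lowest byte and the encryption byte the next one, following the order in the table above.

```python
import struct
from dataclasses import dataclass, field

COMPRESSION = {0: "none", 1: "zlib", 2: "bz2"}
ENCRYPTION = {0: "none", 1: "serpent"}


def read_string(buf: bytes, pos: int):
    n = buf[pos]
    pos += 1
    if n == 0xFF:  # escape: 32-bit LE length follows
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    return buf[pos:pos + n], pos + n


def read_props(buf: bytes, pos: int):
    """Read a 32-bit pair count followed by that many key/value strings."""
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    props = {}
    for _ in range(count):
        key, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        props[key] = value
    return props, pos


@dataclass
class SubFileInfo:
    storage_size: int
    real_size: int
    compression: int
    encryption: int
    crc32: int
    cbc_iv: bytes
    props: dict = field(default_factory=dict)


def parse_metadata(buf: bytes, subfile_num: int):
    """Parse the decrypted, decompressed metadata; subfile_num comes from Header2."""
    global_props, pos = read_props(buf, 0)
    subfiles = []
    for _ in range(subfile_num):
        storage, real, flags, crc = struct.unpack_from("<IIII", buf, pos)
        pos += 16
        iv = buf[pos:pos + 16]
        pos += 16
        props, pos = read_props(buf, pos)
        # Assumption: low byte = compression, next byte = encryption.
        subfiles.append(SubFileInfo(storage, real, flags & 0xFF,
                                    (flags >> 8) & 0xFF, crc, iv, props))
    return global_props, subfiles
```

The SubFileNum value from Header2 is the only way to know how many per-subfile records follow, which is why it must be passed in.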

CryptoTE currently uses the following global key-value properties:
CTime Creation time of the container. Stored as a 4-byte time_t value.
MTime Last modification time of the container. Stored as a 4-byte time_t value.
DefaultCompression Default compression algorithm for new SubFiles.
DefaultEncryption Default encryption cipher for new SubFiles.
FileListDisplayMode Private binary structure used to save the display mode of the file list.
FileListColumns Private binary structure used to save the columns currently displayed in report-mode file list.

CryptoTE currently uses the following local SubFile key-value properties:
Name Filename as displayed in the file list. Note that this is just a property and not a lookup key. Thus file names are not required to be unique and cannot be searched for directly.
CTime Creation time of the SubFile. Stored as a 4-byte time_t value.
MTime Last modification time of the SubFile. Stored as a 4-byte time_t value.
Filetype Currently either "text" or any other value. If it is "text", the SubFile is opened in the text editor page; otherwise it is shown as a simple hexdump.
Author User-defined string in properties. Initialized with the login name upon SubFile creation.
Subject User-defined string in properties.
Description User-defined multi-line string in properties.
WTextPageSettings Private binary structure used by the text editor page to save various display options like line-wrapping and line-numbers.
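Both the global and the SubFile property lists store CTime and MTime as 4-byte time_t values. A Python sketch for converting such a property value follows. It assumes a signed 32-bit little-endian count of seconds since the Unix epoch (UTC); the signedness is an assumption, and the function names are hypothetical.

```python
import struct
from datetime import datetime, timezone


def decode_time(value: bytes) -> datetime:
    """Decode a CTime/MTime property value.

    Assumption: signed 32-bit little-endian time_t, seconds since the Unix epoch (UTC).
    """
    (seconds,) = struct.unpack("<i", value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def encode_time(when: datetime) -> bytes:
    """Encode an aware datetime back into a 4-byte property value."""
    return struct.pack("<i", int(when.timestamp()))
```

A 4-byte signed time_t only covers dates up to January 2038, which is worth keeping in mind when writing these properties.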

SubFile Data

Finally, after all the headers and metadata, comes the actual SubFile data. Each SubFile's storage length, encryption cipher and compression algorithm are defined in the compressed and encrypted metadata section, so no additional structure is required: all SubFiles are simply stored concatenated.

To locate a specific SubFile's data, it is necessary to know the offset at which the SubFile data begins. The StorageSize fields of all preceding SubFiles must then be added to this offset.
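This offset calculation is a running (prefix) sum over the StorageSize fields. The Python sketch below assumes the caller already knows `data_start`, the file offset where SubFile data begins, and uses hypothetical function names.

```python
from itertools import accumulate


def subfile_offsets(data_start: int, storage_sizes: list) -> list:
    """Start offset of each SubFile: data_start plus the StorageSize of all preceding SubFiles."""
    # accumulate(..., initial=0) yields 0, s0, s0+s1, ...; the final total is dropped.
    return [data_start + s for s in accumulate(storage_sizes, initial=0)][:-1]


def subfile_ranges(data_start: int, storage_sizes: list) -> list:
    """(begin, end) byte range of every SubFile within the container file."""
    starts = subfile_offsets(data_start, storage_sizes)
    return [(begin, begin + size) for begin, size in zip(starts, storage_sizes)]
```

For example, with SubFile data starting at offset 100 and storage sizes 32, 48 and 16, the SubFiles begin at offsets 100, 132 and 180.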