This HTML document describes the file format used by CryptoTE. The editor saves text documents in an encrypted container file using the libenctain library. enctain is short for ENCrypted conTAINer. An encrypted container can hold a set of enumerated binary subfiles with associated metadata.
Each encrypted container file contains the following sections:
All binary numbers are stored in little-endian encoding.
The file format contains encrypted sections. Unless stated otherwise, these sections are encrypted using the Serpent cipher, a block cipher with 16-byte blocks and a 256-bit key. The 256 key bits are derived from the user's encryption key string: the key string may be of any length and is digested by the SHA-256 hash algorithm into a 256-bit key. The Serpent cipher (or another cipher) is used in CBC mode with different initialization vectors.
Furthermore, some parts of the file are compressed using zlib. These parts are usually encrypted afterwards and thus padded to 16-byte blocks. The padding is ignored by zlib during decompression, so no extra length values are required.
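The key derivation and the padding behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not libenctain's code: the UTF-8 encoding of the key string, the zero bytes used as padding, and the helper names derive_key, pad_to_block and decompress_padded are all hypothetical. The Serpent encryption step is left out because the Python standard library does not provide that cipher.

```python
import hashlib
import zlib

BLOCK_SIZE = 16  # cipher block size in bytes


def derive_key(key_string: str) -> bytes:
    """Digest a key string of any length into a 256-bit key with SHA-256."""
    # Assumption: the key string is hashed as UTF-8 bytes.
    return hashlib.sha256(key_string.encode("utf-8")).digest()


def pad_to_block(data: bytes) -> bytes:
    """Pad data to a multiple of 16 bytes (zero padding is an assumption)."""
    return data + b"\x00" * (-len(data) % BLOCK_SIZE)


def decompress_padded(padded: bytes) -> bytes:
    """Decompress a zlib stream that is followed by padding bytes."""
    d = zlib.decompressobj()
    plain = d.decompress(padded)
    # A zlib stream marks its own end; trailing padding ends up in
    # d.unused_data and is simply dropped here.
    if not d.eof:
        raise ValueError("truncated zlib stream")
    return plain


key = derive_key("my secret key string")
stored = pad_to_block(zlib.compress(b"hello container" * 10))
restored = decompress_padded(stored)
```

Because the decompressor stops at the end marker of the stream, the reader does not need to know how many padding bytes were appended.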
The whole file follows the structure in the table below:
16 bytes | Header1 | Unencrypted header to identify file format and version. |
Variable | Unencrypted Metadata | Holds global metadata which can be displayed e.g. in file listings without requiring the encryption key. |
16 bytes | Encrypted Header2 | Encrypted header specifying subfiles and encrypted metadata. |
Variable | Encrypted Metadata | Encrypted metadata contains two main parts: global metadata properties and (local) subfile metadata properties. |
Variable | SubFile Data | Possibly encrypted and compressed data. No separators are needed between the subfiles. |
Header1 is an unencrypted header at offset 0 to identify the file format and version.
8 bytes | Signature | An eight-byte string or binary magic signature to identify the file's type. The CryptoTE editor uses "CryptoTE" (without NULL string termination). The value used by libenctain can be changed using the function SetSignature(). |
32-bit uint | Version | Currently 0x00010000 which means v1.0. |
32-bit uint | Unencrypted Metadata Length | Length of the following unencrypted metadata in bytes. |
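Reading Header1 amounts to unpacking three little-endian fields. The following Python sketch shows one way to do it. The function name parse_header1 is hypothetical, and splitting the version into a high and a low 16-bit half is an assumption based only on 0x00010000 meaning v1.0.

```python
import struct

# 8-byte signature, 32-bit version, 32-bit metadata length (little-endian)
HEADER1 = struct.Struct("<8sII")


def parse_header1(data: bytes, signature: bytes = b"CryptoTE"):
    """Return (version, unencrypted_metadata_length) from the first 16 bytes."""
    if len(data) < HEADER1.size:
        raise ValueError("file too short for Header1")
    sig, version, meta_len = HEADER1.unpack_from(data, 0)
    if sig != signature:
        raise ValueError("signature mismatch")
    return version, meta_len


# Sample Header1 bytes: "CryptoTE", version 0x00010000, metadata length 65
raw = bytes.fromhex("43 72 79 70 74 6f 54 45 00 00 01 00 41 00 00 00")
version, meta_len = parse_header1(raw)

# Assumption: high 16 bits are the major version, low 16 bits the minor.
major, minor = version >> 16, version & 0xFFFF
```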
The first variable-length section of the file contains application-defined unencrypted metadata properties. These key-value pairs can be set, retrieved and enumerated through libenctain functions such as SetGlobalUnencryptedProperty() and EraseGlobalUnencryptedProperty(). This section can also be omitted (zero as metadata length in Header1) if no properties are defined by the application.
Variable-length properties are stored by libenctain using the following format. Each property is a key-value pair consisting of two (possibly binary) strings, the key and the value. Each string's binary data is prefixed with its length, encoded in a single byte:
"string" is encoded into the hexbytes 06 73 74 72 69 6e 67. Note the 06 string length at the beginning.
This encoding makes the functions to read and write variable-length strings very simple. Furthermore binary NULL (0 bytes) can also be stored in the strings, as they are not NULL-terminated.
If a string is 255 bytes or longer, its length cannot be stored in the single prefix byte. Thus an "escape length" is introduced: 0xFF in the length field means "long string". The 0xFF is then followed by a 32-bit unsigned integer specifying the full length of the string. Therefore all strings of 255 bytes or longer have 5 prefix bytes specifying their length.
Example:
A string containing 'a' 1022 times is encoded as FF FE 03 00 00 61 61 (1018 more 'a's) 61 61.
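The length-prefix scheme, including the 0xFF escape, can be sketched as a pair of Python helpers. The names encode_string and decode_string are hypothetical and do not come from libenctain. The sketch only follows the rules stated above.

```python
import struct


def encode_string(s: bytes) -> bytes:
    """Prefix s with its length: one byte, or 0xFF plus a 32-bit length."""
    if len(s) < 255:
        return bytes([len(s)]) + s
    # 0xFF is the escape value, so 255 bytes and more use the long form.
    return b"\xff" + struct.pack("<I", len(s)) + s


def decode_string(buf: bytes, pos: int = 0):
    """Read one string at pos; return (string, position after it)."""
    length = buf[pos]
    pos += 1
    if length == 0xFF:
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    end = pos + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[pos:end], end


short = encode_string(b"string")
long_ = encode_string(b"a" * 1022)
```

Since the length is explicit, zero bytes inside a string need no special handling when decoding.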
The variable-length unencrypted metadata contains a list of concatenated key-value pairs, which represent the global properties of the container. The number of key-value pairs is stored as a 32-bit uint at the beginning of the variable-length structure. The length of this variable-length structure is defined by Header1.UnencryptedMetadataLength. The actual key-value pairs used are left completely up to the application.
CryptoTE currently uses the following unencrypted key-value properties:
Subject | User-defined subject text string from the container properties dialog. |
Author | User-defined author text string from the container properties dialog. Initialized with the user's login name for new containers. |
Description | User-defined description multi-line string from the container properties dialog. |
Example:
Subject | Test Example |
Author | TB |
Description | Some longer text. |
0000000000 43 72 79 70 74 6f 54 45 00 00 01 00 41 00 00 00 CryptoTE....A...
0000000016 03 00 00 00 06 41 75 74 68 6f 72 02 54 42 0b 44 .....Author.TB.D
0000000032 65 73 63 72 69 70 74 69 6f 6e 11 53 6f 6d 65 20 escription.Some
0000000048 6c 6f 6e 67 65 72 20 74 65 78 74 2e 07 53 75 62 longer text..Sub
0000000064 6a 65 63 74 0c 54 65 73 74 20 45 78 61 6d 70 6c ject.Test Exampl
0000000080 65                                              e
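The bytes of the example above can be decoded with a few lines of Python. This is a sketch rather than libenctain's reader: the helper names read_string and read_properties are hypothetical, and the byte values are copied from the example's hex dump.

```python
import struct


def read_string(buf: bytes, pos: int):
    """Read one length-prefixed string; return (string, next position)."""
    length = buf[pos]
    pos += 1
    if length == 0xFF:  # escape: a 32-bit length follows
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    return buf[pos:pos + length], pos + length


def read_properties(buf: bytes, pos: int = 0):
    """Read a 32-bit pair count followed by that many key-value pairs."""
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    props = []
    for _ in range(count):
        key, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        props.append((key, value))
    return props, pos


dump = bytes.fromhex(
    "43 72 79 70 74 6f 54 45 00 00 01 00 41 00 00 00 "
    "03 00 00 00 06 41 75 74 68 6f 72 02 54 42 0b 44 "
    "65 73 63 72 69 70 74 69 6f 6e 11 53 6f 6d 65 20 "
    "6c 6f 6e 67 65 72 20 74 65 78 74 2e 07 53 75 62 "
    "6a 65 63 74 0c 54 65 73 74 20 45 78 61 6d 70 6c "
    "65"
)

# Header1 occupies the first 16 bytes; its last field is the metadata length.
(meta_len,) = struct.unpack_from("<I", dump, 12)
metadata = dump[16:16 + meta_len]
props, end = read_properties(metadata)
```

Here the three pairs fill the metadata block exactly, so the position returned by read_properties equals the length taken from Header1.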
Following the variable unencrypted metadata is the first encrypted header. This header is 16 bytes long, exactly the block size of the Serpent256 cipher, which is used to encrypt it. Thus, to read this block, the user must be queried for the encryption key, and the Serpent cipher must be initialized in CBC mode with the correct initialization vector. This IV is stored in enctain.cpp and will not be repeated here.
32-bit uint | test123 | Always 0x12345678. This field can be used to quick-test if header decryption was successful. Otherwise the user probably entered a wrong encryption key. I am not aware that this field will make any known-plaintext attack easier. |
32-bit uint | Metadata Compressed Length | The length of the following variable metadata block. Because this metadata block is compressed using zlib, this value specifies the compressed length. |
32-bit uint | Metadata CRC32 | CRC32 of the following variable metadata block. This check is largely duplicated by the checksum that zlib stores at the end of the compressed stream (Adler-32 in the zlib format) and may be removed in a future version. |
32-bit uint | SubFileNum | Number of subfiles in the container. |
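Once the 16-byte block has been decrypted, interpreting Header2 is a matter of unpacking four 32-bit fields and checking the test value. The Python sketch below assumes the caller has already decrypted the block, since Serpent decryption is outside the scope of this example. The function name parse_header2 and the sample field values are hypothetical.

```python
import struct

HEADER2 = struct.Struct("<IIII")  # four little-endian 32-bit uints
TEST123 = 0x12345678


def parse_header2(block: bytes):
    """Interpret an already decrypted 16-byte Header2 block.

    Returns (metadata_compressed_length, metadata_crc32, subfile_num).
    """
    if len(block) != HEADER2.size:
        raise ValueError("Header2 must be exactly 16 bytes")
    test123, comp_len, crc32, subfile_num = HEADER2.unpack(block)
    if test123 != TEST123:
        # Decryption produced garbage: most likely a wrong encryption key.
        raise ValueError("test field mismatch, wrong key?")
    return comp_len, crc32, subfile_num


# Hypothetical decrypted header: 160 bytes of metadata, two subfiles.
sample = HEADER2.pack(TEST123, 160, 0xDEADBEEF, 2)
comp_len, crc32, subfile_num = parse_header2(sample)
```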
Following the encrypted header is a variable-length block of properties. The length of this section is defined in Header2. This section is compressed using zlib and then encrypted using the Serpent256 cipher. The CBC-IV context continues from the header.
The compressed metadata contains two main parts: global metadata properties and (local) subfile metadata properties. These are combined into one section to make compression more efficient. Furthermore, it should be possible to read and display the metadata properties of all subfiles without reading the complete subfile data.
This section is unusual in that it mixes fixed-length and variable-length data.
32-bit uint | Number of Global Properties | Number of global property key-value pairs in the following variable-length section. |
variable | Global Metadata | Section holding all global encrypted properties. They are simply concatenated and their number is known from the previous field. These properties are completely application-defined. |
Repeated once per subfile (SubFileNum times):
32-bit uint | Storage Size | Size of the SubFile as stored in the container. This includes any encryption padding. Used to read the concatenated subfiles. |
32-bit uint | Real Size | Size of the SubFile after decryption and decompression. |
32-bit uint | Flags | Compound field holding the subfile's encryption cipher number and compression algorithm. |
32-bit uint | CRC32 | CRC32 value of the SubFile's real data. Used to verify decryption and decompression. |
16 bytes | CBC-IV | Randomized CBC initialization vector of the subfile's encryption cipher context. |
32-bit uint | Number of SubFile Properties | Number of (local) properties. |
variable | SubFile Properties | Variable-length block holding all subfile properties. They are simply concatenated. Again, these properties are completely application-defined. |
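The mix of fixed and variable-length fields in the table above can be walked with a single cursor. The Python sketch below assumes the metadata block has already been decrypted and decompressed. The names SubFile, read_pairs and parse_metadata are hypothetical, the Flags field is kept as a raw integer, and collecting the pairs in a dict assumes that keys are not repeated.

```python
import struct
from typing import NamedTuple

# storage size, real size, flags, CRC32, 16-byte CBC-IV, property count
SUBFILE_FIXED = struct.Struct("<IIII16sI")


class SubFile(NamedTuple):
    storage_size: int
    real_size: int
    flags: int
    crc32: int
    cbc_iv: bytes
    properties: dict


def read_string(buf, pos):
    """Read one length-prefixed string; return (string, next position)."""
    length = buf[pos]
    pos += 1
    if length == 0xFF:  # escape: a 32-bit length follows
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    return buf[pos:pos + length], pos + length


def read_pairs(buf, pos, count):
    """Read count concatenated key-value pairs starting at pos."""
    props = {}
    for _ in range(count):
        key, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        props[key] = value
    return props, pos


def parse_metadata(buf, subfile_num):
    """Return (global_properties, list of SubFile records)."""
    (nglobal,) = struct.unpack_from("<I", buf, 0)
    global_props, pos = read_pairs(buf, 4, nglobal)
    subfiles = []
    for _ in range(subfile_num):
        fixed = SUBFILE_FIXED.unpack_from(buf, pos)
        pos += SUBFILE_FIXED.size
        props, pos = read_pairs(buf, pos, fixed[5])
        subfiles.append(SubFile(*fixed[:5], props))
    return global_props, subfiles


def short_string(b):
    """Demo-only encoder for strings shorter than 255 bytes."""
    return bytes([len(b)]) + b


# Hypothetical metadata: one global property and one subfile.
sample = (
    struct.pack("<I", 1)
    + short_string(b"CTime") + short_string(struct.pack("<I", 1200000000))
    + SUBFILE_FIXED.pack(32, 20, 0, 0x1234, bytes(16), 1)
    + short_string(b"Name") + short_string(b"notes.txt")
)
global_props, subfiles = parse_metadata(sample, subfile_num=1)
```

The subfile count is passed in separately because it is stored in Header2 (SubFileNum) and not in this section.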
CryptoTE currently uses the following encrypted global properties:
CTime | Creation Time of the container. Stored as 4-byte time_t value. |
MTime | Last Modification Time of the container. Stored as 4-byte time_t value. |
DefaultCompression | Default compression algorithm for new SubFiles. |
DefaultEncryption | Default encryption cipher for new SubFiles. |
FileListDisplayMode | Private binary structure used to save the display mode of the file list. |
FileListColumns | Private binary structure used to save the currently displayed columns in report file list mode. |
CryptoTE currently uses the following SubFile properties:
Name | Filename as displayed in the file list. Note that this is just a property and not a lookup key. Thus file names are not required to be unique and cannot be searched for directly. |
CTime | Creation Time of the SubFile. Stored as 4-byte time_t value. |
MTime | Last Modification Time of the SubFile. Stored as 4-byte time_t value. |
Filetype | Currently either "text" or anything else. If it is "text" the SubFile is opened using the text editor page, otherwise it is shown using a simple hexdump. |
Author | User-defined string in properties. Initialized with the login name upon SubFile creation. |
Subject | User-defined string in properties. |
Description | User-defined multi-line string in properties. |
WTextPageSettings | Private binary structure used by the text editor page to save various display options like line-wrapping and line-numbers. |
Finally, after all the headers and metadata, the actual SubFile data is located. Each SubFile's storage length, encryption cipher and compression algorithm are defined in the compressed and encrypted metadata section. Thus no additional structuring is required. All SubFiles are simply stored concatenated.
To locate a specific SubFile's data, it is necessary to know the offset at which the SubFile Data section begins. The StorageSize fields of all preceding SubFiles must then be added to this offset.
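The offset calculation is a running sum over the StorageSize fields. A minimal Python sketch follows. The function name subfile_offsets is hypothetical, and the start of the SubFile Data section is taken as a parameter because it depends on the sizes of all preceding sections.

```python
def subfile_offsets(data_start: int, storage_sizes: list) -> list:
    """Return the file offset of each SubFile's stored data.

    data_start is the offset where the SubFile Data section begins;
    storage_sizes holds the StorageSize field of every SubFile in order.
    """
    offsets = []
    pos = data_start
    for size in storage_sizes:
        offsets.append(pos)
        pos += size  # the next SubFile starts right after this one
    return offsets


# Hypothetical container: data section at offset 1000, three SubFiles.
offsets = subfile_offsets(1000, [32, 48, 16])
```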