https://github.com/facebook/rocksdb/wiki/RocksDB-Overview

SSTable Format V0

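The SST is read from bottom to top: a reader starts with the Offset of SsTableInfoT stored in the last 4 bytes of the file, uses it to locate SsTableInfoT, and then follows the offsets in SsTableInfoT to the bloom.Filter and the SsTableIndexT.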

+-----------------------------------------------+  
|               SSTable                         |  
+-----------------------------------------------+  
|  +-----------------------------------------+  |  
|  |  List of Blocks                         |  |  
|  |  +-----------------------------------+  |  |  
|  |  |  block.Block                      |  |  |
|  |  |  +-------------------------------+|  |  |  
|  |  |  |  List of KeyValue pairs        |  |  |  
|  |  |  |  +---------------------------+ |  |  |  
|  |  |  |  |  Key Length (2 bytes)     | |  |  |  
|  |  |  |  |  Key                      | |  |  |  
|  |  |  |  |  Value Length (4 bytes)   | |  |  |  
|  |  |  |  |  Value                    | |  |  |  
|  |  |  |  +---------------------------+ |  |  |  
|  |  |  |  ...                           |  |  |  
|  |  |  +-------------------------------+|  |  |  
|  |  |  |  Offsets for each Key          |  |  |  
|  |  |  |  (n * 2 bytes)                 |  |  |  
|  |  |  +-------------------------------+|  |  |  
|  |  |  |  Number of Offsets (2 bytes)   |  |  |  
|  |  |  +-------------------------------+|  |  |  
|  |  |  |  Checksum (4 bytes)            |  |  |  
|  |  |  +-------------------------------+|  |  |  
|  |  +-----------------------------------+  |  |  
|  |  ...                                    |  |  
|  +-----------------------------------------+  |  
|                                               |  
|  +-----------------------------------------+  |  
|  |  bloom.Filter (if MinFilterKeys met)    |  |
|  +-----------------------------------------+  |  
|                                               |  
|  +-----------------------------------------+  |  
|  |  flatbuf.SsTableIndexT                  |  |
|  |  (List of Block Offsets)                |  |  
|  |  - Block Offset (Start of Block)        |  |  
|  |  - FirstKey of this Block               |  |  
|  |  ...                                    |  |  
|  +-----------------------------------------+  |  
|                                               |  
|  +-----------------------------------------+  |  
|  |  flatbuf.SsTableInfoT                   |  |
|  |  - FirstKey of the SSTable              |  |  
|  |  - Offset of bloom.Filter               |  |
|  |  - Length of bloom.Filter               |  |
|  |  - Offset of flatbuf.SsTableIndexT      |  |
|  |  - Length of flatbuf.SsTableIndexT      |  |
|  |  - The Compression Codec                |  |  
|  +-----------------------------------------+  |  
|  |  Checksum of SsTableInfoT (4 bytes)     |  |  
|  +-----------------------------------------+  |  
|                                               |  
|  +-----------------------------------------+  |  
|  |  Offset of SsTableInfoT (4 bytes)       |  |  
|  +-----------------------------------------+  |  
+-----------------------------------------------+

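KeyValue encoding for V0

| uint16       | uint16       | []byte    | uint64 | uint8 | int64    | int64     | uint32   | []byte |
|--------------|--------------|-----------|--------|-------|----------|-----------|----------|--------|
| KeyPrefixLen | KeySuffixLen | KeySuffix | seq    | flags | expireAt | createdAt | valueLen | value  |

Tombstone format

| uint16       | uint16       | []byte    | uint64 | uint8 | int64     |
|--------------|--------------|-----------|--------|-------|-----------|
| KeyPrefixLen | KeySuffixLen | KeySuffix | seq    | flags | createdAt |

| Field        | Type   | Description                                              |
|--------------|--------|----------------------------------------------------------|
| KeyPrefixLen | uint16 | Length of the key prefix                                 |
| KeySuffixLen | uint16 | Length of the key suffix                                 |
| KeySuffix    | []byte | Suffix of the key                                        |
| seq          | uint64 | Sequence number                                          |
| flags        | uint8  | Flags of the row                                         |
| expireAt     | int64  | Optional; present only when flags & FlagHasExpire is set |
| createdAt    | int64  | Optional; present only when flags & FlagHasCreate is set |
| valueLen     | uint32 | Length of the value                                      |
| value        | []byte | Value bytes                                              |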

NOTE: Both expireAt and createdAt are Unix epoch timestamps.
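
A minimal sketch of how a row in the layout above could be encoded and decoded. Several details are assumptions rather than part of this document: the byte order (big-endian here), the flag bit values, the name FlagTombstone, and the Row struct itself are all hypothetical. Only FlagHasExpire and FlagHasCreate are named in the field table, and their values are not given.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical flag bits; the real values are not given in this document.
const (
	FlagTombstone uint8 = 1 << iota
	FlagHasExpire
	FlagHasCreate
)

// Row is an illustrative in-memory form of one encoded entry.
type Row struct {
	KeyPrefixLen        uint16
	KeySuffix           []byte
	Seq                 uint64
	Flags               uint8
	ExpireAt, CreatedAt int64
	Value               []byte
}

// encodeRow writes the fields in table order (big-endian is an assumption).
func encodeRow(r Row) []byte {
	var b []byte
	b = binary.BigEndian.AppendUint16(b, r.KeyPrefixLen)
	b = binary.BigEndian.AppendUint16(b, uint16(len(r.KeySuffix)))
	b = append(b, r.KeySuffix...)
	b = binary.BigEndian.AppendUint64(b, r.Seq)
	b = append(b, r.Flags)
	tombstone := r.Flags&FlagTombstone != 0
	if !tombstone && r.Flags&FlagHasExpire != 0 {
		b = binary.BigEndian.AppendUint64(b, uint64(r.ExpireAt))
	}
	if r.Flags&FlagHasCreate != 0 {
		b = binary.BigEndian.AppendUint64(b, uint64(r.CreatedAt))
	}
	if !tombstone { // tombstones carry no valueLen or value
		b = binary.BigEndian.AppendUint32(b, uint32(len(r.Value)))
		b = append(b, r.Value...)
	}
	return b
}

// decodeRow reverses encodeRow, checking bounds before every read.
func decodeRow(b []byte) (Row, error) {
	var r Row
	errShort := errors.New("row truncated")
	if len(b) < 4 {
		return r, errShort
	}
	r.KeyPrefixLen = binary.BigEndian.Uint16(b)
	suffixLen := int(binary.BigEndian.Uint16(b[2:]))
	b = b[4:]
	if len(b) < suffixLen+9 { // suffix + seq (8 bytes) + flags (1 byte)
		return r, errShort
	}
	r.KeySuffix = b[:suffixLen]
	r.Seq = binary.BigEndian.Uint64(b[suffixLen:])
	r.Flags = b[suffixLen+8]
	b = b[suffixLen+9:]
	tombstone := r.Flags&FlagTombstone != 0
	if !tombstone && r.Flags&FlagHasExpire != 0 {
		if len(b) < 8 {
			return r, errShort
		}
		r.ExpireAt = int64(binary.BigEndian.Uint64(b))
		b = b[8:]
	}
	if r.Flags&FlagHasCreate != 0 {
		if len(b) < 8 {
			return r, errShort
		}
		r.CreatedAt = int64(binary.BigEndian.Uint64(b))
		b = b[8:]
	}
	if tombstone {
		return r, nil
	}
	if len(b) < 4 {
		return r, errShort
	}
	n := int(binary.BigEndian.Uint32(b))
	b = b[4:]
	if len(b) < n {
		return r, errShort
	}
	r.Value = b[:n]
	return r, nil
}

func main() {
	// 1700000000 is only an example timestamp; its unit is not specified here.
	put := encodeRow(Row{KeyPrefixLen: 3, KeySuffix: []byte("key1"), Seq: 7,
		Flags: FlagHasExpire, ExpireAt: 1700000000, Value: []byte("v")})
	del := encodeRow(Row{KeyPrefixLen: 3, KeySuffix: []byte("key1"), Seq: 8,
		Flags: FlagTombstone})
	row, err := decodeRow(put)
	fmt.Println(len(put), len(del), string(row.Value), err)
}
```

Because expireAt and createdAt are written only when their flags are set, the decoder has to read flags before it can know how many bytes follow. Only the key suffix is stored, so rebuilding the full key also needs the first KeyPrefixLen bytes from elsewhere, which this sketch leaves out.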

Trie-based SSTs

We may consider moving to a trie-based SST instead of using a bloom filter. See https://x.com/debasishg/status/1871091225383821622?t=PfOw7F5vE4SLwZhG5u-Heg&s=19 (PDF: https://t.co/4DENC3z0tk).

A trie implementation: https://github.com/dghubble/trie

I also developed a trie for Mailgun that I never used; I may want to resurrect it.