RLP (Recursive Length Prefix) serialization encodes arbitrary data structures into byte arrays using a deterministic set of rules. The encoding format is determined by examining the first byte of the encoded data. RLP treats all data as either byte strings or lists of byte strings. Integers are first converted to their byte representation before encoding. The full-length rules can be read from here.
For my own understanding, below is a distillation of the rules used to convert some data types into RLP format.
1. Integers
- For integers in range 0-127 (0x00-0x7f), the output is simply that byte.
- For integers > 127, encode as:
[0x80 + byte_length, ...bytes]
- Special case: 0 encodes as [0x80] (empty string)
Some examples of such an encoding of integer elements are:
Let's try to write some janky code to illustrate. The input is some Python
integer and the output is a hex string. zfill(length)
simply pre-fills the string with zeroes up to some fixed length. Eg.
'A'.zfill(3) = '00A'
.
Outputs running code above:
2. Strings
- For strings containing a single ASCII character (byte value ≤127), the output is that character's ASCII byte value.
- For strings 0-55 bytes long, encode as:
[0x80 + length, ...string_bytes]
.- For strings longer than 55 bytes, encode as:
[0xb7 + length_of_length, length, ...string_bytes]
.
For strings of length 0-55, 0x80
is used as it is
the first available value after the single-byte range (0x00-0x7f). For strings
longer than 55 bytes,
0xb7
is used to distinguish between a 55-byte string
(0xb7) and a 56-byte string (0xb8) just by looking at the first byte.
Now for some code to illustrate the rules above. ord(c)
changes the string character into a decimal number.
Some outputs:
3. Lists
Lastly, we will go through the rules for lists conversion.
- For the empty list, it is encoded as
0xc0
.- For lists with total payload ≤55 bytes, encode as:
[0xc0 + payload_length, ...encoded_items]
. Where we RLP encode each element in the list and concatenate their bytes at the end.- For lists with total payload >55 bytes, encode as:
[0xf7 + length_of_length, payload_length, ...encoded_items]
Finally, the lists code example.
Outputs:
Wrap-up
RLP's elegant design provides a deterministic way to serialize complex data structures into byte arrays. By using the first byte as a type indicator and length prefix, RLP enables efficient encoding and decoding while maintaining the ability to handle arbitrarily nested structures. This makes it particularly well-suited for blockchain applications like Ethereum, where consistent data serialization across distributed networks is crucial. Understanding these encoding rules helps developers work more effectively with Ethereum's underlying data structures and protocols.