Introduction

This document presents the BitMAC and StreamMAC message-authentication-code algorithms. BitMAC requires a one-time-pad to work, and assuming you never re-use your one-time pad, BitMAC is information-theoretically secure; but, is also slow. The improvements which lead to better performance require a much larger pad, and therefore we also introduce the more practical version of BitMAC, StreamMAC, which takes advantage of a stream cipher.

Alice and Bob

Alice wants to communicate with Bob over an insecure channel. Unfortunately for them, they live in a society where block ciphers, stream ciphers, and message-authentication codes are outlawed.

But, due to legal loopholes, xor operations are allowed, and thus so are one-time pads.

Alice and Bob have agreed that she will use the following one-time pad to send messages to Bob (shown in hexadecimal):

(A grid full of hexadecimal values should have loaded here. If you're using a phone, try viewing this page from a desktop browser instead.)

There’s still one problem, though: while one-time pads are excellent encryption devices, they are infamously malleable. An attacker who is able to manipulate bits on the channel between Alice and Bob can flip any chosen bit at will, and if the attacker happens to know exactly which bit might cause harm, they could launch a calculated attack.

How can Bob authenticate Alice’s messages?

BitMAC: One bit at a time

To use BitMAC, Alice will first need to think of the message(s) she sends to Bob as a stream of bits. She’ll also need to decide on the number of bits of security she wants: we’ll assume 128 bits of security.

To encrypt and authenticate the first bit (of her stream of messages), she’ll do the following:

If the first bit is $0$, then send Bob the first 128-bits of the pad
If the first bit is $1$, then send Bob the second 128-bits of the pad

Bob can decrypt/verify the first bit supplied by Alice as follows:

Bob will read 128 bits from Alice
He will perform a trial encryptiion for both possible cases: he’ll repeat Alice’s encryption for both the $0$ case and $1$ case
He’ll compare both cases to see which one matches what Alice sent

If what Bob received matches the first 128-bits of the pad, then he knows Alice has sent a $0$ bit. If what Bob received matches the second 128-bits of the pad, then he knows Alice has sent a $1$ bit. If neither case matches, then authentication has failed.

When Alice is ready to send her second bit, she can do the same thing as before, but this time, instead of using bits $0-255$, she uses the next group of 256 bits ($256-511$).

In summary:

To send bit number $i$ as a $0$, she sends bits $256i$ through $256i+127$.
To send bit number $i$ as a $1$, she sends bits $256i+128$ through $256i+255$.

Here’s an illustration: try typing a message below, and playing around with different security levels:

(An interactive grid full of hexadecimal values should have loaded here. If you're using a phone, try viewing this page from a desktop browser instead.)

This proves that authenticated encryption is possible using only a one-time pad. But, this protocol is ~128x slower and takes up ~128x as much space as as non-authenticated encryption. Can we make it a little more efficient?

StreamMAC

Let’s relax the rules, and assume that stream ciphers have been legalized. Stream ciphers are nothing but a virtual mapping from a space of keys to one-time pads (they also have the benefit of random access into the pad), so let’s apply the same ideas behind BitMAC to a stream cipher, then see what additional performance improvements we can make.

Alice and Bob will now share (instead of an entire one-time pad), a $key$. They’ll also agree on a stream cipher hash function, $SC$, which is called as follows:

$stream = SC(key,nonce)$

This stream cipher we’ll be using takes a $key$ (already known by both parties) and returns a virtual one-time pad of size $2^{262}$ bytes. Because this is too large to fit into memory, it also takes a $nonce$, which is a 256-bit number representing an offset into the pad—specifically, it represents an offset in multiples of 64 bytes. The stream cipher hash function also only returns 64 bytes: given a nonce, it will seek the chosen 64-byte-multiple-offset, and return only the first 64 bytes starting at that offset. (In fact, what I’ve described here is exactly how the xsalsa20 stream cipher works.)

We can implement one-bit-at-a-time authenticated encryption as follows:

$AE_{v1}(key, i, v) = SC(key, 2i+v)[0..15]$

Where:

$AE_{v1}(key, i, v)$ means we are encrypting the $i$th bit, and that the bit has a value of $v$.
$SC(key, i)$ means we are seeking the $i$th 64-byte block into the virtual one-time pad
$x[0..15]$ means we are slicing bytes from the byte array $x$ starting from the $0$-indexed byte (inclusive) to the $15$-indexed byte (inclusive)

Bob can decrypt and verify by receiving $ciphertext$ from Alice and checking both cases:

$ciphertext = SC(key, 2i+0)[0..15] \implies 0 \\ ciphertext = SC(key, 2i+1)[0..15] \implies 1 \\ otherwise \implies AuthenticationFailure $

This means, we’ve now applied BitMAC to a stream cipher. The advantage is that we can now carry a small $key$ instead of an extremely lengthy one-time pad. But, since we’re calling our stream cipher hash function ($SC$) for each bit (one at a time), in practice, performance is pretty bad. How can we do better?

One chunk at a time

To lessen the number of encryptions, Alice could use the same algorithm she’s been using, but with a new realization: instead of encrypting/authenticating a single bit at a time, she could encrypt/authenticate 4 bits at a time.

For example: when using (as we did before) 1-bit messages, each ciphertext which Bob receives from Alice has $2$ possible values, and so he only needs to compare each one against $2$ nonces. If we used, instead, 4-bit messages, then each message has $2^4$ possible values, and therefore Bob will need to compare each ciphertext against $2^4$ nonces.

The authenticated encryption function is then:

$ AE_{v2}(key, i, v) = SC(key,2^{4}i+v)[0..15] $

But why stop with 4-bit chunks? We could map chunks of:

4 bits to 2^4 nonces
8 bits to 2^8 nonces
16 bits to 2^16 nonces
32 bits to 2^32 nonces
64 bits to 2^64 nonces
128 bits to 2^128 nonces
192 bits to 2^192 nonces
256 bits to 2^256 nonces

The bigger the chunk, the less number of encryptions we’ll have to do, but the more nonce-space we’ll use up. If we want to allow the user to send up to $2^{64}$ chunks, then the largest chunk size we can choose is 192-bits; i.e.:

$ AE_{v3}(key, i, v) = SC(key,2^{192}i+v)[0..15] $

The reason is as follows: because our stream cipher hash function ($SC$) only accepts $2^{256}$ nonces, and because we want to allow up to $2^{64}$ chunks (0-indexed as $i$), then the nonce of the maximum-index, maximum-value, message will be:

$ 2^{192}i+v = 2^{192}(2^{64}-1)+(2^{192}-1) = 2^{256}-1 $

Great, so now we can send 24-byte (192-bit) messages (chunks). But, there’s one seemingly-critical problem is this: if Bob receives a 192-bit chunk of data, it will be difficult for him to verify the chunk using “encryption trials” as he did before. With the “one bit at a time” strategy, there were only two potential values Alice could have encrypted each bit with, so he was able to trivially check both cases. Now, for each chunk of data, there are $2^{192}$ possible nonces he’d need to check, which is obviously infeasible.

How do we solve this?

Include two copies

Fortunately, there is a simple solution: with each message, Alice will include two copies: the first copy is authenticated (using the strategy we’ve just described) and the second copy is encrypted with our stream cipher (but not authenticated).

So now, our formulas are:

$ \begin{align*} AE_{v4}(key,i,v) & = A(key,i,v) || E(key,i,v) \\ A(key,i,v) & = SC(key,2^{192}i+v)[0..15] \end{align*} $

This solves the previous problem, but introduces a new problem: we’ve already reserved the entire nonce-space for our authentication ($A$) function, and, since we need to encrypt as a separate operation, we don’t have any nonces left. For now, we’ll solve this by using two keys: an encryption key ($key_e$) and an authentication key ($key_a$):

$ \begin{align*} AE_{v5}(key_a,key_e,i,v) & = A(key_a,i,v) || E(key_e,i,v) \\ A(key_a,i,v) & = SC(key_a,2^{192}i+v)[0..15] \end{align*} $

With our new chunk-by-chunk algorithm: our performance has improved greatly: to authenticate and encrypt $n$ bytes:

Authentication will require $\frac{n}{24}$ calls to the stream cipher hash function
Encryption will require $\frac{n}{64}$ calls to the stream cipher hash function
Our ciphertext will be $n + \frac{16}{24}n$ bytes long

Can we do any better?

Collapse the authenticators

With the system explained so far, if Alice wants to send multiple 24-byte chunks, she’ll need to send a 16-byte authenticator along with each chunk. This multiplies our network bandwidth usage by $1+\frac{16}{24}=1.66$.

Here’s how we can make this better: when sending multiple chunks, simply xor the authenticators together. This allows you to send several chunks (of 24 bytes each) with a single 16-byte authenticator.

Before, our $AE$ function processed one 16-byte chunk, encrypted and authenticated it. But, since we are now authenticating multiple chunks at once, we’ll modify our $AE$ function to take a stream of bytes, split it into chunks of 16 bytes, then concatenate the result of both encryption and authentication of each chunk:

$ \begin{align*} AE(key,i,bytes) = \ \ & bytes.split(64).map(v \rightarrow E(key,i,v)) \\ & || \ \ bytes.split(24).fold(v \rightarrow ⊕,A(key,i,v)) \\ \end{align*} $

Where:

$||$ denotes concatenation,
$split(n)$ denotes splitting the stream of bytes into a stream of n-byte chunks

After making this change, our ciphertext will be $n + 16$ bytes long, where $n$ is the number of bytes in the original plaintext.

We’ve made progress, but we still have one nagging problem: our algorithm takes two keys. Can we fix that?

Back to one key

Earlier, we decided to use two keys because our authenticator key ($key_a$) needs to use up to $2^{64}2^{192}=2^{256}$ nonces, and therefore there was no room left for the encryption key ($key_e$). But, there’s a trick we can use to fix this.

When our authenticator function ($A$) calls our stream cipher hash function ($SC$) for a given nonce, the $SC$ function actually returns $64$ bytes, but we’re only ever using the first $16$ bytes. In other words, the $SC$ function can, in fact, return $64 * 2^{256}=2^{262}$ bytes, but our authenticator only needs $16*2^{256}=2^{260}$ of the bytes. This means there are $2^{262}-2^{260} \approx 2^{261}$ bytes available for the encryption function to use.

We can take advantage of this by doing the following:

Split the nonce-space in half: $$ \begin{align*} [0..2^{255}-1] & \rightarrow E \\ [2^{255}..2^{256}-1] & \rightarrow A \end{align*} $$
If the authentication function ($A$) is asked for a nonce in the range:
1. $[2^{255},2^{256}-1]$, it will use the bytes $[0..15]$ returned by by $SC$.
2. $[0,2^{255}-1]$, it will add $2^{255}$ to the nonce and use bytes $[16..31]$ returned by SC.

In other words, $A$ now becomes:

$ \begin{align*} A(key,i,v) & = SC(key,n_a(i,v))[(0+o_a(i,v))..(15+o_a(i,v))] \\ n_a(i,v) & = ((2^{192}i+v) \mod 2^{255}) + 2^{255} \\ o_a(i,v) & = 16 \lfloor \frac{2^{192}i+v}{2^{255}} \rfloor \end{align*} $

Conclusion

The final version of StreamMAC is defined as:

$ \begin{align*} AE(key,i,bytes) = \ \ & bytes.split(64).map(v \rightarrow E(key,i,v)) \\ & || \ \ bytes.split(24).fold(v \rightarrow ⊕,A(key,i,v)) \\ AE(key,i,v) & = A(key,i,v) || E(key,i,v) \\ A(key,i,v) & = SC(key,n_a(i,v))[(0+o_a(i,v))..(15+o_a(i,v))] \\ n_a(i,v) & = ((2^{192}i+v) \mod 2^{255}) + 2^{255} \\ o_a(i,v) & = 16 \lfloor \frac{2^{192}i+v}{2^{255}} \rfloor \end{align*} $

And has the following performance metrics for an $n$-byte plaintext:

Authentication will require $\lceil \frac{n}{24} \rceil$ calls to the stream cipher hash function
Encryption will require $\lceil \frac{n}{64} \rceil$ calls to the stream cipher hash function
Our ciphertext will be $n + 16$ bytes long

Thus, StreamMAC can authenticate and encrypt an $n$-byte plaintext at a speed $\lceil \frac{n}{24} \rceil + \lceil \frac{n}{64} \rceil \approx 2.66$ times slower than the underlying stream cipher. Also, its security margin is optimal, requiring an attacker to forge an average of $\frac{2^{128}}{2}$ authentication tags before succeeding, assuming an ideal underlying stream cipher.