// ------------------------------------------------------------

63. Key-Recovery Attacks on GCM with Repeated Nonces

GCM is the most widely deployed block cipher mode for authenticated
encryption with associated data (AEAD). It's basically just CTR mode
with a weird MAC function wrapped around it. The MAC function works by
evaluating a polynomial over GF(2^128).

Remember how much trouble a repeated nonce causes for CTR mode
encryption? The same thing is true here: an attacker can XOR
ciphertexts together and recover plaintext using statistical methods.

But there's an even more devastating consequence for GCM: it leaks the
authentication key immediately!

Here's the high-level view:

1. The GCM MAC function (GMAC) works by building up a polynomial whose
   coefficients are the blocks of associated data (AD), the blocks of
   ciphertext (C), a block encoding the length of AD and C, and a
   block used to "mask" the output. Sort of like this:

       AD*y^3 + C*y^2 + L*y + S

   To calculate the MAC, we plug in the authentication key for y and
   evaluate.

2. AD, C, and their respective lengths are known. For a given message,
   the attacker knows everything about the MAC polynomial except the
   masking block

3. The masking block is generated using only the key and the nonce. If
   the nonce is repeated, the mask is the same. If we can collect two
   messages encrypted under the same nonce, they'll have used the same
   mask.

4. In this field, addition is XOR. We can XOR our two messages
   together to "wash out" the mask and recover a known polynomial with
   the authentication key as a root. We can factor the polynomial to
   recover the authentication key immediately.

The last step probably feels a little magical, but you don't actually
need to understand it to implement the attack: you can literally just
plug the right values into a computer algebra system like SageMath and
hit "factor".

But that's not satisfying. You didn't come this far to beat the game
on Easy Mode, did you?

I didn't think so. Now, let's dig into that MAC function. Like I said,
a polynomial over GF(2^128).

So far, all the fields we've worked with have been of prime size p,
i.e. GF(p). It turns out we can construct GF(q) for any q = p^k for
any positive integer k. GF(p) = GF(p^1) is just one form that's common
in cryptography. Another is GF(2^k), and in this case we have
GF(2^128).

For GF(2^k), we'll represent its elements as polynomials with
coefficients in GF(2). That just means each coefficient will be 0 or
1. Before we start talking about a particular field (i.e. a particular
choice of k), let's just talk about these polynomials. Here are some
of them:

                    0
                    1
                x
                x + 1
          x^2
          x^2     + 1
          x^2 + x
          x^2 + x + 1
    x^3
    x^3           + 1
    x^3       + x
    x^3       + x + 1
    x^3 + x^2
    x^3 + x^2     + 1
    x^3 + x^2 + x
    x^3 + x^2 + x + 1
    ...

And so forth.

If you squint a little they look like the binary expansions of the
integers counting up from zero. This is convenient because it gives us
an obvious choice of representation in unsigned integers.

Now we need some primitive functions for operating on these
polynomials. Let's tool up:

1. Addition and subtraction between GF(2) polynomials are really
   simple: they're both just the XOR function.

2. Multiplication and division are a little trickier, but they both
   just approximate the algorithms you learned in grade school. Here
   they are:

       function mul(a, b):
            p := 0

            while a > 0:
                if a & 1:
                    p := p ^ b

                a := a >> 1
                b := b << 1

            return p

        function divmod(a, b):
            q, r := 0, a

            while deg(r) >= deg(b):
                d := deg(r) - deg(b)
                q := q ^ (1 << d)
                r := r ^ (b << d)

            return q, r

   deg(a) is a function returning the degree of a polynomial. For the
   polynomial x^4 + x + 1, it should return 4. For 1, it should return
   0. For 0, it should return some negative value.

Now that we have a small nucleus of functions to work on polynomials
in GF(2), let's see how we can use them to represent elements in
GF(2^k). To be concrete, let's say k = 4.

Our set of elements is the same one enumerated above:

                    0
                    1
                x
                x + 1
          x^2
          x^2     + 1
          x^2 + x
          x^2 + x + 1
    x^3
    x^3           + 1
    x^3       + x
    x^3       + x + 1
    x^3 + x^2
    x^3 + x^2     + 1
    x^3 + x^2 + x
    x^3 + x^2 + x + 1

Addition and subtraction are unchanged. We still just XOR elements
together.

Multiplication is different. As with fields of size p, we need to
perform modular reductions after each multiplication to keep our
elements in range. Our modulus will be x^4 + x + 1. If that seems a
little arbitrary, it is - we could use any fourth-degree monic
polynomial that's irreducible over GF(2). An irreducible polynomial is
sort of analogous to a prime number in this setting.

Here's a naive modmul:

    function modmul(a, b, m):
        p := mul(a, b)
        q, r := divmod(p, m)
        return r

In practice, we'll want to be more efficient. So we'll interleave the
steps of the multiplication with steps of the reduction:

    function modmul(a, b, m):
        p := 0

        while a > 0:
            if a & 1:
                p := p ^ b

            a := a >> 1
            b := b << 1

            if deg(b) = deg(m):
                b := b ^ m

        return p

You can implement both versions to prove to yourself that the output
is the same.

Division is also different. Remember that in fields of size p we
defined it as multiplication by the inverse. So you'll need to write a
modinv function. It should be pretty easy to translate your existing
integer modinv function. I'll leave that to you.

You may find yourself in want of other functions you take for granted
in the integer setting, e.g. modexp. Most of these should have
straightforward equivalents in our polynomial setting. Do what you
need to.

Okay, now that you are the master of GF(2^k), we can finally talk
about GCM. Like I said (many words ago): CTR mode for encryption,
weird MAC in GF(2^128).

Here's the modulus for that field:

    x^128 + x^7 + x^2 + x + 1

The size of this field was chosen very specifically to match up with
the width of a 128-bit block cipher. We can convert a block into a
field element trivially; the leftmost bit is the coefficient of x^0,
and so on.

I described the MAC at a very high level. Here's a more detailed view:

1. Take your AES key and use it to encrypt a block of zeros:

       h := E(K, 0)

   h is your authentication key. Convert it into a field element.

1. Zero-pad the bytes of associated data (AD) to be divisible by the
   block length. If it's already aligned on a block, leave it
   alone. Do the same with the ciphertext. Chain them together so you
   have something like:

       a0 || a1 || c0 || c1 || c2

2. Add one last block describing the length of the AD and the length
   of the ciphertext. Original lengths, not padded lengths; bit
   lengths, not byte lengths. Like this:

       len(AD) || len(C)

3. Take h and your string of blocks and do this:

       g := 0
       for b in bs:
           g := g + b
           g := g * h

   Convert the blocks into field elements first, of course. The
   resulting value of g is a keyed hash of the input blocks.

4. GCM takes a 96-bit nonce. Do this with it:

       s := E(K, nonce || 1)
       t := g + s

   Conceptually, we're masking the hash with a nonce-derived secret
   value. More on that later.

   t is your tag. Convert it back to a block and ship it.

Implement GCM. Use AES-128 as your block cipher. You can probably
reuse whatever you had before for CTR mode. The important new thing to
implement here is the MAC. The above description is brief and
informal; check out the spec for the finer points. Since you've
already got the tools for working in GF(2^k), this shouldn't take too
long.

Okay. Let's rethink our view of the MAC. We'll use our example payload
from above. Here it is:

    t = (((((((((((h * a0) + a1) * h) + c0) * h) + c1) * h) + c2) * h) + len) * h) + s

Kind of a mouthful. Let's rewrite it:

    t = a0*h^6 + a1*h^5 + c0*h^4 + c1*h^3 + c2*h^2 + len*h + s

In other words, we calculate the MAC by constructing this polynomial:

    f(y) = a0*y^6 + a1*y^5 + c0*y^4 + c1*y^3 + c2*y^2 + len*y + s

And computing t = f(h).

Remember: as the attacker, we don't know that whole polynomial. We
know all the AD and ciphertext coefficients, and we know t = f(h), but
we don't know the mask s.

What happens if we repeat a nonce? Let's posit this additional payload
encrypted under the same nonce:

    b0 || d0 || d1

That's one block of AD and two blocks of ciphertext. The MAC will look
like this:

    t = b0*h^4 + d0*h^3 + d1*h^2 + len*h + s

Let's put them side by side (and rewrite them a little):

    t0 = a0*h^6 + a1*h^5 + c0*h^4 + c1*h^3 + c2*h^2 + l0*h + s
    t1 =                   b0*h^4 + d0*h^3 + d1*h^2 + l1*h + s

See how the s masks are identical? They depend only on the nonce and
the encryption key. Since addition is XOR in our field, we can add
these two equations together and that mask will wash right out:

    t0 + t1 = a0*h^6 + a1*h^5 + (c0 + b0)*h^4 + (c1 + d0)*h^3 +
              (c2 + d1)*h^2 + (l0 + l1)*h

Finally, we'll collect all the terms on one side:

    0 = a0*h^6 + a1*h^5 + (c0 + b0)*h^4 + (c1 + d0)*h^3 +
        (c2 + d1)*h^2 + (l0 + l1)*h + (t0 + t1)

And rewrite it as a polynomial in y:

    f(y) = a0*y^6 + a1*y^5 + (c0 + b0)*y^4 + (c1 + d0)*y^3 +
           (c2 + d1)*y^2 + (l0 + l1)*y + (t0 + t1)

Now we know a polynomial f(y), and we know that f(h) = 0. In other
words, the authentication key is a root. That means all we have to do
is factor the equation to get an extremely short list of candidates
for the authentication key. This turns out not to be so hard, but we
will need some more tools.

First, we need to be able to operate on polynomials with coefficients
in GF(2^128).

(Don't get confused: before we used polynomials with coefficients in
GF(2) to represent elements in GF(2^128). Now we're building on top of
that to work with polynomials with coefficients in GF(2^128).)

The simplest representation is probably just an array of field
elements. The algorithms are all going to be basically the same as
above, so I'm not going to reiterate them here. The only difference is
that you will need to call your primitive functions for GF(2^k)
polynomials in place of your language's built-in arithmetic operators.

With that out of the way, let's get factoring. Factoring a polynomial
over a finite field means separating it out into smaller polynomials
that are irreducible over the field. Remember that irreducible
polynomials are sort of like prime numbers.

To factor a polynomial, we proceed in three (well, four) phases:

0. As a preliminary step, we need to convert our polynomial to a monic
   polynomial. That just means the leading coefficient is 1. So take
   your polynomial and divide it by the coefficient of the leading
   term. You can save the coefficient as a degree zero factor if you
   want, but it's not really important for our purposes.

1. This is the real first step: we perform a square-free
   factorization. We find any doubled factors (i.e. "squares") and
   split them out.

2. Next, We take each of our square-free polynomials and find its
   distinct-degree factorization. This separates out polynomials that
   are products of smaller polynomials of equal degree. So if our
   polynomial has three irreducible factors of degree four, this will
   separate out a polynomial of degree twelve that is the product of
   all of them.

3. Finally, we take each output from the last step and perform an
   equal-degree factorization. This is pretty much like it sounds. In
   that last example, we'd take that twelfth-degree polynomial and
   factor it into its fourth-degree components.

Square-free factorization and distinct-degree factorization are both
easy to implement. Just find them on Wikipedia and go to town.

I want to focus on equal-degree factorization. Meet Cantor-Zassenhaus:

    function edf(f, d):
        n := deg(f)
        r := n / d
        S := {f}

        while len(S) < r:
            h := random_polynomial(1, f)
            g := gcd(h, f)

            if g = 1:
                g := h^((q^d - 1)/3) - 1 mod f

            for u in S:
                if deg(u) = d:
                    continue

                if gcd(g, u) =/= 1 and gcd(g, u) =/= u:
                    S := union(S - {u}, {gcd(g, u), u / gcd(g, u)})

        return S

It's kind of brilliant.

Remember earlier that we said a finite field of size p^k can be
represented as a polynomial in GF(p) modulo any monic, irreducible
degree-k polynomial? Take a moment to convince yourself that a field
of size q^d can be represented by polynomials in GF(q) modulo a
polynomial of degree d.

f is the product of r polynomials of degree d. Each of them is a valid
modulus for a finite field of size q^d. And each field of that size
contains a multiplicative group of size q^d - 1. And since q^d - 1 is
always divisible by 3 (in our case), each group of that size has a
subgroup of size 3. It contains the multiplicative identity (1) and
two other elements.

We have a simple trick for forcing elements into that subgroup: simply
raise them to the exponent (q^d - 1)/3. When we force our random
element into that subgroup, there's a 1/3 chance we'll land on 1. This
means that when we subtract 1, there's a 1/3 chance we'll be sitting
on 0.

The only hitch is that we don't know what these moduli are. But we
don't need to! Since f is their product, we can perform these
operations mod f and implicitly apply the Chinese Remainder Theorem.

So we do it: generate some polynomial, raise it to (q^d - 1)/3, and
subtract 1. (All modulo f, of course.) Compute the GCD of this
polynomial and each of our remaining composites. For any given
remaining factor, there's a 1/3 chance our polynomial is a
multiple. In other words, our factors should reveal themselves pretty
quickly.

Just keep doing this until the whole thing is factored into
irreducible parts.

Once the polynomial is factored, we can pick out the authentication
key at our leisure. There will be at least one first-degree
factor. Like this:

    y + c

Where c is a constant element of GF(2^128). It's also a candidate for
the key. It's possible you'll end up with a few first-degree factors
like this. The key will be one of them.

If you do have more than one candidate, there are two ways to narrow
the list:

1. Recover another pair of messages encrypted under the same
   nonce. Perform the factorization again and identify the common
   factors. The key will probably be the only one.

2. Just attempt a forgery with each candidate. This is probably
   easier.