// ------------------------------------------------------------
66. Exploiting Implementation Errors in Diffie-Hellman
Most of the problems we've exploited have been application- or
protocol-layer errors of design. But cryptographic primitives can
suffer from implementation errors too!
One seldom-seen (but nevertheless deadly!) class of implementation
error is the notorious carry bug. It goes like this:
1. Most public-key (and some symmetric) cryptographic primitives take
advantage of bignum arithmetic.
2. Bignums are big. They can be thousands of bits long. They do not
fit in machine registers.
3. To compute the correct results, implementations are forced to break
up bignums into pieces (henceforth "limbs") that will fit in
registers.
4. Just like long addition in grade school, intermediate results are
computed one column (pair of limbs) at a time.
5. Just like long addition in grade school, intermediate results can
overflow. The implementation must track and apply all these carry
bits with utmost care.
6. Finally, the implementation has to stitch all the limbs back into a
coherent result.
There are opportunities for errors everywhere, but step 5 is
especially pernicious. Carry bugs can be difficult to detect, since
the incidence of random faults might be 2^-64 or worse. And because
these bugs are so implementation-dependent, standard test vectors,
even those meant to exercise pathological cases, typically are not
very good at finding these flaws.
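To make the failure mode concrete, here's a toy limb-based adder (a sketch; the names and the 8-bit limb size are my own choices, and real implementations use 32- or 64-bit limbs):

```python
LIMB_BITS = 8
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(n, count):
    """Break n into `count` little-endian limbs (step 3)."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]

def from_limbs(limbs):
    """Stitch the limbs back into a coherent result (step 6)."""
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs))

def limb_add(a, b):
    """Column-by-column addition, propagating carries (steps 4 and 5)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS   # drop or misplace this line and you have a carry bug
    out.append(carry)
    return out
```

Note how rarely the buggy path would even execute: if the carry handling is wrong, only inputs that actually overflow a column produce a bad result, which is exactly why random test vectors miss these flaws.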
Ladies and gentlemen, cryptopals, please: a moment of silence for the
crypto implementers.
*Sheds a single tear*
Okay, that's enough sympathy. Now let's find a nontrivial homomorphism
from their sorrow to the profit domain.
Let's target Diffie-Hellman, the bread-and-butter public-key primitive
of cryptographic protocols. We'll stick with EC notation. Here's the
high-level view of scalar multiplication:
function scalarmult(Q, k):
    R := Q
    # assume n-bit k with k[1] = 1
    # iterate bits from high to low, skip first bit
    for b in bits(k)[2..n]:
        R := add(R, R)
        if b = 1:
            R := add(R, Q)
    return R
In ECDH, Q is an attacker-supplied public point, and k is our secret
scalar. For simplicity, let's suppose k has a fixed bit length. (This
is often true in DH implementations anyway.) We number the bits of k
from 1 to n with k[1] the most-significant bit and k[n] the
least-significant bit.
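In Python, with the group law abstracted behind an add() callback, the loop might look like this (a sketch; passing integer addition in place of the EC addition law is a handy way to sanity-check the control flow, since then scalarmult(1, k) should return k):

```python
def scalarmult(Q, k, add):
    """Left-to-right double-and-add; assumes the top bit of k is set."""
    R = Q
    # iterate the bits of k from high to low, skipping the leading 1
    for i in range(k.bit_length() - 2, -1, -1):
        R = add(R, R)          # unconditional doubling
        if (k >> i) & 1:
            R = add(R, Q)      # conditional add of the base point
    return R
```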
We need to hide some low-level bug in this high-level logic. We could
go through the exercise of implementing multiple-precision arithmetic
with some subtle-but-realistic carry bug embedded deep within the
logic, but that would be incredibly tedious, and it would only
obfuscate what is (at heart) a beautifully simple attack.
Instead, let's cheat:
1. If you worked through the previous ECDH problems, you should be
able to borrow code from your existing implementation. If not, gin
one up with the standard EC addition laws from Wikipedia.
2. Replace add() with faultyadd(), which should look something like
this:
function faultyadd(Q1, Q2):
    if fault(Q1, Q2):
        raise
    return add(Q1, Q2)
3. fault() should just hash the points and compare the result to some
sentinel value. I did something like this:

function fault(Q1, Q2):
    return (Q1.x * Q2.x) % p = 0

With p tuning the fault rate: a fault fires with probability roughly
1/p.
It doesn't really matter what fault() is, but it must be
deterministic. Just make sure it's not too slow, since it's going to
get called 2n times per scalar multiplication. And make sure you can
adjust p easily, for your own sanity.
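As a concrete sketch (the Fault class, P_FAULT knob, and (x, y) tuple representation are my own choices, not prescribed by the challenge):

```python
class Fault(Exception):
    """Signals a simulated carry bug during point addition."""

P_FAULT = 1000   # tune this: a fault fires for roughly 1 in P_FAULT calls

def fault(Q1, Q2):
    # Deterministic in the inputs, cheap, and easy to tune.
    # Points are (x, y) tuples here.
    return (Q1[0] * Q2[0]) % P_FAULT == 0

def faultyadd(Q1, Q2, add):
    """Wrap a real group law `add` with the simulated fault."""
    if fault(Q1, Q2):
        raise Fault
    return add(Q1, Q2)
```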
Before we plan the attack, it helps to build some intuition for the DH
implementation above.
Write yourself a little tracer that steps through a scalar
multiplication and, at each call to add, prints the coefficients c and
d of Q in each call add(cQ, dQ). Feel free to add whatever debug
outputs make sense.
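One convenient way to write the tracer (a sketch of my own): run the loop over coefficients rather than points, so that add(cQ, dQ) is just bookkeeping on small integers:

```python
def trace(k):
    """Print and return the coefficients (c, d) of each call add(cQ, dQ)."""
    n = k.bit_length()
    print(f"# k = {k:b}")
    calls = []
    R = 1                          # coefficient of the accumulator
    for i in range(2, n + 1):      # 1-based bit index from the MSB
        b = (k >> (n - i)) & 1
        print(f"# i = {i}, b = {b}")
        calls.append((R, R))       # unconditional doubling: add(RQ, RQ)
        print(f"add({R}Q, {R}Q)")
        R += R
        if b:
            calls.append((R, 1))   # conditional add of the base point
            print(f"add({R}Q, 1Q)")
            R += 1
    return calls
```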
For example, with n = 6, trace(58) gives me:
# k = 111010
# i = 2, b = 1
add(1Q, 1Q)
add(2Q, 1Q)
# i = 3, b = 1
add(3Q, 3Q)
add(6Q, 1Q)
# i = 4, b = 0
add(7Q, 7Q)
# i = 5, b = 1
add(14Q, 14Q)
add(28Q, 1Q)
# i = 6, b = 0
add(29Q, 29Q)
And here's trace(62):
# k = 111110
# i = 2, b = 1
add(1Q, 1Q)
add(2Q, 1Q)
# i = 3, b = 1
add(3Q, 3Q)
add(6Q, 1Q)
# i = 4, b = 1
add(7Q, 7Q)
add(14Q, 1Q)
# i = 5, b = 1
add(15Q, 15Q)
add(30Q, 1Q)
# i = 6, b = 0
add(31Q, 31Q)
Play around with different inputs until you get an intuitive feel for
it.
Here are some things to note:
1. The coefficients of Q in each call to add() are a function of the
secret scalar alone. This implies another model for the secret: a unique
sequence of group operations. While the attacker chooses the input
point, the victim alone chooses the sequence of operations on it.
2. For any given k, each pair of inputs to add() is unique. That is,
we never repeat a (Q1, Q2) pair. This is because the accumulator's
coefficient at least doubles in each iteration, so it strictly grows.
3. In each iteration, the inputs to add() depend on the current bit
and all preceding bits. The succeeding bits are not considered
until later. The two examples above differ only in k[4]. Notice
that they encode an identical sequence of operations before
diverging forever when i = 4.
Another way to put this is that the bits of k encode a binary decision
tree:
                 add(1Q, 1Q)
                /---/   \---\
            b = 0         b = 1
       add(2Q, 2Q)     add(2Q, 1Q)
                       add(3Q, 3Q)
        /     \           /     \
    b = 0    b = 1    b = 0    b = 1
add(4Q, 4Q) add(4Q, 1Q) add(6Q, 6Q) add(6Q, 1Q)
            add(5Q, 5Q)             add(7Q, 7Q)
                      ...
And so on.
Now, let's attack. We know k[1] = 1 by definition. Let's consider
k[2]. By inspecting the tree above, we can see add(2Q, 2Q) is computed
if and only if k[2] = 0.
So:
1. Generate a random scalar d. Keep it secret and keep it safe. This
is the target of our attack.
2. Define an oracle that accepts a point Q, multiplies it by d, and
returns true or false depending upon whether a fault is
triggered. In a realistic setting, this could be an endpoint that
computes the ECDH handshake and decrypts a message. You can build
this out if you're feeling fancy, but the artificial oracle is okay
too.
3. Working offline, generate random points and compute add(1Q, 1Q)
(the unconditional first step of the multiplication) followed by
add(2Q, 2Q). You're looking for a point that survives the first
addition and triggers a fault on the second one.
4. When you find a point that triggers a fault, query the oracle. The
response from the oracle will tell you the value of k[2]. If it
succeeds, k[2] = 1. If it triggers the fault, k[2] = 0 (probably.)
Probably? Well, sure. There is, of course, a chance for false
positives. Since we're treating faults as random, there is a small but
nonzero chance your input point will trigger a fault on some later
step.
We can eliminate these by improving our simulation:
1. Instead of simulating only the b = 0 branch, simulate both
branches. Find a candidate point that triggers a fault on one but
not the other.
2. Suppose your point triggers a fault when k[2] = 1. Query the
oracle. If there is no fault, k[2] = 0. Otherwise, we are unsure,
so return to step 1. (The same logic applies if your point triggers
when k[2] = 0.)
From here, we can generalize the above algorithm to recover succeeding
bits:
1. Generate random points and simulate the bits of the key that you
know. Ensure there is no error.
2. Simulate all possible computations for the next unknown bit of the
key. If you trigger a fault, proceed. Otherwise, return to step 1.
3. Query the oracle. A negative result leaks a key bit with
certainty. A positive result (i.e. a fault) may leak a key
bit. Depending on how you tuned your fault probability, this may
give you enough confidence to proceed.
4. Rinse and repeat until all key bits are recovered.
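Here is the whole loop stitched together as a runnable sketch. To stay self-contained it cheats on the group: a "point" is just an integer mod M and the group law is modular addition, which preserves the double-and-add structure even though its discrete log is trivial. All the names (M, P_FAULT, N_BITS, simulate, etc.) are mine. One wrinkle worth noting: when simulating a candidate bit, we must also check the *next* iteration's unconditional doubling, because that doubling is the first operation whose inputs depend on the candidate bit when b = 0 (this is exactly the add(2Q, 2Q) trick from the k[2] attack above).

```python
import random

M = (1 << 61) - 1   # stand-in group: integers mod M under addition
P_FAULT = 47        # small prime: faults fire for roughly 1-2 in 47 additions
N_BITS = 16         # bit length of the secret scalar

def fault(a, b):
    return (a * b) % P_FAULT == 0

def simulate(Q, bits, peek):
    """Replay double-and-add for a bit prefix; raise on any fault.
    With peek=True, also check the next iteration's unconditional
    doubling, the first operation distinguishing the b = 0 branch."""
    R = Q
    for b in bits[1:]:                 # bits[0] is the implicit leading 1
        if fault(R, R):
            raise ValueError
        R = (R + R) % M
        if b:
            if fault(R, Q):
                raise ValueError
            R = (R + Q) % M
    if peek and fault(R, R):
        raise ValueError
    return R

def make_oracle(d):
    """The victim: multiplies by secret d, reports whether a fault fired."""
    d_bits = [int(c) for c in f"{d:b}"]
    def oracle(Q):
        try:
            simulate(Q, d_bits, peek=False)
            return False               # handshake succeeded
        except ValueError:
            return True                # fault triggered
    return oracle

def recover(oracle):
    bits = [1]                         # k[1] = 1 by definition
    while len(bits) < N_BITS:
        last = (len(bits) + 1 == N_BITS)   # final bit: nothing follows it
        Q = random.randrange(2, M)
        try:
            simulate(Q, bits, peek=False)  # step 1: known prefix is clean
        except ValueError:
            continue
        faulted = []
        for b in (0, 1):                   # step 2: simulate both branches
            try:
                simulate(Q, bits + [b], peek=not last)
                faulted.append(False)
            except ValueError:
                faulted.append(True)
        if faulted[0] == faulted[1]:
            continue                       # need exactly one branch to fault
        bad = 0 if faulted[0] else 1
        if not oracle(Q):                  # step 3: no fault leaks the bit
            bits.append(1 - bad)
        elif last:
            bits.append(bad)   # no later ops exist, so the fault is conclusive
        # otherwise the fault may have fired later; retry with a new point
    return bits
```

With P_FAULT this aggressive the search converges in a few hundred candidate points; at realistic fault rates you would lean much harder on the probabilistic reasoning in the notes below.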
That's pretty much it.
Well, there are a few opportunities to get fancy, if you feel so
inclined:
1. Even in the presence of uncertainty, positive results have
value. You can calculate the probability of a false positive and
determine whether you have enough confidence to proceed.
2. Once you have enough bits of the key, you can compute the remainder
of the attack offline using standard discrete logarithm attacks
(e.g. Pollard's kangaroo).
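For item 1, the false-positive estimate is a one-liner (a sketch; it assumes each of the m additions remaining after your simulated prefix faults independently with probability 1/p):

```python
def p_false_positive(p, m):
    """Probability that at least one of the m later additions faults
    by accident, given an inverse fault rate of p."""
    return 1 - (1 - 1 / p) ** m
```

Early in the attack m is large and a positive oracle result tells you little; as the recovered prefix grows, m shrinks and positives become nearly as informative as negatives.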