By now most everyone has read about the IOS vulnerability
that was fixed last Friday. Because that portion of code is open source, it
didn’t take long for the security community to track down the offending issue
and bring it to the light of day. The majority of comments I have read have
wondered how such a simple break could make it out in to the wild. I thought it
would be fun to speculate as to how it happened and also to do a little code
review. A good in depth review of the bug and it’s implications from a user
standpoint is here.
*Disclaimer – I don’t consider myself better or above
mistakes like this. The code I write today is as good as I understand the
technology today, code I wrote last week is always not as good as it could be
now. We are constantly learning as developers and continually making our code
better. Reviewing the code we wrote last week and contemplating ways to make it
better are all a part of becoming that better developer we all strive to be.
As developers we have come up with some best practices when
coding, rules of thumb that we should follow. Most of these practices are
debated and fought over like religious battles. They are usually a tug of war
between readability, succinctness and performance. Most of the battles are grey
areas. There is little black and white to software design and architecture.
Code statements themselves are black and white, it either works or doesn’t. The
composition of statements into useful code is still more art than science.
Every program is unique with different user stories and different behaviors.
This code is interesting piece to review because in my
opinion not following best practices led to this vulnerability. Individually they
do not present a clear issue, but in tandem they allowed a single extra line of
code to hide in the file unnoticed by compilers or humans. 99% of the time following
the best practice is the best approach because doing so reduces the chances of bad stufftm happening, but 1%
of the time there is good reason not to. Knowing when is the key.
Following is the code and following that are the practices
it breaks.
static OSStatus SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, uint8_t *signature, UInt16 signatureLen) { OSStatus err; SSLBuffer hashOut, hashCtx, clientRandom, serverRandom; uint8_t hashes[SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN]; SSLBuffer signedHashes; uint8_t *dataToSign; size_t dataToSignLen; signedHashes.data = 0; hashCtx.data = 0; clientRandom.data = ctx->clientRandom; clientRandom.length = SSL_CLIENT_SRVR_RAND_SIZE; serverRandom.data = ctx->serverRandom; serverRandom.length = SSL_CLIENT_SRVR_RAND_SIZE; if(isRsa) { /* skip this if signing with DSA */ dataToSign = hashes; dataToSignLen = SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN; hashOut.data = hashes; hashOut.length = SSL_MD5_DIGEST_LEN; if ((err = ReadyHash(&SSLHashMD5, &hashCtx)) != 0) goto fail; if ((err = SSLHashMD5.update(&hashCtx, &clientRandom)) != 0) goto fail; if ((err = SSLHashMD5.update(&hashCtx, &serverRandom)) != 0) goto fail; if ((err = SSLHashMD5.update(&hashCtx, &signedParams)) != 0) goto fail; if ((err = SSLHashMD5.final(&hashCtx, &hashOut)) != 0) goto fail; } else { /* DSA, ECDSA - just use the SHA1 hash */ dataToSign = &hashes[SSL_MD5_DIGEST_LEN]; dataToSignLen = SSL_SHA1_DIGEST_LEN; } hashOut.data = hashes + SSL_MD5_DIGEST_LEN; hashOut.length = SSL_SHA1_DIGEST_LEN; if ((err = SSLFreeBuffer(&hashCtx)) != 0) goto fail; if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail; goto fail; //here is the problem if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; err = sslRawVerify(ctx, ctx->peerPubKey, dataToSign, /* plaintext */ dataToSignLen, /* plaintext length */ signature, signatureLen); if(err) { sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify " "returned %d\n", (int)err); goto fail; } fail: SSLFreeBuffer(&signedHashes); SSLFreeBuffer(&hashCtx); return err; }
Let’s start with a quick discussion of what the code is doing and why it fails. The interesting bits are the if statements from the ReadyHash call down. Basically the code is creating a SHA1 key that it will verify. After the third SSLHashSHA1.update() call the goto fail statement is repeated. This was either caused by the deletion of another function call or simply the duplication of the line. The result is that the implicit code block for that if statement is only the single next statement. The second goto is executed no matter the outcome of the statement above. The variable err is updated in each if statement and returned at the end. Because no error condition actually existed when it hits the extra goto statement, it returns the success result of the previous function.
Each of the best practices not followed are below.
Use of goto
The saying we typically use is never use
goto. I don’t like to say never. If you
have to use goto you are probably doing
something wrong. Sometimes however, it can make sense. In this case it’s used
to short circuit several tests throughout the function and this pattern is used
throughout other functions in the class. Each goto statement and it’s
corresponding label are contained within the function. Each function still
starts at the top and ends at the bottom with the return statement. It’s much
better than a complicated nested if statement or compound if statements. This
is the easy thing to point to as the major cause of this flaw because of our
general rule about using goto, but I don’t think so. I believe it’s warranted
here. It makes it extremely easy to read and understand what is happening and
why. The goto statements do not jump around creating the spaghetti code, they
just skip to the bottom of the function and bypass additional checks if one
above does not pass.
Not putting blocks with if statements
When an if statement is not followed by
curly braces to create a code block, the single next command is executed if
true. Writing if statements without curly braces can make the code more
succinct. Adding braces would have added 11 lines of code to this one function.
In my opinion using braces with one statement better expresses the logic of the
code. The extra typing and extra space it takes is more than justified by the
gains in readability and explicitness of the code. If there were a code block
here chances are the second goto would have been enclosed in that code block
and the issue would have been avoided. At the very least, if an if statement were
deleted it would have been noticeable that a code block with braces remained.
In my opinion this was the best practice that would have had the best chance of
preventing the issue.
The naming convention is confusing
At first glance the code block makes sense.
If there is an error it goes to the fail tag and exits the function. But the
fail tag in this case doesn’t truly mean fail, it means exit. The return code
is being set by an assign statement within the if statement. The first thing
you would think when looking at this error for the first time would be that it’s
going to always fail because it will always “goto fail”. The first thing you
should think is it is always going to “goto exit” and the verify function will
never be called. The fail label doesn’t accurately portray what the code below
it is doing and should be called exit, endfunction, cleanup or something of that
nature. This would not fix the issue, but it would make the intentions of the code
clear. In fact, err is misleading as well. The err variable is also used for a
success status.
Proper indentation
This code is pulled from GitHub and I’m not
sure what editor is used to edit the code. It’s entirely possible this is a
non-issue in the IDE that the developers use. I do see the issue when browsing
the code online as well as when it’s downloaded to a Windows machine. Obviously
the line in question is a mistake and is not properly indented because of the
mistake, but several lines of code are not indented properly and can cause
confusion. Improper indentation can lead to easily misread code.
Edit: The code formatter I am using does not properly show the indentation. It is viewable in the source for sslKeyExchange.c.
Edit: The code formatter I am using does not properly show the indentation. It is viewable in the source for sslKeyExchange.c.
It is easy to misinterpret:
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; err = sslRawVerify(ctx, ctx->peerPubKey, dataToSign, /* plaintext */ dataToSignLen, /* plaintext length */ signature, signatureLen); if(err) { sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify " "returned %d\n", (int)err); goto fail; }
as
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) { goto fail; err = sslRawVerify(ctx, ctx->peerPubKey, dataToSign, /* plaintext */ dataToSignLen, /* plaintext length */ signature, signatureLen); if(err) { sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify " "returned %d\n", (int)err); goto fail; } }
Not unit testing
Unit tests are designed to catch this exact
issue. Two unit tests would have found this issue immediately, one checking for
successful results and one checking for errors. This would have been an easy
catch. Unit testing would not prevent the bug, but it would catch it before
promoting the code.
Not using an IDE that highlighted the issue or
ignoring the warning if the IDE did give one.
I have not researched this to find out what
IDE they use, so it’s impossible for me to say they ignored a warning or simply
weren’t given one. Visual Studio will give a warning if compiling code with the
above issue. Alan Coopersmith has an interesting blog post
about this and shows that gcc as of at least 4.5.2 will not give a warning for
unreachable code using the default settings. Specifying –Wall (give me all
warnings) still doesn’t warn on unreachable code. –Weverything does, but also
gives so many other warnings that it is typically not used.