Thursday, February 27, 2014

The IOS goto fail Vulnerability and Code Practices

By now most everyone has read about the IOS vulnerability that was fixed last Friday. Because that portion of code is open source, it didn’t take long for the security community to track down the offending issue and bring it to the light of day. The majority of comments I have read have wondered how such a simple break could make it out in to the wild. I thought it would be fun to speculate as to how it happened and also to do a little code review. A good in depth review of the bug and it’s implications from a user standpoint is here.

*Disclaimer – I don’t consider myself better or above mistakes like this. The code I write today is as good as I understand the technology today, code I wrote last week is always not as good as it could be now. We are constantly learning as developers and continually making our code better. Reviewing the code we wrote last week and contemplating ways to make it better are all a part of becoming that better developer we all strive to be.

As developers we have come up with some best practices when coding, rules of thumb that we should follow. Most of these practices are debated and fought over like religious battles. They are usually a tug of war between readability, succinctness and performance. Most of the battles are grey areas. There is little black and white to software design and architecture. Code statements themselves are black and white, it either works or doesn’t. The composition of statements into useful code is still more art than science. Every program is unique with different user stories and different behaviors.

This code is interesting piece to review because in my opinion not following best practices led to this vulnerability. Individually they do not present a clear issue, but in tandem they allowed a single extra line of code to hide in the file unnoticed by compilers or humans. 99% of the time following the best practice is the best approach because doing so reduces the chances of bad stufftm happening, but 1% of the time there is good reason not to. Knowing when is the key.


Following is the code and following that are the practices it breaks.
static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    SSLBuffer       hashOut, hashCtx, clientRandom, serverRandom;
    uint8_t         hashes[SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN];
    SSLBuffer       signedHashes;
    uint8_t   *dataToSign;
 size_t   dataToSignLen;

 signedHashes.data = 0;
    hashCtx.data = 0;

    clientRandom.data = ctx->clientRandom;
    clientRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;
    serverRandom.data = ctx->serverRandom;
    serverRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;


 if(isRsa) {
  /* skip this if signing with DSA */
  dataToSign = hashes;
  dataToSignLen = SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN;
  hashOut.data = hashes;
  hashOut.length = SSL_MD5_DIGEST_LEN;
  
  if ((err = ReadyHash(&SSLHashMD5, &hashCtx)) != 0)
   goto fail;
  if ((err = SSLHashMD5.update(&hashCtx, &clientRandom)) != 0)
   goto fail;
  if ((err = SSLHashMD5.update(&hashCtx, &serverRandom)) != 0)
   goto fail;
  if ((err = SSLHashMD5.update(&hashCtx, &signedParams)) != 0)
   goto fail;
  if ((err = SSLHashMD5.final(&hashCtx, &hashOut)) != 0)
   goto fail;
 }
 else {
  /* DSA, ECDSA - just use the SHA1 hash */
  dataToSign = &hashes[SSL_MD5_DIGEST_LEN];
  dataToSignLen = SSL_SHA1_DIGEST_LEN;
 }

 hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
    hashOut.length = SSL_SHA1_DIGEST_LEN;
    if ((err = SSLFreeBuffer(&hashCtx)) != 0)
        goto fail;

    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail; //here is the problem
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

 err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,    /* plaintext */
                       dataToSignLen,   /* plaintext length */
                       signature,
                       signatureLen);
 if(err) {
  sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
                    "returned %d\n", (int)err);
  goto fail;
 }

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;

}

Let’s start with a quick discussion of what the code is doing and why it fails. The interesting bits are the if statements from the ReadyHash call down. Basically the code is creating a SHA1 key that it will verify. After the third SSLHashSHA1.update() call the goto fail statement is repeated. This was either caused by the deletion of another function call or simply the duplication of the line. The result is that the implicit code block for that if statement is only the single next statement. The second goto is executed no matter the outcome of the statement above. The variable err is updated in each if statement and returned at the end. Because no error condition actually existed when it hits the extra goto statement, it returns the success result of the previous function.

Each of the best practices not followed are below.

Use of goto
The saying we typically use is never use goto. I don’t like to say never. If  you have to use goto you are probably doing something wrong. Sometimes however, it can make sense. In this case it’s used to short circuit several tests throughout the function and this pattern is used throughout other functions in the class. Each goto statement and it’s corresponding label are contained within the function. Each function still starts at the top and ends at the bottom with the return statement. It’s much better than a complicated nested if statement or compound if statements. This is the easy thing to point to as the major cause of this flaw because of our general rule about using goto, but I don’t think so. I believe it’s warranted here. It makes it extremely easy to read and understand what is happening and why. The goto statements do not jump around creating the spaghetti code, they just skip to the bottom of the function and bypass additional checks if one above does not pass.

Not putting blocks with if statements
When an if statement is not followed by curly braces to create a code block, the single next command is executed if true. Writing if statements without curly braces can make the code more succinct. Adding braces would have added 11 lines of code to this one function. In my opinion using braces with one statement better expresses the logic of the code. The extra typing and extra space it takes is more than justified by the gains in readability and explicitness of the code. If there were a code block here chances are the second goto would have been enclosed in that code block and the issue would have been avoided. At the very least, if an if statement were deleted it would have been noticeable that a code block with braces remained. In my opinion this was the best practice that would have had the best chance of preventing the issue.

The naming convention is confusing
At first glance the code block makes sense. If there is an error it goes to the fail tag and exits the function. But the fail tag in this case doesn’t truly mean fail, it means exit. The return code is being set by an assign statement within the if statement. The first thing you would think when looking at this error for the first time would be that it’s going to always fail because it will always “goto fail”. The first thing you should think is it is always going to “goto exit” and the verify function will never be called. The fail label doesn’t accurately portray what the code below it is doing and should be called exit, endfunction, cleanup or something of that nature. This would not fix the issue, but it would make the intentions of the code clear. In fact, err is misleading as well. The err variable is also used for a success status.

 Proper indentation
This code is pulled from GitHub and I’m not sure what editor is used to edit the code. It’s entirely possible this is a non-issue in the IDE that the developers use. I do see the issue when browsing the code online as well as when it’s downloaded to a Windows machine. Obviously the line in question is a mistake and is not properly indented because of the mistake, but several lines of code are not indented properly and can cause confusion. Improper indentation can lead to easily misread code. 
Edit: The code formatter I am using does not properly show the indentation. It is viewable in the source for sslKeyExchange.c.
It is easy to misinterpret:

    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

 err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,    /* plaintext */
                       dataToSignLen,   /* plaintext length */
                       signature,
                       signatureLen);
 if(err) {
  sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
                    "returned %d\n", (int)err);
  goto fail;
 }
as
        if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) {
        goto fail;

 err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,    /* plaintext */
                       dataToSignLen,   /* plaintext length */
                       signature,
                       signatureLen);
 if(err) {
  sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
                    "returned %d\n", (int)err);
  goto fail;
 }
}
      Not unit testing
Unit tests are designed to catch this exact issue. Two unit tests would have found this issue immediately, one checking for successful results and one checking for errors. This would have been an easy catch. Unit testing would not prevent the bug, but it would catch it before promoting the code.

Not using an IDE that highlighted the issue or ignoring the warning if the IDE did give one.
I have not researched this to find out what IDE they use, so it’s impossible for me to say they ignored a warning or simply weren’t given one. Visual Studio will give a warning if compiling code with the above issue. Alan Coopersmith has an interesting blog post about this and shows that gcc as of at least 4.5.2 will not give a warning for unreachable code using the default settings. Specifying –Wall (give me all warnings) still doesn’t warn on unreachable code. –Weverything does, but also gives so many other warnings that it is typically not used.

No comments:

Post a Comment