Smashing the Potato: Codegate 2015 'mashed

mashed_potato was a pwnable worth 600 points during Codegate CTF 2015.

We’re provided with a binary as well as the IP address and port of the target server. Here’s an excerpt from running the binary:

$ ./mashed
Leave your message
Menu)
 1: leave plain text
 2: leave encrypted text
 3: exit
Select : 2
Message length: 100
Message : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
17 bytes of message sent
Select : 1
Message length: 200
Message : BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
200 bytes of message sent
Select : 3

And that’s already pretty much all the functionality the binary provides.

##Reversing + Bug Discovery

Reverse engineering the binary is fairly simple (even more so since it’s not stripped). Here are the key points:

All messages are written to the file ‘msg.log’
The size values and menu choices are first read into a stack buffer (via fgets()), then converted to an integer using atoi()
The messages are read into a stack buffer (approximately 500 bytes large) using fgets() with the user provided size
Encryption works by first compressing the messages (using zlib), then doing repeated XOR with the key “The_M0s7_s3cur3_k3y”

With this the bug is fairly obvious: the size of the message is never checked and the buffer into which the message is written is located on the stack. So there’s a classic stack based buffer overflow. However, we have to deal with stack cookies, so no simple return address overwrite. The binary also does not fork so there’s a different cookie value for every connection.

And so the fun begins.

##Leaking the Cookie

The message we provide is written to the file ‘msg.log’. Since we can specify a larger size than what we actually submit (because fgets() will stop reading when it encounters a ‘\n’), we can write arbitrary stack memory into the log file. This is pretty useless as there is no way to read the logfile though. :)

The only output we get when submitting a message is the size of the data that was written to the file. For a plain message this will be the size that we specify. However, for an “encrypted” message this will be the size of the resulting compressed data. With this we thus effectively get an info leak through the compression ratio! Here’s the basic idea:

We will try to guess the cookie byte by byte. For every possible value of the current byte we generate the message content in a way that will result in better compression (i.e. shorter output size) if our current guess is correct. We will be able to notice the difference and proceed with the next byte until we have recovered the whole cookie. Note that we do not need to brute force the first byte of the (8 byte) cookie as it will always be zero (to prevent exploitation of strcpy() and similar functions).

##DEFLATE

To implement the above attack we first need to understand the compression algorithm used by zlib. The algorithm (DEFLATE) works in two steps as described here.

During the first phase (LZ77), repeating patterns are replaced by an index + length pair. So for example (stealing from above article :) )

Blah blah blah blah blah!

becomes

Blah b[D=5, L=18]!

Here “D=5” means “go back 5 bytes” and “L=18” means “use 18 bytes from that point on”. Note that the compressed and decompressed regions can overlap as is the case in this example.

The second phase performs a Huffman encoding on the result of the first phase. It is sufficient to know that after the Huffman encoding bytes with higher frequencies in the input text will be encoded using shorter sequences of output bits.

Since the LZ77 algorithm only affects sequences longer than a few bytes (probably 4) we will have to focus on the Huffman encoding for the first 4 bytes of the cookie (the last controlled byte before the cookie will always be ‘\n’ which we can’t put in preceding content). Afterwards we can (ab)use the LZ77 algorithm to recover the remaining bytes as this is a bit more reliable.

To avoid the LZ77 compression for the first part I used the sequence “XY00XY01XY02XY03XY04…” (hex) as input. This will cause the current byte (XY) to be encoded using the shortest sequence in the Huffman tree. If the next byte of the cookie also has the value ‘XY’ we will likely save an additional byte of output and be able to notice the difference.

For part two (brute forcing bytes 5-8) we can repeat the first part of the cookie with the current guess appended a couple of times. If we guessed correctly, LZ77 will be able to shorten the cookie during compression, again causing a noticeable decrease in the output size. This will roughly look as follows:

          +-------first part of cookie value
    -----\/----
    00 7F 43 98 31 ... 0A 00 7F 43 98 31 ..
                /\        --------/\---------
current guess --+                  +------------ stack cookie

will then become

                                   +---- L
                                --\/-
    00 7F 43 98 31 ... 0A F8 01 05 00 ..
                          --/\-
                             +---- D

Here we’ve guessed correctly and thus LZ77 is able to compress more efficiently.

(In practice things are a little bit different but this is the basic idea behind the attack).

##Exploitation

At this point we are able to recover the stack cookie, then get RIP control by overwriting past the cookie.

So where do we jump?

The binary is non-PIE (but does have NX enabled), meaning it will always be loaded at the same address. However, there aren’t really any useful ROP gadgets contained in the binary itself. While it may be possible to leak the saved return address to __libc_start_main (and thus the base address of the libc) from the stack using the same technique used to leak the cookie, I went a different way.

Before returning, the function executes the leave instruction, effectively performing a “mov rsp, rbp; pop rbp”. As such we can control the base pointer (and thus the stack frame) for the main function (which implements the main menu). This is interesting as the function reads up to 9 bytes (plus a terminating 0 byte) into a buffer relative to RBP when reading the user’s menu choice.

So here’s the plan:

Brute force the cookie as described above, then overwrite saved RBP with a location in the GOT of the binary. Make the function return cleanly by setting RIP to it’s original value (we can’t overwrite just the saved RBP as parts of RIP will be corrupted by the terminating 0 byte).
Send [b’1’ + the address of printf@plt], causing the main loop to overwrite fwrite@got with the address of printf@plt and also select menu entry 1 (plain message). Note that this works because atoi() is fairly permissive: A “1” followed by non-ASCII bytes will be interpreted as a 1.
Send a format string of the form “%93$lx”. The binary will fwrite() that string into the log file, however, fwrite@got has been overwritten and now points to printf(). We get a memory leak. (To get the offset of 93 either leak multiple values or break at fwrite() and look at the stack).
Now that we’ve got the base address of the libc, calculate the address of system and send [b’1’ + address of system()]. Same trick as above. Now just specify the shell command to execute :)

For some reason the authors of the challenge provided us with an IDA database of the libc used on the target system. With this we can easily find the offset of system() in the target’s libc.

Exploit output:

$ ./pwn.py
.................................................................
[+] next byte found: 43
...................................
[+] next byte found: 25
..................................................................................................................................................................................................................................
[+] next byte found: e8
[+] got first half: 0xe8254300
..........................................................................................
[+] next byte found: 5e
.................................................................................................................................
[+] next byte found: 86
........................................................................................................................................................
[+] next byte found: 9e
..........................................................................................................
[+] next byte found: 6f
[+] got cookie: 0x6f9e865ee8254300
[+] libc @ 0x7fb6fa0c0ec5
[+] system @ 0x7fb6fa0e5640
id
uid=1002(mashed) gid=1002(mashed) groups=1002(mashed)
ls
flag
mashed
msg.log
cat flag
no_more_mashed_potato

Find the full exploit code here.