saas was an exploitation challenge worth 50 points at the 31C3 CTF.

We solved it after the competition was already over.

``````Welcome to SaaS (Sorting as a Service)!

0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 3
Number 3: 2
Number 4: 5
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 5
Number 4 is now: 6
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
function f(a, b)
return a < b
end

How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 2
Number 3: 3
Number 4: 4
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 4
Number 4 is now: 6
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 0
Goodbye
``````

As you can see, we can choose either a predefined comparator function for the sort or we can provide our own Lua code to define it.

Initially we tried to reverse the binary, which was made a bit harder by the fact that binary was compiled as position dependent code but then linked as a position independent exeuctable, resulting in a relocatable binary where all external function calls are patched at runtime by the loader. Some reversing tools deal badly with this situation (See below).

We note the following:

• NX and ASLR are enabled, the binary is fully position independent so its base address is also randomized.
• The program creates a buffer of size `4 * 1025` and fills the first word with the magic value `0xdeadbeef`.
• Input numbers are written into that buffer after the magic value.
• The call to `std::sort` is inlined, which makes the `main` function really long.
• The predefined comparators are also implemented in Lua and can be found as strings in the binary.
• Before executing any Lua code and before starting a sort, a call to `alarm` sets a timeout of 10 seconds, after which `SIGALRM` is raised. The alarm is cleared after the corresponding operation has finished. This is annoying for debugging.
• For the actual sorting, the function `comparator` at offset `0x1fc0` is used. It basically calls the Lua function `f` to compare the two given numbers (see image below).

You will notice in the screenshot the weird calls with offset 1. This is because the disassembler sees only the unrelocated binary.

The first thing we want to do is get rid of the alarm, which is an easy exercise. We can just patch the binary and replace the first function argument `0xa` with `0x0`.

## A not so useful bug

The program assumes that the function `f` has exactly two parameters. It first pushes the function object and then the arguments on the internal Lua stack. However, we can control the number of parameters, so we can in fact read and write beyond the stack by adding more parameters to f.

For debugging, we looked at the Lua source code to figure out the layout of the internal data structures. The global `lua_State` that also has a pointer to the top of the stack can be accessed under the symbol `L`. Let’s see:

``````gdb-peda\$ break comparator(unsigned int, unsigned int)
Breakpoint 1 at 0x1fc0
gdb-peda\$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Welcome to SaaS (Sorting as a Service)!

0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
function f(a,b,c,d,e,f,g,h,i,j,k,l)
l=31337
end

How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda\$ x/3wx L
// This is a lua_State struct
0x5655b0b0:	0x00000000	0x00000008	0x5655b228
gdb-peda\$ x/20gf 0x5655b228
// This is the Lua stack, containing pointers and 64-bit float values
0x5655b228:	nan(0x7a5465655bb10)	nan(0x7a5465655bb10)
0x5655b238:	nan(0x7a5455655bb80)	nan(0x7a5445655b6d0)
0x5655b248:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b258:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b268:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b278:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b288:	nan(0x7a50000000000)
gdb-peda\$ c
Continuing.
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda\$ c
Continuing.
Number 1 is now: 1
Number 2 is now: 2
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: ^C
Stopped reason: SIGINT
0xf7fdabf0 in __kernel_vsyscall ()
gdb-peda\$ x/13gf 0x5655b228
0x5655b228:	-3.6993765324808395e+269	2
0x5655b238:	1	nan(0x7a5005655b6d0)
0x5655b248:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b258:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b268:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b278:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b288:	31337
``````

So we can in fact control the stack. With some experimentation we found out that we can have ~100 parameters. However, the stack has a size of only 40, so we can overflow it.

The bad news is that we did not find anything particularily interesting to overwrite. Maybe this bug is actually exploitable, but the conditions seem tough. We had one more idea though.

## std::sort and undefined behaviour

Maybe a bit untuitively, the C++ standard places strict requirements onto the comparator function provided to `std::sort` and other standard library functions. Specifically, it is required that the comparator defines a strict weak ordering. Providing a comparator that does not fulfil these requirements leads to undefined behaviour, which basically means that “anything can happen”. In particular, the implementation is allowed to infinitely loop or crash or overflow memory in this case.

And indeed, modern implementations of the C++ standard library opt to save some bounds checking in favor of performance. An example of a comparator that is very likely to crash with a recent GCC compiler is the following:

This sort actually overwrites the heap past the end of the allocated buffer, resulting in memory corruption. To understand why, we need to have a look at the implementation of std::sort in the standard library.

We notice several functions with the prefix `__unguarded_`. One of them is `__unguarded_linear_insert`. It underflows the buffer, which is not helpful to us because (a) all the interesting values lie after the buffer and (b) the comparator has a check that makes it exit immediately if one of the values is `0xdeadbeef`, which is the value in front of the buffer…

Let’s dig deeper:

This looks very promising. `__introsort_loop` is basically a quicksort with a fallback to heap sort in case a certain recursion depth is reached. It uses `__unguarded_partition_pivot` to partition the array.

If we watch closely, we control the return value of `__unguarded_partition` and thus `__unguarded_partition_pivot`! We can just make the comparator evaluate to `true` as often as we want, increasing the `__first` pointer beyond the end of the buffer. Because we then have `__first > __last`, the function returns `__first` immediately.

So what happens if we force `__unguarded_partition_pivot` to return an iterator past the end of the buffer? The recursion into the “right half” `[cut, last)` will do nothing because `cut > last` (it’s a signed comparison). Then the algorithm goes on to sort the “left half” `[first, cut)`, which is bigger then the original buffer!

In summary: We are now able to sort not only our 1024 input numbers, but also a range that goes beyond this buffer. E.g. we could sort 2048 words, namely our input plus 1024 extra words that happen to lie on the stack). Since we control the comparator, we actually have a very powerful primitive: We can permute the values on the stack arbitrarily, starting from the beginning of our buffer. In particular, we can read and write arbitrarily on the stack.

We know that we can arbitrarily permute the stack, but we are to lazy to figure out the exact sequence of answers that we would need for the permutation we want. So we use a trick: We write our own little C++ program that calls `std::sort` with a custom comparator, and after an initial sequence of answers to extend the sort range, we just fall back to a standard “less-than” comparator and record its answers. `std::sort` will now just do its job and sort the extended buffer.

We lay out the numbers in a way such that the sort will have to perform a swap of the first and last element in the range, and leave the rest of the elements intact. Our input array will look like this:

``````{m, 1, 2, ..., m - 1, 0}
``````

If within SaaS, we replay the exact answers that we recorded, it will happily repeat this exact permutation, regardless of the actual values in the array. The C++ program we created to record the answers will just output a sequence of answers that swaps the elements at offsets `0` and `offset`, when `std::sort` is called on a buffer of size `n`.

## Infoleak + EIP control

We can now read an arbitrary value on the stack, we just need to pick one! A good candidate is the return address into `__libc_start_main`:

``````gdb-peda\$ break main
Breakpoint 1 at 0x15a6
gdb-peda\$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
Breakpoint 1, 0x565565a6 in main ()
gdb-peda\$ backtrace
#0  0x565565a6 in main ()
#1  0xf7cc9e5e in __libc_start_main () from /usr/lib32/libc.so.6
#2  0x56556e88 in _start ()
gdb-peda\$ break comparator(unsigned int, unsigned int)
Breakpoint 2 at 0x56556fc0
gdb-peda\$ c
Continuing.
Welcome to SaaS (Sorting as a Service)!

0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 2, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda\$ x/40wx \$esp
0xffffc07c:	0x565573dd	0x00000002	0x00000001	0xffffc0c8
0xffffc08c:	0xffffc098	0xf7f5ce20	0xffffc114	0x00000002
0xffffc09c:	0x0000000a	0x565578d7	0x01eef476	0x00000000
0xffffc0ac:	0x46baeb00	0xf7f5ce60	0x565578d7	0xf7ed829b
0xffffc0bc:	0xffffc114	0xffffc10c	0x00000008	0xffffd128
0xffffc0cc:	0x56556dbd	0xffffc10c	0xffffc114	0x56556fc0
0xffffc0dc:	0x56556fc0	0x00000000	0x00000000	0x00000000
0xffffc0ec:	0x00000000	0x00000000	0x00000000	0x00000000
0xffffc10c:	0x00000001	0x00000002	0x00000000	0x00000000
``````

So the input buffer landed at `0xffffc10c`. Let’s see what’s 1024 words later:

``````gdb-peda\$ x/12wx 0xffffc10c + 1024*4
0xffffd10c:	0x46baeb00	0xf7e66420	0xf7f57c14	0x565577db
0xffffd11c:	0xf7e66000	0x00000000	0x56556e57	0x00000000
0xffffd12c:	0xf7cc9e5e	0x00000001	0xffffd1c4	0xffffd1cc
``````

Perfect, so the return address is the 8-th word after the buffer. Let’s write an exploit!

And voila:

``````\$ python2 leak.py
[*] leaked ret addr to libc_start_main = 0xf74aca83
``````

The offset is different on my local machine because I have a different libc version installed.

Also, if we try this under a debugger, we get

``````Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
gdb-peda\$
``````

So the swap worked and we control EIP.

## The final Exploit

From here on it’s a simple exercise: We know the libc base and we can write arbitrary values onto the stack. Now we simply overwrite saved eip with the address of system and set the first argument to point to “/bin/sh” (also found inside libc).

The final exploit code reliably gives shell access and we are able to retrieve the flag

``````\$ python2 exploit.py
[*] leaked libc base = 0xf74ce000