Added first four exercises of start-guide #133
josh-nook wants to merge 1 commit into beehive-lab:master
IgWod
left a comment
Hey @josh-nook,
There are a few small comments to address, and the ending needs some polish (as you already know), but otherwise I think it is quite a good tutorial. Well done!
| > | ||
| > **Modification:** The altering of a program | ||
|
|
||
| So altogether, a DBM _Tool_ is a program that can alter natively compiled user-space binary during runtime, with no source code required. We could take `simple_program` and pass it through to MAMBO as we did before, but instead of simply executing it, we could perform all sorts of modifications on it. Examples of these include: |
A small detail. Technically MAMBO is a DBM framework, whereas a tool is MAMBO + a specific plugin (e.g., MAMBO memcheck). Admittedly, we have been quite bad at making that distinction ourselves, but if possible, let's try to call MAMBO a DBM framework.
| > | ||
| > **Debugging:** Detecting memory faults within a program | ||
|
|
||
| MAMBO isn't by any means the first DBM Tool to exist. [Pin](https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html), [Qemu](https://www.qemu.org), and [DynamoRIO](https://dynamorio.org) are all examples of DBM-based tools. So if other options are avaliable, what is the purpose of MAMBO? |
Here the terminology is more important. It should say DBM frameworks, not DBM-based tools, since things like Pintools are clearly defined by Pin.
|
|
||
| ### Why MAMBO? | ||
|
|
||
| MAMBO was created as part of Cosmin Gorgovan's EPSRC-funded PhD in the School of Computer Science at the University of Manchester, with a handful of properties that distinguishes it from other DBMs: |
A nitpick. It reads as if Cosmin created all of those features; however, Guillermo optimised it for ARM64, and Alistair developed the RISC-V support. I would say something along the lines of: "It was initially developed by Cosmin's ... with other people contributing since then".
|
|
||
| This exercise will go through how a program like our `simple_program` is executed using MAMBO, step-by-step. It's not _necessary_ content for the rest of the tutorial, but it'll certainly help you fully grasp MAMBO if you want to contribute the project. | ||
|
|
||
| This exercise will obfuscate for the sake of simplicity much of how MAMBO works, most notably with optimisations regarding branches. |
The second part of the sentence needs a better flow. Maybe: "..., most notably branch optimisations".
| </div> | ||
| <br> | ||
|
|
||
| For simplicity, portability, and full control over execution, DBM Tools often **load target programs within their own address space**. This cannot be done with `ld`, shown on the LHS of the diagram below, so we must implement a userspace loader which for MAMBO is `libelf`: |
I would just add at the end: "Not to be confused with the Linux-provided libelf: https://archlinux.org/packages/core/x86_64/libelf/".
| Most optimisations are to do with the main source of overhead in DBM tools: indirect branches. Description of optimisations are out of the scope of this tutorial, so a handful of them are outlined below: | ||
|
|
||
| - **Inline hash lookups** are instrumented at the end of code blocks | ||
| - **Hot Paths** between basic blocks are identified and directly linked |
This needs a bit of updating. Hot paths and traces are directly related, as traces are created for hot paths. Also, as far as I remember, traces can contain conditional branches. So the bullet points should be something like (not in these exact words):
- Indirect branch opts - I think that is correct
- Direct linking of direct branches (conditional and unconditional) to avoid calling the dispatcher
- Traces to optimise hot paths
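To make the direct-linking bullet concrete, here is a minimal toy model of why it avoids the dispatcher. It is only a sketch under stated assumptions: the class `ToyCodeCache`, its fields, and the three-block loop at addresses `0x100`, `0x200`, and `0x300` are all hypothetical and do not reflect MAMBO's actual data structures or API.

```python
class ToyCodeCache:
    """Toy model of a DBM code cache; hypothetical, not MAMBO's implementation."""

    def __init__(self, program):
        # program maps a block's source address to the source address of the
        # block it branches to (None means execution stops after that block).
        self.program = program
        self.translated = set()    # source addresses already in the code cache
        self.links = {}            # block -> directly linked successor block
        self.dispatcher_calls = 0  # how often the slow path was taken

    def dispatch(self, source_addr):
        """Slow path: look the target up, translating it on first use."""
        self.dispatcher_calls += 1
        self.translated.add(source_addr)
        return source_addr

    def run(self, entry, max_blocks):
        current = self.dispatch(entry)
        for _ in range(max_blocks):
            target = self.program[current]
            if target is None:
                break
            if current in self.links:
                # Fast path: the branch was patched to jump straight to its target.
                current = self.links[current]
            else:
                next_block = self.dispatch(target)
                # Direct-link the branch so the dispatcher is skipped next time.
                self.links[current] = next_block
                current = next_block


# Hypothetical three-block loop: 0x100 -> 0x200 -> 0x300 -> 0x100 -> ...
cache = ToyCodeCache({0x100: 0x200, 0x200: 0x300, 0x300: 0x100})
cache.run(0x100, max_blocks=30)
print(cache.dispatcher_calls)
```

In this model the dispatcher is only entered once for the entry block and once per branch the first time it is taken; every later iteration of the loop follows the linked exits.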
| - Instruction specific events | ||
|
|
||
|
|
||
| Callback functions that we write for inserting code into basic blocks (instrumentation) are registered with **scantime** with *MAMBO generated scantime events* ie. when our target program is passed through the code scanner: |
What about: "... are registered on scan-time to be executed with MAMBO generated scan-time events".
| } | ||
| ``` | ||
|
|
||
| We've included also included a print statement so we can see the filename location and start address of each basic block. |
"included" is repeated twice.
|
|
||
| The lighter sections within the label blocks represent a single basic block. For all but `LBB0_1`, they have one basic block as they end on a branch statement. | ||
|
|
||
| `LBB0_1` however has two basic blocks. This is because there are two branch statements: `b.ge .LBB0_4` and `b .LBB0_2`. Since a basic block is **strictly** single entry and single branch exit, `LBB0_1` will constitute as two seperate basic blocks in the code cache. |
Branch-link (bl) also splits the assembly block into 2 basic blocks.
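As a rough illustration of the splitting rule, the sketch below cuts a straight-line instruction listing after every branch, including `bl`. It is a simplified model rather than how MAMBO's scanner is implemented: the helper names and the mnemonic list are assumptions, and the non-branch instructions (`ldr`, `cmp`, `mov`, `add`) as well as the `bl printf` call are made up for the example. Only `b.ge .LBB0_4` and `b .LBB0_2` come from the quoted text.

```python
# Hypothetical, simplified model: an instruction is just its assembly text.
BRANCH_MNEMONICS = {"b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz"}


def is_branch(instruction: str) -> bool:
    """Return True if the instruction ends a basic block in this model."""
    mnemonic = instruction.split()[0].lower()
    # Conditional branches are written as b.<cond>, for example b.ge
    return mnemonic in BRANCH_MNEMONICS or mnemonic.startswith("b.")


def split_basic_blocks(instructions: list[str]) -> list[list[str]]:
    """Split a straight-line instruction listing after every branch."""
    blocks, current = [], []
    for instruction in instructions:
        current.append(instruction)
        if is_branch(instruction):
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no terminating branch
        blocks.append(current)
    return blocks


# Two branches under one label give two basic blocks.
label_block = ["ldr w8, [sp, #8]", "cmp w8, #10", "b.ge .LBB0_4", "b .LBB0_2"]
blocks = split_basic_blocks(label_block)

# A branch-link in the middle of a listing also ends a basic block.
call_blocks = split_basic_blocks(["mov w0, #1", "bl printf", "add w0, w0, #1", "ret"])
```

Applied to the tutorial's example, the same rule means any label block containing a `bl` would also be counted as more than one basic block.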
|
|
||
| // TODO | ||
|
|
||
| // I'm unsure as to why there is only 5 basic blocks and not 6. |
I would run objdump on the binary and look at the addresses; this may shed some light on what is happening.
|
Thanks for the feedback, I'll work on these at some point this week.
Ignore in current state, this is for myself and another developer to go through over voice call.