
Commit 9677e47

Update documentation for calling convention changes
1 parent 4f8c337 commit 9677e47

4 files changed

Lines changed: 85 additions & 3 deletions


docs/dev/archplatform-platform.md

Lines changed: 10 additions & 0 deletions
@@ -55,6 +55,16 @@ Other fields that Quark did not need, but you can specify:
5555
* `implicitly_defined_regs` - Certain calling conventions pass registers to calls which are not included in type signatures (such as how MIPS on Linux sets `$t9` to the address of the called function, but this should not clutter up the type signature).
5656
* `required_arg_regs` - If specified, heuristic calling convention detection will only consider this calling convention if all the registers specified here are used before they are defined.
5757
* `required_clobbered_regs` - If specified, heuristic calling convention detection will only consider this calling convention if the function clobbers all the registers specified here.
58+
* `stack_args_naturally_aligned` - For stack arguments, set this to True if the convention pads each stack slot up to the natural alignment of its type rather than the address-size word. Defaults to False.
59+
* `is_return_type_reg_compatible` - Returns True if a return value of the given type fits in the convention's return register(s). Override when your ABI accepts only a subset of widths or excludes structures/arrays even when they would technically fit. Used by analysis to decide whether to use an indirect return.
60+
* `is_arg_type_reg_compatible` - Returns True if a parameter value of the given type fits in the convention's parameter register(s).
61+
* `is_non_reg_arg_indirect` - Returns True if a parameter that does not fit in registers should be passed by pointer instead of pushed on the stack.
62+
* `get_indirect_return_value_location` - Returns the `Variable` that holds the caller-supplied pointer to the return value storage when the return value is too big for registers. Defaults to the first integer argument register.
63+
* `get_returned_indirect_return_value_pointer` - Returns the `Variable` that the *callee* uses to give the indirect return pointer back to the caller, or `None` if the convention does not return it.
64+
* `get_return_value_location` — Compute the return value location for a given `ReturnValue`.
65+
* `get_parameter_locations` — Compute the location for each parameter given the already resolved return value location (which the convention may need to skip past).
66+
* `get_stack_adjustment_for_locations` — Returns the number of bytes of stack adjustment performed by the called function on return. The default returns zero (caller cleans the stack).
67+
* `get_call_layout` — Compute a complete `CallLayout` (parameter locations, return value location, stack adjustment, register stack adjustments) for a function with the given signature. The default implementation composes the answer from `get_return_value_location`, `get_parameter_locations` and the stack adjustment helpers. Override only when the layout has interactions you can't express component-wise (Go's stack-after-args return slot is an example).
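
The effect of `stack_args_naturally_aligned` can be illustrated with a small standalone sketch. It does not use the Binary Ninja API: `align_up` and `stack_offsets` are hypothetical helpers, and the model assumes scalar arguments whose natural alignment equals their size. It is one reading of the description above, not the actual implementation.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def stack_offsets(arg_sizes, address_size=8, naturally_aligned=False):
    """Return the stack offset of each argument (hypothetical model).

    With naturally_aligned=False every argument starts on an
    address-size boundary and occupies whole address-size slots.
    With naturally_aligned=True each argument is only aligned to its
    own size (assumed here to be its natural alignment).
    """
    offsets = []
    cursor = 0
    for size in arg_sizes:
        if naturally_aligned:
            offset = align_up(cursor, size)
            cursor = offset + size
        else:
            offset = align_up(cursor, address_size)
            cursor = offset + align_up(size, address_size)
        offsets.append(offset)
    return offsets


# Two 4-byte ints, one 8-byte value, one 1-byte value on a 64-bit target
print(stack_offsets([4, 4, 8, 1]))
print(stack_offsets([4, 4, 8, 1], naturally_aligned=True))
```

In this model the two 4-byte arguments share a single 8-byte word when natural alignment is enabled, while the default layout gives each its own word.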
5868

5969
Then, we need to register the Calling Convention and tell the Platform and Architecture to use it:
6070

docs/dev/bnil-hlil.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,8 @@ There are a number of properties that can be queried on the [`HighLevelILInstruc
7979
* `HLIL_SPLIT` - A split pair of variables `high`:`low` which can be used as a single expression
8080
* `HLIL_DEREF` - Dereferences `src`
8181
* `HLIL_DEREF_FIELD` -
82+
* `HLIL_PASS_BY_REF` - Wraps `src` to indicate that the calling convention is passing a parameter by reference. The inner expression has the reference taken and has a pointer type. Only appears as a parameter expression on a call instruction.
83+
* `HLIL_RETURN_BY_REF` - Wraps `src` to indicate that the value is being returned indirectly through a caller-supplied pointer. The inner expression is the destination of the return value, not a pointer to it. Only appears on the left side of an assignment for the result of a call instruction.
8284

8385
### Arithmetic Operations
8486

docs/dev/bnil-mlil.md

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -277,9 +277,8 @@ The parameter list can be accessed through the `params` property:
277277

278278
* `MLIL_JUMP` - Branch to the `dest` expression's address
279279
* `MLIL_JUMP_TO` - A jump table dispatch instruction. Uses the `dest` expression to calculate the MLIL instruction target `targets` to branch to
280-
* `MLIL_CALL` - Branch to the `dest` expression function, saving the return address, with the list of parameters `params` and returning the list of return values `output`
280+
* `MLIL_CALL` - Branch to the `dest` expression function, saving the return address, with the list of parameters `params` and a list of output expressions `output_exprs` describing how each return value is delivered
281281
* `MLIL_CALL_UNTYPED` - This is a call instruction where stack resolution could not be determined, and thus a list of parameters and return values do not exist
282-
* `MLIL_CALL_OUTPUT` - This expression holds a set of return values `dest` from a call
283282
* `MLIL_CALL_PARAM` - This expression holds the set of parameters `src` for a call instruction
284283
* `MLIL_RET` - Return to the calling function.
285284
* `MLIL_RET_HINT` - Indirect jump to `dest` expression (only used in internal analysis passes.)
@@ -318,7 +317,7 @@ The parameter list can be accessed through the `params` property:
318317
* `MLIL_FLOAT_CONST` - A floating point constant `constant`
319318
* `MLIL_IMPORT` - A `constant` integral value representing an imported address
320319
* `MLIL_LOW_PART` - `size` bytes from the low end of `src` expression
321-
320+
* `MLIL_PASS_BY_REF` - Wraps `src` to indicate that the calling convention is passing a parameter by reference. The inner expression has the reference taken and has a pointer type. Only appears as a parameter expression on a call instruction.
322321

323322
### Arithmetic Operations
324323

@@ -401,4 +400,14 @@ The parameter list can be accessed through the `params` property:
401400
* `MLIL_UNIMPL` - The expression is not implemented
402401
* `MLIL_UNIMPL_MEM` - The expression is not implemented but does access `src` memory
403402

403+
### Function Call Outputs
404+
405+
Prior to version 5.4, a function call could only return a list of variables as output. The `output` property on call instructions remains a list of variables, but the `output_exprs` property is a list of expressions that describe in more detail how each return value is delivered to the caller, including support for indirect stores. Each expression in the list is one of:
406+
407+
* `MLIL_VAR_OUTPUT` - a whole variable is written. The simplest, most common case.
408+
* `MLIL_VAR_OUTPUT_FIELD` - a field of a variable (at byte `offset`) is written. Used when the return value is placed into part of a larger structure.
409+
* `MLIL_STORE_OUTPUT` - the return value is stored to memory at the given destination expression. Used for indirect returns that do not target a local variable.
410+
411+
Additionally, a return value can be the following expression, wrapping one of the above:
404412

413+
* `MLIL_RETURN_BY_REF` - Wraps `src` to indicate that the value is being returned indirectly through a caller-supplied pointer. The inner expression will be one of the `MLIL_VAR_OUTPUT`, `MLIL_VAR_OUTPUT_FIELD`, or `MLIL_STORE_OUTPUT` instructions.
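
The classification above can be sketched as a small helper that unwraps `MLIL_RETURN_BY_REF` and reports which kind of output expression it found. This is a minimal sketch, not part of the API: `describe_output` and `fake_expr` are hypothetical names, the stand-in objects only mimic an expression with an `operation.name` and a `src`, and reading the operation name through `operation.name` on real instructions is an assumption.

```python
from types import SimpleNamespace

# Human-readable meaning of each output expression kind
OUTPUT_KINDS = {
    "MLIL_VAR_OUTPUT": "whole variable",
    "MLIL_VAR_OUTPUT_FIELD": "field of a variable",
    "MLIL_STORE_OUTPUT": "store to memory",
}


def describe_output(expr):
    """Return (kind, by_ref) for one entry of a call's output_exprs."""
    by_ref = False
    name = expr.operation.name
    if name == "MLIL_RETURN_BY_REF":
        # The wrapper marks an indirect return; the real output is in src
        by_ref = True
        expr = expr.src
        name = expr.operation.name
    if name not in OUTPUT_KINDS:
        raise ValueError(f"unexpected output expression: {name}")
    return OUTPUT_KINDS[name], by_ref


def fake_expr(name, src=None):
    """Build a stand-in expression for demonstration (hypothetical)."""
    return SimpleNamespace(operation=SimpleNamespace(name=name), src=src)


direct = fake_expr("MLIL_VAR_OUTPUT")
indirect = fake_expr("MLIL_RETURN_BY_REF", src=fake_expr("MLIL_STORE_OUTPUT"))
print(describe_output(direct))
print(describe_output(indirect))
```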

docs/guide/types/attributes.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -137,12 +137,73 @@ The following built-in calling conventions without dedicated keywords are availa
137137
|`windows-syscall`|aarch64|Windows system call|
138138
|`apple-syscall`|aarch64|macOS and iOS system calls|
139139
|`go-stack`|x86, x86_64|Stack-based calling convention used by the Go compiler on 32-bit x86 or older compilers|
140+
|`pascal`|x86|Pascal stack-based convention with left-to-right parameter passing and callee stack cleanup|
140141
|`register`|x86|Register-based calling convention with left-to-right parameter passing (used by default in Delphi)|
141142
|`gcc-fastcall`|x86|The `fastcall` calling convention as implemented in GCC on non-Windows platforms|
142143
|`clang-fastcall`|x86|The `fastcall` calling convention as implemented in Clang on non-Windows platforms|
143144
|`gcc-thiscall`|x86|The `thiscall` calling convention as implemented in GCC on non-Windows platforms|
144145
|`clang-thiscall`|x86|The `thiscall` calling convention as implemented in Clang on non-Windows platforms|
145146
147+
???+ Warning "Linux x86 / x86_64 default convention rename"
148+
Prior to version 5.4, the default Linux convention on x86/x86_64 was named `cdecl` (and the stdcall variant was `stdcall`). It is now `sysv` (and `sysv-stdcall`) to deconflict with the Windows behavior of `cdecl`/`stdcall`. Both names continue to be registered on the architecture, so `__convention("cdecl")` will still resolve to the Windows version of `cdecl` even on Linux. If you have scripts that match calling conventions by string name, update them to recognize `sysv` and `sysv-stdcall`.
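
A script that compares convention names as strings could treat the old and new Linux names as equivalent. The following is a minimal sketch under a loud assumption: the names it receives come only from Linux x86/x86_64 targets, where a stored `cdecl` or `stdcall` meant the Linux default. The helper names are hypothetical and nothing here calls the Binary Ninja API.

```python
# Old Linux name -> name used from version 5.4 onward (from the note above)
LINUX_X86_RENAMES = {
    "cdecl": "sysv",
    "stdcall": "sysv-stdcall",
}


def normalize_linux_convention(name):
    """Map a pre-5.4 Linux convention name to its current name.

    Assumption: the name was read from a Linux x86/x86_64 target.
    On other platforms cdecl/stdcall keep their Windows meaning and
    must not be rewritten.
    """
    return LINUX_X86_RENAMES.get(name, name)


def is_linux_default_convention(name):
    """True for the Linux default convention under either name."""
    return normalize_linux_convention(name) == "sysv"


print(normalize_linux_convention("stdcall"))
print(is_linux_default_convention("cdecl"))
```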
149+
150+
## Custom Parameter and Return Value Locations
151+
152+
Calling conventions describe the default placement of parameters and return values, but many real-world ABIs have functions whose locations diverge from those defaults (for example, hand-tuned assembly, custom register conventions, or high-level language features). You can override the default location for individual parameters with the `@` syntax or for a function's return value with the `__location("...")` attribute. The argument is a string in Binary Ninja's value-location syntax (described below).
153+
154+
### Examples
155+
156+
``` C
157+
/* Parameter locations: place this parameter in a specific register or stack slot */
158+
int foo(int reg_param @ rdi, int stack_param @ 0x10);
159+
160+
/* Return-value location: return through rsi instead of the default rax */
161+
int bar() __location("rsi");
162+
163+
/* A 16-byte value returned with the high half in rdx and the low half in rax;
164+
components are written left-to-right from high to low */
165+
struct pair get_pair() __location("rdx:rax");
166+
167+
/* A parameter value in two registers. Complex locations for parameters are quoted. */
168+
void set_pair(struct pair value @ "rdx:rax");
169+
170+
/* A return value spanning two registers with explicit field offsets */
171+
struct mixed get_mixed() __location("[0x0: rax, 0x8: xmm0]");
172+
173+
/* An indirect return through a caller-supplied pointer; the leading * marks the
174+
location as a pointer to the storage, and "-> *rax" says the same pointer is
175+
returned in rax */
176+
struct big get_big() __location("*rdi -> *rax");
177+
```
178+
179+
### Value Location Syntax
180+
181+
The string that makes up a location describes one or more storage components (the locations holding the bytes of a single value). The grammar is:
182+
183+
* **Register component:** the register name, e.g., `rax`, `xmm0`, `r1`.
184+
* **Stack component:** an integer offset into the caller's stack frame (decimal or `0x`-prefixed hex), e.g., `0x10`, `-4`.
185+
* **Component size suffix:** append `.b`, `.w`, `.d`, `.q`, `.t`, or `.o` for 1/2/4/8/10/16-byte sizes, or `.<n>` for an explicit byte count, e.g., `eax.d`, `r0.q`, `rax.b`. Without a suffix, the natural register width is used (for stack components, sizes are inferred from the type).
186+
* **Multi-component (concatenated):** components separated by `:`, written **high-to-low**, e.g., `rdx:rax` puts the low half in `rax` and the high half in `rdx`. This form requires that components are contiguous.
187+
* **Multi-component with offsets:** when components are not contiguous, list them inside `[ ... ]` as `offset: component`, e.g., `[0x0: rax, 0x8: xmm0]`. Offsets are byte offsets within the value being passed/returned.
188+
* **Indirect:** prefix the entire location with `*` to indicate the location holds a pointer to the value rather than the value itself, e.g., `*rdi`.
189+
* **Returned-pointer hint:** for indirect returns where the same pointer is also returned in a register, append `-> *<reg>`, e.g., `*rdi -> *rax`.
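
To make the grammar concrete, here is a rough standalone parser for location strings. It is an illustrative sketch only, not Binary Ninja's parser: `parse_component` and `parse_location` are hypothetical names, no validation of register names is attempted, and concatenated components are simply returned in low-to-high order without computed offsets.

```python
import re

# Size suffix letters and their byte counts, as listed above
SIZE_SUFFIXES = {"b": 1, "w": 2, "d": 4, "q": 8, "t": 10, "o": 16}


def parse_component(text):
    """Parse one component such as 'rax', 'eax.d', '0x10' or '-4'."""
    text = text.strip()
    size = None
    if "." in text:
        text, suffix = text.rsplit(".", 1)
        # Either a known letter or an explicit byte count like '.3'
        size = SIZE_SUFFIXES[suffix] if suffix in SIZE_SUFFIXES else int(suffix, 0)
    if re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", text):
        return {"kind": "stack", "offset": int(text, 0), "size": size}
    return {"kind": "register", "name": text, "size": size}


def parse_location(text):
    """Parse a full location string into a plain dictionary."""
    text = text.strip()
    returned_pointer = None
    if "->" in text:
        # Returned-pointer hint, e.g. '*rdi -> *rax'
        text, hint = (part.strip() for part in text.split("->", 1))
        returned_pointer = hint.lstrip("*").strip()
    indirect = text.startswith("*")
    if indirect:
        text = text[1:].strip()
    components = []
    if text.startswith("[") and text.endswith("]"):
        # Explicit 'offset: component' entries
        for entry in text[1:-1].split(","):
            offset, component = entry.split(":", 1)
            parsed = parse_component(component)
            parsed["value_offset"] = int(offset.strip(), 0)
            components.append(parsed)
    else:
        # Written high-to-low, so reverse to get low-to-high order
        components = [parse_component(p) for p in reversed(text.split(":"))]
    return {
        "indirect": indirect,
        "returned_pointer": returned_pointer,
        "components": components,
    }


print(parse_location("rdx:rax"))
print(parse_location("*rdi -> *rax"))
```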
190+
191+
### Pass By Value and By Reference
192+
193+
For composite types (structures, arrays) the calling convention decides whether to pass the value packed into registers, on the stack, or indirectly through a pointer. When that default is wrong for a particular declaration (most commonly in C++ where non-trivial type rules are applied that cannot always be determined at the binary level) you can override it with `__by_value` or `__by_ref`:
194+
195+
``` C
196+
/* Force this argument to be passed by value (in registers or on the stack)
197+
even when the convention would normally pass it indirectly */
198+
void takes_value(struct value_type __by_value arg);
199+
200+
/* Force this argument to be passed by reference (as a pointer) even when the
201+
convention would normally pass it by value */
202+
void takes_object(struct object_type __by_ref arg);
203+
```
204+
205+
`__by_value` and `__by_ref` apply per-parameter and affect only the location chosen for the parameter (the parameter's type in the signature is unchanged). If you need to override the *exact* register or stack slot, use the `@` syntax or `__location()` attribute described above instead (it implies a custom location and overrides any by-value/by-ref decision).
206+
146207
## System Call Functions for Type Libraries
147208
148209
[Type Libraries](typelibraries.md) can annotate system calls by adding functions with the special `__syscall()` attribute, specifying names and arguments for each syscall number. This attribute has no effect outside of [Type Libraries](typelibraries.md) and [Platform Types](platformtypes.md).
