Define bounded lists #640
lukewagner
left a comment
Thanks! This is looking generally good, a few comments:
def flatten_list(elem_type, maybe_length, maybe_variable, opts):
  if maybe_length is not None:
    if maybe_variable:
      return flatten_type(varint_type(maybe_length), opts) + flatten_type(elem_type, opts) * maybe_length
This is pre-existing, but a question: for flattening bounded- or fixed-length lists: should we have some low (e.g., 4) per-value maximum flattening length, beyond which the list gets passed via pointer? I'm mostly thinking of the case where there is some list that has a large bound, and it gets used as a function parameter, and it "blows" the MAX_FLAT budget, causing all the parameters to go into the heap when instead you probably wanted the bounded-/fixed-length list to go in the heap.
From my perspective (one who wants to reduce/avoid allocations) this would make the data type harder to reason about. Fixed and bounded lists should be guaranteed to be "static".
TBH I'd rather say that some datatypes (like these special lists) should enforce* the use of a param or return area for passing - because lists are indexed dynamically unlike tuples (-> indirect addressing works only on memory locations and compilers would otherwise have to compensate for such lists split across registers - pretty terrible).
(enforce: such lists should definitely be placed into param/return area but other types, if possible could continue to be passed by args... but yeah, this obviously complicates the flattening logic)
Ah, so then could we say that fixed- and bounded-length lists never get flattened, and are always passed as pointer (and maybe length if fixed)?
Yes this is what I mean (except: length if bounded, that could still be in a register).
I will explore how that logic extension could look (in Python) in the next commit.
Just pushed a proposal for hybrid lifting/lowering where register and memory passing can coexist.
Flat-calling-convention treatment of fixed-length and bounded lists
Background
The canonical ABI flattens component-level values into Wasm core function
parameters. A MAX_FLAT cap (currently 16) limits how many flat slots a call may use before everything is
redirected through a param-area (a caller-allocated memory block whose
pointer is passed in r0).
The current implementation flattens fixed-length lists element-by-element
(list<T, N> → N flat slots) and bounded lists with a leading length register
(list<T, ..N> → 1 + N flat slots). This is wasteful in several ways.
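To make the slot counts above concrete, here is a minimal sketch of the current flattening cost. The helper names are hypothetical and not taken from definitions.py; `elem_slots` stands for the number of flat slots a single element of type T needs.

```python
def flat_slots_fixed(elem_slots: int, n: int) -> int:
    """list<T, N>: every element is flattened, so N * slots(T)."""
    return elem_slots * n


def flat_slots_bounded(elem_slots: int, max_n: int) -> int:
    """list<T, ..N>: one length slot plus room for up to N elements."""
    return 1 + elem_slots * max_n


# s32 flattens to a single slot.
print(flat_slots_fixed(1, 4))    # list<s32, 4>   -> 4
print(flat_slots_bounded(1, 4))  # list<s32, ..4> -> 5
```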
A key property of fixed and bounded lists is that they can be passed without
additional allocations beyond the param area itself: because the maximum byte
count is statically known from the type, the caller can reserve space inline
rather than performing a separate heap allocation.
Why the current strategy is problematic
Semantic mismatch
List element access is indexed by a runtime value. Even if all elements
arrive in registers, the callee must immediately spill them to memory before it
can iterate or index, because registers are not addressable. Tuples do not
have this problem — each field is a compile-time-known register slot.
Passing list elements in registers therefore adds round-trip spill cost with no
benefit.
Budget poisoning
When the flat slots for a list exceed MAX_FLAT, the ABI falls back to the
classic all-params-to-param-area strategy: all arguments, including simple
scalars that would fit comfortably in registers, are bundled into the param
area and only a single pointer is passed. The list causes unrelated arguments
to pay an indirection penalty.
Example: f(list<s32, 16>, s32) needs 17 slots (16 elements + 1 scalar). With
MAX_FLAT=16 the scalar ends up in memory alongside the list — even though
there was a free register waiting for it.
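The all-or-nothing behaviour in this example can be sketched as follows. This is a simplified model, assuming each argument is described only by its flat slot count; `classic_placement` is a hypothetical name, not a function in the reference implementation.

```python
MAX_FLAT = 16  # cap quoted in the Background section


def classic_placement(arg_slots):
    """Classic rule: either every argument is flat, or every argument
    goes to the param area."""
    total = sum(arg_slots)
    where = "flat" if total <= MAX_FLAT else "memory"
    return [where] * len(arg_slots)


# f(list<s32, 16>, s32): 16 + 1 = 17 slots, so the scalar is dragged along.
print(classic_placement([16, 1]))  # ['memory', 'memory']
# f(list<s32, 15>, s32): 15 + 1 = 16 slots still fits.
print(classic_placement([15, 1]))  # ['flat', 'flat']
```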
Proposal
Allow list elements to occupy the param area without consuming flat-register slots.
- Fixed list: 0 flat slots; list elements are stored in the param area
  (if no param area is required by other arguments, one is created and the
  elements occupy it starting at offset 0).
- Bounded list: 0 flat slots; a varint length prefix followed by the list
  elements are stored in the param area (if no param area is required by other
  arguments, one is created, with the varint prefix at offset 0 followed by the
  elements).
- After processing all typed arguments, if a param area is needed, one
  additional flat slot (param area address) is appended to the flat list.
- If the total flat slots now exceed MAX_FLAT, fall back to the classic
  all-params-to-param-area strategy for backward compatibility.
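The four rules above could look roughly like this in Python. It is only a sketch of the slot accounting, not the lifting/lowering code from the pushed commit; the argument encoding (`"list"` vs. `"scalar"` tuples) and the name `hybrid_placement` are assumptions made for illustration.

```python
MAX_FLAT = 16


def hybrid_placement(args):
    """args: list of ("list", 0) or ("scalar", n_slots) tuples.

    Returns (flat_slot_count, per-argument placement)."""
    flat = 0
    needs_area = False
    placements = []
    for kind, slots in args:
        if kind == "list":
            # Fixed and bounded lists take 0 flat slots.
            needs_area = True
            placements.append("param-area")
        else:
            flat += slots
            placements.append("flat")
    if needs_area:
        flat += 1  # one extra slot for the param-area address
    if flat > MAX_FLAT:
        # Classic fallback: everything in memory, a single pointer is passed.
        return 1, ["param-area"] * len(args)
    return flat, placements


# f(list<s32, 16>, s32): the scalar keeps its register.
print(hybrid_placement([("list", 0), ("scalar", 1)]))
# 16 scalars plus a list: the address slot tips the budget (17 > 16).
print(hybrid_placement([("scalar", 1)] * 16 + [("list", 0)])[0])
```

The second call is the situation where the extra address slot itself pushes the call into the classic fallback.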
Benefits: scalar arguments that fit within MAX_FLAT stay in registers even
when a list is present. Both fixed and bounded lists use the same in-memory
layout (elements for fixed, varint | elements for bounded) whether they
appear as top-level flat arguments or as elements of an outer fixed/bounded
list — the representation composes uniformly.
Cost: the extra param-area address slot can tip the budget when many scalar arguments
accompany a list, at which point the classic fallback fires. Potentially we could exempt the param-area pointer from the MAX_FLAT budget, as the result-area pointer is.
Probably worth a separate discussion though I wanted to show a more natural way to deal with in-place lists.
I shy away from the added complexity of mixed flattened and in-memory arguments. Up to now either all arguments are in registers or in memory.
Also I just realized that for guest-imported functions list<> doesn't have to be heap-allocated, as it is passed by reference; the same holds for guest-exported results (a custom cabi_post can replace malloc/free with arenas). Only the two other cases require cabi_realloc (which could still be different from malloc): guest import results and guest export arguments. So this representation is only applicable to guest export arguments (the host calling into the guest).
Also I don't want to derail this discussion with yet another wild idea, but I have been thinking about using a SIMD/vector encoding of fixed-length lists into registers, e.g. passing a [u8; 8] in a u64.
I think with SIMD and shift instructions this is efficient enough to access/process, and it helps reduce excessive register grabbing in the flattened case. My guess is that [u8; 8] to [u8; 64] would be a common enough structure element to make this worthwhile.
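A small sketch of what the [u8; 8]-in-u64 packing could mean, assuming little-endian lane order (element 0 in the lowest byte); nothing in the thread fixes that order, and the function names are made up for illustration. Dynamic indexing becomes a shift and a mask instead of a memory access.

```python
def pack_u8x8(elems):
    """Pack eight u8 values into one u64, element 0 in the lowest byte."""
    assert len(elems) == 8 and all(0 <= e <= 0xFF for e in elems)
    return int.from_bytes(bytes(elems), "little")


def get_u8(packed, i):
    """Read element i with a shift and mask; i may be a runtime value."""
    assert 0 <= i < 8
    return (packed >> (8 * i)) & 0xFF


packed = pack_u8x8([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(packed))        # 0x807060504030201
print(get_u8(packed, 0))  # 1
print(get_u8(packed, 7))  # 8
```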
`none` case of an optional immediate.)
* 🔧 for fixed-sized lists the length of the list must be larger than 0 to pass
  validation.
* 🔧 for fixed-sized lists (`0x67`) the length of the list must be larger than
Pre-existing, but it looks like the grammar already covers this twice: once by using <u32> (unsigned) and once with the (if maxlen > 0). We could also remove the (if maxlen > 0). But should we specify a maximum for maxlen?
For u32, zero is valid and the maxlen > 0 check forbids zero. What would be a legitimate upper bound... i32::MAX?
There's already (just recently added) a MAX_LIST_BYTE_LENGTH = 2^28-1. That's just bytes, but even still, it seems like a reasonable upper bound.
OK, I defined that bound in BINARY.md, but where would we check this in definitions.py?
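One possible shape for such a check, purely as a hypothetical sketch: `validate_list_bound` is not an existing function in definitions.py, and the constant is the quoted bound read as 2^28 - 1. It combines the maxlen > 0 rule with a byte-size cap derived from the element size.

```python
MAX_LIST_BYTE_LENGTH = (1 << 28) - 1  # assumed reading of the bound quoted above


def validate_list_bound(maxlen: int, elem_size: int) -> None:
    """Reject a zero bound and bounds whose byte size exceeds the cap."""
    if maxlen <= 0:
        raise ValueError("list bound must be larger than 0")
    if maxlen * elem_size > MAX_LIST_BYTE_LENGTH:
        raise ValueError("list bound exceeds MAX_LIST_BYTE_LENGTH")


validate_list_bound(16, 4)  # list<u32, ..16>: 64 bytes, accepted
```

Whether this belongs at type-validation time or only as a trap while loading is exactly the open question here; the sketch only shows the arithmetic.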
Cherry-picked the essence of cpetig's commit d2874eb from https://github.com/cpetig/component-model/tree/bounded-lists, adapted to the current codebase (ptr_type/opts threading, updated class names). Bounded strings are intentionally excluded.
Co-authored-by: Christof Petig <christof.petig@arcor.de>
- Add trap_if(actual_len > maybe_length) to lift_flat_list, mirroring the existing trap in load_list's heap path
- Add over-length trap tests for both flat and heap lifting
- Add alignment test for bounded list of U32 (verifies 3-byte padding after U8 length prefix)
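For illustration, the over-length trap and the 3-byte padding mentioned in this commit can be modelled as below. This is a standalone sketch, not the code from the commit: it assumes a 1-byte length prefix (as for a bound that fits in a U8), little-endian u32 elements, and a 4-byte-aligned `ptr`; `Trap`, `align_to` and `load_bounded_u32_list` are hypothetical names.

```python
import struct


class Trap(Exception):
    """Stand-in for a Wasm trap."""


def align_to(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def load_bounded_u32_list(memory: bytes, ptr: int, max_len: int):
    actual_len = memory[ptr]          # assumed 1-byte length prefix
    if actual_len > max_len:          # mirrors trap_if(actual_len > maybe_length)
        raise Trap("length exceeds bound")
    start = align_to(ptr + 1, 4)      # 3 padding bytes when ptr is 4-aligned
    return list(struct.unpack_from(f"<{actual_len}I", memory, start))


# Length 2, three padding bytes, then two u32 elements.
mem = bytes([2, 0xAA, 0xAA, 0xAA]) + struct.pack("<2I", 10, 20)
print(load_bounded_u32_list(mem, 0, 4))  # [10, 20]
```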
- fix memory bounds checking
- improve integration with existing list load/store code
- fix indentation
- avoid default argument
- more readable load/store recipe
Add bounded lists (list<T, ..N>)
Closes #385.