diff --git a/AGENTS.md b/AGENTS.md index 9fbeaa2b..96bb49f0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -31,6 +31,10 @@ Never mark a task done while tests are failing. ## Implementation notes +### Temporary immutable objects design + +When working on immutable objects, use `design/IMMUTABLE_OBJECTS_DESIGN.md` as the implementation design reference. This file is temporary and should be removed after the feature is complete. + ### v_object constructor conventions Types derived from `v_object` should follow the project-wide constructor pattern: @@ -48,6 +52,12 @@ When accessing a C++ object stored inside a Python wrapper, use `ext()` for read Use `modifyExt()` for real object mutations, especially durable state changes. Do not use `const_cast` on `ext()` to call a mutating method. If a wrapper currently exposes only a const object but needs a mutating API, change the wrapper type or access path so the mutation can go through `modifyExt()`. +### Python C API safety helpers + +When iterating over Python objects in C++, use `Py_FOR(item, iterator)` from `PySafeAPI.hpp` with an owned iterator, for example `auto iterator = Py_OWN(PyObject_GetIter(obj));`. The loop owns each yielded item and avoids manual `Py_DECREF` paths. + +For Python container/object writes, use the `PySafe_*` helpers from `PySafeAPI.hpp` instead of the raw C API when a helper exists, such as `PySafeList_SetItem`, `PySafeTuple_SetItem`, `PySafeDict_SetItem`, `PySafeDict_SetItemString`, `PySafeSet_Add`, and `PySafeModule_AddObject`. + ### MorphingBIndex: address and type can change on mutation A `MorphingBIndex` does not behave like a typical container. On mutation (`insert`, `erase`) it may morph into a different internal storage variant (itty / array_2..4 / vector / bindex), and the morph can change both its **address** and its **type**. 
diff --git a/design/IMMUTABLE_OBJECTS_DESIGN.md b/design/IMMUTABLE_OBJECTS_DESIGN.md new file mode 100644 index 00000000..f15aaafa --- /dev/null +++ b/design/IMMUTABLE_OBJECTS_DESIGN.md @@ -0,0 +1,215 @@ +# Immutable Objects Design + +This is a temporary design document for agentic development of immutable objects. Remove it after the feature is complete and the durable design has moved into permanent code comments or project documentation. + +## Goal + +Immutable memo objects can be optimized because, after construction, their fields cannot be modified. The only permitted post-construction changes are external reference and tag bookkeeping. This lets dbzero use a compact object layout that avoids mutable-object structures and can embed selected nested values directly into the root allocation. + +The Python programming model should remain transparent: immutable embedded objects should behave like normal memo instances for reads, references, weak references, tags, and tag-based lookup. + +## Layout Changes + +Immutable objects may deviate from the regular memo-object layout in these ways: + +- The KV-map is eliminated because adding fields after construction is not allowed. +- Nested tuples, strings, and byte arrays may be embedded directly in the object structure to avoid extra references and allocations. +- Other immutable nested memo objects may be embedded. +- Immutable collections such as `list`, `dict`, and `set` may be embedded into the root object when the cost model supports it. + +The layout keeps: + +- `POS-VT` and `INDEX-VT` segments, unchanged, for fixed-size members such as dates, datetimes, memo references, floats, or low-fidelity buffers. +- A new `OFFSET-MAP` structure, based on the `o_dict` implementation, mapping field index to offset. Both index and offset are stored as packed `uint32`. +- A variable-length member block (`VL-BLOCK`) immediately after the `OFFSET-MAP`. + +Offsets in `OFFSET-MAP` are calculated from the beginning of `VL-BLOCK`. 
Variable-length member types are stored in `VL-BLOCK` immediately before their contents. This allows dbzero to calculate the addresses of embedded nested members without needing the mutable KV-map. + +## Embedding Cost Model + +Embedding is not always the best storage model. It can reduce construction work and allocation count, but it can also make retrieval more expensive because fetching the root object may fetch embedded fields that the caller never reads. + +Use this criterion: + +```text +SavedCost > EmbeddedCost +``` + +Where: + +```text +SavedCost = + SeparateStorageBytes + + AllocationsAvoided * AllocationCost + +EmbeddedCost = + EmbeddedBytes + + ExtraPagesFetched * PageFetchCost + + AddressabilityCost + + ViewCost +``` + +Suggested constants: + +- `AllocationCost = 64b` +- `PageFetchCost = page_size / 2` +- `AddressabilityCost = 128` for nested memo objects only +- `ViewCost = 64` for simple nested objects +- `ViewCost = 128` for collections + +Inputs to consider: + +- Relative size of the embedded element as a proportion of the entire object. +- Absolute size of the embedded element. +- Allocation savings, especially for collections like sets and dicts. +- Administrative storage savings, including avoided pointers and headers. +- Expected read patterns, especially whether large embedded fields are commonly skipped. + +## Nested Object References + +Embedded nested objects must not be distinguishable from regular memo objects in Python code. + +Example: + +```python +@db0.memo +@dataclass +class InnerData: + inner_value: int + +@db0.memo +class OuterData: + value: int + inner_data: InnerData + + def __init__(self, value, inner_value): + self.value = value + self.inner_data = InnerData(inner_value) + +outer = OuterData(1, 2) + +# Reference to embedded instance. +other.ref = outer.inner_data + +# Weak reference to an embedded instance. +other_px.long_ref = db0.weak_proxy(outer.inner_data) + +# Assigning tags. 
+db0.tags(outer.inner_data).add("INNER") + +# Lookup by tags may retrieve the inner reference. +db0.find(InnerData, "INNER") +``` + +Implementation requirements: + +- Field retrieval returns an object view of the root object that exposes only the nested fields for read access. +- The view must maintain the lock or lifetime guard of the top-level object while nested fields are accessed. +- References to embedded objects point to a memory location inside the root allocation and also carry the nested member offset. The offset may be deeply nested. +- The lifecycle of an embedded object is tied to the root instance because the root owns the allocation containing the full embedded tree. +- The embedded member is identified by its own address, but that address is inside the allocation and is not the allocation start. +- The allocator must be able to recover allocation metadata from an inner address. This allows embedded object addresses to use the same 50-bit representation as regular object addresses. +- A parent object can still be referenced by the parent allocation address. + +## Object Views + +Nested embedded objects require specialized views rather than independent opened objects. + +Object views should: + +- Expose the same read interface expected for a memo object of the nested type. +- Resolve fields relative to the nested object offset inside the root allocation. +- Keep the root object allocation and lock valid for the duration of access. +- Reject mutation except for operations explicitly allowed for immutable objects, such as reference and tag bookkeeping. +- Support reference creation, weak proxy creation, and tag operations using the nested address. + +Collection views should follow the same model but account for collection-specific traversal and lookup costs. Use the higher `ViewCost` constant for collection embedding decisions. 
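The embedding criterion `SavedCost > EmbeddedCost`, including the higher `ViewCost` for collections, could be evaluated roughly as follows. This is a minimal sketch, not dbzero code: `EmbeddingCandidate` and `shouldEmbed` are hypothetical names, `64b` is read as 64 bytes, and applying the lower `ViewCost` to every non-collection value is an assumption.

```cpp
#include <cstddef>

// Hypothetical inputs for one embedding decision; names are illustrative only.
struct EmbeddingCandidate {
    std::size_t separateStorageBytes;  // bytes needed if stored as a separate object
    std::size_t allocationsAvoided;    // allocations saved by embedding
    std::size_t embeddedBytes;         // bytes added to the root allocation
    std::size_t extraPagesFetched;     // extra pages read when fetching the root
    bool isNestedMemo;                 // AddressabilityCost applies to nested memo objects only
    bool isCollection;                 // collections use the higher ViewCost
};

// Suggested constants from the design document (64b assumed to mean 64 bytes).
constexpr std::size_t allocationCost = 64;
constexpr std::size_t addressabilityCost = 128;
constexpr std::size_t simpleViewCost = 64;
constexpr std::size_t collectionViewCost = 128;

// Returns true when SavedCost > EmbeddedCost for the given page size.
inline bool shouldEmbed(const EmbeddingCandidate &c, std::size_t pageSize)
{
    const std::size_t pageFetchCost = pageSize / 2;
    const std::size_t savedCost =
        c.separateStorageBytes + c.allocationsAvoided * allocationCost;
    const std::size_t embeddedCost = c.embeddedBytes
        + c.extraPagesFetched * pageFetchCost
        + (c.isNestedMemo ? addressabilityCost : 0)
        + (c.isCollection ? collectionViewCost : simpleViewCost);
    return savedCost > embeddedCost;
}
```

With a 4096-byte page, a small value that avoids one allocation and adds no page fetches passes the test, while a multi-page value that forces two extra page fetches does not.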
+ +## Embedded Simple Sets + +The first embedded-set slice is `o_set`, a variable-length overlaid object for simple immutable set values. It uses the same tagged embedded item representation as `o_tuple_item`, so payload bytes live inside the set allocation rather than in side allocations. + +Layout: + +```text +o_set + packed count + packed element_block_byte_size + packed bucket_block_byte_size + o_tuple_item element[count] + uint32 bucket_offset_plus_one[capacity] + o_tuple bucket[occupied_slots] +``` + +Construction removes duplicate simple descriptors before arranging members. The first occurrence determines physical order in the main element stream. `count` stores the unique item count and `element_block_byte_size` stores the exact byte extent of that stream. The hash index is a direct bucket table: slot `hash % capacity` stores `bucket_block_offset + 1`, and `0` means empty. Each occupied slot points to an embedded `o_tuple` containing the elements that landed in that hash bucket. Lookup reads one slot and scans only that bucket tuple to resolve collisions. `sizeOf()` and `safeSizeOf()` rely on the stored element byte size, count-derived index size, and stored bucket byte size for the total extent. + +## Deferred Materialization + +Embedding pre-existing immutable dbzero instances is allowed only when the instance has no external references yet, because its final durable address is not known until it is embedded or otherwise materialized. + +Introduce deferred materialization for immutable objects: + +- Create immutable instances initially without a durable external address when possible. +- Materialize the instance when it is first externally referenced or embedded. +- If embedded, transform the Python wrapper into an object view whose lifetime is tied to the containing root object. +- If externally referenced first, materialize it as a standalone durable object and store normal references to it. 
+ +Simple constructor example: + +```python +outer = OuterData(1, InnerData(3)) +``` + +Expected behavior: + +- `InnerData` is created without external references. +- `OuterData` construction sees that the inner value has no external references. +- `InnerData` is embedded into `OuterData`. + +Pre-bound local example: + +```python +inner = InnerData(3) +outer = OuterData(1, inner) +``` + +Expected behavior: + +- `InnerData` is created without durable external references. Only the Python local reference exists. +- `OuterData` embeds `inner`. +- The `inner` Python wrapper is transformed in place into an object view tied to `outer`. +- Python code continues to behave as if `inner` were a regular immutable memo object. + +## Development Guidance + +Follow TDD for this feature. Start with Python behavior tests for transparent semantics, then add C++ tests for native layout, allocator/address handling, and view behavior. + +Recommended implementation slices: + +1. Define immutable-object construction semantics and prevent post-construction field mutation. +2. Add deferred materialization for immutable memo instances. +3. Add the immutable root layout without embedded nested objects. +4. Add `OFFSET-MAP` and `VL-BLOCK` handling for variable-length members. +5. Add object views for embedded nested memo objects. +6. Add reference, weak reference, and tag support for embedded object addresses. +7. Add collection and large variable-length value embedding behind the cost model. +8. Add retrieval benchmarks or focused performance tests for embedding tradeoffs. + +Tests should cover: + +- Post-construction field assignment is rejected for immutable objects. +- Immutable objects can still be referenced, weak-referenced, tagged, and found by tags. +- Embedded nested memo objects read like standalone memo objects. +- References and weak references to embedded nested objects survive reopening the root object. +- Tag lookup can return embedded nested objects. 
+- Pre-bound deferred instances transform into views after embedding. +- Previously externalized immutable instances are referenced rather than embedded. +- Large fields are not embedded when the cost model rejects embedding. +- Views keep the root object alive and locked while nested fields are accessed. + +Native implementation must preserve existing project conventions: + +- Use the established `v_object` constructor pattern. +- Use camelCase for C++ locals, lambdas, and method names. +- Use `modifyExt()` for real durable state mutations from Python wrappers. +- Do not use `const_cast` on `ext()` to call mutating methods. diff --git a/design/OVERLAID_TYPES_NOTES.md b/design/OVERLAID_TYPES_NOTES.md new file mode 100644 index 00000000..74a25aa5 --- /dev/null +++ b/design/OVERLAID_TYPES_NOTES.md @@ -0,0 +1,249 @@ +# Overlaid Types Notes + +This note captures the local overlaid-object conventions that matter for embedded tuple/list/set work. + +## Core Model + +An overlaid type is a C++ view over bytes already placed in dbzero-managed memory. It is not copied or moved as a normal C++ object. Construction happens with `T::__new(buf, args...)`, which placement-constructs the object at `buf`; reopening happens with `T::__const_ref(buf)` / `T::__ref(buf)`. + +There are two broad categories: + +- Fixed-size overlays derive from `o_fixed`. They have constant `sizeOf() == true_size_of()`, can be addressed with normal array arithmetic, and can be stored in `o_array`, `o_micro_array`, `o_unbound_array`, or fixed C++ fields. +- Variable-size overlays derive from `o_base` or `o_ext`. Their physical extent must be found by calling `sizeOf()` / `safeSizeOf()`. They must not be walked with `ptr + index` unless every element is known fixed-size. + +`true_size_of()` is `sizeof(T) - std::is_empty()`. This matters for empty CRTP bases like `o_base` and `o_fixed_null`, which are intended to add no payload. 
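The effect of subtracting the empty-class byte can be illustrated with a small stand-alone sketch. `EmptyBase`, `Header`, `Marker`, and `trueSizeOf` are hypothetical stand-ins, not the dbzero definitions, and the sketch assumes the compiler gives an empty class a size of one byte, as mainstream compilers do.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Illustrative stand-in for an empty CRTP base that should add no payload.
template <typename T>
struct EmptyBase {};

// One fixed field; the empty base is optimized away and adds no bytes.
struct Header : EmptyBase<Header> {
    std::uint32_t m_id;
};

// No fields at all; sizeof(Marker) is still 1 because objects need distinct addresses.
struct Marker : EmptyBase<Marker> {};

// sizeof(T) minus one byte when T is an empty class, so empty types measure 0.
template <typename T>
constexpr std::size_t trueSizeOf()
{
    return sizeof(T) - (std::is_empty<T>::value ? 1 : 0);
}
```

Here `trueSizeOf<Marker>()` reports zero payload bytes, while `trueSizeOf<Header>()` matches the size of its single field.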
+ +## Required API + +A usable overlaid type should provide: + +- `static T &__new(void *buf, Args&&...)`, inherited from `o_base` or `o_fixed`. +- `static T &__ref(void *buf)` and `static const T &__const_ref(const void *buf)`, inherited in most cases. +- `static std::size_t measure(args...)`, the exact bytes needed before construction. +- `std::size_t sizeOf() const`, the exact bytes occupied by an existing instance. +- `template <typename BufT> static std::size_t safeSizeOf(BufT buf)`, which scans an existing instance and advances through a bounded buffer for validation. +- `static auto type()`, inherited from `o_base`/`o_fixed`, so `Foundation::Arranger` and `Meter` can instantiate it. + +`o_base` supplies default `sizeOf()` as `T::safeSizeOf(this)`, so variable-size types usually implement `safeSizeOf()` and may implement a faster instance `sizeOf()` when they store a total size. + +## `safeSizeOf` And Bounds + +`safeSizeOf(buf)` has two jobs: + +- With a regular pointer, behave like normal size calculation and return the byte extent. +- With `bounded_buf_t` / `const_bounded_buf_t`, advance the checked buffer through the bytes it calculates. If that walks out of bounds, the buffer itself raises through its configured exception path. + +`safeSizeOf` should not throw its own out-of-bounds exception. It should calculate/scan the extent and use buffer advancement or child `safeSizeOf` calls to perform validation. This means an implementation must not read fixed header fields from `__const_ref(buf)` until the bytes containing those fields have already been bounds-checked. The local idiom is: + +```cpp +template <typename BufT> static std::size_t safeSizeOf(BufT buf) +{ + auto checked = buf; + checked += super_t::baseSize(); // validates fixed fields for bounded buffers + auto &self = T::__const_ref(buf); // now header reads are safe + ... +} +``` + +For ordinary pointers, `checked += ...` is plain pointer arithmetic and has no special cost beyond the calculation. 
For bounded buffers it is the operation that detects truncated input. + +`Foundation::SafeSize` follows the same principle for simple static member chains. When code calls: + +```cpp +auto safeSize = super_t::sizeOfMembers(buf); +safeSize = safeSize(MemberT::type()); +``` + +the member scanner obtains a bounded sub-buffer through `&buf[sizeSoFar]`; that indexed access validates that the member start is inside the bounded range, and the member's own `safeSizeOf` validates its extent. For raw pointers, the same code falls back to ordinary pointer arithmetic. + +For dynamic element streams, prefer an explicit cursor over an accumulator: + +```cpp +template <typename BufT> static std::size_t safeSizeOf(BufT buf) +{ + auto start = buf; + auto cursor = buf; + cursor += baseSize(); // validate header/fixed fields + auto &self = T::__const_ref(buf); // now fixed fields are safe to read + for (std::uint32_t i = 0; i < self.count(); ++i) { + cursor += ElementT::safeSizeOf(cursor); + } + return cursor - start; +} +``` + +This keeps the current checked buffer as the single source of truth. Each element receives the same `buf_t` category as the parent (`const_bounded_buf_t` or raw pointer), validates itself, and the parent advances to the next element by exactly the size the element reported. + +If a type stores both count and byte-size metadata for a dynamic element stream, choose one source of truth for `safeSizeOf`. Scanning actual elements validates the nested layout and returns the scanned extent; using the stored byte-size validates the declared extent by advancing to it. Do not add a separate direct throw for mismatches in `safeSizeOf`; corruption policy belongs outside the bounds-walking primitive. 
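The cursor idiom can be exercised end to end with a toy layout. The sketch below is not dbzero code: `BoundedBuf` is a hypothetical stand-in for `const_bounded_buf_t`, `rawPtr` is a helper invented for the example, and the byte layout (one count byte, then items made of one length byte plus payload) is assumed purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical checked cursor; dbzero's const_bounded_buf_t may differ.
class BoundedBuf {
public:
    BoundedBuf(const std::uint8_t *begin, const std::uint8_t *end)
        : m_ptr(begin), m_end(end) {}

    // Advancing past the end is the only place a bounds error is raised.
    BoundedBuf &operator+=(std::size_t n)
    {
        if (n > static_cast<std::size_t>(m_end - m_ptr)) {
            throw std::out_of_range("buffer truncated");
        }
        m_ptr += n;
        return *this;
    }

    std::ptrdiff_t operator-(const BoundedBuf &other) const { return m_ptr - other.m_ptr; }
    const std::uint8_t *get() const { return m_ptr; }

private:
    const std::uint8_t *m_ptr;
    const std::uint8_t *m_end;
};

// Uniform access to the current byte for raw pointers and bounded buffers.
inline const std::uint8_t *rawPtr(const std::uint8_t *p) { return p; }
inline const std::uint8_t *rawPtr(const BoundedBuf &b) { return b.get(); }

// Item: one length byte followed by that many payload bytes.
struct LenPrefixedItem {
    template <typename BufT> static std::size_t safeSizeOf(BufT buf)
    {
        auto cursor = buf;
        cursor += 1;                       // validate the length byte before reading it
        std::size_t len = *rawPtr(buf);    // the length byte is now safe to read
        cursor += len;                     // validate the payload extent
        return 1 + len;
    }
};

// Sequence: one count byte followed by `count` items.
struct ItemSequence {
    template <typename BufT> static std::size_t safeSizeOf(BufT buf)
    {
        auto start = buf;
        auto cursor = buf;
        cursor += 1;                       // validate the header
        std::size_t count = *rawPtr(buf);  // the header is now safe to read
        for (std::size_t i = 0; i < count; ++i) {
            cursor += LenPrefixedItem::safeSizeOf(cursor);
        }
        return static_cast<std::size_t>(cursor - start);
    }
};
```

The same template serves both buffer categories: with a raw pointer every `+=` is ordinary arithmetic, and with `BoundedBuf` a truncated input surfaces as the buffer's own exception rather than a throw written inside `safeSizeOf`.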
+ +## Dynamic Members + +`o_base` places fixed C++ fields first, then dynamic members after `baseSize()`: + +```cpp +class DB0_PACKED_ATTR Example : public o_base { + std::uint32_t m_id; + + Example(std::uint32_t id, const std::string &name) + : m_id(id) + { + arrangeMembers() + (o_string::type(), name); + } + + static std::size_t measure(std::uint32_t, const std::string &name) { + return measureMembers() + (o_string::type(), name); + } + + template <typename BufT> static std::size_t safeSizeOf(BufT buf) { + return sizeOfMembers(buf) + (o_string::type()); + } + + const o_string &name() const { + return getDynFirst(o_string::type()); + } +}; +``` + +Access to later dynamic members uses the prior member’s `sizeOf()`: + +```cpp +const o_string &first() const { return getDynFirst(o_string::type()); } +const o_string &last() const { return getDynAfter(first(), o_string::type()); } +``` + +The same order must be used in constructor, `measure()`, `safeSizeOf()`, and accessors. Versioned layouts use `[version]` in `arrangeMembers()`, `measureMembers()`, and `sizeOfMembers()`. + +## Variable Elements In A Sequence + +`o_list` is the main precedent for an embedded sequence of variable-length elements: + +- It stores `size_of` and `count` in the list header. +- Its constructor repeatedly calls `arranger(T::type(), itemArgs...)`. +- Iteration starts at `beginOfDynamicArea()`. +- `operator++` advances by `item->sizeOf()`. +- `end()` is computed as `beginOfMemberArea() + size_of`, not `begin() + count`. + +This is the important pattern for embedded tuples/lists: + +```cpp +const_iterator &operator++() +{ + item = reinterpret_cast( + reinterpret_cast(item) + item->sizeOf() + ); + return *this; +} +``` + +By contrast, `o_array` and `o_micro_array` store contiguous fixed-size values and use pointer arithmetic. They are not appropriate for elements that may themselves be `o_string`, `o_binary`, nested tuple, or any other variable-size overlay. 
+ +## Conditional Dynamic Layouts + +`o_change_log` demonstrates a conditional layout: after a fixed boolean, the next member is either an RLE sequence or a plain list. Its `safeSizeOf()` manually reads the boolean and then dispatches to the correct type: + +```cpp +buf += super_t::safeBaseSize(buf); +auto isRle = o_simple::__const_ref(buf); +buf += isRle.sizeOf(); +if (isRle.value()) { + buf += o_rle_sequence::safeSizeOf(buf); +} else { + buf += o_list>::safeSizeOf(buf); +} +``` + +This pattern is relevant for a tagged union element where the next payload type depends on `StorageClass`. + +## Existing Variable-Length Examples + +`o_base_string` stores a packed length followed by raw string bytes. `measure()` is packed-length bytes plus content bytes. `safeSizeOf()` reads the packed length then advances by that many bytes. + +`o_binary` stores a `uint32_t m_bytes` followed by a flexible one-byte member `m_buf`. `measure(size)` is `sizeof(uint32_t) + size`. `begin()` returns `&m_buf`; `safeSizeOf()` reads `m_bytes` and advances by header plus payload. + +`o_packed_int` is itself variable-length. It encodes directly into the object bytes and has no fixed payload field beyond the CRTP base. Its `safeSizeOf()` scans continuation bits. Small metadata fields that are commonly below 128, such as tuple item count and element-block byte size, should use `packed_int32` instead of fixed `uint32_t` fields when the object is already variable-length. + +`o_packed_array` is fixed-size as a container but stores variable-length items inside an internal byte array. Its iterator advances by `ItemT::sizeOf()`, not by `sizeof(ItemT)`. + +`PosVT` shows a mixed pattern: `o_micro_array` is fixed-size and self-sized, then `o_unbound_array` follows. Because `o_unbound_array` has no own size header, `safeSizeOf()` must derive its size from the preceding `types().size()`. + +## Requirements For Embedded Tuple/List/Set + +For embedded tuple/list elements, `o_tuple_item` must not be `o_fixed`. 
An item is a tagged overlaid union: + +- Fixed/simple payloads may store their value inline after the tag, using `o_simple` or another fixed overlay. +- Variable-length payloads must be embedded immediately after the item tag/header as overlaid objects such as `o_string`, `o_binary`, or nested `o_tuple`. +- The item’s `sizeOf()` must dispatch on the tag and include the embedded payload’s actual `sizeOf()`. +- The item’s `safeSizeOf()` must perform the same dispatch using bounded-safe scanning. +- A tuple/list sequence must advance from one item to the next by `item.sizeOf()`. +- Random access by index requires either linear scan or a separate offset table. A C++ array of item descriptors is not enough if payloads are embedded in the item stream. +- If an offset table is added, offsets should point to item starts relative to the element block, not to separately allocated payloads. +- Construction descriptors should be cheap tagged views, not structs containing every possible expensive payload. Use a union-style payload for primitives and string/byte views; callers must keep viewed variable-length data alive until construction finishes. + +The minimum correct first implementation should model: + +```text +o_tuple + header: packed count, packed element_block_byte_size + element block: + o_tuple_item + o_tuple_item + ... + +o_tuple_item + tuple_item_kind + payload selected by tuple_item_kind +``` + +An embedded set uses the same element encoding for the first implementation: + +```text +o_set + header: packed count, packed element_block_byte_size, packed bucket_block_byte_size + element block: + o_tuple_item + o_tuple_item + ... + hash index: + 32-bit bucket-block offset plus one + bucket block: + o_tuple + o_tuple + ... +``` + +Set construction deduplicates simple construction descriptors before writing the element block. The stored count is the number of unique items and the stored byte size is the byte extent of the unique item stream. 
Lookup should map the 32-bit item hash to a slot using modulo arithmetic, read the bucket offset from the embedded offset table, and then scan only that bucket's embedded `o_tuple` to resolve collisions. Slot value `0` means unoccupied; occupied slots store `bucket_block_offset + 1`. `safeSizeOf` should use the stored element byte size, count-derived index size, and stored bucket byte size to validate and return the declared embedded set extent; it must not iterate through every item just to rediscover a byte count that is already in the header. + +`o_tuple_item` uses `TupleItemKind`, not `StorageClass`, because the tag is persisted inside embedded tuple/set bytes and describes the embedded payload layout rather than object-model storage behavior. Enum values must stay explicit and stable. New payload kinds should be appended with new numeric values; do not renumber existing values. + +Current persisted values: + +- `0`: `UNDEFINED` (invalid/reserved) +- `1`: `NONE` +- `2`: `BOOLEAN` as embedded `o_simple` +- `3`: `INT64` as embedded `o_simple` +- `4`: `FP_NUMERIC64` as embedded `o_simple` +- `5`: `STRING` as embedded `o_string` +- `6`: `BINARY` as embedded `o_binary` +- `7`: `PTIME64` as embedded `o_simple` +- `8`: `DATE` as embedded `o_simple` +- `9`: `DATETIME` as embedded `o_simple` +- `10`: `DATETIME_TZ` as embedded `o_simple` +- `11`: `TIME` as embedded `o_simple` +- `12`: `TIME_TZ` as embedded `o_simple` +- `13`: `DECIMAL` as embedded `o_simple` +- `14`: `PACKED_INT64` as embedded `packed_int64` + +`Element::integer()` automatically chooses `PACKED_INT64` for non-negative integers whose packed payload is 6 bytes or less. That is the point where the payload saves at least 2 bytes compared with fixed `INT64`. Negative integers and values whose packed payload would need 7 or more bytes remain `INT64`. Set equality and hashing treat `INT64` and `PACKED_INT64` as the same logical integer encoding. 
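The `PACKED_INT64` selection rule can be sketched as below. This is an illustration under a stated assumption: the packed encoding is taken to carry 7 payload bits per byte with one continuation bit, which is consistent with the notes on `o_packed_int` but not confirmed by them. `packedSize` and `chooseIntegerKind` are hypothetical names; only the two enum values come from the persisted-value list.

```cpp
#include <cstddef>
#include <cstdint>

// Only the two kinds relevant to integer selection, with their persisted values.
enum class TupleItemKind : std::uint8_t { INT64 = 3, PACKED_INT64 = 14 };

// Bytes needed for a non-negative value, assuming 7 payload bits per byte
// plus one continuation bit (an assumption about the packed encoding).
inline std::size_t packedSize(std::uint64_t value)
{
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Negative values and values needing 7 or more packed bytes stay fixed INT64.
inline TupleItemKind chooseIntegerKind(std::int64_t value)
{
    if (value < 0) {
        return TupleItemKind::INT64;
    }
    return packedSize(static_cast<std::uint64_t>(value)) <= 6
        ? TupleItemKind::PACKED_INT64
        : TupleItemKind::INT64;
}
```

Under this assumption the cutoff falls at 42 bits: values up to `2^42 - 1` fit in 6 packed bytes, and `2^42` is the first value that stays `INT64`.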
+ +Embedded collections are intentionally not part of the current slice. When tuple/list/set/dict payloads are added later, they should get dedicated `TupleItemKind` values and should embed the child overlay bytes immediately after the tag. + +## Common Pitfalls + +- Do not put variable-length elements in `o_micro_array` or `o_unbound_array` unless `T` is truly fixed-size or the size is externally supplied and access is not by `T* + index`. +- Do not use a descriptor table that stores payload offsets and calls that “embedded” if the payloads are outside the element stream; embedded means the payload object bytes live inside the parent allocation. +- Keep constructor, `measure()`, `safeSizeOf()`, `sizeOf()`, and accessors layout-equivalent. +- When copying an overlaid object, raw-byte copy is the normal pattern only when the entire object extent is known. +- Bounds-safe scanning matters because these objects may be reopened from persisted storage. diff --git a/src/dbzero/bindings/python/PyTypeManager.cpp b/src/dbzero/bindings/python/PyTypeManager.cpp index df271baf..0c3b24d2 100644 --- a/src/dbzero/bindings/python/PyTypeManager.cpp +++ b/src/dbzero/bindings/python/PyTypeManager.cpp @@ -452,6 +452,35 @@ namespace db0::python } } + const char *PyTypeManager::extractString(ObjectPtr obj_ptr) const + { + if (!PyUnicode_Check(obj_ptr)) { + THROWF(db0::InputException) << "Expected a string object, got " + << PyToolkit::getTypeName(obj_ptr) << THROWF_END; + } + auto value = PyUnicode_AsUTF8(obj_ptr); + if (!value) { + PyErr_Clear(); + THROWF(db0::InputException) << "Unable to encode Python string as UTF-8"; + } + return value; + } + + PyTypeManager::BytesView PyTypeManager::extractBytes(ObjectPtr obj_ptr) const + { + if (!PyBytes_Check(obj_ptr)) { + THROWF(db0::InputException) << "Expected a bytes object, got " + << PyToolkit::getTypeName(obj_ptr) << THROWF_END; + } + char *data = nullptr; + Py_ssize_t size = 0; + if (PyBytes_AsStringAndSize(obj_ptr, &data, &size) != 0) { 
+ PyErr_Clear(); + THROWF(db0::InputException) << "Unable to read Python bytes"; + } + return { reinterpret_cast(data), static_cast(size) }; + } + PyTypeManager::TypeObjectPtr PyTypeManager::getTypeObject(ObjectPtr py_type) const { assert(PyType_Check(py_type)); diff --git a/src/dbzero/bindings/python/PyTypeManager.hpp b/src/dbzero/bindings/python/PyTypeManager.hpp index 211882d0..7164a8df 100644 --- a/src/dbzero/bindings/python/PyTypeManager.hpp +++ b/src/dbzero/bindings/python/PyTypeManager.hpp @@ -3,6 +3,7 @@ #pragma once +#include #include #include #include @@ -89,6 +90,11 @@ namespace db0::python using TagDef = db0::object_model::TagDef; using CompositeTagDef = db0::object_model::CompositeTagDef; using ByteArray = db0::object_model::ByteArray; + struct BytesView + { + const std::byte *m_data = nullptr; + std::size_t m_size = 0; + }; PyTypeManager(); ~PyTypeManager(); @@ -148,7 +154,8 @@ namespace db0::python const EnumValueRepr &extractEnumValueRepr(ObjectPtr enum_value_repr_ptr) const; ObjectIterable &extractObjectIterable(ObjectPtr) const; FieldDef &extractFieldDef(ObjectPtr) const; - std::string extractString(ObjectPtr) const; + const char *extractString(ObjectPtr) const; + BytesView extractBytes(ObjectPtr) const; TypeObjectPtr getTypeObject(ObjectPtr py_type) const; ObjectPtr getLangObject(TypeObjectPtr py_type) const; std::shared_ptr extractConstClass(ObjectPtr py_class) const; diff --git a/src/dbzero/bindings/python/types/DateTime.hpp b/src/dbzero/bindings/python/types/DateTime.hpp index 506fe2a2..cbedb7d3 100644 --- a/src/dbzero/bindings/python/types/DateTime.hpp +++ b/src/dbzero/bindings/python/types/DateTime.hpp @@ -11,6 +11,8 @@ namespace db0::python { + void init_datetime(); + bool isDatatimeWithTZ(PyObject *py_datetime); PyObject * uint64ToPyDatetime(std::uint64_t datetime); @@ -29,4 +31,4 @@ namespace db0::python std::uint64_t pyTimeWithTzToUint64(PyObject *py_date); -} \ No newline at end of file +} diff --git 
a/src/dbzero/core/serialization/Types.cpp b/src/dbzero/core/serialization/Types.cpp
index f2ae94f5..05bdeba8 100644
--- a/src/dbzero/core/serialization/Types.cpp
+++ b/src/dbzero/core/serialization/Types.cpp
@@ -32,6 +32,12 @@ namespace db0
         std::copy(data.data(), data.data() + m_bytes, &m_buf);
     }
 
+    o_binary::o_binary(std::size_t size, void (*write)(void *, const void *), const void *source)
+        : m_bytes(size)
+    {
+        write(&m_buf, source);
+    }
+
     o_binary &o_binary::operator=(const o_binary &binary)
     {
         assert(m_bytes == binary.size());
diff --git a/src/dbzero/core/serialization/Types.hpp b/src/dbzero/core/serialization/Types.hpp
index d9573901..8302e1ac 100644
--- a/src/dbzero/core/serialization/Types.hpp
+++ b/src/dbzero/core/serialization/Types.hpp
@@ -116,6 +116,8 @@ DB0_PACKED_BEGIN
 
         o_binary(const std::vector &);
 
+        o_binary(std::size_t size, void (*write)(void *, const void *), const void *source);
+
     public:
         /**
          * Get content size
diff --git a/src/dbzero/object_model/dict/Dict.hpp b/src/dbzero/object_model/dict/Dict.hpp
index 8fbcc28e..efcfe920 100644
--- a/src/dbzero/object_model/dict/Dict.hpp
+++ b/src/dbzero/object_model/dict/Dict.hpp
@@ -34,7 +34,7 @@ namespace db0::object_model
     class DictIterator;
 
 DB0_PACKED_BEGIN
-    struct DB0_PACKED_ATTR o_dict: public db0::o_fixed_versioned
+    struct DB0_PACKED_ATTR o_mutable_dict: public db0::o_fixed_versioned
     {
         // common object header
         o_unique_header m_header;
@@ -48,12 +48,12 @@ DB0_PACKED_BEGIN
     };
 DB0_PACKED_END
 
-    class Dict: public db0::ObjectBase, StorageClass::DB0_DICT>
+    class Dict: public db0::ObjectBase, StorageClass::DB0_DICT>
     {
         GC0_Declare
 
     public:
-        using super_t = db0::ObjectBase, StorageClass::DB0_DICT>;
+        using super_t = db0::ObjectBase, StorageClass::DB0_DICT>;
         friend super_t;
         using LangToolkit = db0::python::PyToolkit;
         using ObjectPtr = typename LangToolkit::ObjectPtr;
@@ -111,4 +111,4 @@ DB0_PACKED_END
         void restoreIterators();
     };
 
-}
\ No newline at end of file
+}
diff --git a/src/dbzero/object_model/dict/o_dict.cpp b/src/dbzero/object_model/dict/o_dict.cpp
new file mode 100644
index 00000000..d9225514
--- /dev/null
+++ b/src/dbzero/object_model/dict/o_dict.cpp
@@ -0,0 +1,754 @@
+// SPDX-License-Identifier: LGPL-2.1-or-later
+// Copyright (c) 2025 DBZero Software sp. z o.o.
+
+#include "o_dict.hpp"
+
+#include
+#include
+#include
+
+#include
+#include
+
+namespace db0::object_model
+{
+    o_dict_pair::o_dict_pair(const Element &key, const Element &value)
+    {
+        arrangeMembers()
+            (o_tuple_item::type(), key)
+            (o_tuple_item::type(), value);
+    }
+
+    const o_tuple_item &o_dict_pair::key() const
+    {
+        return getDynFirst(o_tuple_item::type());
+    }
+
+    const o_tuple_item &o_dict_pair::value() const
+    {
+        return getDynAfter(key(), o_tuple_item::type());
+    }
+
+    std::size_t o_dict_pair::sizeOf() const
+    {
+        return safeSizeOf(reinterpret_cast(this));
+    }
+
+    std::size_t o_dict_pair::measure(const Element &key, const Element &value)
+    {
+        return measureMembers()
+            (o_tuple_item::type(), key)
+            (o_tuple_item::type(), value);
+    }
+
+    o_dict_bucket::o_dict_bucket(const std::vector &keys, const std::vector &values)
+    {
+        if (keys.size() != values.size()) {
+            THROWF(db0::InternalException) << "Dict bucket key/value count mismatch";
+        }
+        arrangeMembers()
+            (o_compact_tuple::type(), keys)
+            (o_compact_tuple::type(), values);
+    }
+
+    const o_compact_tuple &o_dict_bucket::keys() const
+    {
+        return getDynFirst(o_compact_tuple::type());
+    }
+
+    const o_compact_tuple &o_dict_bucket::values() const
+    {
+        return getDynAfter(keys(), o_compact_tuple::type());
+    }
+
+    std::size_t o_dict_bucket::sizeOf() const
+    {
+        return safeSizeOf(reinterpret_cast(this));
+    }
+
+    std::size_t o_dict_bucket::measure(const std::vector &keys, const std::vector &values)
+    {
+        if (keys.size() != values.size()) {
+            THROWF(db0::InternalException) << "Dict bucket key/value count mismatch";
+        }
+        return measureMembers()
+            (o_compact_tuple::type(), keys)
+            (o_compact_tuple::type(), values);
+    }
+
+    std::size_t o_dict_bucket::measureForBytes(
+        std::uint32_t count, std::uint32_t keysByteSize, std::uint32_t valuesByteSize
+    )
+    {
+        return o_compact_tuple::Builder::measure(count, keysByteSize)
+            + o_compact_tuple::Builder::measure(count, valuesByteSize);
+    }
+
+    std::size_t o_dict_bucket::measureGrowth(
+        std::uint32_t count, std::uint32_t keysByteSize, std::uint32_t valuesByteSize,
+        std::uint32_t addedKeyByteSize, std::uint32_t addedValueByteSize
+    )
+    {
+        auto newCount = count + 1;
+        auto newKeysByteSize = keysByteSize + addedKeyByteSize;
+        auto newValuesByteSize = valuesByteSize + addedValueByteSize;
+        if (newCount <= count || newKeysByteSize < keysByteSize || newValuesByteSize < valuesByteSize) {
+            THROWF(db0::InternalException) << "Dict bucket growth exceeds uint32 range";
+        }
+        if (count == 0) {
+            return 0;
+        }
+        auto newSize = measureForBytes(newCount, newKeysByteSize, newValuesByteSize);
+        if (count == 1) {
+            return newSize;
+        }
+        return newSize - measureForBytes(count, keysByteSize, valuesByteSize);
+    }
+
+    bool o_dict::HashIndexEntry::isEmpty() const
+    {
+        return m_value == 0;
+    }
+
+    bool o_dict::HashIndexEntry::isBucket() const
+    {
+        return (m_value & BUCKET_FLAG) != 0;
+    }
+
+    bool o_dict::HashIndexEntry::isPendingBucket() const
+    {
+        return m_value == BUCKET_FLAG;
+    }
+
+    std::uint32_t o_dict::HashIndexEntry::offset() const
+    {
+        return (m_value & OFFSET_MASK) - 1;
+    }
+
+    void o_dict::HashIndexEntry::clear()
+    {
+        m_value = 0;
+    }
+
+    void o_dict::HashIndexEntry::setPendingBucket()
+    {
+        m_value = BUCKET_FLAG;
+    }
+
+    void o_dict::HashIndexEntry::setPair(std::uint32_t offset)
+    {
+        if (offset >= OFFSET_MASK) {
+            THROWF(db0::InternalException) << "Dict pair offset exceeds hash index entry capacity";
+        }
+        m_value = offset + 1;
+    }
+
+    void o_dict::HashIndexEntry::setBucket(std::uint32_t offset)
+    {
+        if (offset >= OFFSET_MASK) {
+            THROWF(db0::InternalException) << "Dict bucket offset exceeds hash index entry capacity";
+        }
+        m_value = BUCKET_FLAG | (offset + 1);
+    }
+
+    const o_dict::Pair &o_dict::const_iterator::operator*() const
+    {
+        return *m_pair;
+    }
+
+    const o_dict::Pair *o_dict::const_iterator::operator->() const
+    {
+        return m_pair;
+    }
+
+    o_dict::const_iterator &o_dict::const_iterator::operator++()
+    {
+        m_pair = reinterpret_cast(
+            reinterpret_cast(m_pair) + m_pair->sizeOf()
+        );
+        return *this;
+    }
+
+    bool o_dict::const_iterator::operator==(const const_iterator &other) const
+    {
+        return m_pair == other.m_pair;
+    }
+
+    bool o_dict::const_iterator::operator!=(const const_iterator &other) const
+    {
+        return m_pair != other.m_pair;
+    }
+
+    o_dict::const_iterator::const_iterator(const Pair *pair)
+        : m_pair(pair)
+    {
+    }
+
+    std::size_t o_dict::ElementHash::operator()(const Element &element) const
+    {
+        return elementHash(element);
+    }
+
+    bool o_dict::ElementEqual::operator()(const Element &lhs, const Element &rhs) const
+    {
+        return elementsEqual(lhs, rhs);
+    }
+
+    o_dict::o_dict(const ElementMap &elements)
+    {
+        auto pairsSize = checkedUint32Size(measurePairs(elements), "Dict pairs byte size");
+        auto capacity = hashIndexCapacity(elements.size());
+        auto bucketSize = checkedUint32Size(
+            measureCollisionBuckets(elements, capacity), "Dict bucket byte size"
+        );
+
+        auto arranger = arrangeDictMembers(static_cast(elements.size()), pairsSize, bucketSize);
+        for (const auto &[key, value]: elements) {
+            arranger = arranger(Pair::type(), key, value);
+        }
+
+        finishDictConstruction(arranger.ptr(), pairsSize, capacity, bucketSize);
+    }
+
+    db0::Foundation::Arranger o_dict::arrangeDictMembers(
+        std::uint32_t count, std::uint32_t pairsByteSize, std::uint32_t bucketByteSize
+    )
+    {
+        return arrangeMembers()
+            (db0::packed_int32::type(), count)
+            (db0::packed_int32::type(), pairsByteSize)
+            (db0::packed_int32::type(), bucketByteSize);
+    }
+
+    void o_dict::finishDictConstruction(
+        void *indexEntriesPtr, std::uint32_t pairsByteSize, std::size_t capacity, std::uint32_t bucketByteSize
+    )
+    {
+        auto *indexEntries = reinterpret_cast(indexEntriesPtr);
+        auto bucketByteSizeWritten = writeCollisionBuckets(
+            indexEntries, beginOfPairs(), pairsByteSize, capacity, reinterpret_cast(indexEntries + capacity)
+        );
+        if (bucketByteSizeWritten != bucketByteSize) {
+            THROWF(db0::InternalException) << "Dict bucket byte size changed during construction";
+        }
+    }
+
+    std::size_t o_dict::size() const
+    {
+        return count().value();
+    }
+
+    std::size_t o_dict::pairsByteSize() const
+    {
+        return pairsByteSizeMember().value();
+    }
+
+    bool o_dict::empty() const
+    {
+        return size() == 0;
+    }
+
+    bool o_dict::contains(const Element &key) const
+    {
+        return get(key) != nullptr;
+    }
+
+    const o_dict::Item *o_dict::get(const Element &key) const
+    {
+        auto capacity = hashIndexCapacity(size());
+        if (capacity == 0) {
+            return nullptr;
+        }
+
+        const auto *entries = beginOfHashIndex();
+        auto slot = elementHash(key) % capacity;
+        const auto &entry = entries[slot];
+        if (entry.isEmpty()) {
+            return nullptr;
+        }
+
+        auto offset = entry.offset();
+        if (!entry.isBucket()) {
+            const auto &pair = pairAtOffset(offset);
+            return itemEqualsElement(pair.key(), key) ? &pair.value() : nullptr;
+        }
+
+        const auto &bucket = bucketAtOffset(offset);
+        auto keyIt = bucket.keys().begin();
+        auto valueIt = bucket.values().begin();
+        for (; keyIt != bucket.keys().end(); ++keyIt, ++valueIt) {
+            if (itemEqualsElement(*keyIt, key)) {
+                return &*valueIt;
+            }
+        }
+        return nullptr;
+    }
+
+    o_dict::const_iterator o_dict::begin() const
+    {
+        return const_iterator(reinterpret_cast(beginOfPairs()));
+    }
+
+    o_dict::const_iterator o_dict::end() const
+    {
+        return const_iterator(reinterpret_cast(beginOfPairs() + pairsByteSize()));
+    }
+
+    std::size_t o_dict::sizeOf() const
+    {
+        return safeSizeOf(reinterpret_cast(this));
+    }
+
+    std::size_t o_dict::measure(const ElementMap &elements)
+    {
+        auto pairsSize = measurePairs(elements);
+        auto bucketSize = measureCollisionBuckets(elements, hashIndexCapacity(elements.size()));
+        return measureMembers()
+            (db0::packed_int32::type(), static_cast(elements.size()))
+            (db0::packed_int32::type(), checkedUint32Size(pairsSize, "Dict pairs byte size"))
+            (db0::packed_int32::type(), checkedUint32Size(bucketSize, "Dict bucket byte size"))
+            (pairsSize)
+            (hashIndexByteSize(elements.size()))
+            (bucketSize);
+    }
+
+    const db0::packed_int32 &o_dict::count() const
+    {
+        return getDynFirst(db0::packed_int32::type());
+    }
+
+    const db0::packed_int32 &o_dict::pairsByteSizeMember() const
+    {
+        return getDynAfter(count(), db0::packed_int32::type());
+    }
+
+    const db0::packed_int32 &o_dict::bucketByteSizeMember() const
+    {
+        return getDynAfter(pairsByteSizeMember(), db0::packed_int32::type());
+    }
+
+    const std::byte *o_dict::beginOfPairs() const
+    {
+        const auto &bucketByteSizeMemberRef = bucketByteSizeMember();
+        return reinterpret_cast(&bucketByteSizeMemberRef) + bucketByteSizeMemberRef.sizeOf();
+    }
+
+    o_dict::HashIndexEntry *o_dict::beginOfHashIndex()
+    {
+        return reinterpret_cast(
+            const_cast(static_cast(this)->beginOfPairs()) + pairsByteSize()
+        );
+    }
+
+    const o_dict::HashIndexEntry *o_dict::beginOfHashIndex() const
+    {
+        return reinterpret_cast(beginOfPairs() + pairsByteSize());
+    }
+
+    const std::byte *o_dict::beginOfBuckets() const
+    {
+        return reinterpret_cast(beginOfHashIndex() + hashIndexCapacity(size()));
+    }
+
+    const o_dict_bucket &o_dict::bucketAtOffset(std::uint32_t offset) const
+    {
+        return o_dict_bucket::__const_ref(beginOfBuckets() + offset);
+    }
+
+    const o_dict::Pair &o_dict::pairAtOffset(std::uint32_t offset) const
+    {
+        return Pair::__const_ref(beginOfPairs() + offset);
+    }
+
+    bool o_dict::elementsEqual(const Element &lhs, const Element &rhs)
+    {
+        auto lhsIsInt = lhs.m_kind == StorageClass::INT64 || lhs.m_kind == StorageClass::PACKED_INT32;
+        auto rhsIsInt = rhs.m_kind == StorageClass::INT64 || rhs.m_kind == StorageClass::PACKED_INT32;
+        if (lhs.m_kind != rhs.m_kind && !(lhsIsInt && rhsIsInt)) {
+            return false;
+        }
+
+        switch (lhs.m_kind) {
+            case StorageClass::NONE:
+                return true;
+            case StorageClass::BOOLEAN:
+                return lhs.boolValue() == rhs.boolValue();
+            case StorageClass::INT64:
+            case StorageClass::PACKED_INT32:
+                return lhs.intValue() == rhs.intValue();
+            case StorageClass::FP_NUMERIC64:
+                return lhs.doubleValue() == rhs.doubleValue();
+            case StorageClass::STRING_REF:
+                return lhs.m_payload.m_string_value == rhs.m_payload.m_string_value;
+            case StorageClass::DB0_BYTES:
+                return lhs.bytesSize() == rhs.bytesSize() && bytesEqual(lhs.bytesData(), rhs.bytesData(), lhs.bytesSize());
+            case StorageClass::DB0_TUPLE:
+            case StorageClass::DB0_SET:
+            case StorageClass::DB0_DICT:
+            case StorageClass::OBJECT_REF:
+                return lhs.bytesSize() == rhs.bytesSize() && bytesEqual(lhs.bytesData(), rhs.bytesData(), lhs.bytesSize());
+            case StorageClass::PTIME64:
+            case StorageClass::DATE:
+            case StorageClass::DATETIME:
+            case StorageClass::DATETIME_TZ:
+            case StorageClass::TIME:
+            case StorageClass::TIME_TZ:
+            case StorageClass::DECIMAL:
+                return lhs.uint64Value() == rhs.uint64Value();
+            default:
+                THROWF(db0::InternalException) << "Unsupported dict item kind";
+        }
+        return false;
+    }
+
+    bool o_dict::itemEqualsElement(const Item &item, const Element &element)
+    {
+        auto itemIsInt = item.itemKind() == StorageClass::INT64 || item.itemKind() == StorageClass::PACKED_INT32;
+        auto elementIsInt = element.m_kind == StorageClass::INT64 || element.m_kind == StorageClass::PACKED_INT32;
+        if (item.itemKind() != element.m_kind && !(itemIsInt && elementIsInt)) {
+            return false;
+        }
+
+        switch (element.m_kind) {
+            case StorageClass::NONE:
+                return true;
+            case StorageClass::BOOLEAN:
+                return item.boolPayload().value() == element.boolValue();
+            case StorageClass::INT64:
+            case StorageClass::PACKED_INT32: {
+                auto itemValue = item.itemKind() == StorageClass::PACKED_INT32
+                    ? static_cast(item.packedIntPayload().value())
+                    : item.intPayload().value();
+                return itemValue == element.intValue();
+            }
+            case StorageClass::FP_NUMERIC64:
+                return item.doublePayload().value() == element.doubleValue();
+            case StorageClass::STRING_REF:
+                return item.stringPayload().toString() == element.stringValue();
+            case StorageClass::DB0_BYTES:
+                return item.bytesPayload().size() == element.bytesSize()
+                    && bytesEqual(item.bytesPayload().begin(), element.bytesData(), element.bytesSize());
+            case StorageClass::DB0_TUPLE:
+            case StorageClass::DB0_SET:
+            case StorageClass::DB0_DICT:
+            case StorageClass::OBJECT_REF:
+                return item.embeddedPayload().size() == element.bytesSize()
+                    && bytesEqual(item.embeddedPayload().begin(), element.bytesData(), element.bytesSize());
+            case StorageClass::PTIME64:
+            case StorageClass::DATE:
+            case StorageClass::DATETIME:
+            case StorageClass::DATETIME_TZ:
+            case StorageClass::TIME:
+            case StorageClass::TIME_TZ:
+            case StorageClass::DECIMAL:
+                return item.uint64Payload().value() == element.uint64Value();
+            default:
+                THROWF(db0::InternalException) << "Unsupported dict item kind";
+        }
+        return false;
+    }
+
+    bool o_dict::bytesEqual(const std::byte *lhs, const std::byte *rhs, std::size_t size)
+    {
+        return size == 0 || std::memcmp(lhs, rhs, size) == 0;
+    }
+
+    std::size_t o_dict::measurePairs(const ElementMap &elements)
+    {
+        std::size_t size = 0;
+        for (const auto &[key, value]: elements) {
+            auto pairSize = Pair::measure(key, value);
+            if (size + pairSize < size) {
+                THROWF(db0::InternalException) << "Dict pairs byte size overflow";
+            }
+            size += pairSize;
+        }
+        checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Dict pair block");
+        return size;
+    }
+
+    std::size_t o_dict::measureCollisionBuckets(const ElementMap &elements, std::size_t capacity)
+    {
+        if (capacity == 0) {
+            return 0;
+        }
+
+        struct BucketMeasure
+        {
+            std::uint32_t m_count = 0;
+            std::uint32_t m_keysByteSize = 0;
+            std::uint32_t m_valuesByteSize = 0;
+        };
+
+        std::vector buckets(capacity);
+        std::size_t size = 0;
+        for (const auto &[key, value]: elements) {
+            auto &bucket = buckets[elementHash(key) % capacity];
+            auto keySize = checkedUint32Size(Item::measure(key), "Dict bucket key byte size");
+            auto valueSize = checkedUint32Size(Item::measure(value), "Dict bucket value byte size");
+            auto growth = o_dict_bucket::measureGrowth(
+                bucket.m_count, bucket.m_keysByteSize, bucket.m_valuesByteSize, keySize, valueSize
+            );
+            if (size + growth < size) {
+                THROWF(db0::InternalException) << "Dict bucket block byte size overflow";
+            }
+            size += growth;
+            ++bucket.m_count;
+            if (bucket.m_keysByteSize + keySize < bucket.m_keysByteSize
+                || bucket.m_valuesByteSize + valueSize < bucket.m_valuesByteSize) {
+                THROWF(db0::InternalException) << "Dict bucket elements byte size overflow";
+            }
+            bucket.m_keysByteSize += keySize;
+            bucket.m_valuesByteSize += valueSize;
+        }
+        checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Dict bucket block");
+        return size;
+    }
+
+    o_dict::Element o_dict::elementFromItem(const Item &item)
+    {
+        switch (item.itemKind()) {
+            case StorageClass::NONE:
+                return Element::none();
+            case StorageClass::BOOLEAN:
+                return Element::boolean(item.boolPayload().value());
+            case StorageClass::INT64:
+                return Element::integer(item.intPayload().value());
+            case StorageClass::PACKED_INT32:
+                return Element::integer(static_cast(item.packedIntPayload().value()));
+            case StorageClass::FP_NUMERIC64:
+                return Element::floating(item.doublePayload().value());
+            case StorageClass::STRING_REF: {
+                auto str = item.stringPayload().get();
+                return Element::string(std::string_view(str.get_raw(), str.size()));
+            }
+            case StorageClass::DB0_BYTES:
+                return Element::bytes(item.bytesPayload().begin(), item.bytesPayload().size());
+            case StorageClass::DB0_TUPLE:
+                return Element::embeddedTuple(item.embeddedPayload().begin(), item.embeddedPayload().size());
+            case StorageClass::DB0_SET:
+                return Element::embeddedSet(item.embeddedPayload().begin(), item.embeddedPayload().size());
+            case StorageClass::DB0_DICT:
+                return Element::embeddedDict(item.embeddedPayload().begin(), item.embeddedPayload().size());
+            case StorageClass::OBJECT_REF:
+                return Element::embeddedObject(item.embeddedPayload().begin(), item.embeddedPayload().size());
+            case StorageClass::PTIME64:
+                return Element::timestamp(item.uint64Payload().value());
+            case StorageClass::DATE:
+                return Element::date(item.uint64Payload().value());
+            case StorageClass::DATETIME:
+                return Element::datetime(item.uint64Payload().value());
+            case StorageClass::DATETIME_TZ:
+                return Element::datetimeTz(item.uint64Payload().value());
+            case StorageClass::TIME:
+                return Element::time(item.uint64Payload().value());
+            case StorageClass::TIME_TZ:
+                return Element::timeTz(item.uint64Payload().value());
+            case StorageClass::DECIMAL:
+                return Element::decimal(item.uint64Payload().value());
+            default:
+                THROWF(db0::InternalException) << "Unsupported dict item kind";
+        }
+        return Element::none();
+    }
+
+    std::uint32_t o_dict::elementHash(const Element &element)
+    {
+        auto seedKind = element.m_kind == StorageClass::PACKED_INT32 ? StorageClass::INT64 : element.m_kind;
+        auto seed = 0x9e3779b9U ^ static_cast(seedKind);
+        switch (element.m_kind) {
+            case StorageClass::NONE:
+                return hashBytes(nullptr, 0, seed);
+            case StorageClass::BOOLEAN:
+                return hashBytes(&element.m_payload.m_bool_value, sizeof(element.m_payload.m_bool_value), seed);
+            case StorageClass::INT64:
+            case StorageClass::PACKED_INT32:
+                return hashBytes(&element.m_payload.m_int_value, sizeof(element.m_payload.m_int_value), seed);
+            case StorageClass::FP_NUMERIC64:
+                return hashBytes(&element.m_payload.m_double_value, sizeof(element.m_payload.m_double_value), seed);
+            case StorageClass::STRING_REF:
+                return hashBytes(
+                    element.m_payload.m_string_value.data(), element.m_payload.m_string_value.size(), seed
+                );
+            case StorageClass::DB0_BYTES:
+                return hashBytes(element.bytesData(), element.bytesSize(), seed);
+            case StorageClass::DB0_TUPLE:
+            case StorageClass::DB0_SET:
+            case StorageClass::DB0_DICT:
+            case StorageClass::OBJECT_REF: {
+                if (element.m_payload.m_bytes_value.m_writer) {
+                    std::vector payload(element.bytesSize());
+                    element.m_payload.m_bytes_value.m_writer(payload.data(), element.m_payload.m_bytes_value.m_source);
+                    return hashBytes(payload.data(), payload.size(), seed);
+                }
+                return hashBytes(element.bytesData(), element.bytesSize(), seed);
+            }
+            case StorageClass::PTIME64:
+            case StorageClass::DATE:
+            case StorageClass::DATETIME:
+            case StorageClass::DATETIME_TZ:
+            case StorageClass::TIME:
+            case StorageClass::TIME_TZ:
+            case StorageClass::DECIMAL:
+                return hashBytes(&element.m_payload.m_uint64_value, sizeof(element.m_payload.m_uint64_value), seed);
+            default:
+                THROWF(db0::InternalException) << "Unsupported dict item kind";
+        }
+        return 0;
+    }
+
+    std::uint32_t o_dict::itemHash(const Item &item)
+    {
+        auto seedKind = item.itemKind() == StorageClass::PACKED_INT32 ? StorageClass::INT64 : item.itemKind();
+        auto seed = 0x9e3779b9U ^ static_cast(seedKind);
+        switch (item.itemKind()) {
+            case StorageClass::NONE:
+                return hashBytes(nullptr, 0, seed);
+            case StorageClass::BOOLEAN: {
+                auto value = item.boolPayload().value();
+                return hashBytes(&value, sizeof(value), seed);
+            }
+            case StorageClass::INT64: {
+                auto value = item.intPayload().value();
+                return hashBytes(&value, sizeof(value), seed);
+            }
+            case StorageClass::PACKED_INT32: {
+                auto value = static_cast(item.packedIntPayload().value());
+                return hashBytes(&value, sizeof(value), seed);
+            }
+            case StorageClass::FP_NUMERIC64: {
+                auto value = item.doublePayload().value();
+                return hashBytes(&value, sizeof(value), seed);
+            }
+            case StorageClass::STRING_REF: {
+                auto str = item.stringPayload().get();
+                return hashBytes(str.get_raw(), str.size(), seed);
+            }
+            case StorageClass::DB0_BYTES:
+                return hashBytes(item.bytesPayload().begin(), item.bytesPayload().size(), seed);
+            case StorageClass::DB0_TUPLE:
+            case StorageClass::DB0_SET:
+            case StorageClass::DB0_DICT:
+            case StorageClass::OBJECT_REF:
+                return hashBytes(item.embeddedPayload().begin(), item.embeddedPayload().size(), seed);
+            case StorageClass::PTIME64:
+            case StorageClass::DATE:
+            case StorageClass::DATETIME:
+            case StorageClass::DATETIME_TZ:
+            case StorageClass::TIME:
+            case StorageClass::TIME_TZ:
+            case StorageClass::DECIMAL: {
+                auto value = item.uint64Payload().value();
+                return hashBytes(&value, sizeof(value), seed);
+            }
+            default:
+                THROWF(db0::InternalException) << "Unsupported dict item kind";
+        }
+        return 0;
+    }
+
+    std::uint32_t o_dict::hashBytes(const void *data, std::size_t size, std::uint32_t seed)
+    {
+        static const std::byte empty = std::byte{0};
+        auto hash = db0::murmurhash64A(size == 0 ? &empty : data, size, seed);
+        return static_cast(hash ^ (hash >> 32));
+    }
+
+    std::size_t o_dict::hashIndexCapacity(std::size_t count)
+    {
+        if (count == 0) {
+            return 0;
+        }
+
+        std::size_t capacity = 1;
+        while (capacity < count * 2) {
+            capacity <<= 1;
+        }
+        return capacity;
+    }
+
+    std::size_t o_dict::hashIndexByteSize(std::size_t count)
+    {
+        return hashIndexCapacity(count) * sizeof(HashIndexEntry);
+    }
+
+    std::uint32_t o_dict::buildHashIndex(
+        HashIndexEntry *indexEntries, const std::byte *pairsBegin, std::size_t pairsByteSize, std::size_t capacity
+    )
+    {
+        for (std::size_t i = 0; i < capacity; ++i) {
+            indexEntries[i].clear();
+        }
+
+        if (capacity == 0) {
+            return 0;
+        }
+
+        auto *cursor = pairsBegin;
+        auto *pairsEnd = pairsBegin + pairsByteSize;
+        while (cursor < pairsEnd) {
+            const auto &pair = Pair::__const_ref(cursor);
+            auto pairOffset = checkedHashIndexOffset(cursor - pairsBegin, "Dict pair");
+            auto slot = itemHash(pair.key()) % capacity;
+            auto &entry = indexEntries[slot];
+            if (entry.isEmpty()) {
+                entry.setPair(pairOffset);
+            } else if (!entry.isPendingBucket()) {
+                entry.setPendingBucket();
+            }
+            cursor += pair.sizeOf();
+        }
+        return checkedHashIndexOffset(pairsByteSize == 0 ? 0 : pairsByteSize - 1, "Dict pair block");
+    }
+
+    std::uint32_t o_dict::writeCollisionBuckets(
+        HashIndexEntry *indexEntries, const std::byte *pairsBegin, std::size_t pairsByteSize,
+        std::size_t capacity, std::byte *bucketStart
+    )
+    {
+        buildHashIndex(indexEntries, pairsBegin, pairsByteSize, capacity);
+
+        auto *bucketCursor = bucketStart;
+        auto *pairsEnd = pairsBegin + pairsByteSize;
+        for (std::size_t slot = 0; slot < capacity; ++slot) {
+            if (!indexEntries[slot].isPendingBucket()) {
+                continue;
+            }
+
+            std::vector keys;
+            std::vector values;
+            auto *cursor = pairsBegin;
+            while (cursor < pairsEnd) {
+                const auto &pair = Pair::__const_ref(cursor);
+                if (itemHash(pair.key()) % capacity == slot) {
+                    keys.push_back(elementFromItem(pair.key()));
+                    values.push_back(elementFromItem(pair.value()));
+                }
+                cursor += pair.sizeOf();
+            }
+
+            indexEntries[slot].setBucket(checkedHashIndexOffset(bucketCursor - bucketStart, "Dict bucket"));
+            auto &bucket = o_dict_bucket::__new(bucketCursor, keys, values);
+            bucketCursor += bucket.sizeOf();
+        }
+
+        return checkedHashIndexOffset(bucketCursor - bucketStart, "Dict bucket block");
+    }
+
+    std::uint32_t o_dict::checkedHashIndexOffset(std::size_t offset, const char *name)
+    {
+        if (offset >= HashIndexEntry::OFFSET_MASK) {
+            THROWF(db0::InternalException) << name << " offset exceeds hash index entry capacity";
+        }
+        return static_cast(offset);
+    }
+
+    std::uint32_t o_dict::checkedUint32Size(std::size_t size, const char *name)
+    {
+        if (size > std::numeric_limits::max()) {
+            THROWF(db0::InternalException) << name << " exceeds uint32 range";
+        }
+        return static_cast(size);
+    }
+
+}
diff --git a/src/dbzero/object_model/dict/o_dict.hpp b/src/dbzero/object_model/dict/o_dict.hpp
new file mode 100644
index 00000000..a4856bff
--- /dev/null
+++ b/src/dbzero/object_model/dict/o_dict.hpp
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: LGPL-2.1-or-later
+// Copyright (c) 2025 DBZero Software sp. z o.o.
+
+#pragma once
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+namespace db0::object_model
+{
+
+DB0_PACKED_BEGIN
+    class DB0_PACKED_ATTR o_dict_pair: public db0::o_base
+    {
+    protected:
+        using super_t = db0::o_base;
+        friend super_t;
+
+    public:
+        using Element = o_tuple_item::Element;
+
+        o_dict_pair(const Element &key, const Element &value);
+
+        const o_tuple_item &key() const;
+        const o_tuple_item &value() const;
+        std::size_t sizeOf() const;
+
+        static std::size_t measure(const Element &key, const Element &value);
+
+        template static std::size_t safeSizeOf(BufT buf)
+        {
+            auto start = buf;
+            auto cursor = buf;
+            cursor += super_t::baseSize();
+            cursor += o_tuple_item::safeSizeOf(cursor);
+            cursor += o_tuple_item::safeSizeOf(cursor);
+            return cursor - start;
+        }
+
+    protected:
+        o_dict_pair() = default;
+    };
+DB0_PACKED_END
+
+DB0_PACKED_BEGIN
+    class DB0_PACKED_ATTR o_dict_bucket: public db0::o_base
+    {
+    protected:
+        using super_t = db0::o_base;
+        friend super_t;
+
+    public:
+        using Element = o_tuple_item::Element;
+
+        o_dict_bucket(const std::vector &keys, const std::vector &values);
+
+        const o_compact_tuple &keys() const;
+        const o_compact_tuple &values() const;
+        std::size_t sizeOf() const;
+
+        static std::size_t measure(const std::vector &keys, const std::vector &values);
+        static std::size_t measureForBytes(
+            std::uint32_t count, std::uint32_t keysByteSize, std::uint32_t valuesByteSize
+        );
+        static std::size_t measureGrowth(
+            std::uint32_t count, std::uint32_t keysByteSize, std::uint32_t valuesByteSize,
+            std::uint32_t addedKeyByteSize, std::uint32_t addedValueByteSize
+        );
+
+        template static std::size_t safeSizeOf(BufT buf)
+        {
+            auto start = buf;
+            auto cursor = buf;
+            cursor += super_t::baseSize();
+            cursor += o_compact_tuple::safeSizeOf(cursor);
+            cursor += o_compact_tuple::safeSizeOf(cursor);
+            return cursor - start;
+        }
+
+    protected:
+        o_dict_bucket() = default;
+    };
+DB0_PACKED_END
+
+DB0_PACKED_BEGIN
+    class DB0_PACKED_ATTR o_dict: public db0::o_base
+    {
+    protected:
+        using super_t = db0::o_base;
+        friend super_t;
+
+    public:
+        using Element = o_tuple_item::Element;
+        using Pair = o_dict_pair;
+        using Item = o_tuple_item;
+
+        struct ElementHash
+        {
+            std::size_t operator()(const Element &element) const;
+        };
+
+        struct ElementEqual
+        {
+            bool operator()(const Element &lhs, const Element &rhs) const;
+        };
+
+        using ElementMap = std::unordered_map;
+
+        class const_iterator
+        {
+        public:
+            const_iterator() = default;
+
+            const Pair &operator*() const;
+            const Pair *operator->() const;
+            const_iterator &operator++();
+            bool operator==(const const_iterator &other) const;
+            bool operator!=(const const_iterator &other) const;
+
+        private:
+            friend class o_dict;
+
+            explicit const_iterator(const Pair *pair);
+
+            const Pair *m_pair = nullptr;
+        };
+
+        explicit o_dict(const ElementMap &elements);
+
+        std::size_t size() const;
+        bool empty() const;
+        bool contains(const Element &key) const;
+        const Item *get(const Element &key) const;
+        const_iterator begin() const;
+        const_iterator end() const;
+        std::size_t sizeOf() const;
+
+        static std::size_t measure(const ElementMap &elements);
+
+        template static std::size_t safeSizeOf(BufT buf)
+        {
+            auto start = buf;
+            auto cursor = buf;
+            cursor += super_t::baseSize();
+
+            auto countAt = cursor;
+            cursor += db0::packed_int32::safeSizeOf(cursor);
+            auto pairsByteSizeAt = cursor;
+            cursor += db0::packed_int32::safeSizeOf(cursor);
+            auto bucketByteSizeAt = cursor;
+            cursor += db0::packed_int32::safeSizeOf(cursor);
+
+            auto pairsByteSize = db0::packed_int32::__const_ref(pairsByteSizeAt).value();
+            auto bucketByteSize = db0::packed_int32::__const_ref(bucketByteSizeAt).value();
+            cursor += pairsByteSize;
+            auto count = db0::packed_int32::__const_ref(countAt).value();
+            cursor += hashIndexByteSize(count);
+            cursor += bucketByteSize;
+            return cursor - start;
+        }
+
+    protected:
+        o_dict() = default;
+
+        db0::Foundation::Arranger arrangeDictMembers(
+            std::uint32_t count, std::uint32_t pairsByteSize, std::uint32_t bucketByteSize
+        );
+        void finishDictConstruction(
+            void *indexEntries, std::uint32_t pairsByteSize, std::size_t capacity, std::uint32_t bucketByteSize
+        );
+
+        static bool elementsEqual(const Element &lhs, const Element &rhs);
+        static std::uint32_t elementHash(const Element &element);
+        static std::size_t hashIndexCapacity(std::size_t count);
+        static std::uint32_t checkedHashIndexOffset(std::size_t offset, const char *name);
+        static std::uint32_t checkedUint32Size(std::size_t size, const char *name);
+
+    private:
+        struct HashIndexEntry
+        {
+            static constexpr std::uint32_t BUCKET_FLAG = 0x80000000U;
+            static constexpr std::uint32_t OFFSET_MASK = 0x7fffffffU;
+
+            std::uint32_t m_value = 0;
+
+            bool isEmpty() const;
+            bool isBucket() const;
+            bool isPendingBucket() const;
+            std::uint32_t offset() const;
+            void clear();
+            void setPendingBucket();
+            void setPair(std::uint32_t offset);
+            void setBucket(std::uint32_t offset);
+        };
+        static_assert(sizeof(HashIndexEntry) == sizeof(std::uint32_t));
+
+        std::size_t pairsByteSize() const;
+        const db0::packed_int32 &count() const;
+        const db0::packed_int32 &pairsByteSizeMember() const;
+        const db0::packed_int32 &bucketByteSizeMember() const;
+        const std::byte *beginOfPairs() const;
+        const std::byte *beginOfBuckets() const;
+        HashIndexEntry *beginOfHashIndex();
+        const HashIndexEntry *beginOfHashIndex() const;
+        const o_dict_bucket &bucketAtOffset(std::uint32_t offset) const;
+        const Pair &pairAtOffset(std::uint32_t offset) const;
+
+        static bool itemEqualsElement(const Item &item, const Element &element);
+        static bool bytesEqual(const std::byte *lhs, const std::byte *rhs, std::size_t size);
+        static std::size_t measurePairs(const ElementMap &elements);
+        static std::size_t measureCollisionBuckets(const ElementMap &elements, std::size_t capacity);
+        static Element elementFromItem(const Item &item);
+        static std::uint32_t itemHash(const Item &item);
+        static std::uint32_t hashBytes(const void *data, std::size_t size, std::uint32_t seed);
+        static std::size_t hashIndexByteSize(std::size_t count);
+        static std::uint32_t buildHashIndex(
+            HashIndexEntry *indexEntries, const std::byte *pairsBegin, std::size_t pairsByteSize,
+            std::size_t capacity
+        );
+        static std::uint32_t writeCollisionBuckets(
+            HashIndexEntry *indexEntries, const std::byte *pairsBegin, std::size_t pairsByteSize,
+            std::size_t capacity, std::byte *bucketStart
+        );
+    };
+DB0_PACKED_END
+
+}
diff --git a/src/dbzero/object_model/dict/o_py_dict.cpp b/src/dbzero/object_model/dict/o_py_dict.cpp
new file mode 100644
index 00000000..25528865
--- /dev/null
+++ b/src/dbzero/object_model/dict/o_py_dict.cpp
@@ -0,0 +1,250 @@
+// SPDX-License-Identifier: LGPL-2.1-or-later
+// Copyright (c) 2025 DBZero Software sp. z o.o.
+
+#include "o_py_dict.hpp"
+
+#include
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+namespace db0::object_model
+{
+    namespace
+    {
+        void writePyTuple(void *buf, const void *source)
+        {
+            o_py_tuple::__new(buf, const_cast(static_cast(source)));
+        }
+
+        void writePySet(void *buf, const void *source)
+        {
+            o_py_set::__new(buf, const_cast(static_cast(source)));
+        }
+
+        void writePyDict(void *buf, const void *source)
+        {
+            o_py_dict::__new(buf, const_cast(static_cast(source)));
+        }
+    }
+
+    o_py_dict::o_py_dict(PyObject *dict)
+        : o_dict()
+    {
+        auto count = dictSize(dict);
+        auto pairsByteSize = checkedUint32Size(measurePairs(dict), "Python dict pairs byte size");
+        auto capacity = hashIndexCapacity(count);
+        auto bucketByteSize = checkedUint32Size(
+            measureCollisionBuckets(dict, capacity), "Python dict bucket byte size"
+        );
+
+        auto arranger = arrangeDictMembers(count, pairsByteSize, bucketByteSize);
+        auto iterator = Py_OWN(PyObject_GetIter(dict));
+        if (!iterator) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "o_py_dict expects a Python dict";
+        }
+
+        Py_FOR(key, iterator) {
+            arranger = arranger(Pair::type(), elementFromPythonObject(*key), valueFromPythonDict(dict, *key));
+        }
+        if (PyErr_Occurred()) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "Unable to iterate Python dict";
+        }
+
+        finishDictConstruction(arranger.ptr(), pairsByteSize, capacity, bucketByteSize);
+    }
+
+    std::size_t o_py_dict::measure(PyObject *dict)
+    {
+        auto count = dictSize(dict);
+        auto pairsByteSize = measurePairs(dict);
+        auto bucketByteSize = measureCollisionBuckets(dict, hashIndexCapacity(count));
+        return measureMembers()
+            (db0::packed_int32::type(), count)
+            (db0::packed_int32::type(), checkedUint32Size(pairsByteSize, "Python dict pairs byte size"))
+            (db0::packed_int32::type(), checkedUint32Size(bucketByteSize, "Python dict bucket byte size"))
+            (pairsByteSize)
+            (hashIndexCapacity(count) * sizeof(std::uint32_t))
+            (bucketByteSize);
+    }
+
+    o_py_dict &o_py_dict::__ref(void *buf)
+    {
+        return *reinterpret_cast(buf);
+    }
+
+    const o_py_dict &o_py_dict::__const_ref(const void *buf)
+    {
+        return *reinterpret_cast(buf);
+    }
+
+    db0::Foundation::Type o_py_dict::type()
+    {
+        return db0::Foundation::Type();
+    }
+
+    o_py_dict::Element o_py_dict::elementFromPythonObject(PyObject *object)
+    {
+        auto &typeManager = db0::python::PyToolkit::getTypeManager();
+        auto typeId = typeManager.getTypeId(object);
+
+        switch (typeId) {
+            case db0::bindings::TypeId::NONE:
+                return Element::none();
+            case db0::bindings::TypeId::BOOLEAN:
+                return Element::boolean(object == Py_True);
+            case db0::bindings::TypeId::INTEGER: {
+                auto value = PyLong_AsLongLong(object);
+                if (PyErr_Occurred()) {
+                    PyErr_Clear();
+                    THROWF(db0::InputException) << "Python integer is out of int64 range";
+                }
+                return Element::integer(value);
+            }
+            case db0::bindings::TypeId::FLOAT:
+                return Element::floating(PyFloat_AsDouble(object));
+            case db0::bindings::TypeId::DATETIME:
+                return Element::datetime(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::DATETIME_TZ:
+                return Element::datetimeTz(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::DATE:
+                return Element::date(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::TIME:
+                return Element::time(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::TIME_TZ:
+                return Element::timeTz(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::DECIMAL:
+                return Element::decimal(typeManager.extractUInt64(typeId, object));
+            case db0::bindings::TypeId::STRING: {
+                return Element::string(typeManager.extractString(object));
+            }
+            case db0::bindings::TypeId::BYTES: {
+                auto bytes = typeManager.extractBytes(object);
+                return Element::bytes(bytes.m_data, bytes.m_size);
+            }
+            case db0::bindings::TypeId::LIST:
+            case db0::bindings::TypeId::TUPLE:
+                return Element::embeddedTuple(o_py_tuple::measure(object), writePyTuple, object);
+            case db0::bindings::TypeId::SET:
+                return Element::embeddedSet(o_py_set::measure(object), writePySet, object);
+            case db0::bindings::TypeId::DICT:
+                return Element::embeddedDict(o_py_dict::measure(object), writePyDict, object);
+            default:
+                break;
+        }
+
+        THROWF(db0::InputException) << "Unsupported o_py_dict item type: " << Py_TYPE(object)->tp_name;
+        return Element::none();
+    }
+
+    o_py_dict::Element o_py_dict::valueFromPythonDict(PyObject *dict, PyObject *key)
+    {
+        auto *value = PyDict_GetItemWithError(dict, key);
+        if (!value) {
+            if (PyErr_Occurred()) {
+                PyErr_Clear();
+            }
+            THROWF(db0::InputException) << "Unable to read Python dict value";
+        }
+        return elementFromPythonObject(value);
+    }
+
+    std::uint32_t o_py_dict::dictSize(PyObject *dict)
+    {
+        if (!PyDict_Check(dict)) {
+            THROWF(db0::InputException) << "o_py_dict expects a Python dict";
+        }
+        auto size = PyDict_Size(dict);
+        if (size < 0) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "Unable to read Python dict size";
+        }
+        return checkedUint32Size(static_cast(size), "Python dict size");
+    }
+
+    std::size_t o_py_dict::measurePairs(PyObject *dict)
+    {
+        dictSize(dict);
+        std::size_t size = 0;
+        auto iterator = Py_OWN(PyObject_GetIter(dict));
+        if (!iterator) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "o_py_dict expects a Python dict";
+        }
+
+        Py_FOR(key, iterator) {
+            auto pairSize = Pair::measure(elementFromPythonObject(*key), valueFromPythonDict(dict, *key));
+            if (size + pairSize < size) {
+                THROWF(db0::InternalException) << "Python dict pairs byte size overflow";
+            }
+            size += pairSize;
+        }
+        if (PyErr_Occurred()) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "Unable to iterate Python dict";
+        }
+
+        checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Python dict pair block");
+        return size;
+    }
+
+    std::size_t o_py_dict::measureCollisionBuckets(PyObject *dict, std::size_t capacity)
+    {
+        dictSize(dict);
+        if (capacity == 0) {
+            return 0;
+        }
+
+        struct BucketMeasure
+        {
+            std::uint32_t m_count = 0;
+            std::uint32_t m_keysByteSize = 0;
+            std::uint32_t m_valuesByteSize = 0;
+        };
+
+        std::vector buckets(capacity);
+        std::size_t size = 0;
+        auto iterator = Py_OWN(PyObject_GetIter(dict));
+        if (!iterator) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "o_py_dict expects a Python dict";
+        }
+
+        Py_FOR(key, iterator) {
+            auto keyElement = elementFromPythonObject(*key);
+            auto valueElement = valueFromPythonDict(dict, *key);
+            auto &bucket = buckets[elementHash(keyElement) % capacity];
+            auto keySize = checkedUint32Size(Item::measure(keyElement), "Python dict bucket key byte size");
+            auto valueSize = checkedUint32Size(Item::measure(valueElement), "Python dict bucket value byte size");
+            auto growth = o_dict_bucket::measureGrowth(
+                bucket.m_count, bucket.m_keysByteSize, bucket.m_valuesByteSize, keySize, valueSize
+            );
+            if (size + growth < size) {
+                THROWF(db0::InternalException) << "Python dict bucket block byte size overflow";
+            }
+            size += growth;
+            ++bucket.m_count;
+            if (bucket.m_keysByteSize + keySize < bucket.m_keysByteSize
+                || bucket.m_valuesByteSize + valueSize < bucket.m_valuesByteSize) {
+                THROWF(db0::InternalException) << "Python dict bucket elements byte size overflow";
+            }
+            bucket.m_keysByteSize += keySize;
+            bucket.m_valuesByteSize += valueSize;
+        }
+        if (PyErr_Occurred()) {
+            PyErr_Clear();
+            THROWF(db0::InputException) << "Unable to iterate Python dict";
+        }
+
+        checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Python dict bucket block");
+        return size;
+    }
+
+}
diff --git a/src/dbzero/object_model/dict/o_py_dict.hpp b/src/dbzero/object_model/dict/o_py_dict.hpp
new file mode 100644
index 00000000..3d45db16
--- /dev/null
+++ b/src/dbzero/object_model/dict/o_py_dict.hpp
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: LGPL-2.1-or-later
+// Copyright (c) 2025 DBZero Software sp. z o.o.
+
+#pragma once
+
+#include
+#include
+#include
+
+#include
+
+struct _object;
+using PyObject = _object;
+
+namespace db0::object_model
+{
+
+DB0_PACKED_BEGIN
+    class DB0_PACKED_ATTR o_py_dict: public o_dict
+    {
+    public:
+        explicit o_py_dict(PyObject *dict);
+
+        static std::size_t measure(PyObject *dict);
+
+        template static std::size_t safeSizeOf(BufT buf)
+        {
+            return o_dict::safeSizeOf(buf);
+        }
+
+        static o_py_dict &__ref(void *buf);
+        static const o_py_dict &__const_ref(const void *buf);
+
+        template static o_py_dict &__new(void *buf, Args&& ...args)
+        {
+            return *(new(buf) o_py_dict(std::forward(args)...));
+        }
+
+        static db0::Foundation::Type type();
+
+    private:
+        static Element elementFromPythonObject(PyObject *object);
+        static Element valueFromPythonDict(PyObject *dict, PyObject *key);
+        static std::uint32_t dictSize(PyObject *dict);
+        static std::size_t measurePairs(PyObject *dict);
+        static std::size_t measureCollisionBuckets(PyObject *dict, std::size_t capacity);
+    };
+DB0_PACKED_END
+
+}
diff --git a/src/dbzero/object_model/object/ObjectImplBase.cpp b/src/dbzero/object_model/object/ObjectImplBase.cpp
index 310c6bac..9877c4a3 100644
---
a/src/dbzero/object_model/object/ObjectImplBase.cpp +++ b/src/dbzero/object_model/object/ObjectImplBase.cpp @@ -3,6 +3,7 @@ #include "ObjectImplBase.hpp" #include +#include #include #include #include @@ -41,14 +42,14 @@ namespace db0::object_model ObjectImplBase::ObjectImplBase(std::shared_ptr db0_class) { // prepare for initialization - InitManager::instance.addInitializer(*this, db0_class); + InitManager::instance.template addInitializerFor(*this, db0_class); } template ObjectImplBase::ObjectImplBase(TypeInitializer &&type_initializer) { // prepare for initialization - InitManager::instance.addInitializer(*this, std::move(type_initializer)); + InitManager::instance.template addInitializerFor(*this, std::move(type_initializer)); } template diff --git a/src/dbzero/object_model/object/ObjectInitializer.cpp b/src/dbzero/object_model/object/ObjectInitializer.cpp index 780c7dad..181ffe04 100644 --- a/src/dbzero/object_model/object/ObjectInitializer.cpp +++ b/src/dbzero/object_model/object/ObjectInitializer.cpp @@ -2,6 +2,7 @@ // Copyright (c) 2025 DBZero Software sp. z o.o. 
#include "ObjectInitializer.hpp" +#include #include #include @@ -57,7 +58,7 @@ namespace db0::object_model m_values.push_back({ loc.first, storage_class, value }, mask); m_has_value.set(loc, true); } - + bool ObjectInitializer::remove(std::pair loc, std::uint64_t mask) { if (!m_has_value.get(loc)) { @@ -78,7 +79,7 @@ namespace db0::object_model // retrieve the whole value return m_values.tryGetAt(loc.first, result); } - + db0::swine_ptr ObjectInitializer::getFixture() const { return getClass().getFixture(); } @@ -89,18 +90,25 @@ namespace db0::object_model std::pair ObjectInitializer::getData(PosVT::Data &data, unsigned int &offset) { - m_values.sortAndMerge(); - if (m_values.empty()) { + return getDataFrom(m_values, data, offset); + } + + std::pair ObjectInitializer::getDataFrom( + XValuesVector &initializationValues, PosVT::Data &data, unsigned int &offset + ) const + { + initializationValues.sortAndMerge(); + if (initializationValues.empty()) { // object has no data - return { &*m_values.begin(), &*m_values.end() }; + return { nullptr, nullptr }; } // offset if the first pos-vt index - offset = m_values.front().getIndex(); + offset = initializationValues.front().getIndex(); // Divide values into index-encoded and position-encoded (pos-vt) // index represents the number of pos-vt elements - auto index = m_values.size(); - auto it = m_values.begin() + index - 1; + auto index = initializationValues.size(); + auto it = initializationValues.begin() + index - 1; // below rule allows pos-vt to be created with the fill-rate of at least 50% while (index > 0 && ((it->getIndex() - offset) > ((index - offset) << 1))) { --index; @@ -119,7 +127,7 @@ namespace db0::object_model auto &values = data.m_values; types.reserve(size); values.reserve(size); - for (auto it = m_values.begin(), end = m_values.begin() + index; it != end; ++it) { + for (auto it = initializationValues.begin(), end = initializationValues.begin() + index; it != end; ++it) { // fill with undefined elements 
until reaching the index while (types.size() < (it->getIndex() - offset)) { types.push_back(StorageClass::UNDEFINED); @@ -132,7 +140,8 @@ namespace db0::object_model assert(types.size() == size); } - return { &*(m_values.begin() + index), &*(m_values.end()) }; + auto *begin = initializationValues.data(); + return { begin + index, begin + initializationValues.size() }; } void ObjectInitializer::incRef(bool is_tag) @@ -154,6 +163,102 @@ namespace db0::object_model return m_values.empty(); } + bool ImmutableObjectInitializer::isFixedStorageClass(StorageClass storage_class) + { + switch (storage_class) { + case StorageClass::UNDEFINED: + case StorageClass::DELETED: + case StorageClass::NONE: + case StorageClass::INT64: + case StorageClass::PTIME64: + case StorageClass::FP_NUMERIC64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + case StorageClass::BOOLEAN: + case StorageClass::PACK_2: + case StorageClass::PACKED_INT32: + return true; + default: + return false; + } + } + + void ImmutableObjectInitializer::setObject( + std::pair loc, StorageClass storage_class, Value value, + ObjectSharedPtr object, std::uint64_t mask + ) + { + set(loc, storage_class, value, mask); + if (isFixedStorageClass(storage_class)) { + eraseObjectAt(loc); + return; + } + + eraseObjectAt(loc); + m_objects.push_back({ loc, storage_class, std::move(object) }); + } + + bool ImmutableObjectInitializer::remove(std::pair loc, std::uint64_t mask) + { + eraseObjectAt(loc); + return ObjectInitializer::remove(loc, mask); + } + + bool ImmutableObjectInitializer::tryGetObjectAt( + std::pair loc, ObjectSharedPtr &object + ) const + { + for (const auto &value: m_objects) { + if (value.m_loc == loc) { + object = value.m_object; + return object.get() != nullptr; + } + } + return false; + } + + std::pair ImmutableObjectInitializer::getData( + PosVT::Data &data, unsigned int &offset 
+ ) const + { + m_values.sortAndMerge(); + m_fixed_values.clear(); + m_fixed_values.reserve(m_values.size()); + for (const auto &value: m_values) { + m_fixed_values.push_back(value); + } + for (const auto &value: m_objects) { + assert(value.m_loc.second == 0 && "Variable-length embedded fields must use default fidelity"); + m_fixed_values.remove(value.m_loc.first); + } + return getDataFrom(m_fixed_values, data, offset); + } + + void ImmutableObjectInitializer::resetObjects() + { + m_objects.clear(); + m_fixed_values.clear(); + } + + const std::vector &ImmutableObjectInitializer::objects() const + { + return m_objects; + } + + void ImmutableObjectInitializer::eraseObjectAt(std::pair loc) + { + m_objects.erase( + std::remove_if(m_objects.begin(), m_objects.end(), [&](const auto &value) { + return value.m_loc == loc; + }), + m_objects.end() + ); + } + bool ObjectInitializer::trySetFixture(db0::swine_ptr &new_fixture) { assert(new_fixture); @@ -188,6 +293,9 @@ namespace db0::object_model void ObjectInitializerManager::closeAt(std::uint32_t loc) { auto result = m_initializers[loc]->getClassPtr(); + if (auto *initializer = dynamic_cast(m_initializers[loc].get())) { + initializer->resetObjects(); + } m_initializers[loc]->reset(); // move to inactive slot std::swap(m_initializers[loc], m_initializers[m_active_count - 1]); @@ -196,4 +304,4 @@ namespace db0::object_model --m_active_count; } -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/object/ObjectInitializer.hpp b/src/dbzero/object_model/object/ObjectInitializer.hpp index 8384037a..12389fcc 100644 --- a/src/dbzero/object_model/object/ObjectInitializer.hpp +++ b/src/dbzero/object_model/object/ObjectInitializer.hpp @@ -9,10 +9,14 @@ #include #include #include +#include +#include #include "ValueTable.hpp" #include #include #include +#include +#include #include #include "ValueTable.hpp" #include "XValuesVector.hpp" @@ -32,7 +36,9 @@ namespace db0::object_model class Class; class Object; + class 
ObjectImmutableImpl; class ObjectInitializer; + class ImmutableObjectInitializer; using Fixture = db0::Fixture; /** @@ -47,6 +53,9 @@ namespace db0::object_model template void addInitializer(T &object, Args&& ...args); + + template + void addInitializerFor(T &object, Args&& ...args); // Close the initializer and retrieve object's class template @@ -89,6 +98,8 @@ namespace db0::object_model public: using XValue = db0::object_model::XValue; using TypeInitializer = std::function(db0::swine_ptr &)>; + + virtual ~ObjectInitializer() = default; // loc - position in the initializer manager's array template @@ -137,7 +148,7 @@ namespace db0::object_model } // @param mask required for lo-fi types (pack-2) - void set(std::pair loc, StorageClass storage_class, Value value, + void set(std::pair loc, StorageClass storage_class, Value value, std::uint64_t mask = 0); bool remove(std::pair loc, std::uint64_t mask = 0); @@ -190,6 +201,9 @@ namespace db0::object_model void reset(); void operator=(std::uint32_t new_loc); + std::pair getDataFrom( + XValuesVector &values, PosVT::Data &data, unsigned int &pos_vt_offset + ) const; private: // maximum size of the position-encoded value-block (pos-VT) @@ -200,28 +214,85 @@ namespace db0::object_model // pointer to an implementation-specific type void *m_object_ptr = nullptr; mutable std::shared_ptr m_class; + + protected: // indexed initialization values mutable XValuesVector m_values; + + private: // flags indicating values presence (for fast removal pruning) mutable SparseBoolMatrix m_has_value; std::pair m_ref_counts = {0, 0}; mutable db0::swine_ptr m_fixture; mutable TypeInitializer m_type_initializer; }; + + class ImmutableObjectInitializer: public ObjectInitializer + { + public: + using ObjectSharedPtr = LangConfig::ObjectSharedPtr; + using ObjectInitializer::ObjectInitializer; + + void setObject( + std::pair loc, StorageClass storage_class, Value value, + ObjectSharedPtr object, std::uint64_t mask = 0 + ); + bool remove(std::pair 
loc, std::uint64_t mask = 0); + bool tryGetObjectAt(std::pair loc, ObjectSharedPtr &object) const; + std::pair getData(PosVT::Data &data, unsigned int &pos_vt_offset) const; + void resetObjects(); + + static bool isFixedStorageClass(StorageClass storage_class); + + struct ObjectValue + { + std::pair m_loc; + StorageClass m_storage_class = StorageClass::UNDEFINED; + ObjectSharedPtr m_object; + }; + + const std::vector &objects() const; + + private: + std::vector m_objects; + mutable XValuesVector m_fixed_values; + + void eraseObjectAt(std::pair loc); + }; template void ObjectInitializerManager::addInitializer(T &object, Args&& ...args) { + addInitializerFor(object, std::forward(args)...); + } + + template + void ObjectInitializerManager::addInitializerFor(T &object, Args&& ...args) + { + using InitializerT = std::conditional_t< + std::is_same_v, + ImmutableObjectInitializer, + ObjectInitializer + >; + + auto initAt = [&](std::uint32_t loc) { + if (m_initializers[loc] && typeid(*m_initializers[loc]) == typeid(InitializerT)) { + static_cast(m_initializers[loc].get())->init(object, std::forward(args)...); + } else { + m_initializers[loc].reset(new InitializerT(*this, loc, object, std::forward(args)...)); + } + }; + if (m_active_count < m_total_count) { auto loc = m_active_count++; - m_initializers[loc]->init(object, std::forward(args)...); + initAt(loc); return; } for (;;) { if (m_total_count < m_initializers.size()) { auto loc = m_total_count++; - m_initializers[loc].reset(new ObjectInitializer(*this, loc, object, std::forward(args)...)); + initAt(loc); ++m_active_count; return; } @@ -270,4 +341,4 @@ namespace db0::object_model return nullptr; } -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/object/ValueTable.cpp b/src/dbzero/object_model/object/ValueTable.cpp index cb04bcb3..2a591d7f 100644 --- a/src/dbzero/object_model/object/ValueTable.cpp +++ b/src/dbzero/object_model/object/ValueTable.cpp @@ -36,6 +36,10 @@ namespace db0::object_model 
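Reviewer note: `addInitializerFor` above appears to pick the initializer class at compile time with `std::conditional_t`, and to reuse a pooled slot only when its dynamic type matches (the `typeid` comparison). The sketch below shows that pattern in isolation. It is a minimal illustration under assumptions, not the dbzero implementation: `MutableObject`, `ImmutableObject`, `BaseInitializer`, `ImmutableInitializer`, `InitializerFor`, and `prepareSlot` are all hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical stand-ins for the object implementations; not dbzero types.
struct MutableObject {};
struct ImmutableObject {};

// Polymorphic base so that typeid reports the dynamic type of a pooled slot.
struct BaseInitializer
{
    virtual ~BaseInitializer() = default;
    virtual std::string name() const { return "base"; }
};

struct ImmutableInitializer: BaseInitializer
{
    std::string name() const override { return "immutable"; }
};

// Compile-time choice of initializer type, keyed on the object type.
template <typename T>
using InitializerFor = std::conditional_t<
    std::is_same_v<T, ImmutableObject>, ImmutableInitializer, BaseInitializer
>;

static_assert(std::is_same_v<InitializerFor<ImmutableObject>, ImmutableInitializer>);
static_assert(std::is_same_v<InitializerFor<MutableObject>, BaseInitializer>);

// Reuse the slot if it already holds exactly the required type; otherwise replace it.
// Returns true when the existing initializer was reused.
template <typename T>
bool prepareSlot(std::unique_ptr<BaseInitializer> &slot)
{
    using InitializerT = InitializerFor<T>;
    if (slot) {
        const BaseInitializer &current = *slot;
        if (typeid(current) == typeid(InitializerT)) {
            return true;
        }
    }
    slot = std::make_unique<InitializerT>();
    return false;
}
```

The exact-type comparison matters because a derived initializer left in a slot would otherwise be reused for an object that expects the base behaviour. A `dynamic_cast` check would accept that slot, `typeid` equality does not.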
std::size_t PosVT::size() const { return types().size(); } + + unsigned int PosVT::offset() const { + return types().offset(); + } std::size_t PosVT::measure(const Data &data, unsigned int offset) { @@ -148,4 +152,4 @@ namespace db0::object_model return std::memcmp(this, &other, this->sizeOf()) == 0; } -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/object/o_embedded_object.cpp b/src/dbzero/object_model/object/o_embedded_object.cpp new file mode 100644 index 00000000..3d12daed --- /dev/null +++ b/src/dbzero/object_model/object/o_embedded_object.cpp @@ -0,0 +1,197 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#include "o_embedded_object.hpp" + +#include +#include +#include +#include +#include + +namespace db0::object_model +{ + namespace + { + constexpr std::uint64_t PACK2_MASK = 0x3; + + void writePyTuple(void *buf, const void *source) + { + o_py_tuple::__new(buf, const_cast(static_cast(source))); + } + + void writePySet(void *buf, const void *source) + { + o_py_set::__new(buf, const_cast(static_cast(source))); + } + + void writePyDict(void *buf, const void *source) + { + o_py_dict::__new(buf, const_cast(static_cast(source))); + } + + o_dict::Element fieldMapElementFromObject( + StorageClass storageClass, ImmutableObjectInitializer::ObjectSharedPtr object + ) + { + auto *pyObject = object.get(); + if (!pyObject) { + THROWF(db0::InternalException) << "Cannot store null object in embedded field map"; + } + + switch (storageClass) { + case StorageClass::STRING_REF: + case StorageClass::POOLED_STRING: + case StorageClass::STR64: + return o_dict::Element::string(db0::python::PyToolkit::getTypeManager().extractString(pyObject)); + case StorageClass::DB0_BYTES: + case StorageClass::DB0_BYTES_ARRAY: { + auto bytes = db0::python::PyToolkit::getTypeManager().extractBytes(pyObject); + return o_dict::Element::bytes(bytes.m_data, bytes.m_size); + } + case StorageClass::DB0_LIST: + case 
StorageClass::DB0_TUPLE: { + auto size = o_py_tuple::measure(pyObject); + return o_dict::Element::embeddedTuple(size, writePyTuple, pyObject); + } + case StorageClass::DB0_SET: { + auto size = o_py_set::measure(pyObject); + return o_dict::Element::embeddedSet(size, writePySet, pyObject); + } + case StorageClass::DB0_DICT: { + auto size = o_py_dict::measure(pyObject); + return o_dict::Element::embeddedDict(size, writePyDict, pyObject); + } + default: + THROWF(db0::InputException) + << "Storage class cannot be stored in embedded field map: " << storageClass; + } + return o_dict::Element::none(); + } + + o_dict::ElementMap buildEmbeddedFieldMap(const ImmutableObjectInitializer &initializer) + { + o_dict::ElementMap fieldMap; + for (const auto &value: initializer.objects()) { + assert(value.m_loc.second == 0 && "Variable-length embedded fields must use default fidelity"); + fieldMap[o_dict::Element::integer(value.m_loc.first)] = + fieldMapElementFromObject(value.m_storage_class, value.m_object); + } + return fieldMap; + } + } + + FixedValue::FixedValue(StorageClass kind, std::uint64_t value) + : m_kind(kind) + , m_value(value) + { + } + + bool FixedValue::isPack2() const + { + return m_kind == StorageClass::PACK_2; + } + + std::optional FixedValue::unpack2(unsigned int offset) const + { + if (!isPack2()) { + return *this; + } + if (offset >= 32) { + THROWF(db0::InternalException) << "2-bit embedded value offset exceeds uint64 storage"; + } + auto value = (m_value >> (offset * 2)) & PACK2_MASK; + switch (value) { + case 0: + return FixedValue(StorageClass::NONE, 0); + case 1: + return FixedValue(StorageClass::BOOLEAN, 0); + case 2: + return FixedValue(StorageClass::BOOLEAN, 1); + default: + return std::nullopt; + } + } + + o_embedded_object::o_embedded_object( + std::uint32_t classRefValue, const ImmutableObjectInitializer &initializer + ) + { + PosVT::Data posVtData; + unsigned int posVtOffset = 0; + auto indexVtData = initializer.getData(posVtData, posVtOffset); + 
auto fieldMap = buildEmbeddedFieldMap(initializer); + arrangeMembers() + (db0::packed_int32::type(), classRefValue) + (PosVT::type(), posVtData, posVtOffset) + (IndexVT::type(), indexVtData.first, indexVtData.second) + (o_dict::type(), fieldMap); + } + + std::uint32_t o_embedded_object::getClassRef() const + { + return classRef().value(); + } + + const PosVT &o_embedded_object::pos_vt() const + { + return getDynAfter(classRef(), PosVT::type()); + } + + const IndexVT &o_embedded_object::index_vt() const + { + return getDynAfter(pos_vt(), IndexVT::type()); + } + + const o_dict &o_embedded_object::field_map() const + { + return getDynAfter(index_vt(), o_dict::type()); + } + + std::optional o_embedded_object::fixedValue( + std::uint32_t index, unsigned int fidelityOffset + ) const + { + std::pair posValue; + if (pos_vt().find(index, posValue)) { + auto value = FixedValue(posValue.first, posValue.second.m_store); + return value.isPack2() ? value.unpack2(fidelityOffset) : std::optional(value); + } + if (index_vt().find(index, posValue)) { + auto value = FixedValue(posValue.first, posValue.second.m_store); + return value.isPack2() ? 
value.unpack2(fidelityOffset) : std::optional(value); + } + return std::nullopt; + } + + const o_tuple_item *o_embedded_object::variableValue(std::uint32_t index) const + { + return field_map().get(o_dict::Element::integer(index)); + } + + std::size_t o_embedded_object::sizeOf() const + { + return safeSizeOf(reinterpret_cast(this)); + } + + std::size_t o_embedded_object::measure( + std::uint32_t classRefValue, const ImmutableObjectInitializer &initializer + ) + { + PosVT::Data posVtData; + unsigned int posVtOffset = 0; + auto indexVtData = initializer.getData(posVtData, posVtOffset); + auto fieldMap = buildEmbeddedFieldMap(initializer); + return measureMembers() + (db0::packed_int32::type(), classRefValue) + (PosVT::type(), posVtData, posVtOffset) + (IndexVT::type(), indexVtData.first, indexVtData.second) + (o_dict::type(), fieldMap); + } + + const db0::packed_int32 &o_embedded_object::classRef() const + { + return getDynFirst(db0::packed_int32::type()); + } + +} diff --git a/src/dbzero/object_model/object/o_embedded_object.hpp b/src/dbzero/object_model/object/o_embedded_object.hpp new file mode 100644 index 00000000..a96d8eb6 --- /dev/null +++ b/src/dbzero/object_model/object/o_embedded_object.hpp @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. 
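Reviewer note: `FixedValue::unpack2` above reads one 2-bit slot out of a 64-bit word. The standalone sketch below mirrors that decoding so the bit arithmetic can be checked in isolation. `Slot` and `decodePack2` are hypothetical names used only for illustration. The mapping (0 is none, 1 is false, 2 is true, 3 yields no value) is copied from the `switch` in `unpack2`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for the decoded result; not a dbzero type.
enum class Slot { None, False, True };

// Decode the 2-bit slot at `offset` (valid range 0..31) from a packed 64-bit word.
// Bit pattern 3 has no decoded value and is reported as std::nullopt.
inline std::optional<Slot> decodePack2(std::uint64_t packed, unsigned int offset)
{
    if (offset >= 32) {
        // 32 slots of 2 bits fill the whole 64-bit word
        throw std::out_of_range("2-bit slot offset exceeds uint64 storage");
    }
    switch ((packed >> (offset * 2)) & 0x3u) {
        case 0: return Slot::None;
        case 1: return Slot::False;
        case 2: return Slot::True;
        default: return std::nullopt;
    }
}
```

For example, the word `0x9` (binary `1001`) holds `01` in slot 0 and `10` in slot 1, so it decodes to false and true respectively.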
+ +#pragma once + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +namespace db0::object_model +{ + + struct FixedValue + { + StorageClass m_kind = StorageClass::UNDEFINED; + std::uint64_t m_value = 0; + + FixedValue() = default; + FixedValue(StorageClass kind, std::uint64_t value); + + bool isPack2() const; + std::optional unpack2(unsigned int offset) const; + }; + +DB0_PACKED_BEGIN + class DB0_PACKED_ATTR o_embedded_object: public db0::o_base + { + protected: + using super_t = db0::o_base; + friend super_t; + + public: + using Element = o_tuple_item::Element; + + o_embedded_object(std::uint32_t classRef, const ImmutableObjectInitializer &initializer); + + std::uint32_t getClassRef() const; + const PosVT &pos_vt() const; + const IndexVT &index_vt() const; + const o_dict &field_map() const; + std::optional fixedValue(std::uint32_t index, unsigned int fidelityOffset = 0) const; + const o_tuple_item *variableValue(std::uint32_t index) const; + std::size_t sizeOf() const; + + static std::size_t measure(std::uint32_t classRef, const ImmutableObjectInitializer &initializer); + + template static std::size_t safeSizeOf(BufT buf) + { + auto start = buf; + auto cursor = buf; + cursor += super_t::baseSize(); + cursor += db0::packed_int32::safeSizeOf(cursor); + cursor += PosVT::safeSizeOf(cursor); + cursor += IndexVT::safeSizeOf(cursor); + cursor += o_dict::safeSizeOf(cursor); + return cursor - start; + } + + protected: + o_embedded_object() = default; + + private: + const db0::packed_int32 &classRef() const; + }; +DB0_PACKED_END + +} diff --git a/src/dbzero/object_model/set/Set.hpp b/src/dbzero/object_model/set/Set.hpp index f12ded31..1c18d82b 100644 --- a/src/dbzero/object_model/set/Set.hpp +++ b/src/dbzero/object_model/set/Set.hpp @@ -33,7 +33,7 @@ namespace db0::object_model class SetIterator; DB0_PACKED_BEGIN - struct DB0_PACKED_ATTR o_set: public db0::o_fixed_versioned + struct DB0_PACKED_ATTR o_db0_set: public 
db0::o_fixed_versioned { // common object header o_unique_header m_header; @@ -47,12 +47,12 @@ DB0_PACKED_BEGIN }; DB0_PACKED_END - class Set: public db0::ObjectBase, StorageClass::DB0_SET> + class Set: public db0::ObjectBase, StorageClass::DB0_SET> { GC0_Declare public: - using super_t = db0::ObjectBase, StorageClass::DB0_SET>; - friend class db0::ObjectBase, StorageClass::DB0_SET>; + using super_t = db0::ObjectBase, StorageClass::DB0_SET>; + friend class db0::ObjectBase, StorageClass::DB0_SET>; using LangToolkit = db0::python::PyToolkit; using ObjectPtr = typename LangToolkit::ObjectPtr; using ObjectSharedPtr = typename LangToolkit::ObjectSharedPtr; @@ -106,4 +106,4 @@ DB0_PACKED_END void restoreIterators(); }; -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/set/o_py_set.cpp b/src/dbzero/object_model/set/o_py_set.cpp new file mode 100644 index 00000000..f07dc373 --- /dev/null +++ b/src/dbzero/object_model/set/o_py_set.cpp @@ -0,0 +1,231 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. 
+ +#include "o_py_set.hpp" + +#include + +#include + +#include +#include +#include +#include +#include + +namespace db0::object_model +{ + namespace + { + void writePyTuple(void *buf, const void *source) + { + o_py_tuple::__new(buf, const_cast(static_cast(source))); + } + + void writePySet(void *buf, const void *source) + { + o_py_set::__new(buf, const_cast(static_cast(source))); + } + + void writePyDict(void *buf, const void *source) + { + o_py_dict::__new(buf, const_cast(static_cast(source))); + } + } + + o_py_set::o_py_set(PyObject *iterable) + : o_set() + { + auto count = setSize(iterable); + auto elementsByteSize = checkedUint32Size(measureElements(iterable), "Python set elements byte size"); + auto capacity = hashIndexCapacity(count); + auto bucketByteSize = checkedUint32Size( + measureCollisionBuckets(iterable, capacity), "Python set bucket byte size" + ); + + auto arranger = arrangeSetMembers(count, elementsByteSize, bucketByteSize); + auto iterator = Py_OWN(PyObject_GetIter(iterable)); + if (!iterator) { + PyErr_Clear(); + THROWF(db0::InputException) << "o_py_set expects a Python set"; + } + + Py_FOR(item, iterator) { + arranger = arranger(Item::type(), elementFromPythonObject(*item)); + } + if (PyErr_Occurred()) { + PyErr_Clear(); + THROWF(db0::InputException) << "Unable to iterate Python set"; + } + + finishSetConstruction(arranger.ptr(), elementsByteSize, capacity, bucketByteSize); + } + + std::size_t o_py_set::measure(PyObject *iterable) + { + auto count = setSize(iterable); + auto elementsByteSize = measureElements(iterable); + auto bucketByteSize = measureCollisionBuckets(iterable, hashIndexCapacity(count)); + return measureMembers() + (db0::packed_int32::type(), count) + (db0::packed_int32::type(), checkedUint32Size(elementsByteSize, "Python set elements byte size")) + (db0::packed_int32::type(), checkedUint32Size(bucketByteSize, "Python set bucket byte size")) + (elementsByteSize) + (hashIndexCapacity(count) * sizeof(std::uint32_t)) + 
(bucketByteSize); + } + + o_py_set &o_py_set::__ref(void *buf) + { + return *reinterpret_cast(buf); + } + + const o_py_set &o_py_set::__const_ref(const void *buf) + { + return *reinterpret_cast(buf); + } + + db0::Foundation::Type o_py_set::type() + { + return db0::Foundation::Type(); + } + + o_py_set::Element o_py_set::elementFromPythonObject(PyObject *object) + { + auto &typeManager = db0::python::PyToolkit::getTypeManager(); + auto typeId = typeManager.getTypeId(object); + + switch (typeId) { + case db0::bindings::TypeId::NONE: + return Element::none(); + case db0::bindings::TypeId::BOOLEAN: + return Element::boolean(object == Py_True); + case db0::bindings::TypeId::INTEGER: { + auto value = PyLong_AsLongLong(object); + if (PyErr_Occurred()) { + PyErr_Clear(); + THROWF(db0::InputException) << "Python integer is out of int64 range"; + } + return Element::integer(value); + } + case db0::bindings::TypeId::FLOAT: + return Element::floating(PyFloat_AsDouble(object)); + case db0::bindings::TypeId::DATETIME: + return Element::datetime(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DATETIME_TZ: + return Element::datetimeTz(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DATE: + return Element::date(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::TIME: + return Element::time(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::TIME_TZ: + return Element::timeTz(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DECIMAL: + return Element::decimal(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::STRING: { + return Element::string(typeManager.extractString(object)); + } + case db0::bindings::TypeId::BYTES: { + auto bytes = typeManager.extractBytes(object); + return Element::bytes(bytes.m_data, bytes.m_size); + } + case db0::bindings::TypeId::LIST: + case db0::bindings::TypeId::TUPLE: + return 
Element::embeddedTuple(o_py_tuple::measure(object), writePyTuple, object); + case db0::bindings::TypeId::SET: + return Element::embeddedSet(o_py_set::measure(object), writePySet, object); + case db0::bindings::TypeId::DICT: + return Element::embeddedDict(o_py_dict::measure(object), writePyDict, object); + default: + break; + } + + THROWF(db0::InputException) << "Unsupported o_py_set element type: " << Py_TYPE(object)->tp_name; + return Element::none(); + } + + std::uint32_t o_py_set::setSize(PyObject *set) + { + if (!PySet_Check(set)) { + THROWF(db0::InputException) << "o_py_set expects a Python set"; + } + auto size = PySet_GET_SIZE(set); + if (size < 0) { + PyErr_Clear(); + THROWF(db0::InputException) << "Unable to read Python set size"; + } + return checkedUint32Size(static_cast(size), "Python set size"); + } + + std::size_t o_py_set::measureElements(PyObject *set) + { + setSize(set); + std::size_t size = 0; + auto iterator = Py_OWN(PyObject_GetIter(set)); + if (!iterator) { + PyErr_Clear(); + THROWF(db0::InputException) << "o_py_set expects a Python set"; + } + + Py_FOR(item, iterator) { + auto itemSize = Item::measure(elementFromPythonObject(*item)); + if (size + itemSize < size) { + THROWF(db0::InternalException) << "Python set elements byte size overflow"; + } + size += itemSize; + } + if (PyErr_Occurred()) { + PyErr_Clear(); + THROWF(db0::InputException) << "Unable to iterate Python set"; + } + + checkedHashIndexOffset(size == 0 ? 
0 : size - 1, "Python set item block"); + return size; + } + + std::size_t o_py_set::measureCollisionBuckets(PyObject *set, std::size_t capacity) + { + setSize(set); + if (capacity == 0) { + return 0; + } + + struct BucketMeasure + { + std::uint32_t m_count = 0; + std::uint32_t m_elementsByteSize = 0; + }; + + std::vector buckets(capacity); + std::size_t size = 0; + auto iterator = Py_OWN(PyObject_GetIter(set)); + if (!iterator) { + PyErr_Clear(); + THROWF(db0::InputException) << "o_py_set expects a Python set"; + } + + Py_FOR(item, iterator) { + auto element = elementFromPythonObject(*item); + auto &bucket = buckets[elementHash(element) % capacity]; + auto itemSize = checkedUint32Size(Item::measure(element), "Python set bucket item byte size"); + auto growth = o_compact_tuple::Builder::measureGrowth(bucket.m_count, bucket.m_elementsByteSize, itemSize); + if (size + growth < size) { + THROWF(db0::InternalException) << "Python set bucket block byte size overflow"; + } + size += growth; + ++bucket.m_count; + if (bucket.m_elementsByteSize + itemSize < bucket.m_elementsByteSize) { + THROWF(db0::InternalException) << "Python set bucket elements byte size overflow"; + } + bucket.m_elementsByteSize += itemSize; + } + if (PyErr_Occurred()) { + PyErr_Clear(); + THROWF(db0::InputException) << "Unable to iterate Python set"; + } + + checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Python set bucket block"); + return size; + } + +} diff --git a/src/dbzero/object_model/set/o_py_set.hpp b/src/dbzero/object_model/set/o_py_set.hpp new file mode 100644 index 00000000..f4298f15 --- /dev/null +++ b/src/dbzero/object_model/set/o_py_set.hpp @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. 
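Reviewer note: the measuring loops above guard each accumulation with a wraparound test (`size + itemSize < size`) and then narrow the result through `checkedUint32Size`, whose body is not part of this diff. The sketch below shows both checks in isolation. `checkedAdd` and `narrowToUint32` are hypothetical helpers written for this note, and they throw standard exceptions rather than the project's `THROWF` types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Add `increment` to `total`, throwing if the unsigned sum would wrap around.
inline std::size_t checkedAdd(std::size_t total, std::size_t increment)
{
    // Unsigned arithmetic wraps modulo 2^N, so a wrapped sum is smaller than `total`.
    if (total + increment < total) {
        throw std::overflow_error("byte size overflow");
    }
    return total + increment;
}

// Narrow to uint32 only when the value fits (assumed analogue of checkedUint32Size).
inline std::uint32_t narrowToUint32(std::size_t value)
{
    if (value > std::numeric_limits<std::uint32_t>::max()) {
        throw std::overflow_error("value exceeds uint32 range");
    }
    return static_cast<std::uint32_t>(value);
}
```

The comparison form is well defined because unsigned overflow is not undefined behaviour in C++. The same test on signed integers would not be safe.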
+
+#pragma once
+
+#include
+#include
+#include
+
+#include
+
+struct _object;
+using PyObject = _object;
+
+namespace db0::object_model
+{
+
+DB0_PACKED_BEGIN
+    class DB0_PACKED_ATTR o_py_set: public o_set
+    {
+    public:
+        explicit o_py_set(PyObject *iterable);
+
+        static std::size_t measure(PyObject *iterable);
+
+        template static std::size_t safeSizeOf(BufT buf)
+        {
+            return o_set::safeSizeOf(buf);
+        }
+
+        static o_py_set &__ref(void *buf);
+        static const o_py_set &__const_ref(const void *buf);
+
+        template static o_py_set &__new(void *buf, Args&& ...args)
+        {
+            return *(new(buf) o_py_set(std::forward(args)...));
+        }
+
+        static db0::Foundation::Type type();
+
+    private:
+        static Element elementFromPythonObject(PyObject *object);
+        static std::uint32_t setSize(PyObject *set);
+        static std::size_t measureElements(PyObject *set);
+        static std::size_t measureCollisionBuckets(PyObject *set, std::size_t capacity);
+    };
+DB0_PACKED_END
+
+}
diff --git a/src/dbzero/object_model/set/o_set.cpp b/src/dbzero/object_model/set/o_set.cpp
new file mode 100644
index 00000000..b75243d7
--- /dev/null
+++ b/src/dbzero/object_model/set/o_set.cpp
@@ -0,0 +1,672 @@
+// SPDX-License-Identifier: LGPL-2.1-or-later
+// Copyright (c) 2025 DBZero Software sp. z o.o.
+ +#include "o_set.hpp" + +#include +#include +#include + +#include +#include + +namespace db0::object_model +{ + bool o_set::HashIndexEntry::isEmpty() const + { + return m_value == 0; + } + + bool o_set::HashIndexEntry::isBucket() const + { + return (m_value & BUCKET_FLAG) != 0; + } + + bool o_set::HashIndexEntry::isPendingBucket() const + { + return m_value == BUCKET_FLAG; + } + + std::uint32_t o_set::HashIndexEntry::offset() const + { + return (m_value & OFFSET_MASK) - 1; + } + + void o_set::HashIndexEntry::clear() + { + m_value = 0; + } + + void o_set::HashIndexEntry::setPendingBucket() + { + m_value = BUCKET_FLAG; + } + + void o_set::HashIndexEntry::setItem(std::uint32_t offset) + { + if (offset >= OFFSET_MASK) { + THROWF(db0::InternalException) << "Set item offset exceeds hash index entry capacity"; + } + m_value = offset + 1; + } + + void o_set::HashIndexEntry::setBucket(std::uint32_t offset) + { + if (offset >= OFFSET_MASK) { + THROWF(db0::InternalException) << "Set bucket offset exceeds hash index entry capacity"; + } + m_value = BUCKET_FLAG | (offset + 1); + } + + const o_set::Item &o_set::const_iterator::operator*() const + { + return *m_item; + } + + const o_set::Item *o_set::const_iterator::operator->() const + { + return m_item; + } + + o_set::const_iterator &o_set::const_iterator::operator++() + { + m_item = reinterpret_cast( + reinterpret_cast(m_item) + m_item->sizeOf() + ); + return *this; + } + + bool o_set::const_iterator::operator==(const const_iterator &other) const + { + return m_item == other.m_item; + } + + bool o_set::const_iterator::operator!=(const const_iterator &other) const + { + return m_item != other.m_item; + } + + o_set::const_iterator::const_iterator(const Item *item) + : m_item(item) + { + } + + std::size_t o_set::ElementHash::operator()(const Element &element) const + { + return elementHash(element); + } + + bool o_set::ElementEqual::operator()(const Element &lhs, const Element &rhs) const + { + return elementsEqual(lhs, rhs); + 
} + + o_set::o_set(const ElementSet &elements) + { + auto elementsByteSize = checkedUint32Size(measureElements(elements), "Set elements byte size"); + auto capacity = hashIndexCapacity(elements.size()); + auto bucketByteSize = checkedUint32Size( + measureCollisionBuckets(elements, capacity), "Set bucket byte size" + ); + + auto arranger = arrangeSetMembers(static_cast(elements.size()), elementsByteSize, bucketByteSize); + for (const auto &element: elements) { + arranger = arranger(Item::type(), element); + } + + finishSetConstruction(arranger.ptr(), elementsByteSize, capacity, bucketByteSize); + } + + db0::Foundation::Arranger o_set::arrangeSetMembers( + std::uint32_t count, std::uint32_t elementsByteSize, std::uint32_t bucketByteSize + ) + { + return arrangeMembers() + (db0::packed_int32::type(), count) + (db0::packed_int32::type(), elementsByteSize) + (db0::packed_int32::type(), bucketByteSize); + } + + void o_set::finishSetConstruction( + void *indexEntriesPtr, std::uint32_t elementsByteSize, std::size_t capacity, std::uint32_t bucketByteSize + ) + { + auto *indexEntries = reinterpret_cast(indexEntriesPtr); + auto bucketByteSizeWritten = writeCollisionBuckets( + indexEntries, beginOfItems(), elementsByteSize, capacity, reinterpret_cast(indexEntries + capacity) + ); + if (bucketByteSizeWritten != bucketByteSize) { + THROWF(db0::InternalException) << "Set bucket byte size changed during construction"; + } + } + + std::size_t o_set::size() const + { + return count().value(); + } + + std::size_t o_set::elementsByteSize() const + { + return elementsByteSizeMember().value(); + } + + bool o_set::empty() const + { + return size() == 0; + } + + bool o_set::contains(const Element &element) const + { + auto capacity = hashIndexCapacity(size()); + if (capacity == 0) { + return false; + } + + auto hash = elementHash(element); + const auto *entries = beginOfHashIndex(); + auto slot = hash % capacity; + const auto &entry = entries[slot]; + if (entry.isEmpty()) { + return 
false; + } + + auto offset = entry.offset(); + if (!entry.isBucket()) { + return itemEqualsElement(itemAtOffset(offset), element); + } + + const auto &bucket = bucketAtOffset(offset); + for (const auto &bucketItem: bucket) { + if (itemEqualsElement(bucketItem, element)) { + return true; + } + } + return false; + } + + const o_set::Item &o_set::item(std::size_t index) const + { + auto it = begin(); + for (std::size_t i = 0; i < index; ++i) { + ++it; + } + return *it; + } + + o_set::const_iterator o_set::begin() const + { + return const_iterator(reinterpret_cast(beginOfItems())); + } + + o_set::const_iterator o_set::end() const + { + return const_iterator(reinterpret_cast(beginOfItems() + elementsByteSize())); + } + + std::size_t o_set::sizeOf() const + { + return safeSizeOf(reinterpret_cast(this)); + } + + std::size_t o_set::measure(const ElementSet &elements) + { + auto elementsByteSize = measureElements(elements); + auto bucketByteSize = measureCollisionBuckets(elements, hashIndexCapacity(elements.size())); + return measureMembers() + (db0::packed_int32::type(), static_cast(elements.size())) + (db0::packed_int32::type(), checkedUint32Size(elementsByteSize, "Set elements byte size")) + (db0::packed_int32::type(), checkedUint32Size(bucketByteSize, "Set bucket byte size")) + (elementsByteSize) + (hashIndexByteSize(elements.size())) + (bucketByteSize); + } + + const db0::packed_int32 &o_set::count() const + { + return getDynFirst(db0::packed_int32::type()); + } + + const db0::packed_int32 &o_set::elementsByteSizeMember() const + { + return getDynAfter(count(), db0::packed_int32::type()); + } + + const db0::packed_int32 &o_set::bucketByteSizeMember() const + { + return getDynAfter(elementsByteSizeMember(), db0::packed_int32::type()); + } + + const std::byte *o_set::beginOfItems() const + { + const auto &bucketByteSizeMemberRef = bucketByteSizeMember(); + return reinterpret_cast(&bucketByteSizeMemberRef) + bucketByteSizeMemberRef.sizeOf(); + } + + o_set::HashIndexEntry 
*o_set::beginOfHashIndex() + { + return reinterpret_cast( + const_cast(static_cast(this)->beginOfItems()) + elementsByteSize() + ); + } + + const o_set::HashIndexEntry *o_set::beginOfHashIndex() const + { + return reinterpret_cast(beginOfItems() + elementsByteSize()); + } + + const std::byte *o_set::beginOfBuckets() const + { + return reinterpret_cast(beginOfHashIndex() + hashIndexCapacity(size())); + } + + const o_compact_tuple &o_set::bucketAtOffset(std::uint32_t offset) const + { + return o_compact_tuple::__const_ref(beginOfBuckets() + offset); + } + + const o_set::Item &o_set::itemAtOffset(std::uint32_t offset) const + { + return Item::__const_ref(beginOfItems() + offset); + } + + bool o_set::elementsEqual(const Element &lhs, const Element &rhs) + { + auto lhsIsInt = lhs.m_kind == StorageClass::INT64 || lhs.m_kind == StorageClass::PACKED_INT32; + auto rhsIsInt = rhs.m_kind == StorageClass::INT64 || rhs.m_kind == StorageClass::PACKED_INT32; + if (lhs.m_kind != rhs.m_kind && !(lhsIsInt && rhsIsInt)) { + return false; + } + + switch (lhs.m_kind) { + case StorageClass::NONE: + return true; + case StorageClass::BOOLEAN: + return lhs.boolValue() == rhs.boolValue(); + case StorageClass::INT64: + case StorageClass::PACKED_INT32: + return lhs.intValue() == rhs.intValue(); + case StorageClass::FP_NUMERIC64: + return lhs.doubleValue() == rhs.doubleValue(); + case StorageClass::STRING_REF: + return lhs.m_payload.m_string_value == rhs.m_payload.m_string_value; + case StorageClass::DB0_BYTES: + return lhs.bytesSize() == rhs.bytesSize() && bytesEqual(lhs.bytesData(), rhs.bytesData(), lhs.bytesSize()); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + return lhs.bytesSize() == rhs.bytesSize() && bytesEqual(lhs.bytesData(), rhs.bytesData(), lhs.bytesSize()); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case 
StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + return lhs.uint64Value() == rhs.uint64Value(); + default: + THROWF(db0::InternalException) << "Unsupported set item kind"; + } + return false; + } + + bool o_set::itemEqualsElement(const Item &item, const Element &element) + { + auto itemIsInt = item.itemKind() == StorageClass::INT64 || item.itemKind() == StorageClass::PACKED_INT32; + auto elementIsInt = element.m_kind == StorageClass::INT64 || element.m_kind == StorageClass::PACKED_INT32; + if (item.itemKind() != element.m_kind && !(itemIsInt && elementIsInt)) { + return false; + } + + switch (element.m_kind) { + case StorageClass::NONE: + return true; + case StorageClass::BOOLEAN: + return item.boolPayload().value() == element.boolValue(); + case StorageClass::INT64: + case StorageClass::PACKED_INT32: { + auto itemValue = item.itemKind() == StorageClass::PACKED_INT32 + ? static_cast(item.packedIntPayload().value()) + : item.intPayload().value(); + return itemValue == element.intValue(); + } + case StorageClass::FP_NUMERIC64: + return item.doublePayload().value() == element.doubleValue(); + case StorageClass::STRING_REF: + return item.stringPayload().toString() == element.stringValue(); + case StorageClass::DB0_BYTES: + return item.bytesPayload().size() == element.bytesSize() + && bytesEqual(item.bytesPayload().begin(), element.bytesData(), element.bytesSize()); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + return item.embeddedPayload().size() == element.bytesSize() + && bytesEqual(item.embeddedPayload().begin(), element.bytesData(), element.bytesSize()); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + return item.uint64Payload().value() == element.uint64Value(); + default: + 
THROWF(db0::InternalException) << "Unsupported set item kind"; + } + return false; + } + + bool o_set::bytesEqual(const std::byte *lhs, const std::byte *rhs, std::size_t size) + { + return size == 0 || std::memcmp(lhs, rhs, size) == 0; + } + + std::size_t o_set::measureElements(const ElementSet &elements) + { + std::size_t size = 0; + for (const auto &element: elements) { + auto itemSize = Item::measure(element); + if (size + itemSize < size) { + THROWF(db0::InternalException) << "Set elements byte size overflow"; + } + size += itemSize; + } + checkedHashIndexOffset(size == 0 ? 0 : size - 1, "Set item block"); + return size; + } + + std::size_t o_set::measureCollisionBuckets(const ElementSet &elements, std::size_t capacity) + { + if (capacity == 0) { + return 0; + } + + struct BucketMeasure + { + std::uint32_t m_count = 0; + std::uint32_t m_elementsByteSize = 0; + }; + + std::vector buckets(capacity); + std::size_t size = 0; + for (const auto &element: elements) { + auto &bucket = buckets[elementHash(element) % capacity]; + auto itemSize = checkedUint32Size(Item::measure(element), "Set bucket item byte size"); + auto growth = o_compact_tuple::Builder::measureGrowth(bucket.m_count, bucket.m_elementsByteSize, itemSize); + if (size + growth < size) { + THROWF(db0::InternalException) << "Set bucket block byte size overflow"; + } + size += growth; + ++bucket.m_count; + if (bucket.m_elementsByteSize + itemSize < bucket.m_elementsByteSize) { + THROWF(db0::InternalException) << "Set bucket elements byte size overflow"; + } + bucket.m_elementsByteSize += itemSize; + } + checkedHashIndexOffset(size == 0 ? 
0 : size - 1, "Set bucket block"); + return size; + } + + o_set::Element o_set::elementFromItem(const Item &item) + { + switch (item.itemKind()) { + case StorageClass::NONE: + return Element::none(); + case StorageClass::BOOLEAN: + return Element::boolean(item.boolPayload().value()); + case StorageClass::INT64: + return Element::integer(item.intPayload().value()); + case StorageClass::PACKED_INT32: + return Element::integer(static_cast(item.packedIntPayload().value())); + case StorageClass::FP_NUMERIC64: + return Element::floating(item.doublePayload().value()); + case StorageClass::STRING_REF: { + auto str = item.stringPayload().get(); + return Element::string(std::string_view(str.get_raw(), str.size())); + } + case StorageClass::DB0_BYTES: + return Element::bytes(item.bytesPayload().begin(), item.bytesPayload().size()); + case StorageClass::DB0_TUPLE: + return Element::embeddedTuple(item.embeddedPayload().begin(), item.embeddedPayload().size()); + case StorageClass::DB0_SET: + return Element::embeddedSet(item.embeddedPayload().begin(), item.embeddedPayload().size()); + case StorageClass::DB0_DICT: + return Element::embeddedDict(item.embeddedPayload().begin(), item.embeddedPayload().size()); + case StorageClass::OBJECT_REF: + return Element::embeddedObject(item.embeddedPayload().begin(), item.embeddedPayload().size()); + case StorageClass::PTIME64: + return Element::timestamp(item.uint64Payload().value()); + case StorageClass::DATE: + return Element::date(item.uint64Payload().value()); + case StorageClass::DATETIME: + return Element::datetime(item.uint64Payload().value()); + case StorageClass::DATETIME_TZ: + return Element::datetimeTz(item.uint64Payload().value()); + case StorageClass::TIME: + return Element::time(item.uint64Payload().value()); + case StorageClass::TIME_TZ: + return Element::timeTz(item.uint64Payload().value()); + case StorageClass::DECIMAL: + return Element::decimal(item.uint64Payload().value()); + default: + THROWF(db0::InternalException) << 
"Unsupported set item kind"; + } + return Element::none(); + } + + std::uint32_t o_set::elementHash(const Element &element) + { + auto seedKind = element.m_kind == StorageClass::PACKED_INT32 ? StorageClass::INT64 : element.m_kind; + auto seed = 0x9e3779b9U ^ static_cast(seedKind); + switch (element.m_kind) { + case StorageClass::NONE: + return hashBytes(nullptr, 0, seed); + case StorageClass::BOOLEAN: + return hashBytes(&element.m_payload.m_bool_value, sizeof(element.m_payload.m_bool_value), seed); + case StorageClass::INT64: + case StorageClass::PACKED_INT32: + return hashBytes(&element.m_payload.m_int_value, sizeof(element.m_payload.m_int_value), seed); + case StorageClass::FP_NUMERIC64: + return hashBytes(&element.m_payload.m_double_value, sizeof(element.m_payload.m_double_value), seed); + case StorageClass::STRING_REF: + return hashBytes( + element.m_payload.m_string_value.data(), element.m_payload.m_string_value.size(), seed + ); + case StorageClass::DB0_BYTES: + return hashBytes(element.bytesData(), element.bytesSize(), seed); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: { + if (element.m_payload.m_bytes_value.m_writer) { + std::vector payload(element.bytesSize()); + element.m_payload.m_bytes_value.m_writer(payload.data(), element.m_payload.m_bytes_value.m_source); + return hashBytes(payload.data(), payload.size(), seed); + } + return hashBytes(element.bytesData(), element.bytesSize(), seed); + } + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + return hashBytes(&element.m_payload.m_uint64_value, sizeof(element.m_payload.m_uint64_value), seed); + default: + THROWF(db0::InternalException) << "Unsupported set item kind"; + } + return 0; + } + + std::uint32_t o_set::itemHash(const Item &item) + { + auto seedKind = 
item.itemKind() == StorageClass::PACKED_INT32 ? StorageClass::INT64 : item.itemKind(); + auto seed = 0x9e3779b9U ^ static_cast(seedKind); + switch (item.itemKind()) { + case StorageClass::NONE: + return hashBytes(nullptr, 0, seed); + case StorageClass::BOOLEAN: { + auto value = item.boolPayload().value(); + return hashBytes(&value, sizeof(value), seed); + } + case StorageClass::INT64: { + auto value = item.intPayload().value(); + return hashBytes(&value, sizeof(value), seed); + } + case StorageClass::PACKED_INT32: { + auto value = static_cast(item.packedIntPayload().value()); + return hashBytes(&value, sizeof(value), seed); + } + case StorageClass::FP_NUMERIC64: { + auto value = item.doublePayload().value(); + return hashBytes(&value, sizeof(value), seed); + } + case StorageClass::STRING_REF: { + auto str = item.stringPayload().get(); + return hashBytes(str.get_raw(), str.size(), seed); + } + case StorageClass::DB0_BYTES: + return hashBytes(item.bytesPayload().begin(), item.bytesPayload().size(), seed); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + return hashBytes(item.embeddedPayload().begin(), item.embeddedPayload().size(), seed); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: { + auto value = item.uint64Payload().value(); + return hashBytes(&value, sizeof(value), seed); + } + default: + THROWF(db0::InternalException) << "Unsupported set item kind"; + } + return 0; + } + + std::uint32_t o_set::hashBytes(const void *data, std::size_t size, std::uint32_t seed) + { + static const std::byte empty = std::byte{0}; + auto hash = db0::murmurhash64A(size == 0 ? 
&empty : data, size, seed); + return static_cast(hash ^ (hash >> 32)); + } + + std::size_t o_set::hashIndexCapacity(std::size_t count) + { + if (count == 0) { + return 0; + } + + std::size_t capacity = 1; + while (capacity < count * 2) { + capacity <<= 1; + } + return capacity; + } + + std::size_t o_set::hashIndexByteSize(std::size_t count) + { + return hashIndexCapacity(count) * sizeof(HashIndexEntry); + } + + std::uint32_t o_set::buildHashIndex( + HashIndexEntry *indexEntries, const std::byte *itemsBegin, std::size_t itemsByteSize, std::size_t capacity + ) + { + for (std::size_t i = 0; i < capacity; ++i) { + indexEntries[i].clear(); + } + + if (capacity == 0) { + return 0; + } + + auto *cursor = itemsBegin; + auto *itemsEnd = itemsBegin + itemsByteSize; + while (cursor < itemsEnd) { + const auto &item = Item::__const_ref(cursor); + auto itemOffset = checkedHashIndexOffset(cursor - itemsBegin, "Set item"); + auto slot = itemHash(item) % capacity; + auto &entry = indexEntries[slot]; + if (entry.isEmpty()) { + entry.setItem(itemOffset); + } else if (!entry.isPendingBucket()) { + entry.setPendingBucket(); + } + cursor += item.sizeOf(); + } + return checkedHashIndexOffset(itemsByteSize == 0 ? 
0 : itemsByteSize - 1, "Set item block"); + } + + std::uint32_t o_set::writeCollisionBuckets( + HashIndexEntry *indexEntries, const std::byte *itemsBegin, std::size_t itemsByteSize, + std::size_t capacity, std::byte *bucketStart + ) + { + buildHashIndex(indexEntries, itemsBegin, itemsByteSize, capacity); + + auto *bucketCursor = bucketStart; + auto *itemsEnd = itemsBegin + itemsByteSize; + for (std::size_t slot = 0; slot < capacity; ++slot) { + if (!indexEntries[slot].isPendingBucket()) { + continue; + } + + std::uint32_t count = 0; + std::size_t elementsByteSize = 0; + auto *cursor = itemsBegin; + while (cursor < itemsEnd) { + const auto &item = Item::__const_ref(cursor); + if (itemHash(item) % capacity == slot) { + ++count; + auto itemSize = item.sizeOf(); + if (elementsByteSize + itemSize < elementsByteSize) { + THROWF(db0::InternalException) << "Set bucket elements byte size overflow"; + } + elementsByteSize += itemSize; + } + cursor += item.sizeOf(); + } + + indexEntries[slot].setBucket(checkedHashIndexOffset(bucketCursor - bucketStart, "Set bucket")); + o_compact_tuple::Builder tupleBuilder( + bucketCursor, count, checkedUint32Size(elementsByteSize, "Set bucket elements byte size") + ); + cursor = itemsBegin; + while (cursor < itemsEnd) { + const auto &item = Item::__const_ref(cursor); + if (itemHash(item) % capacity == slot) { + tupleBuilder.add(elementFromItem(item)); + } + cursor += item.sizeOf(); + } + auto &tuple = tupleBuilder.finish(); + bucketCursor += tuple.sizeOf(); + } + + return checkedHashIndexOffset(bucketCursor - bucketStart, "Set bucket block"); + } + + std::uint32_t o_set::checkedHashIndexOffset(std::size_t offset, const char *name) + { + if (offset >= HashIndexEntry::OFFSET_MASK) { + THROWF(db0::InternalException) << name << " offset exceeds hash index entry capacity"; + } + return static_cast(offset); + } + + std::uint32_t o_set::checkedUint32Size(std::size_t size, const char *name) + { + if (size > std::numeric_limits::max()) { + 
THROWF(db0::InternalException) << name << " exceeds uint32 range"; + } + return static_cast(size); + } + +} diff --git a/src/dbzero/object_model/set/o_set.hpp b/src/dbzero/object_model/set/o_set.hpp new file mode 100644 index 00000000..5265ef72 --- /dev/null +++ b/src/dbzero/object_model/set/o_set.hpp @@ -0,0 +1,153 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#pragma once + +#include +#include +#include + +#include +#include +#include +#include + +namespace db0::object_model +{ + +DB0_PACKED_BEGIN + class DB0_PACKED_ATTR o_set: public db0::o_base + { + protected: + using super_t = db0::o_base; + friend super_t; + + public: + using Element = o_tuple_item::Element; + using Item = o_tuple_item; + + struct ElementHash + { + std::size_t operator()(const Element &element) const; + }; + + struct ElementEqual + { + bool operator()(const Element &lhs, const Element &rhs) const; + }; + + using ElementSet = std::unordered_set; + + class const_iterator + { + public: + const_iterator() = default; + + const Item &operator*() const; + const Item *operator->() const; + const_iterator &operator++(); + bool operator==(const const_iterator &other) const; + bool operator!=(const const_iterator &other) const; + + private: + friend class o_set; + + explicit const_iterator(const Item *item); + + const Item *m_item = nullptr; + }; + + explicit o_set(const ElementSet &elements); + + std::size_t size() const; + bool empty() const; + bool contains(const Element &element) const; + const_iterator begin() const; + const_iterator end() const; + std::size_t sizeOf() const; + + static std::size_t measure(const ElementSet &elements); + + template static std::size_t safeSizeOf(BufT buf) + { + auto start = buf; + auto cursor = buf; + cursor += super_t::baseSize(); + + auto countAt = cursor; + cursor += db0::packed_int32::safeSizeOf(cursor); + auto elementsByteSizeAt = cursor; + cursor += db0::packed_int32::safeSizeOf(cursor); + auto 
bucketByteSizeAt = cursor; + cursor += db0::packed_int32::safeSizeOf(cursor); + + auto elementsByteSize = db0::packed_int32::__const_ref(elementsByteSizeAt).value(); + auto bucketByteSize = db0::packed_int32::__const_ref(bucketByteSizeAt).value(); + cursor += elementsByteSize; + auto count = db0::packed_int32::__const_ref(countAt).value(); + cursor += hashIndexByteSize(count); + cursor += bucketByteSize; + return cursor - start; + } + + protected: + o_set() = default; + + db0::Foundation::Arranger arrangeSetMembers( + std::uint32_t count, std::uint32_t elementsByteSize, std::uint32_t bucketByteSize + ); + void finishSetConstruction( + void *indexEntries, std::uint32_t elementsByteSize, std::size_t capacity, std::uint32_t bucketByteSize + ); + + static std::uint32_t elementHash(const Element &element); + static std::size_t hashIndexCapacity(std::size_t count); + static std::uint32_t checkedHashIndexOffset(std::size_t offset, const char *name); + static std::uint32_t checkedUint32Size(std::size_t size, const char *name); + + private: + struct HashIndexEntry + { + static constexpr std::uint32_t BUCKET_FLAG = 0x80000000U; + static constexpr std::uint32_t OFFSET_MASK = 0x7fffffffU; + + std::uint32_t m_value = 0; + + bool isEmpty() const; + bool isBucket() const; + bool isPendingBucket() const; + std::uint32_t offset() const; + void clear(); + void setPendingBucket(); + void setItem(std::uint32_t offset); + void setBucket(std::uint32_t offset); + }; + static_assert(sizeof(HashIndexEntry) == sizeof(std::uint32_t)); + + std::size_t elementsByteSize() const; + const Item &item(std::size_t index) const; + const db0::packed_int32 &count() const; + const db0::packed_int32 &elementsByteSizeMember() const; + const db0::packed_int32 &bucketByteSizeMember() const; + const std::byte *beginOfItems() const; + const std::byte *beginOfBuckets() const; + HashIndexEntry *beginOfHashIndex(); + const HashIndexEntry *beginOfHashIndex() const; + const o_compact_tuple 
&bucketAtOffset(std::uint32_t offset) const; + const Item &itemAtOffset(std::uint32_t offset) const; + + static bool elementsEqual(const Element &lhs, const Element &rhs); + static bool itemEqualsElement(const Item &item, const Element &element); + static bool bytesEqual(const std::byte *lhs, const std::byte *rhs, std::size_t size); + static std::size_t measureElements(const ElementSet &elements); + static std::size_t measureCollisionBuckets(const ElementSet &elements, std::size_t capacity); + static Element elementFromItem(const Item &item); + static std::uint32_t itemHash(const Item &item); + static std::uint32_t hashBytes(const void *data, std::size_t size, std::uint32_t seed); + static std::size_t hashIndexByteSize(std::size_t count); + static std::uint32_t buildHashIndex(HashIndexEntry *indexEntries, const std::byte *itemsBegin, std::size_t itemsByteSize, std::size_t capacity); + static std::uint32_t writeCollisionBuckets(HashIndexEntry *indexEntries, const std::byte *itemsBegin, std::size_t itemsByteSize, std::size_t capacity, std::byte *bucketStart); + }; +DB0_PACKED_END + +} diff --git a/src/dbzero/object_model/tuple/Tuple.cpp b/src/dbzero/object_model/tuple/Tuple.cpp index 3704af57..5a34d0b5 100644 --- a/src/dbzero/object_model/tuple/Tuple.cpp +++ b/src/dbzero/object_model/tuple/Tuple.cpp @@ -14,23 +14,23 @@ namespace db0::object_model GC0_Define(Tuple) - o_tuple::o_tuple(std::size_t size) + o_db0_tuple::o_db0_tuple(std::size_t size) { arrangeMembers() (o_micro_array::type(), size).ptr(); } - std::size_t o_tuple::size() const { + std::size_t o_db0_tuple::size() const { return items().size(); } - std::size_t o_tuple::sizeOf() const + std::size_t o_db0_tuple::sizeOf() const { return sizeOfMembers() (o_micro_array::type()); } - std::size_t o_tuple::measure(std::size_t size) + std::size_t o_db0_tuple::measure(std::size_t size) { return measureMembers() (o_micro_array::measure(size)); diff --git a/src/dbzero/object_model/tuple/Tuple.hpp 
b/src/dbzero/object_model/tuple/Tuple.hpp index 00ff6d56..b96a0cbc 100644 --- a/src/dbzero/object_model/tuple/Tuple.hpp +++ b/src/dbzero/object_model/tuple/Tuple.hpp @@ -28,13 +28,13 @@ namespace db0::object_model class TupleIterator; DB0_PACKED_BEGIN - class DB0_PACKED_ATTR o_tuple: public o_base + class DB0_PACKED_ATTR o_db0_tuple: public o_base { protected: - using super_t = o_base; + using super_t = o_base; friend super_t; - o_tuple(std::size_t size); + o_db0_tuple(std::size_t size); public: // common object header @@ -67,11 +67,11 @@ DB0_PACKED_BEGIN }; DB0_PACKED_END - class Tuple: public db0::ObjectBase, StorageClass::DB0_TUPLE> + class Tuple: public db0::ObjectBase, StorageClass::DB0_TUPLE> { GC0_Declare public: - using super_t = db0::ObjectBase, StorageClass::DB0_TUPLE>; + using super_t = db0::ObjectBase, StorageClass::DB0_TUPLE>; friend super_t; using LangToolkit = db0::python::PyToolkit; using ObjectPtr = typename LangToolkit::ObjectPtr; @@ -108,4 +108,4 @@ DB0_PACKED_END std::shared_ptr getIterator(ObjectPtr lang_tuple) const; }; -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/tuple/o_py_tuple.cpp b/src/dbzero/object_model/tuple/o_py_tuple.cpp new file mode 100644 index 00000000..37a0e750 --- /dev/null +++ b/src/dbzero/object_model/tuple/o_py_tuple.cpp @@ -0,0 +1,156 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. 
+ +#include "o_py_tuple.hpp" + +#include + +#include +#include +#include +#include + +namespace db0::object_model +{ + namespace + { + void writePyTuple(void *buf, const void *source) + { + o_py_tuple::__new(buf, const_cast(static_cast(source))); + } + + void writePySet(void *buf, const void *source) + { + o_py_set::__new(buf, const_cast(static_cast(source))); + } + + void writePyDict(void *buf, const void *source) + { + o_py_dict::__new(buf, const_cast(static_cast(source))); + } + } + + o_py_tuple::o_py_tuple(PyObject *sequence) + : o_tuple<>() + { + auto count = static_cast(sequenceSize(sequence)); + auto elementsByteSize = static_cast(measureElements(sequence)); + + auto arranger = arrangeMembers(); + arranger = arranger(db0::packed_int32::type(), count); + arranger = arranger(db0::packed_int32::type(), elementsByteSize); + for (std::size_t i = 0; i < count; ++i) { + arranger = arranger(o_tuple_item::type(), elementFromPythonObject(sequenceItem(sequence, i))); + } + } + + std::size_t o_py_tuple::measure(PyObject *sequence) + { + auto count = static_cast(sequenceSize(sequence)); + auto elementsByteSize = measureElements(sequence); + return measureMembers() + (db0::packed_int32::type(), count) + (db0::packed_int32::type(), static_cast(elementsByteSize)) + (elementsByteSize); + } + + o_py_tuple &o_py_tuple::__ref(void *buf) + { + return *reinterpret_cast(buf); + } + + const o_py_tuple &o_py_tuple::__const_ref(const void *buf) + { + return *reinterpret_cast(buf); + } + + db0::Foundation::Type o_py_tuple::type() + { + return db0::Foundation::Type(); + } + + o_py_tuple::Element o_py_tuple::elementFromPythonObject(PyObject *object) + { + auto &typeManager = db0::python::PyToolkit::getTypeManager(); + auto typeId = typeManager.getTypeId(object); + + switch (typeId) { + case db0::bindings::TypeId::NONE: + return Element::none(); + case db0::bindings::TypeId::BOOLEAN: + return Element::boolean(object == Py_True); + case db0::bindings::TypeId::INTEGER: { + auto value = 
PyLong_AsLongLong(object); + if (PyErr_Occurred()) { + PyErr_Clear(); + THROWF(db0::InputException) << "Python integer is out of int64 range"; + } + return Element::integer(value); + } + case db0::bindings::TypeId::FLOAT: + return Element::floating(PyFloat_AsDouble(object)); + case db0::bindings::TypeId::DATETIME: + return Element::datetime(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DATETIME_TZ: + return Element::datetimeTz(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DATE: + return Element::date(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::TIME: + return Element::time(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::TIME_TZ: + return Element::timeTz(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::DECIMAL: + return Element::decimal(typeManager.extractUInt64(typeId, object)); + case db0::bindings::TypeId::STRING: { + return Element::string(typeManager.extractString(object)); + } + case db0::bindings::TypeId::BYTES: { + auto bytes = typeManager.extractBytes(object); + return Element::bytes(bytes.m_data, bytes.m_size); + } + case db0::bindings::TypeId::LIST: + case db0::bindings::TypeId::TUPLE: + return Element::embeddedTuple(o_py_tuple::measure(object), writePyTuple, object); + case db0::bindings::TypeId::SET: + return Element::embeddedSet(o_py_set::measure(object), writePySet, object); + case db0::bindings::TypeId::DICT: + return Element::embeddedDict(o_py_dict::measure(object), writePyDict, object); + default: + break; + } + + THROWF(db0::InputException) << "Unsupported o_py_tuple element type: " << Py_TYPE(object)->tp_name; + return Element::none(); + } + + std::size_t o_py_tuple::sequenceSize(PyObject *sequence) + { + if (PyTuple_Check(sequence)) { + return static_cast(PyTuple_GET_SIZE(sequence)); + } + if (PyList_Check(sequence)) { + return static_cast(PyList_GET_SIZE(sequence)); + } + THROWF(db0::InputException) << 
"o_py_tuple expects a Python tuple or list"; + return 0; + } + + PyObject *o_py_tuple::sequenceItem(PyObject *sequence, std::size_t index) + { + if (PyTuple_Check(sequence)) { + return PyTuple_GET_ITEM(sequence, static_cast(index)); + } + return PyList_GET_ITEM(sequence, static_cast(index)); + } + + std::size_t o_py_tuple::measureElements(PyObject *sequence) + { + auto count = sequenceSize(sequence); + std::size_t size = 0; + for (std::size_t i = 0; i < count; ++i) { + size += o_tuple_item::measure(elementFromPythonObject(sequenceItem(sequence, i))); + } + return size; + } + +} diff --git a/src/dbzero/object_model/tuple/o_py_tuple.hpp b/src/dbzero/object_model/tuple/o_py_tuple.hpp new file mode 100644 index 00000000..9a64d4c5 --- /dev/null +++ b/src/dbzero/object_model/tuple/o_py_tuple.hpp @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#pragma once + +#include +#include + +#include + +struct _object; +using PyObject = _object; + +namespace db0::object_model +{ + +DB0_PACKED_BEGIN + class DB0_PACKED_ATTR o_py_tuple: public o_tuple<> + { + public: + explicit o_py_tuple(PyObject *sequence); + + static std::size_t measure(PyObject *sequence); + + template static std::size_t safeSizeOf(BufT buf) + { + return o_tuple<>::safeSizeOf(buf); + } + + static o_py_tuple &__ref(void *buf); + static const o_py_tuple &__const_ref(const void *buf); + + template static o_py_tuple &__new(void *buf, Args&& ...args) + { + return *(new(buf) o_py_tuple(std::forward(args)...)); + } + + static db0::Foundation::Type type(); + + private: + static Element elementFromPythonObject(PyObject *object); + static std::size_t sequenceSize(PyObject *sequence); + static PyObject *sequenceItem(PyObject *sequence, std::size_t index); + static std::size_t measureElements(PyObject *sequence); + }; +DB0_PACKED_END + +} diff --git a/src/dbzero/object_model/tuple/o_tuple.cpp b/src/dbzero/object_model/tuple/o_tuple.cpp new file mode 100644 
index 00000000..784fa06a --- /dev/null +++ b/src/dbzero/object_model/tuple/o_tuple.cpp @@ -0,0 +1,626 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#include "o_tuple.hpp" + +#include + +#include + +namespace db0::object_model +{ + + o_tuple_item::Element::Payload::Payload() + : m_int_value(0) + { + } + + o_tuple_item::Element o_tuple_item::Element::none() + { + return { StorageClass::NONE }; + } + + o_tuple_item::Element o_tuple_item::Element::boolean(bool value) + { + Element result; + result.m_kind = StorageClass::BOOLEAN; + result.m_payload.m_bool_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::integer(std::int64_t value) + { + Element result; + if (value >= 0 && value <= std::numeric_limits::max()) { + result.m_kind = StorageClass::PACKED_INT32; + } else { + result.m_kind = StorageClass::INT64; + } + result.m_payload.m_int_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::floating(double value) + { + Element result; + result.m_kind = StorageClass::FP_NUMERIC64; + result.m_payload.m_double_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::string(std::string_view value) + { + Element result; + result.m_kind = StorageClass::STRING_REF; + result.m_payload.m_string_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::bytes(const std::byte *data, std::size_t size) + { + Element result; + result.m_kind = StorageClass::DB0_BYTES; + result.m_payload.m_bytes_value = { data, size }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::bytes(const std::vector &value) + { + return bytes(value.data(), value.size()); + } + + o_tuple_item::Element o_tuple_item::Element::timestamp(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::PTIME64; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element 
o_tuple_item::Element::date(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::DATE; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::datetime(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::DATETIME; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::datetimeTz(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::DATETIME_TZ; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::time(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::TIME; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::timeTz(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::TIME_TZ; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::decimal(std::uint64_t value) + { + Element result; + result.m_kind = StorageClass::DECIMAL; + result.m_payload.m_uint64_value = value; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedTuple(const void *data, std::size_t size) + { + Element result; + result.m_kind = StorageClass::DB0_TUPLE; + result.m_payload.m_bytes_value = { reinterpret_cast(data), size }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedSet(const void *data, std::size_t size) + { + Element result; + result.m_kind = StorageClass::DB0_SET; + result.m_payload.m_bytes_value = { reinterpret_cast(data), size }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedDict(const void *data, std::size_t size) + { + Element result; + result.m_kind = StorageClass::DB0_DICT; + result.m_payload.m_bytes_value = { reinterpret_cast(data), size }; + return result; + } + + o_tuple_item::Element 
o_tuple_item::Element::embeddedObject(const void *data, std::size_t size) + { + Element result; + result.m_kind = StorageClass::OBJECT_REF; + result.m_payload.m_bytes_value = { reinterpret_cast(data), size }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedTuple( + std::size_t size, BytesView::Writer writer, const void *source + ) + { + Element result; + result.m_kind = StorageClass::DB0_TUPLE; + result.m_payload.m_bytes_value = { nullptr, size, writer, source }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedSet( + std::size_t size, BytesView::Writer writer, const void *source + ) + { + Element result; + result.m_kind = StorageClass::DB0_SET; + result.m_payload.m_bytes_value = { nullptr, size, writer, source }; + return result; + } + + o_tuple_item::Element o_tuple_item::Element::embeddedDict( + std::size_t size, BytesView::Writer writer, const void *source + ) + { + Element result; + result.m_kind = StorageClass::DB0_DICT; + result.m_payload.m_bytes_value = { nullptr, size, writer, source }; + return result; + } + + std::int64_t o_tuple_item::Element::intValue() const + { + return m_payload.m_int_value; + } + + std::uint64_t o_tuple_item::Element::uint64Value() const + { + return m_payload.m_uint64_value; + } + + double o_tuple_item::Element::doubleValue() const + { + return m_payload.m_double_value; + } + + bool o_tuple_item::Element::boolValue() const + { + return m_payload.m_bool_value; + } + + std::string o_tuple_item::Element::stringValue() const + { + return std::string(m_payload.m_string_value); + } + + const std::byte *o_tuple_item::Element::bytesData() const + { + return m_payload.m_bytes_value.m_data; + } + + std::size_t o_tuple_item::Element::bytesSize() const + { + return m_payload.m_bytes_value.m_size; + } + + o_tuple_item::o_tuple_item(const Element &element) + : m_kind(element.m_kind) + { + arrangePayload(element); + } + + StorageClass o_tuple_item::itemKind() const + { + return m_kind; + 
} + + std::size_t o_tuple_item::sizeOf() const + { + switch (m_kind) { + case StorageClass::NONE: + return sizeOfMembers(); + case StorageClass::BOOLEAN: + return sizeOfMembers()(o_simple::type()); + case StorageClass::INT64: + return sizeOfMembers()(o_simple::type()); + case StorageClass::PACKED_INT32: + return sizeOfMembers()(packed_int32::type()); + case StorageClass::FP_NUMERIC64: + return sizeOfMembers()(o_simple::type()); + case StorageClass::STRING_REF: + return sizeOfMembers()(o_string::type()); + case StorageClass::DB0_BYTES: + return sizeOfMembers()(o_binary::type()); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + return sizeOfMembers()(o_binary::type()); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + return sizeOfMembers()(o_simple::type()); + default: + throwUnsupportedItemKind(); + return 0; + } + } + + std::size_t o_tuple_item::measure(const Element &element) + { + switch (element.m_kind) { + case StorageClass::NONE: + return measureMembers(); + case StorageClass::BOOLEAN: + return measureMembers()(o_simple::type(), element.boolValue()); + case StorageClass::INT64: + return measureMembers()(o_simple::type(), element.intValue()); + case StorageClass::PACKED_INT32: + return measureMembers()(packed_int32::type(), static_cast(element.intValue())); + case StorageClass::FP_NUMERIC64: + return measureMembers()(o_simple::type(), element.doubleValue()); + case StorageClass::STRING_REF: + return measureMembers()(o_string::type(), element.stringValue()); + case StorageClass::DB0_BYTES: + return measureMembers()(o_binary::type(), element.bytesData(), element.bytesSize()); + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + return 
measureMembers()(o_binary::type(), element.bytesData(), element.bytesSize()); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + return measureMembers()(o_simple::type(), element.uint64Value()); + default: + throwUnsupportedItemKind(); + return 0; + } + } + + void o_tuple_item::arrangePayload(const Element &element) + { + switch (element.m_kind) { + case StorageClass::NONE: + arrangeMembers(); + return; + case StorageClass::BOOLEAN: + arrangeMembers()(o_simple::type(), element.boolValue()); + return; + case StorageClass::INT64: + arrangeMembers()(o_simple::type(), element.intValue()); + return; + case StorageClass::PACKED_INT32: + arrangeMembers()(packed_int32::type(), static_cast(element.intValue())); + return; + case StorageClass::FP_NUMERIC64: + arrangeMembers()(o_simple::type(), element.doubleValue()); + return; + case StorageClass::STRING_REF: + arrangeMembers()(o_string::type(), element.stringValue()); + return; + case StorageClass::DB0_BYTES: + arrangeMembers()(o_binary::type(), element.bytesData(), element.bytesSize()); + return; + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + if (element.m_payload.m_bytes_value.m_writer) { + arrangeMembers()( + o_binary::type(), element.bytesSize(), element.m_payload.m_bytes_value.m_writer, + element.m_payload.m_bytes_value.m_source + ); + } else { + arrangeMembers()(o_binary::type(), element.bytesData(), element.bytesSize()); + } + return; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + arrangeMembers()(o_simple::type(), element.uint64Value()); + return; + default: + throwUnsupportedItemKind(); + } + } + + const o_simple 
&o_tuple_item::boolPayload() const + { + return getDynFirst(o_simple::type()); + } + + const o_simple &o_tuple_item::intPayload() const + { + return getDynFirst(o_simple::type()); + } + + const packed_int32 &o_tuple_item::packedIntPayload() const + { + return getDynFirst(packed_int32::type()); + } + + const o_simple &o_tuple_item::uint64Payload() const + { + return getDynFirst(o_simple::type()); + } + + const o_simple &o_tuple_item::doublePayload() const + { + return getDynFirst(o_simple::type()); + } + + const o_string &o_tuple_item::stringPayload() const + { + return getDynFirst(o_string::type()); + } + + const o_binary &o_tuple_item::bytesPayload() const + { + return getDynFirst(o_binary::type()); + } + + const o_binary &o_tuple_item::embeddedPayload() const + { + return getDynFirst(o_binary::type()); + } + + void o_tuple_item::throwUnsupportedItemKind() + { + THROWF(db0::InternalException) << "Unsupported tuple item kind"; + } + + template + const o_tuple_item &o_tuple::const_iterator::operator*() const + { + return *m_item; + } + + template + const o_tuple_item *o_tuple::const_iterator::operator->() const + { + return m_item; + } + + template + typename o_tuple::const_iterator &o_tuple::const_iterator::operator++() + { + m_item = reinterpret_cast( + reinterpret_cast(m_item) + m_item->sizeOf() + ); + return *this; + } + + template + bool o_tuple::const_iterator::operator==(const const_iterator &other) const + { + return m_item == other.m_item; + } + + template + bool o_tuple::const_iterator::operator!=(const const_iterator &other) const + { + return m_item != other.m_item; + } + + template + o_tuple::const_iterator::const_iterator(const o_tuple_item *item) + : m_item(item) + { + } + + template + o_tuple::o_tuple(const std::vector &elements) + { + auto elementsByteSize = static_cast(measureElements(elements)); + Builder builder(*this, static_cast(elements.size()), elementsByteSize); + for (const auto &element: elements) { + builder.add(element); + } + 
builder.finish(); + } + + template + std::size_t o_tuple::size() const + { + return count().value(); + } + + template + std::size_t o_tuple::elementsByteSize() const + { + if constexpr (!compact) { + const auto &elementsByteSizeMember = this->getDynAfter(count(), db0::packed_int32::type()); + return elementsByteSizeMember.value(); + } else { + std::size_t result = 0; + auto *cursor = beginOfItems(); + for (std::uint32_t i = 0; i < size(); ++i) { + const auto &tupleItem = o_tuple_item::__const_ref(cursor); + auto itemSize = tupleItem.sizeOf(); + result += itemSize; + cursor += itemSize; + } + return result; + } + } + + template + bool o_tuple::empty() const + { + return size() == 0; + } + + template + const o_tuple_item &o_tuple::item(std::size_t index) const + { + auto it = begin(); + for (std::size_t i = 0; i < index; ++i) { + ++it; + } + return *it; + } + + template + typename o_tuple::const_iterator o_tuple::begin() const + { + return const_iterator(reinterpret_cast(beginOfItems())); + } + + template + typename o_tuple::const_iterator o_tuple::end() const + { + return const_iterator(reinterpret_cast(beginOfItems() + elementsByteSize())); + } + + template + std::size_t o_tuple::sizeOf() const + { + return safeSizeOf(reinterpret_cast(this)); + } + + template + std::size_t o_tuple::measure(const std::vector &elements) + { + auto elementsByteSize = measureElements(elements); + return Builder::measure(static_cast(elements.size()), static_cast(elementsByteSize)); + } + + template + o_tuple::Builder::Builder(void *buf, std::uint32_t count, std::uint32_t elementsByteSize) + : m_tuple(*(new(buf) o_tuple())) + , m_arranger(m_tuple.arrangeMembers()) + , m_expectedCount(count) + { + m_arranger = m_arranger(db0::packed_int32::type(), count); + if constexpr (!compact) { + m_arranger = m_arranger(db0::packed_int32::type(), elementsByteSize); + } + } + + template + o_tuple::Builder::Builder(o_tuple &tuple, std::uint32_t count, std::uint32_t elementsByteSize) + : m_tuple(tuple) 
+ , m_arranger(m_tuple.arrangeMembers()) + , m_expectedCount(count) + { + m_arranger = m_arranger(db0::packed_int32::type(), count); + if constexpr (!compact) { + m_arranger = m_arranger(db0::packed_int32::type(), elementsByteSize); + } + } + + template + void o_tuple::Builder::add(const Element &element) + { + m_arranger = m_arranger(o_tuple_item::type(), element); + ++m_addedCount; + } + + template + o_tuple &o_tuple::Builder::finish() + { + if (m_addedCount != m_expectedCount) { + THROWF(db0::InternalException) << "Tuple builder received unexpected element count"; + } + return m_tuple; + } + + template + std::size_t o_tuple::Builder::measure(std::uint32_t count, std::uint32_t elementsByteSize) + { + if constexpr (compact) { + return super_t::measureMembers() + (db0::packed_int32::type(), count) + (static_cast(elementsByteSize)); + } else { + return super_t::measureMembers() + (db0::packed_int32::type(), count) + (db0::packed_int32::type(), elementsByteSize) + (static_cast(elementsByteSize)); + } + } + + template + std::size_t o_tuple::Builder::measureGrowth( + std::uint32_t count, std::uint32_t elementsByteSize, std::uint32_t addedElementByteSize + ) + { + auto newCount = count + 1; + auto newElementsByteSize = elementsByteSize + addedElementByteSize; + if (newCount <= count || newElementsByteSize < elementsByteSize) { + THROWF(db0::InternalException) << "Tuple builder growth exceeds uint32 range"; + } + if (count == 0) { + return 0; + } + auto newSize = measure(newCount, newElementsByteSize); + if (count == 1) { + return newSize; + } + return newSize - measure(count, elementsByteSize); + } + + template + const db0::packed_int32 &o_tuple::count() const + { + return this->getDynFirst(db0::packed_int32::type()); + } + + template + const std::byte *o_tuple::beginOfItems() const + { + if constexpr (compact) { + const auto &countMember = count(); + return reinterpret_cast(&countMember) + countMember.sizeOf(); + } else { + const auto &elementsByteSizeMemberRef = 
this->getDynAfter(count(), db0::packed_int32::type()); + return reinterpret_cast(&elementsByteSizeMemberRef) + elementsByteSizeMemberRef.sizeOf(); + } + } + + template + std::size_t o_tuple::measureElements(const std::vector &elements) + { + std::size_t size = 0; + for (const auto &element: elements) { + size += o_tuple_item::measure(element); + } + return size; + } + + template class o_tuple; + template class o_tuple; + +} diff --git a/src/dbzero/object_model/tuple/o_tuple.hpp b/src/dbzero/object_model/tuple/o_tuple.hpp new file mode 100644 index 00000000..6147e6f7 --- /dev/null +++ b/src/dbzero/object_model/tuple/o_tuple.hpp @@ -0,0 +1,261 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +namespace db0::object_model +{ + +DB0_PACKED_BEGIN + class DB0_PACKED_ATTR o_tuple_item: public db0::o_base + { + protected: + using super_t = db0::o_base; + friend super_t; + + public: + struct Element + { + struct BytesView + { + using Writer = void (*)(void *, const void *); + + const std::byte *m_data = nullptr; + std::size_t m_size = 0; + Writer m_writer = nullptr; + const void *m_source = nullptr; + }; + + StorageClass m_kind = StorageClass::UNDEFINED; + union Payload + { + std::int64_t m_int_value; + std::uint64_t m_uint64_value; + double m_double_value; + bool m_bool_value; + std::string_view m_string_value; + BytesView m_bytes_value; + + Payload(); + } m_payload; + + static Element none(); + static Element boolean(bool value); + static Element integer(std::int64_t value); + static Element floating(double value); + static Element string(std::string_view value); + static Element bytes(const std::byte *data, std::size_t size); + static Element bytes(const std::vector &value); + static Element timestamp(std::uint64_t value); + static Element date(std::uint64_t value); + static Element 
datetime(std::uint64_t value); + static Element datetimeTz(std::uint64_t value); + static Element time(std::uint64_t value); + static Element timeTz(std::uint64_t value); + static Element decimal(std::uint64_t value); + static Element embeddedTuple(const void *data, std::size_t size); + static Element embeddedSet(const void *data, std::size_t size); + static Element embeddedDict(const void *data, std::size_t size); + static Element embeddedObject(const void *data, std::size_t size); + static Element embeddedTuple(std::size_t size, BytesView::Writer writer, const void *source); + static Element embeddedSet(std::size_t size, BytesView::Writer writer, const void *source); + static Element embeddedDict(std::size_t size, BytesView::Writer writer, const void *source); + + std::int64_t intValue() const; + std::uint64_t uint64Value() const; + double doubleValue() const; + bool boolValue() const; + std::string stringValue() const; + const std::byte *bytesData() const; + std::size_t bytesSize() const; + }; + + explicit o_tuple_item(const Element &element); + + StorageClass itemKind() const; + std::size_t sizeOf() const; + + static std::size_t measure(const Element &element); + + template static std::size_t safeSizeOf(BufT buf) + { + auto start = buf; + auto cursor = buf; + cursor += super_t::baseSize(); + advancePayload(super_t::__const_ref(buf).m_kind, cursor); + return cursor - start; + } + + private: + StorageClass m_kind = StorageClass::UNDEFINED; + + void arrangePayload(const Element &element); + + public: + const o_simple &boolPayload() const; + const o_simple &intPayload() const; + const packed_int32 &packedIntPayload() const; + const o_simple &uint64Payload() const; + const o_simple &doublePayload() const; + const o_string &stringPayload() const; + const o_binary &bytesPayload() const; + const o_binary &embeddedPayload() const; + + private: + template static void advancePayload(StorageClass kind, BufT &cursor) + { + switch (kind) { + case StorageClass::NONE: + 
return; + case StorageClass::BOOLEAN: + cursor += o_simple::safeSizeOf(cursor); + return; + case StorageClass::INT64: + cursor += o_simple::safeSizeOf(cursor); + return; + case StorageClass::PACKED_INT32: + cursor += packed_int32::safeSizeOf(cursor); + return; + case StorageClass::FP_NUMERIC64: + cursor += o_simple::safeSizeOf(cursor); + return; + case StorageClass::STRING_REF: + cursor += o_string::safeSizeOf(cursor); + return; + case StorageClass::DB0_BYTES: + cursor += o_binary::safeSizeOf(cursor); + return; + case StorageClass::DB0_TUPLE: + case StorageClass::DB0_SET: + case StorageClass::DB0_DICT: + case StorageClass::OBJECT_REF: + cursor += o_binary::safeSizeOf(cursor); + return; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + cursor += o_simple::safeSizeOf(cursor); + return; + default: + throwUnsupportedItemKind(); + } + } + + static void throwUnsupportedItemKind(); + }; +DB0_PACKED_END + +DB0_PACKED_BEGIN + template + class DB0_PACKED_ATTR o_tuple: public db0::o_base, 0, false> + { + protected: + using super_t = db0::o_base, 0, false>; + friend super_t; + + public: + using Element = o_tuple_item::Element; + + class const_iterator + { + public: + const_iterator() = default; + + const o_tuple_item &operator*() const; + const o_tuple_item *operator->() const; + const_iterator &operator++(); + bool operator==(const const_iterator &other) const; + bool operator!=(const const_iterator &other) const; + + private: + friend class o_tuple; + + explicit const_iterator(const o_tuple_item *item); + + const o_tuple_item *m_item = nullptr; + }; + + explicit o_tuple(const std::vector &elements); + + std::size_t size() const; + std::size_t elementsByteSize() const; + bool empty() const; + const o_tuple_item &item(std::size_t index) const; + const_iterator begin() const; + const_iterator end() const; + std::size_t 
sizeOf() const; + + static std::size_t measure(const std::vector &elements); + + class Builder + { + public: + Builder(void *buf, std::uint32_t count, std::uint32_t elementsByteSize); + Builder(o_tuple &tuple, std::uint32_t count, std::uint32_t elementsByteSize); + + void add(const Element &element); + o_tuple &finish(); + + static std::size_t measure(std::uint32_t count, std::uint32_t elementsByteSize); + static std::size_t measureGrowth( + std::uint32_t count, std::uint32_t elementsByteSize, std::uint32_t addedElementByteSize + ); + + private: + o_tuple &m_tuple; + db0::Foundation::Arranger m_arranger; + std::uint32_t m_expectedCount = 0; + std::uint32_t m_addedCount = 0; + }; + + template static std::size_t safeSizeOf(BufT buf) + { + auto start = buf; + auto cursor = buf; + cursor += super_t::baseSize(); + + cursor += db0::packed_int32::safeSizeOf(cursor); + if constexpr (compact) { + auto count = db0::packed_int32::__const_ref(start + super_t::baseSize()).value(); + for (std::uint32_t i = 0; i < count; ++i) { + cursor += o_tuple_item::safeSizeOf(cursor); + } + } else { + auto elementsByteSizeAt = cursor; + cursor += db0::packed_int32::safeSizeOf(cursor); + auto elementsByteSize = db0::packed_int32::__const_ref(elementsByteSizeAt).value(); + cursor += elementsByteSize; + } + return cursor - start; + } + + protected: + o_tuple() = default; + + private: + const db0::packed_int32 &count() const; + const std::byte *beginOfItems() const; + static std::size_t measureElements(const std::vector &elements); + }; +DB0_PACKED_END + + // Used for embedded buckets where the containing object already stores the byte size. 
+ using o_compact_tuple = o_tuple; + +} diff --git a/src/dbzero/object_model/value/StorageClass.cpp b/src/dbzero/object_model/value/StorageClass.cpp index 4c8e1932..c9a2d109 100644 --- a/src/dbzero/object_model/value/StorageClass.cpp +++ b/src/dbzero/object_model/value/StorageClass.cpp @@ -165,6 +165,7 @@ namespace std case StorageClass::PACK_2: return os << "PACK_2"; case StorageClass::OBJECT_WEAK_REF: return os << "OBJECT_WEAK_REF"; case StorageClass::OBJECT_LONG_WEAK_REF: return os << "OBJECT_LONG_WEAK_REF"; + case StorageClass::PACKED_INT32: return os << "PACKED_INT32"; case StorageClass::INVALID: return os << "INVALID"; default: return os << "ERROR!"; } @@ -204,4 +205,4 @@ namespace db0 return getStorageClass(pre_storage_class); } -} \ No newline at end of file +} diff --git a/src/dbzero/object_model/value/StorageClass.hpp b/src/dbzero/object_model/value/StorageClass.hpp index 920309fc..7effdb45 100644 --- a/src/dbzero/object_model/value/StorageClass.hpp +++ b/src/dbzero/object_model/value/StorageClass.hpp @@ -116,10 +116,12 @@ namespace db0::object_model DELETED = static_cast(PreStorageClass::DELETED), CALLABLE = static_cast(PreStorageClass::CALLABLE), DB0_WEAK_SET = static_cast(PreStorageClass::DB0_WEAK_SET), + // Embedded immutable integer encoded with packed-int storage. 
+ PACKED_INT32 = std::numeric_limits::max() - 2, // weak reference to other (Memo) instance from a foreign prefix - OBJECT_LONG_WEAK_REF = static_cast(PreStorageClass::COUNT), + OBJECT_LONG_WEAK_REF = std::numeric_limits::max() - 1, // COUNT used to determine size of the StorageClass associated arrays - COUNT = static_cast(PreStorageClass::COUNT) + 1, + COUNT = std::numeric_limits::max(), // invalid / reserved value, never used in objects INVALID = std::numeric_limits::max() }; diff --git a/tests/unit_tests/EmbeddedDictTest.cpp b/tests/unit_tests/EmbeddedDictTest.cpp new file mode 100644 index 00000000..b5ee5174 --- /dev/null +++ b/tests/unit_tests/EmbeddedDictTest.cpp @@ -0,0 +1,464 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +namespace tests +{ + + using namespace db0; + using namespace db0::object_model; + using namespace db0::python; + + class EmbeddedDictTest: public MemspaceTestBase + { + }; + + static void throwDecodeError() + { + throw std::runtime_error("decode error"); + } + + static std::int64_t asInt64(const o_tuple_item &item) + { + if (item.itemKind() == StorageClass::PACKED_INT32) { + return static_cast(item.packedIntPayload().value()); + } + return item.intPayload().value(); + } + + static bool asBool(const o_tuple_item &item) + { + return item.boolPayload().value(); + } + + static std::string asString(const o_tuple_item &item) + { + return item.stringPayload().toString(); + } + + static std::vector asBytes(const o_tuple_item &item) + { + const auto &payload = item.bytesPayload(); + return { payload.begin(), payload.end() }; + } + + static std::string bytesKey(const std::byte *data, std::size_t size) + { + std::ostringstream key; + for (std::size_t i = 0; i < size; ++i) { + key << std::hex << 
std::setw(2) << std::setfill('0') << static_cast(data[i]); + } + return key.str(); + } + + static std::string elementKey(const o_dict::Element &element) + { + std::ostringstream key; + key << static_cast(element.m_kind) << ':'; + switch (element.m_kind) { + case StorageClass::NONE: + key << "none"; + break; + case StorageClass::BOOLEAN: + key << element.boolValue(); + break; + case StorageClass::INT64: + case StorageClass::PACKED_INT32: + key << element.intValue(); + break; + case StorageClass::FP_NUMERIC64: + key << std::setprecision(17) << element.doubleValue(); + break; + case StorageClass::STRING_REF: + key << element.stringValue(); + break; + case StorageClass::DB0_BYTES: + key << bytesKey(element.bytesData(), element.bytesSize()); + break; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + key << element.uint64Value(); + break; + default: + throw std::runtime_error("unsupported test dict element kind"); + } + return key.str(); + } + + static std::string itemKey(const o_tuple_item &item) + { + std::ostringstream key; + key << static_cast(item.itemKind()) << ':'; + switch (item.itemKind()) { + case StorageClass::NONE: + key << "none"; + break; + case StorageClass::BOOLEAN: + key << item.boolPayload().value(); + break; + case StorageClass::INT64: + key << item.intPayload().value(); + break; + case StorageClass::PACKED_INT32: + key << item.packedIntPayload().value(); + break; + case StorageClass::FP_NUMERIC64: + key << std::setprecision(17) << item.doublePayload().value(); + break; + case StorageClass::STRING_REF: + key << item.stringPayload().toString(); + break; + case StorageClass::DB0_BYTES: + key << bytesKey(item.bytesPayload().begin(), item.bytesPayload().size()); + break; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case 
StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + key << item.uint64Payload().value(); + break; + default: + throw std::runtime_error("unsupported test dict item kind"); + } + return key.str(); + } + + static void assertItemEqualsElement(const o_tuple_item &item, const o_dict::Element &element) + { + ASSERT_EQ(itemKey(item), elementKey(element)); + } + + static std::size_t testHashIndexCapacity(std::size_t count) + { + if (count == 0) { + return 0; + } + + std::size_t capacity = 1; + while (capacity < count * 2) { + capacity <<= 1; + } + return capacity; + } + + TEST_F( EmbeddedDictTest , testDictStoresSimpleKeyValuePairs ) + { + auto memspace = getMemspace(); + const std::vector bytes = { std::byte{0x01}, std::byte{0x02}, std::byte{0x03} }; + o_dict::ElementMap elements; + elements[o_dict::Element::integer(42)] = o_dict::Element::string("answer"); + elements[o_dict::Element::string("flag")] = o_dict::Element::boolean(true); + elements[o_dict::Element::bytes(bytes)] = o_dict::Element::integer(-7); + + v_object dict(memspace, elements); + + ASSERT_EQ(dict->size(), 3u); + ASSERT_FALSE(dict->empty()); + ASSERT_TRUE(dict->contains(o_dict::Element::integer(42))); + ASSERT_TRUE(dict->contains(o_dict::Element::string("flag"))); + ASSERT_TRUE(dict->contains(o_dict::Element::bytes(bytes))); + ASSERT_FALSE(dict->contains(o_dict::Element::string("missing"))); + ASSERT_EQ(asString(*dict->get(o_dict::Element::integer(42))), "answer"); + ASSERT_TRUE(asBool(*dict->get(o_dict::Element::string("flag")))); + ASSERT_EQ(asInt64(*dict->get(o_dict::Element::bytes(bytes))), -7); + } + + TEST_F( EmbeddedDictTest , testDictMapInputCollapsesDuplicateKeys ) + { + auto memspace = getMemspace(); + o_dict::ElementMap elements; + elements[o_dict::Element::integer(7)] = o_dict::Element::string("first"); + elements[o_dict::Element::integer(7)] = o_dict::Element::string("second"); + + v_object dict(memspace, elements); + + ASSERT_EQ(dict->size(), 1u); + 
ASSERT_EQ(asString(*dict->get(o_dict::Element::integer(7))), "second"); + } + + TEST_F( EmbeddedDictTest , testDictMeasureSizeOfAndSafeSizeOf ) + { + auto memspace = getMemspace(); + const std::vector bytes = { std::byte{0xaa}, std::byte{0xbb} }; + o_dict::ElementMap elements; + elements[o_dict::Element::string("alpha")] = o_dict::Element::string("variable length value"); + elements[o_dict::Element::integer(11)] = o_dict::Element::bytes(bytes); + elements[o_dict::Element::date(20260519)] = o_dict::Element::decimal(123456789); + + v_object dict(memspace, elements); + auto *begin = reinterpret_cast(dict.getData()); + auto measured = o_dict::measure(elements); + + ASSERT_EQ(dict->size(), 3u); + ASSERT_EQ(dict->sizeOf(), measured); + ASSERT_EQ(o_dict::safeSizeOf(begin), measured); + ASSERT_EQ(o_dict::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + measured)), measured); + ASSERT_EQ(asString(*dict->get(o_dict::Element::string("alpha"))), "variable length value"); + ASSERT_EQ(asBytes(*dict->get(o_dict::Element::integer(11))), bytes); + ASSERT_EQ(dict->get(o_dict::Element::date(20260519))->uint64Payload().value(), 123456789u); + } + + TEST_F( EmbeddedDictTest , testSafeSizeOfRejectsTruncatedDict ) + { + auto memspace = getMemspace(); + o_dict::ElementMap elements; + elements[o_dict::Element::string("first")] = o_dict::Element::string("value"); + elements[o_dict::Element::integer(99)] = o_dict::Element::integer(100); + + v_object dict(memspace, elements); + auto *begin = reinterpret_cast(dict.getData()); + auto size = dict->sizeOf(); + + ASSERT_EQ(o_dict::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + size)), size); + for (std::size_t truncatedSize = 0; truncatedSize < size; ++truncatedSize) { + ASSERT_THROW( + o_dict::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + truncatedSize)), + std::runtime_error + ) << "truncated size: " << truncatedSize; + } + } + + TEST_F( EmbeddedDictTest , 
testDictCollisionBucketsUseParallelKeyValueTuples ) + { + auto memspace = getMemspace(); + constexpr std::size_t collisionCount = 16; + auto capacity = testHashIndexCapacity(collisionCount); + o_dict::ElementHash hash; + auto collisionSlot = hash(o_dict::Element::integer(17)) % capacity; + + o_dict::ElementMap elements; + for (std::int64_t candidate = 0; elements.size() < collisionCount; ++candidate) { + auto key = o_dict::Element::integer(candidate); + if (hash(key) % capacity == collisionSlot) { + elements[key] = o_dict::Element::integer(candidate * 10); + } + } + + v_object dict(memspace, elements); + + ASSERT_EQ(dict->size(), collisionCount); + ASSERT_EQ(dict->sizeOf(), o_dict::measure(elements)); + ASSERT_EQ(o_dict::safeSizeOf(reinterpret_cast(dict.getData())), dict->sizeOf()); + for (const auto &[key, value]: elements) { + ASSERT_TRUE(dict->contains(key)); + ASSERT_EQ(asInt64(*dict->get(key)), value.intValue()); + } + ASSERT_FALSE(dict->contains(o_dict::Element::integer(999999))); + } + + TEST_F( EmbeddedDictTest , testDictIterationVisitsStoredPairs ) + { + auto memspace = getMemspace(); + o_dict::ElementMap elements; + elements[o_dict::Element::integer(1)] = o_dict::Element::string("one"); + elements[o_dict::Element::integer(2)] = o_dict::Element::string("two"); + elements[o_dict::Element::integer(3)] = o_dict::Element::string("three"); + + v_object dict(memspace, elements); + + std::unordered_set keys; + std::size_t count = 0; + for (auto it = dict->begin(); it != dict->end(); ++it) { + keys.insert(asInt64(it->key())); + ASSERT_NE(it->value().itemKind(), StorageClass::UNDEFINED); + ++count; + } + + ASSERT_EQ(count, elements.size()); + ASSERT_EQ(keys.size(), elements.size()); + ASSERT_TRUE(keys.find(1) != keys.end()); + ASSERT_TRUE(keys.find(2) != keys.end()); + ASSERT_TRUE(keys.find(3) != keys.end()); + } + + TEST_F( EmbeddedDictTest , testComplexDictContainsAndIterationWithMixedTypes ) + { + auto memspace = getMemspace(); + o_dict::ElementMap elements; + 
std::vector<std::string> keyStrings; + std::vector<std::string> valueStrings; + std::vector<std::vector<std::byte>> keyBytes; + std::vector<std::vector<std::byte>> valueBytes; + keyStrings.reserve(100); + valueStrings.reserve(80); + keyBytes.reserve(40); + valueBytes.reserve(40); + + elements[o_dict::Element::none()] = o_dict::Element::string("none-value"); + elements[o_dict::Element::boolean(false)] = o_dict::Element::integer(-1); + elements[o_dict::Element::boolean(true)] = o_dict::Element::integer(1); + + for (std::int64_t i = 0; i < 60; ++i) { + keyStrings.push_back("complex-key-int-value-" + std::to_string(i)); + elements[o_dict::Element::integer((i * 7919) - 50000)] = o_dict::Element::string(keyStrings.back()); + } + for (std::size_t i = 0; i < 40; ++i) { + keyStrings.push_back("complex-dict-key-" + std::to_string(i) + "-" + std::string(i % 13, 'k')); + valueBytes.push_back({ + static_cast<std::byte>(i & 0xff), + static_cast<std::byte>((i * 3) & 0xff), + static_cast<std::byte>((i * 5) & 0xff), + static_cast<std::byte>((i * 7) & 0xff) + }); + elements[o_dict::Element::string(keyStrings.back())] = o_dict::Element::bytes(valueBytes.back()); + } + for (std::size_t i = 0; i < 24; ++i) { + keyBytes.push_back({ + static_cast<std::byte>((i + 1) & 0xff), + static_cast<std::byte>((i * 11) & 0xff), + static_cast<std::byte>((i * 17) & 0xff) + }); + valueStrings.push_back("bytes-key-value-" + std::to_string(i)); + elements[o_dict::Element::bytes(keyBytes.back())] = o_dict::Element::string(valueStrings.back()); + } + for (std::size_t i = 0; i < 16; ++i) { + elements[o_dict::Element::floating(static_cast<double>(i) + 0.25)] = + o_dict::Element::floating(static_cast<double>(i) + 0.75); + } + for (std::uint64_t i = 0; i < 4; ++i) { + elements[o_dict::Element::timestamp(100000 + i)] = o_dict::Element::date(200000 + i); + elements[o_dict::Element::date(300000 + i)] = o_dict::Element::datetime(400000 + i); + elements[o_dict::Element::datetime(500000 + i)] = o_dict::Element::datetimeTz(600000 + i); + elements[o_dict::Element::time(700000 + i)] = o_dict::Element::timeTz(800000 + i); + elements[o_dict::Element::decimal(900000 + i)] = 
o_dict::Element::integer(static_cast(i)); + } + + constexpr std::size_t forcedCollisionCount = 24; + o_dict::ElementHash hash; + auto finalCapacity = testHashIndexCapacity(elements.size() + forcedCollisionCount); + auto collisionSlot = hash(o_dict::Element::integer(17)) % finalCapacity; + std::size_t foundCollisions = 0; + for (std::int64_t candidate = 1000000; foundCollisions < forcedCollisionCount; ++candidate) { + auto key = o_dict::Element::integer(candidate); + if (hash(key) % finalCapacity != collisionSlot) { + continue; + } + auto inserted = elements.emplace(key, o_dict::Element::integer(candidate * 2)); + if (inserted.second) { + ++foundCollisions; + } + } + ASSERT_GE(elements.size(), 100u); + ASSERT_EQ(testHashIndexCapacity(elements.size()), finalCapacity); + + v_object dict(memspace, elements); + + ASSERT_EQ(dict->size(), elements.size()); + ASSERT_EQ(dict->sizeOf(), o_dict::measure(elements)); + ASSERT_EQ(o_dict::safeSizeOf(reinterpret_cast(dict.getData())), dict->sizeOf()); + + std::unordered_set expectedPairKeys; + expectedPairKeys.reserve(elements.size()); + for (const auto &[key, value]: elements) { + expectedPairKeys.insert(elementKey(key) + "=>" + elementKey(value)); + ASSERT_TRUE(dict->contains(key)) << elementKey(key); + ASSERT_NE(dict->get(key), nullptr) << elementKey(key); + assertItemEqualsElement(*dict->get(key), value); + } + ASSERT_FALSE(dict->contains(o_dict::Element::integer(999999999))); + ASSERT_FALSE(dict->contains(o_dict::Element::string("complex-dict-missing"))); + + std::unordered_set iteratedPairKeys; + iteratedPairKeys.reserve(dict->size()); + std::size_t iteratedCount = 0; + for (auto it = dict->begin(); it != dict->end(); ++it) { + auto pairKey = itemKey(it->key()) + "=>" + itemKey(it->value()); + ASSERT_TRUE(expectedPairKeys.find(pairKey) != expectedPairKeys.end()) << pairKey; + ASSERT_TRUE(iteratedPairKeys.insert(pairKey).second) << pairKey; + ++iteratedCount; + } + + ASSERT_EQ(iteratedCount, expectedPairKeys.size()); + 
ASSERT_EQ(iteratedPairKeys, expectedPairKeys); + } + + TEST_F( EmbeddedDictTest , testPyDictConstructsFromPythonDict ) + { + Py_Initialize(); + auto memspace = getMemspace(); + auto pyDict = Py_OWN(PyDict_New()); + ASSERT_NE(pyDict.get(), nullptr); + ASSERT_EQ(PySafeDict_SetItem( + *pyDict, Py_OWN(PyLong_FromLongLong(42)), Py_OWN(PyUnicode_FromString("python-dict")) + ), 0); + ASSERT_EQ(PySafeDict_SetItem( + *pyDict, Py_OWN(PyUnicode_FromString("flag")), Py_OWN(Py_NewRef(Py_True)) + ), 0); + ASSERT_EQ(PySafeDict_SetItem( + *pyDict, Py_OWN(PyBytes_FromStringAndSize("\x01\x02", 2)), Py_OWN(PyLong_FromLongLong(-7)) + ), 0); + + v_object dict(memspace, *pyDict); + + ASSERT_EQ(o_py_dict::measure(*pyDict), dict->sizeOf()); + ASSERT_EQ(dict->size(), 3u); + ASSERT_EQ(asString(*dict->get(o_dict::Element::integer(42))), "python-dict"); + ASSERT_TRUE(asBool(*dict->get(o_dict::Element::string("flag")))); + ASSERT_EQ(asInt64(*dict->get(o_dict::Element::bytes( + std::vector{ std::byte{0x01}, std::byte{0x02} } + ))), -7); + ASSERT_FALSE(dict->contains(o_dict::Element::integer(99))); + } + + TEST_F( EmbeddedDictTest , testPyDictConstructsFromDateTimeAndDecimal ) + { + Py_Initialize(); + db0::python::init_datetime(); + if (!PyDateTimeAPI) { + PyDateTime_IMPORT; + } + ASSERT_NE(PyDateTimeAPI, nullptr); + auto memspace = getMemspace(); + auto pyDict = Py_OWN(PyDict_New()); + auto decimalKey = Py_OWN(PyObject_CallFunction(db0::python::getDecimalClass(), "s", "123.45")); + auto decimalValue = Py_OWN(PyObject_CallFunction(db0::python::getDecimalClass(), "s", "987.65")); + ASSERT_NE(decimalKey.get(), nullptr); + ASSERT_NE(decimalValue.get(), nullptr); + ASSERT_EQ(PySafeDict_SetItem( + *pyDict, Py_OWN(PyDate_FromDate(2026, 5, 19)), Py_OWN(PyLong_FromLongLong(123)) + ), 0); + ASSERT_EQ(PySafeDict_SetItem( + *pyDict, Py_OWN(Py_NewRef(*decimalKey)), Py_OWN(Py_NewRef(*decimalValue)) + ), 0); + + v_object dict(memspace, *pyDict); + auto dateKey = 
db0::python::pyDateToUint64(Py_OWN(PyDate_FromDate(2026, 5, 19)).get()); + auto decimalKeyValue = db0::python::pyDecimalToUint64(*decimalKey); + auto decimalStoredValue = db0::python::pyDecimalToUint64(*decimalValue); + + ASSERT_EQ(dict->size(), 2u); + ASSERT_EQ(asInt64(*dict->get(o_dict::Element::date(dateKey))), 123); + ASSERT_EQ(dict->get(o_dict::Element::decimal(decimalKeyValue))->uint64Payload().value(), decimalStoredValue); + ASSERT_EQ(o_py_dict::measure(*pyDict), dict->sizeOf()); + } + +} diff --git a/tests/unit_tests/EmbeddedObjectTest.cpp b/tests/unit_tests/EmbeddedObjectTest.cpp new file mode 100644 index 00000000..eee26230 --- /dev/null +++ b/tests/unit_tests/EmbeddedObjectTest.cpp @@ -0,0 +1,297 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace tests +{ + + using namespace db0; + using namespace db0::object_model; + + class EmbeddedObjectTest: public MemspaceTestBase + { + }; + + static void throwDecodeError() + { + throw std::runtime_error("decode error"); + } + + static ImmutableObjectInitializer &makeInitializer(ObjectInitializerManager &manager, int &object) + { + manager.addInitializerFor(object, std::shared_ptr()); + auto *initializer = dynamic_cast(manager.findInitializer(object)); + if (!initializer) { + throw std::runtime_error("immutable initializer not found"); + } + return *initializer; + } + + TEST_F( EmbeddedObjectTest , testEmbeddedObjectStoresInitializerPlannedFixedTables ) + { + auto memspace = getMemspace(); + int sourceObject = 0; + ObjectInitializerManager manager; + auto &initializer = makeInitializer(manager, sourceObject); + initializer.set({0, 0}, StorageClass::INT64, Value(42)); + initializer.set({1, 0}, StorageClass::PACK_2, Value((Value::FALSE << 0) | (Value::TRUE << 2))); + initializer.set({100, 0}, StorageClass::DATE, Value(20260519)); + 
initializer.set({200, 0}, StorageClass::DECIMAL, Value(123456789)); + + v_object object(memspace, 77u, initializer); + + ASSERT_EQ(object->getClassRef(), 77u); + ASSERT_EQ(object->pos_vt().offset(), 0u); + ASSERT_EQ(object->pos_vt().size(), 2u); + auto intValue = object->fixedValue(0); + ASSERT_TRUE(intValue.has_value()); + ASSERT_EQ(intValue->m_kind, StorageClass::INT64); + ASSERT_EQ(intValue->m_value, 42u); + + auto falseValue = object->fixedValue(1, 0); + ASSERT_TRUE(falseValue.has_value()); + ASSERT_EQ(falseValue->m_kind, StorageClass::BOOLEAN); + ASSERT_EQ(falseValue->m_value, 0u); + auto trueValue = object->fixedValue(1, 1); + ASSERT_TRUE(trueValue.has_value()); + ASSERT_EQ(trueValue->m_kind, StorageClass::BOOLEAN); + ASSERT_EQ(trueValue->m_value, 1u); + + auto dateValue = object->fixedValue(100); + ASSERT_TRUE(dateValue.has_value()); + ASSERT_EQ(dateValue->m_kind, StorageClass::DATE); + ASSERT_EQ(dateValue->m_value, 20260519u); + auto decimalValue = object->fixedValue(200); + ASSERT_TRUE(decimalValue.has_value()); + ASSERT_EQ(decimalValue->m_kind, StorageClass::DECIMAL); + ASSERT_EQ(decimalValue->m_value, 123456789u); + + std::pair indexedValue; + ASSERT_TRUE(object->index_vt().find(100, indexedValue)); + ASSERT_EQ(indexedValue.first, StorageClass::DATE); + ASSERT_EQ(indexedValue.second, Value(20260519)); + ASSERT_TRUE(object->index_vt().find(200, indexedValue)); + ASSERT_EQ(indexedValue.first, StorageClass::DECIMAL); + ASSERT_EQ(indexedValue.second, Value(123456789)); + + ASSERT_FALSE(object->fixedValue(999).has_value()); + } + + TEST_F( EmbeddedObjectTest , testEmbeddedObjectStoresVariableFieldsInDictMap ) + { + Py_Initialize(); + + auto memspace = getMemspace(); + int sourceObject = 0; + ObjectInitializerManager manager; + auto &initializer = makeInitializer(manager, sourceObject); + auto pyString = Py_OWN(PyUnicode_FromString("variable string")); + const char rawBytes[] = { 0x01, 0x02, 0x03 }; + auto pyBytes = Py_OWN(PyBytes_FromStringAndSize(rawBytes, 
sizeof(rawBytes))); + initializer.setObject( + {300, 0}, StorageClass::STRING_REF, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyString.get()) + ); + initializer.setObject( + {301, 0}, StorageClass::DB0_BYTES, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyBytes.get()) + ); + + v_object object(memspace, 88u, initializer); + + ASSERT_EQ(object->getClassRef(), 88u); + auto *stringValue = object->variableValue(300); + ASSERT_NE(stringValue, nullptr); + ASSERT_EQ(stringValue->itemKind(), StorageClass::STRING_REF); + ASSERT_EQ(stringValue->stringPayload().toString(), "variable string"); + auto *bytesValue = object->variableValue(301); + ASSERT_NE(bytesValue, nullptr); + ASSERT_EQ(bytesValue->itemKind(), StorageClass::DB0_BYTES); + ASSERT_EQ(bytesValue->bytesPayload().size(), 3u); + ASSERT_EQ(bytesValue->bytesPayload().begin()[0], std::byte{0x01}); + ASSERT_EQ(bytesValue->bytesPayload().begin()[1], std::byte{0x02}); + ASSERT_EQ(bytesValue->bytesPayload().begin()[2], std::byte{0x03}); + ASSERT_EQ(object->variableValue(999), nullptr); + } + + TEST_F( EmbeddedObjectTest , testEmbeddedObjectStoresNestedTuplePayload ) + { + Py_Initialize(); + + auto memspace = getMemspace(); + int sourceObject = 0; + ObjectInitializerManager manager; + auto &initializer = makeInitializer(manager, sourceObject); + auto pyList = Py_OWN(PyList_New(2)); + db0::python::PySafeList_SetItem(pyList.get(), 0, Py_OWN(PyLong_FromLong(700))); + db0::python::PySafeList_SetItem(pyList.get(), 1, Py_OWN(PyUnicode_FromString("value"))); + initializer.setObject( + {400, 0}, StorageClass::DB0_LIST, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyList.get()) + ); + + v_object object(memspace, 12u, initializer); + + auto *tupleValue = object->variableValue(400); + ASSERT_NE(tupleValue, nullptr); + ASSERT_EQ(tupleValue->itemKind(), StorageClass::DB0_TUPLE); + const auto &payload = tupleValue->embeddedPayload(); + const auto &tuple = o_tuple<>::__const_ref(payload.begin()); + 
ASSERT_EQ(tuple.size(), 2u); + ASSERT_EQ(tuple.item(0).packedIntPayload().value(), 700u); + ASSERT_EQ(tuple.item(1).stringPayload().toString(), "value"); + } + + TEST_F( EmbeddedObjectTest , testComplexEmbeddedObjectStoresMultipleNestedCollections ) + { + Py_Initialize(); + + auto memspace = getMemspace(); + int sourceObject = 0; + ObjectInitializerManager manager; + auto &initializer = makeInitializer(manager, sourceObject); + initializer.set({0, 0}, StorageClass::INT64, Value(123)); + initializer.set({20, 0}, StorageClass::DATE, Value(20260519)); + + auto pyList = Py_OWN(PyList_New(3)); + db0::python::PySafeList_SetItem(pyList.get(), 0, Py_OWN(PyLong_FromLong(7))); + db0::python::PySafeList_SetItem(pyList.get(), 1, Py_OWN(PyUnicode_FromString("seven"))); + db0::python::PySafeList_SetItem(pyList.get(), 2, Py_OWN(PyBool_FromLong(1))); + initializer.setObject( + {100, 0}, StorageClass::DB0_LIST, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyList.get()) + ); + + auto pySet = Py_OWN(PySet_New(nullptr)); + db0::python::PySafeSet_Add(pySet.get(), Py_OWN(PyLong_FromLong(10))); + db0::python::PySafeSet_Add(pySet.get(), Py_OWN(PyUnicode_FromString("ten"))); + initializer.setObject( + {101, 0}, StorageClass::DB0_SET, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pySet.get()) + ); + + auto pyDict = Py_OWN(PyDict_New()); + db0::python::PySafeDict_SetItem( + pyDict.get(), Py_OWN(PyUnicode_FromString("name")), Py_OWN(PyUnicode_FromString("dbzero")) + ); + db0::python::PySafeDict_SetItem( + pyDict.get(), Py_OWN(PyUnicode_FromString("count")), Py_OWN(PyLong_FromLong(3)) + ); + initializer.setObject( + {102, 0}, StorageClass::DB0_DICT, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyDict.get()) + ); + + auto measured = o_embedded_object::measure(144u, initializer); + v_object object(memspace, 144u, initializer); + + ASSERT_EQ(object->sizeOf(), measured); + ASSERT_EQ(object->getClassRef(), 144u); + auto fixedValue = object->fixedValue(0); + 
ASSERT_TRUE(fixedValue.has_value()); + ASSERT_EQ(fixedValue->m_kind, StorageClass::INT64); + ASSERT_EQ(fixedValue->m_value, 123u); + auto dateValue = object->fixedValue(20); + ASSERT_TRUE(dateValue.has_value()); + ASSERT_EQ(dateValue->m_kind, StorageClass::DATE); + ASSERT_EQ(dateValue->m_value, 20260519u); + + auto *listValue = object->variableValue(100); + ASSERT_NE(listValue, nullptr); + ASSERT_EQ(listValue->itemKind(), StorageClass::DB0_TUPLE); + const auto &embeddedTuple = o_tuple<>::__const_ref(listValue->embeddedPayload().begin()); + ASSERT_EQ(embeddedTuple.size(), 3u); + ASSERT_EQ(embeddedTuple.item(0).packedIntPayload().value(), 7u); + ASSERT_EQ(embeddedTuple.item(1).stringPayload().toString(), "seven"); + ASSERT_EQ(embeddedTuple.item(2).boolPayload().value(), true); + + auto *setValue = object->variableValue(101); + ASSERT_NE(setValue, nullptr); + ASSERT_EQ(setValue->itemKind(), StorageClass::DB0_SET); + const auto &embeddedSet = o_set::__const_ref(setValue->embeddedPayload().begin()); + ASSERT_EQ(embeddedSet.size(), 2u); + ASSERT_TRUE(embeddedSet.contains(o_set::Element::integer(10))); + ASSERT_TRUE(embeddedSet.contains(o_set::Element::string("ten"))); + + auto *dictValue = object->variableValue(102); + ASSERT_NE(dictValue, nullptr); + ASSERT_EQ(dictValue->itemKind(), StorageClass::DB0_DICT); + const auto &embeddedDict = o_dict::__const_ref(dictValue->embeddedPayload().begin()); + ASSERT_EQ(embeddedDict.size(), 2u); + auto *nameValue = embeddedDict.get(o_dict::Element::string("name")); + ASSERT_NE(nameValue, nullptr); + ASSERT_EQ(nameValue->stringPayload().toString(), "dbzero"); + auto *countValue = embeddedDict.get(o_dict::Element::string("count")); + ASSERT_NE(countValue, nullptr); + ASSERT_EQ(countValue->packedIntPayload().value(), 3u); + } + + TEST_F( EmbeddedObjectTest , testEmbeddedObjectMeasureSizeOfAndSafeSizeOf ) + { + Py_Initialize(); + + auto memspace = getMemspace(); + int sourceObject = 0; + ObjectInitializerManager manager; + auto 
&initializer = makeInitializer(manager, sourceObject); + initializer.set({4, 0}, StorageClass::INT64, Value(5)); + initializer.set({50, 0}, StorageClass::TIME, Value(600)); + auto pyString = Py_OWN(PyUnicode_FromString("payload")); + initializer.setObject( + {100, 0}, StorageClass::STRING_REF, Value(0), + ImmutableObjectInitializer::ObjectSharedPtr(pyString.get()) + ); + + v_object object(memspace, 99u, initializer); + auto *begin = reinterpret_cast(object.getData()); + auto measured = o_embedded_object::measure(99u, initializer); + + ASSERT_EQ(object->sizeOf(), measured); + ASSERT_EQ(o_embedded_object::safeSizeOf(begin), measured); + ASSERT_EQ(o_embedded_object::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + measured)), measured); + for (std::size_t truncatedSize = 0; truncatedSize < measured; ++truncatedSize) { + ASSERT_THROW( + o_embedded_object::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + truncatedSize)), + std::runtime_error + ) << "truncated size: " << truncatedSize; + } + } + + TEST_F( EmbeddedObjectTest , testEmbeddedObjectStorageClassValuesAreStable ) + { + ASSERT_EQ(static_cast(StorageClass::UNDEFINED), 0u); + ASSERT_EQ(static_cast(StorageClass::NONE), 1u); + ASSERT_EQ(static_cast(StorageClass::STRING_REF), 2u); + ASSERT_EQ(static_cast(StorageClass::INT64), 4u); + ASSERT_EQ(static_cast(StorageClass::PTIME64), 5u); + ASSERT_EQ(static_cast(StorageClass::FP_NUMERIC64), 6u); + ASSERT_EQ(static_cast(StorageClass::DATE), 7u); + ASSERT_EQ(static_cast(StorageClass::DATETIME), 8u); + ASSERT_EQ(static_cast(StorageClass::DATETIME_TZ), 9u); + ASSERT_EQ(static_cast(StorageClass::TIME), 10u); + ASSERT_EQ(static_cast(StorageClass::TIME_TZ), 11u); + ASSERT_EQ(static_cast(StorageClass::DECIMAL), 12u); + ASSERT_EQ(static_cast(StorageClass::OBJECT_REF), 13u); + ASSERT_EQ(static_cast(StorageClass::DB0_DICT), 15u); + ASSERT_EQ(static_cast(StorageClass::DB0_SET), 16u); + ASSERT_EQ(static_cast(StorageClass::DB0_TUPLE), 17u); + 
ASSERT_EQ(static_cast(StorageClass::DB0_BYTES), 23u); + ASSERT_EQ(static_cast(StorageClass::BOOLEAN), 28u); + ASSERT_EQ(static_cast(StorageClass::PACK_2), 29u); + ASSERT_EQ(static_cast(StorageClass::PACKED_INT32), 253u); + } + +} diff --git a/tests/unit_tests/EmbeddedSetTest.cpp b/tests/unit_tests/EmbeddedSetTest.cpp new file mode 100644 index 00000000..75f4d5b7 --- /dev/null +++ b/tests/unit_tests/EmbeddedSetTest.cpp @@ -0,0 +1,580 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +namespace tests +{ + + using namespace db0; + using namespace db0::object_model; + using namespace db0::python; + + class EmbeddedSetTest: public MemspaceTestBase + { + }; + + static void throwDecodeError() + { + throw std::runtime_error("decode error"); + } + + static std::uint32_t testHashBytes(const void *data, std::size_t size, std::uint32_t seed) + { + static const std::byte empty = std::byte{0}; + auto hash = db0::murmurhash64A(size == 0 ? &empty : data, size, seed); + return static_cast(hash ^ (hash >> 32)); + } + + static std::uint32_t testElementHash(const o_set::Element &element) + { + auto seedKind = element.m_kind == StorageClass::PACKED_INT32 ? 
StorageClass::INT64 : element.m_kind; + auto seed = 0x9e3779b9U ^ static_cast(seedKind); + switch (element.m_kind) { + case StorageClass::NONE: + return testHashBytes(nullptr, 0, seed); + case StorageClass::BOOLEAN: { + auto value = element.boolValue(); + return testHashBytes(&value, sizeof(value), seed); + } + case StorageClass::INT64: { + auto value = element.intValue(); + return testHashBytes(&value, sizeof(value), seed); + } + case StorageClass::PACKED_INT32: { + auto value = element.intValue(); + return testHashBytes(&value, sizeof(value), seed); + } + case StorageClass::FP_NUMERIC64: { + auto value = element.doubleValue(); + return testHashBytes(&value, sizeof(value), seed); + } + case StorageClass::STRING_REF: { + auto value = element.m_payload.m_string_value; + return testHashBytes(value.data(), value.size(), seed); + } + case StorageClass::DB0_BYTES: + return testHashBytes(element.bytesData(), element.bytesSize(), seed); + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: { + auto value = element.uint64Value(); + return testHashBytes(&value, sizeof(value), seed); + } + default: + throw std::runtime_error("unsupported test set item storage class"); + } + } + + static std::size_t testHashIndexCapacity(std::size_t count) + { + if (count == 0) { + return 0; + } + + std::size_t capacity = 1; + while (capacity < count * 2) { + capacity <<= 1; + } + return capacity; + } + + static std::string bytesKey(const std::byte *data, std::size_t size) + { + std::ostringstream key; + for (std::size_t i = 0; i < size; ++i) { + key << std::hex << std::setw(2) << std::setfill('0') << static_cast(data[i]); + } + return key.str(); + } + + static std::string elementKey(const o_set::Element &element) + { + std::ostringstream key; + key << static_cast(element.m_kind) << ':'; + switch (element.m_kind) { + case 
StorageClass::NONE: + key << "none"; + break; + case StorageClass::BOOLEAN: + key << element.boolValue(); + break; + case StorageClass::INT64: + case StorageClass::PACKED_INT32: + key << element.intValue(); + break; + case StorageClass::FP_NUMERIC64: + key << std::setprecision(17) << element.doubleValue(); + break; + case StorageClass::STRING_REF: + key << element.stringValue(); + break; + case StorageClass::DB0_BYTES: + key << bytesKey(element.bytesData(), element.bytesSize()); + break; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + key << element.uint64Value(); + break; + default: + throw std::runtime_error("unsupported test set item storage class"); + } + return key.str(); + } + + static std::string itemKey(const o_set::Item &item) + { + std::ostringstream key; + key << static_cast(item.itemKind()) << ':'; + switch (item.itemKind()) { + case StorageClass::NONE: + key << "none"; + break; + case StorageClass::BOOLEAN: + key << item.boolPayload().value(); + break; + case StorageClass::INT64: + key << item.intPayload().value(); + break; + case StorageClass::PACKED_INT32: + key << item.packedIntPayload().value(); + break; + case StorageClass::FP_NUMERIC64: + key << std::setprecision(17) << item.doublePayload().value(); + break; + case StorageClass::STRING_REF: + key << item.stringPayload().toString(); + break; + case StorageClass::DB0_BYTES: + key << bytesKey(item.bytesPayload().begin(), item.bytesPayload().size()); + break; + case StorageClass::PTIME64: + case StorageClass::DATE: + case StorageClass::DATETIME: + case StorageClass::DATETIME_TZ: + case StorageClass::TIME: + case StorageClass::TIME_TZ: + case StorageClass::DECIMAL: + key << item.uint64Payload().value(); + break; + default: + throw std::runtime_error("unsupported test set item storage class"); + } + return key.str(); + } + + static void 
assertSetSizeWithin( + Memspace &memspace, const char *name, const o_set::ElementSet &elements, std::size_t maxSize + ) + { + auto measured = o_set::measure(elements); + v_object set(memspace, elements); + + ASSERT_EQ(set->size(), elements.size()) << name; + ASSERT_EQ(set->sizeOf(), measured) << name; + ASSERT_EQ(o_set::safeSizeOf(reinterpret_cast(set.getData())), measured) << name; + ASSERT_LE(measured, maxSize) << name; + } + + TEST_F( EmbeddedSetTest , testSetStoresUniqueSimpleItems ) + { + auto memspace = getMemspace(); + const std::vector bytes = { std::byte{0x01}, std::byte{0x02}, std::byte{0x03} }; + o_set::ElementSet elements = { + o_set::Element::integer(42), + o_set::Element::string("alpha"), + o_set::Element::boolean(true), + o_set::Element::bytes(bytes), + o_set::Element::integer(42), + o_set::Element::string("alpha"), + o_set::Element::boolean(true), + o_set::Element::bytes(bytes) + }; + + v_object set(memspace, elements); + + ASSERT_EQ(set->size(), 4u); + ASSERT_FALSE(set->empty()); + ASSERT_TRUE(set->contains(o_set::Element::integer(42))); + ASSERT_TRUE(set->contains(o_set::Element::string("alpha"))); + ASSERT_TRUE(set->contains(o_set::Element::boolean(true))); + ASSERT_TRUE(set->contains(o_set::Element::bytes(bytes))); + ASSERT_FALSE(set->contains(o_set::Element::integer(100))); + ASSERT_FALSE(set->contains(o_set::Element::string("missing"))); + } + + TEST_F( EmbeddedSetTest , testSetMeasureSizeOfAndSafeSizeOf ) + { + auto memspace = getMemspace(); + const std::vector shortBytes = { std::byte{0x10}, std::byte{0x20} }; + const std::vector longBytes = { + std::byte{0xaa}, std::byte{0xbb}, std::byte{0xcc}, std::byte{0xdd}, + std::byte{0xee}, std::byte{0xff} + }; + o_set::ElementSet elements = { + o_set::Element::string(""), + o_set::Element::bytes(shortBytes), + o_set::Element::string("set variable string"), + o_set::Element::bytes(longBytes), + o_set::Element::integer(7), + o_set::Element::string("set variable string"), + 
o_set::Element::bytes(longBytes) + }; + + v_object set(memspace, elements); + auto *begin = reinterpret_cast(set.getData()); + auto measured = o_set::measure(elements); + + ASSERT_EQ(set->size(), 5u); + ASSERT_EQ(set->sizeOf(), measured); + ASSERT_EQ(o_set::safeSizeOf(begin), measured); + ASSERT_EQ(o_set::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + measured)), measured); + ASSERT_GT(set->sizeOf(), 0u); + ASSERT_TRUE(set->contains(o_set::Element::string(""))); + ASSERT_TRUE(set->contains(o_set::Element::bytes(shortBytes))); + ASSERT_TRUE(set->contains(o_set::Element::string("set variable string"))); + ASSERT_TRUE(set->contains(o_set::Element::bytes(longBytes))); + ASSERT_TRUE(set->contains(o_set::Element::integer(7))); + } + + TEST_F( EmbeddedSetTest , testSetMeasuredSizeDoesNotRegress ) + { + auto memspace = getMemspace(); + + // Python comparison values below were measured on CPython 3.11 x86_64 with sys.getsizeof. + // "set" is the shallow hash-table object size; "total" recursively includes contained + // Python objects. Allocator arena/pool overhead is not included, so real process footprint is higher. 
+ // o_set max bytes vs Python set/total bytes: + // empty: 3 vs 216/216 + assertSetSizeWithin(memspace, "empty", {}, 3u); + // singleton packed integer: 13 vs 216/244 + assertSetSizeWithin(memspace, "singleton packed integer", { o_set::Element::integer(7) }, 13u); + // singleton fixed integer: 20 vs 216/244 + assertSetSizeWithin(memspace, "singleton fixed integer", { o_set::Element::integer(-7) }, 20u); + // mixed small scalar values: 54 vs 216/312 + assertSetSizeWithin(memspace, "mixed small", { + o_set::Element::none(), + o_set::Element::boolean(true), + o_set::Element::integer(42), + o_set::Element::floating(1.25) + }, 54u); + + const std::vector<std::byte> shortBytes = { std::byte{0x01}, std::byte{0x02}, std::byte{0x03} }; + const std::vector<std::byte> longBytes = { + std::byte{0x00}, std::byte{0x01}, std::byte{0x02}, std::byte{0x03}, + std::byte{0x04}, std::byte{0x05}, std::byte{0x06}, std::byte{0x07}, + std::byte{0x08}, std::byte{0x09}, std::byte{0x0a}, std::byte{0x0b}, + std::byte{0x0c}, std::byte{0x0d}, std::byte{0x0e}, std::byte{0x0f} + }; + // variable-length strings and bytes: 166 vs 472/750 + assertSetSizeWithin(memspace, "variable length", { + o_set::Element::string(""), + o_set::Element::string("short"), + o_set::Element::string("a somewhat longer string"), + o_set::Element::bytes(shortBytes), + o_set::Element::bytes(longBytes) + }, 166u); + + // date/datetime/decimal scalar payloads: 62 vs 216/400 + assertSetSizeWithin(memspace, "uint64 scalar kinds", { + o_set::Element::date(20260519), + o_set::Element::datetime(123456789), + o_set::Element::decimal(987654321) + }, 62u); + + constexpr std::size_t collisionCount = 16; + auto collisionCapacity = testHashIndexCapacity(collisionCount); + auto collisionSlot = testElementHash(o_set::Element::integer(17)) % collisionCapacity; + o_set::ElementSet collisions; + for (std::int64_t candidate = 0; collisions.size() < collisionCount; ++candidate) { + auto element = o_set::Element::integer(candidate); + if (testElementHash(element) % 
collisionCapacity == collisionSlot) { + collisions.insert(element); + } + } + // 16 forced integer collisions: 222 vs 728/1176 for a Python set of 16 ints. + assertSetSizeWithin(memspace, "forced collisions", collisions, 222u); + + o_set::ElementSet large; + std::vector strings; + strings.reserve(64); + for (std::int64_t i = 0; i < 64; ++i) { + large.insert(o_set::Element::integer(i)); + strings.push_back("item-" + std::to_string(i)); + large.insert(o_set::Element::string(strings.back())); + } + // 64 ints plus 64 strings: 2050 vs 4312/9816 + assertSetSizeWithin(memspace, "large mixed", large, 2050u); + } + + TEST_F( EmbeddedSetTest , testSingletonHashSlotDoesNotAllocateBucketTuple ) + { + auto memspace = getMemspace(); + o_set::ElementSet elements = { o_set::Element::integer(7) }; + std::vector::Element> bucketElements = { o_tuple<>::Element::integer(7) }; + + v_object set(memspace, elements); + + ASSERT_EQ(set->size(), 1u); + ASSERT_TRUE(set->contains(o_set::Element::integer(7))); + ASSERT_FALSE(set->contains(o_set::Element::integer(8))); + ASSERT_LT(set->sizeOf(), o_set::measure(elements) + o_tuple<>::measure(bucketElements)); + } + + TEST_F( EmbeddedSetTest , testSafeSizeOfRejectsTruncatedSet ) + { + auto memspace = getMemspace(); + const std::vector bytes = { + std::byte{0x01}, std::byte{0x02}, std::byte{0x03}, std::byte{0x04} + }; + o_set::ElementSet elements = { + o_set::Element::string("first"), + o_set::Element::bytes(bytes), + o_set::Element::string("second") + }; + v_object set(memspace, elements); + + auto *begin = reinterpret_cast(set.getData()); + auto size = set->sizeOf(); + + ASSERT_EQ(o_set::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + size)), size); + for (std::size_t truncatedSize = 0; truncatedSize < size; ++truncatedSize) { + ASSERT_THROW( + o_set::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + truncatedSize)), + std::runtime_error + ) << "truncated size: " << truncatedSize; + } + } + + TEST_F( 
EmbeddedSetTest , testSetSurvivesReopen ) + { + auto memspace = getMemspace(); + o_set::ElementSet elements = { + o_set::Element::none(), + o_set::Element::integer(5), + o_set::Element::string("reopened") + }; + + Address address; + { + v_object set(memspace, elements); + address = set.getAddress(); + } + + v_object set(memspace.myPtr(address)); + + ASSERT_EQ(set->size(), 3u); + ASSERT_TRUE(set->contains(o_set::Element::none())); + ASSERT_TRUE(set->contains(o_set::Element::integer(5))); + ASSERT_TRUE(set->contains(o_set::Element::string("reopened"))); + ASSERT_EQ(set->sizeOf(), o_set::safeSizeOf(reinterpret_cast(set.getData()))); + } + + TEST_F( EmbeddedSetTest , testLargeSetMembershipUsesEmbeddedHashIndex ) + { + auto memspace = getMemspace(); + o_set::ElementSet elements; + std::vector strings; + strings.reserve(256); + + for (std::int64_t i = 0; i < 256; ++i) { + elements.insert(o_set::Element::integer(i)); + strings.push_back("set-item-" + std::to_string(i)); + elements.insert(o_set::Element::string(strings.back())); + } + elements.insert(o_set::Element::integer(17)); + elements.insert(o_set::Element::string(strings[21])); + + v_object set(memspace, elements); + auto measured = o_set::measure(elements); + + ASSERT_EQ(set->size(), 512u); + ASSERT_EQ(set->sizeOf(), measured); + ASSERT_GT(set->sizeOf(), elements.size()); + ASSERT_TRUE(set->contains(o_set::Element::integer(0))); + ASSERT_TRUE(set->contains(o_set::Element::integer(255))); + ASSERT_TRUE(set->contains(o_set::Element::string(strings[0]))); + ASSERT_TRUE(set->contains(o_set::Element::string(strings[255]))); + ASSERT_FALSE(set->contains(o_set::Element::integer(512))); + ASSERT_FALSE(set->contains(o_set::Element::string("set-item-missing"))); + } + + TEST_F( EmbeddedSetTest , testComplexSetContainsAndIterationWithForcedCollisions ) + { + auto memspace = getMemspace(); + o_set::ElementSet elements; + std::vector strings; + std::vector> byteValues; + strings.reserve(220); + byteValues.reserve(100); + + 
elements.insert(o_set::Element::none()); + elements.insert(o_set::Element::boolean(false)); + elements.insert(o_set::Element::boolean(true)); + + for (std::int64_t i = 0; i < 180; ++i) { + elements.insert(o_set::Element::integer((i * 7919) - 50000)); + } + for (std::size_t i = 0; i < 100; ++i) { + strings.push_back("complex-set-string-" + std::to_string(i) + "-" + std::string(i % 17, 'x')); + elements.insert(o_set::Element::string(strings.back())); + } + for (std::size_t i = 0; i < 80; ++i) { + byteValues.push_back({ + static_cast<std::byte>(i & 0xff), + static_cast<std::byte>((i * 3) & 0xff), + static_cast<std::byte>((i * 7) & 0xff), + static_cast<std::byte>((i * 11) & 0xff), + static_cast<std::byte>((i * 13) & 0xff) + }); + elements.insert(o_set::Element::bytes(byteValues.back())); + } + for (std::size_t i = 0; i < 40; ++i) { + elements.insert(o_set::Element::floating(static_cast<double>(i) + 0.125)); + } + for (std::uint64_t i = 0; i < 10; ++i) { + elements.insert(o_set::Element::timestamp(100000 + i)); + elements.insert(o_set::Element::date(200000 + i)); + elements.insert(o_set::Element::datetime(300000 + i)); + elements.insert(o_set::Element::datetimeTz(400000 + i)); + elements.insert(o_set::Element::time(500000 + i)); + elements.insert(o_set::Element::timeTz(600000 + i)); + elements.insert(o_set::Element::decimal(700000 + i)); + } + + constexpr std::size_t forcedCollisionCount = 32; + auto finalCapacity = testHashIndexCapacity(elements.size() + forcedCollisionCount); + auto collisionSlot = testElementHash(o_set::Element::integer(17)) % finalCapacity; + std::size_t foundCollisions = 0; + for (std::int64_t candidate = 1000000; foundCollisions < forcedCollisionCount; ++candidate) { + auto element = o_set::Element::integer(candidate); + if (testElementHash(element) % finalCapacity != collisionSlot) { + continue; + } + auto inserted = elements.insert(element); + if (inserted.second) { + ++foundCollisions; + } + } + ASSERT_EQ(testHashIndexCapacity(elements.size()), finalCapacity); + + std::unordered_set expectedKeys; + 
expectedKeys.reserve(elements.size()); + for (const auto &element: elements) { + expectedKeys.insert(elementKey(element)); + } + + v_object set(memspace, elements); + + ASSERT_EQ(set->size(), elements.size()); + ASSERT_EQ(set->sizeOf(), o_set::measure(elements)); + ASSERT_EQ(o_set::safeSizeOf(reinterpret_cast(set.getData())), set->sizeOf()); + + for (const auto &element: elements) { + ASSERT_TRUE(set->contains(element)) << elementKey(element); + } + ASSERT_FALSE(set->contains(o_set::Element::integer(999999999))); + ASSERT_FALSE(set->contains(o_set::Element::string("complex-set-string-missing"))); + + std::unordered_set iteratedKeys; + iteratedKeys.reserve(set->size()); + std::size_t iteratedCount = 0; + const auto &setRef = set.const_ref(); + for (auto it = setRef.begin(); it != setRef.end(); ++it) { + const auto &item = *it; + auto key = itemKey(item); + ASSERT_TRUE(expectedKeys.find(key) != expectedKeys.end()) << key; + ASSERT_TRUE(iteratedKeys.insert(key).second) << key; + ++iteratedCount; + } + + ASSERT_EQ(iteratedCount, expectedKeys.size()); + ASSERT_EQ(iteratedKeys, expectedKeys); + } + + TEST_F( EmbeddedSetTest , testPySetConstructsFromPythonSet ) + { + Py_Initialize(); + auto memspace = getMemspace(); + auto pySet = Py_OWN(PySet_New(nullptr)); + ASSERT_NE(pySet.get(), nullptr); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyLong_FromLongLong(42))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyUnicode_FromString("python-set"))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(Py_NewRef(Py_True))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyBytes_FromStringAndSize("\x01\x02", 2))), 0); + + v_object set(memspace, *pySet); + + ASSERT_EQ(o_py_set::measure(*pySet), set->sizeOf()); + ASSERT_EQ(set->size(), 4u); + ASSERT_TRUE(set->contains(o_set::Element::integer(42))); + ASSERT_TRUE(set->contains(o_set::Element::string("python-set"))); + ASSERT_TRUE(set->contains(o_set::Element::boolean(true))); + ASSERT_TRUE(set->contains(o_set::Element::bytes( + std::vector{ 
std::byte{0x01}, std::byte{0x02} } + ))); + ASSERT_FALSE(set->contains(o_set::Element::integer(99))); + } + + TEST_F( EmbeddedSetTest , testPySetConstructsFromDateTimeAndDecimal ) + { + Py_Initialize(); + db0::python::init_datetime(); + if (!PyDateTimeAPI) { + PyDateTime_IMPORT; + } + ASSERT_NE(PyDateTimeAPI, nullptr); + auto memspace = getMemspace(); + auto pySet = Py_OWN(PySet_New(nullptr)); + auto decimal = Py_OWN(PyObject_CallFunction(db0::python::getDecimalClass(), "s", "123.45")); + ASSERT_NE(decimal.get(), nullptr); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyDate_FromDate(2026, 5, 19))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(Py_NewRef(*decimal))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyLong_FromLongLong(123))), 0); + ASSERT_EQ(PySafeSet_Add(*pySet, Py_OWN(PyLong_FromLongLong(123))), 0); + + v_object set(memspace, *pySet); + auto dateValue = db0::python::pyDateToUint64(Py_OWN(PyDate_FromDate(2026, 5, 19)).get()); + auto decimalValue = db0::python::pyDecimalToUint64(*decimal); + + ASSERT_EQ(set->size(), 3u); + ASSERT_TRUE(set->contains(o_set::Element::date(dateValue))); + ASSERT_TRUE(set->contains(o_set::Element::decimal(decimalValue))); + ASSERT_TRUE(set->contains(o_set::Element::integer(123))); + + std::unordered_set iteratedKeys; + for (auto it = set->begin(); it != set->end(); ++it) { + iteratedKeys.insert(itemKey(*it)); + } + ASSERT_EQ(iteratedKeys.size(), 3u); + } + +} diff --git a/tests/unit_tests/EmbeddedTupleTest.cpp b/tests/unit_tests/EmbeddedTupleTest.cpp new file mode 100644 index 00000000..bc8a71e0 --- /dev/null +++ b/tests/unit_tests/EmbeddedTupleTest.cpp @@ -0,0 +1,553 @@ +// SPDX-License-Identifier: LGPL-2.1-or-later +// Copyright (c) 2025 DBZero Software sp. z o.o. 
+ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +namespace tests +{ + + using namespace db0; + using namespace db0::object_model; + using namespace db0::python; + + class EmbeddedTupleTest: public MemspaceTestBase + { + }; + + static std::int64_t asInt64(const o_tuple_item &item) + { + if (item.itemKind() == StorageClass::PACKED_INT32) { + return static_cast<std::int64_t>(item.packedIntPayload().value()); + } + return item.intPayload().value(); + } + + static bool asBool(const o_tuple_item &item) + { + return item.boolPayload().value(); + } + + static double asDouble(const o_tuple_item &item) + { + return item.doublePayload().value(); + } + + static std::uint64_t asUint64(const o_tuple_item &item) + { + return item.uint64Payload().value(); + } + + static std::string asString(const o_tuple_item &item) + { + return item.stringPayload().toString(); + } + + static std::vector<std::byte> asBytes(const o_tuple_item &item) + { + const auto &payload = item.bytesPayload(); + return { payload.begin(), payload.end() }; + } + + static void throwDecodeError() + { + throw std::runtime_error("decode error"); + } + + static std::size_t measureElementBlock(const std::vector<o_tuple<>::Element> &elements) + { + std::size_t size = 0; + for (const auto &element: elements) { + size += o_tuple_item::measure(element); + } + return size; + } + + TEST_F( EmbeddedTupleTest , testTupleStoresInlineAndVariableLengthElements ) + { + auto memspace = getMemspace(); + const std::vector<std::byte> bytes = { std::byte{0x01}, std::byte{0x02}, std::byte{0xff} }; + std::vector<o_tuple<>::Element> elements = { + o_tuple<>::Element::integer(42), + o_tuple<>::Element::string("alpha"), + o_tuple<>::Element::boolean(true), + o_tuple<>::Element::bytes(bytes) + }; + + v_object<o_tuple<> > tuple(memspace, elements); + auto expectedElementsSize = measureElementBlock(elements); + + ASSERT_EQ(o_tuple<>::measure(elements), 
o_tuple<>::safeSizeOf(reinterpret_cast(tuple.getData()))); + ASSERT_EQ(tuple->size(), 4u); + ASSERT_EQ(tuple->elementsByteSize(), expectedElementsSize); + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::PACKED_INT32); + ASSERT_EQ(asInt64(tuple->item(0)), 42); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::STRING_REF); + ASSERT_EQ(asString(tuple->item(1)), "alpha"); + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::BOOLEAN); + ASSERT_TRUE(asBool(tuple->item(2))); + ASSERT_EQ(tuple->item(3).itemKind(), StorageClass::DB0_BYTES); + ASSERT_EQ(asBytes(tuple->item(3)), (std::vector{ std::byte{0x01}, std::byte{0x02}, std::byte{0xff} })); + } + + TEST_F( EmbeddedTupleTest , testTupleItemStorageClassValuesAreStable ) + { + ASSERT_EQ(static_cast(StorageClass::UNDEFINED), 0u); + ASSERT_EQ(static_cast(StorageClass::NONE), 1u); + ASSERT_EQ(static_cast(StorageClass::STRING_REF), 2u); + ASSERT_EQ(static_cast(StorageClass::INT64), 4u); + ASSERT_EQ(static_cast(StorageClass::PTIME64), 5u); + ASSERT_EQ(static_cast(StorageClass::FP_NUMERIC64), 6u); + ASSERT_EQ(static_cast(StorageClass::DATE), 7u); + ASSERT_EQ(static_cast(StorageClass::DATETIME), 8u); + ASSERT_EQ(static_cast(StorageClass::DATETIME_TZ), 9u); + ASSERT_EQ(static_cast(StorageClass::TIME), 10u); + ASSERT_EQ(static_cast(StorageClass::TIME_TZ), 11u); + ASSERT_EQ(static_cast(StorageClass::DECIMAL), 12u); + ASSERT_EQ(static_cast(StorageClass::DB0_BYTES), 23u); + ASSERT_EQ(static_cast(StorageClass::BOOLEAN), 28u); + ASSERT_EQ(static_cast(StorageClass::PACKED_INT32), 253u); + } + + TEST_F( EmbeddedTupleTest , testTupleUsesPackedInt32KindOnlyWhenItSavesAtLeastTwoBytes ) + { + auto memspace = getMemspace(); + constexpr std::int64_t maxPackedInt32 = std::numeric_limits::max(); + constexpr std::int64_t firstInt64AfterPacked = static_cast(std::numeric_limits::max()) + 1; + std::vector::Element> elements = { + o_tuple<>::Element::integer(0), + o_tuple<>::Element::integer(127), + o_tuple<>::Element::integer(maxPackedInt32), + 
o_tuple<>::Element::integer(firstInt64AfterPacked), + o_tuple<>::Element::integer(-1) + }; + + v_object > tuple(memspace, elements); + + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::PACKED_INT32); + ASSERT_EQ(asInt64(tuple->item(0)), 0); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::PACKED_INT32); + ASSERT_EQ(asInt64(tuple->item(1)), 127); + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::PACKED_INT32); + ASSERT_EQ(asInt64(tuple->item(2)), maxPackedInt32); + ASSERT_EQ(tuple->item(3).itemKind(), StorageClass::INT64); + ASSERT_EQ(asInt64(tuple->item(3)), firstInt64AfterPacked); + ASSERT_EQ(tuple->item(4).itemKind(), StorageClass::INT64); + ASSERT_EQ(asInt64(tuple->item(4)), -1); + + ASSERT_EQ(tuple->item(0).sizeOf(), o_tuple_item::measure(elements[0])); + ASSERT_EQ(tuple->item(3).sizeOf(), o_tuple_item::measure(elements[3])); + ASSERT_LE(tuple->item(2).sizeOf() + 2, tuple->item(3).sizeOf()); + } + + TEST_F( EmbeddedTupleTest , testTupleStoresUint64ScalarKinds ) + { + auto memspace = getMemspace(); + std::vector::Element> elements = { + o_tuple<>::Element::timestamp(1001), + o_tuple<>::Element::date(2002), + o_tuple<>::Element::datetime(3003), + o_tuple<>::Element::datetimeTz(4004), + o_tuple<>::Element::time(5005), + o_tuple<>::Element::timeTz(6006), + o_tuple<>::Element::decimal(7007) + }; + + v_object > tuple(memspace, elements); + + ASSERT_EQ(tuple->size(), elements.size()); + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::PTIME64); + ASSERT_EQ(asUint64(tuple->item(0)), 1001u); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::DATE); + ASSERT_EQ(asUint64(tuple->item(1)), 2002u); + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::DATETIME); + ASSERT_EQ(asUint64(tuple->item(2)), 3003u); + ASSERT_EQ(tuple->item(3).itemKind(), StorageClass::DATETIME_TZ); + ASSERT_EQ(asUint64(tuple->item(3)), 4004u); + ASSERT_EQ(tuple->item(4).itemKind(), StorageClass::TIME); + ASSERT_EQ(asUint64(tuple->item(4)), 5005u); + ASSERT_EQ(tuple->item(5).itemKind(), 
StorageClass::TIME_TZ); + ASSERT_EQ(asUint64(tuple->item(5)), 6006u); + ASSERT_EQ(tuple->item(6).itemKind(), StorageClass::DECIMAL); + ASSERT_EQ(asUint64(tuple->item(6)), 7007u); + } + + TEST_F( EmbeddedTupleTest , testTupleMeasureSizeOfAndSafeSizeOfWithMultipleVariableLengthElements ) + { + auto memspace = getMemspace(); + const std::vector shortBytes = { std::byte{0x10}, std::byte{0x20} }; + const std::vector longBytes = { + std::byte{0x01}, std::byte{0x02}, std::byte{0x03}, std::byte{0x04}, + std::byte{0x05}, std::byte{0x06}, std::byte{0x07}, std::byte{0x08} + }; + std::vector::Element> elements = { + o_tuple<>::Element::string(""), + o_tuple<>::Element::bytes(shortBytes), + o_tuple<>::Element::string("alpha"), + o_tuple<>::Element::bytes(longBytes), + o_tuple<>::Element::string("variable length string payload"), + o_tuple<>::Element::integer(9001), + o_tuple<>::Element::string("tail") + }; + + v_object > tuple(memspace, elements); + auto *begin = reinterpret_cast(tuple.getData()); + auto expectedElementBytes = measureElementBlock(elements); + auto expectedTotalBytes = o_tuple<>::measure(elements); + + ASSERT_EQ(tuple->size(), elements.size()); + ASSERT_EQ(tuple->elementsByteSize(), expectedElementBytes); + ASSERT_EQ(tuple->sizeOf(), expectedTotalBytes); + ASSERT_EQ(o_tuple<>::safeSizeOf(begin), expectedTotalBytes); + ASSERT_EQ(o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + expectedTotalBytes)), + expectedTotalBytes); + + ASSERT_EQ(asString(tuple->item(0)), ""); + ASSERT_EQ(asBytes(tuple->item(1)), shortBytes); + ASSERT_EQ(asString(tuple->item(2)), "alpha"); + ASSERT_EQ(asBytes(tuple->item(3)), longBytes); + ASSERT_EQ(asString(tuple->item(4)), "variable length string payload"); + ASSERT_EQ(asInt64(tuple->item(5)), 9001); + ASSERT_EQ(asString(tuple->item(6)), "tail"); + } + + TEST_F( EmbeddedTupleTest , testCompactTupleOmitsElementsByteSizeMember ) + { + auto memspace = getMemspace(); + const std::vector bytes = { std::byte{0x10}, 
std::byte{0x20}, std::byte{0x30} }; + std::vector::Element> elements = { + o_tuple<>::Element::integer(11), + o_tuple<>::Element::string("compact"), + o_tuple<>::Element::bytes(bytes) + }; + + v_object > tuple(memspace, elements); + v_object compactTuple(memspace, elements); + + ASSERT_EQ(compactTuple->size(), elements.size()); + ASSERT_EQ(compactTuple->elementsByteSize(), measureElementBlock(elements)); + ASSERT_EQ(compactTuple->sizeOf(), o_compact_tuple::measure(elements)); + ASSERT_EQ(o_compact_tuple::safeSizeOf(reinterpret_cast(compactTuple.getData())), + compactTuple->sizeOf()); + ASSERT_LT(compactTuple->sizeOf(), tuple->sizeOf()); + ASSERT_EQ(asInt64(compactTuple->item(0)), 11); + ASSERT_EQ(asString(compactTuple->item(1)), "compact"); + ASSERT_EQ(asBytes(compactTuple->item(2)), bytes); + } + + TEST_F( EmbeddedTupleTest , testSafeSizeOfRejectsTruncatedMultipleVariableLengthTuple ) + { + auto memspace = getMemspace(); + const std::vector bytes = { + std::byte{0xaa}, std::byte{0xbb}, std::byte{0xcc}, std::byte{0xdd}, std::byte{0xee} + }; + std::vector::Element> elements = { + o_tuple<>::Element::string("first variable payload"), + o_tuple<>::Element::bytes(bytes), + o_tuple<>::Element::string("second variable payload"), + o_tuple<>::Element::bytes(bytes), + o_tuple<>::Element::string("third variable payload") + }; + v_object > tuple(memspace, elements); + + auto *begin = reinterpret_cast(tuple.getData()); + auto size = tuple->sizeOf(); + + ASSERT_EQ(o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + size)), size); + for (std::size_t truncatedSize = 0; truncatedSize < size; ++truncatedSize) { + ASSERT_THROW( + o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + truncatedSize)), + std::runtime_error + ) << "truncated size: " << truncatedSize; + } + } + + TEST_F( EmbeddedTupleTest , testTupleSurvivesReopen ) + { + auto memspace = getMemspace(); + std::vector::Element> elements = { + o_tuple<>::Element::none(), + 
o_tuple<>::Element::string("variable length") + }; + + Address address; + { + v_object > tuple(memspace, elements); + address = tuple.getAddress(); + } + + v_object > tuple(memspace.myPtr(address)); + + ASSERT_EQ(tuple->sizeOf(), o_tuple<>::safeSizeOf(reinterpret_cast(tuple.getData()))); + ASSERT_EQ(tuple->size(), 2u); + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::NONE); + ASSERT_EQ(asString(tuple->item(1)), "variable length"); + } + + TEST_F( EmbeddedTupleTest , testSafeSizeOfValidatesBoundedBufferBeforeHeaderReads ) + { + auto memspace = getMemspace(); + const std::vector bytes = { std::byte{0x01}, std::byte{0x02}, std::byte{0xff} }; + std::vector::Element> elements = { + o_tuple<>::Element::integer(42), + o_tuple<>::Element::bytes(bytes) + }; + v_object > tuple(memspace, elements); + + auto *begin = reinterpret_cast(tuple.getData()); + auto size = tuple->sizeOf(); + + ASSERT_EQ(o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + size)), size); + ASSERT_THROW(o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + 1)), std::runtime_error); + ASSERT_THROW(o_tuple<>::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + size - 1)), std::runtime_error); + } + + TEST_F( EmbeddedTupleTest , testPyTupleConstructsFromPythonTuple ) + { + Py_Initialize(); + auto memspace = getMemspace(); + auto pyTuple = Py_OWN(PyTuple_New(5)); + PySafeTuple_SetItem(*pyTuple, 0, Py_OWN(PyLong_FromLongLong(123))); + PySafeTuple_SetItem(*pyTuple, 1, Py_OWN(PyUnicode_FromString("python"))); + PySafeTuple_SetItem(*pyTuple, 2, Py_OWN(PyBool_FromLong(1))); + PySafeTuple_SetItem(*pyTuple, 3, Py_OWN(PyFloat_FromDouble(4.5))); + PySafeTuple_SetItem(*pyTuple, 4, Py_OWN(PyBytes_FromStringAndSize("\x01\x02", 2))); + + v_object tuple(memspace, *pyTuple); + + ASSERT_EQ(o_py_tuple::measure(*pyTuple), tuple->sizeOf()); + ASSERT_EQ(tuple->size(), 5u); + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::PACKED_INT32); + 
ASSERT_EQ(asInt64(tuple->item(0)), 123); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::STRING_REF); + ASSERT_EQ(asString(tuple->item(1)), "python"); + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::BOOLEAN); + ASSERT_TRUE(asBool(tuple->item(2))); + ASSERT_EQ(tuple->item(3).itemKind(), StorageClass::FP_NUMERIC64); + ASSERT_EQ(asDouble(tuple->item(3)), 4.5); + ASSERT_EQ(tuple->item(4).itemKind(), StorageClass::DB0_BYTES); + ASSERT_EQ(asBytes(tuple->item(4)), (std::vector{ std::byte{0x01}, std::byte{0x02} })); + } + + TEST_F( EmbeddedTupleTest , testPyTupleConstructsFromPythonList ) + { + Py_Initialize(); + auto memspace = getMemspace(); + auto pyList = Py_OWN(PyList_New(2)); + PySafeList_SetItem(*pyList, 0, Py_OWN(Py_NewRef(Py_None))); + PySafeList_SetItem(*pyList, 1, Py_OWN(PyUnicode_FromString("list item"))); + + v_object tuple(memspace, *pyList); + + ASSERT_EQ(o_py_tuple::measure(*pyList), tuple->sizeOf()); + ASSERT_EQ(tuple->size(), 2u); + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::NONE); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::STRING_REF); + ASSERT_EQ(asString(tuple->item(1)), "list item"); + } + + TEST_F( EmbeddedTupleTest , testPyTupleConstructsDeeplyNestedCollections ) + { + Py_Initialize(); + auto memspace = getMemspace(); + + auto pyRoot = Py_OWN(PyTuple_New(3)); + + auto pyNestedList = Py_OWN(PyList_New(2)); + PySafeList_SetItem(*pyNestedList, 0, Py_OWN(PyLong_FromLongLong(11))); + + auto pyDeepDict = Py_OWN(PyDict_New()); + ASSERT_EQ(PySafeDict_SetItem( + *pyDeepDict, Py_OWN(PyUnicode_FromString("answer")), Py_OWN(PyLong_FromLongLong(42)) + ), 0); + + auto pyDeepList = Py_OWN(PyList_New(2)); + PySafeList_SetItem(*pyDeepList, 0, Py_OWN(PyUnicode_FromString("deep"))); + PySafeList_SetItem(*pyDeepList, 1, Py_OWN(Py_NewRef(*pyDeepDict))); + + auto pyInnerTuple = Py_OWN(PyTuple_New(2)); + PySafeTuple_SetItem(*pyInnerTuple, 0, Py_OWN(PyLong_FromLongLong(22))); + PySafeTuple_SetItem(*pyInnerTuple, 1, 
Py_OWN(Py_NewRef(*pyDeepList))); + + auto pyInnerDict = Py_OWN(PyDict_New()); + ASSERT_EQ(PySafeDict_SetItem( + *pyInnerDict, Py_OWN(PyUnicode_FromString("tuple")), Py_OWN(Py_NewRef(*pyInnerTuple)) + ), 0); + PySafeList_SetItem(*pyNestedList, 1, Py_OWN(Py_NewRef(*pyInnerDict))); + PySafeTuple_SetItem(*pyRoot, 0, Py_OWN(Py_NewRef(*pyNestedList))); + + auto pyNumbers = Py_OWN(PyList_New(2)); + PySafeList_SetItem(*pyNumbers, 0, Py_OWN(PyLong_FromLongLong(3))); + PySafeList_SetItem(*pyNumbers, 1, Py_OWN(PyLong_FromLongLong(4))); + + auto pyFlags = Py_OWN(PySet_New(nullptr)); + ASSERT_EQ(PySafeSet_Add(*pyFlags, Py_OWN(Py_NewRef(Py_True))), 0); + ASSERT_EQ(PySafeSet_Add(*pyFlags, Py_OWN(PyUnicode_FromString("ok"))), 0); + + auto pyRootDict = Py_OWN(PyDict_New()); + ASSERT_EQ(PySafeDict_SetItem( + *pyRootDict, Py_OWN(PyUnicode_FromString("numbers")), Py_OWN(Py_NewRef(*pyNumbers)) + ), 0); + ASSERT_EQ(PySafeDict_SetItem( + *pyRootDict, Py_OWN(PyUnicode_FromString("flags")), Py_OWN(Py_NewRef(*pyFlags)) + ), 0); + PySafeTuple_SetItem(*pyRoot, 1, Py_OWN(Py_NewRef(*pyRootDict))); + + auto pyRootSet = Py_OWN(PySet_New(nullptr)); + ASSERT_EQ(PySafeSet_Add(*pyRootSet, Py_OWN(PyUnicode_FromString("root-set"))), 0); + ASSERT_EQ(PySafeSet_Add(*pyRootSet, Py_OWN(PyLong_FromLongLong(99))), 0); + auto pySetTuple = Py_OWN(PyTuple_New(2)); + PySafeTuple_SetItem(*pySetTuple, 0, Py_OWN(PyUnicode_FromString("set-tuple"))); + PySafeTuple_SetItem(*pySetTuple, 1, Py_OWN(PyLong_FromLongLong(123))); + ASSERT_EQ(PySafeSet_Add(*pyRootSet, Py_OWN(Py_NewRef(*pySetTuple))), 0); + PySafeTuple_SetItem(*pyRoot, 2, Py_OWN(Py_NewRef(*pyRootSet))); + + v_object tuple(memspace, *pyRoot); + + ASSERT_EQ(o_py_tuple::measure(*pyRoot), tuple->sizeOf()); + ASSERT_EQ(tuple->size(), 3u); + + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::DB0_TUPLE); + const auto &nestedList = o_tuple<>::__const_ref(tuple->item(0).embeddedPayload().begin()); + ASSERT_EQ(nestedList.size(), 2u); + 
ASSERT_EQ(asInt64(nestedList.item(0)), 11); + ASSERT_EQ(nestedList.item(1).itemKind(), StorageClass::DB0_DICT); + + const auto &innerDict = o_dict::__const_ref(nestedList.item(1).embeddedPayload().begin()); + auto *innerTupleItem = innerDict.get(o_dict::Element::string("tuple")); + ASSERT_NE(innerTupleItem, nullptr); + ASSERT_EQ(innerTupleItem->itemKind(), StorageClass::DB0_TUPLE); + + const auto &innerTuple = o_tuple<>::__const_ref(innerTupleItem->embeddedPayload().begin()); + ASSERT_EQ(innerTuple.size(), 2u); + ASSERT_EQ(asInt64(innerTuple.item(0)), 22); + ASSERT_EQ(innerTuple.item(1).itemKind(), StorageClass::DB0_TUPLE); + + const auto &deepList = o_tuple<>::__const_ref(innerTuple.item(1).embeddedPayload().begin()); + ASSERT_EQ(deepList.size(), 2u); + ASSERT_EQ(asString(deepList.item(0)), "deep"); + ASSERT_EQ(deepList.item(1).itemKind(), StorageClass::DB0_DICT); + + const auto &deepDict = o_dict::__const_ref(deepList.item(1).embeddedPayload().begin()); + auto *answer = deepDict.get(o_dict::Element::string("answer")); + ASSERT_NE(answer, nullptr); + ASSERT_EQ(asInt64(*answer), 42); + + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::DB0_DICT); + const auto &rootDict = o_dict::__const_ref(tuple->item(1).embeddedPayload().begin()); + auto *numbersItem = rootDict.get(o_dict::Element::string("numbers")); + ASSERT_NE(numbersItem, nullptr); + ASSERT_EQ(numbersItem->itemKind(), StorageClass::DB0_TUPLE); + const auto &numbers = o_tuple<>::__const_ref(numbersItem->embeddedPayload().begin()); + ASSERT_EQ(numbers.size(), 2u); + ASSERT_EQ(asInt64(numbers.item(0)), 3); + ASSERT_EQ(asInt64(numbers.item(1)), 4); + + auto *flagsItem = rootDict.get(o_dict::Element::string("flags")); + ASSERT_NE(flagsItem, nullptr); + ASSERT_EQ(flagsItem->itemKind(), StorageClass::DB0_SET); + const auto &flags = o_set::__const_ref(flagsItem->embeddedPayload().begin()); + ASSERT_EQ(flags.size(), 2u); + ASSERT_TRUE(flags.contains(o_set::Element::boolean(true))); + 
ASSERT_TRUE(flags.contains(o_set::Element::string("ok"))); + + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::DB0_SET); + const auto &rootSet = o_set::__const_ref(tuple->item(2).embeddedPayload().begin()); + ASSERT_EQ(rootSet.size(), 3u); + ASSERT_TRUE(rootSet.contains(o_set::Element::string("root-set"))); + ASSERT_TRUE(rootSet.contains(o_set::Element::integer(99))); + + const o_tuple_item *setTupleItem = nullptr; + for (auto it = rootSet.begin(); it != rootSet.end(); ++it) { + if (it->itemKind() == StorageClass::DB0_TUPLE) { + setTupleItem = &*it; + break; + } + } + ASSERT_NE(setTupleItem, nullptr); + const auto &setTuple = o_tuple<>::__const_ref(setTupleItem->embeddedPayload().begin()); + ASSERT_EQ(setTuple.size(), 2u); + ASSERT_EQ(asString(setTuple.item(0)), "set-tuple"); + ASSERT_EQ(asInt64(setTuple.item(1)), 123); + } + + TEST_F( EmbeddedTupleTest , testPyTupleMeasureSizeOfAndSafeSizeOfWithMultipleVariableLengthElements ) + { + Py_Initialize(); + auto memspace = getMemspace(); + auto pyTuple = Py_OWN(PyTuple_New(6)); + PySafeTuple_SetItem(*pyTuple, 0, Py_OWN(PyUnicode_FromString(""))); + PySafeTuple_SetItem(*pyTuple, 1, Py_OWN(PyBytes_FromStringAndSize("abc", 3))); + PySafeTuple_SetItem(*pyTuple, 2, Py_OWN(PyUnicode_FromString("middle variable string"))); + PySafeTuple_SetItem(*pyTuple, 3, Py_OWN(PyBytes_FromStringAndSize("0123456789", 10))); + PySafeTuple_SetItem(*pyTuple, 4, Py_OWN(PyUnicode_FromString("tail variable string"))); + PySafeTuple_SetItem(*pyTuple, 5, Py_OWN(PyLong_FromLongLong(77))); + + v_object tuple(memspace, *pyTuple); + auto *begin = reinterpret_cast(tuple.getData()); + auto measured = o_py_tuple::measure(*pyTuple); + + ASSERT_EQ(tuple->size(), 6u); + ASSERT_EQ(tuple->sizeOf(), measured); + ASSERT_EQ(o_py_tuple::safeSizeOf(begin), measured); + ASSERT_EQ(o_py_tuple::safeSizeOf(const_bounded_buf_t(throwDecodeError, begin, begin + measured)), measured); + ASSERT_EQ(asString(tuple->item(0)), ""); + ASSERT_EQ(asBytes(tuple->item(1)), 
(std::vector{ std::byte{'a'}, std::byte{'b'}, std::byte{'c'} })); + ASSERT_EQ(asString(tuple->item(2)), "middle variable string"); + ASSERT_EQ(asBytes(tuple->item(3)), (std::vector{ + std::byte{'0'}, std::byte{'1'}, std::byte{'2'}, std::byte{'3'}, std::byte{'4'}, + std::byte{'5'}, std::byte{'6'}, std::byte{'7'}, std::byte{'8'}, std::byte{'9'} + })); + ASSERT_EQ(asString(tuple->item(4)), "tail variable string"); + ASSERT_EQ(asInt64(tuple->item(5)), 77); + } + + TEST_F( EmbeddedTupleTest , testPyTupleConstructsFromDateTimeAndDecimal ) + { + Py_Initialize(); + db0::python::init_datetime(); + if (!PyDateTimeAPI) { + PyDateTime_IMPORT; + } + ASSERT_NE(PyDateTimeAPI, nullptr); + auto memspace = getMemspace(); + auto pyTuple = Py_OWN(PyTuple_New(5)); + auto decimal = Py_OWN(PyObject_CallFunction(db0::python::getDecimalClass(), "s", "123.45")); + ASSERT_NE(decimal.get(), nullptr); + PySafeTuple_SetItem(*pyTuple, 0, Py_OWN(PyDate_FromDate(2026, 5, 19))); + PySafeTuple_SetItem(*pyTuple, 1, Py_OWN(PyDateTime_FromDateAndTime(2026, 5, 19, 12, 34, 56, 789))); + PySafeTuple_SetItem(*pyTuple, 2, Py_OWN(PyTime_FromTime(12, 34, 56, 789))); + PySafeTuple_SetItem(*pyTuple, 3, Py_OWN(Py_NewRef(*decimal))); + PySafeTuple_SetItem(*pyTuple, 4, Py_OWN(PyUnicode_FromString("tail"))); + + v_object tuple(memspace, *pyTuple); + + ASSERT_EQ(tuple->item(0).itemKind(), StorageClass::DATE); + ASSERT_EQ(asUint64(tuple->item(0)), db0::python::pyDateToUint64(PyTuple_GET_ITEM(*pyTuple, 0))); + ASSERT_EQ(tuple->item(1).itemKind(), StorageClass::DATETIME); + ASSERT_EQ(asUint64(tuple->item(1)), db0::python::pyDateTimeToToUint64(PyTuple_GET_ITEM(*pyTuple, 1))); + ASSERT_EQ(tuple->item(2).itemKind(), StorageClass::TIME); + ASSERT_EQ(asUint64(tuple->item(2)), db0::python::pyTimeToUint64(PyTuple_GET_ITEM(*pyTuple, 2))); + ASSERT_EQ(tuple->item(3).itemKind(), StorageClass::DECIMAL); + ASSERT_EQ(asUint64(tuple->item(3)), db0::python::pyDecimalToUint64(PyTuple_GET_ITEM(*pyTuple, 3))); + 
ASSERT_EQ(tuple->item(4).itemKind(), StorageClass::STRING_REF); + ASSERT_EQ(asString(tuple->item(4)), "tail"); + } + +} diff --git a/tests/unit_tests/ObjectInitializerTest.cpp b/tests/unit_tests/ObjectInitializerTest.cpp index 8f534b22..56a18f83 100644 --- a/tests/unit_tests/ObjectInitializerTest.cpp +++ b/tests/unit_tests/ObjectInitializerTest.cpp @@ -2,13 +2,18 @@ // Copyright (c) 2025 DBZero Software sp. z o.o. #include +#include #include #include #include #include +#include +#include #include #include #include +#include +#include #include #include @@ -122,6 +127,213 @@ namespace tests workspace.close(); } + TEST_F( ObjectInitializerTest, testManagerCreatesImmutableInitializerForImmutableObjects ) + { + Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer()); + auto fixture = workspace.getFixture(prefix_name); + + int object = 0; + std::shared_ptr mock_class = getTestClass(fixture); + ObjectInitializerManager manager; + manager.addInitializerFor(object, mock_class); + + auto *initializer = manager.findInitializer(object); + ASSERT_NE(initializer, nullptr); + ASSERT_NE(dynamic_cast(initializer), nullptr); + ASSERT_EQ(dynamic_cast(initializer)->getClassPtr(), mock_class); + + manager.closeInitializer(object); + workspace.close(); + } + + TEST_F( ObjectInitializerTest, testImmutableInitializerStoresObjectForNonFixedValues ) + { + Py_Initialize(); + + Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer()); + auto fixture = workspace.getFixture(prefix_name); + + int object = 0; + std::shared_ptr mock_class = getTestClass(fixture); + ObjectInitializerManager manager; + manager.addInitializerFor(object, mock_class); + auto *initializer = dynamic_cast(manager.findInitializer(object)); + ASSERT_NE(initializer, nullptr); + + auto py_value = Py_OWN(PyLong_FromLong(42)); + ImmutableObjectInitializer::ObjectSharedPtr object_value(py_value.get()); + initializer->setObject({9, 0}, StorageClass::STRING_REF, Value(123), object_value); + + 
std::pair<StorageClass, Value> stored_value;
+        ASSERT_TRUE(initializer->tryGetAt({9, 0}, stored_value));
+        ASSERT_EQ(stored_value.first, StorageClass::STRING_REF);
+        ASSERT_EQ(stored_value.second, Value(123));
+
+        ImmutableObjectInitializer::ObjectSharedPtr stored_object;
+        ASSERT_TRUE(initializer->tryGetObjectAt({9, 0}, stored_object));
+        ASSERT_EQ(stored_object.get(), py_value.get());
+
+        ASSERT_TRUE(initializer->remove({9, 0}));
+        ASSERT_FALSE(initializer->tryGetObjectAt({9, 0}, stored_object));
+
+        workspace.close();
+    }
+
+    TEST_F( ObjectInitializerTest, testImmutableInitializerDoesNotStoreObjectForFixedValues )
+    {
+        Py_Initialize();
+
+        Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer());
+        auto fixture = workspace.getFixture(prefix_name);
+
+        int object = 0;
+        std::shared_ptr mock_class = getTestClass(fixture);
+        ObjectInitializerManager manager;
+        manager.addInitializerFor(object, mock_class);
+        auto *initializer = dynamic_cast<ImmutableObjectInitializer *>(manager.findInitializer(object));
+        ASSERT_NE(initializer, nullptr);
+
+        auto py_value = Py_OWN(PyLong_FromLong(42));
+        initializer->setObject({3, 0}, StorageClass::INT64, Value(42), ImmutableObjectInitializer::ObjectSharedPtr(py_value.get()));
+
+        ImmutableObjectInitializer::ObjectSharedPtr stored_object;
+        ASSERT_FALSE(initializer->tryGetObjectAt({3, 0}, stored_object));
+
+        workspace.close();
+    }
+
+    TEST_F( ObjectInitializerTest, testImmutableInitializerCollectsVariableValuesIntoFieldMap )
+    {
+        Py_Initialize();
+
+        Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer());
+        auto fixture = workspace.getFixture(prefix_name);
+
+        int object = 0;
+        std::shared_ptr mock_class = getTestClass(fixture);
+        ObjectInitializerManager manager;
+        manager.addInitializerFor(object, mock_class);
+        auto *initializer = dynamic_cast<ImmutableObjectInitializer *>(manager.findInitializer(object));
+        ASSERT_NE(initializer, nullptr);
+
+        initializer->set({0, 0}, StorageClass::INT64, Value(17));
+
+        auto py_string = Py_OWN(PyUnicode_FromString("variable-value"));
+        initializer->setObject(
+            {4, 0}, StorageClass::STRING_REF, Value(123),
+            ImmutableObjectInitializer::ObjectSharedPtr(py_string.get())
+        );
+
+        auto measured = o_embedded_object::measure(33u, *initializer);
+        std::vector buffer(measured);
+        auto &embedded_object = o_embedded_object::__new(buffer.data(), 33u, *initializer);
+
+        ASSERT_EQ(embedded_object.sizeOf(), measured);
+        auto fixed_value = embedded_object.fixedValue(0);
+        ASSERT_TRUE(fixed_value.has_value());
+        ASSERT_EQ(fixed_value->m_kind, StorageClass::INT64);
+        ASSERT_EQ(fixed_value->m_value, 17u);
+
+        auto *variable_value = embedded_object.variableValue(4);
+        ASSERT_NE(variable_value, nullptr);
+        ASSERT_EQ(variable_value->itemKind(), StorageClass::STRING_REF);
+        ASSERT_EQ(variable_value->stringPayload().toString(), "variable-value");
+
+        workspace.close();
+    }
+
+    TEST_F( ObjectInitializerTest, testImmutableInitializerEmbedsPythonListFieldMapValue )
+    {
+        Py_Initialize();
+
+        Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer());
+        auto fixture = workspace.getFixture(prefix_name);
+
+        int object = 0;
+        std::shared_ptr mock_class = getTestClass(fixture);
+        ObjectInitializerManager manager;
+        manager.addInitializerFor(object, mock_class);
+        auto *initializer = dynamic_cast<ImmutableObjectInitializer *>(manager.findInitializer(object));
+        ASSERT_NE(initializer, nullptr);
+
+        auto py_list = Py_OWN(PyList_New(2));
+        db0::python::PySafeList_SetItem(py_list.get(), 0, Py_OWN(PyLong_FromLong(7)));
+        db0::python::PySafeList_SetItem(py_list.get(), 1, Py_OWN(PyUnicode_FromString("seven")));
+        initializer->setObject(
+            {8, 0}, StorageClass::DB0_LIST, Value(0),
+            ImmutableObjectInitializer::ObjectSharedPtr(py_list.get())
+        );
+
+        auto measured = o_embedded_object::measure(33u, *initializer);
+        std::vector buffer(measured);
+        auto &embedded_object = o_embedded_object::__new(buffer.data(), 33u, *initializer);
+
+        ASSERT_EQ(embedded_object.sizeOf(), measured);
+        ASSERT_FALSE(embedded_object.fixedValue(8).has_value());
+        auto *variable_value = embedded_object.variableValue(8);
+        ASSERT_NE(variable_value, nullptr);
+        ASSERT_EQ(variable_value->itemKind(), StorageClass::DB0_TUPLE);
+
+        const auto &tuple = o_tuple<>::__const_ref(variable_value->embeddedPayload().begin());
+        ASSERT_EQ(tuple.size(), 2u);
+        ASSERT_EQ(tuple.item(0).itemKind(), StorageClass::PACKED_INT32);
+        ASSERT_EQ(tuple.item(0).packedIntPayload().value(), 7u);
+        ASSERT_EQ(tuple.item(1).itemKind(), StorageClass::STRING_REF);
+        ASSERT_EQ(tuple.item(1).stringPayload().toString(), "seven");
+
+        workspace.close();
+    }
+
+    TEST_F( ObjectInitializerTest, testEmbeddedObjectMeasureDoesNotConsumeImmutableVariableValues )
+    {
+        Py_Initialize();
+
+        Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer());
+        auto fixture = workspace.getFixture(prefix_name);
+
+        int object = 0;
+        std::shared_ptr mock_class = getTestClass(fixture);
+        ObjectInitializerManager manager;
+        manager.addInitializerFor(object, mock_class);
+        auto *initializer = dynamic_cast<ImmutableObjectInitializer *>(manager.findInitializer(object));
+        ASSERT_NE(initializer, nullptr);
+
+        initializer->setObject({0, 0}, StorageClass::INT64, Value(17), {});
+
+        auto py_list = Py_OWN(PyList_New(2));
+        db0::python::PySafeList_SetItem(py_list.get(), 0, Py_OWN(PyLong_FromLong(7)));
+        db0::python::PySafeList_SetItem(py_list.get(), 1, Py_OWN(PyUnicode_FromString("seven")));
+        initializer->setObject(
+            {8, 0}, StorageClass::DB0_LIST, Value(0),
+            ImmutableObjectInitializer::ObjectSharedPtr(py_list.get())
+        );
+
+        auto measured = o_embedded_object::measure(33u, *initializer);
+
+        ImmutableObjectInitializer::ObjectSharedPtr stored_object;
+        ASSERT_TRUE(initializer->tryGetObjectAt({8, 0}, stored_object));
+        ASSERT_EQ(stored_object.get(), py_list.get());
+
+        std::vector buffer(measured);
+        auto &embedded_object = o_embedded_object::__new(buffer.data(), 33u, *initializer);
+
+        ASSERT_EQ(embedded_object.sizeOf(), measured);
+        auto fixed_value = embedded_object.fixedValue(0);
+        ASSERT_TRUE(fixed_value.has_value());
+        ASSERT_EQ(fixed_value->m_kind, StorageClass::INT64);
+        ASSERT_EQ(fixed_value->m_value, 17u);
+
+        auto *list_value = embedded_object.variableValue(8);
+        ASSERT_NE(list_value, nullptr);
+        ASSERT_EQ(list_value->itemKind(), StorageClass::DB0_TUPLE);
+        const auto &tuple = o_tuple<>::__const_ref(list_value->embeddedPayload().begin());
+        ASSERT_EQ(tuple.size(), 2u);
+        ASSERT_EQ(tuple.item(0).packedIntPayload().value(), 7u);
+        ASSERT_EQ(tuple.item(1).stringPayload().toString(), "seven");
+
+        workspace.close();
+    }
+
     TEST_F( ObjectInitializerTest, testPosVTLoFiExclusive )
     {
         Workspace workspace("", {}, {}, {}, {}, db0::object_model::initializer());