Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Review Summary by QodoAdd List type support in storage layer with ListView and ListColumn
WalkthroughsDescription• Add comprehensive List type support in storage layer • Implement ListView and ListViewBuilder for efficient list serialization • Create ListColumn for persistent list property storage • Add conversion functions between execution Values and storage ListViews • Support nested lists and both POD and variable-length element types Diagramflowchart LR
A["Execution Value::LIST"] -->|ListViewToValue| B["ListView"]
B -->|get_view| C["ListColumn"]
C -->|set_value| D["ListViewBuilder"]
D -->|finish_pod/varlen| E["Binary Blob"]
E -->|storage| F["Persistent Storage"]
A -->|ValueToListBlob| E
G["Property"] -->|as_list_data| E
E -->|from_list_data| G
File Changes1. include/neug/utils/property/list_view.h
|
Code Review by Qodo
1. Dangling list blob view
|
|
Currently the implementation is finished. It turns out the it used the |
| case DataTypeId::kList: { | ||
| std::string blob = ValueToListBlob(value); | ||
| return Property::from_list_data(blob); | ||
| } |
There was a problem hiding this comment.
1. Dangling list blob view 🐞 Bug ≡ Correctness
execution::value_to_property builds a temporary std::string blob for list values and returns Property::from_list_data(blob), but Property stores list data as std::string_view so it dangles immediately. This can cause use-after-free when inserting into storage (Table::insert -> ColumnBase::set_any) and can corrupt stored list data or crash.
Agent Prompt
### Issue description
`Property` stores list blobs as `std::string_view`, but `execution::value_to_property` creates the blob as a temporary `std::string` and returns `Property::from_list_data(blob)`. The returned `Property` immediately holds a dangling pointer.
### Issue Context
List blobs are commonly produced dynamically (serialization from `Value::LIST`), so the system needs an ownership strategy. The current Property implementation uses `memcpy`-based assignment, so adding owning fields requires updating copy/assignment semantics.
### Fix Focus Areas
- src/execution/common/types/value.cc[817-820]
- include/neug/utils/property/property.h[146-166]
- src/utils/property/table.cc[278-285]
### Suggested fix
1. Introduce ownership for variable-length payloads (at minimum for `kList`, and ideally unify with `kVarchar`). Options:
- Add an owning buffer inside `Property` (e.g., `std::shared_ptr<std::string>` or `std::string` + proper copy/move/assignment/destructor) and make `as_list_data()` return a view into owned storage.
- Or redesign the insert path to avoid storing list blobs inside `Property` as a view (e.g., pass an owned blob alongside the Property vector).
2. Update `Property::operator=` and any serialization/deserialization code to handle the new ownership model safely (remove `memcpy`-based copying for non-trivial members).
3. Add tests that insert list properties via the normal execution/storage insert path to catch lifetime bugs (not just unit tests over builders).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| case DataTypeId::kList: | ||
| // An empty list blob (no elements) serves as the default value for list | ||
| // properties. ListView::size() returns 0 when the blob is shorter than | ||
| // 4 bytes, so an empty string_view is a valid representation. | ||
| default_value.set_list_data(std::string_view{}); | ||
| break; |
There was a problem hiding this comment.
2. Wal cannot serialize lists 🐞 Bug ☼ Reliability
WAL redo serialization writes Property values with arc << prop, but Property archive operators do not handle DataTypeId::kList, so inserts/updates with list properties will throw during WAL writing or fail during recovery ingestion. This breaks durability/recovery once list properties are used.
Agent Prompt
### Issue description
WAL serialization uses `arc << Property`, but `Property` serialization/deserialization does not support `DataTypeId::kList`. This will throw when a list property is present.
### Issue Context
Redo logs serialize all properties on insert/update. With list properties enabled, WAL must be able to write/read list blobs.
### Fix Focus Areas
- src/utils/property/property.cc[74-160]
- src/transaction/wal/wal.cc[374-415]
### Suggested fix
1. Extend `InArchive& operator<<(InArchive&, const Property&)` to handle `DataTypeId::kList` by writing the raw list blob (`as_list_data()`) similarly to how varchar writes `as_string_view()`.
2. Extend `OutArchive& operator>>(OutArchive&, Property&)` to handle reading `kList` by reading a `std::string_view` blob and calling `set_list_data(...)` (or the new owning setter if you implement ownership).
3. Add a WAL roundtrip test that inserts a vertex with a list property, flushes WAL, and replays it to ensure list blobs survive recovery.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| template <typename T> | ||
| void append_pod(const T& val) { | ||
| static_assert(std::is_trivially_copyable_v<T>, | ||
| "append_pod requires a trivially copyable type"); |
There was a problem hiding this comment.
3. Missing type_traits include 🐞 Bug ≡ Correctness
include/neug/utils/property/list_view.h uses std::is_trivially_copyable_v but does not include <type_traits>, which can cause compilation failures depending on transitive include order. This breaks header self-sufficiency.
Agent Prompt
### Issue description
`list_view.h` uses `std::is_trivially_copyable_v` but doesn't include `<type_traits>`.
### Fix Focus Areas
- include/neug/utils/property/list_view.h[15-26]
### Suggested fix
Add:
```cpp
#include <type_traits>
```
near the other standard includes.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if (is_pod_type(child_type.id())) { | ||
| switch (child_type.id()) { | ||
| #define APPEND_POD(type_enum, cpp_type) \ | ||
| case DataTypeId::type_enum: \ | ||
| for (const auto& c : children) { \ | ||
| builder.append_pod(c.GetValue<cpp_type>()); \ | ||
| } \ | ||
| return builder.finish_pod<cpp_type>(); | ||
| APPEND_POD(kBoolean, bool) | ||
| APPEND_POD(kInt32, int32_t) | ||
| APPEND_POD(kInt64, int64_t) | ||
| APPEND_POD(kUInt32, uint32_t) | ||
| APPEND_POD(kUInt64, uint64_t) | ||
| APPEND_POD(kFloat, float) | ||
| APPEND_POD(kDouble, double) | ||
| APPEND_POD(kDate, date_t) | ||
| APPEND_POD(kTimestampMs, timestamp_ms_t) | ||
| APPEND_POD(kInterval, interval_t) | ||
| #undef APPEND_POD | ||
| default: | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
4. Missing pod list child types 🐞 Bug ≡ Correctness
ValueToListBlob and ListViewToValue only support a subset of POD child types, but is_pod_type() classifies int8/int16/uint8/uint16 as POD. Lists of these element types will fail conversion between execution Values and storage blobs (either falling into the non-POD path and throwing, or being treated as unsupported).
Agent Prompt
### Issue description
`is_pod_type()` includes `kInt8`, `kInt16`, `kUInt8`, `kUInt16`, but `ValueToListBlob` / `ListViewToValue` don't implement these cases. This makes some list schemas unusable.
### Fix Focus Areas
- src/execution/common/types/value.cc[745-790]
- src/execution/common/types/value.cc[937-1000]
- include/neug/utils/property/list_view.h[35-55]
### Suggested fix
1. Add APPEND_POD cases in `ValueToListBlob` for:
- `kInt8` -> `int8_t`
- `kInt16` -> `int16_t`
- `kUInt8` -> `uint8_t`
- `kUInt16` -> `uint16_t`
2. Add corresponding cases in `ListViewToValue` to construct `Value::INT8/INT16/UINT8/UINT16` if those constructors exist; otherwise, implement them or decide to explicitly mark these child types unsupported and remove them from `is_pod_type()`.
3. Add unit tests for these missing element types.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| void resize(size_t size, const Property& default_value) override { | ||
| if (default_value.type() != DataTypeId::kList && | ||
| default_value.type() != DataTypeId::kEmpty) { | ||
| THROW_RUNTIME_ERROR("Default value type does not match list column"); | ||
| } | ||
| resize(size); | ||
| // Leave entries zero-initialized (empty lists) for new slots. | ||
| } |
There was a problem hiding this comment.
5. Listcolumn ignores non-empty defaults 🐞 Bug ≡ Correctness
ListColumn::resize(size, default_value) validates the type and then does not apply default_value to newly-added rows, leaving them empty lists. This violates the ColumnBase contract used elsewhere (TypedColumn fills new rows with the provided default) and breaks schema DEFAULT list values.
Agent Prompt
### Issue description
`ListColumn::resize(size, default_value)` does not populate new rows with `default_value`, unlike other column implementations.
### Issue Context
Schema default values are used when tables grow (e.g., `EnsureCapacity`). For list columns, non-empty defaults will be silently lost.
### Fix Focus Areas
- include/neug/utils/property/column.h[538-545]
- include/neug/utils/property/column.h[554-575]
- src/storages/graph/vertex_table.cc[185-187]
### Suggested fix
1. Implement default filling for newly-added indices similar to `TypedColumn<T>::resize`:
- Compute `old_size`, resize storage, then for i in [old_size, size): set items_ to reference the default blob.
2. To avoid duplicating blob bytes N times, you can store the default blob once in `data_` and set all new `items_[i]` to the same `{offset,length}`.
3. If the default blob is empty, leave the zero-initialized entries as-is.
4. Add a unit/integration test that defines a list column with a non-empty DEFAULT and verifies inserted rows (or resized capacity rows) read back that default.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Committed-by: Xiaoli Zhou from Dev container
Support Type List in storage.
Fix #159