Skip to content

feat: Support List in storage#157

Draft
zhanglei1949 wants to merge 2 commits intoalibaba:mainfrom
zhanglei1949:zl/feat-list
Draft

feat: Support List in storage#157
zhanglei1949 wants to merge 2 commits intoalibaba:mainfrom
zhanglei1949:zl/feat-list

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented Apr 1, 2026

Support Type List in storage.
Fix #159

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Add List type support in storage layer with ListView and ListColumn

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add comprehensive List type support in storage layer
• Implement ListView and ListViewBuilder for efficient list serialization
• Create ListColumn for persistent list property storage
• Add conversion functions between execution Values and storage ListViews
• Support nested lists and both POD and variable-length element types
Diagram
flowchart LR
  A["Execution Value::LIST"] -->|ListViewToValue| B["ListView"]
  B -->|get_view| C["ListColumn"]
  C -->|set_value| D["ListViewBuilder"]
  D -->|finish_pod/varlen| E["Binary Blob"]
  E -->|storage| F["Persistent Storage"]
  A -->|ValueToListBlob| E
  G["Property"] -->|as_list_data| E
  E -->|from_list_data| G
Loading

Grey Divider

File Changes

1. include/neug/utils/property/list_view.h ✨ Enhancement +241/-0

New ListView and ListViewBuilder implementations

include/neug/utils/property/list_view.h


2. include/neug/utils/property/column.h ✨ Enhancement +199/-0

Add ListColumn class for list storage

include/neug/utils/property/column.h


3. include/neug/utils/property/property.h ✨ Enhancement +31/-0

Add list data accessors to Property class

include/neug/utils/property/property.h


View more (19)
4. include/neug/execution/common/types/value.h ✨ Enhancement +6/-0

Add ListViewToValue conversion function declaration

include/neug/execution/common/types/value.h


5. src/execution/common/types/value.cc ✨ Enhancement +122/-0

Implement list serialization and conversion functions

src/execution/common/types/value.cc


6. src/utils/pb_utils.cc ✨ Enhancement +66/-2

Add protobuf to list type conversion support

src/utils/pb_utils.cc


7. src/utils/property/column.cc ✨ Enhancement +3/-0

Register ListColumn in CreateColumn factory

src/utils/property/column.cc


8. src/utils/property/property.cc ✨ Enhancement +6/-0

Add list default value support

src/utils/property/property.cc


9. src/utils/yaml_utils.cc ✨ Enhancement +5/-0

Add YAML serialization for list types

src/utils/yaml_utils.cc


10. src/compiler/gopt/g_ddl_converter.cpp ✨ Enhancement +5/-5

Replace convertSimpleLogicalType with convertLogicalType

src/compiler/gopt/g_ddl_converter.cpp


11. src/compiler/gopt/g_expr_converter.cpp 🐞 Bug fix +2/-4

Allow empty lists in array conversion

src/compiler/gopt/g_expr_converter.cpp


12. tests/storage/test_list_column.cc 🧪 Tests +527/-0

Comprehensive test suite for list functionality

tests/storage/test_list_column.cc


13. tests/storage/CMakeLists.txt 🧪 Tests +3/-1

Register test_list_column test target

tests/storage/CMakeLists.txt


14. tools/python_bind/tests/test_ddl.py 🧪 Tests +15/-0

Add test for list type DDL and queries

tools/python_bind/tests/test_ddl.py


15. tools/python_bind/tests/test_lsqb.py Formatting +0/-1

Minor formatting cleanup

tools/python_bind/tests/test_lsqb.py


16. tools/python_bind/neug/proto/basic_type_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/basic_type_pb2.py


17. tools/python_bind/neug/proto/common_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/common_pb2.py


18. tools/python_bind/neug/proto/error_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/error_pb2.py


19. tools/python_bind/neug/proto/expr_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/expr_pb2.py


20. tools/python_bind/neug/proto/results_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/results_pb2.py


21. tools/python_bind/neug/proto/type_pb2.py Formatting +0/-1

Remove unnecessary blank line in proto file

tools/python_bind/neug/proto/type_pb2.py


22. tools/python_bind/setup.py Formatting +0/-1

Remove unnecessary blank line in setup

tools/python_bind/setup.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Apr 1, 2026

Code Review by Qodo

🐞 Bugs (7) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Dangling list blob view 🐞 Bug ≡ Correctness
Description
execution::value_to_property builds a temporary std::string blob for list values and returns
Property::from_list_data(blob), but Property stores list data as std::string_view so it dangles
immediately. This can cause use-after-free when inserting into storage (Table::insert ->
ColumnBase::set_any) and can corrupt stored list data or crash.
Code

src/execution/common/types/value.cc[R817-820]

+  case DataTypeId::kList: {
+    std::string blob = ValueToListBlob(value);
+    return Property::from_list_data(blob);
+  }
Evidence
The list Property stores only a string_view (no ownership), but value_to_property constructs the
blob as a local std::string and returns a Property that references it; later, insert paths store the
Property in vectors and then call set_any which reads the dangling view to memcpy into column
buffers.

src/execution/common/types/value.cc[817-820]
include/neug/utils/property/property.h[146-166]
src/execution/common/operators/insert/create_vertex.cc[60-81]
src/utils/property/table.cc[278-285]
include/neug/utils/property/column.h[573-575]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`Property` stores list blobs as `std::string_view`, but `execution::value_to_property` creates the blob as a temporary `std::string` and returns `Property::from_list_data(blob)`. The returned `Property` immediately holds a dangling pointer.

### Issue Context
List blobs are commonly produced dynamically (serialization from `Value::LIST`), so the system needs an ownership strategy. The current Property implementation uses `memcpy`-based assignment, so adding owning fields requires updating copy/assignment semantics.

### Fix Focus Areas
- src/execution/common/types/value.cc[817-820]
- include/neug/utils/property/property.h[146-166]
- src/utils/property/table.cc[278-285]

### Suggested fix
1. Introduce ownership for variable-length payloads (at minimum for `kList`, and ideally unify with `kVarchar`). Options:
  - Add an owning buffer inside `Property` (e.g., `std::shared_ptr<std::string>` or `std::string` + proper copy/move/assignment/destructor) and make `as_list_data()` return a view into owned storage.
  - Or redesign the insert path to avoid storing list blobs inside `Property` as a view (e.g., pass an owned blob alongside the Property vector).
2. Update `Property::operator=` and any serialization/deserialization code to handle the new ownership model safely (remove `memcpy`-based copying for non-trivial members).
3. Add tests that insert list properties via the normal execution/storage insert path to catch lifetime bugs (not just unit tests over builders).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. WAL cannot serialize lists 🐞 Bug ☼ Reliability
Description
WAL redo serialization writes Property values with arc << prop, but Property archive operators do
not handle DataTypeId::kList, so inserts/updates with list properties will throw during WAL writing
or fail during recovery ingestion. This breaks durability/recovery once list properties are used.
Code

src/utils/property/property.cc[R60-65]

+  case DataTypeId::kList:
+    // An empty list blob (no elements) serves as the default value for list
+    // properties.  ListView::size() returns 0 when the blob is shorter than
+    // 4 bytes, so an empty string_view is a valid representation.
+    default_value.set_list_data(std::string_view{});
+    break;
Evidence
WAL InsertVertexRedo serializes every Property via the Property operator<<, but that operator only
supports scalar types and throws for unsupported types; there is no kList case implemented, so any
list Property will fail serialization.

src/transaction/wal/wal.cc[374-383]
src/utils/property/property.cc[74-103]
include/neug/utils/property/property.h[146-158]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
WAL serialization uses `arc << Property`, but `Property` serialization/deserialization does not support `DataTypeId::kList`. This will throw when a list property is present.

### Issue Context
Redo logs serialize all properties on insert/update. With list properties enabled, WAL must be able to write/read list blobs.

### Fix Focus Areas
- src/utils/property/property.cc[74-160]
- src/transaction/wal/wal.cc[374-415]

### Suggested fix
1. Extend `InArchive& operator<<(InArchive&, const Property&)` to handle `DataTypeId::kList` by writing the raw list blob (`as_list_data()`) similarly to how varchar writes `as_string_view()`.
2. Extend `OutArchive& operator>>(OutArchive&, Property&)` to handle reading `kList` by reading a `std::string_view` blob and calling `set_list_data(...)` (or the new owning setter if you implement ownership).
3. Add a WAL roundtrip test that inserts a vertex with a list property, flushes WAL, and replays it to ensure list blobs survive recovery.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Missing type_traits include 🐞 Bug ≡ Correctness
Description
include/neug/utils/property/list_view.h uses std::is_trivially_copyable_v but does not include
<type_traits>, which can cause compilation failures depending on transitive include order. This
breaks header self-sufficiency.
Code

include/neug/utils/property/list_view.h[R167-170]

+  template <typename T>
+  void append_pod(const T& val) {
+    static_assert(std::is_trivially_copyable_v<T>,
+                  "append_pod requires a trivially copyable type");
Evidence
The header directly references std::is_trivially_copyable_v yet its include list omits
<type_traits>, so builds that include this header without other incidental includes will fail.

include/neug/utils/property/list_view.h[15-26]
include/neug/utils/property/list_view.h[166-173]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`list_view.h` uses `std::is_trivially_copyable_v` but doesn't include `<type_traits>`.

### Fix Focus Areas
- include/neug/utils/property/list_view.h[15-26]

### Suggested fix
Add:
```cpp
#include <type_traits>
```
near the other standard includes.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (2)
4. Missing POD list child types 🐞 Bug ≡ Correctness
Description
ValueToListBlob and ListViewToValue only support a subset of POD child types, but is_pod_type()
classifies int8/int16/uint8/uint16 as POD. Lists of these element types will fail conversion between
execution Values and storage blobs (either falling into the non-POD path and throwing, or being
treated as unsupported).
Code

src/execution/common/types/value.cc[R753-775]

+  if (is_pod_type(child_type.id())) {
+    switch (child_type.id()) {
+#define APPEND_POD(type_enum, cpp_type)           \
+  case DataTypeId::type_enum:                     \
+    for (const auto& c : children) {              \
+      builder.append_pod(c.GetValue<cpp_type>()); \
+    }                                             \
+    return builder.finish_pod<cpp_type>();
+      APPEND_POD(kBoolean, bool)
+      APPEND_POD(kInt32, int32_t)
+      APPEND_POD(kInt64, int64_t)
+      APPEND_POD(kUInt32, uint32_t)
+      APPEND_POD(kUInt64, uint64_t)
+      APPEND_POD(kFloat, float)
+      APPEND_POD(kDouble, double)
+      APPEND_POD(kDate, date_t)
+      APPEND_POD(kTimestampMs, timestamp_ms_t)
+      APPEND_POD(kInterval, interval_t)
+#undef APPEND_POD
+    default:
+      break;
+    }
+  }
Evidence
is_pod_type() returns true for kInt8/kInt16/kUInt8/kUInt16, but ValueToListBlob’s POD switch does
not handle them; ListViewToValue’s switch similarly omits them, so these list types cannot roundtrip
between execution and storage.

include/neug/utils/property/list_view.h[35-55]
src/execution/common/types/value.cc[752-790]
src/execution/common/types/value.cc[943-997]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`is_pod_type()` includes `kInt8`, `kInt16`, `kUInt8`, `kUInt16`, but `ValueToListBlob` / `ListViewToValue` don't implement these cases. This makes some list schemas unusable.

### Fix Focus Areas
- src/execution/common/types/value.cc[745-790]
- src/execution/common/types/value.cc[937-1000]
- include/neug/utils/property/list_view.h[35-55]

### Suggested fix
1. Add APPEND_POD cases in `ValueToListBlob` for:
  - `kInt8` -> `int8_t`
  - `kInt16` -> `int16_t`
  - `kUInt8` -> `uint8_t`
  - `kUInt16` -> `uint16_t`
2. Add corresponding cases in `ListViewToValue` to construct `Value::INT8/INT16/UINT8/UINT16` if those constructors exist; otherwise, implement them or decide to explicitly mark these child types unsupported and remove them from `is_pod_type()`.
3. Add unit tests for these missing element types.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. ListColumn ignores non-empty defaults 🐞 Bug ≡ Correctness
Description
ListColumn::resize(size, default_value) validates the type and then does not apply default_value to
newly-added rows, leaving them empty lists. This violates the ColumnBase contract used elsewhere
(TypedColumn fills new rows with the provided default) and breaks schema DEFAULT list values.
Code

include/neug/utils/property/column.h[R538-545]

+  void resize(size_t size, const Property& default_value) override {
+    if (default_value.type() != DataTypeId::kList &&
+        default_value.type() != DataTypeId::kEmpty) {
+      THROW_RUNTIME_ERROR("Default value type does not match list column");
+    }
+    resize(size);
+    // Leave entries zero-initialized (empty lists) for new slots.
+  }
Evidence
TypedColumn::resize(size, default_value) explicitly writes default_value into each newly added row,
and VertexTable grows storage by calling Table::resize(capacity, schema default values).
ListColumn’s implementation discards non-empty defaults, so newly allocated rows will not match the
schema defaults for list-typed properties.

include/neug/utils/property/column.h[140-153]
include/neug/utils/property/column.h[538-545]
src/storages/graph/vertex_table.cc[185-187]
src/utils/property/table.cc[295-305]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ListColumn::resize(size, default_value)` does not populate new rows with `default_value`, unlike other column implementations.

### Issue Context
Schema default values are used when tables grow (e.g., `EnsureCapacity`). For list columns, non-empty defaults will be silently lost.

### Fix Focus Areas
- include/neug/utils/property/column.h[538-545]
- include/neug/utils/property/column.h[554-575]
- src/storages/graph/vertex_table.cc[185-187]

### Suggested fix
1. Implement default filling for newly-added indices similar to `TypedColumn<T>::resize`:
  - Compute `old_size`, resize storage, then for i in [old_size, size): set items_ to reference the default blob.
2. To avoid duplicating blob bytes N times, you can store the default blob once in `data_` and set all new `items_[i]` to the same `{offset,length}`.
3. If the default blob is empty, leave the zero-initialized entries as-is.
4. Add a unit/integration test that defines a list column with a non-empty DEFAULT and verifies inserted rows (or resized capacity rows) read back that default.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

6. ListView lacks blob validation 🐞 Bug ☼ Reliability
Description
ListView parses raw bytes with reinterpret_cast and uses only asserts for bounds;
malformed/corrupted blobs can cause out-of-bounds reads or undefined behavior (including unaligned
loads). This can crash the process when reading corrupted storage or untrusted inputs.
Code

include/neug/utils/property/list_view.h[R93-139]

+  // Number of elements stored in this list.
+  size_t size() const {
+    if (data_.size() < sizeof(uint32_t)) {
+      return 0;
+    }
+    return static_cast<size_t>(
+        *reinterpret_cast<const uint32_t*>(data_.data()));
+  }
+
+  // Direct element access for POD child types.
+  // The template argument T must match the actual element C++ type.
+  template <typename T>
+  const T& GetElem(size_t idx) const {
+    assert(is_pod_type(ListType::GetChildType(type_).id()));
+    assert(idx < size());
+    return reinterpret_cast<const T*>(data_.data() + sizeof(uint32_t))[idx];
+  }
+
+  // Element access for varchar (string) child type.
+  std::string_view GetChildStringView(size_t idx) const {
+    return child_span(idx);
+  }
+
+  // Element access for a nested List child type.
+  // The returned ListView borrows from the same underlying buffer.
+  ListView GetChildListView(size_t idx) const {
+    const DataType& child = ListType::GetChildType(type_);
+    return ListView(child, child_span(idx));
+  }
+
+  const DataType& type_;
+  std::string_view data_;
+
+ private:
+  // Returns the raw byte span for the variable-length element at index idx.
+  // Must only be called for non-POD encodings.
+  std::string_view child_span(size_t idx) const {
+    assert(!data_.empty());
+    const uint32_t count = *reinterpret_cast<const uint32_t*>(data_.data());
+    assert(idx < static_cast<size_t>(count));
+    const uint32_t* offsets =
+        reinterpret_cast<const uint32_t*>(data_.data() + sizeof(uint32_t));
+    // data region starts right after the (count+1) offsets
+    const char* data_start =
+        data_.data() + sizeof(uint32_t) + (count + 1) * sizeof(uint32_t);
+    return std::string_view(data_start + offsets[idx],
+                            offsets[idx + 1] - offsets[idx]);
Evidence
ListView reads count and offsets via reinterpret_cast<const uint32_t*> and then indexes into the
buffer without verifying that the buffer length is sufficient for the declared count/offsets. These
checks are only asserts, so release builds may read past the end of the buffer on corrupted data.

include/neug/utils/property/list_view.h[93-109]
include/neug/utils/property/list_view.h[129-140]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ListView` does not validate blob size/layout before reading count/offsets/elements; it relies on `assert` and performs potentially unaligned `reinterpret_cast` loads.

### Fix Focus Areas
- include/neug/utils/property/list_view.h[93-140]

### Suggested fix
1. Replace direct dereferences of `reinterpret_cast<uint32_t*>` with `std::memcpy` into a local `uint32_t` to avoid alignment UB.
2. Add explicit size checks:
  - Ensure `data_.size() >= 4` before reading count.
  - For POD: ensure `data_.size() >= 4 + count*sizeof(T)`.
  - For varlen: ensure `data_.size() >= 4 + (count+1)*4`, then validate `off[count]` and each `off[i]` is monotonic and within bounds.
3. Decide on behavior for invalid blobs (throw exception vs. return empty list) and apply consistently.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


7. List length truncation risk 🐞 Bug ☼ Reliability
Description
ListColumn stores blob length in a uint32_t without checking overflow, so blobs larger than 4GiB
will wrap/truncate and later reads will view incorrect spans. This can lead to corrupted reads and
potential memory errors.
Code

include/neug/utils/property/column.h[R569-571]

+    items_.set(idx, {static_cast<uint64_t>(offset),
+                     static_cast<uint32_t>(blob.size())});
+  }
Evidence
set_value casts blob.size() to uint32_t when writing list_storage_item.length, but the API accepts
arbitrary std::string_view sizes; without validation, large lists can be stored with incorrect
length metadata.

include/neug/utils/property/column.h[436-440]
include/neug/utils/property/column.h[554-571]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ListColumn` truncates blob sizes to `uint32_t`.

### Fix Focus Areas
- include/neug/utils/property/column.h[436-440]
- include/neug/utils/property/column.h[554-571]

### Suggested fix
Either:
1. Add an explicit guard in `set_value`:
  - If `blob.size() > std::numeric_limits<uint32_t>::max()`, throw a runtime error.

Or:
2. Change `list_storage_item.length` to `uint64_t` (and update serialization/layout accordingly) if you want to support blobs >4GiB.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@zhanglei1949 zhanglei1949 marked this pull request as draft April 1, 2026 08:43
@zhanglei1949
Copy link
Copy Markdown
Member Author

zhanglei1949 commented Apr 1, 2026

Currently the implementation is finished.
@shirly121 Could you please implement the compiler side code? Currently, it fails at compiler side for the test:
python3 -m pytest -sv tests/test_ddl.py -k test_list_type

E           RuntimeError: Failed to execute query: CREATE (:TestNode {id: 1, tags: ['tag1', 'tag2']});. Error code: 1011, Error Message: ERR_NOT_SUPPORTED: Not supported: Unexpected type: 100 at /data/zhanglei/alibaba/neug/src/execution/common/types/value.cc:822 func: value_to_property, Error Code: 1011

It turns out the it used the to_tuple message: https://github.com/alibaba/neug/blob/main/src/compiler/gopt/g_expr_converter.cpp#L717

Comment on lines +817 to +820
case DataTypeId::kList: {
std::string blob = ValueToListBlob(value);
return Property::from_list_data(blob);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Dangling list blob view 🐞 Bug ≡ Correctness

execution::value_to_property builds a temporary std::string blob for list values and returns
Property::from_list_data(blob), but Property stores list data as std::string_view so it dangles
immediately. This can cause use-after-free when inserting into storage (Table::insert ->
ColumnBase::set_any) and can corrupt stored list data or crash.
Agent Prompt
### Issue description
`Property` stores list blobs as `std::string_view`, but `execution::value_to_property` creates the blob as a temporary `std::string` and returns `Property::from_list_data(blob)`. The returned `Property` immediately holds a dangling pointer.

### Issue Context
List blobs are commonly produced dynamically (serialization from `Value::LIST`), so the system needs an ownership strategy. The current Property implementation uses `memcpy`-based assignment, so adding owning fields requires updating copy/assignment semantics.

### Fix Focus Areas
- src/execution/common/types/value.cc[817-820]
- include/neug/utils/property/property.h[146-166]
- src/utils/property/table.cc[278-285]

### Suggested fix
1. Introduce ownership for variable-length payloads (at minimum for `kList`, and ideally unify with `kVarchar`). Options:
   - Add an owning buffer inside `Property` (e.g., `std::shared_ptr<std::string>` or `std::string` + proper copy/move/assignment/destructor) and make `as_list_data()` return a view into owned storage.
   - Or redesign the insert path to avoid storing list blobs inside `Property` as a view (e.g., pass an owned blob alongside the Property vector).
2. Update `Property::operator=` and any serialization/deserialization code to handle the new ownership model safely (remove `memcpy`-based copying for non-trivial members).
3. Add tests that insert list properties via the normal execution/storage insert path to catch lifetime bugs (not just unit tests over builders).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +60 to +65
case DataTypeId::kList:
// An empty list blob (no elements) serves as the default value for list
// properties. ListView::size() returns 0 when the blob is shorter than
// 4 bytes, so an empty string_view is a valid representation.
default_value.set_list_data(std::string_view{});
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

2. Wal cannot serialize lists 🐞 Bug ☼ Reliability

WAL redo serialization writes Property values with arc << prop, but Property archive operators do
not handle DataTypeId::kList, so inserts/updates with list properties will throw during WAL writing
or fail during recovery ingestion. This breaks durability/recovery once list properties are used.
Agent Prompt
### Issue description
WAL serialization uses `arc << Property`, but `Property` serialization/deserialization does not support `DataTypeId::kList`. This will throw when a list property is present.

### Issue Context
Redo logs serialize all properties on insert/update. With list properties enabled, WAL must be able to write/read list blobs.

### Fix Focus Areas
- src/utils/property/property.cc[74-160]
- src/transaction/wal/wal.cc[374-415]

### Suggested fix
1. Extend `InArchive& operator<<(InArchive&, const Property&)` to handle `DataTypeId::kList` by writing the raw list blob (`as_list_data()`) similarly to how varchar writes `as_string_view()`.
2. Extend `OutArchive& operator>>(OutArchive&, Property&)` to handle reading `kList` by reading a `std::string_view` blob and calling `set_list_data(...)` (or the new owning setter if you implement ownership).
3. Add a WAL roundtrip test that inserts a vertex with a list property, flushes WAL, and replays it to ensure list blobs survive recovery.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +167 to +170
template <typename T>
void append_pod(const T& val) {
static_assert(std::is_trivially_copyable_v<T>,
"append_pod requires a trivially copyable type");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Missing type_traits include 🐞 Bug ≡ Correctness

include/neug/utils/property/list_view.h uses std::is_trivially_copyable_v but does not include
<type_traits>, which can cause compilation failures depending on transitive include order. This
breaks header self-sufficiency.
Agent Prompt
### Issue description
`list_view.h` uses `std::is_trivially_copyable_v` but doesn't include `<type_traits>`.

### Fix Focus Areas
- include/neug/utils/property/list_view.h[15-26]

### Suggested fix
Add:
```cpp
#include <type_traits>
```
near the other standard includes.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +753 to +775
if (is_pod_type(child_type.id())) {
switch (child_type.id()) {
#define APPEND_POD(type_enum, cpp_type) \
case DataTypeId::type_enum: \
for (const auto& c : children) { \
builder.append_pod(c.GetValue<cpp_type>()); \
} \
return builder.finish_pod<cpp_type>();
APPEND_POD(kBoolean, bool)
APPEND_POD(kInt32, int32_t)
APPEND_POD(kInt64, int64_t)
APPEND_POD(kUInt32, uint32_t)
APPEND_POD(kUInt64, uint64_t)
APPEND_POD(kFloat, float)
APPEND_POD(kDouble, double)
APPEND_POD(kDate, date_t)
APPEND_POD(kTimestampMs, timestamp_ms_t)
APPEND_POD(kInterval, interval_t)
#undef APPEND_POD
default:
break;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

4. Missing pod list child types 🐞 Bug ≡ Correctness

ValueToListBlob and ListViewToValue only support a subset of POD child types, but is_pod_type()
classifies int8/int16/uint8/uint16 as POD. Lists of these element types will fail conversion between
execution Values and storage blobs (either falling into the non-POD path and throwing, or being
treated as unsupported).
Agent Prompt
### Issue description
`is_pod_type()` includes `kInt8`, `kInt16`, `kUInt8`, `kUInt16`, but `ValueToListBlob` / `ListViewToValue` don't implement these cases. This makes some list schemas unusable.

### Fix Focus Areas
- src/execution/common/types/value.cc[745-790]
- src/execution/common/types/value.cc[937-1000]
- include/neug/utils/property/list_view.h[35-55]

### Suggested fix
1. Add APPEND_POD cases in `ValueToListBlob` for:
   - `kInt8` -> `int8_t`
   - `kInt16` -> `int16_t`
   - `kUInt8` -> `uint8_t`
   - `kUInt16` -> `uint16_t`
2. Add corresponding cases in `ListViewToValue` to construct `Value::INT8/INT16/UINT8/UINT16` if those constructors exist; otherwise, implement them or decide to explicitly mark these child types unsupported and remove them from `is_pod_type()`.
3. Add unit tests for these missing element types.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +538 to +545
void resize(size_t size, const Property& default_value) override {
if (default_value.type() != DataTypeId::kList &&
default_value.type() != DataTypeId::kEmpty) {
THROW_RUNTIME_ERROR("Default value type does not match list column");
}
resize(size);
// Leave entries zero-initialized (empty lists) for new slots.
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

5. Listcolumn ignores non-empty defaults 🐞 Bug ≡ Correctness

ListColumn::resize(size, default_value) validates the type and then does not apply default_value to
newly-added rows, leaving them empty lists. This violates the ColumnBase contract used elsewhere
(TypedColumn fills new rows with the provided default) and breaks schema DEFAULT list values.
Agent Prompt
### Issue description
`ListColumn::resize(size, default_value)` does not populate new rows with `default_value`, unlike other column implementations.

### Issue Context
Schema default values are used when tables grow (e.g., `EnsureCapacity`). For list columns, non-empty defaults will be silently lost.

### Fix Focus Areas
- include/neug/utils/property/column.h[538-545]
- include/neug/utils/property/column.h[554-575]
- src/storages/graph/vertex_table.cc[185-187]

### Suggested fix
1. Implement default filling for newly-added indices similar to `TypedColumn<T>::resize`:
   - Compute `old_size`, resize storage, then for i in [old_size, size): set items_ to reference the default blob.
2. To avoid duplicating blob bytes N times, you can store the default blob once in `data_` and set all new `items_[i]` to the same `{offset,length}`.
3. If the default blob is empty, leave the zero-initialized entries as-is.
4. Add a unit/integration test that defines a list column with a non-empty DEFAULT and verifies inserted rows (or resized capacity rows) read back that default.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Committed-by: Xiaoli Zhou from Dev container
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce support for List type in storage.

2 participants