Skip to content

refactor: Introduce workspace, module, checkpoint#148

Open
zhanglei1949 wants to merge 3 commits intoalibaba:mainfrom
zhanglei1949:zl/workspace-ref
Open

refactor: Introduce workspace, module, checkpoint#148
zhanglei1949 wants to merge 3 commits intoalibaba:mainfrom
zhanglei1949:zl/workspace-ref

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented Mar 31, 2026

Refactor the management of database files.
Fix #133
FIx #125
Fix #271
Fix #262

Open & Close LDBC SNB sf300 takes 13mins

@zhanglei1949 zhanglei1949 marked this pull request as draft March 31, 2026 06:10
@qodo-code-review
Copy link
Copy Markdown

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

Review Summary by Qodo

Refactor storage management with Module abstraction, Workspace/Checkpoint framework, and ModuleFactory registry

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Introduces comprehensive module abstraction layer with Module base class defining unified
  lifecycle interface (Open, Dump, Close, Fork) for all storage components
• Implements ModuleFactory singleton with type-erased instantiation and registration mechanism via
  NEUG_REGISTER_MODULE macro
• Introduces Workspace and Checkpoint abstractions for managing numbered checkpoint directories
  with snapshot/runtime/wal subdirectories
• Implements ModuleDescriptor metadata structure with JSON serialization for module inventory
  tracking
• Introduces SnapshotMeta module managing checkpoint's module inventory and persisting metadata as
  JSON
• Refactors PropertyGraph, VertexTable, EdgeTable, Schema, CsrBase, ColumnBase, and
  LFIndexer to inherit from Module and implement the unified lifecycle interface
• Registers 78+ template specializations (MutableCsr, ImmutableCsr, TypedColumn) with ModuleFactory
  covering all data types
• Adds StorageTypeName template trait for C++ type to string mapping with macro-based
  specialization
• Refactors NeugDB to use Workspace for checkpoint discovery and opens graphs from latest
  checkpoint
• Updates all storage tests and benchmarks to use new Checkpoint-based API instead of legacy
  directory management
• Creates new CMake target neug_storages_module for module factory and base classes
Diagram
flowchart LR
  A["Module Base Class<br/>Open/Dump/Close/Fork"] --> B["ModuleFactory<br/>Registry & Creation"]
  C["Workspace<br/>Checkpoint Management"] --> D["Checkpoint<br/>Numbered Directories"]
  D --> E["SnapshotMeta<br/>Module Inventory"]
  E --> F["Schema<br/>VertexTable<br/>EdgeTable"]
  F --> G["CsrBase<br/>ColumnBase<br/>LFIndexer"]
  G --> B
  H["PropertyGraph"] --> C
  H --> A
  I["NeugDB"] --> H
Loading

Grey Divider

File Changes

1. include/neug/storages/module/module.h ✨ Enhancement +106/-0

Abstract Module interface for storage lifecycle management

• Introduces abstract Module base class defining unified lifecycle interface for storage
 components
• Declares four core operations: Open, Dump, Close, and Fork driven by Checkpoint context
• Provides forward declaration of Checkpoint to avoid circular includes
• Includes ModuleTypeName() virtual method for factory registration

include/neug/storages/module/module.h


2. include/neug/storages/module/module_factory.h ✨ Enhancement +147/-0

Module factory registry for dynamic type instantiation

• Introduces ModuleFactory singleton for type-erased Module instantiation
• Provides registration mechanism via NEUG_REGISTER_MODULE macro
• Implements Create, List, and Has methods for module discovery
• Includes ModuleRegistrar RAII helper for static registration

include/neug/storages/module/module_factory.h


3. src/storages/module/module_factory.cc ✨ Enhancement +55/-0

ModuleFactory implementation with registration support

• Implements ModuleFactory singleton instance and registration logic
• Provides Create method to instantiate modules by type name
• Implements List and Has methods for module enumeration

src/storages/module/module_factory.cc


View more (34)
4. include/neug/storages/workspace.h ✨ Enhancement +177/-0

Workspace and Checkpoint abstractions for checkpoint management

• Introduces Checkpoint class representing a single numbered checkpoint directory with
 snapshot/runtime/wal subdirs
• Introduces Workspace class managing multiple numbered checkpoints in a database directory
• Provides checkpoint creation, listing, and retrieval operations
• Includes generate_uuid() helper for creating unique snapshot directories

include/neug/storages/workspace.h


5. src/storages/graph/workspace.cc ✨ Enhancement +125/-0

Workspace and Checkpoint implementation

• Implements Checkpoint directory creation and metadata loading/saving
• Implements Workspace checkpoint enumeration and creation with zero-padded ID formatting
• Provides LatestCheckpoint() to retrieve the most recent checkpoint

src/storages/graph/workspace.cc


6. include/neug/storages/module_descriptor.h ✨ Enhancement +181/-0

ModuleDescriptor metadata structure with JSON serialization

• Introduces ModuleDescriptor struct carrying metadata for Module serialization/deserialization
• Supports extra key-value pairs and nested sub-module descriptors
• Provides JSON serialization via rapidjson with ToJson/FromJson methods
• Uses pimpl pattern for self-referential sub_modules map

include/neug/storages/module_descriptor.h


7. src/storages/graph/module_descriptor.cc ✨ Enhancement +226/-0

ModuleDescriptor serialization and lifecycle implementation

• Implements ModuleDescriptor Rule-of-5 (copy/move constructors and assignment)
• Implements JSON serialization/deserialization with rapidjson
• Implements sub-module accessor methods with lazy initialization

src/storages/graph/module_descriptor.cc


8. include/neug/storages/snapshot_meta.h ✨ Enhancement +144/-0

SnapshotMeta module for checkpoint inventory management

• Introduces SnapshotMeta class as Module that manages checkpoint's module inventory
• Provides canonical key helpers for schema, vertex tables, and edge tables
• Implements JSON persistence at checkpoint meta path

include/neug/storages/snapshot_meta.h


9. src/storages/graph/snapshot_meta.cc ✨ Enhancement +145/-0

SnapshotMeta implementation with JSON persistence

• Implements SnapshotMeta module lifecycle: Open loads JSON, Dump persists JSON
• Implements module map access methods and canonical key generation
• Registers SnapshotMeta with ModuleFactory

src/storages/graph/snapshot_meta.cc


10. include/neug/storages/graph/vertex_table.h ✨ Enhancement +40/-6

VertexTable Module interface integration

• Makes VertexTable inherit from Module base class
• Adds snapshot_dir_ member to track snapshot directory location
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds DumpToDir() for legacy flat-directory dumping

include/neug/storages/graph/vertex_table.h


11. src/storages/graph/vertex_table.cc ✨ Enhancement +86/-3

VertexTable Module implementation with Dump/Fork

• Implements Dump(ckp) creating UUID subdirectory under checkpoint runtime dir
• Implements DumpToDir() for legacy flat-directory dumps
• Implements Fork() copying snapshot files and creating independent instance
• Registers VertexTable with ModuleFactory
• Tracks snapshot_dir_ in Open() methods

src/storages/graph/vertex_table.cc


12. include/neug/storages/graph/edge_table.h ✨ Enhancement +31/-4

EdgeTable Module interface integration

• Makes EdgeTable inherit from Module base class
• Adds snapshot_dir_ member to track snapshot directory
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds DumpToDir() for legacy flat-directory dumping

include/neug/storages/graph/edge_table.h


13. src/storages/graph/edge_table.cc ✨ Enhancement +57/-1

EdgeTable Module implementation with Dump/Fork

• Implements Dump(ckp) creating UUID subdirectory and returning descriptor
• Implements DumpToDir() for legacy flat-directory dumps
• Implements Fork() forking CSR and table components independently
• Implements Close() method
• Registers EdgeTable with ModuleFactory
• Tracks snapshot_dir_ in move constructor and Swap()

src/storages/graph/edge_table.cc


14. include/neug/storages/graph/property_graph.h ✨ Enhancement +10/-0

PropertyGraph Checkpoint-based Open method

• Adds new Open(ckp, memory_level) overload accepting Checkpoint
• Enables opening graph from checkpoint via Module interface

include/neug/storages/graph/property_graph.h


15. src/storages/graph/property_graph.cc ✨ Enhancement +139/-56

PropertyGraph Checkpoint-based Open and refactored Dump

• Implements Open(ckp, memory_level) loading SnapshotMeta and opening schema/vertex/edge tables
• Refactors Dump() to use Workspace/Checkpoint and SnapshotMeta instead of temp directories
• Replaces temp checkpoint directory logic with UUID-based runtime directories
• Persists SnapshotMeta JSON instead of separate schema file

src/storages/graph/property_graph.cc


16. include/neug/storages/graph/schema.h ✨ Enhancement +32/-1

Schema Module interface methods

• Adds Module interface methods: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)

include/neug/storages/graph/schema.h


17. src/storages/graph/schema.cc ✨ Enhancement +52/-0

Schema Module implementation with binary/YAML persistence

• Implements Open() deserializing binary schema from descriptor path
• Implements Dump() creating UUID directory and writing binary + YAML schema
• Implements Close() as no-op for in-memory object
• Implements Fork() returning copy of schema
• Registers Schema with ModuleFactory

src/storages/graph/schema.cc


18. include/neug/storages/csr/csr_base.h ✨ Enhancement +47/-1

CsrBase Module interface integration

• Makes CsrBase inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds abstract ForkAsCsr() method for typed fork implementations

include/neug/storages/csr/csr_base.h


19. include/neug/storages/csr/mutable_csr.h ✨ Enhancement +36/-0

MutableCsr Module Fork and type name implementations

• Implements ForkAsCsr() for MutableCsr, SingleMutableCsr, and EmptyCsr templates
• Implements ModuleTypeName() returning type-specific names like "mutable_csr_int64"

include/neug/storages/csr/mutable_csr.h


20. src/storages/csr/mutable_csr.cc ✨ Enhancement +77/-0

MutableCsr ModuleFactory registrations for all types

• Registers 33 CSR template specializations with ModuleFactory (MutableCsr, SingleMutableCsr,
 EmptyCsr)
• Covers all data types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval

src/storages/csr/mutable_csr.cc


21. src/storages/csr/immutable_csr.cc ✨ Enhancement +55/-0

ImmutableCsr ModuleFactory registrations for all types

• Registers 33 ImmutableCsr template specializations with ModuleFactory
• Covers all data types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval

src/storages/csr/immutable_csr.cc


22. include/neug/utils/property/column.h ✨ Enhancement +94/-1

ColumnBase Module interface integration

• Makes ColumnBase inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds abstract ForkAsColumn() method for typed implementations
• Adds opened_prefix_ and opened_dir_ members for tracking state

include/neug/utils/property/column.h


23. include/neug/utils/property/table.cc ✨ Enhancement +0/-0

Table Fork implementation and TypedColumn registrations

• Implements Fork() creating independent table copy with forked columns
• Registers 12 TypedColumn specializations with ModuleFactory
• Covers all types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval,
 string

include/neug/utils/property/table.cc


24. include/neug/utils/id_indexer.h ✨ Enhancement +106/-1

LFIndexer Module interface integration

• Makes LFIndexer inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Fork(ckp, level)
• Adds name_, snapshot_dir_, and memory_level_ members
• Implements set_name() and ModuleTypeName() methods

include/neug/utils/id_indexer.h


25. src/main/neug_db.cc ✨ Enhancement +11/-1

NeugDB checkpoint-based graph opening

• Modifies openGraphAndSchema() to use Workspace for checkpoint discovery
• Opens graph from latest checkpoint if available, otherwise opens empty graph
• Logs checkpoint path when opening from checkpoint

src/main/neug_db.cc


26. tests/storage/test_vertex_table.cc 🧪 Tests +28/-44

VertexTable tests refactored for Checkpoint API

• Replaces manual checkpoint directory creation with Workspace::CreateCheckpoint()
• Updates tests to use ModuleDescriptor and Checkpoint-based Open(ckp, desc) API
• Removes manual temp/checkpoint directory renaming logic
• Simplifies test setup by using workspace abstraction

tests/storage/test_vertex_table.cc


27. tests/storage/test_edge_table.cc 🧪 Tests +16/-3

EdgeTable tests refactored for Checkpoint API

• Updates OpenEdgeTable() to use Workspace and Checkpoint-based API
• Replaces legacy Open(work_dir) calls with Open(ckp, desc)
• Changes Dump() call to DumpToDir() for legacy flat-directory dumping

tests/storage/test_edge_table.cc


28. tests/unittest/test_indexer.cc 🧪 Tests +34/-35

LFIndexer tests refactored for Checkpoint API

• Replaces manual checkpoint directory setup with Workspace::CreateCheckpoint()
• Updates indexer test to use Dump(ckp) and Open(ckp, desc) Module interface
• Removes legacy file_names.h include, adds workspace.h include
• Simplifies test directory structure

tests/unittest/test_indexer.cc


29. tests/storage/vertex_table_benchmark.cc 🧪 Tests +9/-5

VertexTable benchmark updated for Checkpoint API

• Updates benchmark to use Checkpoint and ModuleDescriptor for table opening
• Calls ckp.create_dirs() to initialize checkpoint structure

tests/storage/vertex_table_benchmark.cc


30. src/storages/module/CMakeLists.txt ⚙️ Configuration changes +2/-0

Module storage CMake build configuration

• Creates new CMake target neug_storages_module for module factory and base classes

src/storages/module/CMakeLists.txt


31. include/neug/storages/csr/immutable_csr.h ✨ Enhancement +29/-0

Add module fork and type name methods to CSR classes

• Added include for neug/storages/module/type_name.h to support type name generation
• Implemented ForkAsCsr() method in ImmutableCsr class that creates a UUID-based copy and opens
 it
• Implemented ModuleTypeName() method in ImmutableCsr returning type-specific identifier
• Implemented ForkAsCsr() and ModuleTypeName() methods in SingleImmutableCsr class with
 similar functionality

include/neug/storages/csr/immutable_csr.h


32. include/neug/storages/graph/schema.h ✨ Enhancement +32/-1

Make Schema inherit from Module with lifecycle methods

• Added include for neug/storages/module/module.h to inherit from Module base class
• Changed Schema class to inherit from Module base class
• Implemented module lifecycle methods: Open(), Dump(), Close(), and Fork()
• Added ModuleTypeName() override returning "schema" identifier

include/neug/storages/graph/schema.h


33. include/neug/storages/module/type_name.h ✨ Enhancement +60/-0

New type name mapping trait for storage element types

• Created new header file with StorageTypeName template trait for C++ type to string mapping
• Defined DEFINE_STORAGE_TYPE_NAME macro for specializing type names
• Provided specializations for common types: EmptyType, bool, integer types, floating-point
 types, Date, DateTime, Interval, and std::string_view

include/neug/storages/module/type_name.h


34. include/neug/utils/property/table.h ✨ Enhancement +6/-0

Add Fork method to Table class for column duplication

• Added Fork() method declaration that creates forked property columns to UUID paths under
 checkpoint runtime directory
• Method returns a new Table owning the forked columns

include/neug/utils/property/table.h


35. src/storages/CMakeLists.txt ⚙️ Configuration changes +2/-1

Add module subdirectory to CMake build configuration

• Added add_subdirectory(module) to include the new module subdirectory in build
• Updated NEUG_STORAGES_OBJFILES to include $<TARGET_OBJECTS:neug_storages_module> object files

src/storages/CMakeLists.txt


36. include/neug/storages/file_names.h Additional files +0/-4

...

include/neug/storages/file_names.h


37. src/utils/property/table.cc Additional files +52/-0

...

src/utils/property/table.cc


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Mar 31, 2026

Code Review by Qodo

🐞 Bugs (7) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Checkpoint restore drops data 🐞 Bug ≡ Correctness
Description
PropertyGraph::Open(const Checkpoint&) ignores SnapshotMeta descriptor.path and calls legacy
VertexTable/EdgeTable Open with work_dir=ckp.path(), so they look for data under
checkpoint_dir(ckp.path()) ("…/checkpoint-�NNNNN/checkpoint/") instead of the dumped UUID
directories, yielding empty tables on restart.
Code

src/storages/graph/property_graph.cc[R825-866]

+    vertex_tables_.emplace_back(schema_.get_vertex_schema(i));
+    auto v_label = schema_.get_vertex_label_name(static_cast<label_t>(i));
+    auto key = SnapshotMeta::vertex_table_key(v_label);
+    if (auto desc = meta.module(key)) {
+      vertex_tables_.back().Open(ckp.path(), desc->memory_level);
+    }
+    auto v_size = vertex_tables_[i].Size();
+    vertex_tables_[i].EnsureCapacity(v_size < 4096 ? 4096
+                                                   : v_size + v_size / 4);
+    vertex_capacities[i] = vertex_tables_[i].Capacity();
+  }
+
+  // ── Edge tables ───────────────────────────────────────────────────────────
+  for (size_t src_i = 0; src_i < vertex_label_total_count_; ++src_i) {
+    if (!schema_.vertex_label_valid(src_i))
+      continue;
+    auto src_label = schema_.get_vertex_label_name(static_cast<label_t>(src_i));
+    for (size_t dst_i = 0; dst_i < vertex_label_total_count_; ++dst_i) {
+      if (!schema_.vertex_label_valid(dst_i))
+        continue;
+      auto dst_label =
+          schema_.get_vertex_label_name(static_cast<label_t>(dst_i));
+      for (size_t e_i = 0; e_i < edge_label_total_count_; ++e_i) {
+        if (!schema_.edge_label_valid(e_i))
+          continue;
+        auto edge_label =
+            schema_.get_edge_label_name(static_cast<label_t>(e_i));
+        if (!schema_.exist(src_label, dst_label, edge_label))
+          continue;
+        size_t index = schema_.generate_edge_label(src_i, dst_i, e_i);
+        auto key =
+            SnapshotMeta::edge_table_key(src_label, edge_label, dst_label);
+        EdgeTable et(schema_.get_edge_schema(src_i, dst_i, e_i));
+        if (auto desc = meta.module(key)) {
+          et.Open(ckp.path(), desc->memory_level);
+        }
+        auto e_size = et.Size();
+        size_t e_cap = e_size < 4096 ? 4096 : e_size + (e_size + 4) / 5;
+        et.EnsureCapacity(vertex_capacities[src_i], vertex_capacities[dst_i],
+                          e_cap);
+        edge_tables_.emplace(index, std::move(et));
+      }
Evidence
PropertyGraph checkpoint-open loads Schema from SnapshotMeta, but for vertex/edge tables it calls
the legacy Open(work_dir, level) with work_dir=ckp.path(), and does not use the descriptor.path that
Dump stored. Legacy Open derives snapshot paths via checkpoint_dir(work_dir), which (per
file_names.h) appends "/checkpoint/" to work_dir, so it will not find data dumped under
ckp.runtime_dir()/UUID. NeugDB now prefers LatestCheckpoint() at startup, amplifying this into
user-visible silent data loss after a restart.

src/storages/graph/property_graph.cc[795-866]
src/storages/graph/property_graph.cc[1078-1156]
src/storages/graph/vertex_table.cc[83-110]
src/main/neug_db.cc[228-244]
include/neug/storages/file_names.h[157-184]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`PropertyGraph::Open(const Checkpoint&)` currently restores vertex/edge tables by calling the legacy `Open(work_dir, memory_level)` with `work_dir=ckp.path()`, ignoring the `ModuleDescriptor` paths saved in `SnapshotMeta`. This makes the restore path search in the wrong directory (`checkpoint_dir(ckp.path())`) and results in empty vertex/edge data after restart.

### Issue Context
- `Dump()` persists module descriptors into `SnapshotMeta`, and some Dumps write into UUID dirs under `ckp.runtime_dir()`.
- Legacy `Open(work_dir, level)` expects a *workspace-style* root and derives snapshot locations using `checkpoint_dir(work_dir)`.

### Fix Focus Areas
- src/storages/graph/property_graph.cc[795-866]
- include/neug/storages/graph/vertex_table.h[124-136]
- include/neug/storages/graph/edge_table.h[54-66]
- src/storages/graph/vertex_table.cc[26-61]
- src/storages/graph/edge_table.cc[557-608]

### What to change
1. Implement `VertexTable::Open(const Checkpoint&, const ModuleDescriptor&)` and `EdgeTable::Open(const Checkpoint&, const ModuleDescriptor&)` to open from `descriptor.path` (the snapshot directory) + checkpoint runtime context.
2. In `PropertyGraph::Open(const Checkpoint&)`, call those checkpoint-based `Open(ckp, *desc)` methods instead of the legacy `Open(ckp.path(), desc->memory_level)`.
3. Ensure `Dump()` and `Open()` agree on whether snapshot data lives under `ckp.snapshot_dir()` or UUID dirs under `ckp.runtime_dir()`; pick one convention and apply consistently.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. tmp_dir runtime nesting 🐞 Bug ☼ Reliability
Description
Code passes Checkpoint::runtime_dir() into file_names.h tmp_dir(), but tmp_dir() itself appends
"/runtime/tmp/"; this creates paths like ".../runtime/runtime/tmp" and breaks tmp cleanup/creation
for checkpoint opens and dumps.
Code

src/storages/graph/property_graph.cc[R811-816]

+  // Clean up any leftover tmp dir inside the checkpoint's runtime dir
+  auto tmp = tmp_dir(ckp.runtime_dir());
+  if (std::filesystem::exists(tmp)) {
+    remove_directory(tmp);
+  }
+  std::filesystem::create_directories(tmp_dir(ckp.runtime_dir()));
Evidence
tmp_dir(work_dir) is defined as runtime_dir(work_dir) + "tmp/", and runtime_dir(work_dir)
appends "/runtime/". Checkpoint::runtime_dir() already returns path_ + "/runtime" (a runtime
directory), so tmp_dir(ckp.runtime_dir()) becomes ckp.path()/runtime/runtime/tmp/. This same
misuse also exists in LFIndexer::Open, causing it to create and use nested runtime directories.

src/storages/graph/property_graph.cc[811-817]
src/storages/graph/property_graph.cc[1149-1152]
include/neug/storages/file_names.h[169-184]
include/neug/storages/workspace.h[68-75]
include/neug/utils/id_indexer.h[250-263]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`tmp_dir()`/`runtime_dir()` helpers in `file_names.h` expect a *work_dir root*, but several new checkpoint-based paths pass `ckp.runtime_dir()` into them, producing `.../runtime/runtime/tmp` and causing tmp directory management to break.

### Issue Context
- `Checkpoint::runtime_dir()` returns `.../checkpoint-NNNNN/runtime`.
- `tmp_dir(work_dir)` returns `work_dir + "/runtime/tmp/"`.

### Fix Focus Areas
- src/storages/graph/property_graph.cc[811-817]
- src/storages/graph/property_graph.cc[1149-1152]
- include/neug/utils/id_indexer.h[250-256]
- include/neug/storages/file_names.h[169-184]
- include/neug/storages/workspace.h[68-75]

### What to change
1. Replace `tmp_dir(ckp.runtime_dir())` calls with either:
  - `ckp.runtime_dir() + "/tmp"` (direct composition), **or**
  - `tmp_dir(ckp.path())` if you want to keep using `file_names.h` conventions.
2. In LFIndexer::Open and other modules, pass `ckp.path()` as the `work_dir` argument to legacy helpers (`tmp_dir`, `runtime_dir`) rather than `ckp.runtime_dir()`.
3. Update cleanup in `PropertyGraph::Dump` to remove the correct tmp directory (the one actually used during Open/Dump).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Checkpoint Open always throws 🐞 Bug ≡ Correctness
Description
VertexTable::Open(Checkpoint, ModuleDescriptor) and EdgeTable::Open(Checkpoint, ModuleDescriptor)
unconditionally throw NOT_IMPLEMENTED, but tests and benchmarks now call these overloads, causing
immediate runtime failures.
Code

include/neug/storages/graph/vertex_table.h[R124-130]

+  // Module interface (Checkpoint-based) — the single public Open.
+  void Open(const Checkpoint& ckp,
+            const ModuleDescriptor& descriptor) override {
+    THROW_NOT_IMPLEMENTED_EXCEPTION(
+        "Checkpoint-based Open is not supported for VertexTable. Use "
+        "Open(work_dir, memory_level) instead.");
+  }
Evidence
Both VertexTable and EdgeTable explicitly throw in their checkpoint-based Open overrides. The
updated tests/benchmarks call these overloads (new_table.Open(ckp, desc) / `edge_table->Open(ckp,
desc)`), so they will fail at runtime.

include/neug/storages/graph/vertex_table.h[124-136]
include/neug/storages/graph/edge_table.h[54-66]
tests/storage/test_vertex_table.cc[155-190]
tests/storage/test_edge_table.cc[138-156]
tests/storage/vertex_table_benchmark.cc[68-75]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`VertexTable::Open(const Checkpoint&, const ModuleDescriptor&)` and `EdgeTable::Open(const Checkpoint&, const ModuleDescriptor&)` currently throw, but new tests/benchmarks call them. This breaks the new checkpoint-based API and CI.

### Issue Context
The PR introduces `Module`/`Checkpoint` as the persistence interface; tests were updated accordingly.

### Fix Focus Areas
- include/neug/storages/graph/vertex_table.h[124-136]
- include/neug/storages/graph/edge_table.h[54-66]
- tests/storage/test_vertex_table.cc[155-190]
- tests/storage/test_edge_table.cc[138-156]
- tests/storage/vertex_table_benchmark.cc[68-75]

### What to change
Option A (recommended):
1. Implement checkpoint-based `Open(ckp, desc)` for both VertexTable and EdgeTable (reading from `desc.path` and using `ckp.runtime_dir()` for mutable working files).
2. Keep legacy `Open(work_dir, level)` as an internal/legacy helper.

Option B (if checkpoint Open is intentionally unsupported):
1. Revert/update tests and benchmark to use legacy `Open(work_dir, level)`.
2. Update PropertyGraph header/docs to avoid claiming checkpoint-based Open uses Module::Open for these modules.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Column descriptor contract broken 🐞 Bug ≡ Correctness
Description
ColumnBase::Dump writes to ckp.runtime_dir()/uuid and sets ModuleDescriptor.path to that file
path, but ColumnBase::Open expects desc.path to be a directory and requires extra["name"]; Dump
never sets the name, so a dumped column cannot be reopened via Module::Open.
Code

include/neug/utils/property/column.h[R96-115]

+  void Open(const Checkpoint& ckp, const ModuleDescriptor& desc) override {
+    opened_prefix_ = desc.get("name").value_or("");
+    opened_dir_ = desc.path;
+    open(opened_prefix_, desc.path, ckp.runtime_dir());
+  }
+
+  /**
+   * @brief Dump the column to a UUID-named path under ckp.runtime_dir().
+   */
+  ModuleDescriptor Dump(const Checkpoint& ckp) override {
+    std::string uuid = generate_uuid();
+    std::string fork_path = ckp.runtime_dir() + "/" + uuid;
+    dump(fork_path);
+    ModuleDescriptor d;
+    d.path = fork_path;
+    d.type = "column";
+    d.module_type = ModuleTypeName();
+    d.version = 1;
+    d.memory_level = MemoryLevel::kSyncToFile;
+    return d;
Evidence
TypedColumn::open uses snapshot_dir + "/" + name as the base path, so desc.path must be a
directory and name must be provided. The new ColumnBase::Open reads desc.get("name"), but
ColumnBase::Dump doesn't set it and dumps to a UUID path that is not combined with any name prefix.

include/neug/utils/property/column.h[52-157]
include/neug/utils/property/column.h[91-116]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ColumnBase::Open` expects `ModuleDescriptor` to provide:
- `desc.path` = directory containing the snapshot files
- `extra["name"]` = file prefix

But `ColumnBase::Dump` currently:
- calls `dump(fork_path)` where `fork_path` is a UUID path
- sets `d.path = fork_path`
- never sets `d.set("name", ...)`

So the descriptor produced by Dump cannot be consumed by Open.

### Fix Focus Areas
- include/neug/utils/property/column.h[91-116]
- include/neug/utils/property/column.h[52-57]

### What to change
1. Make `Dump()` create a UUID *directory* under `ckp.runtime_dir()`.
2. Dump files as `data_dir + "/" + <column_prefix>` and set:
  - `d.path = data_dir`
  - `d.set("name", <column_prefix>)`
3. Decide what `<column_prefix>` is:
  - reuse `opened_prefix_` if the column was opened before, or
  - generate a stable per-column name and persist it.
4. Update `Open()` to pass a `work_dir` consistent with the rest of the codebase (avoid passing `ckp.runtime_dir()` into legacy helpers that expect a root).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. Descriptors miss module_type 🐞 Bug ⚙ Maintainability
Description
Schema/VertexTable/EdgeTable/LFIndexer Dump implementations omit ModuleDescriptor::module_type
even though the Module contract says Dump should populate it for ModuleFactory reconstruction,
making factory-based restore impossible for those modules.
Code

src/storages/graph/vertex_table.cc[R90-96]

+  ModuleDescriptor desc;
+  desc.path = target_dir;
+  desc.size = static_cast<uint64_t>(LidNum());
+  desc.type = "vertex_table";
+  desc.version = 1;
+  desc.memory_level = memory_level_;
+  return desc;
Evidence
Module::ModuleTypeName documentation states the value is written into
ModuleDescriptor::module_type by Dump implementations. However, several Dumps only set desc.type
and never set desc.module_type, while ModuleFactory is designed to create modules by
descriptor.module_type.

include/neug/storages/module/module.h[92-103]
src/storages/graph/schema.cc[2284-2309]
src/storages/graph/vertex_table.cc[83-97]
src/storages/graph/edge_table.cc[610-635]
include/neug/utils/id_indexer.h[273-287]
include/neug/storages/module/module_factory.h[49-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Several `Dump()` implementations do not set `ModuleDescriptor::module_type`, but ModuleFactory expects to create modules using `descriptor.module_type`.

### Issue Context
The `Module` interface explicitly documents that `ModuleTypeName()` is written into `ModuleDescriptor::module_type`.

### Fix Focus Areas
- src/storages/graph/schema.cc[2301-2308]
- src/storages/graph/vertex_table.cc[90-96]
- src/storages/graph/edge_table.cc[628-634]
- include/neug/utils/id_indexer.h[280-286]
- include/neug/storages/module/module.h[92-103]

### What to change
1. In each Dump, add `desc.module_type = ModuleTypeName();`.
2. Keep `desc.type` as the coarse tag if desired, but ensure `module_type` is the factory key.
3. Add an assertion/log in ModuleFactory restore paths (wherever descriptors are consumed) if `module_type` is empty to catch future regressions.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. CSR fork ignores MemoryLevel 🐞 Bug ➹ Performance
Description
CSR ForkAsCsr implementations ignore the requested MemoryLevel and always dump+open via the
sync-to-file open() path, so callers cannot obtain in-memory or hugepage forks as requested.
Code

include/neug/storages/csr/mutable_csr.h[R192-199]

+  std::unique_ptr<CsrBase> ForkAsCsr(const Checkpoint& ckp,
+                                     MemoryLevel level) const override {
+    std::string uuid = generate_uuid();
+    const_cast<MutableCsr<EDATA_T>*>(this)->dump(uuid, ckp.runtime_dir());
+    auto new_csr = std::make_unique<MutableCsr<EDATA_T>>();
+    new_csr->open(uuid, ckp.runtime_dir(), ckp.runtime_dir());
+    return new_csr;
+  }
Evidence
ForkAsCsr receives MemoryLevel level but does not branch on it; it always dumps to runtime_dir and
calls open(...), which is the sync-to-file path in CSR implementations. This defeats the purpose
of MemoryLevel-controlled forks (used by EdgeTable::Fork and Module::Fork).

include/neug/storages/csr/mutable_csr.h[192-203]
include/neug/storages/csr/immutable_csr.h[118-125]
src/storages/csr/mutable_csr.cc[39-78]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ForkAsCsr(const Checkpoint&, MemoryLevel)` ignores `level` and always reopens using `open(...)` (sync-to-file style). This prevents callers from requesting in-memory/hugepage forks.

### Fix Focus Areas
- include/neug/storages/csr/mutable_csr.h[192-203]
- include/neug/storages/csr/immutable_csr.h[118-125]
- include/neug/storages/csr/csr_base.h[141-147]

### What to change
1. After dumping to a UUID prefix under `ckp.runtime_dir()`, reopen based on `level`:
  - `kSyncToFile`: `open(uuid, <snapshot_dir>, ckp.path())` (or consistent work_dir)
  - `kInMemory`: `open_in_memory(<dir> + "/" + uuid)`
  - `kHugePagePrefered`: `open_with_hugepages(<dir> + "/" + uuid)`
2. Make sure the chosen `<dir>` and `work_dir` parameters match the tmp_dir/runtime_dir conventions in `file_names.h` (avoid passing `ckp.runtime_dir()` as a work_dir root).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

7. UUID generation unchecked 🐞 Bug ☼ Reliability
Description
generate_uuid() does not check whether opening or reading "/proc/sys/kernel/random/uuid" succeeded;
it may return an empty string which then becomes a directory/file name, causing collisions or writes
into unexpected paths.
Code

include/neug/storages/workspace.h[R34-39]

+inline std::string generate_uuid() {
+  std::ifstream f("/proc/sys/kernel/random/uuid");
+  std::string uuid;
+  std::getline(f, uuid);
+  return uuid;
+}
Evidence
The function assumes the /proc file exists and is readable. If the ifstream fails or getline fails,
uuid remains empty and is returned unchecked; many Dump/Fork paths use this as a directory
component.

include/neug/storages/workspace.h[34-39]
src/storages/graph/property_graph.cc[1078-1147]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`generate_uuid()` can return an empty string if `/proc/sys/kernel/random/uuid` cannot be read, but callers use it to form directory/file paths.

### Fix Focus Areas
- include/neug/storages/workspace.h[34-39]

### What to change
1. Check `f.is_open()` and the result of `std::getline`.
2. If read fails, either:
  - throw a clear exception, or
  - fallback to a portable UUID generator (e.g., std::random_device + formatting).
3. Consider trimming whitespace/newlines from the UUID string before returning.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment thread src/storages/graph/property_graph.cc
Comment thread src/storages/graph/property_graph.cc Outdated
Comment thread include/neug/storages/graph/vertex_table.h Outdated
Comment thread include/neug/utils/property/column.h Outdated
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 5 times, most recently from 516e96f to 5bc7c6d Compare April 2, 2026 06:44
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 5 times, most recently from b0c0529 to 382166b Compare April 14, 2026 03:30
Comment thread include/neug/storages/csr/mutable_csr.h Outdated
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from ac894ec to 347749a Compare April 15, 2026 02:22
@zhanglei1949 zhanglei1949 marked this pull request as ready for review April 15, 2026 07:16
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from 46823b4 to 96eb6ed Compare April 16, 2026 02:36
@zhanglei1949 zhanglei1949 requested review from Copilot April 16, 2026 02:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors persistent storage management to revolve around Workspace + Checkpoint + ModuleDescriptor (module-based Open/Dump/Fork), and updates graph/table/CSR codepaths and tests accordingly (Fix #133).

Changes:

  • Introduce checkpointed workspace layout (Workspace, Checkpoint, SnapshotMeta, ModuleDescriptor) and module registry (ModuleFactory).
  • Migrate PropertyGraph, VertexTable, EdgeTable, Table/Column, and CSR implementations to the new module-driven Open/Dump/Fork API.
  • Update/extend unit & storage tests for the new checkpoint/module workflow and add new fork/dump matrix tests.

Reviewed changes

Copilot reviewed 67 out of 67 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Updates table tests to use Workspace/Checkpoint and descriptors instead of hardcoded dirs.
tests/unittest/utils.h Adds standard library includes needed by updated unit test helpers.
tests/unittest/test_column.cc New typed tests for column fork isolation + dump/open matrix.
tests/unittest/logical_delete_test.cc Migrates PropertyGraph open to checkpoint-based API.
tests/unittest/CMakeLists.txt Adds new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark to use checkpoint/module descriptors.
tests/storage/test_vertex_table.cc Migrates vertex table tests to checkpoint-based Open/Dump.
tests/storage/test_set_property.cc Includes updated unittest utils header.
tests/storage/test_property_graph.cc Migrates property graph test fixture to checkpoint-based Open.
tests/storage/test_open_graph.cc Includes updated unittest utils header and tweaks logging.
tests/storage/test_mutable_csr.cc Adds fork/dump tests and migrates CSR tests to checkpoint-based API.
tests/storage/test_immutable_csr.cc Migrates immutable CSR tests to checkpoint-based API and adds workspace usage.
tests/storage/test_export.cc Includes updated unittest utils header.
tests/storage/test_ddl.cc Includes updated unittest utils header.
tests/storage/test_csr_stream_ops.cc Migrates CSR stream tests to checkpoint-based API and workspace-managed dirs.
tests/storage/test_csr_batch_ops.cc Migrates CSR batch tests to checkpoint-based Open/Dump.
tests/storage/alter_property_test.cc Migrates alter-property test to open graph via checkpoint.
tests/execution/test_value.cc Minor formatting cleanup for EXPECT_DEATH.
src/utils/uuid.cc Adds UUID generator implementation.
src/utils/property/table.cc Refactors Table to module-based Open/Dump/Fork and registers column module types.
src/utils/id_indexer.cc Registers LF indexer module type.
src/storages/module/module_factory.cc Adds module factory implementation and registry.
src/storages/module/CMakeLists.txt Adds object library for module sources.
src/storages/loader/abstract_property_graph_loader.cc Loader now creates a workspace+checkpoint and uses checkpoint meta schema.
src/storages/graph/workspace.cc New workspace implementation that discovers/creates checkpoints.
src/storages/graph/vertex_timestamp.cc Migrates vertex timestamp to module-based Open/Dump/Fork and registers module.
src/storages/graph/vertex_table.cc Migrates vertex table to module-based Open/Dump and descriptor submodules.
src/storages/graph/snapshot_meta.cc Adds JSON-persisted snapshot meta (schema + module inventory).
src/storages/graph/schema.cc Adds Schema JSON (de)serialization via YAML<->JSON conversion and fixes edge-strategy labels.
src/storages/graph/property_graph.cc Migrates graph Open/Dump to checkpoint/meta-driven module open.
src/storages/graph/module_descriptor.cc Implements ModuleDescriptor JSON and submodule tree handling.
src/storages/graph/edge_table.cc Migrates edge table to module-based Open/Dump and property mutations using checkpoint.
src/storages/graph/checkpoint.cc New checkpoint implementation (dirs, meta persistence, commit semantics, cleanup).
src/storages/csr/mutable_csr.cc Migrates mutable CSR variants to module API, implements descriptor-based dump/open, registers module types.
src/storages/csr/immutable_csr.cc Migrates immutable CSR variants to module API, implements dump/open, registers module types.
src/storages/container/mmap_container.cc Improves mmap resize error and adds Fork for containers.
src/storages/container/container_utils.cc Improves error logging when tmp path is missing in disk-backed mode.
src/storages/CMakeLists.txt Adds module/ subdir into storages build.
src/server/neug_db_service.cc Updates session pool init after WAL path handling changes.
src/main/neug_db.cc Migrates DB open/ingest/checkpoint flow to workspace+checkpoint model.
include/neug/utils/uuid.h Declares UUIDGenerator.
include/neug/utils/property/table.h Updates Table API to checkpoint/module descriptor and adds Fork.
include/neug/utils/property/column.h Migrates columns to Module interface and implements Open/Dump/Fork/type names.
include/neug/utils/id_indexer.h Migrates LFIndexer to Module interface and adds Open/Dump/Fork.
include/neug/storages/workspace.h Declares Workspace API.
include/neug/storages/snapshot_meta.h Declares SnapshotMeta (schema + modules) and key helpers.
include/neug/storages/module_descriptor.h Declares ModuleDescriptor with JSON + submodules.
include/neug/storages/module/type_name.h Adds StorageTypeName<> trait for module type strings.
include/neug/storages/module/module_factory.h Declares ModuleFactory + registration macro.
include/neug/storages/module/module.h Declares the module lifecycle interface.
include/neug/storages/loader/abstract_property_graph_loader.h Loader now owns a Workspace and seeds checkpoint meta schema.
include/neug/storages/graph/vertex_timestamp.h VertexTimestamp now implements Module.
include/neug/storages/graph/vertex_table.h VertexTable migrated to descriptor-based Open/Dump; indexer is now a pointer.
include/neug/storages/graph/schema.h Adds Schema JSON API declarations.
include/neug/storages/graph/property_graph.h Switches graph open/dump API to checkpoint-based, stores Checkpoint*.
include/neug/storages/graph/edge_table.h Migrates EdgeTable API to checkpoint-based Open/Dump/Close and property mutations.
include/neug/storages/file_names.h Removes temp_checkpoint_dir helper (no longer used with checkpoint model).
include/neug/storages/csr/mutable_csr.h Migrates CSR base implementations to module API and adds Fork/type names.
include/neug/storages/csr/immutable_csr.h Migrates immutable CSR to module API and defines module type names.
include/neug/storages/csr/csr_base.h Makes CSR base inherit Module and adds fork_vertex hook.
include/neug/storages/container/mmap_container.h Declares container Fork.
include/neug/storages/container/i_container.h Adds container Fork virtual method.
include/neug/storages/checkpoint.h Declares Checkpoint class and helpers (commit/meta/containers).
include/neug/server/session_pool.h SessionPool now derives WAL location from graph.checkpoint().
include/neug/main/neug_db.h NeugDB::work_dir() now returns workspace db dir; adds Workspace member.
Comments suppressed due to low confidence (12)

src/storages/graph/property_graph.cc:1

  • SnapshotMeta::edge_table_key is declared as (src_label, edge_label, dst_label) in include/neug/storages/snapshot_meta.h, but here it’s called as (src_label, dst_label, edge_label). This will break meta lookups (and likely won’t compile if overloads don’t match). Align the argument order either in the helper signature or at all call sites.
    src/storages/graph/vertex_table.cc:1
  • Table was refactored to Open/Dump/Fork and no longer shows a close() API in the updated header diff, so table_->close() is very likely a compile error. Either reintroduce a Table::Close() method (closing all columns’ modules) and call that, or update all callers to the new lifecycle method name consistently.
    src/storages/graph/edge_table.cc:1
  • Same issue as VertexTable: table_->close() is inconsistent with the refactored Table API and is likely missing now. Consider adding and using Table::Close() consistently from both VertexTable and EdgeTable.
    src/utils/property/table.cc:1
  • add_columns ignores the memory_level argument and always opens new columns as kInMemory. This changes behavior for disk-backed or hugepage configurations. Use the provided memory_level (and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
    src/utils/property/table.cc:1
  • add_columns ignores the memory_level argument and always opens new columns as kInMemory. This changes behavior for disk-backed or hugepage configurations. Use the provided memory_level (and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
    src/storages/graph/vertex_timestamp.cc:1
  • When desc is default-constructed, desc.path is empty; std::filesystem::exists(\"\") can resolve to the current directory on some platforms/libstdc++ versions, causing an unintended load_ts call with an empty path. Guard explicitly with !desc.path.empty() before checking exists().
    tests/utils/test_table.cc:1
  • This uses ::getpid() but the file diff doesn’t show an include for <unistd.h> (or an equivalent platform header). This can fail to compile on some toolchains. Add the appropriate include or replace getpid() usage with a portable alternative.
    tests/storage/vertex_table_benchmark.cc:1
  • VertexTable::Open expects the ModuleDescriptor to contain sub-modules (indexer, property_table, vertex_timestamp). Providing only module_type+path will result in empty sub-descriptors and can cause runtime failures (e.g., LFIndexer needing key type). For a benchmark, either (1) create a new empty table by passing a default descriptor but ensuring the indexer is initialized with pk type and opened accordingly, or (2) load a real descriptor produced by VertexTable::Dump.
    src/storages/graph/checkpoint.cc:1
  • This constructs a new UUIDGenerator (and thus a new std::random_device/PRNG) on every loop iteration. Consider reusing a static/shared generator (e.g., generate_uuid() already uses a static generator) to reduce overhead, especially since this is called frequently for runtime object creation.
    src/utils/uuid.cc:1
  • <regex> is included but not used in this translation unit. Removing unused includes reduces compile time and avoids confusion about intended validation logic.
    src/storages/graph/checkpoint.cc:1
  • Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.
    src/storages/graph/checkpoint.cc:1
  • Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/neug_db.cc Outdated
Comment thread include/neug/storages/graph/vertex_table.h Outdated
Comment thread include/neug/storages/checkpoint.h Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the on-disk workspace layout by introducing first-class Workspace, Checkpoint, and ModuleDescriptor abstractions, migrating graph/storage modules and tests away from ad-hoc file_names.h path composition (Fix #133).

Changes:

  • Introduce Workspace + Checkpoint directory model with JSON SnapshotMeta and hierarchical ModuleDescriptor trees for module open/dump.
  • Convert core storage modules (columns, tables, CSRs, indexer, vertex/edge tables, property graph) to a unified Module interface (Open/Dump/Close/Fork).
  • Update unit/integration tests and benchmarks to create checkpoints via Workspace and reopen modules via returned descriptors.

Reviewed changes

Copilot reviewed 68 out of 68 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Updates table tests to use Workspace/Checkpoint and descriptor-based open/dump.
tests/unittest/utils.h Adds standard headers needed by updated unit-test utilities.
tests/unittest/test_column.cc Adds typed-column fork/dump/open matrix tests using new checkpoint/descriptors.
tests/unittest/logical_delete_test.cc Switches logical-delete tests to open graphs from a Checkpoint.
tests/unittest/CMakeLists.txt Registers new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark open path to use checkpoint + descriptor.
tests/storage/test_vertex_table.cc Migrates vertex table tests to checkpoint-based open/dump/reopen.
tests/storage/test_set_property.cc Includes common unit-test utilities header.
tests/storage/test_property_graph.cc Uses Workspace + Checkpoint to open PropertyGraph.
tests/storage/test_open_graph.cc Includes common unit-test utilities header and adjusts logging.
tests/storage/test_mutable_csr.cc Adds fork/dump/open tests and migrates CSR open/dump APIs to checkpoint/descriptors.
tests/storage/test_immutable_csr.cc Migrates immutable CSR tests to checkpoint/descriptors and adds workspace usage.
tests/storage/test_export.cc Includes common unit-test utilities header.
tests/storage/test_ddl.cc Includes common unit-test utilities header.
tests/storage/test_csr_stream_ops.cc Replaces manual dir layout with Workspace + checkpoint-based CSR open.
tests/storage/test_csr_batch_ops.cc Migrates batch CSR tests to checkpoint/descriptors and removes manual dir plumbing.
tests/storage/alter_property_test.cc Uses Workspace + Checkpoint to open/modify graph instead of raw work dir.
tests/execution/test_value.cc Minor formatting change to EXPECT_DEATH.
src/utils/uuid.cc Adds UUID generation implementation.
src/utils/property/table.cc Refactors Table to module-based Open/Dump/Fork integration.
src/utils/property/column.cc Registers TypedColumn<T> template modules with ModuleFactory.
src/utils/id_indexer.cc Registers LFIndexer as a template module.
src/storages/module/module_factory.cc Implements ModuleFactory registry and creation.
src/storages/module/CMakeLists.txt Adds object library for module factory sources.
src/storages/loader/abstract_property_graph_loader.cc Uses Workspace and checkpoint meta for loader error reporting / initialization.
src/storages/graph/workspace.cc Implements workspace directory scanning and checkpoint creation.
src/storages/graph/vertex_timestamp.cc Converts VertexTimestamp to Module with descriptor-based open/dump/fork.
src/storages/graph/vertex_table.cc Converts VertexTable to descriptor-based module open/dump and updated indexer usage.
src/storages/graph/snapshot_meta.cc Introduces JSON-backed snapshot metadata for schema + module inventory.
src/storages/graph/schema.cc Adds YAML↔JSON conversion helpers and schema JSON (de)serialization.
src/storages/graph/property_graph.cc Refactors graph open/dump to operate on a Checkpoint + SnapshotMeta.
src/storages/graph/module_descriptor.cc Implements hierarchical ModuleDescriptor with JSON serialization.
src/storages/graph/edge_table.cc Converts EdgeTable open/dump and property mutation flows to checkpoint/descriptors.
src/storages/graph/checkpoint.cc Implements checkpoint directory management, meta update, and orphan cleanup.
src/storages/csr/mutable_csr.cc Refactors mutable CSR open/dump/close to checkpoint/descriptors and registers templates.
src/storages/csr/immutable_csr.cc Refactors immutable CSR open/dump/close to checkpoint/descriptors and registers templates.
src/storages/container/mmap_container.cc Adds IDataContainer::Fork support and improves mmap resize error detail.
src/storages/container/container_utils.cc Strengthens validation/logging for disk-backed container creation.
src/storages/CMakeLists.txt Adds new module subdirectory into storages object build.
src/server/neug_db_service.cc Adjusts session pool initialization to derive WAL location from graph/checkpoint.
src/main/neug_db.cc Uses Workspace checkpoints to open graph, ingest WALs, and create new checkpoints.
include/neug/utils/uuid.h Declares UUIDGenerator.
include/neug/utils/property/table.h Updates Table API to checkpoint/descriptors and adds Fork.
include/neug/utils/property/column.h Converts columns to Module interface and adds fork/type naming.
include/neug/utils/id_indexer.h Converts LFIndexer to Module interface with open/dump/fork + descriptor metadata.
include/neug/storages/workspace.h Declares Workspace API for checkpoint management.
include/neug/storages/snapshot_meta.h Declares SnapshotMeta for schema + module descriptors persisted as JSON.
include/neug/storages/module_descriptor.h Declares hierarchical ModuleDescriptor with JSON and submodules.
include/neug/storages/module/type_name.h Adds template type-name mapping for module registration keys.
include/neug/storages/module/module_factory.h Declares ModuleFactory and registration macros.
include/neug/storages/module/module.h Declares unified Module lifecycle interface.
include/neug/storages/loader/abstract_property_graph_loader.h Refactors loader to create a workspace/checkpoint and seed meta schema.
include/neug/storages/graph/vertex_timestamp.h Converts VertexTimestamp to Module interface.
include/neug/storages/graph/vertex_table.h Refactors VertexTable open/dump APIs and indexer ownership.
include/neug/storages/graph/schema.h Adds schema JSON conversion APIs.
include/neug/storages/graph/property_graph.h Refactors graph open/dump to checkpoint-based model and exposes checkpoint accessor.
include/neug/storages/graph/edge_table.h Refactors EdgeTable open/dump and property mutation APIs to include checkpoint.
include/neug/storages/file_names.h Removes temp_checkpoint_dir helper (obsolete under new checkpoint model).
include/neug/storages/csr/mutable_csr.h Updates CSR base to Module interface, adds fork/type naming, and shared nbr-list semantics.
include/neug/storages/csr/immutable_csr.h Updates immutable CSR to Module interface and adds type naming.
include/neug/storages/csr/csr_base.h Makes CsrBase a Module and introduces fork_vertex requirement.
include/neug/storages/container/mmap_container.h Adds Fork API to MMapContainer.
include/neug/storages/container/i_container.h Adds IDataContainer::Fork API.
include/neug/storages/checkpoint.h Declares Checkpoint with runtime/snapshot/wal dirs and commit utilities.
include/neug/server/session_pool.h Derives WAL writer URI from graph.checkpoint().wal_dir().
include/neug/main/neug_db.h Stores a Workspace and exposes work_dir() via ws_.db_dir().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/neug/utils/id_indexer.h Outdated
Comment thread include/neug/storages/graph/property_graph.h
Comment thread src/utils/property/table.cc
Comment thread tests/utils/test_table.cc
@zhanglei1949
Copy link
Copy Markdown
Member Author

TODO(@zhanglei1949 ) Avoid full dump if the file is not changes(not dirty)

@zhanglei1949
Copy link
Copy Markdown
Member Author

TODO(@zhanglei1949 ) Avoid full dump if the file is not changes(not dirty)

Implemented. TODO: validate on sf300

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors on-disk workspace management by introducing explicit Workspace, Checkpoint, and ModuleDescriptor/Module abstractions, replacing the previous ad-hoc file_names.h-driven layout (Fix #133).

Changes:

  • Introduce Workspace/Checkpoint directory model plus SnapshotMeta JSON for persisted module inventory.
  • Convert core storage components (graph, tables, CSRs, columns, indexer) to Module::Open/Dump/Close/Fork using ModuleDescriptor.
  • Update tests/benchmarks and build wiring to use the new checkpointed workspace layout; remove file_names.h usage across the codebase.

Reviewed changes

Copilot reviewed 72 out of 72 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Refactor table tests to use Workspace + checkpoint/module descriptors.
tests/unittest/utils.h Add common stdlib includes for updated unittest utilities.
tests/unittest/test_column.cc New typed tests for column fork/dump/open across memory levels.
tests/unittest/logical_delete_test.cc Open graph via Workspace checkpoint instead of raw directory.
tests/unittest/CMakeLists.txt Add test_column target.
tests/storage/vertex_table_benchmark.cc Update benchmark to use Checkpoint/ModuleDescriptor open path.
tests/storage/test_set_property.cc Remove file_names.h dependency; include unittest utils.
tests/storage/test_property_graph.cc Open graph via Workspace checkpoint.
tests/storage/test_open_graph.cc Remove file_names.h; adjust logging and test utilities usage.
tests/storage/test_mutable_csr.cc Convert CSR tests to checkpoint/module descriptor APIs; add fork matrix tests.
tests/storage/test_immutable_csr.cc Convert immutable CSR tests to checkpoint/module descriptor APIs.
tests/storage/test_export.cc Remove file_names.h; include unittest utils.
tests/storage/test_ddl.cc Remove file_names.h; include unittest utils.
tests/storage/test_csr_stream_ops.cc Open CSRs via checkpoint/module descriptors; remove manual dir management.
tests/storage/test_csr_batch_ops.cc Open/dump/open CSR flows via checkpoint/module descriptors.
tests/storage/test_checkpoint.cc Add checkpoint optimization tests (hardlink fast path).
tests/storage/alter_property_test.cc Open graph via checkpoint created from workspace.
tests/execution/test_value.cc Minor formatting change to EXPECT_DEATH.
src/utils/uuid.cc Add UUID generator implementation.
src/utils/property/table.cc Convert Table open/dump/add-columns to module-descriptor-based APIs.
src/utils/property/column.cc Register TypedColumn<T> module instantiations.
src/utils/id_indexer.cc Register LFIndexer module instantiation.
src/transaction/wal/wal.cc Remove file_names.h dependency.
src/transaction/update_transaction.cc Remove file_names.h dependency.
src/storages/module/module_factory.cc Add module factory registry implementation.
src/storages/module/CMakeLists.txt Add storages module object library build.
src/storages/loader/abstract_property_graph_loader.cc Use workspace db_dir() when reporting disk remaining.
src/storages/graph/workspace.cc Add Workspace implementation for checkpoint discovery/creation.
src/storages/graph/vertex_timestamp.cc Make VertexTimestamp a Module with descriptor-based open/dump/fork.
src/storages/graph/vertex_table.cc Convert vertex table open/dump/close to module-descriptor APIs.
src/storages/graph/snapshot_meta.cc Add JSON-backed SnapshotMeta for schema + module descriptors.
src/storages/graph/schema.cc Add schema JSON <-> YAML conversion helpers for checkpoint meta.
src/storages/graph/property_graph.cc Open/dump graph via checkpoint meta + module descriptors; remove schema file IO.
src/storages/graph/module_descriptor.cc Implement ModuleDescriptor JSON serialization + sub-module tree.
src/storages/graph/edge_table.cc Convert edge table open/dump/add/delete props to module-descriptor APIs.
src/storages/graph/checkpoint.cc Add checkpoint implementation (runtime/snapshot dirs, meta update, cleanup, hardlinks).
src/storages/csr/mutable_csr.cc Convert mutable CSRs to module-descriptor open/dump/close/fork model.
src/storages/csr/immutable_csr.cc Convert immutable CSRs to module-descriptor open/dump/close model; add registration.
src/storages/container/mmap_container.cc Improve error message; add IDataContainer::Fork implementation.
src/storages/container/container_utils.cc Improve error logging for missing tmp path in disk-backed containers.
src/storages/CMakeLists.txt Add module subdirectory and include in storages object files.
src/server/neug_db_service.cc Session pool now derives WAL path from graph checkpoint instead of wal_dir(...).
src/main/neug_db.cc Open DB via workspace checkpoints; checkpoint creation now creates new checkpoint dir + dumps into it.
include/neug/utils/uuid.h Add UUID generator header.
include/neug/utils/property/table.h Update table API to checkpoint/module descriptor pattern.
include/neug/utils/property/column.h Make columns Modules with descriptor-based open/dump/fork.
include/neug/utils/id_indexer.h Make LFIndexer a Module with descriptor-based open/dump/fork.
include/neug/storages/workspace.h Add workspace API for managing checkpoint set.
include/neug/storages/snapshot_meta.h Add snapshot meta interface (schema + module map).
include/neug/storages/module_descriptor.h Add module descriptor structure with JSON/sub-module support.
include/neug/storages/module/type_name.h Add storage-type-to-name mapping for template module registration.
include/neug/storages/module/module_factory.h Add module factory + registration macros.
include/neug/storages/module/module.h Define Module interface for open/dump/close/fork.
include/neug/storages/loader/abstract_property_graph_loader.h Loader now creates/uses workspace + checkpoint meta schema.
include/neug/storages/graph/vertex_timestamp.h Convert vertex timestamp tracker to Module interface.
include/neug/storages/graph/vertex_table.h Refactor vertex table to depend on checkpoint/module descriptors and module indexer.
include/neug/storages/graph/schema.h Add schema JSON serialization hooks.
include/neug/storages/graph/property_graph.h Change graph open/dump API to checkpoint-based; expose checkpoint for WAL writers.
include/neug/storages/graph/edge_table.h Update edge table interface to checkpoint/module descriptor APIs.
include/neug/storages/file_names.h Remove legacy path helper header.
include/neug/storages/csr/mutable_csr.h Update mutable CSR interfaces to Module open/dump/close/fork and fork-vertex API.
include/neug/storages/csr/immutable_csr.h Update immutable CSR interfaces to Module open/dump/close.
include/neug/storages/csr/csr_base.h Make CSR base class a Module and add fork_vertex contract.
include/neug/storages/container/mmap_container.h Add container fork API surface.
include/neug/storages/container/i_container.h Add IDataContainer::Fork interface.
include/neug/storages/checkpoint.h Add checkpoint API (open/commit/runtime objects/meta).
include/neug/server/session_pool.h WAL writers now use graph.checkpoint().wal_dir() instead of injected wal uri.
include/neug/main/neug_db.h Expose db work dir via workspace; allocator init now takes allocator dir.
.github/workflows/neug-test.yml Adjust ccache key hashing inputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/storages/graph/workspace.cc
Comment thread include/neug/storages/workspace.h
Comment thread include/neug/storages/graph/vertex_table.h Outdated
Comment thread include/neug/utils/uuid.h
Comment thread src/utils/uuid.cc
Comment thread include/neug/storages/checkpoint.h Outdated
Comment thread src/storages/graph/workspace.cc
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the database directory layout and persistence layer by introducing Workspace, Checkpoint, and ModuleDescriptor/SnapshotMeta, replacing the previous ad-hoc file_names.h-based path conventions (Fix #133).

Changes:

  • Introduce checkpoint-scoped workspace management (Workspace, Checkpoint) and JSON-based metadata (SnapshotMeta) to track module snapshots.
  • Convert core storage components (CSR, columns/indexers, vertex/edge tables, property graph) to a Module-style Open/Dump/Close/Fork lifecycle driven by ModuleDescriptor.
  • Update and expand tests/benchmarks to open/dump via checkpoints and to cover fork + hardlink reuse behavior.

Reviewed changes

Copilot reviewed 72 out of 72 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Moves table tests to checkpoint/workspace-based open/dump.
tests/unittest/utils.h Adds includes used by new test helpers.
tests/unittest/test_column.cc Adds new typed-column fork/dump/open matrix tests.
tests/unittest/logical_delete_test.cc Opens graph via workspace checkpoint.
tests/unittest/CMakeLists.txt Registers new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark to open vertex table via checkpoint + descriptor.
tests/storage/test_set_property.cc Switches test utilities away from file_names.h.
tests/storage/test_property_graph.cc Opens property graph via workspace checkpoint.
tests/storage/test_open_graph.cc Updates test includes/utilities; minor logging change.
tests/storage/test_mutable_csr.cc Updates CSR tests to checkpoint/module descriptor API; adds fork tests.
tests/storage/test_immutable_csr.cc Updates immutable CSR tests to checkpoint/module descriptor API.
tests/storage/test_export.cc Switches test utilities away from file_names.h.
tests/storage/test_ddl.cc Switches test utilities away from file_names.h.
tests/storage/test_csr_stream_ops.cc Updates streaming CSR tests to checkpoint/module API.
tests/storage/test_csr_batch_ops.cc Updates batch CSR tests to checkpoint/module API.
tests/storage/test_checkpoint.cc Adds hardlink optimization tests for unchanged dumps; removes file_names.h.
tests/storage/alter_property_test.cc Opens graph via workspace checkpoint in alter-property flow.
tests/execution/test_value.cc Formatting-only change in EXPECT_DEATH.
src/utils/uuid.cc Adds UUID generator implementation.
src/utils/property/table.cc Refactors Table to dump/open via Checkpoint + ModuleDescriptor.
src/utils/property/column.cc Registers TypedColumn<T> module types.
src/utils/id_indexer.cc Registers indexer template module types.
src/transaction/wal/wal.cc Removes dependency on removed file_names.h.
src/transaction/update_transaction.cc Removes dependency on removed file_names.h.
src/storages/module/module_factory.cc Implements module factory registry.
src/storages/module/CMakeLists.txt Adds storages module object library.
src/storages/loader/abstract_property_graph_loader.cc Uses workspace db dir for disk remaining reporting.
src/storages/graph/workspace.cc Implements workspace checkpoint enumeration + creation.
src/storages/graph/vertex_timestamp.cc Converts vertex timestamp tracker into a Module.
src/storages/graph/vertex_table.cc Converts vertex table to descriptor-driven open/dump.
src/storages/graph/snapshot_meta.cc Implements JSON meta persistence for checkpoint inventories.
src/storages/graph/schema.cc Adds Schema YAML↔JSON conversion for meta persistence.
src/storages/graph/property_graph.cc Refactors graph open/dump to use Checkpoint + SnapshotMeta.
src/storages/graph/module_descriptor.cc Implements descriptor JSON serialization + nested submodules.
src/storages/graph/edge_table.cc Refactors edge table open/dump + property rebuild paths for checkpoint API.
src/storages/graph/checkpoint.cc Implements checkpoint dirs, commit/hardlink/copy semantics, cleanup.
src/storages/csr/mutable_csr.cc Converts mutable CSR to module lifecycle + descriptor dump/open.
src/storages/csr/immutable_csr.cc Converts immutable CSR to module lifecycle + descriptor dump/open.
src/storages/container/mmap_container.cc Adds container Fork() and improves resize error diagnostics.
src/storages/container/container_utils.cc Adds explicit error logging for missing tmp path in disk-backed mode.
src/storages/CMakeLists.txt Adds module/ subdirectory to storages build.
src/server/neug_db_service.cc Updates session pool init now that WAL dir comes from checkpoint.
src/main/neug_db.cc Refactors DB open/checkpoint to use workspace/head checkpoint and checkpoint-scoped dirs.
include/neug/utils/uuid.h Declares UUID generator.
include/neug/utils/property/table.h Updates Table API to checkpoint/module descriptor model.
include/neug/utils/property/column.h Converts columns into Module implementations with dump/open/fork.
include/neug/utils/id_indexer.h Converts LFIndexer into a Module with descriptor-based persistence/fork.
include/neug/storages/workspace.h Declares Workspace API.
include/neug/storages/snapshot_meta.h Declares checkpoint meta format and key scheme.
include/neug/storages/module_descriptor.h Declares module descriptor structure and JSON support.
include/neug/storages/module/type_name.h Adds type-name trait for module registration keys.
include/neug/storages/module/module_factory.h Declares module factory + registration macros.
include/neug/storages/module/module.h Introduces the core module lifecycle interface.
include/neug/storages/loader/abstract_property_graph_loader.h Constructs loader graph through workspace/checkpoint + meta schema.
include/neug/storages/graph/vertex_timestamp.h Updates vertex timestamp to implement Module.
include/neug/storages/graph/vertex_table.h Updates vertex table API and internals for module-based persistence.
include/neug/storages/graph/schema.h Declares Schema JSON conversion API.
include/neug/storages/graph/property_graph.h Changes property graph open/dump API to checkpoint-based.
include/neug/storages/graph/edge_table.h Changes edge table open/dump and property ops to checkpoint-based.
include/neug/storages/file_names.h Removes legacy ad-hoc directory naming helpers.
include/neug/storages/csr/mutable_csr.h Updates CSR interfaces to module lifecycle + fork support.
include/neug/storages/csr/immutable_csr.h Updates immutable CSR interfaces to module lifecycle.
include/neug/storages/csr/csr_base.h Makes CSR base a Module and adds fork_vertex hook.
include/neug/storages/container/mmap_container.h Declares container Fork().
include/neug/storages/container/i_container.h Adds container Fork() to the interface.
include/neug/storages/checkpoint.h Declares checkpoint API including commit semantics.
include/neug/server/session_pool.h Constructs WAL writers from graph checkpoint WAL dir.
include/neug/main/neug_db.h Uses workspace db dir as work dir and updates allocator init signature.
.github/workflows/neug-test.yml Adjusts ccache key inputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/storage/test_checkpoint.cc Outdated
Comment thread src/storages/csr/immutable_csr.cc Outdated
Comment thread src/storages/csr/immutable_csr.cc Outdated
Comment thread tests/storage/test_open_graph.cc Outdated
Comment thread tests/storage/test_checkpoint.cc
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from 1059b20 to ba65938 Compare April 20, 2026 09:12
@zhanglei1949 zhanglei1949 requested a review from Copilot April 21, 2026 02:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors database directory/file management by introducing a structured WorkspaceCheckpointModuleDescriptor architecture, replacing the previous ad-hoc file_names.h-based layout (Fix #133).

Changes:

  • Introduce Workspace, Checkpoint, SnapshotMeta (JSON), ModuleDescriptor, and ModuleFactory to standardize persistence and restore of storage modules.
  • Refactor graph/table/CSR/indexer components (and many tests) to open/dump through Checkpoint + ModuleDescriptor instead of direct path conventions.
  • Add/extend tests for module fork/dump/open behavior and checkpoint hardlink optimization.

Reviewed changes

Copilot reviewed 72 out of 72 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Migrates table tests to Workspace/Checkpoint-based open/dump paths.
tests/unittest/utils.h Expands test utilities includes (support for new filesystem usage).
tests/unittest/test_column.cc New typed-column fork/dump/open matrix tests across memory levels.
tests/unittest/logical_delete_test.cc Updates graph opening to go through Workspace checkpoint.
tests/unittest/CMakeLists.txt Adds new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark to open VertexTable via Checkpoint/ModuleDescriptor.
tests/storage/test_vertex_table.cc Migrates vertex table tests to Workspace/Checkpoint descriptors.
tests/storage/test_set_property.cc Removes legacy file_names dependency (uses unittest utils).
tests/storage/test_property_graph.cc Migrates PropertyGraph test open path to Workspace checkpoint.
tests/storage/test_open_graph.cc Removes file_names dependency; minor logging fix.
tests/storage/test_mutable_csr.cc Adds CSR fork tests + migrates dump/open APIs to descriptors.
tests/storage/test_immutable_csr.cc Migrates immutable CSR tests to descriptor-based open/dump/close.
tests/storage/test_export.cc Removes legacy file_names dependency (uses unittest utils).
tests/storage/test_ddl.cc Removes legacy file_names dependency (uses unittest utils).
tests/storage/test_csr_stream_ops.cc Migrates stream CSR tests to Workspace/Checkpoint module open.
tests/storage/test_csr_batch_ops.cc Migrates batch CSR tests to descriptor-based open/dump/open.
tests/storage/test_checkpoint.cc Adds new checkpoint hardlink optimization tests.
tests/storage/alter_property_test.cc Updates helper to open graph via Checkpoint instead of directory paths.
tests/execution/test_value.cc Minor formatting change to death test.
src/utils/uuid.cc Adds UUIDGenerator implementation used for runtime/snapshot object naming.
src/utils/property/table.cc Refactors Table open/dump/add-columns to use Checkpoint + ModuleDescriptor.
src/utils/property/column.cc Registers TypedColumn module types with ModuleFactory.
src/utils/id_indexer.cc Registers LFIndexer module instantiation with ModuleFactory.
src/transaction/wal/wal.cc Removes legacy file_names include usage.
src/transaction/update_transaction.cc Removes legacy file_names include usage.
src/storages/module/module_factory.cc Implements ModuleFactory registry.
src/storages/module/CMakeLists.txt Adds module sub-library build definition.
src/storages/loader/abstract_property_graph_loader.cc Uses Workspace/db_dir from new architecture for disk remaining logs.
src/storages/graph/workspace.cc Implements Workspace directory manager for checkpoints.
src/storages/graph/vertex_timestamp.cc Converts VertexTimestamp to Module (Open/Dump/Fork/Close) + registers it.
src/storages/graph/vertex_table.cc Migrates VertexTable open/dump/close to module descriptors.
src/storages/graph/snapshot_meta.cc Implements SnapshotMeta JSON read/write and schema/module inventory.
src/storages/graph/schema.cc Adds Schema JSON (de)serialization via YAML↔JSON conversion helpers.
src/storages/graph/property_graph.cc Refactors PropertyGraph open/dump to use SnapshotMeta + ModuleDescriptors.
src/storages/graph/module_descriptor.cc Implements ModuleDescriptor JSON serialization + sub-module tree.
src/storages/graph/edge_table.cc Refactors EdgeTable open/dump/add/delete-props to descriptor-based flow.
src/storages/graph/checkpoint.cc Implements Checkpoint lifecycle, meta update, commit/link/copy semantics.
src/storages/csr/immutable_csr.cc Refactors immutable CSR open/dump/close to Checkpoint+ModuleDescriptor.
src/storages/container/mmap_container.cc Adds container Fork + improves mmap resize error detail.
src/storages/container/container_utils.cc Improves error logging when tmp path is missing for disk-backed containers.
src/storages/CMakeLists.txt Adds storages/module subdirectory to build objects.
src/server/neug_db_service.cc Adjusts SessionPool init (WAL path derived from graph checkpoint).
src/main/neug_db.cc Refactors DB open/checkpoint creation to use Workspace/Checkpoint and meta.
include/neug/utils/uuid.h Adds UUIDGenerator API.
include/neug/utils/property/table.h Updates Table API to Open/Dump via Checkpoint + ModuleDescriptor.
include/neug/utils/property/column.h Converts ColumnBase to Module; adds descriptor-based Open/Dump/Fork.
include/neug/utils/id_indexer.h Converts LFIndexer to Module; adds Open/Dump/Fork using descriptors.
include/neug/storages/workspace.h Adds Workspace API for managing checkpoint directories.
include/neug/storages/snapshot_meta.h Adds SnapshotMeta API (schema + module descriptor map).
include/neug/storages/module_descriptor.h Adds ModuleDescriptor (path/type/size/extra + sub-modules) definition.
include/neug/storages/module/type_name.h Adds type→string mapping helper for template module registration keys.
include/neug/storages/module/module_factory.h Adds ModuleFactory + registration macros.
include/neug/storages/module/module.h Introduces Module interface (Open/Dump/Close/Fork).
include/neug/storages/loader/abstract_property_graph_loader.h Loader now initializes an empty Workspace + initial checkpoint meta schema.
include/neug/storages/graph/vertex_timestamp.h Makes VertexTimestamp a Module and declares new lifecycle methods.
include/neug/storages/graph/vertex_table.h Refactors VertexTable internals to use module-based indexer + dump descriptors.
include/neug/storages/graph/schema.h Declares Schema ToJson/FromJson API.
include/neug/storages/graph/property_graph.h Refactors PropertyGraph API to open/dump via Checkpoint and expose checkpoint().
include/neug/storages/graph/edge_table.h Refactors EdgeTable API to open/dump via Checkpoint + descriptor.
include/neug/storages/file_names.h Removes legacy file/directory naming helpers (deleted).
include/neug/storages/csr/mutable_csr.h Converts mutable CSRs to Module lifecycle + adds Fork/fork_vertex plumbing.
include/neug/storages/csr/immutable_csr.h Converts immutable CSRs to Module lifecycle (descriptor-based open/dump).
include/neug/storages/csr/csr_base.h Makes CsrBase inherit Module; adds fork_vertex hook.
include/neug/storages/container/mmap_container.h Adds GetHeader and Fork API for mmap containers.
include/neug/storages/container/i_container.h Adds IDataContainer::Fork interface.
include/neug/storages/checkpoint.h Adds Checkpoint API for file/container open + commit + meta management.
include/neug/server/session_pool.h WAL writer path now derived from graph.checkpoint().wal_dir().
include/neug/main/neug_db.h NeugDB now exposes work_dir via Workspace db_dir and updates allocator init signature.
.github/workflows/neug-test.yml Adjusts ccache key inputs for CI caching.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/storages/graph/workspace.cc
Comment thread tests/storage/test_checkpoint.cc
Comment thread tests/storage/test_checkpoint.cc
Comment thread include/neug/utils/uuid.h
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from ed23154 to 15d924f Compare April 22, 2026 04:06
refine module_store and module_desc

refactor snapshot_meta/checkpoint, remove unncessary changes

remove module store

minor ref

ref wal apply

refine

refine checkpoint manage

fix ci

minor

refine

fix

minor refine

refine module register macro

fix format

rm file_names.h

minor refine

minor refine

supporing fast dump

remove some cmments

TBD: add dirty flag for MutableCSR, to avoid redumping the data

minor fix

remove method generate_uuid

fix test

revert is_dirty_ flag and compute md5 for mutable_csr's nbr_list before dump

fix test

check whether data_buffer is modified at column level

refine

remove unused field last_descriptor
@zhanglei1949 zhanglei1949 requested a review from Copilot April 22, 2026 09:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors on-disk database state management around Workspace / Checkpoint and a Module + ModuleDescriptor persistence API, aiming to speed up checkpoint/open/close flows (including hardlink-based snapshot reuse) and update tests accordingly.

Changes:

  • Introduce Workspace, Checkpoint, SnapshotMeta, ModuleDescriptor, and ModuleFactory to standardize module open/dump/fork/close and persist module inventories as JSON.
  • Migrate core storage components (graph, vertex/edge tables, CSR, columns/indexer, vertex timestamp) and many tests from path-based open/dump to descriptor-based checkpoint operations.
  • Add new unit tests/benchmarks for fork isolation and checkpoint hardlink optimization; update CI cache key and gitignore.

Reviewed changes

Copilot reviewed 72 out of 73 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Updates table tests to use Workspace/Checkpoint + ModuleDescriptor open/dump.
tests/unittest/utils.h Adds standard headers used by new test utilities.
tests/unittest/test_column.cc Adds typed column fork/dump/open matrix tests.
tests/unittest/logical_delete_test.cc Switches graph open to checkpoint-based API.
tests/unittest/CMakeLists.txt Adds new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark to checkpoint + descriptor-based VertexTable::Open.
tests/storage/test_vertex_table.cc Migrates vertex table tests to checkpoint-based open/dump/reopen.
tests/storage/test_set_property.cc Replaces old file path utilities include with unittest utils.
tests/storage/test_property_graph.cc Migrates property graph tests to Workspace/Checkpoint open.
tests/storage/test_open_graph.cc Removes legacy file-names include; minor logging fix.
tests/storage/test_mutable_csr.cc Adds CSR fork tests; migrates open/dump/close to checkpoint-based API.
tests/storage/test_immutable_csr.cc Migrates immutable CSR tests to checkpoint-based API and module dump/open.
tests/storage/test_export.cc Removes legacy file-names include; uses unittest utils.
tests/storage/test_ddl.cc Removes legacy file-names include; uses unittest utils.
tests/storage/test_csr_stream_ops.cc Migrates CSR stream tests to checkpoint-based API.
tests/storage/test_csr_batch_ops.cc Migrates CSR batch tests to checkpoint-based open/dump.
tests/storage/test_checkpoint.cc Adds hardlink optimization tests; removes legacy file-names include.
tests/storage/alter_property_test.cc Opens graph via checkpoint rather than directory.
tests/execution/test_value.cc Formatting tweak for EXPECT_DEATH.
src/utils/uuid.cc Adds UUID generator implementation used by checkpoint runtime objects.
src/utils/property/table.cc Refactors Table to ModuleDescriptor dump/open and checkpoint-based column management.
src/utils/property/column.cc Registers TypedColumn<T> module types with ModuleFactory.
src/utils/id_indexer.cc Registers LFIndexer module type with ModuleFactory.
src/transaction/wal/wal.cc Removes dependency on legacy file naming utilities.
src/transaction/update_transaction.cc Removes dependency on legacy file naming utilities.
src/storages/module/module_factory.cc Implements ModuleFactory registry.
src/storages/module/CMakeLists.txt Adds new storages module object library build.
src/storages/loader/abstract_property_graph_loader.cc Uses Workspace db dir in disk remaining logging.
src/storages/graph/workspace.cc Implements Workspace checkpoint discovery and creation.
src/storages/graph/vertex_timestamp.cc Converts VertexTimestamp to Module supporting dump/open/fork via checkpoint.
src/storages/graph/vertex_table.cc Converts VertexTable open/dump to module descriptors + checkpoint.
src/storages/graph/snapshot_meta.cc Implements JSON meta read/write for schema + module inventory.
src/storages/graph/schema.cc Adds YAML<->JSON conversion and schema JSON serialization hooks.
src/storages/graph/property_graph.cc Refactors graph open/dump to checkpoint + SnapshotMeta inventory.
src/storages/graph/module_descriptor.cc Implements ModuleDescriptor JSON serialization and sub-module handling.
src/storages/graph/edge_table.cc Refactors edge tables to checkpoint/descriptors; updates property-alter rebuild paths.
src/storages/graph/checkpoint.cc Implements checkpoint runtime/snapshot dirs, commit/link semantics, and cleanup.
src/storages/csr/immutable_csr.cc Refactors immutable CSR to module descriptors + checkpoint commit.
src/storages/container/mmap_container.cc Improves resize error; adds container Fork support.
src/storages/container/container_utils.cc Improves error logging for missing tmp path on disk-backed containers.
src/storages/CMakeLists.txt Adds module/ subdirectory and links module object library.
src/server/neug_db_service.cc Updates SessionPool init signature (wal dir now derived elsewhere).
src/main/neug_db.cc Refactors DB open/checkpoint to use Workspace/Checkpoint and checkpoint-local wal/allocator dirs.
include/neug/utils/uuid.h Declares UUID generator.
include/neug/utils/property/table.h Updates table API for checkpoint-based open/dump/add_columns.
include/neug/utils/property/column.h Makes columns implement Module; adds dump/open/fork; string column hardlink fast path.
include/neug/utils/id_indexer.h Makes LFIndexer implement Module with descriptor dump/open/fork.
include/neug/storages/workspace.h Declares Workspace public API.
include/neug/storages/snapshot_meta.h Declares SnapshotMeta JSON-persisted module inventory and schema.
include/neug/storages/module_descriptor.h Declares ModuleDescriptor structure + JSON serialization/sub-modules.
include/neug/storages/module/type_name.h Adds template type-name mapping used for module type strings.
include/neug/storages/module/module_factory.h Declares ModuleFactory and registration macros.
include/neug/storages/module/module.h Declares Module interface (Open/Dump/Close/Fork).
include/neug/storages/loader/abstract_property_graph_loader.h Moves loader graph initialization to workspace/checkpoint + schema in meta.
include/neug/storages/graph/vertex_timestamp.h Updates VertexTimestamp API to Module.
include/neug/storages/graph/vertex_table.h Updates VertexTable open/dump/add-properties to checkpoint/descriptors.
include/neug/storages/graph/schema.h Adds schema JSON conversion interface.
include/neug/storages/graph/property_graph.h Updates graph open/dump to checkpoint-based API and exposes checkpoint.
include/neug/storages/graph/edge_table.h Updates edge table open/dump/property ops to checkpoint-based API.
include/neug/storages/file_names.h Removes legacy file naming helpers.
include/neug/storages/csr/mutable_csr.h Refactors mutable CSR to Module API, adds fork semantics and fork_vertex.
include/neug/storages/csr/immutable_csr.h Refactors immutable CSR to Module API.
include/neug/storages/csr/csr_base.h Makes CsrBase a Module and adds fork_vertex requirement.
include/neug/storages/container/mmap_container.h Exposes file header and container fork API.
include/neug/storages/container/i_container.h Adds IDataContainer::Fork.
include/neug/storages/checkpoint.h Declares checkpoint APIs for commit/link/open and meta management.
include/neug/server/session_pool.h Removes explicit wal uri param; constructs wal writers from graph checkpoint.
include/neug/main/neug_db.h Wires Workspace into DB state and allocator init signature change.
.gitignore Adds .aone_copilot/ ignore entry.
.github/workflows/neug-test.yml Adjusts ccache key hashing inputs.
Comments suppressed due to low confidence (7)

include/neug/server/session_pool.h:1

  • WAL writers are bound to graph.checkpoint().wal_dir() at SessionPool::Init() time, but checkpoints can rotate during runtime (NeugDB::createCheckpoint() now creates a new checkpoint). Existing sessions would continue writing WAL to the old checkpoint directory, while recovery (openGraphAndIngestWals()) reads WAL from the latest checkpoint directory, risking missed WAL replay or split WAL history. Consider keeping WAL under a stable workspace-level directory (e.g., <db_dir>/wal) or adding a mechanism to rebind/recreate per-session WAL writers on checkpoint rotation.
/** Copyright 2020 Alibaba Group Holding Limited.

src/storages/graph/property_graph.cc:1

  • Deleting the checkpoint's WAL directory during Dump() can break subsequent writes if WAL is expected to continue in that location (especially since SessionPool creates WAL writers targeting graph.checkpoint().wal_dir()). If the intent is to truncate old WAL logs after a successful checkpoint, prefer deleting WAL files while keeping the directory, or recreating the directory immediately after cleanup. Also consider whether WAL should be workspace-scoped rather than checkpoint-scoped (to avoid coupling runtime logging to snapshot rotation).
    src/storages/graph/workspace.cc:1
  • Checkpoint directories are named without zero-padding (e.g., checkpoint-1, checkpoint-10). Any code that does lexicographic sorting on directory names (including the new tests) will mis-order checkpoints once ids reach 10+. Consider zero-padding to a fixed width (matching the header comment checkpoint-00001) or ensuring any directory listing sorts by parsed numeric id rather than by string path.
    tests/storage/test_checkpoint.cc:1
  • std::sort(dirs) sorts lexicographically by path, which becomes incorrect once checkpoint ids reach multiple digits (e.g., checkpoint-10 sorts before checkpoint-2). This can make the optimization tests flaky as the DB accumulates checkpoints. Prefer sorting by parsed numeric checkpoint id (e.g., extract the integer suffix and sort by that), or enforce zero-padded checkpoint directory names in Workspace::CreateCheckpoint().
    src/storages/container/mmap_container.cc:1
  • errno is used here but this file doesn't show an explicit include for <cerrno>/<errno.h>. Depending on transitive includes, this can fail to compile. Add an explicit include (#include <cerrno> or #include <errno.h>) in this translation unit to make errno available.
    tests/storage/test_checkpoint.cc:1
  • The test uses a fixed /tmp/... directory name. This can make CI flaky when tests run concurrently (or if a previous run crashed and left files with restrictive permissions). Prefer using a unique per-test directory (e.g., std::filesystem::temp_directory_path() / (name + pid + timestamp)) similar to other updated fixtures in this PR.
    src/utils/uuid.cc:1
  • <regex> is included but not used in this implementation unit. Removing it reduces compile time and avoids unnecessary dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/neug/storages/loader/abstract_property_graph_loader.h
Comment thread include/neug/storages/module/module_factory.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants