fix: refactor string-column dump#172
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Review Summary by QodoRefactor file I/O with BufferWriter utility class
WalkthroughsDescription• Introduce BufferWriter utility class for efficient buffered file I/O • Replace manual file operations with BufferWriter in CSR dump method • Refactor string-view array dump to use buffered writing approach • Improve error handling and resource management across file operations Diagramflowchart LR
A["Manual fopen/fwrite/fclose"] -->|"Replace with"| B["BufferWriter class"]
B -->|"Used in"| C["MutableCsr::dump"]
B -->|"Used in"| D["mmap_array string_view dump"]
C -->|"Writes"| E[".nbr files"]
D -->|"Writes"| F[".data and .items files"]
File Changes1. include/neug/utils/file_utils.h
|
Code Review by Qodo
1.
|
5b3b0ab to
bdefdd8
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a new buffered file-writing utility (BufferWriter) and refactors dump paths to use buffered writes instead of direct FILE* I/O, aiming to improve dump performance and centralize error handling.
Changes:
- Added
BufferWriter(4MB buffer) toinclude/neug/utils/file_utils.h. - Refactored
MutableCsr<EDATA_T>::dump()to write.nbrviaBufferWriter. - Refactored
mmap_array<std::string_view>::dump()to emit.data/.itemsviaBufferWriterinstead of the prior compaction/streaming dump flow.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
include/neug/utils/file_utils.h |
Adds the new BufferWriter abstraction used by multiple dump paths. |
include/neug/utils/mmap_array.h |
Refactors string-column dump to buffered writes for .data/.items. |
src/storages/csr/mutable_csr.cc |
Updates CSR neighbor list dumping to use BufferWriter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
afe7d52 to
2b5c1da
Compare
35c536b to
fe5a2ff
Compare
…into refactor-string-dump
This pull request refactors the interface and implementation of edge table size tracking in the property graph storage system. The main goal is to clarify the meaning of "size" for edge tables, especially distinguishing between bundled and unbundled edge storage, and to simplify related code and tests. Additionally, the PR modernizes file I/O in several places, replacing C-style file operations with C++ streams, and removes unused or overly complex compaction logic for string columns.
Key changes include:
Edge Table Size Refactor
EdgeTable::Size()toPropTableSize()and updated its semantics: for bundled tables, it now always returns 0, clarifying that bundled edges do not have a separate property table size. All code and tests referencing edge table size were updated accordingly.File I/O Modernization
FILE*operations with C++std::ofstreaminMutableCsr::dumpand related code, improving exception safety and code clarity.String Column Compaction Simplification
TypedColumn<std::string_view>::dump, replacing it with a straightforward forward pass that writes unique string data and item metadata using C++ streams. The MD5 computation is stubbed for now.Minor Cleanups
MutableCsr::dumprelated to capacity lists.These changes improve code clarity, maintainability, and correctness, especially around edge table size semantics and file operations.