Skip to content

Pipeline vfs raw parsing and DuckDB appends#3

Closed
fanyang89 wants to merge 1 commit intomainfrom
ci41ko-codex/parallelize-parse-and-write-pipeline
Closed

Pipeline vfs raw parsing and DuckDB appends#3
fanyang89 wants to merge 1 commit intomainfrom
ci41ko-codex/parallelize-parse-and-write-pipeline

Conversation

@fanyang89
Copy link
Collaborator

Motivation

  • Improve throughput by decoupling JSON parsing from DuckDB writes so parsing can proceed while writes complete, reducing idle IO waiting.

Description

  • Introduced a buffered events channel and a writer goroutine that consumes events and calls appender.AppendRow in cmd_vfs_raw.go.
  • Added a writeErrCh to propagate writer errors back to the parser and a done channel to coordinate shutdown.
  • Changed the jsonParseThenAppend callback to send parsed *vfsEvent into the events channel and handle early writer errors.
  • Ensured orderly shutdown by closing events, waiting for the writer to finish, and returning any parse or write errors.

Testing

  • No automated tests were executed for this change.

Codex Task

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9bb17a68d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +318 to +322
parseErr := jsonParseThenAppend(r, func(e *vfsEvent) error {
select {
case events <- e:
return nil
case err := <-writeErrCh:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop parsing immediately when writer errors

The select can still choose events <- e even when writeErrCh is already readable, because Go selects randomly among ready cases. If the writer goroutine fails, the parser may enqueue up to the remaining buffer capacity (1024) of additional events before it eventually notices the error, and those events will be dropped because the writer has exited. That creates silent data loss in the exact scenario the error reporting is meant to handle. Consider prioritizing the error path (e.g., non-blocking check of writeErrCh before enqueue) so parsing halts as soon as the writer fails.

Useful? React with 👍 / 👎.

@fanyang89
Copy link
Collaborator Author

Wrong

@fanyang89 fanyang89 closed this Feb 9, 2026
@fanyang89 fanyang89 deleted the ci41ko-codex/parallelize-parse-and-write-pipeline branch February 10, 2026 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant