docs: restructure README to lead with primary surface and use cases #63
Draft
clee704 wants to merge 2 commits into
Reorganize the README to lead with the tool's value proposition and audience (AI agents and humans verifying encoding-level parquet behavior), with verb-noun subcommand examples in the first screen.

Changes:
- First paragraph states purpose and audience clearly; adds a 3-line verb-noun example before any prose
- New 'What you can do' section with 5 concrete investigation flows (encoding audit, byte-range lookup, top-down navigation, page decode, file validation), each linked to the relevant subcommand section
- New 'Compared to' section positioning the tool vs parquet-tools, pqrs, pyarrow.parquet library, and DuckDB parquet_metadata()
- CLI Reference reorganized: verb-noun subcommands first (primary surface), legacy --output-mode modes collapsed into a 'Legacy whole-file modes' subsection (secondary surface)
- All examples updated to use actual titanic.parquet output; removed placeholder '...' tokens throughout
- New 'Design' section pointing to docs/output-principles.md and docs/tree-schema.md
- Existing Library API, decoder usage, dev instructions, benchmarks, and technical details preserved and reorganized; Development section moved to the bottom

Closes #23

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Widen tagline to 'CLI and Python library' so Python library consumers see the tool is relevant on first glance
- Add flow 6 in 'What you can do': library API usage with a real runnable snippet against titanic.parquet showing lazy open, column walk, page.decode() RLE run inspection, and physical_values()
- Move '## Library API' above '### Legacy whole-file modes' so the primary Python API surface sits above the secondary legacy CLI modes in the reference section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What & why
Restructures README.md to lead with the tool's value proposition and
audience — AI agents and humans verifying encoding-level parquet behavior —
with a verb-noun example in the first paragraph and the primary CLI surface
(verb-noun subcommands) clearly introduced before the legacy modes.
The previous README led with a generic one-liner and an HTML report link,
with the subcommand surface buried halfway down. This underrepresented what
the tool is now primarily useful for.
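Key changes:
- The first paragraph and a 3-line verb-noun example establish the tool's purpose and JSON/AI-agent-friendly design before any prose sections.
- New "What you can do" section with 5 concrete investigation flows (encoding audit, byte-range lookup, top-down navigation, page decode, file validation), each with a real command example and a link to the subcommand reference.
- New "Compared to" section positioning the tool vs parquet-tools, pqrs, the pyarrow.parquet library, and DuckDB parquet_metadata().
- CLI Reference reorganized: verb-noun subcommands first (primary surface), legacy --output-mode modes collapsed into a "Legacy whole-file modes" subsection (secondary surface).
- All examples use actual titanic.parquet output (or a small pyarrow-generated file for the ARROW:schema KV example); no ... placeholder tokens.
- New "Design" section pointing to docs/output-principles.md and docs/tree-schema.md for contributors and sophisticated consumers.
- Existing Library API, decoder usage, dev instructions, benchmarks, footer cache, and Thrift details are all retained and reorganized; Development moved to the bottom.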
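For readers unfamiliar with what the file validation and byte-range lookup flows operate on, the sketch below shows Parquet's outer framing: a file starts and ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic give the little-endian length of the Thrift-encoded footer. This is a stdlib-only illustration, not this tool's implementation; the helper name footer_byte_range and the synthetic buffer are assumptions made for the example, and it ignores encrypted-footer files.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def footer_byte_range(data: bytes) -> tuple[int, int]:
    """Return (start, end) offsets of the footer bytes in a Parquet buffer.

    Hypothetical helper for illustration; raises ValueError when the basic
    framing checks fail.
    """
    # Smallest possible framing: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at start or end")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end


# Synthetic buffer standing in for a real file: magic, payload, footer, length, magic.
fake_footer = b"\x00" * 10
blob = MAGIC + b"column-chunk-bytes" + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC
start, end = footer_byte_range(blob)
print(f"footer occupies bytes [{start}, {end})")
```

A real validator would go on to decode the footer's Thrift metadata and check row-group and page offsets against the file size; this sketch stops at the framing.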
Closes #23
Refactoring checkpoint
Dogfooding
Ran the subcommands (file summary, file kv, file schema, file validate, rowgroup list, rowgroup show, column list, column show, page list, page header, page decode) against tests/data/titanic.parquet; also generated a small pyarrow file to capture the ARROW:schema KV key output. All real output was pasted directly, with no manual editing.
Tests
hatch run dev:check is green (format, lint, type-check, tests, per-module 95% coverage).