feat: HWPX (한글/Hancom) document handler #54

Open
lidge-jun wants to merge 54 commits into iOfficeAI:main from lidge-jun:feat/hwpx

@lidge-jun

Summary

Hi! I'm a Korean developer building cli-jaw, a CLI orchestrator that relies heavily on OfficeCLI. First of all, thank you for this incredible tool — it's been invaluable for our document automation workflows.

Korean users frequently work with .hwpx files (Hancom/한글 Office format, the de facto standard for government and business documents in Korea). This PR adds comprehensive HWPX support to OfficeCLI.

This doesn't need to be merged into main right away. If you'd prefer, please feel free to merge it into a feat/hwpx branch so we can iterate together before it's production-ready.

What's included

  • Full HWPX handler — view, get, query, set, add, remove, move, copy, raw, validate
  • Mutate support — paragraphs, tables, images, equations, bookmarks, shapes, fields, and more
  • Label-based table fill — fill Korean government form fields by label matching (e.g., 성명, 주소)
  • Markdown import via the --from-markdown option for the create command
  • CJK font segmentation — proper font handling for mixed CJK/Latin text in PowerPoint and Word
  • 180 passing tests including handler and corpus tests
  • SKILL.md for agent-assisted HWPX workflows
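
The label-based table fill mentioned above can be pictured with a small, language-agnostic sketch. This is not the PR's C# implementation: the names `normalize` and `fill_by_label` are hypothetical, the table is modeled as a plain list of rows, and the rule that a value goes into the cell immediately to the right of its label is an assumption made for illustration.

```python
from typing import Dict, List


def normalize(label: str) -> str:
    """Drop all whitespace and a trailing colon so '성 명 :' matches '성명'."""
    return "".join(label.split()).rstrip(":：")


def fill_by_label(table: List[List[str]], values: Dict[str, str]) -> int:
    """Write each value into the cell to the right of its matching label cell.

    Assumption: form labels sit directly left of the cell they describe.
    Returns the number of cells that were filled.
    """
    wanted = {normalize(key): value for key, value in values.items()}
    filled = 0
    for row in table:
        col = 0
        while col < len(row) - 1:
            key = normalize(row[col])
            if key in wanted:
                row[col + 1] = wanted[key]
                filled += 1
                col += 2  # skip the cell just written so it is not read as a label
            else:
                col += 1
    return filled
```

For example, calling `fill_by_label` on a row such as `["성 명", "", "주소:", ""]` with the values `{"성명": "홍길동", "주소": "서울"}` fills both empty cells, because the labels are compared after whitespace and trailing colons are removed.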

Notes

  • Rebased cleanly on current main (b5cd116)
  • All existing tests still pass
  • HWPX handler follows the same partial-class pattern as existing handlers
  • TryExtractBinary is stubbed out (binary extraction not yet implemented)

Happy to address any feedback. Would love to collaborate on making this production-ready!

Test plan

  • dotnet build — 0 errors
  • dotnet test — 180 passed, 0 failed
  • Rebased on latest upstream/main
  • Review by maintainer

@goworm
Contributor

goworm commented Apr 13, 2026

Hi @lidge-jun, thanks for the incredible effort here — a full HWPX handler with 180 tests is no small feat, and we appreciate your interest in expanding OfficeCLI's format coverage.

After reviewing the PR, we've decided not to merge this into main just yet. Here's our thinking:

Plugin-based approach for HWPX
HWPX is an independent file format with adoption primarily in Korea. Rather than bundling it into the core, we'd like to support it as a plugin so the main project stays focused on the three universal Office formats. We'll be designing a plugin architecture that makes this possible.

Contribution guidelines
This PR also includes changes that fall outside the scope of HWPX support (CJK font segmentation in existing PPTX/DOCX handlers, a new compare command, CI workflow modifications, fork-internal documents like BRANCH_STRATEGY.md), and the commit history would need to be reorganized before merging. These don't align with our external contribution conventions.

Next steps
We'll keep this PR open until OfficeCLI's plugin system is ready for HWPX integration. You're welcome to keep updating this branch in the meantime — your domain knowledge of the HWPX format and Hancom compatibility is exactly what we'd need when the time comes.

Thanks again for the work — looking forward to collaborating!


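Hello @lidge-jun, first of all, thank you very much for taking on such a large piece of work. Implementing a complete HWPX handler with 180 tests is no easy task, and we appreciate your interest in expanding OfficeCLI's format coverage.

After review, we have decided not to merge this into main at this time. Our reasoning is below.

HWPX will be supported as a plugin
HWPX is an independent file format used mainly in Korea. Rather than including it directly in the core, we plan to support it as a plugin so that the main project can stay focused on the three universal Office formats: Word, Excel, and PowerPoint. We will design a plugin architecture for this purpose.

External contribution guidelines
This PR includes changes that fall outside the scope of HWPX support: CJK font segmentation changes in the existing PPTX/DOCX handlers, a new compare command, CI workflow changes, and fork-internal documents such as BRANCH_STRATEGY.md. The commit history also needs to be cleaned up. Because these do not conform to our external contribution rules, merging is difficult at this point.

Next steps
We plan to keep this PR open until OfficeCLI's plugin system is ready to support HWPX integration. You are welcome to keep updating this branch in the meantime. Your domain knowledge of the HWPX format and Hancom compatibility is exactly what we need.

Thank you again for your hard work. We look forward to collaborating!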

lidge-jun and others added 25 commits April 14, 2026 18:56
…t tests

- CjkScript enum, detection, font chains for Korean/Japanese/Chinese
- WordML + DrawingML helpers for DOCX/PPTX CJK font metadata
- Mixed-text segmentation (English+CJK in same paragraph)
- Kinsoku (line-breaking) rules
- Integration into 7 handler files at 11+ points
- 41 xUnit tests covering all public methods

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
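
As a rough illustration of the script detection and mixed-text segmentation listed in this commit, the sketch below splits a string into runs by script so that each run could receive its own font. It is not the PR's code: the `Script` enum and the `detect` and `segment` functions are hypothetical names, the Unicode ranges cover only the most common blocks, and treating every Han ideograph as Chinese is a simplifying assumption (Han characters are shared by Korean, Japanese, and Chinese text).

```python
from enum import Enum
from typing import List, Tuple


class Script(Enum):
    LATIN = "latin"
    KOREAN = "korean"
    JAPANESE = "japanese"
    CHINESE = "chinese"


def detect(ch: str) -> Script:
    """Classify one character by Unicode block (common blocks only)."""
    cp = ord(ch)
    # Hangul syllables, Hangul Jamo, Hangul compatibility Jamo
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return Script.KOREAN
    # Hiragana and Katakana
    if 0x3040 <= cp <= 0x30FF:
        return Script.JAPANESE
    # CJK Unified Ideographs; assumed Chinese here for simplicity
    if 0x4E00 <= cp <= 0x9FFF:
        return Script.CHINESE
    return Script.LATIN


def segment(text: str) -> List[Tuple[Script, str]]:
    """Split text into consecutive (script, substring) runs.

    Whitespace is attached to the run that precedes it, so a space
    between scripts does not create a run of its own.
    """
    runs: List[Tuple[Script, str]] = []
    for ch in text:
        script = detect(ch)
        if ch.isspace() and runs:
            script = runs[-1][0]
        if runs and runs[-1][0] is script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs
```

With this sketch, `segment("Hello 안녕 world")` yields three runs: a Latin run `"Hello "`, a Korean run `"안녕 "`, and a Latin run `"world"`.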
Extract 5 helper methods from inline patterns:
- WrapInRun(): wraps content in hp:run element
- CreateSubList(): creates standard hp:subList with paragraph
- CloneParaPrIfShared(): clone-on-write for shared paraPr
- NextBorderFillId(): max-ID-based (fixes count-based ID bug)
- MakeBorder(): static border element factory

Refactored call sites: BuildCell, CreateFootnote, AddHeaderFooter,
SetParagraphAlignment, SetParagraphIndent, EnsureTableBorderFill.
Also fixes subList id="" bug in CreateFootnote (now uses NewId()).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
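
The count-based ID bug that NextBorderFillId() fixes is easy to reproduce in isolation. The sketch below is illustrative Python, not the PR's C# helper; the function names are hypothetical, and IDs starting at 1 is an assumption.

```python
from typing import Iterable, List


def next_id_by_count(existing_ids: List[int]) -> int:
    """Count-based allocation: breaks once IDs are no longer contiguous."""
    return len(existing_ids) + 1


def next_id_by_max(existing_ids: Iterable[int]) -> int:
    """Max-based allocation: always larger than every ID already in use."""
    return max(existing_ids, default=0) + 1


# After the entry with ID 3 has been removed, IDs 1, 2 and 4 remain.
ids = [1, 2, 4]
collides = next_id_by_count(ids) in ids   # True: 4 is already taken
safe = next_id_by_max(ids) not in ids     # True: 5 is unused
```

The max-based version never reuses a live ID, at the cost of leaving gaps where entries were removed.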
…ty (Plans 40-41)

Header/Footer: secPr-inline pattern → ctrl pattern (golden template verified).
- hp:ctrl > hp:header/footer > hp:subList structure
- subList id="" (empty), textWidth/textHeight from pagePr margins
- linesegarray required for rendering
- vertAlign: header=TOP, footer=BOTTOM

Hyperlink: add parameters, charPr registration, proper encoding.
- 6 parameters (Command, Path, Category, TargetType, DocOpenType, Prop)
- URL/Email/File auto-classification with correct Category values
- EnsureHyperlinkCharPr() registers blue underline charPr in header.xml
- EscapeHyperlinkCommand() handles colon/semicolon escaping
- dirty=1, zorder=-1, fieldid attributes added

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
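
The URL/Email/File auto-classification and the colon/semicolon escaping described in this commit might look roughly like the sketch below. It is a guess at the shape of the logic, not the PR's EscapeHyperlinkCommand() or classifier: the function names, the string labels `"url"`, `"email"`, and `"file"` (the real Category values are not given here), and the use of a backslash as the escape character are all assumptions.

```python
import re

# A scheme followed by "://", e.g. https://
_URL = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
# A bare address such as name@example.com
_EMAIL = re.compile(r"^[^@\s/]+@[^@\s/]+\.[^@\s/]+$")


def classify_target(target: str) -> str:
    """Return a hypothetical label for the link target: 'email', 'url' or 'file'."""
    lowered = target.lower()
    if lowered.startswith("mailto:") or _EMAIL.match(target):
        return "email"
    if lowered.startswith("file://"):
        return "file"
    if _URL.match(target):
        return "url"
    return "file"  # anything else is treated as a local path


def escape_command(value: str) -> str:
    """Prefix every colon and semicolon with a backslash (assumed escape char)."""
    return re.sub(r"([:;])", r"\\\1", value)
```

Escaping matters when the target is embedded in a field whose own syntax uses the same characters as separators; without it, a URL such as `https://example.com` would be cut at the first colon.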
lidge-jun and others added 28 commits April 14, 2026 18:56
…ce changes

Add InsertPosition? parameter support for Add/Move/CopyFrom and
implement TryExtractBinary stub to match upstream interface updates.
…ection to hwpx skill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to-sync

When the skills/ directory changes on the main/agent branch, this sends a repository_dispatch
event to lidge-jun/cli-jaw-skills to trigger a skill sync.

Requires SKILLS_SYNC_TOKEN secret (GitHub PAT with repo scope).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
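
For readers unfamiliar with repository_dispatch, the sketch below builds the kind of HTTP request GitHub's REST API expects for it. It only constructs the request and does not send it. The PR's actual workflow file is not shown here, so the helper name, the event type `skills-updated`, and the payload contents are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str,
                           payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST to /repos/{owner}/{repo}/dispatches (not sent here)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type,
                       "client_payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # e.g. the SKILLS_SYNC_TOKEN secret
            "Content-Type": "application/json",
        },
    )


# Hypothetical event type and payload; the token value is a placeholder.
request = build_dispatch_request(
    "lidge-jun", "cli-jaw-skills", "<token>", "skills-updated", {"ref": "main"}
)
```

Passing the request to `urllib.request.urlopen` would send it; the receiving repository then needs a workflow that listens for the same event type.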