Commit f6cf0f3

Update documentation with PDF outline feature details
- Add comprehensive PDF outline tool documentation to CHANGELOG.md
- Update README.md with get_pdf_outline in multiple sections:
  - Add to PDF information tools list
  - Include natural language usage examples
  - Add CLI command examples with all parameters
  - Create detailed tool documentation section with examples
- Document all outline features: hierarchical structure, simple/detailed modes, max_depth, fuzzy_filter
- Add docs-updater agent configuration for future documentation updates
1 parent e5a75b0 commit f6cf0f3

3 files changed

Lines changed: 150 additions & 1 deletion

.claude/agents/docs-updater.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+---
+name: docs-updater
+description: Use this agent when you need to update project documentation files, specifically CHANGELOG.md and README.md, to reflect recent code changes, new features, or implementation updates. This agent should be used after significant code changes or feature additions to ensure documentation stays synchronized with the codebase.\n\nExamples:\n- <example>\n Context: The user has just implemented a new feature or made significant changes to the codebase.\n user: "I've finished implementing the new authentication system"\n assistant: "Great! Now let me use the docs-updater agent to update the CHANGELOG.md and README.md to reflect these changes"\n <commentary>\n Since new features have been implemented, use the docs-updater agent to ensure documentation is updated accordingly.\n </commentary>\n</example>\n- <example>\n Context: The user explicitly asks for documentation updates.\n user: "Update CHANGELOG.md and README.md to reflect the new API endpoints"\n assistant: "I'll use the docs-updater agent to update both documentation files with the new API endpoint information"\n <commentary>\n The user is explicitly requesting documentation updates, so use the docs-updater agent.\n </commentary>\n</example>
+tools: Task, Bash, Glob, Grep, LS, ExitPlanMode, Read, Edit, MultiEdit, Write, NotebookRead, NotebookEdit, WebFetch, TodoWrite, WebSearch, mcp__file-search__search_files, mcp__file-search__filter_files, ListMcpResourcesTool, ReadMcpResourceTool, mcp__sequential_thinking__sequentialthinking, mcp__playwright__browser_close, mcp__playwright__browser_resize, mcp__playwright__browser_console_messages, mcp__playwright__browser_handle_dialog, mcp__playwright__browser_evaluate, mcp__playwright__browser_file_upload, mcp__playwright__browser_install, mcp__playwright__browser_press_key, mcp__playwright__browser_type, mcp__playwright__browser_navigate, mcp__playwright__browser_navigate_back, mcp__playwright__browser_navigate_forward, mcp__playwright__browser_network_requests, mcp__playwright__browser_take_screenshot, mcp__playwright__browser_snapshot, mcp__playwright__browser_click, mcp__playwright__browser_drag, mcp__playwright__browser_hover, mcp__playwright__browser_select_option, mcp__playwright__browser_tab_list, mcp__playwright__browser_tab_new, mcp__playwright__browser_tab_select, mcp__playwright__browser_tab_close, mcp__playwright__browser_wait_for, mcp__sqlite__query, mcp__sqlite__execute, mcp__sqlite__list_tables, mcp__sqlite__describe_table, mcp__sqlite__create_table, mcp__fuzzy-search__extract_pdf_pages, mcp__fuzzy-search__get_pdf_page_labels, mcp__fuzzy-search__get_pdf_page_count, mcp__fuzzy-search__get_pdf_outline, mcp__fuzzy-search__fuzzy_search_files, mcp__fuzzy-search__fuzzy_search_content, mcp__fuzzy-search__fuzzy_search_documents
+model: sonnet
+---
+
+You are a meticulous documentation specialist focused on maintaining accurate and up-to-date project documentation. Your primary responsibility is updating CHANGELOG.md and README.md files to reflect the current state of the codebase.
+
+When updating documentation:
+
+1. **Analyze Recent Changes**: Examine the codebase to identify what has changed, been added, or removed. Focus on:
+   - New features or functionality
+   - Breaking changes
+   - Bug fixes
+   - Performance improvements
+   - Dependency updates
+   - API changes
+
+2. **Update CHANGELOG.md**:
+   - Follow the Keep a Changelog format (if already in use) or maintain consistency with existing format
+   - Add entries under the appropriate version section or create a new version section if needed
+   - Use clear, concise descriptions that explain what changed and why it matters to users
+   - Include dates for releases
+   - Categorize changes appropriately (Added, Changed, Deprecated, Removed, Fixed, Security)
+
+3. **Update README.md**:
+   - Ensure all features are accurately documented
+   - Update installation instructions if dependencies or setup process changed
+   - Revise usage examples to reflect current API or interface
+   - Update configuration options if any were added or modified
+   - Ensure all code examples are working with the current implementation
+   - Update any outdated links or references
+
+4. **Quality Checks**:
+   - Verify all technical details are accurate
+   - Ensure consistency in formatting and style with the rest of the documentation
+   - Check that version numbers are correct and consistent
+   - Confirm that all new features mentioned in CHANGELOG are properly documented in README
+
+5. **Best Practices**:
+   - Write from the user's perspective - focus on impact rather than implementation details
+   - Be concise but comprehensive
+   - Use clear, simple language
+   - Include examples where they add clarity
+   - Maintain chronological order in CHANGELOG (newest first)
+
+You should ONLY edit existing CHANGELOG.md and README.md files. Do not create new documentation files unless they already exist in the project. Focus exclusively on updating these two files to accurately reflect the current state of the implementation.

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -30,6 +30,13 @@
 - **New PDF Information Tools**:
   - `get_pdf_page_labels`: Returns mapping of all page indices to their labels
   - `get_pdf_page_count`: Returns total number of pages in a PDF
+  - `get_pdf_outline`: Extracts table of contents/bookmarks from PDFs
+    - Returns hierarchical outline structure with levels, titles, page numbers, and page labels
+    - Supports `simple` mode (default) with basic info or detailed mode with link information
+    - Optional `max_depth` parameter to limit traversal depth for deep hierarchies
+    - Optional `fuzzy_filter` parameter to search outline entries by title using fzf
+    - Handles PDFs without outlines gracefully by returning empty structure
+    - Available in both CLI (`pdf-outline` command) and MCP tool interface
 - **Type Checking**: Added support for `ty` type checker
   - Updated Makefile to use `ty check --exclude git-repos`
   - Added `ty>=0.0.1a16` as dev dependency
@@ -56,6 +63,7 @@
   - Added `TYPE_CHECKING` imports and type annotations for conditional imports
   - Fixed `mcp.context` attribute issues with type: ignore comments
 - **Test Assertion**: Added missing assertion for `proc.stdin` in CLI tests
+- **Test Tool Count**: Updated `test_list_tools` to expect 7 tools with addition of `get_pdf_outline`
 
 ### Dependencies
 - Replaced `pdfminer.six>=20221105` with `PyMuPDF>=1.23.0` for PDF operations
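
The changelog entry above describes `max_depth` and the summary fields only in prose. The following is a minimal sketch of depth limiting over simple-mode entries, assuming the `[level, title, page, page_label]` layout documented in the README part of this commit. `limit_outline` and `SAMPLE` are hypothetical names for illustration, not the project's code, and the commit does not state whether the real tool counts `total_entries` before or after limiting; this sketch counts what remains.

```python
from typing import Any, Optional

# Hypothetical sample data in the documented simple-mode layout:
# [level, title, page, page_label]
SAMPLE = [
    [1, "Introduction", 1, "i"],
    [1, "Chapter 1: Background", 5, "1"],
    [2, "1.1 History", 6, "2"],
    [1, "Chapter 2: Methods", 15, "11"],
    [2, "2.1 Data Collection", 16, "12"],
    [3, "2.1.1 Sources", 17, "13"],
]


def limit_outline(entries: list, max_depth: Optional[int] = None) -> dict[str, Any]:
    """Drop entries nested deeper than max_depth and summarize the rest.

    Assumption: counts are taken after limiting. The real tool may differ.
    """
    kept = [e for e in entries if max_depth is None or e[0] <= max_depth]
    return {
        "outline": kept,
        "total_entries": len(kept),
        # default=0 covers PDFs without an outline (empty list)
        "max_depth_found": max((e[0] for e in kept), default=0),
    }


if __name__ == "__main__":
    result = limit_outline(SAMPLE, max_depth=2)
    print(result["total_entries"], result["max_depth_found"])  # prints: 5 2
```

An empty input yields an empty `outline` with both counters at zero, which mirrors the "returns empty structure" behavior the changelog mentions for PDFs without outlines.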

README.md

Lines changed: 94 additions & 1 deletion
@@ -23,9 +23,10 @@ Advanced search with both file name and content capabilities using `ripgrep` and
   - `fuzzy_search_content`: Search file CONTENTS with path+content matching by default
 - **PDF and document search** (optional) - search through PDFs, Office docs, and archives using `ripgrep-all`
 - **PDF page extraction** (optional) - extract specific pages from PDFs using PyMuPDF with page label support
-- **PDF information tools** (optional) - get page labels and page count from PDF files:
+- **PDF information tools** (optional) - get page labels, page count, and table of contents from PDF files:
   - `get_pdf_page_labels`: Get all page labels from a PDF file
   - `get_pdf_page_count`: Get the total number of pages in a PDF file
+  - `get_pdf_outline`: Extract table of contents/bookmarks from a PDF file
 - **Simplified interface** - just provide fuzzy search terms (NO regex support)
 - **Multiline record processing** for complex pattern matching
 - **Standalone CLI** for testing and direct usage
@@ -339,6 +340,8 @@ Once configured in Claude Desktop, you can use natural language for advanced sea
 - "Search for 'vector' in PDF documents" (requires ripgrep-all)
 - "Find all references to 'machine learning' in PDFs and Word documents"
 - "Extract pages 5-10 from the user manual PDF"
+- "Get the table of contents from the research paper PDF"
+- "Show me the outline of chapters in the user manual"
 
 #### CLI Usage
 
@@ -385,6 +388,10 @@ The fuzzy search server also works as a standalone CLI tool:
 ./mcp_fuzzy_search.py page-labels manual.pdf # List all page labels
 ./mcp_fuzzy_search.py page-labels manual.pdf --start 100 --limit 20 # Get labels for pages 100-119
 ./mcp_fuzzy_search.py page-count manual.pdf # Get total page count
+./mcp_fuzzy_search.py pdf-outline manual.pdf # Get table of contents
+./mcp_fuzzy_search.py pdf-outline manual.pdf --max-depth 2 # Limit to 2 levels
+./mcp_fuzzy_search.py pdf-outline manual.pdf --fuzzy-filter "chapter" # Filter by title
+./mcp_fuzzy_search.py pdf-outline manual.pdf --no-simple # Detailed output with links
 ```
 
 ### SQLite Server
@@ -965,6 +972,92 @@ Get the total number of pages in a PDF file.
 }
 ```
 
+#### `get_pdf_outline`
+Extract the table of contents (outline/bookmarks) from a PDF file.
+
+**Purpose:** Returns the hierarchical outline structure with levels, titles, page numbers, and page labels, helpful for navigating complex PDFs and understanding document structure.
+
+**Parameters:**
+- `file` (required): Path to PDF file
+- `simple` (optional): Return basic info (default: true) or detailed info with link data (false)
+- `max_depth` (optional): Maximum depth to traverse in the outline hierarchy (default: unlimited)
+- `fuzzy_filter` (optional): Fuzzy search string to filter outline entries by title using fzf
+
+**Example:**
+```python
+{
+  "file": "research_paper.pdf"
+}
+
+# Returns (simple mode):
+{
+  "outline": [
+    [1, "Introduction", 1, "i"],
+    [1, "Chapter 1: Background", 5, "1"],
+    [2, "1.1 History", 6, "2"],
+    [2, "1.2 Related Work", 10, "6"],
+    [1, "Chapter 2: Methods", 15, "11"],
+    [2, "2.1 Data Collection", 16, "12"],
+    [3, "2.1.1 Sources", 17, "13"],
+    [2, "2.2 Analysis", 20, "16"]
+  ],
+  "total_entries": 8,
+  "max_depth_found": 3
+}
+
+# Example with filtering:
+{
+  "file": "research_paper.pdf",
+  "fuzzy_filter": "chapter"
+}
+
+# Returns:
+{
+  "outline": [
+    [1, "Chapter 1: Background", 5, "1"],
+    [1, "Chapter 2: Methods", 15, "11"]
+  ],
+  "total_entries": 8,
+  "max_depth_found": 3,
+  "filtered_count": 2
+}
+
+# Example with detailed output:
+{
+  "file": "research_paper.pdf",
+  "simple": false,
+  "max_depth": 2
+}
+
+# Returns:
+{
+  "outline": [
+    [1, "Introduction", 1, "i", {
+      "page": 1,
+      "uri": "#page=1&zoom=100,0,0",
+      "is_external": false,
+      "is_open": true,
+      "dest": {
+        "kind": 1,
+        "page": 0,
+        "uri": "#page=1&zoom=100,0,0"
+      }
+    }],
+    # ... more entries with link details
+  ],
+  "total_entries": 8,
+  "max_depth_found": 2
+}
+```
+
+**Outline Format:**
+- Simple mode returns: `[level, title, page, page_label]`
+  - `level`: Hierarchy level (1-based, 1 = top level)
+  - `title`: The bookmark/outline entry title
+  - `page`: Page number (1-based)
+  - `page_label`: Page label as shown in PDF readers (e.g., "i", "ii", "1", "ToC")
+- Detailed mode adds a 5th element with link information including destination details
+
 ### SQLite Server Tools
 
 #### `query`
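
The outline format documented in the README diff above is a flat list in which nesting is carried only by the `level` field. As a rough illustration of how a caller might consume it, the sketch below renders entries as an indented table of contents. `render_outline` is a hypothetical helper, not part of the project, and it assumes only the `[level, title, page, page_label]` layout (with an optional fifth element in detailed mode) described in this commit.

```python
def render_outline(entries: list, indent: str = "  ") -> str:
    """Render outline entries as indented text, one line per entry.

    Each entry is [level, title, page, page_label]; any extra elements
    (such as the detailed-mode link info) are ignored.
    """
    lines = []
    for level, title, page, label, *_extra in entries:
        # level is 1-based, so top-level entries get no indentation
        prefix = indent * (level - 1)
        lines.append(f"{prefix}{title} (page {page}, label {label})")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        [1, "Chapter 1: Background", 5, "1"],
        [2, "1.1 History", 6, "2"],
    ]
    print(render_outline(sample))
```

Because the fifth element is discarded by the starred unpacking, the same helper accepts output from either simple or detailed mode.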
