PDF support?

https://github.com/pymupdf/pymupdf4llm converts PDFs to text. There's no semantic tree-sitter support (I found the Markdown grammar underwhelming), but would you take a PR that takes PDFs in and indexes them by chunks (with seek, etc.)? Claude doesn't handle PDFs very well by default. I've also implemented this on my branch (https://github.com/DieracDelta/coderlm) and have had really good success with it so far. It's a huge improvement over the default behavior of giving up when PDFs are too large, and, as usual, the recursion helps with the top-level context length.
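A rough sketch of what indexing by chunks with seek could look like. This is illustrative only, not the code from the branch above: `Chunk`, `ChunkIndex`, `index_pdf`, and the 2000-character chunk size are hypothetical names and values. It assumes `pymupdf4llm.to_markdown(path, page_chunks=True)` returns one dict per page with a `"text"` key.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    page: int   # 1-based page the chunk came from
    start: int  # character offset into the whole extracted text
    text: str


class ChunkIndex:
    """Hypothetical index: fixed-size chunks addressable by id or offset."""

    def __init__(self, pages, chunk_chars=2000):
        self.chunks = []
        offset = 0
        for page_no, page_text in enumerate(pages, start=1):
            # Split each page into pieces of at most chunk_chars characters.
            for i in range(0, len(page_text), chunk_chars):
                piece = page_text[i:i + chunk_chars]
                self.chunks.append(
                    Chunk(len(self.chunks), page_no, offset + i, piece)
                )
            offset += len(page_text)
        self._starts = [c.start for c in self.chunks]

    def seek(self, offset):
        """Return the chunk that contains the given character offset."""
        if not self.chunks or offset < 0:
            raise IndexError("offset out of range")
        chunk = self.chunks[bisect_right(self._starts, offset) - 1]
        if offset >= chunk.start + len(chunk.text):
            raise IndexError("offset out of range")
        return chunk

    def search(self, term):
        """Return ids of chunks whose text contains term (case-insensitive)."""
        needle = term.lower()
        return [c.chunk_id for c in self.chunks if needle in c.text.lower()]


def index_pdf(path, chunk_chars=2000):
    # Imported lazily so ChunkIndex itself has no hard dependency.
    import pymupdf4llm

    # Assumption: page_chunks=True yields one dict per page with a "text" key.
    pages = pymupdf4llm.to_markdown(path, page_chunks=True)
    return ChunkIndex([p["text"] for p in pages], chunk_chars)
```

With something like this, the model can call `search` to find candidate chunks and `seek` to pull in only the one it needs, instead of loading the whole PDF into context.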

PDF support? #10
