PDF support? #10

@DieracDelta

https://github.com/pymupdf/pymupdf4llm converts PDFs to text. There's no semantic tree-sitter support (I found the markdown grammar underwhelming), but would you take a PR that takes PDFs in and indexes them by chunk (with seek, etc.)? Claude doesn't handle PDFs very well by default. I've already implemented this on my branch (https://github.com/DieracDelta/coderlm) and have had really good results with it so far. It's a huge improvement over the default behavior of giving up when a PDF is too large, and the recursion helps with the top-level context length as usual.
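
To make the "index by chunks, with seek" idea concrete, here is a rough sketch. It is not the code from my branch. It assumes the per-page text has already been extracted (for example with pymupdf4llm) into a plain list of strings. The names `Chunk`, `build_index`, and `seek`, and the default `chunk_size`, are hypothetical and only there for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int  # position of the chunk in the index
    page: int   # 1-based page the chunk came from
    start: int  # offset of the chunk's first character in the full text
    text: str


def build_index(pages: list[str], chunk_size: int = 2000) -> list[Chunk]:
    """Split each page's text into chunks of at most chunk_size characters."""
    chunks: list[Chunk] = []
    offset = 0  # running character offset across all pages
    for page_no, page_text in enumerate(pages, start=1):
        for i in range(0, len(page_text), chunk_size):
            piece = page_text[i:i + chunk_size]
            chunks.append(Chunk(len(chunks), page_no, offset + i, piece))
        offset += len(page_text)
    return chunks


def seek(chunks: list[Chunk], offset: int) -> Chunk:
    """Return the chunk that contains the given character offset."""
    if not chunks or offset < 0:
        raise IndexError("offset out of range")
    starts = [c.start for c in chunks]
    # Last chunk whose start is <= offset.
    chunk = chunks[bisect_right(starts, offset) - 1]
    if offset >= chunk.start + len(chunk.text):
        raise IndexError("offset out of range")
    return chunk
```

With something like this, the model only ever pulls one chunk at a time into context (by index or by offset), so a large PDF never has to be loaded whole. Chunking on fixed character counts is the simplest choice; splitting on headings or paragraphs would probably give better chunks.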
