feat(indexing): add index-file-document tool for file uploads by adityamparikh · Pull Request #88 · apache/solr-mcp

adityamparikh · 2026-04-06T20:42:29Z

Summary

Adds a new MCP tool index-file-document that enables users to upload files of any format (PDF, Word, Excel, PowerPoint, etc.) through their AI chat client and have the content indexed into Solr for full-text search.

Closes #69

How it works

When a user uploads a file in an AI chat client like Claude Desktop, the client handles text extraction — not the MCP server. Here's the flow:

1. User uploads a file (e.g., report.pdf) in Claude Desktop
2. Claude Desktop extracts text from the PDF before Claude ever sees it, so Claude receives the readable text content, not the raw binary bytes
3. Claude calls the index-file-document tool, passing the extracted text as content and the original filename (report.pdf) as filename
4. The MCP server indexes a SolrInputDocument with id (auto-generated UUID), content (the full text), and filename (for filtering/display)
5. User can now search over the indexed content using existing search tools

This design means no binary parsing library (Tika, Docling, etc.) is needed on the server side — the AI chat client already does the heavy lifting of text extraction before invoking MCP tools. This keeps the server lightweight and avoids ~100MB of transitive dependencies.
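
The document-building step of the flow above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: a `LinkedHashMap` stands in for `SolrInputDocument` so the snippet runs without SolrJ on the classpath, and the class and method names (`FileDocumentSketch`, `create`) are hypothetical. Only the three field names (`id`, `content`, `filename`) come from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: a Map stands in for SolrInputDocument (assumption).
final class FileDocumentSketch {

    private FileDocumentSketch() {}

    // Builds the three fields the PR describes: id, content, filename.
    static Map<String, Object> create(String content, String filename) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", UUID.randomUUID().toString()); // auto-generated UUID
        doc.put("content", content);                 // text already extracted by the chat client
        doc.put("filename", filename);               // kept for filtering/display
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = create("Quarterly revenue grew...", "report.pdf");
        System.out.println(doc.keySet());
    }
}
```

Because the server only ever sees text, nothing in this step depends on the original file format; the filename is carried along purely as metadata.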

Tool signature

index-file-document(collection, content, filename)

| Parameter | Description |
| --- | --- |
| `collection` | Solr collection to index into |
| `content` | Text content extracted from the file by the chat client |
| `filename` | Original filename with extension (e.g. `report.pdf`); stored as a searchable field |

Changes

FileDocumentCreator (new) — @Component that creates a SolrInputDocument with id, content, and filename fields. Does not implement SolrDocumentCreator because it requires a filename parameter in addition to content.
IndexingDocumentCreator — Added FileDocumentCreator dependency and createSchemalessDocumentsFromFile() delegation method
IndexingService — New indexFileDocument() MCP tool with @PreAuthorize("isAuthenticated()")
AGENTS.md — Updated architecture docs

Test plan

FileDocumentCreatorTest — 9 unit tests: valid input, null/empty/blank content, null/empty filename, oversized content, unique IDs, multiline content
FileIndexingTest — Spring Boot integration test through IndexingDocumentCreator
IndexingServiceTest — 2 new Testcontainers integration tests verifying index-then-search round-trip (search by content, search by filename)
IndexingServiceTest.UnitTests — 2 new mocked unit tests for the MCP tool method
Existing test constructors updated for new FileDocumentCreator parameter
./gradlew build passes with all tests green
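
For readers who want a feel for the input cases the unit tests enumerate, here is a rough sketch of the kind of validation they imply. It assumes invalid input is rejected with an `IllegalArgumentException`; the description does not say how `FileDocumentCreator` actually signals errors. The class name and the `MAX_CONTENT_LENGTH` value are hypothetical as well, since the real size threshold is not stated.

```java
// Hypothetical sketch of the input cases listed in the test plan.
final class FileInputValidationSketch {

    // Assumed limit for illustration only; the PR does not state the real threshold.
    static final int MAX_CONTENT_LENGTH = 1_000_000;

    private FileInputValidationSketch() {}

    // Assumption: invalid input is rejected by throwing IllegalArgumentException.
    static void validate(String content, String filename) {
        if (content == null || content.isBlank()) {
            // isBlank() is also true for the empty string
            throw new IllegalArgumentException("content must not be null, empty, or blank");
        }
        if (content.length() > MAX_CONTENT_LENGTH) {
            throw new IllegalArgumentException("content exceeds " + MAX_CONTENT_LENGTH + " characters");
        }
        if (filename == null || filename.isEmpty()) {
            throw new IllegalArgumentException("filename must not be null or empty");
        }
    }
}
```

Multiline content passes through unchanged in this sketch, which matches the "multiline content" case in the test list being a valid input rather than an error.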

🤖 Generated with Claude Code

https://claude.ai/code/session_018sFVmRBfFQ3aaU8yyPDvCG

Add a new MCP tool `index-file-document` that accepts text content extracted by AI chat clients from uploaded files (PDF, Word, Excel, etc.) and indexes it into Solr for full-text search.

New components:
- FileDocumentCreator: creates SolrInputDocument with id, content, and filename fields from pre-extracted text
- IndexingDocumentCreator: new createSchemalessDocumentsFromFile() delegation method
- IndexingService: new indexFileDocument() MCP tool

Closes apache#69

Signed-off-by: Aditya Parikh <adityamparikh@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

claude and others added 2 commits March 31, 2026 20:58

docs: update Spring AI version reference in AGENTS.md

42606aa

https://claude.ai/code/session_018sFVmRBfFQ3aaU8yyPDvCG

adityamparikh wants to merge 2 commits into apache:main from adityamparikh:index-text
