RAGForge#401

Open
ianarawjo wants to merge 29 commits into main from ragforge

@ianarawjo
Owner

TBA

ianarawjo and others added 29 commits October 23, 2025 15:08
* WIP: RAG Engineering in ChainForge (#350)

* Added UploadNode

* Added ChunkingNode

* Added RetrievalNode

* Added upload chunking and retrieval backend

* Added multivar grouping on Join Node

* Added reasoning mode to Eval Nodes

* Added RAG presets and output to Multi-Eval Node

* Fixed state persistence

* Added RAG example flow

* Add files via upload

* Added FAISS vector store

---------

Co-authored-by: Ilias Zouine <146049321+izouiny@users.noreply.github.com>
Co-authored-by: izouiny <ilias.zouine@umontreal.ca>

* Refactor rag handlers within rag dir, and move imports inside method bodies
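
A minimal sketch of the "imports inside method bodies" pattern mentioned above, assuming the goal is to keep optional or heavy dependencies from loading at module import time. The handler name `chunk_by_sentence` is hypothetical, and the standard-library `re` module stands in for a heavy third-party package:

```python
def chunk_by_sentence(text: str) -> list[str]:
    """Hypothetical handler that defers its dependency until first call."""
    # Imported here, not at module top, so importing this file stays cheap
    # and does not fail when the dependency is missing. `re` is only a
    # stand-in for a heavier optional package.
    import re

    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```

With this layout, a missing optional package only raises an error when the specific handler that needs it is invoked.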

* Add MediaLookup and backend endpoints for uploading/retrieving documents.

* Fixes to Chunk Node. Upgrade req Python version to 3.10. Use Markitdown for doc to text.

* Add Chonkie chunkers. Fix bug-ridden existing chunker code.

* Typing fix MediaLookup. Add Chonkie to requirements.

* WIP: Bundle media files via .cfzip files. Add import/export of cfzip. Add backup import unzip that works entirely in the front-end, if back-end not present.

* Add pytest and tests folder. Test all Chonkie chunking methods.

* Fix remaining chunking methods.

* Move func-specific imports from top of retrieval handlers file into specific funcs

* Lock gensim to 4.3.3 and above, since a major dependency issue with scipy was fixed in this release

* Bug and UI fixing for chunking methods

* Refactor embeddings handlers into separate file from retrieval methods.

* Added LocalVectorStore with LanceDB and tests for it. Added dependencies for lancedb.

WARNING: Adding lancedb caused dependency hell between pydantic and spacy dependencies. Fix this before release

* Work way out of dependency hell

* Add distance metric to LanceDB vector store tests

* Add numpy<2.0 to setup.py

* wip testing retriever endpoint

* Multimodal Node in Chainforge (#352)


---------

Co-authored-by: lolo md <65371299+loloMD@users.noreply.github.com>
Co-authored-by: Roy Eyono <royeyono@Roys-MBP.office1.mila.quebec>
Co-authored-by: Ian Arawjo <fatso784@gmail.com>

* Remove Taskfiles. Cleanup.

* Many fixes and refactorings to MultimediaNode

* More bug fixing

* Improved Media Node UI. Renamed Multimedia node to Media Node for simplicity.

* Even more bug fixing. Add ability to paste images. Pass images to custom providers. Rate limit media uploads.

* Add multiple files at once. Fix issue with unique rows for images in response inspector.

* Verify that other provider calls work with images.

* Updated keyword-based retrieval methods and NestedMenu toggle behavior (#365)

* bug fixing eval score display

* side-by-side retriever display; fix carrying LLM name through retrieved chunks

* Embeddings:
- Local folder for huggingface
- AzureOpenAI implementation
- Fix for OpenAI

* - Change in lanceDB to work as one of the vector_stores
- Implementation of faiss as a vector store
- Front for lanceDB

* local path for chunkers that need HF repo

* fix error in sdpm for car type of min_chunk_size

* fix include delim in SDPMChunker (should be Optional[Literal["prev", "next"]] = "prev") keep it as default : prev

* local path for neural chonkie (req modification of NeuralChunker)

* adding ; split in SplitNode.tsx

* add metadata with python processor

* -fix var input for cosine in retriever
-parallelization of azure openai embeddings
-adding
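
One way the embedding parallelization could look, sketched with a thread pool. This is an assumption about the approach, not the PR's code: `embed_in_parallel`, `embed_batch`, `batch_size`, and `max_workers` are hypothetical names, and `embed_batch` stands for whatever callable sends one batch of texts to the embeddings endpoint.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def embed_in_parallel(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 16,
    max_workers: int = 4,
) -> list[list[float]]:
    """Split texts into batches and embed the batches concurrently."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so vectors line up with texts.
        results = pool.map(embed_batch, batches)
        return [vec for batch in results for vec in batch]
```

Threads fit here because each call is network-bound; the batch size and worker count would need tuning against the provider's rate limits.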

* wip

* Implement custom provider protocols and update retriever/chunker logic (#370)

* Fix retrieval response inspector display

* adding chunkmethod as a parameter for visualisation in inspect

* - fix markitdown with {}
- add Filter variable node

* new retriever logic that:
- allows the use of a local db for the vector database
- returns an empty list if no chunk is retrieved (very important for eval)

Fix in vector store:
- use of local db (was always create)
- use similarity threshold in faiss
- front end adding
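
The threshold and empty-result behavior described above can be sketched in plain Python. This is an illustration only: it does not use FAISS or LanceDB, and `cosine`, `retrieve`, `top_k`, and `threshold` are hypothetical names rather than the PR's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve(query_vec, chunks, top_k=3, threshold=0.5):
    """chunks is a list of (text, vector) pairs.

    Returns up to top_k (text, score) pairs with score >= threshold,
    best first. Returns [] when nothing clears the threshold.
    """
    scored = [(text, cosine(query_vec, vec)) for text, vec in chunks]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Returning an empty list, rather than raising or padding with low-scoring chunks, lets a downstream evaluator tell "nothing relevant was found" apart from a bad match.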

* fixed custom provider setting button logic; added 2 RAG example flows; added markdown chunker method (#383)

* Added 2 example flows

* fixed custom provider setting button logic; added 2 RAG example flows; added markdown chunker method

* Fix viewport node origin bug

---------

Co-authored-by: MkYacine <96208478+MkYacine@users.noreply.github.com>
Co-authored-by: Ilias Zouine <146049321+izouiny@users.noreply.github.com>
Co-authored-by: izouiny <ilias.zouine@umontreal.ca>
Co-authored-by: Roy Eyono <72818357+RoyHEyono@users.noreply.github.com>
Co-authored-by: lolo md <65371299+loloMD@users.noreply.github.com>
Co-authored-by: Roy Eyono <royeyono@Roys-MBP.office1.mila.quebec>
Co-authored-by: Derin Akay <82316603+dxa204@users.noreply.github.com>
Co-authored-by: freddupo <frederic.dupont-marillia@lautorite.qc.ca>
* Bug fixes

* Very small fix

* Chunk method UI change
Brings over changes from before the merge with main, which were not pushed to the `ragforge` branch before the merge.
* Add init backend reranker scripts

* Add [rag] option to pip install. Check RAG dependencies on startup; if present, show RAG nodes.
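
A startup check of this kind can be sketched with the standard library's `importlib.util.find_spec`, which reports whether a top-level package is importable without actually importing it. The package list and the function names below are assumptions for illustration; the actual contents of the `[rag]` extra and the project's own check may differ.

```python
from importlib.util import find_spec

# Assumed package list for illustration; the real [rag] extra may differ.
RAG_PACKAGES = ("lancedb", "faiss", "chonkie")


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def rag_dependencies_present(names=RAG_PACKAGES) -> bool:
    """True only if every listed package is importable."""
    return not missing_packages(names)
```

The front end could then query the result once at startup and show the RAG nodes only when it is true.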

* Doc Upload files now appear by default.

* Add reranker test, verify it passes, and adjust reranker to use v2 of Cohere's API

* Pass API keys to reranker backend and check RAG availability
…#399)

* fixed checkRagAvailable as chromadb was not required and the check was redundant due to the function IS_RAG_AVAILABLE

* added conda environment.yml file for easier installation of ChainForge
* Retrieval fusion and bug fixes

* Polish the Fusion functionality, reduce the file sizes of the example flows, and build fusion modal using RJSF
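
The commits do not say which fusion method is used. As one common possibility, here is a sketch of reciprocal rank fusion (RRF), which merges ranked lists from several retrievers by summing 1 / (k + rank) per document. The function name and the default `k=60` are illustrative choices, not taken from the PR.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    Ties are broken by document id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing `["a", "b", "c"]` with `["b", "c", "d"]` ranks `b` first, since it appears near the top of both lists.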

* Small fix
* adding bun.lock to gitignore because I use bun instead of npm

* added Cohere API key block in Globalsettings and the relevant textfield also add Deepseek entry as the textfield already existed

* fixed Rerank UI
Updated Python package workflow to use Python 3.10 and removed macOS versions 13 and 14. Changed installation steps to use editable mode and added installation for RAG extras.
* Bug fixes

* Small fix

* Reverted VisNode and added minimal tweak to fix y-axis alignment bug

* Bug fixes

* Chunk methods menu reordering and deleted SDPM Chunker as it is deprecated

* Implementation of study feedback

* Retrieval pop up message fix

* Small fix

* message fix

* implement copilot suggestions

* Revert "implement copilot suggestions"

This reverts commit 6d14908.
3 participants