What?
Similar to Cursor AI's feature, we can let the user add links to any website and allow the LLM to reference them during the conversation.
How?
- When the user adds a link to a website, scrape it using the standard fetch API (many websites block plain fetches, but we can work around that with a paid service like Smartproxy's web scraper)
- Parse the HTML and split it into small text chunks. This is the most important step - a good chunking algorithm will lead to better output (see the scrape-and-chunk sketch after this list)
- Create embeddings for the chunks (Anthropic doesn't currently offer an embeddings endpoint, so we would use a provider like OpenAI or Voyage AI) and save them locally in the melty directory. This can be done using sqlite and sqlite-vec, with a single sqlite database for all embeddings since they will be shared across conversations (see the storage sketch after this list)
- During the conversation, if the user references the website, perform a similarity search and retrieve the top 5-10 chunks, depending on the available context window (see the retrieval sketch after this list)
- Pass the relevant context to the LLM.
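A minimal sketch of the scrape-and-chunk step in TypeScript. The `Chunk` shape, the fixed chunk size/overlap, and the regex-based tag stripping are all illustrative placeholders; a real implementation would use a proper HTML parser and structure-aware splitting:

```ts
// Hypothetical shape for a chunk; favicon is captured here so references
// can be rendered with it later.
interface Chunk {
  url: string;
  title: string;
  favicon: string;
  text: string;
}

const CHUNK_SIZE = 1000;   // characters per chunk (a tunable guess)
const CHUNK_OVERLAP = 200; // overlap so sentences aren't cut at boundaries

async function scrapeAndChunk(url: string): Promise<Chunk[]> {
  // Plain fetch; sites that block it would go through the paid scraper instead.
  const res = await fetch(url);
  const html = await res.text();

  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1]?.trim() ?? url;
  const favicon = new URL("/favicon.ico", url).href; // naive default location

  // Very naive HTML-to-text conversion, for illustration only.
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Fixed-size chunks with overlap; a smarter splitter (headings, paragraphs)
  // is where most of the retrieval quality would come from.
  const chunks: Chunk[] = [];
  for (let i = 0; i < text.length; i += CHUNK_SIZE - CHUNK_OVERLAP) {
    chunks.push({ url, title, favicon, text: text.slice(i, i + CHUNK_SIZE) });
  }
  return chunks;
}
```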
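For embedding and storage, a sketch using better-sqlite3 with the sqlite-vec extension. Since Anthropic doesn't expose an embeddings endpoint, `embed()` below uses OpenAI's embeddings API purely as a stand-in for whichever provider we settle on; the 1536 dimension matches that particular model and would change with the provider:

```ts
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";
import OpenAI from "openai";

const openai = new OpenAI(); // stand-in provider; reads OPENAI_API_KEY

// Embed a batch of texts; swappable for Voyage AI or any other provider.
async function embed(texts: string[]): Promise<number[][]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}

// One shared database for all embeddings, living in the melty directory.
const db = new Database("melty/embeddings.db");
sqliteVec.load(db);

db.exec(`
  CREATE TABLE IF NOT EXISTS chunks(
    id INTEGER PRIMARY KEY, url TEXT, title TEXT, favicon TEXT, text TEXT
  );
  -- vec0 virtual table; the dimension must match the embedding model.
  CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(embedding float[1536]);
`);

async function indexChunks(chunks: Chunk[]): Promise<void> {
  const vectors = await embed(chunks.map((c) => c.text));
  const insertMeta = db.prepare(
    "INSERT INTO chunks(url, title, favicon, text) VALUES (?, ?, ?, ?)"
  );
  const insertVec = db.prepare(
    "INSERT INTO vec_chunks(rowid, embedding) VALUES (?, ?)"
  );
  chunks.forEach((c, i) => {
    const { lastInsertRowid } = insertMeta.run(c.url, c.title, c.favicon, c.text);
    // sqlite-vec accepts vectors serialized as JSON text.
    insertVec.run(lastInsertRowid, JSON.stringify(vectors[i]));
  });
}
```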
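And a retrieval sketch covering the last two steps: embed the user's message, pull the nearest chunks from sqlite-vec, and splice them into the prompt. The default `k` and the source-block format are arbitrary choices here; `k` would really be derived from the available context window:

```ts
// Retrieve the k nearest chunks for a query and format them for the prompt.
async function retrieveContext(query: string, k = 8): Promise<string> {
  const [queryVec] = await embed([query]);
  const rows = db
    .prepare(
      `SELECT c.url, c.title, c.text
       FROM (
         -- KNN query; "k = ?" is how sqlite-vec takes the neighbour count
         -- as a bound parameter.
         SELECT rowid, distance FROM vec_chunks
         WHERE embedding MATCH ? AND k = ?
         ORDER BY distance
       ) AS v
       JOIN chunks c ON c.id = v.rowid
       ORDER BY v.distance`
    )
    .all(JSON.stringify(queryVec), k) as { url: string; title: string; text: string }[];

  // One plausible prompt format; the exact framing is open for discussion.
  return rows
    .map((r) => `Source: ${r.title} (${r.url})\n${r.text}`)
    .join("\n\n---\n\n");
}
```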
Additionally, we can even display the references the LLM's response was based on. This can be done as follows:
- Once the LLM finishes responding, take the response and create an embedding for it (with the same embedding model)
- Perform a similarity search between that embedding and the embeddings of the chunks we sent
- Show the top 3 chunks as references. Rather than displaying them as plain text, we can capture metadata such as the URL, title, and favicon of the website during the chunking step (step 2 of the first list) and render a nicely formatted reference, similar to a Google search result, with the metadata on top and the relevant chunk's text below it (sketched below)
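A sketch of this attribution step, reusing `embed` and `Chunk` from the sketches above. It assumes we keep the chunks (and their embeddings) that were sent with the message, so the comparison can happen in process with plain cosine similarity over a handful of vectors:

```ts
// Plain cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assumed bookkeeping: each chunk sent with the message keeps its embedding.
interface SentChunk extends Chunk {
  embedding: number[];
}

// Rank the chunks we actually sent against the finished response and return
// the top n; each entry carries url/title/favicon for the formatted display.
async function topReferences(
  response: string,
  sentChunks: SentChunk[],
  n = 3
): Promise<SentChunk[]> {
  const [responseVec] = await embed([response]);
  return sentChunks
    .map((c) => ({ ...c, score: cosine(responseVec, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}
```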
This is a typical RAG workflow; however, it is a feature that needs a lot of time to implement and get right. I would love to start working on this. This issue is meant to be a technical discussion of the implementation details.