Feature request: adding links to documentation websites/other data sources #62

@abhi12299

What?

Similar to Cursor AI's feature, we could let users add links to any website and allow the LLM to reference those pages during the conversation.

How?

  1. When the user adds a link to a website, scrape it using the standard fetch API. Many websites block or rate-limit such automated requests; as a workaround, we could use a paid service such as Smartproxy's web scraper.
  2. Parse the HTML and split its text into small chunks. This is the most important step: a good chunking algorithm leads to better retrieval and therefore better output.
  3. Create embeddings for the chunks and save them locally in the melty directory. Anthropic does not offer an embeddings endpoint of its own (its documentation points to third-party providers such as Voyage AI), so this step needs a separate embedding provider. Storage can use SQLite with the sqlite-vec extension, and a single database can hold all embeddings, since they are shared across conversations.
  4. During the conversation, if the user references the website, perform a similarity search and retrieve the top 5-10 chunks, depending on how much of the context window is available.
  5. Pass the relevant context to the LLM.
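
Step 2 could begin with something like the minimal sketch below, which uses only Python's standard-library `html.parser` to pull out the page title and visible text and then cuts that text into fixed-size, overlapping word windows. The `Chunk` type, the `chunk_html` name and the 200/40-word defaults are illustrative assumptions; a smarter chunker would probably split on headings and paragraphs first. Keeping `url` and `title` on every chunk is what makes formatted references possible later.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Chunk:
    url: str    # source page, kept for the reference display
    title: str  # page <title>, kept for the reference display
    text: str   # chunk body that will be embedded


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.title = ""
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script>/<style>: not visible text
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            self.parts.append(data.strip())


def chunk_html(url: str, html: str, max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split a page's visible text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    words = " ".join(parser.parts).split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(url, parser.title, " ".join(words[start:start + max_words])))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk when it is shorter than the overlap.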
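
Steps 3 to 5 could be prototyped roughly as follows. This is a minimal sketch under several assumptions: `toy_embed` is a HYPOTHETICAL hashed bag-of-words placeholder for whichever embedding provider is chosen, the table layout and the `open_store`/`add_chunk`/`search`/`build_context` names are illustrative, similarity is a brute-force cosine scan in Python rather than sqlite-vec (so it runs with the standard library alone), and token cost is approximated as one token per word.

```python
import hashlib
import math
import sqlite3
import struct

DIM = 64  # toy size; real embedding models return much larger vectors


def toy_embed(text: str) -> list[float]:
    """HYPOTHETICAL stand-in for a real embedding provider: hashed bag of words,
    L2-normalised so that a dot product equals cosine similarity."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int.from_bytes(hashlib.sha256(word.encode()).digest()[:4], "little") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def _pack(vec: list[float]) -> bytes:
    return struct.pack(f"{len(vec)}f", *vec)  # stored as a float32 BLOB


def _unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """One database shared by all conversations (e.g. a file in the melty directory)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, url TEXT, title TEXT, text TEXT, embedding BLOB)"
    )
    return db


def add_chunk(db: sqlite3.Connection, url: str, title: str, text: str) -> None:
    db.execute(
        "INSERT INTO chunks (url, title, text, embedding) VALUES (?, ?, ?, ?)",
        (url, title, text, _pack(toy_embed(text))),
    )
    db.commit()


def search(db: sqlite3.Connection, query: str, urls: list[str],
           max_chunks: int = 10, token_budget: int = 2000) -> list[tuple[str, str, str]]:
    """Return up to max_chunks (url, title, text) rows from the given URLs, best match
    first, stopping before the rough token budget would be exceeded."""
    if not urls:
        return []
    q = toy_embed(query)
    placeholders = ",".join(["?"] * len(urls))
    rows = db.execute(
        f"SELECT url, title, text, embedding FROM chunks WHERE url IN ({placeholders})",
        list(urls),
    ).fetchall()
    # Brute-force scan in Python; the issue proposes sqlite-vec for this step instead.
    rows.sort(key=lambda r: sum(a * b for a, b in zip(q, _unpack(r[3]))), reverse=True)
    picked, used = [], 0
    for url, title, text, _ in rows[:max_chunks]:
        cost = len(text.split())  # crude estimate: one token per word
        if used + cost > token_budget:
            break
        picked.append((url, title, text))
        used += cost
    return picked


def build_context(chunks: list[tuple[str, str, str]]) -> str:
    """Step 5: wrap the retrieved chunks so the LLM can tell the sources apart."""
    return "\n\n".join(
        f'<source url="{url}" title="{title}">\n{text}\n</source>' for url, title, text in chunks
    )
```

Passing only the URLs the user included keeps retrieval scoped to those sites, and `max_chunks`/`token_budget` play the role of "top 5-10 chunks based on the available context window" from step 4.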

Additionally, we could display the references the LLM based its response on. This can be done as follows:

  1. Once the LLM finishes responding, create an embedding of the response, using the same embedding provider as for the chunks (Anthropic's API does not produce embeddings, and vectors from different models are not comparable).
  2. Perform a similarity search between this embedding and the embeddings of the chunks that were sent to the LLM.
  3. Show the top 3 chunks as references. Rather than displaying them as plain text, we can capture metadata such as the website's URL, title and favicon during chunking (step 2 of the previous list) and show a nicely formatted reference, similar to a Google search result, with the metadata on top and the relevant chunk's text below it. Something like this:
[Image: example of a formatted reference]
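
The reference step could look roughly like this minimal sketch: it ranks only the chunks that were actually sent to the LLM by cosine similarity to the response's embedding and keeps the top 3. Assumptions: the response and the chunks were embedded with the same provider, `SentChunk` (including its `favicon` field) is a hypothetical record filled in during chunking, and `format_reference` prints plain text where the real UI would render a search-result-style card.

```python
import math
from typing import NamedTuple


class SentChunk(NamedTuple):
    url: str
    title: str
    favicon: str            # hypothetical: favicon URL captured during chunking
    text: str
    embedding: list[float]  # same embedding model as the response


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_references(response_embedding: list[float], sent: list[SentChunk], k: int = 3) -> list[SentChunk]:
    """Rank the chunks that were sent to the LLM by similarity to its answer."""
    ranked = sorted(sent, key=lambda c: cosine(response_embedding, c.embedding), reverse=True)
    return ranked[:k]


def format_reference(c: SentChunk, max_chars: int = 160) -> str:
    """Plain-text stand-in for a card: title, URL, then a snippet of the chunk."""
    snippet = c.text if len(c.text) <= max_chars else c.text[:max_chars].rstrip() + "..."
    return f"{c.title}\n{c.url}\n{snippet}"
```

Restricting the search to the chunks that were sent (rather than the whole database) ensures every displayed reference was actually in the LLM's context.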

This is a typical RAG (retrieval-augmented generation) workflow; however, it will take a lot of time to implement and get right. I would love to start working on this. This issue is meant to be a technical discussion of the implementation details.
