added validation using tonic validate by varchanaiyer · Pull Request #141 · StampyAI/stampy-chat

varchanaiyer · 2024-05-20T14:52:31Z

Screen.Recording.2024-05-20.at.10.39.56.PM.mp4

Demo video attached. For now I am only printing the output, but I can push it to LangSmith or to a database instead.

mruwnik · 2024-05-20T18:05:47Z

+        RetrievalPrecisionMetric()
+    ])
+    run = scorer.score_responses([llm_response])
+    print(run.overall_scores)


from stampy_chat import logging
logging.info(run.overall_scores)

you can also send messages to Discord with the logger

That would be really cool! I will add that in!
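A minimal sketch of the suggestion above: report the scores through a logger rather than `print`. It uses the standard-library `logging` module, not the project's own `stampy_chat.logging` (whose API is not shown in this thread). The logger name, the `report_scores` helper, and the assumption that the scores arrive as a dict of metric name to float are all hypothetical.

```python
import logging

# Hypothetical logger name; the project's own logger may be configured differently.
logger = logging.getLogger("validation_sketch")


def report_scores(overall_scores: dict) -> str:
    """Format metric scores into one line and log it at INFO level."""
    message = ", ".join(
        f"{name}={score:.2f}" for name, score in sorted(overall_scores.items())
    )
    logger.info("validation scores: %s", message)
    return message
```

Routing the scores through a logger means the destination (console, file, or a chat webhook handler) can be changed in logging configuration without touching the validation code.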

mruwnik · 2024-05-20T18:09:18Z

    result = chain.invoke({"query": query, 'history': history}, {'callbacks': []})
+
+    #Validate results
+    contexts=[c['text'] for c in get_top_k_blocks(query, 5)]


you're querying the vectorstore twice for each request, which will make things a lot slower - how hard would it be to reuse the previously fetched examples?

Yes you are right. I had a lot of challenges in this part and it took the majority of my time with this ticket.

Unfortunately, I could not figure out how to get a hook into the LangChain Semantic Similarity Selector. Without a hook, there is no way to get the previously fetched examples. I also looked at the source code of RAGAs (another RAG validation framework) and saw that they too could not figure out how to put a hook into LangChain and stopped supporting it last year.

I tested the latency and saw that querying the vectorstore does not add much, and unlike querying an LLM, it carries no additional cost. This is why I decided to query the vector store again. I think we can put this in the backlog and monitor whether LangChain updates their API.
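For reference, one way to avoid the second lookup is to wrap the retrieval function so that the blocks from the last call are kept around. This is only a sketch: it does not solve the problem of hooking into LangChain's example selector, and it applies only if the retrieval call can be wrapped at a point the project controls. `RecordingRetriever` and `fake_get_top_k_blocks` are hypothetical names; the fake function stands in for `get_top_k_blocks`, whose real signature is assumed from the diff.

```python
from typing import Callable, Dict, List

Block = Dict[str, str]


class RecordingRetriever:
    """Wraps a retrieval function and remembers the blocks from the last call."""

    def __init__(self, fetch: Callable[[str, int], List[Block]]):
        self._fetch = fetch
        self.last_blocks: List[Block] = []
        self.calls = 0

    def __call__(self, query: str, k: int) -> List[Block]:
        self.calls += 1
        self.last_blocks = self._fetch(query, k)
        return self.last_blocks


def fake_get_top_k_blocks(query: str, k: int) -> List[Block]:
    """Hypothetical stand-in for the real vector store lookup."""
    return [{"text": f"{query} block {i}"} for i in range(k)]


retriever = RecordingRetriever(fake_get_top_k_blocks)
prompt_blocks = retriever("what is alignment?", 2)       # used to build the prompt
contexts = [b["text"] for b in retriever.last_blocks]    # reused for validation
```

Here the validation step reads `retriever.last_blocks` instead of calling the vector store a second time. In a server handling concurrent requests, the recorder would need to be created per request rather than shared.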

mruwnik · 2024-05-20T18:11:26Z

+        AugmentationAccuracyMetric(),
+        RetrievalPrecisionMetric()
+    ])
+    run = scorer.score_responses([llm_response])


is this actually returned to the user? If not, then I'd suggest doing it after notifying the frontend that everything is done, i.e. moving the if callback (...) clause before these

Makes sense. I will make this change. I kept it this way because I was worried that a "fast" user might ask the next question before the evaluation has finished running, which might overwhelm the system by spawning a lot of processes.

I don't think this will be useful for our users. We may want to track it internally first and then decide what would be the best way to display this to users. For instance, I don't think users will know what "context-precision" is or care about it.
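The ordering discussed above, together with the concern about a fast user triggering many evaluations, can be sketched with a single-worker executor: the frontend is notified first, and evaluations are queued so that at most one runs at a time. The names `finish_request`, `score`, and `notify_done` are hypothetical, and `score` is a trivial stand-in for the real scorer call.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List

# One worker: queued evaluations run one after another, never in parallel.
_evaluator = ThreadPoolExecutor(max_workers=1)


def score(answer: str, contexts: List[str]) -> dict:
    """Hypothetical stand-in for the real scoring call."""
    return {"contexts_used": len(contexts), "answer_length": len(answer)}


def finish_request(
    answer: str, contexts: List[str], notify_done: Callable[[str], None]
) -> Future:
    """Tell the frontend the answer is complete, then queue the evaluation."""
    notify_done(answer)
    return _evaluator.submit(score, answer, contexts)
```

The queue itself is unbounded in this sketch, so a burst of requests would still accumulate pending evaluations; it only guarantees that they do not run concurrently.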

mruwnik · 2024-05-20T18:13:05Z

    result = chain.invoke({"query": query, 'history': history}, {'callbacks': []})
+
+    #Validate results
+    contexts=[c['text'] for c in get_top_k_blocks(query, 5)]


I'm guessing this shouldn't be executed for each query. How about adding a flag to the settings object?

I think that this is required for each query because tonic_validate checks whether the LLM's answer is consistent with the context that was retrieved.

yes, but should it always do that, is the question. This makes the query slower, so for now I'd just use it for testing, rather than always running it

I get what you mean. I will add a flag to the settings object to prevent it from running all the time.
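A rough sketch of what such a flag could look like. The `Settings` dataclass below is a hypothetical stand-in, not the project's actual settings object, and `validate_responses`, `maybe_validate`, `fetch_contexts`, and `score` are illustrative names. The point is that when the flag is off, neither the extra vector store lookup nor the scoring runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Settings:
    """Hypothetical stand-in for the project's settings object."""
    validate_responses: bool = False  # off by default, enabled for testing


def maybe_validate(
    settings: Settings,
    query: str,
    answer: str,
    fetch_contexts: Callable[[str], List[str]],
    score: Callable[[str, List[str]], dict],
) -> Optional[dict]:
    """Run validation only when the flag is set; otherwise do no extra work."""
    if not settings.validate_responses:
        return None
    contexts = fetch_contexts(query)
    return score(answer, contexts)
```

Defaulting the flag to off keeps normal requests at their current speed, and a test run can switch it on explicitly.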

mruwnik · 2024-05-20T18:14:24Z

you have something wrong with the pipenv dependencies, which you'll have to fix. Did you add dependencies with pipenv install <whatever>?

varchanaiyer · 2024-05-25T04:05:23Z

you have something wrong with the pipenv dependencies, which you'll have to fix. Did you add dependencies with pipenv install <whatever>?

Yes, that is how I added the dependencies. Is that the wrong way of doing it? I am not familiar with pipenv (I have been using vanilla virtualenv all this time), so maybe I missed a step?

varchanaiyer · 2024-05-27T05:30:32Z

#136

added validation using tonic validate

af7bc23

mruwnik reviewed May 20, 2024


varchanaiyer force-pushed the tonic_validate branch from 0d257ce to af7bc23 · May 21, 2024 14:37

fixed pipfile issue

fc946c3

ccstan99 mentioned this pull request Jun 10, 2024

Refine prompts #3

Open



2 participants