Conversation
Apologies for the last-minute PR. It passed pycheck strict locally, but the checks are failing now. I'm currently in-flight to New Zealand for a week-long conference and will fix these issues as soon as possible. Let me know if any other updates are needed.

Also, you need to sign off your commits. See the DCO section for help. Edit: You also need to add the frontend/requirements-test.txt file for
@@ -0,0 +1,277 @@
import streamlit as st
Can you rename the dataset files to use underscores as separators? e.g. Human Eval Dataset.csv -> Human_Eval_Dataset.csv
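For illustration, a minimal sketch of the requested rename; the `frontend/data` directory is an assumption, substitute wherever the CSVs actually live:

```python
from pathlib import Path

# Replace spaces with underscores in every CSV name, e.g.
# "Human Eval Dataset.csv" -> "Human_Eval_Dataset.csv".
for csv_path in Path("frontend/data").glob("*.csv"):  # directory is an assumption
    if " " in csv_path.name:
        csv_path.rename(csv_path.with_name(csv_path.name.replace(" ", "_")))
```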
@@ -0,0 +1,15 @@
# Retriever Metrics Visualization
## Dependencies
- streamlit
- pandas
- plotly
Why are there ._***.py files? What is their purpose?
@@ -0,0 +1,74 @@
from chainforge.providers import provider
Python filenames should not contain spaces.
@@ -0,0 +1,110 @@
import requests
Please rename the file, and put the main logic under an `if __name__ == "__main__":` guard.
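A minimal sketch of the requested structure; the endpoint URL is a placeholder, not the script's actual request:

```python
import requests

def main() -> None:
    # Benchmark/evaluation logic moves here instead of running at import time.
    response = requests.get("http://localhost:8000/health")  # placeholder endpoint
    print(response.status_code)

if __name__ == "__main__":
    main()
```

With the guard in place, the file can be imported (e.g. by tests) without side effects.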
@@ -0,0 +1,66 @@
# OpenROAD Retriever Benchmark
    "response_mime_type": "text/plain",
}

safety_settings = {
What is the default value for this, I am wondering. Also, for production we need to think about whether to enable some blocking of harmful outputs.
It's the default, because there was an issue with the safety settings: sometimes it would block certain questions that had no harmful content, which hinders the evaluation script.
That's fine, but I was curious what the default safety level is. I get that we have to disable it for evaluation, but for production we might need it back, since we didn't set it in our backend code.
I guess my question is: can we make the evaluation mode as close to production mode as possible?
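One way to keep the two modes close is to gate the relaxed settings behind a flag and let production fall back to the API defaults. A sketch using the google.generativeai SDK; the `EVAL_MODE` environment variable and the model name are assumptions, and the API's default thresholds vary by model/version, so check the Gemini docs for the current defaults:

```python
import os
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

EVAL_MODE = os.getenv("EVAL_MODE") == "1"  # hypothetical flag

# Evaluation: disable blocking so benign benchmark questions are not rejected.
# Production: pass no overrides and fall back to the API's default thresholds.
safety_settings = (
    {
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    }
    if EVAL_MODE
    else None
)

model = genai.GenerativeModel("gemini-1.5-flash", safety_settings=safety_settings)
```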
@@ -0,0 +1,36 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
Any reason these src folders are nested in another folder, orassistant-frontend? If not, prefer them to be directly under nextjs_frontend.
@@ -0,0 +1,47 @@
# NextJS Frontend for ORAssistant
message_placeholder = st.empty()

response_buffer = ''
# Option 1: Streaming effect - Send response in chunks
There seems to be only one option...
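For context, a minimal sketch of the streaming effect under discussion; the chunk list is a stand-in for the real model stream:

```python
import time
import streamlit as st

message_placeholder = st.empty()
response_buffer = ''

# Streaming effect: append each chunk to the buffer and re-render in place.
for chunk in ["Hello", ", ", "world", "!"]:  # stand-in for real response chunks
    response_buffer += chunk
    message_placeholder.markdown(response_buffer + "▌")  # cursor while streaming
    time.sleep(0.05)
message_placeholder.markdown(response_buffer)  # final render without the cursor
```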
@error9098x Please add the NextJS code in a separate PR. Thanks!

Sure.
- discussion page directly pulls the latest discussion dataset from Hugging Face
- visualisation of previously conducted evaluations
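A sketch of pulling a dataset from Hugging Face for the visualisation; the repo id here is hypothetical and should be replaced with the project's actual dataset:

```python
from datasets import load_dataset

# Hypothetical dataset id -- substitute the project's real Hugging Face repo.
ds = load_dataset("org/orassistant-discussions", split="train")
df = ds.to_pandas()  # DataFrame ready for the Streamlit visualisation
print(df.head())
```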