Build a chatbot (Web App or CLI) that demonstrates how you’d ship an AI-first product feature:
- File-grounded Q&A (RAG) with citations
- Durable memory written to markdown
- (Optional) Safe compute tool calling with Open-Meteo time series analysis
You may implement one feature or multiple. Partial implementations are acceptable.
Root:
- `README.md` is the submission README (Quick Start + Video link goes here).
- `sample_docs/` is optional sample input data to test ingestion quickly; `sample_docs/README.md` is only documentation for that folder.

Repository files:
- `README.md`: Main instructions. You must update the Participant Info and Quick Start sections and paste your Video Walkthrough link.
- `ARCHITECTURE.md`: Brief architecture overview (1–2 pages). Explain ingestion, retrieval/citations, memory logic, and the optional sandbox.
- `EVAL_QUESTIONS.md`: Example questions you can use to test your bot and to guide your demo/video.
- `USER_MEMORY.md`: Your app must write selective user-specific memory here (high-signal facts only).
- `COMPANY_MEMORY.md`: Your app must write selective org-wide memory here (reusable learnings only).
- `sample_docs/`: Optional small docs for quick local testing.
- `scripts/sanity_check.sh`: Judge helper; runs `make sanity` and validates the output format.
- `scripts/verify_output.py`: Validator for `artifacts/sanity_output.json`.
- `artifacts/sanity_output.json`: Generated by `make sanity`. Required for evaluation.
- Full Name:
- Email:
- GitHub Username:
Users can:
- Upload files and add them to a RAG pipeline (parse → chunk → index)
- Ask questions later and receive answers grounded in uploaded content
- Receive citations pointing to source chunks/sections alongside each answer
Minimum expectation: working ingestion + retrieval + grounded response + citations.
Suggested test data: arXiv PDFs/HTML (open access).
Extra points:
- Hybrid retrieval (BM25 + embeddings), reranking, metadata filters
- Smart chunking (section-aware, semantic boundaries)
- Knowledge-graph flavored RAG
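The core Feature A loop (parse → chunk → index → retrieve with citations) can be sketched in a few lines. This is a minimal, hedged illustration using bag-of-words cosine similarity in place of embeddings or BM25; all function names here are illustrative, not part of any required interface:

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    # Fixed-size word windows; a real pipeline would use section-aware chunking.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vectorize(text: str) -> Counter:
    # Bag-of-words term frequencies as a stand-in for embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[tuple[int, str]]:
    # Return top-k (chunk_id, chunk_text) pairs; chunk_id doubles as the citation.
    q = vectorize(question)
    ranked = sorted(enumerate(chunks), key=lambda ic: cosine(q, vectorize(ic[1])), reverse=True)
    return ranked[:k]
```

The grounded response step would then pass the retrieved chunks to the model and emit the chunk ids (or file + section) as citations.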
Add a memory subsystem that writes selective, high-signal knowledge to:
- `USER_MEMORY.md`: Store user-specific facts worth remembering. Example: “User is a Project Finance Analyst”, “Prefers weekly summaries on Mondays”.
- `COMPANY_MEMORY.md`: Store org-wide learnings useful to colleagues. Example: “Asset Management interfaces often with Project Finance”, “Recurring workflow bottleneck is X”.
Rules:
- Selective (no transcript dumping)
- High-signal and reusable
- Avoid storing secrets or sensitive information
Implementation hint (optional): use an internal decision structure like `{should_write, target, summary, confidence}` and only append when confident.
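That decision structure might look like the following sketch. The threshold value and the append format are assumptions, not requirements:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class MemoryDecision:
    should_write: bool
    target: str        # "USER_MEMORY.md" or "COMPANY_MEMORY.md"
    summary: str       # one high-signal fact, never a transcript excerpt
    confidence: float  # 0.0-1.0, judged by the model or a heuristic

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff

def maybe_append(decision: MemoryDecision, root: Path = Path(".")) -> bool:
    # Append the summary only when the decision is positive and confident.
    if not (decision.should_write and decision.confidence >= CONFIDENCE_THRESHOLD):
        return False
    with (root / decision.target).open("a", encoding="utf-8") as f:
        f.write(f"- {decision.summary}\n")
    return True
```

Gating every write through one function makes the selectivity rules (no transcript dumps, no secrets) easy to enforce in a single place.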
Spin up a Python environment using llm-sandbox (or similar isolation) and allow the chatbot to execute an analysis task by calling a public time series API.
Use this API (no key required): Open-Meteo (historical + forecast weather time series).
The chatbot should:
- Call Open-Meteo for a location/time range
- Retrieve time series data
- Compute basic analytics (rolling averages, volatility, missingness checks, anomaly flags, etc.)
- Return a clear explanation of findings
We care about safe execution boundaries + clean tool interface, not perfect data science.
Your repo must include:
- `README.md` with setup + run instructions
- A brief architecture overview in `ARCHITECTURE.md` (or in this README)
- A working demo flow (based on what you implemented):
  - Upload → index → ask questions with citations
  - Memory written into `USER_MEMORY.md` and `COMPANY_MEMORY.md`
  - (Optional) Sandbox + Open-Meteo time series analysis
- Basic tests or at least a small sanity-check script (preferred)
- A short video walkthrough (5–10 minutes) demonstrating:
  - The working product end-to-end
  - Key design choices and tradeoffs
  - What you would improve next with more time
You may use any language, framework, model, and any vector DB (FAISS/Chroma/pgvector/etc.).
Judges must be able to run `make sanity`, which must:
- Run a minimal end-to-end flow (based on what you implemented)
- Produce this file: `artifacts/sanity_output.json`

Judges may also run `bash scripts/sanity_check.sh`, which runs `make sanity` and validates the output format.
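Your `make sanity` target can end by dumping a small JSON artifact. The field names below are placeholders only; the authoritative schema is whatever `scripts/verify_output.py` actually validates, so check that script before settling on a shape:

```python
import json
from pathlib import Path

def write_sanity_output(path: str = "artifacts/sanity_output.json") -> dict:
    # All fields here are illustrative placeholders; match the schema that
    # scripts/verify_output.py expects.
    result = {
        "status": "ok",
        "features": {"rag": True, "memory": True, "sandbox": False},
        "sample_question": "What does the uploaded doc say about X?",
        "citations": ["doc1.pdf#chunk-3"],
    }
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```

Creating the `artifacts/` directory before writing keeps the target reproducible on a fresh clone.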
Add your video link here:
PASTE YOUR LINK HERE
Submissions missing the Participant Info block may be deprioritized during review.
- Open the GitHub Classroom invite link provided to you after registration.
- Accept the assignment.
- GitHub Classroom will automatically create a new repository under your GitHub account.
- This new repo is your official submission repo.
Important:
- Do not submit work in the `agentic-rag-chatbot-template` repository. That is the starter/template repo.
- You must complete your work in the repository created for you by GitHub Classroom after you accept the assignment link.
- Only the GitHub Classroom-created repo will be evaluated.
Clone your Classroom repo and push your commits as usual.
In your Classroom repo:
- Fill in the Quick Start section in `README.md` (exact run commands)
- Paste your Video Walkthrough link in `README.md`
- Ensure `make sanity` works and generates `artifacts/sanity_output.json`
- Ensure your app writes memory to `USER_MEMORY.md` and `COMPANY_MEMORY.md`
Your submission is automatic once your code is pushed to your Classroom repo. No separate zip upload is required unless explicitly instructed.
We evaluate holistically:
- RAG answers are grounded and cite sources
- Graceful behavior when retrieval fails (no hallucinations)
- Clean structure and modular design
- Readable code and thoughtful naming
- Error handling and reproducibility
- Sensible retrieval design
- Thoughtful memory criteria
- Clear tradeoffs explained in README/architecture
- Prompt-injection awareness in RAG
- Sandbox isolation (if implementing Feature C)
- Safe handling of external API calls
These are optional enhancements. They are not required, but can earn bonus points if implemented well:
- Streaming responses
- Conversation history view
- Multi-user support
- File management tools (re-index / delete / inspect chunks)
- Simple evaluation harness with test questions and expected citations
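The last bonus item, an evaluation harness, can be very small. A hedged sketch under the assumption that your bot exposes an `ask(question)` callable returning an answer plus a list of citation ids (that interface is invented here for illustration):

```python
def run_eval(ask, cases: list[dict]) -> float:
    # `ask(question)` is assumed to return (answer_text, list_of_citation_ids).
    # Each case: {"q": question, "expect_citation": citation_id}.
    # Returns the fraction of cases whose expected citation was returned.
    hits = 0
    for case in cases:
        _, citations = ask(case["q"])
        hits += case["expect_citation"] in citations
    return hits / len(cases)
```

Seeding the cases from `EVAL_QUESTIONS.md` gives you a regression check that retrieval still grounds answers in the right chunks after refactors.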
Provide exact commands a judge can run.
Example (replace with your real commands):

    # install dependencies
    # run the app
    # open UI or run CLI
See: EVAL_QUESTIONS.md