A Next.js customer service chatbot for the fictional "l8r" buy-now-pay-later service, with comprehensive evaluation capabilities using Braintrust.
- Node.js 18+
- npm
- OpenAI API key
- Braintrust API key
Install dependencies:

```bash
npm install
```

Copy the example environment file and add your API keys:

```bash
cp .env.example .env
```

Edit `.env` with your actual keys:

```
DATABASE_URL="file:./dev.db"
OPENAI_API_KEY=sk-...
BRAINTRUST_API_KEY=bt-...
```
The app uses SQLite with Prisma. Initialize and seed the database:
```bash
npm run db:reset
```

This will:
- Create the SQLite database at `prisma/dev.db`
- Apply the schema (Users, Orders, InstallmentPlans, Payments, RefundRequests)
- Seed with sample customer data
Start the development server:
```bash
npm run dev
```

Open http://localhost:3000 to interact with the chatbot.
This project includes two evaluation approaches for testing the chatbot across multi-turn conversations:
| Approach | Best For | How It Works |
|---|---|---|
| Simple (Next-Turn) | Regression testing, fast iteration | Takes conversation snapshots and predicts the next turn |
| Comprehensive (Simulation) | End-to-end testing, realistic scenarios | Simulates complete conversations with AI-generated users |
Tests the agent's ability to produce the correct next response given a conversation history.
When to use:
- Fast regression testing during development
- Validating specific conversation patterns
- Testing tool call behavior
Run the eval:
```bash
npx braintrust eval evals/conversation.eval.ts
```

What it evaluates:
- ToolCallCheck - Did the agent call the expected tools?
- StructureCheck - Is the response well-formatted for customer service?
- HelpfulnessCheck - Does the response address the customer's needs?
Data source: Uses the `L8rCustomerServiceDataset` from Braintrust, which contains conversation snapshots at various points.
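At its core, a scorer like ToolCallCheck compares the tools the agent actually invoked against the tools the test case expects. A minimal sketch, where the `toolCalls` array on the output and the `expected.tools` list are illustrative field names, not the repo's actual types:

```typescript
type ToolCall = { name: string };

interface ScorerArgs {
  output: { toolCalls: ToolCall[] };   // what the agent produced (assumed shape)
  expected: { tools: string[] };       // tools the test case expects (assumed shape)
}

// Returns the fraction of expected tools the agent actually called,
// scoring 1 when no tool calls were expected.
function toolCallCheck({ output, expected }: ScorerArgs) {
  const called = new Set(output.toolCalls.map((t) => t.name));
  const hits = expected.tools.filter((t) => called.has(t)).length;
  return {
    name: "ToolCallCheck",
    score: expected.tools.length === 0 ? 1 : hits / expected.tools.length,
  };
}
```

Braintrust scorers generally return a `{ name, score }` object with a score in `[0, 1]`, which is what this sketch mimics.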
Runs complete multi-turn conversations between a simulated user and the agent.
When to use:
- End-to-end conversation testing
- Testing conversation flow and resolution
- Evaluating how well the agent achieves user goals
Run the eval:
```bash
npx braintrust eval evals/sim.eval.ts
```

How it works:
- Profile Extraction - Extracts the user's goal, personality, and communication style from a seed conversation
- Conversation Loop - The simulated user and agent exchange messages until:
  - The user is satisfied (goal achieved)
  - The user is frustrated (agent unhelpful)
  - Max turns reached (10 turns)
- Scoring - Evaluates the entire conversation
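The loop above can be sketched as a small driver function. This is a hedged sketch rather than the repo's implementation: the `signal` field standing in for the satisfied/frustrated checks, and the `userSim`/`agent` function names, are assumptions.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

type UserReply = {
  content: string;
  signal: "continue" | "satisfied" | "frustrated";
};

// Alternate simulated-user and agent messages until the user signals an
// outcome or the turn budget is exhausted (10 turns by default, as above).
async function simulate(
  userSim: (history: Turn[]) => Promise<UserReply>,
  agent: (history: Turn[]) => Promise<string>,
  maxTurns = 10,
): Promise<{ history: Turn[]; outcome: string }> {
  const history: Turn[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const u = await userSim(history);
    history.push({ role: "user", content: u.content });
    if (u.signal !== "continue") return { history, outcome: u.signal };
    history.push({ role: "assistant", content: await agent(history) });
  }
  return { history, outcome: "max_turns" };
}
```

In the real eval, `userSim` would be an LLM call conditioned on the extracted profile and personality, and `agent` would be the chatbot under test.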
User Personalities:
- `direct` - Brief, to-the-point, ends conversation promptly
- `exploratory` - Asks follow-up questions, explores related topics
- `frustrated` - Impatient, expresses dissatisfaction if not helped quickly
- `friendly` - Polite, appreciative, engages in pleasantries
- `confused` - Needs extra clarification, uncertain
What it evaluates:
- ResolutionCheck - Did the agent handle the conversation appropriately?
- GoalAchievement - Did the agent help achieve the user's goal (or correctly redirect off-topic requests)?
- QualityCheck - Was the conversation professional and coherent?
- EfficiencyCheck - How many turns did it take to resolve?
Data source: Uses the same L8rCustomerServiceDataset, treating each entry as a seed for generating a simulated user.
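As an illustration, a turn-count scorer in the spirit of EfficiencyCheck could map fewer user turns to a higher score. The linear decay scale here is an assumption, not the repo's actual formula:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical efficiency scorer: 1.0 for a one-turn resolution,
// decaying linearly toward 0 as user turns approach maxTurns.
function efficiencyCheck(history: Turn[], maxTurns = 10) {
  const userTurns = history.filter((t) => t.role === "user").length;
  const score = Math.max(0, 1 - (userTurns - 1) / maxTurns);
  return { name: "EfficiencyCheck", score };
}
```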
Both evals support an `instructions` parameter to test different system prompts via remote evals:

```bash
npx braintrust eval evals/sim.eval.ts --dev
```

This repo includes `deploy/modal_eval_server.py`, which starts Braintrust's eval dev server for the TypeScript evals and exposes it as an HTTPS endpoint on Modal.
Prereqs:
- Install Modal locally (`pip install modal`) and authenticate (`modal setup`)
- Ensure your `.env` contains `BRAINTRUST_API_KEY` and `OPENAI_API_KEY`
Deploy:
```bash
modal deploy deploy/modal_eval_server.py
```

| Command | Description |
|---|---|
| `npm run db:push` | Push schema changes to database |
| `npm run db:seed` | Seed database with sample data |
| `npm run db:reset` | Reset database and reseed (destructive) |
```
├── src/
│   ├── app/                  # Next.js app router
│   │   └── api/chat/         # Chat API endpoint (streaming)
│   └── lib/
│       ├── chatbot/          # Chat logic, tools, system prompt
│       └── braintrust.ts     # Braintrust tracing setup
├── evals/
│   ├── conversation.eval.ts  # Next-turn evaluation
│   └── sim.eval.ts           # Simulated conversation evaluation
├── prisma/
│   ├── schema.prisma         # Database schema
│   └── seed.ts               # Database seeding script
└── assets/
    └── Multiturn.png         # Evaluation approach diagram
```
