Rebuild Annotation App by nadahlberg · Pull Request #145 · freelawproject/classifier-experiments

nadahlberg · 2026-04-13T18:03:11Z

This PR includes a from-scratch rebuild of the annotation app. Fixes #144; see the issue for the motivation.

An Agent-first Annotation App

Overview

Similar to the old app, the core structure of this app is that there are Projects which include Labels and Documents. Data can be added to projects from a list of texts or a DataFrame. Columns other than text are stored in Documents in a meta dict, which we may add search support for in the future.

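from clx.models import Project

# Placeholder texts; any list of strings works, and a DataFrame is also accepted.
texts = ["First example text.", "Second example text."]
project = Project.objects.get(name="My Project")
project.add_docs(texts)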

Run the app with clx manage runserver. Projects and labels can be added through the UI.

Texts get a trigram index so that boolean combinations of substring comparisons, which are otherwise expensive, run more efficiently. A text_prefix field (the first 50 characters) is added with its own index for more efficient startswith queries.
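To make the indexing idea concrete, here is a minimal pure-Python sketch of how a trigram index narrows substring search to a small candidate set. It is an illustration only, not the app's implementation (which presumably relies on a database-level index); `TrigramIndex`, `trigrams`, and `text_prefix` are hypothetical names invented for this example.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of lowercase 3-character windows in text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def text_prefix(text: str, length: int = 50) -> str:
    """First `length` characters, mirroring the idea of a short indexed prefix."""
    return text[:length]


class TrigramIndex:
    """Toy inverted index from trigram to document ids (illustrative only)."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle: str) -> list[int]:
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters cannot use the index; scan everything.
            candidates = set(self.docs)
        else:
            # Only documents containing every trigram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Re-check candidates: sharing all trigrams does not guarantee a match.
        lowered = needle.lower()
        return sorted(d for d in candidates if lowered in self.docs[d].lower())
```

The index only prunes candidates; the final substring check is still needed because a document can contain every trigram of the needle without containing the needle itself.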

Examples can be sampled into a label's training set; these are stored as LabelDocuments.

LabelDocuments can be linked to an Annotation. Annotations are scoped by source, although currently only source="agent" is supported.

Agent Tools

The agent has several tools allowing it to manage itself and the label training data:

Memory management tools:

Consolidate Memory: Generates a compacted summary of the current session that replaces the conversation history. Used to keep costs manageable in long-running sessions.
Clear Tool History: Wipes the tool inputs and outputs of earlier tool calls / response messages in the conversation. Search and annotation tools add a ton of tokens. This allows us to drop these inputs, dramatically reducing a session's tokens without losing anything else from the conversation history.
Update project / label instructions: Project and label instructions are injected in the agent's system prompt for long term memory across sessions. Agents have tools to overwrite or append to these.
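As a rough sketch of what the Clear Tool History tool does, the function below blanks tool inputs and outputs while leaving the rest of the conversation untouched. The message shape (`role`, `content`, `tool_calls`, `input`) and the `keep_last` parameter are assumptions made for this example, not the app's actual schema.

```python
from copy import deepcopy

PLACEHOLDER = "[cleared]"


def clear_tool_history(messages: list[dict], keep_last: int = 0) -> list[dict]:
    """Return a copy of messages with tool inputs and outputs blanked out.

    The last `keep_last` messages are left as-is so the agent can still see
    its most recent tool results (a hypothetical convenience).
    """
    cleared = deepcopy(messages)
    cutoff = max(0, len(cleared) - keep_last)
    for message in cleared[:cutoff]:
        if message.get("role") == "tool":
            # Drop the (often very large) tool output.
            message["content"] = PLACEHOLDER
        for call in message.get("tool_calls", []):
            # Drop the tool input but keep the tool name for context.
            call["input"] = PLACEHOLDER
    return cleared
```

Keeping the tool names and the assistant's own prose means the agent still knows what it did and why, even though the bulky payloads are gone.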

Data management tools:

Search: Allows the agent to search through the project data. It has an expressively powerful text query parameter, allowing it to do substring searches with arbitrarily composed AND, OR, NOT, and STARTSWITH operators. Search can also filter documents by their annotation and prediction values for the current label.
Add Trainset Examples: Allows the agent to sample examples from a search into the trainset. Examples are added from a prior search tool call.
Annotate: Allows the agent to annotate examples in bulk. Annotations reference document ids returned by the search tool.

User interaction tools:

Ask User Question: This tool allows the agent to pause an automated workflow and present a question and proposed answer choices to the user. The workflow will be flagged as requiring user input, so that the user can easily jump around to tasks that require intervention.
Complete Task: This tool is used for tasks run in autopilot mode, allowing the agent to signal that a task is complete so it can move on to the next.

Workflow Prompts

There are currently three pre-defined steps that can be customized per-project. These are just prompts that instruct the agent to take certain actions in a specific order. Users can overwrite the default instructions for their project, making this fully customizable depending on project needs. Users can also define custom project-specific workflow prompts that can be triggered manually.

Label Understanding: This is a prompt that instructs the agent to get familiar with the data, come up with an initial hypothesis about the label's annotation boundaries, and then conduct an interview with the user to clarify its understanding of the label.
Sampling Strategy: This is a prompt that instructs the agent on how to build an initial training set. The default instructions mirror our approach for the old version of this app. We ask the agent to come up with minimal and likely keyword conditions and sample evenly across excluded, neutral, and likely buckets. The agent is then asked to sample from searches anchored on specific tricky language.
Annotate: This instructs the agent to search for any unannotated examples and annotate them in batches of 100. The agent is also instructed to clear its tool history between batches to free up context.
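The default sampling strategy can be pictured with a short sketch. It assumes one particular reading of the three buckets: "excluded" texts fail the minimal condition, "likely" texts meet both conditions, and "neutral" texts meet only the minimal one. That reading, and the `bucket` and `sample_evenly` helpers, are assumptions for illustration; in the app the agent carries this out through search and sampling tool calls rather than a fixed function.

```python
import random
from typing import Callable

Condition = Callable[[str], bool]


def bucket(text: str, minimal: Condition, likely: Condition) -> str:
    """Assign a text to the excluded, neutral, or likely bucket."""
    if not minimal(text):
        return "excluded"
    return "likely" if likely(text) else "neutral"


def sample_evenly(
    texts: list[str],
    minimal: Condition,
    likely: Condition,
    target: int,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Draw roughly target / 3 examples from each bucket."""
    rng = random.Random(seed)
    buckets: dict[str, list[str]] = {"excluded": [], "neutral": [], "likely": []}
    for text in texts:
        buckets[bucket(text, minimal, likely)].append(text)
    per_bucket = target // len(buckets)
    # A bucket with fewer items than its share contributes everything it has.
    return {
        name: rng.sample(items, min(per_bucket, len(items)))
        for name, items in buckets.items()
    }
```

Sampling evenly keeps the initial training set from being dominated by the (usually much larger) pool of obviously irrelevant documents.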

Patterns for Dynamic Configs with Natural Language

Our sampling strategy is quite similar to our old approach, except that things like the creation of minimal / likely condition queries or the number of examples to target for the training set are not baked into the application logic. Instead, the workflow steps are just described in the workflow prompt.

One pattern that is useful here is the ability to have a prompt that asks the agent to store information in the project or label instructions if it doesn't already exist. For example, in our sampling prompt we tell the agent that the project instructions should mention the number of examples to target for each training set, and that if this isn't present it should ask the user about it and add it. So for the first label in a project, the agent will prompt the user for this info. Then on subsequent labels it will see the needed info in the project instructions and move forward automatically. Thus we can use project and label instructions as a kind of dynamic configuration file that can be used any way your project needs.

We do something similar with minimal and likely keyword searches -- asking the agent to store them in label memory so they can be referenced in future tasks.
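The "ask once, then remember" pattern can be reduced to a few lines. In the app this behavior comes from natural-language prompts and the agent reading free-form instructions; the `key: value` line format and the `ensure_setting` helper below are simplifications invented for this sketch.

```python
from typing import Callable


def ensure_setting(
    instructions: str,
    key: str,
    ask_user: Callable[[str], str],
) -> tuple[str, str]:
    """Return (value, instructions), asking the user only when key is missing."""
    for line in instructions.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            # Already recorded on an earlier label: reuse it without asking.
            return value.strip(), instructions
    value = ask_user(f"What should '{key}' be for this project?")
    prefix = instructions.rstrip() + "\n" if instructions.strip() else ""
    # Append the answer so later sessions find it in the instructions.
    return value, f"{prefix}{key}: {value}"
```

The first call for a project triggers a question; every later call finds the stored line and proceeds without user input.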

Autopilot Mode

Autopilot mode is a worker that runs in the background and automatically runs the three pre-defined workflows for each label. When a label is initialized, a label understanding task is triggered. Once the user has responded to any open questions, the sampling strategy is triggered. Then the annotation workflow runs on repeat any time there are unannotated examples.

Autopilot runs in a dedicated thread visible on the chat tab. All autopilot tasks run in the same long-running session which is automatically compacted when the session exceeds 25k tokens. The user can see the autopilot tasks running and can inject messages if it gets stuck.

Finetunes and Predictions

Once a label has annotated data, buttons will appear for finetuning a single-label classifier and making predictions. Finetuning jobs are serverless and can be fired off in the background, with a progress bar tracking the current step. Predictions can then be added to the trainset from the finetuned model. These will display in search alongside the annotated value. We have search filter helpers for filtering on prediction-annotation disagreements and for sorting examples by confidence scores close to 0.5.
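The two search helpers mentioned above can be sketched in plain Python. The confidence normalization `abs(score - 0.5) * 2` comes from the commit notes in this PR; the row dictionaries and the function names are assumptions for illustration.

```python
def normalized_confidence(score: float) -> float:
    """Map a probability in [0, 1] to a confidence in [0, 1]; 0.5 maps to 0."""
    return abs(score - 0.5) * 2


def disagreements(rows: list[dict]) -> list[dict]:
    """Rows whose prediction differs from a yes/no annotation."""
    return [
        r for r in rows
        if r["annotation"] in ("yes", "no") and r["prediction"] != r["annotation"]
    ]


def tricky_first(rows: list[dict]) -> list[dict]:
    """Sort rows by ascending confidence, so scores nearest 0.5 come first."""
    return sorted(rows, key=lambda r: normalized_confidence(r["score"]))
```

Reviewing the least confident and the disagreeing examples first tends to surface annotation mistakes and boundary cases quickly.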

Multi-label Training

There is also a CLI script that makes a project-level trainset that combines the unique documents from all labels' trainsets. It will compute predictions across all examples for all labels, updating automatically when a new model has been finetuned and the predictions are stale. These are then used as a training set for training a single, multi-label model that can make predictions across all labels.

Screenshots and examples

Chat with an agent directly to manage your label

Agent pauses automated workflows and requests input from users

Search through project data, review annotations, predictions, confidence scores

Customize and add new workflow prompts for your project

Trigger finetuning and prediction Jobs

See autopilot progress and what needs human input

Get an overview of label progress, trainset distribution, and finetune scores

Delete cc/ folder and move docket data script to projects/
- Remove cc/CLAUDE.md and cc/helpers.py (no longer needed)
- Move clx/cli/generate_docket_sample.py to projects/docket_data/run.py as standalone argparse script
- Unregister generate_docket_sample from CLI commands
Closes #1

Add base model, Document model, and project/document views
- Add abstract Base model with ShortUUIDField PKs and timestamps
- Refactor Project to inherit from Base
- Add Document model with text, meta, shuffle_key, text_hash, and dedup constraint
- Add Project.add_docs() for bulk document ingestion via django-postgres-copy
- Add projects grid page with Alpine.js new-project modal
- Add project detail page with paginated document list
- Add API endpoints: GET/POST /api/projects/, GET /api/projects/<id>/docs
- Add django-shortuuid dependency, bump requires-python to >=3.13
Closes #3

Redesign projects and detail pages with polished styling
- Add Tailwind config with design tokens, Inter font, and sticky global nav to base.html
- Restyle project cards with border-based hover instead of shadows
- Polish new-project modal with rounded-xl, neutral color scheme
- Rework project detail page with fixed 280px sidebars, CSS grid layout, independently scrollable panels, pinned doc count header, and refined pagination
- Add .scrollable-panel CSS utility
Closes #5

- Remove main navbar on project detail page (project navbar replaces it)
- Match project navbar height to main navbar (h-12)
- Make sidebars and count header fixed during scroll using flex layout with overflow-hidden on the outer container
- Show full document text instead of 50-char prefix on cards
- Results already ordered by shuffle_key (no change needed)
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Move the h-screen flex-col overflow-hidden layout to the body in base.html so all pages get a fixed navbar with scrollable content below. Update projects page content to use flex-1 overflow-y-auto. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Remove the project navbar entirely. Add a header box at the top of the left sidebar showing the project name and a Font Awesome layer-group icon linking back to all projects. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add text search input in the filters sidebar with search button
- Filter documents server-side with case-insensitive text contains
- Make counting lazy: show a "Get count" button instead of auto-counting
- Clicking the button fetches the count via a separate API endpoint
- Changing filters and searching resets the count back to button state
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Replace Django Paginator (which runs COUNT) with manual slicing that fetches one extra row to detect has_next. Pagination now shows only the current page number. Count is only fetched when the user clicks the Get count button. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
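The "fetch one extra row" technique described here is easy to show on a plain sequence. The `paginate` helper is a hypothetical stand-in for the view code; with a Django queryset the same slice would become a LIMIT/OFFSET query, so no COUNT is ever issued.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], page: int, page_size: int = 100) -> tuple[list[T], bool]:
    """Return (items for this page, has_next) without counting all rows."""
    start = (page - 1) * page_size
    # Ask for one row more than a full page; its presence means there is a next page.
    window = list(rows[start:start + page_size + 1])
    return window[:page_size], len(window) > page_size
```

The extra row is discarded before returning, so callers only ever see at most `page_size` items.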

Bump text-gray-400 to text-gray-500 and text-muted-foreground to text-gray-600 throughout the project detail page. Also darken the New Project card text and apply blue primary color to the projects modal Create button and input focus ring. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add Label model with name and project FK, unique together
- Add active_label FK on Project
- Lazily create "Initial Label" when visiting a project with no labels
- Right sidebar header shows active label with dropdown to switch
- Switching labels persists via API so the page reloads to the last used label
- Migration not included (user will add manually)
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Move chevron-down to left of label name, remove tag icon
- Add truncation to label names in dropdown
- Widen right panel from 280px to 340px
- Add plus and gear icon buttons in label header
- Plus opens a modal to create a new label (auto-switches to it)
- Gear opens a modal to rename the active label
- Add create_label_api and rename_label_api endpoints
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add clx/query.py with Pydantic query schema, recursive descent parser, Django Q builder, and TextQuerySet/TextManager
- Parser: comma=AND, pipe=OR, tilde=NOT, caret=STARTSWITH, parens for grouping. AND binds lower than OR so `A, B | C` = `A AND (B OR C)`
- STARTSWITH queries use text_prefix field for better index utilization
- Document model now uses TextManager with query_string() helper
- Views filter with ?q= param instead of ?text=
- Filters panel renamed to "Query" with info button showing syntax help
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add tab navbar above center content with Agent, Search, Labels, Settings
- Agent tab: placeholder (default active tab)
- Search tab: existing doc search with count sub-nav and pagination
- Labels tab: shows all labels as cards
- Settings tab: project name editing with rename API
- Searching from filters panel auto-switches to Search tab
- Add rename_project_api endpoint
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add prediction (yes/no) and prediction_confidence (float) fields to LabelDocument model
- Add predicted_at field to Label model
- Add Label.predict() method that runs remote pipeline across all label documents and bulk-updates predictions with normalized confidence (abs(score - 0.5) * 2)
- Add predict_label_api endpoint, guarded by completed finetune status
- Add Run Predictions button in sidebar, only visible when finetune is completed
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Include finetuned_at and predicted_at in labels API response
- Overview tab: show Predicted (teal check) or Needs predictions on label cards when finetune is completed
- Active label panel: show Predictions status row with same logic
- Update predicted_at on activeLabel after successful prediction run
- Sync finetuned_at and predicted_at during polling
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add prediction_stats JSONField to Label model
- After running predictions, compute F1, accuracy, precision, recall using only yes/no annotated examples as ground truth
- Include prediction_stats in labels API and predict API responses
- Show F1, accuracy, and sample count in active label panel
- Sync prediction_stats during polling
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
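A minimal sketch of how such prediction stats can be computed, treating "yes" as the positive class and ignoring any example whose annotation is not yes/no. The `prediction_stats` function, its return keys, and the "unsure" annotation value used in the usage note are assumptions for illustration, not the fields stored by the app.

```python
def prediction_stats(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Compute precision, recall, F1, and accuracy from (annotation, prediction) pairs."""
    # Only yes/no annotations count as ground truth.
    scored = [(a, p) for a, p in pairs if a in ("yes", "no") and p in ("yes", "no")]
    tp = sum(1 for a, p in scored if a == "yes" and p == "yes")
    fp = sum(1 for a, p in scored if a == "no" and p == "yes")
    fn = sum(1 for a, p in scored if a == "yes" and p == "no")
    tn = sum(1 for a, p in scored if a == "no" and p == "no")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(scored) if scored else 0.0
    return {
        "f1": f1,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "n": len(scored),
    }
```

A pair such as `("unsure", "yes")` is skipped entirely, so it affects neither the metrics nor the sample count.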

Remove "Needs X" prefix — all three statuses now always show the same text (Instructions, Finetuned, Predicted) with green/gray color and check/circle icon to indicate completion. Predictions status is always visible, not gated on finetune completion. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add filter_prediction() to SearchQuerySet with support for 'yes', 'no', 'any', and 'disagree' (prediction != agent annotation)
- Update _filtered_documents to handle prediction query param
- Rename "Training Data" label to "Annotations" in search filters
- Add "Predictions" filter section with yes/no/any/disagree buttons
- Annotation and prediction filters are mutually exclusive
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add with_prediction() to SearchQuerySet for annotating docs with prediction value and confidence via subquery
- Include prediction and prediction_confidence in docs API response
- Show prediction badge on search cards: outlined border style with circle-nodes icon to distinguish from solid annotation badges
- Confidence shown on hover via title attribute
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add latest_annotation_at (yes/no only) to labels API stats via Max aggregate
- Label cards show finetune status as amber with empty circle when completed but annotations are newer than finetuned_at, matching the active label panel behavior
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add sort param to docs API: 'shuffle' (default) or 'tricky' (ascending prediction confidence, least confident first)
- Add grouped sort selector + pagination bar in search results UI
- Add sort field to agent Search tool with same options
- Use fixed-width badges for annotation and prediction on search cards
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add Label.recalculate_prediction_stats() method that recomputes F1/accuracy from existing predictions vs current annotations
- Add prediction-stats API endpoint
- Add small rotate icon next to "Predictions" label in active label panel that triggers recalculation on click
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Project.project_dir returns CLX_HOME/projects/{project_id}
- Project.export_data() dumps document id+text to docs.csv in batches, with tqdm progress bar. Tracks last export time in exported.txt and only exports documents created since then on subsequent calls.
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

- Add cancel_finetune_api endpoint that sends cancel request to RunPod and sets finetune_status to error so user can restart
- Show Cancel button below the Training button when finetune is in progress; stops polling and refreshes status on click
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

nadahlberg and others added 30 commits April 3, 2026 13:34

wipe old app

c3ca096

remove old cli commands

4e76750

move side projects to experiments dir

4043c6d

Fix document area scrolling by adding min-h-0 to center column

8f0a8bd

The flex column container needs min-h-0 so its overflow-y-auto child can properly constrain and scroll its content. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Increase pagination to 100 results per page

650ffc2

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Use blue as primary color for filters panel

69089e4

Update search input focus ring and search button to use blue instead of gray. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Thin the focus ring on the filters text input

1bfd721

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

add label migrations

c68bb31

Change query input to auto-expanding textarea

3581fce

Starts at one row and grows as content wraps to new lines. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Allow shift+enter for newlines in query textarea

6943b39

Use .exact modifier so only bare enter triggers search; shift+enter inserts a newline as normal. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Fix enter key handling in query textarea

b84d155

Use explicit shiftKey check instead of Alpine .exact modifier which doesn't behave as expected. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Add x-cloak to hidden tab panels to prevent flash on load

b738694

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Move settings Save button to pinned footer bar

b66ed44

Settings content scrolls independently with a fixed Save button at the bottom, ready for additional settings fields. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Make tab navbar more prominent than count sub-nav

2225f11

Increase height to h-12, text to text-sm, active weight to font-semibold, and add bg-surface to visually distinguish from the smaller count nav below. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Make tab buttons wall-to-wall with permanent backgrounds

59bbba6

Tabs stretch to fill the full width equally with bg-gray-50 inactive and bg-gray-100 active, separated by border dividers. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Add white backgrounds to center content panels

899648c

All tab content areas and the count sub-nav now have bg-surface (white), contrasting with the gray tab bar above. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Move settings Save button to header row next to title

d716db4

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

claude and others added 29 commits April 9, 2026 18:19

Use fa-circle-nodes icon for finetune button

503083d

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Standardize label card status indicators

506639d

All three statuses (Instructions, Finetuned, Predicted) now use the same style: green circle-check when done, gray circle with "Needs X" when not. Training in-progress shows a spinner. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Reorder active label panel: finetune button above predictions section

31e9654

Layout is now: Model status → progress bar → training args → Run Finetune button → Predictions status → prediction stats → Run Predictions button. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Show prediction stats on overview tab label cards

757d219

Display F1, accuracy, and sample count below the annotation stats bar on label cards when prediction stats are available. Also fix stray closing tag. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Add divider between model and predictions sections in label panel

7241d8c

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Use inset ring for prediction badge on search results

91bccf6

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Add prediction filter to agent Search tool

4b01aad

Support filtering by prediction value: 'yes', 'no', 'any', or 'disagree' (prediction differs from agent annotation). Uses the existing filter_prediction() queryset method. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Add offset parameter to Search tool for pagination

b67a927

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Make model/predictions divider go wall to wall

8935810

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Replace sort button group with dropdown selector

2b3e794

Shows currently selected sort with a chevron; clicking opens a dropdown with checkmark on the active option. Pagination is now a separate grouped element beside it. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Show prediction confidence score on search result items

3a49bfa

Display confidence percentage to the left of the annotation and prediction badges, right-aligned in a fixed-width column. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Move annotation badge after prediction on search results

5d68718

Order is now: confidence % → prediction badge → annotation badge https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Move imports to module level, default batch_size to 100k

93ef58e

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

Wipe docs.csv and exported.txt on export error

9aa9711

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

add migrations

8c16e6e

Catch BaseException in export_data to handle KeyboardInterrupt

99bb0be

https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme

add multilabel training cli script

e012c37

add predicitons to multi-label train script

8a2fb75

nadahlberg merged commit 1832fba into freelawproject:main Apr 13, 2026
3 checks passed

nadahlberg merged 228 commits into freelawproject:main from nadahlberg:claude/fix-project-navbar-ut8vj