Rebuild Annotation App #145
Merged
nadahlberg merged 228 commits into freelawproject:main from Apr 13, 2026
Delete cc/ folder and move docket data script to projects/
- Remove cc/CLAUDE.md and cc/helpers.py (no longer needed)
- Move clx/cli/generate_docket_sample.py to projects/docket_data/run.py as standalone argparse script
- Unregister generate_docket_sample from CLI commands
Closes #1
Add base model, Document model, and project/document views
- Add abstract Base model with ShortUUIDField PKs and timestamps
- Refactor Project to inherit from Base
- Add Document model with text, meta, shuffle_key, text_hash, and dedup constraint
- Add Project.add_docs() for bulk document ingestion via django-postgres-copy
- Add projects grid page with Alpine.js new-project modal
- Add project detail page with paginated document list
- Add API endpoints: GET/POST /api/projects/, GET /api/projects/<id>/docs
- Add django-shortuuid dependency, bump requires-python to >=3.13
Closes #3
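A rough sketch of how hash-based de-duplication during ingestion might look. The commit does not say which hash function `text_hash` uses or how the dedup constraint is scoped, so SHA-256 and the helper names `text_hash` and `dedupe_texts` are assumptions for illustration only.

```python
import hashlib
from typing import Iterable


def text_hash(text: str) -> str:
    """Stable digest of a document's text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe_texts(texts: Iterable[str], existing_hashes: Iterable[str] = ()) -> list[dict]:
    """Keep the first occurrence of each text, skipping hashes already stored."""
    seen = set(existing_hashes)
    rows = []
    for text in texts:
        digest = text_hash(text)
        if digest in seen:
            continue  # duplicate within the batch or already in the project
        seen.add(digest)
        rows.append({"text": text, "text_hash": digest})
    return rows
```

Filtering duplicates before a bulk load keeps a uniqueness constraint from rejecting the whole batch.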
Redesign projects and detail pages with polished styling
- Add Tailwind config with design tokens, Inter font, and sticky global nav to base.html
- Restyle project cards with border-based hover instead of shadows
- Polish new-project modal with rounded-xl, neutral color scheme
- Rework project detail page with fixed 280px sidebars, CSS grid layout, independently scrollable panels, pinned doc count header, and refined pagination
- Add .scrollable-panel CSS utility
Closes #5
- Remove main navbar on project detail page (project navbar replaces it)
- Match project navbar height to main navbar (h-12)
- Make sidebars and count header fixed during scroll using flex layout with overflow-hidden on the outer container
- Show full document text instead of 50-char prefix on cards
- Results already ordered by shuffle_key (no change needed)
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
The flex column container needs min-h-0 so its overflow-y-auto child can properly constrain and scroll its content. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Move the h-screen flex-col overflow-hidden layout to the body in base.html so all pages get a fixed navbar with scrollable content below. Update projects page content to use flex-1 overflow-y-auto. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Remove the project navbar entirely. Add a header box at the top of the left sidebar showing the project name and a Font Awesome layer-group icon linking back to all projects. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add text search input in the filters sidebar with search button
- Filter documents server-side with case-insensitive text contains
- Make counting lazy: show a "Get count" button instead of auto-counting
- Clicking the button fetches the count via a separate API endpoint
- Changing filters and searching resets the count back to button state
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Replace Django Paginator (which runs COUNT) with manual slicing that fetches one extra row to detect has_next. Pagination now shows only the current page number. Count is only fetched when the user clicks the Get count button. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
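The "fetch one extra row" approach can be sketched as follows. This is a minimal illustration rather than the view code from this PR; `paginate_without_count` is a hypothetical name, and it works on any sliceable sequence (on a Django queryset the slice would translate to a LIMIT/OFFSET query).

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate_without_count(rows: Sequence[T], page: int, page_size: int) -> tuple[list[T], bool]:
    """Return (items, has_next) for a 1-indexed page without counting all rows."""
    start = (page - 1) * page_size
    # Ask for one row more than the page needs; its presence means a next page exists.
    window = list(rows[start : start + page_size + 1])
    has_next = len(window) > page_size
    return window[:page_size], has_next
```

The extra row is dropped before returning, so callers only ever see `page_size` items.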
Update search input focus ring and search button to use blue instead of gray. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Bump text-gray-400 to text-gray-500 and text-muted-foreground to text-gray-600 throughout the project detail page. Also darken the New Project card text and apply blue primary color to the projects modal Create button and input focus ring. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add Label model with name and project FK, unique together
- Add active_label FK on Project
- Lazily create "Initial Label" when visiting a project with no labels
- Right sidebar header shows active label with dropdown to switch
- Switching labels persists via API so the page reloads to the last used label
- Migration not included (user will add manually)
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Move chevron-down to left of label name, remove tag icon
- Add truncation to label names in dropdown
- Widen right panel from 280px to 340px
- Add plus and gear icon buttons in label header
- Plus opens a modal to create a new label (auto-switches to it)
- Gear opens a modal to rename the active label
- Add create_label_api and rename_label_api endpoints
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add clx/query.py with Pydantic query schema, recursive descent parser, Django Q builder, and TextQuerySet/TextManager
- Parser: comma=AND, pipe=OR, tilde=NOT, caret=STARTSWITH, parens for grouping. AND binds lower than OR so `A, B | C` = `A AND (B OR C)`
- STARTSWITH queries use text_prefix field for better index utilization
- Document model now uses TextManager with query_string() helper
- Views filter with ?q= param instead of ?text=
- Filters panel renamed to "Query" with info button showing syntax help
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
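To make the precedence rule concrete, here is a small recursive descent parser for the syntax described in this commit. It is a sketch, not the code in clx/query.py: `QueryParser` and `matches` are hypothetical names, the tree is plain tuples rather than the Pydantic schema, and case-insensitive matching for both operators is an assumption.

```python
from dataclasses import dataclass

SPECIAL = set(",|~^()")


@dataclass
class QueryParser:
    """Hypothetical parser; returns nested tuples instead of a Pydantic schema."""

    text: str
    pos: int = 0

    def peek(self) -> str:
        # Skip whitespace, then return the next character ("" at end of input).
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse(self):
        node = self.parse_and()
        if self.peek():
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return node

    def parse_and(self):
        # Lowest precedence: comma-separated groups are ANDed together.
        parts = [self.parse_or()]
        while self.peek() == ",":
            self.pos += 1
            parts.append(self.parse_or())
        return parts[0] if len(parts) == 1 else ("and", *parts)

    def parse_or(self):
        # Binds tighter than AND: pipe-separated terms are ORed together.
        parts = [self.parse_unary()]
        while self.peek() == "|":
            self.pos += 1
            parts.append(self.parse_unary())
        return parts[0] if len(parts) == 1 else ("or", *parts)

    def parse_unary(self):
        ch = self.peek()
        if ch == "~":
            self.pos += 1
            return ("not", self.parse_unary())
        if ch == "(":
            self.pos += 1
            node = self.parse_and()
            if self.peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return node
        if ch == "^":
            self.pos += 1
            return ("startswith", self.parse_term())
        return ("contains", self.parse_term())

    def parse_term(self) -> str:
        self.peek()  # skip leading whitespace
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] not in SPECIAL:
            self.pos += 1
        term = self.text[start:self.pos].strip()
        if not term:
            raise ValueError(f"expected a search term at position {start}")
        return term


def matches(node, text: str) -> bool:
    """Evaluate a parsed query against one text (case-insensitive by assumption)."""
    op, *args = node
    if op == "and":
        return all(matches(arg, text) for arg in args)
    if op == "or":
        return any(matches(arg, text) for arg in args)
    if op == "not":
        return not matches(args[0], text)
    if op == "startswith":
        return text.lower().startswith(args[0].lower())
    return args[0].lower() in text.lower()
```

In a Django setting each node of such a tree maps onto `Q` objects combined with `&`, `|`, and `~`, which is presumably what the Q builder in this commit does.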
Starts at one row and grows as content wraps to new lines. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Use .exact modifier so only bare enter triggers search; shift+enter inserts a newline as normal. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Use explicit shiftKey check instead of Alpine .exact modifier which doesn't behave as expected. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add tab navbar above center content with Agent, Search, Labels, Settings
- Agent tab: placeholder (default active tab)
- Search tab: existing doc search with count sub-nav and pagination
- Labels tab: shows all labels as cards
- Settings tab: project name editing with rename API
- Searching from filters panel auto-switches to Search tab
- Add rename_project_api endpoint
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Settings content scrolls independently with a fixed Save button at the bottom, ready for additional settings fields. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Increase height to h-12, text to text-sm, active weight to font-semibold, and add bg-surface to visually distinguish from the smaller count nav below. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Tabs stretch to fill the full width equally with bg-gray-50 inactive and bg-gray-100 active, separated by border dividers. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
All tab content areas and the count sub-nav now have bg-surface (white), contrasting with the gray tab bar above. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add prediction (yes/no) and prediction_confidence (float) fields to LabelDocument model
- Add predicted_at field to Label model
- Add Label.predict() method that runs remote pipeline across all label documents and bulk-updates predictions with normalized confidence (abs(score - 0.5) * 2)
- Add predict_label_api endpoint, guarded by completed finetune status
- Add Run Predictions button in sidebar, only visible when finetune is completed
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
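The confidence normalization maps a classifier score in [0, 1] to a value in [0, 1], where 0 means the score sits exactly at 0.5 and 1 means it is at either extreme. A small sketch follows; the formula comes from the commit message, while the 0.5 decision threshold and the function names are assumptions.

```python
def normalized_confidence(score: float) -> float:
    """Distance of a score in [0, 1] from 0.5, rescaled to [0, 1]."""
    return abs(score - 0.5) * 2


def prediction_from_score(score: float, threshold: float = 0.5) -> tuple[str, float]:
    """Map a raw score to a (prediction, confidence) pair.

    The yes/no cut-off at `threshold` is an assumption, not taken from the PR.
    """
    label = "yes" if score >= threshold else "no"
    return label, normalized_confidence(score)
```

With this scaling, scores of 0.75 and 0.25 both get a confidence of 0.5, so confidence reflects certainty regardless of which class was predicted.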
- Include finetuned_at and predicted_at in labels API response
- Overview tab: show Predicted (teal check) or Needs predictions on label cards when finetune is completed
- Active label panel: show Predictions status row with same logic
- Update predicted_at on activeLabel after successful prediction run
- Sync finetuned_at and predicted_at during polling
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add prediction_stats JSONField to Label model
- After running predictions, compute F1, accuracy, precision, recall using only yes/no annotated examples as ground truth
- Include prediction_stats in labels API and predict API responses
- Show F1, accuracy, and sample count in active label panel
- Sync prediction_stats during polling
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
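For reference, the metrics named in this commit can be computed from (annotation, prediction) pairs as below. This is a sketch under assumptions: `prediction_stats` as a free function, the dictionary keys, and the convention of returning 0.0 when a denominator is zero are illustrative choices, not confirmed details of the PR.

```python
from typing import Iterable


def prediction_stats(pairs: Iterable[tuple[str, str]]) -> dict:
    """Compute binary metrics, treating 'yes' as the positive class.

    Pairs whose annotation or prediction is not 'yes'/'no' are ignored.
    """
    tp = fp = fn = tn = 0
    for annotation, prediction in pairs:
        if annotation not in ("yes", "no") or prediction not in ("yes", "no"):
            continue
        if annotation == "yes" and prediction == "yes":
            tp += 1
        elif annotation == "no" and prediction == "yes":
            fp += 1
        elif annotation == "yes" and prediction == "no":
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / n if n else 0.0
    return {"f1": f1, "accuracy": accuracy, "precision": precision, "recall": recall, "n": n}
```

Because the ground truth is whatever is currently annotated, the numbers shift as annotations are added or corrected.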
All three statuses (Instructions, Finetuned, Predicted) now use the same style: green circle-check when done, gray circle with "Needs X" when not. Training in-progress shows a spinner. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Layout is now: Model status → progress bar → training args → Run Finetune button → Predictions status → prediction stats → Run Predictions button. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Remove "Needs X" prefix — all three statuses now always show the same text (Instructions, Finetuned, Predicted) with green/gray color and check/circle icon to indicate completion. Predictions status is always visible, not gated on finetune completion. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Display F1, accuracy, and sample count below the annotation stats bar on label cards when prediction stats are available. Also fix stray closing tag. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add filter_prediction() to SearchQuerySet with support for 'yes', 'no', 'any', and 'disagree' (prediction != agent annotation)
- Update _filtered_documents to handle prediction query param
- Rename "Training Data" label to "Annotations" in search filters
- Add "Predictions" filter section with yes/no/any/disagree buttons
- Annotation and prediction filters are mutually exclusive
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add with_prediction() to SearchQuerySet for annotating docs with prediction value and confidence via subquery
- Include prediction and prediction_confidence in docs API response
- Show prediction badge on search cards: outlined border style with circle-nodes icon to distinguish from solid annotation badges
- Confidence shown on hover via title attribute
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Support filtering by prediction value: 'yes', 'no', 'any', or 'disagree' (prediction differs from agent annotation). Uses the existing filter_prediction() queryset method. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add latest_annotation_at (yes/no only) to labels API stats via Max aggregate
- Label cards show finetune status as amber with empty circle when completed but annotations are newer than finetuned_at, matching the active label panel behavior
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add sort param to docs API: 'shuffle' (default) or 'tricky' (ascending prediction confidence, least confident first)
- Add grouped sort selector + pagination bar in search results UI
- Add sort field to agent Search tool with same options
- Use fixed-width badges for annotation and prediction on search cards
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Shows currently selected sort with a chevron; clicking opens a dropdown with checkmark on the active option. Pagination is now a separate grouped element beside it. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Display confidence percentage to the left of the annotation and prediction badges, right-aligned in a fixed-width column. https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
Order is now: confidence % → prediction badge → annotation badge https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Add Label.recalculate_prediction_stats() method that recomputes F1/accuracy from existing predictions vs current annotations
- Add prediction-stats API endpoint
- Add small rotate icon next to "Predictions" label in active label panel that triggers recalculation on click
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
- Project.project_dir returns CLX_HOME/projects/{project_id}
- Project.export_data() dumps document id+text to docs.csv in batches,
with tqdm progress bar. Tracks last export time in exported.txt and
only exports documents created since then on subsequent calls.
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
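An incremental export along these lines could look like the sketch below. It is not the `Project.export_data()` implementation: documents are plain dicts with a hypothetical `created_at` key, the tqdm progress bar is left out to keep the example dependency-free, and storing the timestamp in ISO format is an assumption.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def export_new_docs(docs: list[dict], project_dir: Path, batch_size: int = 1000,
                    now: Optional[datetime] = None) -> int:
    """Append documents created since the last export to docs.csv; return how many."""
    project_dir.mkdir(parents=True, exist_ok=True)
    marker = project_dir / "exported.txt"
    csv_path = project_dir / "docs.csv"
    now = now or datetime.now(timezone.utc)

    # The marker file holds the time of the previous export, if there was one.
    since = datetime.fromisoformat(marker.read_text().strip()) if marker.exists() else None
    pending = [d for d in docs if since is None or d["created_at"] > since]

    write_header = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["id", "text"])
        # Write in batches so large projects never build one huge row list.
        for start in range(0, len(pending), batch_size):
            batch = pending[start : start + batch_size]
            writer.writerows((d["id"], d["text"]) for d in batch)

    marker.write_text(now.isoformat())
    return len(pending)
```

Calling it twice in a row exports nothing the second time unless documents were created in between.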
- Add cancel_finetune_api endpoint that sends cancel request to RunPod and sets finetune_status to error so user can restart
- Show Cancel button below the Training button when finetune is in progress; stops polling and refreshes status on click
https://claude.ai/code/session_01ED8wwV23ThoQbSfb2vUWme
This PR is a from-scratch rebuild of the annotation app. Fixes #144; see the issue for the motivation.
An Agent-first Annotation App
Overview
Similar to the old app, the core structure of this app is that there are `Projects`, which include `Labels` and `Documents`. Data can be added to projects from a list of texts or a DataFrame. Columns other than `text` are stored in `Documents` in a `meta` dict, which we may add search support for in the future.
Run the app with `clx manage runserver`. Projects and labels can be added through the UI.
Texts get a trigram index to make expensive boolean logic on substring comparisons more efficient. A `text_prefix` field (50 chars) is added with an index for more efficient startswith queries.
Examples can be sampled into a label's training set; these are stored as `LabelDocuments`. `LabelDocuments` can be linked to an `Annotation`. Annotations are scoped by `source`, although currently only `source="agent"` is supported.
Agent Tools
The agent has several tools allowing it to manage itself and the label training data:
Memory management tools:
Data management tools:
User interaction tools:
Workflow Prompts
There are currently three pre-defined steps that can be customized per-project. These are just prompts that instruct the agent to take certain actions in a specific order. Users can overwrite the default instructions for their project, making this fully customizable depending on project needs. Users can also define custom project-specific workflow prompts that can be triggered manually.
Patterns for Dynamic Configs with Natural Language
Our sampling strategy is quite similar to our old approach, except that things like the creation of minimal / likely condition queries or the number of examples to target for the training set are not baked into the application logic. Instead, the workflow steps are just described in the workflow prompt.
One pattern that is useful here is a prompt that asks the agent to store information in the project or label instructions if it doesn't already exist. For example, in our sampling prompt we tell the agent that the project instructions should mention the number of examples to target for each training set, and that if this isn't present it should ask the user about it and add it. So for the first label in a project, the agent will prompt the user for this info. Then on subsequent labels it will see the needed info in the project instructions and move forward automatically. Thus we can use project and label instructions as a kind of dynamic configuration file that can be used any way your project needs.
We do something similar with minimal and likely keyword searches, asking the agent to store them in label memory so they can be referenced in future tasks.
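In the app this behavior comes from the wording of the prompt rather than from code, but the "ask once, then reuse" logic it describes can be illustrated in a few lines. Everything here is hypothetical: `ensure_setting`, the `key: value` line format, and the `ask_user` callback are stand-ins for what the agent does in natural language.

```python
from typing import Callable


def ensure_setting(instructions: str, key: str, ask_user: Callable[[str], str]) -> tuple[str, str]:
    """Return (value, instructions), asking the user only if `key` is not recorded yet."""
    prefix = f"{key}:"
    for line in instructions.splitlines():
        if line.strip().startswith(prefix):
            # Already recorded: reuse it without bothering the user.
            return line.strip()[len(prefix):].strip(), instructions
    value = ask_user(f"What should '{key}' be for this project?")
    base = instructions.rstrip() + "\n" if instructions.strip() else ""
    return value, base + f"{key}: {value}"
```

The first label triggers a question and records the answer; later labels find the recorded line and proceed without prompting.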
Autopilot Mode
Autopilot mode is a worker that runs in the background and automatically runs the three pre-defined workflows for each label. When a label is initialized, a label understanding task is triggered. Once the user has responded to any open questions, the sampling strategy is triggered. Then the annotation workflow runs on repeat any time there are unannotated examples.
Autopilot runs in a dedicated thread visible on the chat tab. All autopilot tasks run in the same long-running session which is automatically compacted when the session exceeds 25k tokens. The user can see the autopilot tasks running and can inject messages if it gets stuck.
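The scheduling order described above can be sketched as a small decision function. This is an illustration under assumptions, not the worker in this PR: `LabelState`, its fields, the workflow names, and `needs_compaction` are hypothetical; only the ordering of the three workflows and the 25k-token threshold come from the description.

```python
from dataclasses import dataclass
from typing import Optional

COMPACT_THRESHOLD_TOKENS = 25_000  # threshold stated in the description


@dataclass
class LabelState:
    understood: bool = False   # label understanding task has run
    open_questions: int = 0    # questions still waiting on the user
    sampled: bool = False      # sampling strategy has run
    unannotated: int = 0       # examples in the trainset without an annotation


def next_workflow(state: LabelState) -> Optional[str]:
    """Pick the next autopilot task for a label, or None if there is nothing to do."""
    if not state.understood:
        return "understanding"
    if state.open_questions:
        return None  # paused until the user responds
    if not state.sampled:
        return "sampling"
    if state.unannotated:
        return "annotation"
    return None


def needs_compaction(session_tokens: int) -> bool:
    """True once the shared autopilot session has grown past the threshold."""
    return session_tokens > COMPACT_THRESHOLD_TOKENS
```

Returning `None` covers both "waiting on the user" and "fully caught up"; a real worker would likely distinguish the two for display purposes.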
Finetunes and Predictions
Once a label has annotated data, buttons will appear for finetuning a single-label classifier and making predictions. Finetuning jobs are serverless and can be fired off in the background, with a progress bar tracking the current step. Predictions from the finetuned model can then be added to the trainset. These will display in search alongside the annotated value. We have search filter helpers for filtering on prediction-annotation disagreements and for sorting examples by confidence scores close to 0.5.
Multi-label Training
There is also a CLI script that makes a project-level trainset that combines the unique documents from all labels' trainsets. It will compute predictions across all examples for all labels, updating automatically when a new model has been finetuned and the predictions are stale. These are then used as a training set for training a single, multi-label model that can make predictions across all labels.
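A sketch of the two bookkeeping steps this implies: building the combined set of documents and deciding which labels need fresh predictions. The function names and the use of per-label timestamps keyed by label name are assumptions; the CLI script itself may organize this differently.

```python
from datetime import datetime
from typing import Optional


def project_trainset(label_trainsets: dict[str, list[str]]) -> list[str]:
    """Union of document ids across all labels, de-duplicated, in first-seen order."""
    seen: dict[str, None] = {}
    for doc_ids in label_trainsets.values():
        for doc_id in doc_ids:
            seen.setdefault(doc_id, None)
    return list(seen)


def stale_labels(finetuned_at: dict[str, Optional[datetime]],
                 predicted_at: dict[str, Optional[datetime]]) -> list[str]:
    """Labels whose project-level predictions are missing or older than their model."""
    stale = []
    for label, trained in finetuned_at.items():
        if trained is None:
            continue  # no finetuned model yet, so nothing to predict with
        predicted = predicted_at.get(label)
        if predicted is None or predicted < trained:
            stale.append(label)
    return stale
```

Only the stale labels need to be re-run across the combined documents before the multi-label training set is rebuilt.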
Screenshots and examples
Chat with an agent directly to manage your label
Agent pauses automated workflows and requests input from users
Search through project data, review annotations, predictions, confidence scores
Customize and add new workflow prompts for your project
Trigger finetuning and prediction jobs
See autopilot progress and what needs human input
Get an overview of label progress, trainset distribution, and finetune scores