LevMuchnik/SupremeCourtCasesAnalysisWebsite
Israeli Supreme Court Dataset — Web Platform

A self-hosted web platform for exploring ~751K Israeli Supreme Court documents. Provides full-text search, corpus statistics with visualizations, on-demand LLM-powered NER analysis, a REST API, and an admin backend.

Built as a deliverable of an Israel Innovation Authority research project (grants #78560, #78561), conducted by the Hebrew University of Jerusalem and Tel Aviv University.

Features

  • Full-text search across 733K+ documents using SQLite FTS5
  • Faceted filtering by year, document type, legal division, judge, lawyer, party
  • Document viewer with proper RTL Hebrew text rendering
  • Case view with document timeline and participants (judges, lawyers, parties)
  • Statistics dashboard — interactive charts (documents by year, type distribution, top judges/lawyers, technical ratio)
  • NER analysis — on-demand LLM-powered Named Entity Recognition with inline highlighting
  • REST API with API key authentication and rate limiting
  • Admin backend — dashboard, API key management, data import, LLM configuration

Tech Stack

  • Backend: Python 3.11+, FastAPI, SQLAlchemy, SQLite + FTS5
  • Frontend: React, TypeScript, Tailwind CSS, Recharts
  • Package management: uv (Python), npm (frontend)
  • LLM integration: LiteLLM (supports Claude, GPT-4, Gemini, Ollama, etc.)
  • Deployment: Docker Compose

Prerequisites

  • uv (v0.8+)
  • Node.js (v20+)
  • ~6 GB disk space for the database

Quick Start (Local)

1. Install dependencies

# Python dependencies
uv sync

# Frontend dependencies
cd frontend && npm install && cd ..

2. Configure environment

cp .env.example .env

Edit .env as needed. The defaults work for local development, except the LLM features, which require an API key.
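As an illustrative sketch (the variable names come from the Configuration section below; the values here are placeholders, not the actual contents of .env.example):

```shell
# Admin login — change these before exposing the server
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme

# Optional: enable LLM-powered NER by supplying a provider API key
LLM_MODEL=claude-sonnet-4-20250514
LLM_API_KEY=your-provider-api-key
```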

3. Import the dataset

The dataset is a 1.5 GB Parquet file. If you have it locally at docs/cases_all.parquet:

uv run python backend/import_dataset.py --db data/court.db --source docs/cases_all.parquet

Or download directly from HuggingFace (~5 GB):

uv run python backend/import_dataset.py --db data/court.db

Import takes ~5 minutes and produces a ~5 GB SQLite database with FTS5 indexes.
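The FTS5 indexes the import builds are what power the platform's full-text search. A minimal, self-contained sketch of the technique (the table and column names here are illustrative, not the actual schema produced by import_dataset.py):

```python
import sqlite3

# In-memory database standing in for data/court.db
conn = sqlite3.connect(":memory:")

# FTS5 virtual table: tokenizes the text column for full-text search
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(text)")
conn.executemany("INSERT INTO docs_fts(text) VALUES (?)", [
    ("supreme court ruling on appeal",),
    ("district court decision",),
])

# MATCH runs a full-text query; ORDER BY rank sorts by BM25 relevance
rows = conn.execute(
    "SELECT rowid, text FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank",
    ("supreme",),
).fetchall()
print(rows)  # [(1, 'supreme court ruling on appeal')]
```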

4. Build the frontend

cd frontend && npm run build && cd ..

5. Start the server

uv run uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000 in your browser.

Development Mode

For development with hot reload, run backend and frontend separately:

# Terminal 1 — Backend (auto-reloads on Python changes)
uv run uvicorn backend.app.main:app --reload --port 8000

# Terminal 2 — Frontend dev server (hot reload, proxies API to backend)
cd frontend && npm run dev

Then open http://localhost:5173 (the Vite dev server).
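The dev server's API proxying is typically configured in vite.config.ts; a hypothetical fragment showing the idea (the repository's actual config may differ):

```typescript
// vite.config.ts — forward /api requests from the dev server to the backend
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/api": "http://localhost:8000",
    },
  },
});
```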

Docker

cp .env.example .env
# Edit .env as needed

# Import dataset (one-time)
docker compose run --rm app uv run python backend/import_dataset.py --db /data/court.db

# Start the platform
docker compose up -d

The database is persisted in ./data/ via volume mount.

Configuration

All configuration is via environment variables (or .env file):

| Variable | Default | Description |
| --- | --- | --- |
| `DATABASE_PATH` | `./data/court.db` | Path to SQLite database |
| `ADMIN_USERNAME` | `admin` | Admin login username |
| `ADMIN_PASSWORD` | `changeme` | Admin login password |
| `LLM_MODEL` | `claude-sonnet-4-20250514` | LiteLLM model for NER |
| `LLM_API_KEY` | (none) | API key for the LLM provider |
| `LLM_MAX_TOKENS` | `4096` | Max output tokens for NER |
| `LLM_TEMPERATURE` | `0.0` | LLM temperature |
| `LLM_MAX_INPUT_TOKENS` | `8000` | Max document tokens sent to LLM |
| `NER_RATE_LIMIT_PER_HOUR` | `10` | NER requests per hour per IP |
| `API_RATE_LIMIT_PER_HOUR` | `100` | API requests per hour per key |
| `HOST` | `0.0.0.0` | Server bind host |
| `PORT` | `8000` | Server bind port |
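A hypothetical sketch of how these variables might be read with fallbacks to the defaults above (the real backend/app/config.py has its own structure):

```python
import os

def get_settings(env=None):
    """Read settings from environment variables, falling back to defaults."""
    if env is None:
        env = os.environ
    return {
        "database_path": env.get("DATABASE_PATH", "./data/court.db"),
        "port": int(env.get("PORT", "8000")),
        "ner_rate_limit_per_hour": int(env.get("NER_RATE_LIMIT_PER_HOUR", "10")),
    }

defaults = get_settings({})                 # nothing set: defaults apply
override = get_settings({"PORT": "9000"})   # environment wins over defaults
```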

API

Interactive API documentation is available at http://localhost:8000/docs (Swagger UI).

Key Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/documents` | Search/list documents (FTS5, filterable, paginated) |
| GET | `/api/v1/documents/{hash}` | Get document by SHA-256 hash |
| GET | `/api/v1/documents/{hash}/text` | Get raw document text |
| POST | `/api/v1/documents/{hash}/ner` | Run NER analysis |
| GET | `/api/v1/cases` | Search/list cases |
| GET | `/api/v1/cases/{case_desc}` | Get case with documents and participants |
| GET | `/api/v1/stats/overview` | Corpus-level statistics |
| GET | `/api/v1/stats/by-year` | Document counts by year |
| GET | `/api/v1/stats/judges` | Top judges by case count |
| GET | `/api/v1/stats/lawyers` | Top lawyers by appearance count |

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `q` | string | Full-text search query |
| `year_from` / `year_to` | int | Year range filter |
| `type` | string | Document type (החלטה or פסק-דין) |
| `division` | string | Legal division |
| `judge` / `lawyer` / `party` | string | Name filter (partial match) |
| `technical` | bool | Technical documents only |
| `page` / `per_page` | int | Pagination (default: 1 / 20, max `per_page`: 100) |
| `sort` | string | Sort by: `date`, `year`, `pages`, `relevance` |
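Combining the endpoint and parameters above, a search request URL can be assembled like this (the base URL assumes the local server from Quick Start; the parameter values are illustrative):

```python
from urllib.parse import urlencode

# Search for "appeal" in judgments (פסק-דין) from 2010-2020, sorted by relevance
base = "http://localhost:8000/api/v1/documents"
params = {
    "q": "appeal",
    "year_from": 2010,
    "year_to": 2020,
    "type": "פסק-דין",       # urlencode percent-encodes the Hebrew as UTF-8
    "page": 1,
    "per_page": 20,
    "sort": "relevance",
}
url = f"{base}?{urlencode(params)}"
print(url)
```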

Project Structure

website/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Settings from env vars
│   │   ├── database.py          # SQLite connection
│   │   ├── models.py            # SQLAlchemy ORM models
│   │   ├── schemas.py           # Pydantic request/response models
│   │   ├── auth.py              # Admin session + API key auth
│   │   ├── routers/
│   │   │   ├── cases.py         # /api/v1/cases
│   │   │   ├── documents.py     # /api/v1/documents
│   │   │   ├── stats.py         # /api/v1/stats/*
│   │   │   ├── ner.py           # /api/v1/documents/{hash}/ner
│   │   │   └── admin.py         # /admin/api/*
│   │   └── services/
│   │       ├── search.py        # FTS5 search logic
│   │       └── ner_service.py   # LiteLLM NER integration
│   └── import_dataset.py        # ETL: Parquet → SQLite
├── frontend/
│   └── src/
│       ├── pages/               # Route-level components
│       ├── components/          # Reusable UI components
│       └── api/                 # API client + types
├── data/                        # SQLite database (created by import)
├── pyproject.toml
├── Dockerfile
└── docker-compose.yml

Data Sources

Acknowledgments

Funded by the Israel Innovation Authority (grants #78560, #78561) under the "Kamin" track for applied academic research. Conducted by the Hebrew University of Jerusalem (Prof. Lev Muchnik) and Tel Aviv University (Dr. Inbal Yahav Shenberger).
