LevMuchnik/SupremeCourtCasesAnalysisWebsite
Israeli Supreme Court Dataset — Web Platform

A self-hosted web platform for exploring ~751K Israeli Supreme Court documents. Provides full-text search, corpus statistics with visualizations, on-demand LLM-powered NER analysis, a REST API, and an admin backend.

Built as a deliverable of an Israel Innovation Authority research project (grants #78560, #78561), conducted by the Hebrew University of Jerusalem and Tel Aviv University.

Features

  • Full-text search across 733K+ documents using SQLite FTS5
  • Faceted filtering by year, document type, legal division, judge, lawyer, party
  • Document viewer with proper RTL Hebrew text rendering
  • Case view with document timeline and participants (judges, lawyers, parties)
  • Statistics dashboard — interactive charts (documents by year, type distribution, top judges/lawyers, technical ratio)
  • NER analysis — on-demand LLM-powered Named Entity Recognition with inline highlighting
  • REST API with API key authentication and rate limiting
  • Admin backend — dashboard, API key management, data import, LLM configuration

Tech Stack

  • Backend: Python 3.11+, FastAPI, SQLAlchemy, SQLite + FTS5
  • Frontend: React, TypeScript, Tailwind CSS, Recharts
  • Package management: uv (Python), npm (frontend)
  • LLM integration: LiteLLM (supports Claude, GPT-4, Gemini, Ollama, etc.)
  • Deployment: Docker Compose

Prerequisites

  • uv (v0.8+)
  • Node.js (v20+)
  • ~6 GB disk space for the database

Quick Start (Local)

1. Install dependencies

# Python dependencies
uv sync

# Frontend dependencies
cd frontend && npm install && cd ..

2. Configure environment

cp .env.example .env

Edit .env as needed. The defaults work for local development, except the LLM features, which require an API key.
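As an illustrative sketch (the variable names come from the Configuration section below; the values here are placeholders, not the actual contents of .env.example):

```shell
# Admin login — change these before exposing the server
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme

# Optional: enable LLM-powered NER by supplying a provider API key
LLM_MODEL=claude-sonnet-4-20250514
LLM_API_KEY=your-provider-api-key
```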

3. Import the dataset

The dataset is a 1.5 GB Parquet file. If you have it locally at docs/cases_all.parquet:

uv run python backend/import_dataset.py --db data/court.db --source docs/cases_all.parquet

Or download directly from HuggingFace (~5 GB):

uv run python backend/import_dataset.py --db data/court.db

Import takes ~5 minutes and produces a ~5 GB SQLite database with FTS5 indexes.
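The FTS5 indexes the import builds are what power the platform's full-text search. A minimal, self-contained sketch of the technique (the table and column names here are illustrative, not the actual schema produced by import_dataset.py):

```python
import sqlite3

# In-memory database standing in for data/court.db
conn = sqlite3.connect(":memory:")

# FTS5 virtual table: tokenizes the text column for full-text search
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(text)")
conn.executemany("INSERT INTO docs_fts(text) VALUES (?)", [
    ("supreme court ruling on appeal",),
    ("district court decision",),
])

# MATCH runs a full-text query; ORDER BY rank sorts by BM25 relevance
rows = conn.execute(
    "SELECT rowid, text FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank",
    ("supreme",),
).fetchall()
print(rows)  # [(1, 'supreme court ruling on appeal')]
```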

4. Build the frontend

cd frontend && npm run build && cd ..

5. Start the server

uv run uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000 in your browser.

Development Mode

For development with hot reload, run backend and frontend separately:

# Terminal 1 — Backend (auto-reloads on Python changes)
uv run uvicorn backend.app.main:app --reload --port 8000

# Terminal 2 — Frontend dev server (hot reload, proxies API to backend)
cd frontend && npm run dev

Then open http://localhost:5173 (the Vite dev server).
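The dev server's API proxying is typically configured in vite.config.ts; a hypothetical fragment showing the idea (the repository's actual config may differ):

```typescript
// vite.config.ts — forward /api requests from the dev server to the backend
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    proxy: {
      "/api": "http://localhost:8000",
    },
  },
});
```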

Docker

cp .env.example .env
# Edit .env as needed

# Import dataset (one-time)
docker compose run --rm app uv run python backend/import_dataset.py --db /data/court.db

# Start the platform
docker compose up -d

The database is persisted in ./data/ via volume mount.

Configuration

All configuration is via environment variables (or .env file):

| Variable | Default | Description |
| --- | --- | --- |
| `DATABASE_PATH` | `./data/court.db` | Path to SQLite database |
| `ADMIN_USERNAME` | `admin` | Admin login username |
| `ADMIN_PASSWORD` | `changeme` | Admin login password |
| `LLM_MODEL` | `claude-sonnet-4-20250514` | LiteLLM model for NER |
| `LLM_API_KEY` | (none) | API key for the LLM provider |
| `LLM_MAX_TOKENS` | `4096` | Max output tokens for NER |
| `LLM_TEMPERATURE` | `0.0` | LLM temperature |
| `LLM_MAX_INPUT_TOKENS` | `8000` | Max document tokens sent to LLM |
| `NER_RATE_LIMIT_PER_HOUR` | `10` | NER requests per hour per IP |
| `API_RATE_LIMIT_PER_HOUR` | `100` | API requests per hour per key |
| `HOST` | `0.0.0.0` | Server bind host |
| `PORT` | `8000` | Server bind port |
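A hypothetical sketch of how these variables might be read with fallbacks to the defaults above (the real backend/app/config.py has its own structure):

```python
import os

def get_settings(env=None):
    """Read settings from environment variables, falling back to defaults."""
    if env is None:
        env = os.environ
    return {
        "database_path": env.get("DATABASE_PATH", "./data/court.db"),
        "port": int(env.get("PORT", "8000")),
        "ner_rate_limit_per_hour": int(env.get("NER_RATE_LIMIT_PER_HOUR", "10")),
    }

defaults = get_settings({})                 # nothing set: defaults apply
override = get_settings({"PORT": "9000"})   # environment wins over defaults
```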

API

Interactive API documentation is available at http://localhost:8000/docs (Swagger UI).

Key Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/documents` | Search/list documents (FTS5, filterable, paginated) |
| GET | `/api/v1/documents/{hash}` | Get document by SHA-256 hash |
| GET | `/api/v1/documents/{hash}/text` | Get raw document text |
| POST | `/api/v1/documents/{hash}/ner` | Run NER analysis |
| GET | `/api/v1/cases` | Search/list cases |
| GET | `/api/v1/cases/{case_desc}` | Get case with documents and participants |
| GET | `/api/v1/stats/overview` | Corpus-level statistics |
| GET | `/api/v1/stats/by-year` | Document counts by year |
| GET | `/api/v1/stats/judges` | Top judges by case count |
| GET | `/api/v1/stats/lawyers` | Top lawyers by appearance count |

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `q` | string | Full-text search query |
| `year_from` / `year_to` | int | Year range filter |
| `type` | string | Document type (החלטה or פסק-דין) |
| `division` | string | Legal division |
| `judge` / `lawyer` / `party` | string | Name filter (partial match) |
| `technical` | bool | Technical documents only |
| `page` / `per_page` | int | Pagination (default: 1 / 20, max `per_page`: 100) |
| `sort` | string | Sort by: `date`, `year`, `pages`, `relevance` |
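Combining the endpoint and parameters above, a search request URL can be assembled like this (the base URL assumes the local server from Quick Start; the parameter values are illustrative):

```python
from urllib.parse import urlencode

# Search for "appeal" in judgments (פסק-דין) from 2010-2020, sorted by relevance
base = "http://localhost:8000/api/v1/documents"
params = {
    "q": "appeal",
    "year_from": 2010,
    "year_to": 2020,
    "type": "פסק-דין",       # urlencode percent-encodes the Hebrew as UTF-8
    "page": 1,
    "per_page": 20,
    "sort": "relevance",
}
url = f"{base}?{urlencode(params)}"
print(url)
```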

Project Structure

website/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Settings from env vars
│   │   ├── database.py          # SQLite connection
│   │   ├── models.py            # SQLAlchemy ORM models
│   │   ├── schemas.py           # Pydantic request/response models
│   │   ├── auth.py              # Admin session + API key auth
│   │   ├── routers/
│   │   │   ├── cases.py         # /api/v1/cases
│   │   │   ├── documents.py     # /api/v1/documents
│   │   │   ├── stats.py         # /api/v1/stats/*
│   │   │   ├── ner.py           # /api/v1/documents/{hash}/ner
│   │   │   └── admin.py         # /admin/api/*
│   │   └── services/
│   │       ├── search.py        # FTS5 search logic
│   │       └── ner_service.py   # LiteLLM NER integration
│   └── import_dataset.py        # ETL: Parquet → SQLite
├── frontend/
│   └── src/
│       ├── pages/               # Route-level components
│       ├── components/          # Reusable UI components
│       └── api/                 # API client + types
├── data/                        # SQLite database (created by import)
├── pyproject.toml
├── Dockerfile
└── docker-compose.yml

Data Sources

Acknowledgments

Funded by the Israel Innovation Authority (grants #78560, #78561) under the "Kamin" track for applied academic research. Conducted by the Hebrew University of Jerusalem (Prof. Lev Muchnik) and Tel Aviv University (Dr. Inbal Yahav Shenberger).
