Slide Transcriber

A web-based tool for batch-transcribing presentation slides using AI vision models. Upload a folder of slide images, transcribe them with structured output, review and edit transcriptions, and export tracking data as CSV.

Features

Batch transcription — process entire folders of slide images
Structured output — each transcription includes description, text content, key information, and a suggested filename
Verification workflow — review, edit, and verify transcriptions one by one
SSE streaming — real-time progress during transcription and re-processing
Model selection — switch between available AI models at runtime
Custom instructions — per-folder instructions for fine-tuning transcription output
CLI batch processor — headless mode for processing without the web UI
Versioned transcriptions — re-process images with incrementing versions
CSV export — download tracking spreadsheet with processing and review status
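
As an illustration of the structured output described above, a single transcription record might look like the following. This is a sketch only: the field names, the `version` and `verified` fields, and the `isSlideTranscription` helper are assumptions inferred from the feature list, not the project's actual types.

```typescript
// Hypothetical shape of one structured transcription. Field names are
// assumptions based on the feature list, not the project's real types.
interface SlideTranscription {
  description: string;       // what the slide shows
  textContent: string;       // text transcribed from the slide
  keyInformation: string[];  // main points extracted from the slide
  suggestedFilename: string; // descriptive filename proposed by the model
  version: number;           // assumed to increase on each re-processing
  verified: boolean;         // assumed to be set after manual review
}

// Type guard that checks an unknown value (e.g. parsed model output)
// for the fields above before it is stored or displayed.
function isSlideTranscription(value: unknown): value is SlideTranscription {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const keyInfo = v.keyInformation;
  return (
    typeof v.description === "string" &&
    typeof v.textContent === "string" &&
    Array.isArray(keyInfo) &&
    keyInfo.every((item) => typeof item === "string") &&
    typeof v.suggestedFilename === "string" &&
    typeof v.version === "number" &&
    typeof v.verified === "boolean"
  );
}
```

A guard like this is one way to reject malformed model output early instead of letting it reach the review step.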

Quick Start

1. Install dependencies

npm install

2. Configure an AI provider

Set an API key environment variable (choose one):

export ANTHROPIC_API_KEY=sk-ant-...
# or
export OPENAI_API_KEY=sk-...
# or
export GOOGLE_API_KEY=...

For local models or custom providers, see Provider Configuration below.

3. Start the server

npm run dev

This starts both the Express API server and the Vite dev server for the web frontend.

4. Open the app

Navigate to http://localhost:3000 and enter the path to a folder containing slide images (JPG, JPEG, PNG, or WebP).
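
To show what counts as a slide image, here is a minimal sketch of an extension filter for the formats listed above. The helper names (`isSlideImage`, `filterSlideImages`) are hypothetical, and case-insensitive matching and sorted output are assumptions; the actual discovery logic in src/transcription.ts may differ.

```typescript
// Extensions the README lists as supported slide image formats.
const SUPPORTED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

// Hypothetical helper: true when a filename ends with a supported extension.
// Matching is case-insensitive here, which is an assumption.
function isSlideImage(filename: string): boolean {
  const lower = filename.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Filters a directory listing down to slide images, sorted for a stable order.
function filterSlideImages(filenames: string[]): string[] {
  return filenames.filter(isSlideImage).sort();
}
```

For example, `filterSlideImages(["b.PNG", "a.jpg", "notes.txt"])` keeps only the two image files.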

CLI mode

For headless batch processing without the web UI:

npx tsx src/index.ts            # process pending images
npx tsx src/index.ts --all       # re-process all images
npx tsx src/index.ts --errors-only  # retry only errored images

Configure the image directory and other settings in src/config.ts.
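
The three invocations above differ only in which images they select. A rough sketch of that selection follows; the status values, the type names, and the precedence when both flags are passed are all assumptions for illustration, not the behavior of src/index.ts.

```typescript
// Hypothetical per-image status values; the real status file may use others.
type ImageStatus = "pending" | "done" | "error";

interface ImageEntry {
  file: string;
  status: ImageStatus;
}

type Mode = "pending" | "all" | "errors-only";

// Maps CLI arguments to a mode. Giving --all precedence over --errors-only
// is an assumption made for this sketch.
function parseMode(argv: string[]): Mode {
  if (argv.includes("--all")) return "all";
  if (argv.includes("--errors-only")) return "errors-only";
  return "pending";
}

// Picks the entries a run would process for the given mode.
function selectImages(entries: ImageEntry[], mode: Mode): ImageEntry[] {
  switch (mode) {
    case "all":
      return entries;
    case "errors-only":
      return entries.filter((e) => e.status === "error");
    case "pending":
      return entries.filter((e) => e.status === "pending");
  }
}
```

In a real entry point this would be called as `selectImages(entries, parseMode(process.argv.slice(2)))`.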

Provider Configuration

Slide Transcriber uses the pi-ai SDK to connect to AI providers. Authentication can be configured via environment variables or a models.json file.

Environment variables

The simplest way is to set an API key for a built-in provider:

Variable	Provider
`ANTHROPIC_API_KEY`	Anthropic (Claude)
`OPENAI_API_KEY`	OpenAI (GPT)
`GOOGLE_API_KEY`	Google (Gemini)
`MISTRAL_API_KEY`	Mistral
`XAI_API_KEY`	xAI (Grok)

export ANTHROPIC_API_KEY=sk-ant-...
npm run dev

models.json (custom providers & local models)

For local models or providers not covered by built-in environment variables, create ~/.pi/agent/models.json:

{
  "providers": {
    "my-provider": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "my-key",
      "models": [
        { "id": "model-name" }
      ]
    }
  }
}

The api field supports:

API	Description
`openai-completions`	OpenAI Chat Completions (most compatible)
`openai-responses`	OpenAI Responses API
`anthropic-messages`	Anthropic Messages API
`google-generative-ai`	Google Generative AI

The config may need an apiKey value even if the local server ignores it (e.g., for Ollama, use "apiKey": "ollama").
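
Before starting the server, it can help to sanity-check a models.json file against the fields and api values listed above. The following is a hypothetical checker, not part of the pi-ai SDK; the type names and `findConfigProblems` are made up for this sketch, and treating a missing apiKey as a problem is an assumption.

```typescript
// api values listed in the table above.
const SUPPORTED_APIS = [
  "openai-completions",
  "openai-responses",
  "anthropic-messages",
  "google-generative-ai",
];

// Loose types for a parsed models.json; every field is optional so that
// incomplete files can be inspected rather than rejected by the type system.
interface ProviderEntry {
  baseUrl?: string;
  api?: string;
  apiKey?: string;
  models?: { id?: string }[];
}

interface ModelsFile {
  providers?: Record<string, ProviderEntry>;
}

// Returns human-readable problems; an empty array means none were found.
function findConfigProblems(file: ModelsFile): string[] {
  const problems: string[] = [];
  for (const [name, p] of Object.entries(file.providers ?? {})) {
    if (!p.baseUrl) problems.push(`${name}: missing baseUrl`);
    if (!p.api || !SUPPORTED_APIS.includes(p.api)) {
      problems.push(`${name}: unsupported api "${p.api ?? ""}"`);
    }
    // Assumption: a placeholder key is expected even for local servers.
    if (!p.apiKey) problems.push(`${name}: missing apiKey`);
    if (!p.models || p.models.length === 0 || p.models.some((m) => !m.id)) {
      problems.push(`${name}: models must list at least one id`);
    }
  }
  return problems;
}
```

Passing the result of `JSON.parse` on the file contents to `findConfigProblems` gives a quick list of anything that looks off.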

Migrating from LM Studio

If you were previously using LM Studio with @lmstudio/sdk, configure it as an OpenAI-compatible provider:

{
  "providers": {
    "lm-studio": {
      "baseUrl": "http://127.0.0.1:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "models": [
        { "id": "my-loaded-model" }
      ]
    }
  }
}

Replace "my-loaded-model" with the model identifier loaded in LM Studio. The server auto-detects models from LM Studio's /v1/models endpoint, so you can also inspect available models there.

Note: LM Studio must be running with a model loaded for transcription to work. Vision-capable models (e.g., LLaVA, Qwen-VL) are required for image transcription.
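
To look up the identifier to put in "id", you can query the /v1/models endpoint yourself. The sketch below assumes the endpoint returns the usual OpenAI-compatible list shape (`{ "data": [{ "id": ... }] }`) and that a global `fetch` is available (Node 18 or later); the function names are hypothetical.

```typescript
// Only the fields of an OpenAI-compatible /v1/models response used here.
interface ModelListResponse {
  data: { id: string }[];
}

// Pulls the model identifiers out of a parsed response body.
function extractModelIds(body: ModelListResponse): string[] {
  return body.data.map((model) => model.id);
}

// Fetches the model list from a running LM Studio server.
// The default base URL matches the baseUrl in the config above.
async function listLmStudioModels(
  baseUrl: string = "http://127.0.0.1:1234/v1",
): Promise<string[]> {
  const res = await fetch(`${baseUrl}/models`);
  if (!res.ok) {
    throw new Error(`Model listing failed: HTTP ${res.status}`);
  }
  return extractModelIds((await res.json()) as ModelListResponse);
}
```

Calling `listLmStudioModels()` while LM Studio is running should resolve to the identifiers that can be used in models.json.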

auth.json

For OAuth-based providers (ChatGPT Plus, Claude Pro, etc.), authenticate via:

pi /login

This stores credentials in ~/.pi/agent/auth.json.

Project Structure

src/
  ai-provider.ts    # AI provider abstraction (pi-ai SDK)
  config.ts          # Configuration (imageDir, statusFile, outputDir, maxTokens)
  index.ts           # CLI batch processor
  storage.ts          # Status file I/O and versioned transcriptions
  transcription.ts    # Image file discovery
server/
  app.ts             # Express routes and handlers
  index.ts           # Server entry point
  sse.ts             # SSE streaming module
web/
  src/               # React frontend components
tests/               # Test suite
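
For readers unfamiliar with SSE, the streaming module presumably frames each progress update in the standard text/event-stream format. The sketch below shows that framing only; the `progress` event name, the payload fields, and `formatSseEvent` are hypothetical and not taken from server/sse.ts.

```typescript
// Hypothetical progress payload; field names are illustrative only.
interface TranscriptionProgress {
  file: string;
  completed: number;
  total: number;
}

// Frames one Server-Sent Event: an "event:" line, a "data:" line, and a
// blank line that terminates the event. JSON.stringify without indentation
// never emits raw newlines, so the payload fits on a single data line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Example payload for one finished image.
const update: TranscriptionProgress = { file: "slide-01.png", completed: 1, total: 3 };
const frame = formatSseEvent("progress", update);
```

With Express, a frame like this would be written to a response whose Content-Type is text/event-stream.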

Development

npm run dev           # Start both server and web dev servers
npm run typecheck     # Type-check both server and web
npm run test           # Run full test suite

Configuration

Edit src/config.ts to set defaults for the CLI processor:

export const config = {
  imageDir: "/path/to/your/images",
  statusFile: "transcription-status.json",
  outputDir: "transcriptions",
  maxTokens: 1000,
};

In the web UI, the folder path is specified per request; imageDir is used only by the CLI.
