Parsely - Language Learning Vocabulary Extractor

Parsely is a tool that uses Claude to extract vocabulary from language-learning course notes (PDF or DOCX files) and store it in a searchable SQLite database. It offers both an interactive terminal interface (TUI) and a web interface.

Features

AI-Powered Extraction: Uses Claude AI to intelligently extract vocabulary and phrases
Document Support: Parses PDF, DOCX, and plain text (TXT) files
Deduplication: Automatically skips vocabulary that's already in the database
Dual Interface: Choose between the interactive terminal UI (the CLI build) and the web interface
Export: Export vocabulary to JSON for use in other applications
Security: Parameterized SQL queries, upload validation, and optional bearer-token authentication (see Security Features)

Requirements

Go 1.23 or later
Anthropic API key for Claude (get one from Anthropic)
Optional: Bun or Node.js, needed only if you want to develop the web frontend

Installation

Clone the repository

git clone https://github.com/parsely/parsely.git
cd parsely

Install dependencies

go mod download

Build the binaries

# Build CLI version
go build -o parsely-cli ./cmd/cli

# Build web version
go build -o parsely-web ./cmd/web

Configuration

Parsely uses environment variables for configuration:

Variable	Required	Default	Description
`ANTHROPIC_API_KEY`	Yes	—	Your Anthropic API key
`DATABASE_PATH`	No	`/data/parsely.db`	Path to the SQLite database file
`LANGUAGE`	No	`auto-detect`	Target language for extraction
`PORT`	No	`8080`	Port for the web server
`API_TOKEN`	No	—	Bearer token to protect API endpoints. If unset, auth is disabled (fine for local use). Set this in production.

Deployment

Running locally

The default DATABASE_PATH is /data/parsely.db, which is intended for the Railway deployment (see below). When running locally, override it with a path in a directory that exists and is writable on your machine:

DATABASE_PATH=parsely.db ANTHROPIC_API_KEY=sk-ant-... go run ./cmd/web

Or export the variables in your shell before running:

export ANTHROPIC_API_KEY="sk-ant-..."
export DATABASE_PATH="parsely.db"
./parsely-web

Deploying to Railway

The project includes a Dockerfile configured for Railway.

Push the repository to GitHub and connect it to a new Railway project.
Add a Volume in Railway and set the mount path to /data.
Set the following environment variables in the Railway service settings:
- ANTHROPIC_API_KEY: your Anthropic API key (required)
- API_TOKEN: a secret token to protect your API (recommended)
- LANGUAGE: target language, e.g. Spanish (optional)
- DATABASE_PATH: can be left unset; defaults to /data/parsely.db
Railway injects the PORT variable automatically, so no action is needed.

The SQLite database will be persisted on the mounted volume at /data/parsely.db across deployments and restarts.

Usage

CLI Version

Run the interactive terminal UI:

./parsely-cli

Features:

Parse new documents (PDF/DOCX)
View all vocabulary
Export to JSON
Navigate with arrow keys or vim keys (j/k)

Web Version

Start the web server:

./parsely-web

The API is available at http://localhost:8080 by default. Set PORT to use a different port.

API Endpoints

GET    /api/vocabulary       - List all vocabulary
GET    /api/vocabulary/{id}  - Get specific vocabulary item
DELETE /api/vocabulary/{id}  - Delete vocabulary item
POST   /api/upload           - Upload and process document
POST   /api/export           - Export vocabulary to JSON
GET    /api/stats            - Get vocabulary statistics
GET    /health               - Health check

Authentication

When API_TOKEN is set, every /api/* endpoint requires an Authorization header carrying that token as a Bearer credential:

curl -H "Authorization: Bearer your-token" http://localhost:8080/api/vocabulary

The /health endpoint is always public. When API_TOKEN is not set (e.g. local development), no header is required.

Upload Document Example

curl -X POST \
  -H "Authorization: Bearer your-token" \
  -F "file=@/path/to/document.pdf" \
  http://localhost:8080/api/upload

Running Tests

Run all tests with coverage:

go test ./... -cover

Run tests for a specific package:

go test ./internal/db -v
go test ./internal/parser -v
go test ./internal/ai -v
go test ./internal/core -v
go test ./internal/api -v

Project Structure

parsely/
├── cmd/
│   ├── cli/          # CLI application entry point
│   └── web/          # Web server entry point
├── internal/
│   ├── ai/           # Claude AI integration
│   ├── parser/       # PDF/DOCX parsers
│   ├── db/           # SQLite database layer
│   ├── core/         # Core business logic
│   └── api/          # HTTP API handlers
├── testdata/         # Test fixtures
├── go.mod
├── go.sum
├── README.md
└── CLAUDE.md         # Development guidelines

Security Features

API Authentication: Bearer token auth protects all /api/* endpoints when API_TOKEN is set (/health stays public)
SQL Injection Prevention: All database queries use parameterized statements
Path Traversal Protection: File paths are validated to prevent directory traversal
File Size Limits: Maximum 10MB per document
File Type Validation: Only PDF and DOCX files accepted
Input Sanitization: All user input is validated and sanitized
Secure Permissions: Database and temp files created with restrictive permissions
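
To make the upload checks in this list concrete, the following Go sketch validates a file name and size before anything is written to disk. It is an illustration under stated assumptions rather than Parsely's implementation: validateUpload, allowedExt, and maxUploadBytes are hypothetical names, and the 10MB limit is interpreted as 10 MiB.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// maxUploadBytes encodes the 10MB limit from the list above,
// interpreted here as 10 MiB (an assumption).
const maxUploadBytes = 10 << 20

// allowedExt holds the accepted file types, compared case-insensitively.
var allowedExt = map[string]bool{".pdf": true, ".docx": true}

// validateUpload returns a safe base name for the upload, or an error if the
// name, type, or size is not acceptable. Hypothetical helper.
func validateUpload(name string, size int64) (string, error) {
	// Treat backslashes as separators too, then keep only the last element
	// so a name like "../../x.pdf" cannot escape the upload directory.
	base := path.Base(strings.ReplaceAll(name, `\`, "/"))
	if base == "." || base == ".." || base == "/" {
		return "", errors.New("invalid file name")
	}
	if !allowedExt[strings.ToLower(path.Ext(base))] {
		return "", fmt.Errorf("unsupported file type %q", path.Ext(base))
	}
	if size <= 0 || size > maxUploadBytes {
		return "", fmt.Errorf("file size %d outside 1..%d bytes", size, maxUploadBytes)
	}
	return base, nil
}

func main() {
	cases := []struct {
		name string
		size int64
	}{
		{"../../notes/lesson1.PDF", 2048},
		{"notes.exe", 2048},
		{"big.docx", maxUploadBytes + 1},
	}
	for _, tc := range cases {
		base, err := validateUpload(tc.name, tc.size)
		fmt.Printf("%q -> %q, err=%v\n", tc.name, base, err)
	}
}
```

An extension check only filters obvious mistakes. A stricter server could also inspect the first bytes of the file, since both PDF and DOCX have recognizable signatures.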

Contributing

Fork the repository
Create a feature branch
Write tests first (TDD approach)
Implement your feature
Ensure all tests pass
Submit a pull request

See CLAUDE.md for detailed development guidelines.

Troubleshooting

"ANTHROPIC_API_KEY not set"

Make sure you've exported your API key:

export ANTHROPIC_API_KEY="your-key"

Database Permission Errors

Ensure the database file is readable and writable by the user running Parsely (and by no one else):

chmod 600 parsely.db

PDF Parsing Errors

Some PDFs may not contain extractable text. Try:

Ensuring the PDF has selectable text (not scanned images)
Using a different PDF viewer to verify text content
Converting scanned PDFs to text-based PDFs using OCR

Large File Errors

Files over 10MB are rejected. Compress or split your documents.

License

MIT License - see LICENSE file for details

Acknowledgments

Anthropic Claude for AI vocabulary extraction
Charm Bracelet for the beautiful TUI framework
SQLite for the embedded database
