Parsely is a tool that uses AI to extract vocabulary from language-learning course notes (PDF and DOCX files) and store it in a searchable database. It offers both a terminal user interface (TUI) and a web interface.
- AI-Powered Extraction: Uses Claude AI to intelligently extract vocabulary and phrases
- Document Support: Parses PDF, DOCX, and plain text (TXT) files
- Deduplication: Automatically skips vocabulary that's already in the database
- Dual Interface: Choose between a terminal UI (CLI) and a web interface
- Export: Export vocabulary to JSON for use in other applications
- Security: Built with security best practices (SQL injection prevention, file validation, etc.)
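Deduplication can be pictured as a set lookup on a normalized form of each term. The sketch below is a minimal, in-memory illustration, assuming terms are compared case-insensitively after trimming and collapsing whitespace; normalizeTerm and dedupe are hypothetical names, not Parsely's code, and the real check runs against the SQLite database rather than a Go map.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTerm is a hypothetical normalization: collapse runs of
// whitespace to single spaces, trim the ends, and lowercase.
func normalizeTerm(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// dedupe returns the terms in extracted that are not already in existing,
// also dropping repeats within extracted itself. Input order is preserved.
func dedupe(existing, extracted []string) []string {
	seen := make(map[string]bool, len(existing))
	for _, t := range existing {
		seen[normalizeTerm(t)] = true
	}
	var fresh []string
	for _, t := range extracted {
		key := normalizeTerm(t)
		if seen[key] {
			continue // already stored, or already seen earlier in this batch
		}
		seen[key] = true
		fresh = append(fresh, t)
	}
	return fresh
}

func main() {
	existing := []string{"hola", "buenos días"}
	extracted := []string{"Hola", "gracias", "buenos  días", "gracias", "adiós"}
	fmt.Println(dedupe(existing, extracted)) // [gracias adiós]
}
```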
- Go 1.23 or later
- Claude API key (get one from Anthropic)
- Optional: Bun or Node.js for the web frontend (if you want to develop it)
git clone https://github.com/parsely/parsely.git
cd parsely
go mod download

# Build CLI version
go build -o parsely-cli ./cmd/cli

# Build web version
go build -o parsely-web ./cmd/web

Parsely uses environment variables for configuration:
| Variable | Required | Default | Description |
|---|---|---|---|
| ANTHROPIC_API_KEY | Yes | (none) | Your Anthropic API key |
| DATABASE_PATH | No | /data/parsely.db | Path to the SQLite database file |
| LANGUAGE | No | auto-detect | Target language for extraction |
| PORT | No | 8080 | Port for the web server |
| API_TOKEN | No | (none) | Bearer token to protect API endpoints. If unset, auth is disabled (fine for local use). Set this in production. |
The default DATABASE_PATH is /data/parsely.db, which is intended for the Railway deployment (see below). When running locally, override it to a path that exists on your machine:
DATABASE_PATH=parsely.db ANTHROPIC_API_KEY=sk-ant-... go run ./cmd/web

Or export the variables in your shell before running:

export ANTHROPIC_API_KEY="sk-ant-..."
export DATABASE_PATH="parsely.db"
./parsely-web

The project includes a Dockerfile configured for Railway.
- Push the repository to GitHub and connect it to a new Railway project.
- Add a Volume in Railway and set the mount path to /data.
- Set the following environment variables in the Railway service settings:
  - ANTHROPIC_API_KEY: your Anthropic API key (required)
  - API_TOKEN: a secret token to protect your API (recommended)
  - LANGUAGE: target language, e.g. Spanish (optional)
  - DATABASE_PATH: can be left unset; defaults to /data/parsely.db
- Railway automatically injects the PORT variable, so no action is needed.
The SQLite database will be persisted on the mounted volume at /data/parsely.db across deployments and restarts.
Run the interactive terminal UI:

./parsely-cli

Features:
- Parse new documents (PDF/DOCX)
- View all vocabulary
- Export to JSON
- Navigate with the arrow keys or Vim-style keys (j/k)
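Arrow and j/k navigation boils down to mapping key names onto cursor movement over the vocabulary list. A minimal, framework-free sketch, assuming key events arrive as strings such as "up", "down", "j" and "k"; moveCursor is a hypothetical helper, not taken from Parsely's source.

```go
package main

import "fmt"

// moveCursor returns the new cursor position after a key press in a list of
// n items. Arrow keys and Vim-style j/k move by one row; the result is
// clamped to [0, n-1]. Unknown keys leave the cursor where it is.
func moveCursor(cursor, n int, key string) int {
	if n == 0 {
		return 0
	}
	switch key {
	case "up", "k":
		cursor--
	case "down", "j":
		cursor++
	}
	if cursor < 0 {
		cursor = 0
	}
	if cursor > n-1 {
		cursor = n - 1
	}
	return cursor
}

func main() {
	cursor := 0
	for _, key := range []string{"j", "down", "down", "k", "up", "up"} {
		cursor = moveCursor(cursor, 3, key)
		fmt.Println(key, "->", cursor)
	}
}
```

Clamping rather than wrapping keeps a press of j on the last row from jumping back to the top; wrapping would be an equally valid choice.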
Start the web server:

./parsely-web

The API will be available at http://localhost:8080 with the following endpoints:
- GET /api/vocabulary: List all vocabulary
- GET /api/vocabulary/{id}: Get a specific vocabulary item
- DELETE /api/vocabulary/{id}: Delete a vocabulary item
- POST /api/upload: Upload and process a document
- POST /api/export: Export vocabulary to JSON
- GET /api/stats: Get vocabulary statistics
- GET /health: Health check
When API_TOKEN is set, all /api/* endpoints require a Bearer token header:
curl -H "Authorization: Bearer your-token" http://localhost:8080/api/vocabulary

The /health endpoint is always public. When API_TOKEN is not set (e.g. local development), no header is required.
To upload and process a document:

curl -X POST \
-H "Authorization: Bearer your-token" \
-F "file=@/path/to/document.pdf" \
http://localhost:8080/api/upload

Run all tests with coverage:
go test ./... -cover

Run tests for a specific package:
go test ./internal/db -v
go test ./internal/parser -v
go test ./internal/ai -v
go test ./internal/core -v
go test ./internal/api -v

Project structure:

parsely/
├── cmd/
│ ├── cli/ # CLI application entry point
│ └── web/ # Web server entry point
├── internal/
│ ├── ai/ # Claude AI integration
│ ├── parser/ # PDF/DOCX parsers
│ ├── db/ # SQLite database layer
│ ├── core/ # Core business logic
│ └── api/ # HTTP API handlers
├── testdata/ # Test fixtures
├── go.mod
├── go.sum
├── README.md
└── CLAUDE.md # Development guidelines
- API Authentication: Bearer token auth protects all /api/* endpoints when API_TOKEN is set
- SQL Injection Prevention: All database queries use parameterized statements
- Path Traversal Protection: File paths are validated to prevent directory traversal
- File Size Limits: Maximum 10MB per document
- File Type Validation: Only PDF and DOCX files accepted
- Input Sanitization: All user input is validated and sanitized
- Secure Permissions: Database and temp files created with restrictive permissions
- Fork the repository
- Create a feature branch
- Write tests first (TDD approach)
- Implement your feature
- Ensure all tests pass
- Submit a pull request
See CLAUDE.md for detailed development guidelines.
Make sure you've exported your API key:

export ANTHROPIC_API_KEY="your-key"

Ensure the database file has proper permissions:

chmod 600 parsely.db

Some PDFs may not contain extractable text. Try:
- Ensuring the PDF has selectable text (not scanned images)
- Using a different PDF viewer to verify text content
- Converting scanned PDFs to text-based PDFs using OCR
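One way to spot a scanned PDF before it reaches the AI step is to count the letters in the extracted text. The sketch below is a hypothetical heuristic: looksScanned and the 20-letter threshold are assumptions for illustration, not Parsely behavior.

```go
package main

import (
	"fmt"
	"unicode"
)

// minLetters is an arbitrary threshold chosen for this sketch.
const minLetters = 20

// looksScanned reports whether the extracted text has fewer than minLetters
// letters, which usually means the PDF pages are images rather than text.
// unicode.IsLetter counts accented and non-Latin letters too.
func looksScanned(extracted string) bool {
	letters := 0
	for _, r := range extracted {
		if unicode.IsLetter(r) {
			letters++
			if letters >= minLetters {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(looksScanned("   \n\f  \n"))
	fmt.Println(looksScanned("Lección 1: los saludos. Hola, buenos días, ¿qué tal?"))
}
```

A check like this lets the tool report a clear error instead of sending a near-empty prompt to the model.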
Files over 10MB are rejected. Compress or split your documents.
MIT License - see LICENSE file for details
- Anthropic Claude for AI vocabulary extraction
- Charm Bracelet for the beautiful TUI framework
- SQLite for the embedded database