Pluckr

Schema-first, self-healing HTML data extraction powered by LLMs. Define what you want with a Zod schema, and Pluckr figures out how to extract it — no selectors to write, no scraper to maintain.

Why Pluckr?

Traditional scrapers break when websites change. Pluckr uses an LLM to generate CSS selectors, validates them against your Zod schema, caches what works, and automatically heals when pages change. You write the schema once and never touch selectors again.

Quick Start

npm install @pluckr/core @ai-sdk/google zod

import { Pluckr } from '@pluckr/core'
import { google } from '@ai-sdk/google'
import { z } from 'zod'

const pluckr = new Pluckr({
  model: google('gemini-2.5-pro'),
})

const result = await pluckr.extract({
  html: '<html>...</html>',  // BYO HTML from any source
  schema: z.object({
    title: z.string(),
    price: z.coerce.number().positive(),
    inStock: z.coerce.boolean(),
  }),
  cacheKey: 'product-page',
})

if (result.success) {
  console.log(result.data)
  // { title: "A Light in the Attic", price: 51.77, inStock: true }
}

await pluckr.close()

How It Works

You provide raw HTML and a Zod schema describing the data you want
An LLM generates CSS selectors for each field via an agentic tool loop
Selectors are tested and validated before being committed
Working selectors are cached — subsequent extractions are instant (zero LLM calls)
If selectors break due to page changes, Pluckr self-heals automatically

Packages

Package	Description
`@pluckr/core`	Core extraction library
`@pluckr/sqlite`	SQLite storage — persistent caching to disk
`@pluckr/redis`	Redis storage — shared caching for distributed deployments

Installation

npm install @pluckr/core zod

Pick an AI provider (any Vercel AI SDK provider works):

npm install @ai-sdk/google      # or @ai-sdk/anthropic, @ai-sdk/openai

Add a storage backend for persistent caching (optional):

npm install @pluckr/sqlite       # file-based, single-process
npm install @pluckr/redis        # distributed, multi-process

Key Features

Schema-first — define what you want with Zod, get fully typed results
Self-healing — selectors automatically repair when pages change
BYO HTML — bring your own HTML from any source (fetch, Puppeteer, Playwright, curl)
BYO model — works with any Vercel AI SDK model (Google, Anthropic, OpenAI, etc.)
Pluggable storage — in-memory default, SQLite and Redis included, or implement Storage for any backend
No exceptions — extract() returns a discriminated union (success: true | false)

Storage Backends

In-memory (default) — no setup, cache lives for the process lifetime:

const pluckr = new Pluckr({ model })

SQLite — persists selectors to disk across restarts:

import { SqliteStorage } from '@pluckr/sqlite'

const pluckr = new Pluckr({
  model,
  storage: new SqliteStorage(),  // .pluckr/cache.db
})

Redis — shared cache for distributed deployments:

import { RedisStorage } from '@pluckr/redis'

const pluckr = new Pluckr({
  model,
  storage: new RedisStorage({ url: 'redis://localhost:6379', ttl: 86400 }),
})

Examples

Standalone, runnable examples for different use cases:

Example	Description
`with-fetch`	Simplest setup — native `fetch()`
`with-cheerio`	Pre-process HTML before extraction
`with-playwright`	Extract from JS-rendered SPAs
`with-puppeteer`	Stealth browser with bot-detection avoidance
`with-sqlite`	Persistent caching across runs
`with-redis`	Shared caching with TTL

cd examples/with-fetch
npm install
echo "GOOGLE_GENERATIVE_AI_API_KEY=your-key" > .env
npm start

Documentation

Development

npm install          # install all workspace dependencies
npm run build        # build all packages
npm test             # run all tests (65 tests, fully mocked)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
.github/workflows		.github/workflows
examples		examples
packages		packages
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pluckr

Why Pluckr?

Quick Start

How It Works

Packages

Installation

Key Features

Storage Backends

Examples

Documentation

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pluckr

Why Pluckr?

Quick Start

How It Works

Packages

Installation

Key Features

Storage Backends

Examples

Documentation

Development

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages