AI Product Manager and Technical Program Manager. I build operational AI systems: pipelines and harnesses that do actual work, plus the operational, maintenance, and measurement layers that make them possible. My projects are architected and built from modular primitives, developed to allow for their own decomposition when desired.
Most of the substantive builds are private client and product work. The repositories pinned here are the parts I can show in full (or have explicit client permission to display). They are smaller than the private work, but they are complete, and they carry the decisions behind them in the open.
A harnessed image-generation agent, built as a worked example of clean seams between a model and its tools. Claude reasons, Cloudflare executes, D1 remembers.
The point of view: most agent demos hand a model an API key and hope. limner is built the other way. The reasoning loop runs on Anthropic's managed agents platform and never holds provider credentials. Tool execution runs in Cloudflare Workers as deterministic TypeScript with explicit bindings. The seam between them is an OAuth-gated MCP surface, the same contract any MCP client uses. Memory is a database, not a context window, so the agent recalls what it recorded last week instead of being re-told.
A public case study of PHINEAS, a CEFR teaching assistant I built and operate for an English-language school: paste in a text and it returns the CEFR level, a word-by-word vocabulary breakdown, and the passage rewritten to a new CEFR level.
The product is live at phineas.app; this repo is the part I can show in full — the architecture write-up and a redacted, runnable TypeScript reference implementation.
The decision worth reading: the CEFR levels are served from a bespoke corpus, not asked of the model, so leveling is regulated before any model call is made. A human-gated dual-loop flywheel feeds approved corrections back both ways — into the per-word database lookups and into the model's training examples. Client identity and data are redacted; the vocabulary is synthetic and the prompts paraphrased.
vinsonconsulting/phineas-case-study
A product manager's job includes building the measurement, not only the feature. Two repos are that work.
claude-skill-foundry is a development and scoring harness for Claude skills. Each carded skill ships a Skill Card: a recorded SkillSpector security scan, eval results, and trigger evals: generated and gated in CI so the catalog cannot quietly drift from what was measured. Coverage is still filling in (trigger precision/recall is the next measurement to land), and the catalog says so rather than printing a number it cannot back up.
califa-cards is the specification and toolkit underneath: the Skill Card schema, the security gate, and the generators the foundry consumes as a dependency. It is the nutrition label for an agent skill, plus the machinery that enforces it.
A one-page Astro site for domains that have an email address and not much else. I keep it here as a sample of shipping discipline: semantic-version releases, Renovate for dependencies, Biome for linting, and CI that runs Lighthouse three times per push and takes the median (so the badge stays honest, and a bit less flakey).
vinsonconsulting/lily-livered · live demo
Portfolio, case studies, and credentials: jimvinson.com. Find me on LinkedIn.
Open to full-time AI Product Manager, Technical Program Manager, or Operations roles. Bay Area or genuine remote.