TaskSignal - AI Problem Discovery Engine

From Reddit/forum complaints → evidence-backed project ideas → build-ready Codex prompts.

TaskSignal is an AI-assisted engine that mines public developer and community discussions, detects concrete repetitive tasks people complain about, clusters similar pain signals, scores software opportunities, and generates Codex-ready MVP prompts.

Project Status

TaskSignal is a portfolio-ready MVP built by Yurii Bakurov. It is designed for one local operator on their own machine: fixture data works out of the box, a local workspace profile stores that user's research defaults, and repeatable API-backed workflows can be enabled for supported public sources when credentials are provided.

Current public posture: TaskSignal is an early public application repository, not a widely adopted package. Its strongest evidence today is reproducibility, release hygiene, CI, security/privacy documentation, contributor issues, and a browser-verified demo flow. See the demo evidence snapshot and Codex for OSS evidence for the current review package.

Useful starting points:

Why This Exists

Most idea lists are generic. TaskSignal is a task-replacement radar: it looks for specific repeated workflows people hate doing, such as exporting Stripe data into a spreadsheet every Friday and turning it into a client report.

Who Should Use This

TaskSignal is for maintainers, builders, indie hackers, developer-tool teams, and researchers who want a local-first way to review public pain signals before deciding what to build. It is not for scraping private communities, profiling individuals, spam, outreach automation, or replacing human product judgment.

What It Does

Loads demo fixture data with no API keys.
Stores one local workspace profile with owner/focus/default research settings.
Saves repeatable research projects with source, query, limit, labels, cadence, last run, next run, and run count.
Reports integration readiness without exposing secret values.
Records scan outcomes with found/saved items, detected signals, generated opportunities, and guidance when live data produces no ranked opportunity.
Normalizes Reddit, Hacker News, GitHub Issues, Stack Exchange, and fixture-style records.
Stores author hashes instead of raw usernames by default.
Detects complaints, manual workflows, tool requests, workarounds, buying intent, and confusion.
Generates local embeddings with sentence-transformers/all-MiniLM-L6-v2 when available.
Falls back to deterministic local vectors when the model is unavailable.
Clusters signals with a local thematic fallback by default, with optional DBSCAN when TASKSIGNAL_USE_SKLEARN_CLUSTERING=1.
Scores opportunities using frequency, recency, pain, concreteness, buying intent, feasibility, and competition penalty.
Generates opportunity cards, full Codex-ready build prompts, and richer Codex task packs.
Optionally enhances generated prompts through OpenAI API or local Ollama when explicitly configured.
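As an illustration of storing author hashes instead of raw usernames, here is a minimal sketch. It assumes a keyed HMAC-SHA256 with a locally held salt and a truncated hex digest; the helper name `hash_author`, the salt handling, and the 16-character length are hypothetical and not taken from the TaskSignal code.

```python
import hashlib
import hmac


def hash_author(username: str, salt: bytes) -> str:
    """Return a stable pseudonymous ID for a public username.

    Hypothetical helper: salt handling and digest length are assumptions.
    """
    normalized = username.strip().lower().encode("utf-8")
    # Keyed hash: without the local salt, a plain dictionary of usernames
    # cannot be used to look the hash up.
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()[:16]


record = {"author": "Example_User", "text": "I export Stripe data every Friday"}
stored = {
    "author_hash": hash_author(record["author"], b"local-secret-salt"),
    "text": record["text"],
}
```

The same username always maps to the same hash for a given salt, so repeated complaints from one author can still be counted without keeping the name itself.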

Architecture

flowchart TD
  A[Public sources and fixtures] --> B[Ingestion connectors]
  B --> C[Normalizer and deduplicator]
  C --> D[(PostgreSQL + pgvector)]
  D --> E[Pain and task detector]
  E --> F[Embedding service]
  F --> G[Thematic fallback clustering / optional DBSCAN]
  G --> H[Opportunity scoring]
  H --> I[Prompt generator]
  I --> J[FastAPI API]
  J --> K[Next.js dashboard]

Tech Stack

Frontend: Next.js, TypeScript, Tailwind CSS, TanStack Query, Recharts, React Markdown, Zod-ready types.

Backend: FastAPI, Pydantic v2, SQLAlchemy 2, Alembic, PostgreSQL, pgvector, pytest, ruff, scikit-learn.

ML/NLP: sentence-transformers with local-only load when the model cache exists, deterministic fallback vectors, optional DBSCAN clustering, rule-based signal detector.

Infra: Docker Compose, Makefile, GitHub Actions CI, scheduled ingestion template.

Quickstart

cp .env.example .env
make doctor
make up

Open the frontend at http://localhost:3000. For the fastest first check, go to Dashboard and click Process demo data. To set up a repeatable workflow, go to Projects, save a research workflow, then run it. To use live public data, choose a source, query, and limit in Live source, then click Run scan.

If setup fails or a fresh checkout looks incomplete, run:

make doctor

make doctor checks the required files, local .env, Python, Node 20+, npm, repo-local Python dev tools, fixture files, and whether generated files are accidentally tracked. Docker is only required for the Compose quickstart.

API health check:

curl http://localhost:8000/health

Local Development

Run the API and frontend separately:

cd apps/api
../../.venv/bin/uvicorn app.main:app --reload

cd apps/web
npm run dev

Run checks before publishing changes:

make test
make lint
make verify

The Makefile prefers repo-local Python tools in .venv/bin. On Apple Silicon macOS it also prepends Homebrew Node 20 from /opt/homebrew/opt/node@20/bin when available, matching the runtime required by the Next.js web app.

Run the release-readiness gate before tagging a release:

make release-check

Run the first-run smoke check to verify the credential-free fixture path against a temporary database, including dashboard route wiring and task-pack export:

make smoke

To also boot the Next.js dev server and request /dashboard, run:

apps/api/.venv/bin/python -u scripts/first_run_smoke.py --with-web-server

Use the local CLI for headless operation:

scripts/tasksignal_cli.py readiness
scripts/tasksignal_cli.py configure-workspace --owner "Local Builder" --goal "Find developer-tool opportunities" --source hackernews --query ask --cadence daily
scripts/tasksignal_cli.py create-project --name "Track CI/CD pain" --source hackernews --query ask --cadence daily
scripts/tasksignal_cli.py run-due
scripts/tasksignal_cli.py task-pack <opportunity-id> --output task-pack.md

TaskSignal does not require multi-user accounts for this local mode. The local workspace profile is a singleton in the app database and is meant for the person running the app on that machine.

Distribution

TaskSignal is currently an application repository, not a published Python or npm library. Use the source checkout or Docker Compose workflow above. Reusable packages may be split out later if a stable library boundary emerges.

Reviewer Quick Check

For a quick public review, inspect:

Repository Layout

apps/api      FastAPI backend, ML pipeline, database models, tests
apps/web      Next.js dashboard, opportunity views, prompt export UI
data          Demo fixtures for local-first processing
docs          Architecture, API, deployment, ethics, and model notes
notebooks     Classifier training and evaluation workbooks

Fixture Demo Mode

Fixture mode is the default. It loads records from data/fixtures, processes them end to end, and should generate at least five opportunity cards:

AI-generated code audit tool
Early-stage SaaS lead/community signal radar
Simple onboarding drop-off analyzer
GitHub Actions workflow debugging assistant
Spreadsheet-to-report automation helper

API Connector Setup

Live scans use official APIs and keep the same local-first scoring/generation pipeline as fixture mode. The unauthenticated POST /api/scans endpoint is restricted to public API-safe sources (fixture and hackernews) so network callers cannot spend server-side credentials or retrieve data visible to server-side tokens.

Trusted operators can still configure the internal connector pipeline with source credentials when running controlled jobs outside the public endpoint:

REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT
GITHUB_TOKEN
STACK_EXCHANGE_KEY

Hacker News works without credentials through the public Firebase API. GitHub and Stack Exchange can run without keys at lower rate limits. Reddit requires OAuth credentials. No paid LLM key is required. LLM_PROVIDER=none is the default.

Connector credentials belong in environment variables, not source registry records. Source registry write endpoints require OPERATOR_SCAN_TOKEN and reject secret-like config_json keys. Read endpoints return redacted config so local rows cannot expose token values.
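The rejection of secret-like config_json keys and the redaction on read could look roughly like the sketch below. The key pattern, the `***` placeholder, and both function names are assumptions for illustration; the actual rules in the API may be stricter or different.

```python
import re
from typing import Any

# Hypothetical pattern for "secret-like" key names.
SECRET_KEY_PATTERN = re.compile(
    r"(token|secret|password|api[_-]?key|credential)", re.IGNORECASE
)


def find_secret_like_keys(config: dict[str, Any]) -> list[str]:
    """Return config keys that look like they hold credentials."""
    return sorted(key for key in config if SECRET_KEY_PATTERN.search(key))


def redact_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy that is safe to send back from a read endpoint."""
    return {
        key: ("***" if SECRET_KEY_PATTERN.search(key) else value)
        for key, value in config.items()
    }


candidate = {"subreddit": "devops", "client_secret": "abc", "API_KEY": "xyz"}
rejected_keys = find_secret_like_keys(candidate)  # a write would be refused if non-empty
safe_view = redact_config(candidate)
```

Checking on write and redacting on read are deliberately independent here, so a row that predates the write check still cannot leak a value through the read path.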

PUBLIC_SCAN_SOURCES can narrow the public endpoint further, for example to hackernews only. Credentialed sources such as GitHub, Reddit, and Stack Exchange stay reserved for trusted internal scan jobs.
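A minimal sketch of how PUBLIC_SCAN_SOURCES might narrow the public endpoint follows. It assumes the variable holds a comma-separated list, which this README does not confirm, and the function name is hypothetical. The important property is that the setting can only shrink the API-safe set, never add a credentialed source to it.

```python
from typing import Optional

# Sources this README describes as safe for the unauthenticated endpoint.
API_SAFE_SOURCES = frozenset({"fixture", "hackernews"})


def allowed_public_sources(env_value: Optional[str]) -> frozenset[str]:
    """Intersect the configured list with the API-safe set.

    Assumption: the value is a comma-separated list of source names.
    """
    if not env_value:
        return API_SAFE_SOURCES
    requested = {name.strip().lower() for name in env_value.split(",") if name.strip()}
    # Intersection means listing "github" here never exposes it publicly.
    return API_SAFE_SOURCES & requested


only_hn = allowed_public_sources("hackernews")
still_only_hn = allowed_public_sources("hackernews, github")
```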

Browser-triggered runs of credentialed sources are available through saved research projects only when OPERATOR_SCAN_TOKEN is configured on the API and the same token is entered locally in the Projects or Integrations page. This keeps hosted deployments from silently spending server-side credentials while still letting trusted local operators connect APIs.

Saved projects support manual, hourly, daily, weekly, and custom-hour cadences. TaskSignal does not hide a scheduler inside the web process. Run due projects from the Projects page, scripts/tasksignal_cli.py run-due, cron, GitHub Actions, or another explicit worker.
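Because scheduling is left to an explicit worker, the only logic the worker needs is a due check per project. The sketch below is one way to express the cadences listed above; the function names, the `custom_hours` parameter, and the rule that a never-run project is due immediately are assumptions, not the behavior of scripts/tasksignal_cli.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CADENCE_HOURS = {"hourly": 1, "daily": 24, "weekly": 168}


def next_run(last_run: datetime, cadence: str,
             custom_hours: Optional[int] = None) -> Optional[datetime]:
    """Return when the project should run next, or None for manual projects."""
    if cadence == "manual":
        return None
    if cadence == "custom":
        if custom_hours is None or custom_hours < 1:
            raise ValueError("custom cadence needs custom_hours >= 1")
        hours = custom_hours
    else:
        hours = CADENCE_HOURS[cadence]  # KeyError for unknown cadences
    return last_run + timedelta(hours=hours)


def is_due(last_run: Optional[datetime], cadence: str, now: datetime,
           custom_hours: Optional[int] = None) -> bool:
    """True when an explicit worker (cron, CLI, CI job) should run the project."""
    if cadence == "manual":
        return False
    if last_run is None:
        return True  # assumption: a scheduled project that never ran is due
    due_at = next_run(last_run, cadence, custom_hours)
    return due_at is not None and now >= due_at


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
due_now = is_due(last, "daily", now=last + timedelta(hours=25))
```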

Optional prompt enhancement uses LLM_PROVIDER=openai plus OPENAI_API_KEY, or LLM_PROVIDER=ollama plus a local Ollama server. Browser-triggered enhancement requires OPERATOR_SCAN_TOKEN on the API and the matching X-Operator-Scan-Token request header so network callers cannot spend server-side model credentials. ChatGPT/Codex subscriptions do not provide backend API credentials; TaskSignal supports subscription users by exporting task packs they can open in their own signed-in Codex app, CLI, IDE extension, or Codex web session.

Destructive fixture resets require DEMO_RESET_TOKEN and the matching X-Demo-Reset-Token request header. The normal dashboard demo-processing action is non-destructive by default.
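Both token-gated actions follow the same pattern: a server-side token in the environment and a matching request header. The sketch below shows that comparison in a framework-neutral way. The helper name is hypothetical, and treating an unset server token as "feature disabled" is an assumption about the intended behavior.

```python
import hmac
from typing import Mapping, Optional


def header_token_matches(headers: Mapping[str, str], header_name: str,
                         expected: Optional[str]) -> bool:
    """Check a request header against a server-side token."""
    if not expected:
        return False  # assumption: no configured token means the action is off
    wanted = header_name.lower()
    supplied = next(
        (value for key, value in headers.items() if key.lower() == wanted), ""
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


request_headers = {"x-operator-scan-token": "local-operator-token"}
scan_allowed = header_token_matches(
    request_headers, "X-Operator-Scan-Token", "local-operator-token"
)
reset_allowed = header_token_matches(
    request_headers, "X-Demo-Reset-Token", "local-reset-token"
)
```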

Codex And Agent Handoff

Each opportunity can export:

A generated Codex prompt.
An evidence bundle.
A Codex task pack with objective, suggested MVP, score, evidence, acceptance criteria, privacy constraints, and recommended Codex flow.

Task packs are designed for users who want to use their own signed-in Codex app, CLI, IDE extension, or Codex web session. They do not spend ChatGPT/Codex plan usage from the TaskSignal backend. A repo-local skill package is available at skills/tasksignal-opportunity-builder for agents that can load Codex-style skills.

ML/NLP Approach

The MVP uses transparent rules first. It scores pain phrases, repetition phrases, tool requests, buying intent, and task concreteness hints. Embeddings use sentence-transformers/all-MiniLM-L6-v2 only when locally available; otherwise deterministic vectors keep the demo working.
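To make the two ideas above concrete, here is a small sketch of a rule-based detector and a deterministic fallback vector. The phrase lists, function names, and the token-hashing scheme are all hypothetical; the only detail taken from outside this README is that all-MiniLM-L6-v2 produces 384-dimensional embeddings, which is why the fallback uses the same size.

```python
import hashlib
import math
import re

# Hypothetical phrase rules for illustration; the real rules may differ.
RULES = {
    "pain": [r"\bhate\b", r"\btedious\b", r"\bwaste of time\b"],
    "repetition": [r"\bevery (day|week|friday|month)\b", r"\bmanually\b"],
    "tool_request": [r"\bis there a tool\b", r"\bany tool\b"],
    "buying_intent": [r"\bwould pay\b", r"\bhappy to pay\b"],
}


def detect_signals(text: str) -> dict[str, int]:
    """Count how many patterns of each signal type match the text."""
    lowered = text.lower()
    return {
        label: sum(1 for pattern in patterns if re.search(pattern, lowered))
        for label, patterns in RULES.items()
    }


def fallback_vector(text: str, dim: int = 384) -> list[float]:
    """Hash each token into one of `dim` signed buckets, then L2-normalize.

    No model and no randomness are involved, so the same text always
    yields the same vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(value * value for value in vec))
    return [value / norm for value in vec] if norm else vec


signals = detect_signals(
    "I hate exporting Stripe data manually every Friday. I would pay for a tool."
)
vector = fallback_vector("exporting Stripe data manually every Friday")
```

Hashed vectors like this only capture token overlap, not meaning, so they are a way to keep a demo working rather than a substitute for the sentence-transformers model.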

Scoring Formula

opportunity_score =
  0.25 * frequency_score
+ 0.20 * recency_score
+ 0.20 * pain_intensity_score
+ 0.15 * task_concreteness_score
+ 0.10 * buying_intent_score
+ 0.10 * feasibility_score
- 0.10 * competition_penalty
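The formula translates directly into code. The sketch below assumes each component is already normalized to the range 0 to 1, which this README does not state; the dataclass and function names are hypothetical. Under that assumption the positive weights sum to 1.00, so the score falls between -0.10 and 1.00.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpportunitySignals:
    """Component scores, assumed to be normalized to [0, 1]."""
    frequency: float
    recency: float
    pain_intensity: float
    task_concreteness: float
    buying_intent: float
    feasibility: float
    competition_penalty: float


def opportunity_score(s: OpportunitySignals) -> float:
    """Weighted sum from the scoring formula, minus the competition penalty."""
    positive = (
        0.25 * s.frequency
        + 0.20 * s.recency
        + 0.20 * s.pain_intensity
        + 0.15 * s.task_concreteness
        + 0.10 * s.buying_intent
        + 0.10 * s.feasibility
    )
    return positive - 0.10 * s.competition_penalty


example = OpportunitySignals(
    frequency=0.8, recency=0.6, pain_intensity=0.9, task_concreteness=0.7,
    buying_intent=0.4, feasibility=0.5, competition_penalty=0.3,
)
score = opportunity_score(example)  # 0.695 - 0.03 = 0.665
```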

Privacy And Ethics

TaskSignal is designed for public-data research, product discovery, and learning. It does not store raw usernames by default, preserves source URLs for attribution, respects API boundaries, and should not be used for spam or harassment workflows.

Before enabling live connectors, review Data ethics, configure API credentials through environment variables or GitHub repository secrets, and avoid committing .env files or exported datasets.

Example Generated Opportunity

Developers need clearer GitHub Actions failure diagnosis

Problem: teams repeatedly spend time reading noisy CI logs, searching for YAML errors, and guessing at root causes.

Suggested MVP: a CI log summarizer and workflow linter that identifies likely YAML mistakes, dependency failures, and next fixes.

Example Generated Codex Prompt

# Build Developers need clearer GitHub Actions failure diagnosis

You are a senior full-stack engineer. Build a working MVP...

Portfolio Notes

This repository demonstrates full-stack engineering, API design, Python backend development, TypeScript frontend development, PostgreSQL/pgvector modeling, ML/NLP pipelines, clustering, product scoring, privacy-conscious design, Docker, CI/CD, tests, and technical writing.

Roadmap

Publish and maintain tagged releases with changelog entries.
Expand contributor-friendly fixtures, docs, and public issues.
Add richer source scheduling and rate-limit state after privacy review.
Add pgvector ANN search in production mode.
Add a reviewer workflow for human labels.

See Roadmap for maintainer tasks, security milestones, and longer-term ideas.
