The complete pipeline is now built! Here's how to test it:
Run the ingestion manually:
```bash
curl -X POST http://localhost:3000/api/ingest/hackernews
```

This will:
- Fetch 20 recent Ask HN posts
- Filter those with score >= 3 and text content
- Send each to the LLM (Deepseek) for normalization
- Insert problems into the database
- Create tags automatically
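The filtering step can be sketched as follows. This is a minimal illustration, not the project's actual code; the `HNItem` shape and `filterCandidates` name are assumptions.

```typescript
// Hypothetical shape of a fetched Ask HN item (field names are assumptions).
interface HNItem {
  id: number;
  score: number;
  text?: string;
}

// Keep only posts with score >= 3 and non-empty text content,
// mirroring the filter step described above.
function filterCandidates(items: HNItem[]): HNItem[] {
  return items.filter((item) => item.score >= 3 && !!item.text?.trim());
}
```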
Expected output:

```json
{
  "source": "hackernews",
  "fetched": 20,
  "processed": 5-15,
  "errors": 0,
  "startedAt": "2024-12-04...",
  "completedAt": "2024-12-04..."
}
```

(`processed` varies from run to run because the LLM filters out non-problems.)

After ingestion, verify the data:
```bash
# List problems
curl http://localhost:3000/api/problems

# List tags
curl http://localhost:3000/api/tags

# Search problems
curl "http://localhost:3000/api/problems?q=authentication"
```

Watch the terminal running `npm run dev` to see detailed logs:
- `[HN Connector]` - Fetching stories
- `[Normalizer]` - LLM processing
- `[LLM]` - Token usage
- `[Ingestion]` - Pipeline progress
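The verification curls above can also be scripted. Here's a small smoke check in TypeScript; the injectable `fetchFn` parameter is an assumption added so the helper is testable, and only the endpoint path comes from the curls above.

```typescript
// Minimal response surface we need from fetch (or a stub of it).
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical smoke check: count problems returned by the local API
// after a run. Pass the global `fetch` in real use, or a stub in tests.
async function countProblems(
  fetchFn: Fetcher,
  baseUrl = "http://localhost:3000",
): Promise<number> {
  const res = await fetchFn(`${baseUrl}/api/problems`);
  if (!res.ok) throw new Error(`GET /api/problems failed: ${res.status}`);
  const body = await res.json();
  return Array.isArray(body) ? body.length : 0;
}
```

For example, `countProblems(fetch)` against a running dev server should report a non-zero count after a successful ingestion.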
Deepseek pricing (as of Dec 2024):
- ~$0.14 per 1M input tokens
- ~$0.28 per 1M output tokens
Processing 20 stories with ~2000 tokens each:
- Input: ~40K tokens = $0.006
- Output: ~10K tokens = $0.003
- Total: ~$0.01 per run 💰
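The arithmetic above can be captured in a tiny estimator, using the Deepseek prices quoted (the function name and structure are illustrative):

```typescript
// Deepseek prices quoted above (USD per 1M tokens, as of Dec 2024).
const INPUT_PRICE_PER_M = 0.14;
const OUTPUT_PRICE_PER_M = 0.28;

// Estimate the USD cost of one run from token counts.
function estimateCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M
  );
}

// 20 stories: ~40K input + ~10K output tokens.
console.log(estimateCost(40_000, 10_000).toFixed(4)); // prints "0.0084"
```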
If you see errors:

- "LLM_API_KEY not set" → Add your Deepseek API key to `.env`
- "isProblem: false" → Normal; the LLM filtered out a non-problem
- Rate limit errors → The connector has built-in delays
- JSON parse errors → LLM response format issue, will retry
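The retry behavior for JSON parse errors can be sketched like this. This is a minimal illustration under assumed names (`parseWithRetry`, `callLLM`), not the pipeline's actual retry code:

```typescript
// Retry a flaky LLM call until its response parses as JSON,
// up to maxAttempts times; rethrow the last parse error on failure.
async function parseWithRetry<T>(
  callLLM: () => Promise<string>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLLM();
    try {
      return JSON.parse(raw) as T;
    } catch (err) {
      lastError = err;
      console.warn(`[LLM] JSON parse failed (attempt ${attempt}/${maxAttempts})`);
    }
  }
  throw lastError;
}
```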
Ready to test! 🚀