3 changes: 0 additions & 3 deletions apps/next-js/vitest.config.ts
@@ -1,9 +1,6 @@
import { defineConfig } from 'vitest/config'

export default defineConfig({
esbuild: {
jsx: 'automatic',
},
test: {
environment: 'jsdom',
globals: true,
9 changes: 9 additions & 0 deletions examples/cookbook/firecrawl/.env.example
@@ -0,0 +1,9 @@
# Example env for Firecrawl + Moss cookbook
# Copy to .env and fill in values before running the notebook.

# Moss credentials
MOSS_PROJECT_ID=your_moss_project_id
MOSS_PROJECT_KEY=your_moss_project_key

# Firecrawl API key
FIRECRAWL_API_KEY=your_firecrawl_api_key
97 changes: 97 additions & 0 deletions examples/cookbook/firecrawl/README.md

🚩 AGENTS.md cookbook table not updated with firecrawl entry

The AGENTS.md file documents all cookbook integrations in a table under "Framework Cookbooks". This new firecrawl/ cookbook is not listed there. While AGENTS.md is descriptive (documenting repo state for AI agents) rather than prescriptive (mandating rules), keeping it synchronized helps agents understand the repo. Other recently added cookbooks like pydantic-ai/ and langgraph/ are already in the table, suggesting it should be maintained.

@@ -0,0 +1,97 @@
# Firecrawl + Moss Cookbook Example

Use Firecrawl to turn one or more URLs into clean markdown, then index the results into Moss and query them semantically from a notebook.

> This is a cookbook example, not a packaged integration. Open [firecrawl_moss.ipynb](firecrawl_moss.ipynb) to follow the full URL-to-query pipeline.

## Installation

```bash
pip install firecrawl-py moss python-dotenv

```

Review comment (Contributor): can you please add pyproject

Reply (Author): @yatharthk2 Added pyproject and updated the README (removed Markdown Normalization from the architecture diagram).

## Setup

Set these environment variables in your shell or a `.env` file:

```bash
FIRECRAWL_API_KEY=your-firecrawl-api-key
MOSS_PROJECT_ID=your-project-id
MOSS_PROJECT_KEY=your-project-key
```
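
A minimal sketch of a pre-flight check for the variables above, assuming they are read from the process environment. `missing_env_vars` is a hypothetical helper for illustration, not part of Firecrawl or Moss. If the values live in a `.env` file, load it first (for example with python-dotenv's `load_dotenv()`).

```python
import os

# Names taken from the Setup block above.
REQUIRED_VARS = ("FIRECRAWL_API_KEY", "MOSS_PROJECT_ID", "MOSS_PROJECT_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `env` defaults to os.environ, and a mapping can
    be passed in instead to make the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this before the crawl step surfaces a missing key early, rather than partway through the notebook.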

## Quick Start

1. Open [firecrawl_moss.ipynb](firecrawl_moss.ipynb) in Jupyter or VS Code.
2. Run the setup and helper cells.
3. Set `urls` to the pages you want to ingest.
4. Run `await build_and_query_knowledge_base(urls)` to crawl, index, and query the content.

## Workflow

The notebook is structured for efficiency:

1. **Prepare** (one-time): Crawl URLs → normalize markdown → index into Moss
2. **Query** (repeated): Run semantic queries against the indexed knowledge base without re-crawling

This design lets you crawl once (which can be slow/expensive) and then iterate on queries quickly.

## Architecture

```
┌─────────────┐
│    URLs     │
└──────┬──────┘
       │
┌──────▼─────────────────┐
│ Crawled Pages          │
│ (raw HTML/markdown)    │
└──────┬─────────────────┘
       │
┌──────▼─────────────────┐
│ Markdown               │
│ (one DocumentInfo      │
│ per page)              │
└──────┬─────────────────┘
       ├──> Moss Create Index
┌──────▼─────────────────┐
│ Indexed Knowledge      │
│ Base (local or cloud)  │
└──────┬─────────────────┘
       ├──> Semantic Query (reusable)
       │    (no re-crawling needed)
┌──────▼─────────────────┐
│ Top-K Results          │
│ (scored passages)      │
└────────────────────────┘
```

## What the notebook does

```python
from firecrawl import Firecrawl
from moss import DocumentInfo, MossClient, QueryOptions

job = Firecrawl(api_key=FIRECRAWL_API_KEY).crawl(
    url="https://example.com",
    limit=3,
    scrape_options={"formats": ["markdown"]},
)

documents = [DocumentInfo(id="1", text=job.data[0].markdown, metadata={"source_url": "https://example.com"})]
await MossClient(MOSS_PROJECT_ID, MOSS_PROJECT_KEY).create_index("firecrawl-demo", documents)
```
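
The snippet above indexes only the first crawled page. Below is a minimal sketch of building one record per page, as the architecture diagram describes. `pages_to_records` and the `(source_url, markdown)` input shape are assumptions for illustration, and plain dicts stand in for `DocumentInfo` so the block runs without either SDK installed.

```python
def pages_to_records(pages):
    """Build one record per crawled page, skipping pages with no markdown.

    `pages` is an iterable of (source_url, markdown) pairs. This shape is
    an assumption for the sketch, not the Firecrawl response format.
    """
    records = []
    for source_url, markdown in pages:
        if not markdown or not markdown.strip():
            continue  # nothing to index for this page
        records.append(
            {
                # Sequential string ids, matching the id="1" style above.
                "id": str(len(records) + 1),
                "text": markdown,
                "metadata": {"source_url": source_url},
            }
        )
    return records


records = pages_to_records(
    [
        ("https://example.com", "# Home\nWelcome."),
        ("https://example.com/empty", ""),
        ("https://example.com/docs", "# Docs\nDetails."),
    ]
)
print([record["id"] for record in records])
```

Each record carries the same three fields the snippet above passes to `DocumentInfo` (`id`, `text`, `metadata`), so converting a record is a matter of passing those values through as keyword arguments.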

## Files

| File | Description |
|------|-------------|
| `firecrawl_moss.ipynb` | Notebook that crawls URLs, indexes markdown into Moss, and runs semantic search |