Kapil-chn7/ReMeet

ReMeet

AI-powered networking memory assistant. Capture business cards, QR codes, brochures, and photos at events. Search them later in plain English — "the robotics founder from Wednesday". Built for the Pirates of the Coral-bean hackathon.

Stack

  • Mobile: React Native (Expo SDK 56), offline-first SQLite
  • Server: Node.js + Express + TypeScript
  • DB: Postgres 16 + pgvector (Docker, port 5433)
  • OCR: Google Cloud Vision (text + face detection)
  • Blob storage: Google Cloud Storage (uploaded images)
  • Embeddings: OpenAI text-embedding-3-small (1536-dim)
  • Search: pgvector cosine similarity + MMR diversity re-ranking
  • PDF text: pdf-parse when a QR resolves to a PDF
  • Coral: read-only HTTP endpoints → SQL tables via custom source spec

Required keys

Copy server/.env.example → server/.env and fill in:

Key                              Why
DATABASE_URL                     Postgres + pgvector (Docker container; see the Run section)
JWT_SECRET                       Signs mobile session tokens
GOOGLE_CLIENT_ID/SECRET          Google OAuth for mobile sign-in
GOOGLE_APPLICATION_CREDENTIALS   Path to GCP service account JSON (Vision + GCS)
GCS_BUCKET                       Bucket name for uploaded images
OPENAI_API_KEY                   Embeddings only; see the budget note below

OpenAI budget — $5 goes a long way

We use text-embedding-3-small at $0.02 / 1M tokens. Concretely:

  • Each capture embedded ≈ 300–2000 tokens → ~$0.000006–$0.00004 per capture
  • Each search query embedded ≈ 20–50 tokens → essentially free
  • $5 budget ≈ 125,000 captures even at the 2,000-token upper bound (about 833,000 at 300 tokens), or millions of searches
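
The arithmetic behind these figures can be checked with a few lines of TypeScript. This is only a sketch: the function names are hypothetical, and the price is the $0.02 / 1M tokens quoted above, expressed as whole tokens per dollar so the division stays exact.

```typescript
// $0.02 per 1M tokens is the same as 50,000,000 tokens per US dollar.
const TOKENS_PER_USD = 50_000_000;

/** Cost in USD of embedding `tokens` tokens. */
function embeddingCostUsd(tokens: number): number {
  return tokens / TOKENS_PER_USD;
}

/** How many items of `tokensPerItem` tokens fit into `budgetUsd`. */
function itemsWithinBudget(budgetUsd: number, tokensPerItem: number): number {
  return Math.floor((budgetUsd * TOKENS_PER_USD) / tokensPerItem);
}

console.log(embeddingCostUsd(2000));     // cost of the largest capture
console.log(itemsWithinBudget(5, 2000)); // 125000 captures in the worst case
console.log(itemsWithinBudget(5, 50));   // 5000000 searches in the worst case
```

The 125,000 figure is therefore the pessimistic end of the range; shorter captures stretch the same budget further.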

Cost controls already baked in:

  • Plain photos (memory_type='photo') are never embedded — they go to the photo gallery instead.
  • Text input is capped at 8000 characters before embedding (services/embeddings.ts:18).
  • Each capture produces exactly one embedding (idempotent insert).
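
A minimal sketch of the first two controls, assuming a helper that runs just before the embeddings call. The name prepareEmbeddingInput and the empty-text check are illustrative assumptions, not the actual contents of services/embeddings.ts.

```typescript
type MemoryType = "business_card" | "photo" | "url" | "qr_text" | "profile";

const MAX_EMBED_CHARS = 8000; // the character cap described above

/**
 * Hypothetical helper: returns the text to embed, or null when the
 * capture should be skipped.
 */
function prepareEmbeddingInput(memoryType: MemoryType, text: string): string | null {
  if (memoryType === "photo") return null; // plain photos go to the gallery only
  const trimmed = text.trim();
  if (trimmed.length === 0) return null;   // assumption: nothing worth embedding
  return trimmed.slice(0, MAX_EMBED_CHARS);
}
```

Returning null instead of throwing lets the caller still save the memory_items row and simply skip the embedding step.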

You will not run out of budget during this hackathon.

Run

# 1. Start Postgres + pgvector
docker run -d --name remeet-db \
  -e POSTGRES_USER=remeet -e POSTGRES_PASSWORD=remeet -e POSTGRES_DB=remeet \
  -p 5433:5432 \
  -v "$(pwd)/db/init.sql:/docker-entrypoint-initdb.d/init.sql" \
  pgvector/pgvector:pg16

# 2. Server
cd server
npm install
npm run dev    # http://localhost:4000

# 3. Mobile
cd ../mobile
npx expo start

Capture flow (what happens when mobile syncs)

User captures offline on phone (SQLite)
           ↓ tap "Sync"
POST /memory/capture/image   (one per image, base64 in JSON body)
POST /memory/batch           (QR/URL items in one shot)
           ↓
       JWT auth
           ↓
   ┌──── parallel ────┐
   ↓                  ↓
GCS upload    Google Vision API
(returns URL)  (TEXT + FACE detect)
           ↓
   Route by memory_type:
     business_card  → save to memory_items → embed → save to semantic_memories
     photo          → save to memory_items → STOP (no embed; gallery only)
     url            → scrape (cheerio or pdf-parse) → save → embed
     qr_text        → save → embed
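
The routing table at the bottom of this flow can be written down as data. The sketch below is an assumption about shape only: the step names are hypothetical, and it assumes every "embed" step also stores the result in semantic_memories, which the diagram states explicitly only for business cards.

```typescript
type RoutedType = "business_card" | "photo" | "url" | "qr_text";
type Step = "scrape" | "save_memory_item" | "embed_and_store";

// One entry per branch of the "Route by memory_type" diagram.
const CAPTURE_PLAN: Record<RoutedType, readonly Step[]> = {
  business_card: ["save_memory_item", "embed_and_store"],
  photo: ["save_memory_item"], // STOP: gallery only, never embedded
  url: ["scrape", "save_memory_item", "embed_and_store"],
  qr_text: ["save_memory_item", "embed_and_store"],
};

/** Ordered steps the server would run for a capture of the given type. */
function planCapture(memoryType: RoutedType): readonly Step[] {
  return CAPTURE_PLAN[memoryType];
}

console.log(planCapture("url").join(" -> "));
```

Keeping the plan as a lookup table means a new memory_type forces a compile error until its route is declared.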

What lives in memory_items after one capture

{
  "id": "6c443b14-f0a0-47cc-8fe6-1585c369b992",
  "user_id": "00000000-0000-0000-0000-000000000001",
  "source_type": "camera",
  "memory_type": "business_card",
  "content_text": "BlueFetch Robotics\nLet Robots Do the Job\nbluefetch.co\nRequest Demo",
  "document_url": "https://storage.googleapis.com/remeetbucket/images/.../609372b9.jpeg",
  "interaction_type": "personal",
  "has_text": true,
  "has_faces": false,
  "venue": "Bangalore AI Summit",
  "captured_at": "2026-05-31T03:42:00.000Z"
}

What lives in semantic_memories for the same capture

{
  "id": "a1f2...",
  "memory_item_id": "6c443b14-f0a0-47cc-8fe6-1585c369b992",
  "user_id": "00000000-0000-0000-0000-000000000001",
  "semantic_summary": "BlueFetch Robotics. Let Robots Do the Job. At: Bangalore AI Summit",
  "embedding": "[0.0123, -0.045, ..., 0.011]"
}

The summary is the text that was embedded. The embedding is the 1536-dim vector. We do NOT expose the embedding column over the /coral/* endpoints — only the summary.
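
One way to keep the vector out of the /coral/* responses is to map each row through a function that copies everything except the embedding. The sketch below assumes the row shape from the JSON example above; toCoralRow is a hypothetical name, not a function from the repository.

```typescript
// Shape taken from the semantic_memories example above; the embedding is
// shown there as a string, so it is typed as a string here.
interface SemanticMemoryRow {
  id: string;
  memory_item_id: string;
  user_id: string;
  semantic_summary: string;
  embedding: string;
}

type CoralSemanticMemory = Omit<SemanticMemoryRow, "embedding">;

/** Hypothetical helper: copy every field except the embedding. */
function toCoralRow(row: SemanticMemoryRow): CoralSemanticMemory {
  return {
    id: row.id,
    memory_item_id: row.memory_item_id,
    user_id: row.user_id,
    semantic_summary: row.semantic_summary,
  };
}
```

Listing the fields explicitly, rather than deleting one, means a column added to the table later is not exposed by accident.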


Search flow — two-stage retrieval

Why two stages? Vector search is the right tool for "find what's similar." SQL is the right tool for "give me the exact details." Combining them means the vector DB only carries small summaries + IDs, and the relational DB carries the heavy metadata. Token-efficient by design.

User: "robotics person from Wednesday"
            ↓
[Stage 1 — semantic]
  embed query (≈30 tokens, near-zero cost)
            ↓
  pgvector:  SELECT id, memory_item_id
             FROM semantic_memories
             WHERE user_id = $1
             ORDER BY embedding <=> $query_vector
             LIMIT 50
            ↓
  MMR re-rank → top 10 diverse memory_item_ids
            ↓
[Stage 2 — relational]
  JOIN memory_items mi ON mi.id = sm.memory_item_id
  → full row: content_text, document_url, venue, captured_at, ...
            ↓
[Response to mobile]
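
The MMR re-rank step can be sketched as follows. This is a generic Maximal Marginal Relevance implementation, not necessarily the one in the server: it assumes the candidate embeddings are fetched along with the IDs (the stage 1 query above selects only IDs), and the Candidate type and the lambda default of 0.7 are illustrative choices.

```typescript
interface Candidate {
  memoryItemId: string;
  embedding: number[];
}

/** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/**
 * Greedy MMR: repeatedly pick the candidate maximizing
 *   lambda * sim(query, c) - (1 - lambda) * max over selected s of sim(c, s)
 * lambda = 1 is pure relevance; lower values favor diversity.
 */
function mmr(query: number[], candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const relevance = cosine(query, remaining[i].embedding);
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(remaining[i].embedding, s.embedding)));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With 50 candidates and k = 10 this does a few thousand similarity computations per query, which is cheap next to the embedding round trip.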

Example search response

Input: GET /memory/search?context=robotics+person

[
  {
    "id": "a1f2...",
    "memory_item_id": "6c443b14-f0a0-47cc-8fe6-1585c369b992",
    "semantic_summary": "BlueFetch Robotics. Let Robots Do the Job. At: Bangalore AI Summit",
    "score": 0.8421,
    "memory_item": {
      "memory_type": "business_card",
      "content_text": "BlueFetch Robotics\nLet Robots Do the Job\n...",
      "document_url": "https://storage.googleapis.com/.../609372b9.jpeg",
      "venue": "Bangalore AI Summit",
      "captured_at": "2026-05-31T03:42:00.000Z",
      "interaction_type": "personal"
    }
  }
]

Note the score (cosine similarity, 0–1) and the joined memory_item payload. Mobile renders this directly.
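
For a client consuming this response, a typed sketch might look like the following. The field names come from the example response above; buildSearchUrl and formatHit are hypothetical helpers, and a real request would presumably also need to send the JWT.

```typescript
// Field names copied from the example response above.
interface SearchHit {
  id: string;
  memory_item_id: string;
  semantic_summary: string;
  score: number;
  memory_item: {
    memory_type: string;
    content_text: string;
    document_url: string;
    venue: string;
    captured_at: string; // ISO 8601 timestamp
    interaction_type: string;
  };
}

/** Build the search URL; URLSearchParams encodes spaces as "+". */
function buildSearchUrl(apiBase: string, context: string): string {
  const url = new URL("/memory/search", apiBase);
  url.searchParams.set("context", context);
  return url.toString();
}

/** Hypothetical one-line rendering of a hit for a results list. */
function formatHit(hit: SearchHit): string {
  const pct = Math.round(hit.score * 100);
  return `${pct}% ${hit.semantic_summary} (${hit.memory_item.venue})`;
}
```

Calling buildSearchUrl("http://localhost:4000", "robotics person") reproduces the query string shown in the example input.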

Photos are excluded from search results and served separately via GET /memory/photos?date=YYYY-MM-DD.


Coral integration

Coral is a local CLI that wraps any HTTP API as queryable SQL tables. ReMeet exposes 4 read-only endpoints under /coral/* (no auth — Coral runs on your laptop), and coral/remeet.yaml tells Coral how to map them to SQL.

One-time setup

# 1. Install Coral
brew install withcoral/tap/coral

# 2. Validate the source spec
coral source lint ./coral/remeet.yaml

# 3. Register it (server must be running)
coral source add --file ./coral/remeet.yaml
# When prompted:
#   REMEET_API_BASE → http://localhost:4000

# 4. Confirm it shows up
coral source list
coral source info remeet --verbose

Tables Coral exposes

Table                      What
remeet.memory_items        every capture, all metadata
remeet.semantic_memories   embeddable items + summary (no vector column)
remeet.users               accounts
remeet.search              vector + MMR search exposed as a table; filter with WHERE user_id=... AND q=...

Example Coral queries

# All business cards, most recent first
coral sql "SELECT memory_type, content_text, venue, captured_at
           FROM remeet.memory_items
           WHERE memory_type = 'business_card'
           ORDER BY captured_at DESC
           LIMIT 10"

# Semantic search via Coral
coral sql "SELECT memory_type, semantic_summary, score, venue
           FROM remeet.search
           WHERE user_id = '00000000-0000-0000-0000-000000000001'
             AND q       = 'robotics conference'"

# Personal conversations at conference-named venues
coral sql "SELECT sm.semantic_summary, mi.venue, mi.captured_at
           FROM remeet.memory_items mi
           JOIN remeet.semantic_memories sm ON sm.memory_item_id = mi.id
           WHERE mi.interaction_type = 'personal'
             AND mi.venue ILIKE '%conference%'"

Debugging guide

1. Server doesn't start

cd server
npx tsc --noEmit          # surface any TypeScript errors
npm run dev               # see runtime error

Common causes:

  • Missing env var → check .env against .env.example
  • OPENAI_API_KEY not set → the embeddings client is initialized lazily, so the server still starts, but the first /memory/capture/* request will fail with "OPENAI_API_KEY is not set"
  • Port 4000 already in use → lsof -i :4000 and kill it

2. DB connection refused (ECONNREFUSED ::1:5432)

The Postgres container isn't running:

docker ps | grep remeet-db          # should show Up
docker logs remeet-db --tail 30     # if not running, see why
docker start remeet-db              # if it exists but is stopped

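Note: we use port 5433 (not 5432) to avoid clashing with any other Postgres on your machine. If the error names port 5432, as in the heading above, DATABASE_URL is likely unset or still pointing at the default port, so check it as described in step 3.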

3. Password authentication failed

You probably have a different Postgres on 5432. Either stop it or confirm DATABASE_URL points to 5433:

grep DATABASE_URL server/.env
# Expected: postgres://remeet:remeet@localhost:5433/remeet
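
The same check can be done programmatically, for example at server startup. This is a sketch under assumptions: checkDatabaseUrl is a hypothetical helper, and the expected port is the 5433 host port from the docker run command above.

```typescript
const EXPECTED_PORT = "5433"; // host port published by the docker run command

/** Hypothetical helper: list reasons a DATABASE_URL will not reach the container. */
function checkDatabaseUrl(raw: string): string[] {
  const problems: string[] = [];
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return ["DATABASE_URL is not a valid URL"];
  }
  if (url.protocol !== "postgres:" && url.protocol !== "postgresql:") {
    problems.push(`unexpected scheme ${url.protocol}`);
  }
  if (url.port !== EXPECTED_PORT) {
    problems.push(`port is ${url.port || "(default)"}, expected ${EXPECTED_PORT}`);
  }
  return problems;
}

console.log(checkDatabaseUrl("postgres://remeet:remeet@localhost:5432/remeet"));
```

An empty array means the URL matches the setup described in the Run section.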

4. Image upload fails with PayloadTooLargeError

The body limit is already set to 20 MB in server/src/index.ts:11. If you bumped capture size, raise it further:

app.use(express.json({ limit: '50mb' }));

5. Cannot update access control for an object when uniform bucket-level access is enabled

Your GCS bucket has uniform bucket-level access enabled, so per-object ACLs cannot be changed. We don't call makePublic(); files become public via the bucket-level policy. If images return 403:

  • GCP Console → Cloud Storage → bucket → Permissions → grant allUsers the Storage Object Viewer role.

6. Search returns nothing

# Are there any rows to search over?
docker exec remeet-db psql -U remeet -d remeet -c \
  "SELECT count(*) FROM semantic_memories;"

If 0:

  • You captured before adding OPENAI_API_KEY → embeddings silently failed. Re-sync captures.
  • Or: everything you captured was a plain photo (we don't embed those). Confirm with:
    docker exec remeet-db psql -U remeet -d remeet -c \
      "SELECT memory_type, count(*) FROM memory_items GROUP BY memory_type;"

7. Coral can't reach the server

curl "http://localhost:4000/coral/memory_items?limit=2"   # should return JSON (quoted so the shell does not glob the "?")
curl http://localhost:4000/health                          # {"status":"ok"}

If those work but coral sql fails, run coral source info remeet --verbose to see the resolved REMEET_API_BASE.

8. Inspect the DB directly

docker exec -it remeet-db psql -U remeet -d remeet

\dt                                          -- list tables
SELECT memory_type, count(*)
FROM memory_items GROUP BY memory_type;       -- distribution of captures
SELECT id, semantic_summary
FROM semantic_memories LIMIT 5;               -- preview stored summaries

9. Mobile upload returns Host unreachable

The phone isn't on the same Wi-Fi network as the laptop running the server. Check that both are on the same SSID.


Schema summary

  • users — Google-signed-in accounts (one demo user pre-seeded)
  • meetings — optional session grouping (column exists, not populated by mobile)
  • memory_items — one row per capture, with memory_type ∈ {business_card, photo, url, qr_text, profile}
  • semantic_memories — one row per embeddable item; pgvector embedding + human-readable semantic_summary

Foreign key chain: semantic_memories.memory_item_id → memory_items.id. Both tables carry user_id directly for fast filter-before-search.

About

I forget contacts and context after meetings, so I built this app. Now we can re-meet after meeting.
