@@ -44,7 +44,7 @@ jobs:

- name: Create .env file
run: |
- echo "${{ secrets.ENV_FILE }}" > .env
+ echo "${{ secrets.ENV }}" > .env
echo ".env file created"

- name: Set up Flutter
@@ -173,7 +173,7 @@ jobs:
# Create the .env file (generated from a secret rather than an artifact, for security)
- name: Create .env file
run: |
- echo "${{ secrets.ENV_FILE }}" > .env
+ echo "${{ secrets.ENV }}" > .env
echo "✅ .env file created (size: $(wc -c < .env) bytes)"

# Release keystore setup
@@ -52,7 +52,7 @@ jobs:
# Create the .env file
- name: Create .env file from GitHub Secret
run: |
- echo "${{ secrets.ENV_FILE }}" > .env
+ echo "${{ secrets.ENV }}" > .env
echo ".env file created"
ls -la

94 changes: 94 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,94 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

MapSee-AI is a Python-based SNS content data extraction pipeline that processes Instagram and YouTube content to extract place/location information. It is a FastAPI service that receives URLs, downloads media content, performs speech-to-text (STT), and uses an LLM (Gemini) to extract structured place data.

## Development Commands

```bash
# Install dependencies (Python 3.13+)
uv sync

# Run the development server
uv run uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload

# Alternative: run directly
uv run python -m src.main
```

### External Dependencies
- **ffmpeg/ffprobe**: Required for audio/video processing
- **yt-dlp**: Used for downloading Instagram/YouTube content
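
A quick way to confirm these tools are available before starting the server is to look them up on `PATH`. The snippet below is a minimal sketch, not part of the repository: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and it assumes all three tools are invoked as executables found on `PATH`.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumption: the tools are called as executables resolvable on PATH.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "yt-dlp")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be located, in the order given."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```

The `which` parameter is injectable only so the check can be exercised without depending on the local machine.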

## Architecture

### Request Flow
1. `/api/extract-places` receives `contentId` + `snsUrl`
2. Request returns immediately (async processing)
3. Background task runs the extraction pipeline
4. Results sent to backend via callback URL
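
The four steps above can be sketched with plain `asyncio`. This is an illustration of the accept-then-process pattern only, not the service's FastAPI code: `extract_places`, `process_in_background`, `run_pipeline`, `send_callback`, the `"ACCEPTED"` status, and the in-memory `sink` standing in for the backend callback are all hypothetical.

```python
import asyncio


async def run_pipeline(content_id: str, sns_url: str) -> dict:
    # Hypothetical stand-in for the real extraction pipeline.
    await asyncio.sleep(0)
    return {"contentId": content_id, "places": []}


async def send_callback(payload: dict, sink: list) -> None:
    # Stand-in for an HTTP POST to the backend callback URL.
    sink.append(payload)


async def process_in_background(content_id: str, sns_url: str, sink: list) -> None:
    result = await run_pipeline(content_id, sns_url)
    await send_callback(result, sink)


async def extract_places(content_id: str, sns_url: str, sink: list, tasks: set) -> dict:
    # Schedule the work and return right away; the caller does not wait for it.
    task = asyncio.create_task(process_in_background(content_id, sns_url, sink))
    tasks.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(tasks.discard)
    return {"contentId": content_id, "status": "ACCEPTED"}


async def demo() -> None:
    sink: list = []
    tasks: set = set()
    response = await extract_places("c1", "https://example.com/p/1", sink, tasks)
    print("immediate response:", response)
    await asyncio.gather(*tasks)  # in a server, the event loop keeps running instead
    print("callback payloads:", sink)


asyncio.run(demo())
```

The point of the sketch is that the response is produced before the pipeline has run, and the result only reaches the backend through the callback.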

### Pipeline Stages (workflow.py)
```
URL → sns_router → get_audio → get_transcription (STT) → get_video_narration → get_llm_response → callback
Platform detection (YouTube/Instagram)
Content type detection (video/image)
Download media via yt-dlp
```

### Key Components

**src/apis/**: FastAPI routers
- `place_router.py`: Main API endpoint for place extraction

**src/services/**: Business logic
- `workflow.py`: Main extraction pipeline orchestration
- `content_router.py`: Routes to appropriate downloader based on platform/content type
- `background_tasks.py`: Async task execution and callback handling
- `smb_service.py`: SMB file server integration

**src/services/modules/**: Processing modules
- `llm.py`: Gemini API integration for place extraction
- `stt.py`: Faster-Whisper speech-to-text

**src/services/preprocess/**: Media preprocessing
- `sns.py`: Instagram/YouTube content download (yt-dlp)
- `audio.py`: FFmpeg audio extraction
- `video.py`: Video frame extraction (OCR currently disabled)

**src/models/**: Pydantic schemas
- `ExtractionState`: TypedDict that flows through the pipeline, accumulating data at each stage

**src/core/**: Configuration and utilities
- `config.py`: Settings from .env (API keys, SMB config, etc.)
- `exceptions.py`: CustomError class for pipeline errors

### State Flow Pattern
The pipeline uses `ExtractionState` (TypedDict) as a mutable state object that gets passed through each processing stage. Each stage updates specific fields:
- `contentStream`/`imageStream`: Downloaded media
- `captionText`: Post caption/description
- `audioStream`: Extracted audio
- `transcriptionText`: STT output
- `ocrText`: Video text (currently disabled)
- `result`: Final extracted places
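
As a rough illustration of this pattern, the sketch below passes one state dict through a list of stage functions. The field names come from the list above; the value types, the stage functions, and `run_stages` are assumptions made for the example and do not mirror `workflow.py`.

```python
from io import BytesIO
from typing import Callable, TypedDict


class ExtractionState(TypedDict, total=False):
    # Field names are from the list above; the types are assumed.
    contentStream: BytesIO
    captionText: str
    audioStream: BytesIO
    transcriptionText: str
    result: list[dict]


Stage = Callable[[ExtractionState], ExtractionState]


def download_stage(state: ExtractionState) -> ExtractionState:
    # Hypothetical stand-in for the yt-dlp download step.
    state["contentStream"] = BytesIO(b"fake-video-bytes")
    state["captionText"] = "Brunch at Example Cafe"
    return state


def audio_stage(state: ExtractionState) -> ExtractionState:
    # Stand-in for FFmpeg audio extraction: just copies the bytes.
    state["audioStream"] = BytesIO(state["contentStream"].getvalue())
    return state


def transcription_stage(state: ExtractionState) -> ExtractionState:
    # Stand-in for STT output.
    state["transcriptionText"] = "we visited example cafe"
    return state


def llm_stage(state: ExtractionState) -> ExtractionState:
    # Stand-in for the LLM call: a trivial keyword match.
    text = f'{state["captionText"]} {state["transcriptionText"]}'.lower()
    state["result"] = [{"name": "Example Cafe"}] if "example cafe" in text else []
    return state


def run_stages(state: ExtractionState, stages: list[Stage]) -> ExtractionState:
    # Each stage mutates and returns the same state object.
    for stage in stages:
        state = stage(state)
    return state


final = run_stages({}, [download_stage, audio_stage, transcription_stage, llm_stage])
print(sorted(final))
```

Declaring the TypedDict with `total=False` lets the state start empty and fill in as stages run.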

## Configuration

Required environment variables in `.env`:
- `GOOGLE_API_KEY`: Gemini API key
- `AI_SERVER_API_KEY`: API key for this service
- `YOUTUBE_API_KEY`: YouTube Data API key
- `INSTAGRAM_POST_DOC_ID`, `INSTAGRAM_APP_ID`: Instagram API config
- `BACKEND_CALLBACK_URL`, `BACKEND_API_KEY`: Callback endpoint config
- `SMB_*`: SMB file server settings (optional)
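
A small check like the following can flag missing variables before the service starts. It is a sketch under assumptions: `parse_env_text` and `missing_settings` are hypothetical helpers that handle only simple `KEY=value` lines, and the project's `config.py` may load settings differently.

```python
# Required names are taken from the list above; SMB_* is optional and omitted.
REQUIRED_SETTINGS = (
    "GOOGLE_API_KEY",
    "AI_SERVER_API_KEY",
    "YOUTUBE_API_KEY",
    "INSTAGRAM_POST_DOC_ID",
    "INSTAGRAM_APP_ID",
    "BACKEND_CALLBACK_URL",
    "BACKEND_API_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return required names that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


sample = 'GOOGLE_API_KEY="abc"\n# comment\nAI_SERVER_API_KEY=xyz\n'
print(missing_settings(parse_env_text(sample)))
```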

## Notes

- OCR functionality is currently disabled (noted with comments throughout)
- The service uses in-memory BytesIO streams for media processing
- Faster-Whisper runs on CPU with int8 quantization by default
- LLM responses are validated against Pydantic schemas using `response_json_schema`
6 changes: 3 additions & 3 deletions version.yml
@@ -34,11 +34,11 @@
# - The version is always automatically synchronized to the higher version
# ===================================================================

- version: "0.0.3"
- version_code: 3 # app build number
+ version: "0.0.4"
+ version_code: 4 # app build number
project_type: "python" # spring, flutter, react, react-native, react-native-expo, node, python, basic
metadata:
- last_updated: "2026-01-11 12:12:03"
+ last_updated: "2026-01-11 12:18:38"
last_updated_by: "Cassiiopeia"
default_branch: "main"
integrated_from: "SUH-DEVOPS-TEMPLATE"