26 commits
83b0591
MapSee-AI version management : docs : update v0.0.4 README version info [skip ci]
actions-user Jan 11, 2026
0466ba9
Project base setup initialization : feat : add pr-preview workflow https://github.com/MapSee…
Cassiiopeia Jan 11, 2026
5370434
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
a3a9fd7
MapSee-AI version info management: chore: version 0.0.5 [skip ci]
actions-user Jan 11, 2026
461a04d
Project base setup initialization : fix : add uv.lock https://github.com/MapSee-Lab/MapSe…
Cassiiopeia Jan 11, 2026
da1a089
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
5f3bfab
MapSee-AI version info management: chore: version 0.0.6 [skip ci]
actions-user Jan 11, 2026
52a25d1
Project base setup initialization : refactor : change trigger message to @suh-lab server bulid https:…
Cassiiopeia Jan 11, 2026
d3565a3
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
93268a5
MapSee-AI version info management: chore: version 0.0.7 [skip ci]
actions-user Jan 11, 2026
c7528a1
Project base setup initialization : feat : support issues in addition to PRs https://github.com/MapSee…
Cassiiopeia Jan 11, 2026
d9047a5
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
c61c610
MapSee-AI version info management: chore: version 0.0.8 [skip ci]
actions-user Jan 11, 2026
b9a6c77
Project base setup initialization : fix : fix branch marker https://github.com/MapSee-Lab…
Cassiiopeia Jan 11, 2026
7e2c4a3
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
9b3b2ec
Add metadata, image, and caption extraction logic for Instagram posts : feat : for Instagram post url, meta da…
Cassiiopeia Jan 11, 2026
48ea33b
Add metadata, image, and caption extraction logic for Instagram posts : feat : file structure classification and refinement, YouTube logic skeleton…
Cassiiopeia Jan 11, 2026
8e49906
Add metadata, image, and caption extraction logic for Instagram posts : refactor : refine PR Preview logic, …
Cassiiopeia Jan 11, 2026
2c470d6
Merge pull request #4 from MapSee-Lab/20260111_#3_인스타_게시글에_대한_메타데이터_이…
Cassiiopeia Jan 11, 2026
94c1e80
MapSee-AI version info management: chore: version 0.0.9 [skip ci]
actions-user Jan 11, 2026
5993ddb
Add metadata, image, and caption extraction logic for Instagram posts : refactor : pull_request event + c…
Cassiiopeia Jan 11, 2026
3a99b1c
MapSee-AI version info management: chore: version 0.0.10 [skip ci]
actions-user Jan 11, 2026
8838883
Add metadata, image, and caption extraction logic for Instagram posts : fix : Line 147 IS_PR: ${{ gith…
Cassiiopeia Jan 11, 2026
968071c
Merge branch 'main' of https://github.com/MapSee-Lab/MapSee-AI
Cassiiopeia Jan 11, 2026
dc9f87a
MapSee-AI version info management: chore: version 0.0.11 [skip ci]
actions-user Jan 11, 2026
cca64ef
MapSee-AI version management : docs : update v0.0.11 release docs (PR #7)
actions-user Jan 11, 2026
1,338 changes: 1,338 additions & 0 deletions .github/workflows/PROJECT-PYTHON-SYNOLOGY-PR-PREVIEW.yaml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion .gitignore
@@ -36,7 +36,7 @@ env/
.env.*.local

# uv
uv.lock
# uv.lock - the lock file is kept under version control for reproducible builds

# Logs
logs/
41 changes: 41 additions & 0 deletions CHANGELOG.json
@@ -0,0 +1,41 @@
{
"metadata": {
"lastUpdated": "2026-01-11T15:07:27Z",
"currentVersion": "0.0.11",
"projectType": "python",
"totalReleases": 1
},
"releases": [
{
"version": "0.0.11",
"project_type": "python",
"date": "2026-01-11",
"pr_number": 7,
"raw_summary": "## Summary by CodeRabbit\n\n## Release Notes\n\n* **New features**\n * Added test API endpoints (/api/test/scrape, /api/test/health)\n * Added Instagram content scraping (posts, Reels, and IGTV supported)\n * Added URL classification and routing\n \n* **Documentation**\n * Localized the developer documentation into Korean\n\n* **Version and dependencies**\n * Upgraded version 0.0.4 → 0.0.11\n * Added the Playwright browser automation library",
"parsed_changes": {
"새로운_기능": {
"title": "New features",
"items": [
"Added test API endpoints (/api/test/scrape, /api/test/health)",
"Added Instagram content scraping (posts, Reels, and IGTV supported)",
"Added URL classification and routing"
]
},
"문서": {
"title": "Documentation",
"items": [
"Localized the developer documentation into Korean"
]
},
"버전_및_의존성": {
"title": "Version and dependencies",
"items": [
"Upgraded version 0.0.4 → 0.0.11",
"Added the Playwright browser automation library"
]
]
}
},
"parse_method": "markdown"
}
]
}
25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,25 @@
# Changelog

**Current version:** 0.0.11
**Last updated:** 2026-01-11T15:07:27Z

---

## [0.0.11] - 2026-01-11

**PR:** #7

**New features**
- Added test API endpoints (/api/test/scrape, /api/test/health)
- Added Instagram content scraping (posts, Reels, and IGTV supported)
- Added URL classification and routing

**Documentation**
- Localized the developer documentation into Korean

**Version and dependencies**
- Upgraded version 0.0.4 → 0.0.11
- Added the Playwright browser automation library

---

164 changes: 92 additions & 72 deletions CLAUDE.md
@@ -1,94 +1,114 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This file is a guide that Claude Code (claude.ai/code) consults when working with the code in this repository.

## Project Overview
## Project Overview

MapSee-AI is a Python-based SNS content data extraction pipeline that processes Instagram and YouTube content to extract place/location information. It's a FastAPI service that receives URLs, downloads media content, performs speech-to-text (STT), and uses LLM (Gemini) to extract structured place data.
MapSee-AI is a Python-based pipeline for extracting data from SNS content. It processes Instagram and YouTube content to extract place/location information. As a FastAPI service, it receives a URL, downloads the media content, performs speech-to-text (STT), and then uses an LLM (Gemini) to extract structured place data.

## Development Commands
## Development Commands

```bash
# Install dependencies (Python 3.13+)
# Install dependencies (Python 3.13+)
uv sync

# Run the development server
# Run the development server
uv run uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload

# Alternative: run directly
# Alternative: run directly
uv run python -m src.main
```

### External Dependencies
- **ffmpeg/ffprobe**: Required for audio/video processing
- **yt-dlp**: Used for downloading Instagram/YouTube content
### External Dependencies
- **ffmpeg/ffprobe**: Required for audio/video processing
- **yt-dlp**: Used to download Instagram/YouTube content

## Architecture
## Naming Rules

### Request Flow
1. `/api/extract-places` receives `contentId` + `snsUrl`
2. Request returns immediately (async processing)
3. Background task runs the extraction pipeline
4. Results sent to backend via callback URL
### File Names
- A file's role should be clear from its name alone
- Example: `base.py` ❌ → `playwright_browser.py` ✅
- Example: `router.py` ❌ → `scrape_router.py` ✅

### Pipeline Stages (workflow.py)
### Variable/Function Names
- Prefer clear names, even when they are long
- Keep abbreviations to a minimum
- Example: `desc` ❌ → `description` ✅
- Example: `res` ❌ → `response` ✅
- Example: `cnt` ❌ → `count` ✅

## API Response Rules

- Do not use a `success` field - success or failure is determined by the HTTP status code
- 200 OK → success
- 4xx/5xx → failure (the error message goes in the `detail` field)
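
The rule above can be sketched with the standard library alone. This is a minimal illustration, not code from this repository: `ScrapeError`, `to_http_response`, and `fetch_metadata` are hypothetical names. The point is that the body never carries a `success` flag, and a failure puts its message in `detail`.

```python
from http import HTTPStatus


class ScrapeError(Exception):
    """Hypothetical error type that carries the HTTP status to return."""

    def __init__(self, status: HTTPStatus, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def to_http_response(handler, *args):
    """Run a handler and map its outcome to (status_code, body).

    No `success` field is added: callers read the status code instead.
    """
    try:
        return HTTPStatus.OK.value, handler(*args)
    except ScrapeError as error:
        return error.status.value, {"detail": error.message}


def fetch_metadata(url: str) -> dict:
    # Hypothetical handler: rejects anything that is not an http(s) URL.
    if not url.startswith(("http://", "https://")):
        raise ScrapeError(HTTPStatus.BAD_REQUEST, "unsupported URL")
    return {"url": url}
```

In a FastAPI service the same effect is usually achieved by raising an exception that the framework converts into an error response, so a helper like this would likely not be needed there.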

## Architecture

### Request Flow
1. `/api/extract-places` receives `contentId` + `snsUrl`
2. The request returns immediately (asynchronous processing)
3. A background task runs the extraction pipeline
4. The result is sent to the backend via the callback URL
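
The four steps above can be sketched with plain `asyncio`. Everything here is an assumption made for illustration: the function names, the `"status": "accepted"` body, and the in-memory `delivered` list that stands in for the HTTP call to the callback URL. The real service may schedule its background work differently.

```python
import asyncio


async def run_extraction_pipeline(content_id: str, sns_url: str) -> dict:
    # Placeholder for the real pipeline (download, STT, LLM extraction).
    await asyncio.sleep(0)
    return {"contentId": content_id, "places": []}


async def send_callback(result: dict, delivered: list) -> None:
    # Stand-in for the HTTP POST to the backend callback URL.
    delivered.append(result)


async def process_in_background(content_id: str, sns_url: str, delivered: list) -> None:
    result = await run_extraction_pipeline(content_id, sns_url)
    await send_callback(result, delivered)


async def extract_places(content_id: str, sns_url: str, delivered: list):
    # Schedule the work and return right away; the caller does not wait for it.
    task = asyncio.create_task(process_in_background(content_id, sns_url, delivered))
    return {"contentId": content_id, "status": "accepted"}, task


async def demo() -> tuple:
    delivered: list = []
    response, task = await extract_places("c-1", "https://www.instagram.com/p/abc/", delivered)
    delivered_at_return = len(delivered)  # the callback has not fired yet
    await task  # awaited only so the demo can inspect the callback
    return response, delivered_at_return, delivered
```

Running `asyncio.run(demo())` shows that the response is available before anything has been delivered to the callback.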

### Pipeline Stages (workflow.py)
```
URL → sns_router → get_audio → get_transcription (STT) → get_video_narration → get_llm_response → callback
↓
Platform detection (YouTube/Instagram)
Content type detection (video/image)
Download media via yt-dlp
Platform detection (YouTube/Instagram)
Content type detection (video/image)
Media download with yt-dlp
```
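
As a rough idea of what the platform detection step might involve, the sketch below classifies a URL by its host name. The host sets and the `detect_platform` function are assumptions for illustration; the repository's `sns_router` may recognise more domains or use a different strategy.

```python
from urllib.parse import urlparse

# Hypothetical host lists; the real router may recognise more domains.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
INSTAGRAM_HOSTS = {"instagram.com", "www.instagram.com"}


def detect_platform(url: str) -> str:
    """Return "youtube" or "instagram", or raise ValueError for other hosts."""
    host = (urlparse(url).hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host in INSTAGRAM_HOSTS:
        return "instagram"
    raise ValueError(f"unsupported platform: {url}")
```

Matching on the parsed host rather than on a substring avoids treating a URL such as `https://example.com/?next=instagram.com` as an Instagram link.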

### Key Components

**src/apis/**: FastAPI routers
- `place_router.py`: Main API endpoint for place extraction

**src/services/**: Business logic
- `workflow.py`: Main extraction pipeline orchestration
- `content_router.py`: Routes to appropriate downloader based on platform/content type
- `background_tasks.py`: Async task execution and callback handling
- `smb_service.py`: SMB file server integration

**src/services/modules/**: Processing modules
- `llm.py`: Gemini API integration for place extraction
- `stt.py`: Faster-Whisper speech-to-text

**src/services/preprocess/**: Media preprocessing
- `sns.py`: Instagram/YouTube content download (yt-dlp)
- `audio.py`: FFmpeg audio extraction
- `video.py`: Video frame extraction (OCR currently disabled)

**src/models/**: Pydantic schemas
- `ExtractionState`: TypedDict that flows through the pipeline, accumulating data at each stage

**src/core/**: Configuration and utilities
- `config.py`: Settings from .env (API keys, SMB config, etc.)
- `exceptions.py`: CustomError class for pipeline errors

### State Flow Pattern
The pipeline uses `ExtractionState` (TypedDict) as a mutable state object that gets passed through each processing stage. Each stage updates specific fields:
- `contentStream`/`imageStream`: Downloaded media
- `captionText`: Post caption/description
- `audioStream`: Extracted audio
- `transcriptionText`: STT output
- `ocrText`: Video text (currently disabled)
- `result`: Final extracted places

## Configuration

Required environment variables in `.env`:
- `GOOGLE_API_KEY`: Gemini API key
- `AI_SERVER_API_KEY`: API key for this service
- `YOUTUBE_API_KEY`: YouTube Data API key
- `INSTAGRAM_POST_DOC_ID`, `INSTAGRAM_APP_ID`: Instagram API config
- `BACKEND_CALLBACK_URL`, `BACKEND_API_KEY`: Callback endpoint config
- `SMB_*`: SMB file server settings (optional)

## Notes

- OCR functionality is currently disabled (noted with comments throughout)
- The service uses in-memory BytesIO streams for media processing
- Faster-Whisper runs on CPU with int8 quantization by default
- LLM responses are validated against Pydantic schemas using `response_json_schema`
### Key Components

**src/apis/**: FastAPI routers
- `place_router.py`: Main endpoint of the place extraction API

**src/services/**: Business logic
- `workflow.py`: Orchestration of the main extraction pipeline
- `content_router.py`: Routes to the appropriate downloader according to platform/content type
- `background_tasks.py`: Asynchronous task execution and callback handling
- `smb_service.py`: SMB file server integration

**src/services/modules/**: Processing modules
- `llm.py`: Gemini API integration for place extraction
- `stt.py`: Faster-Whisper speech-to-text conversion

**src/services/preprocess/**: Media preprocessing
- `sns.py`: Instagram/YouTube content download (yt-dlp)
- `audio.py`: FFmpeg audio extraction
- `video.py`: Video frame extraction (OCR currently disabled)

**src/models/**: Pydantic schemas
- `ExtractionState`: TypedDict that is passed through the pipeline and accumulates data at each stage

**src/core/**: Configuration and utilities
- `config.py`: Loads settings from .env (API keys, SMB settings, etc.)
- `exceptions.py`: CustomError class for pipeline errors

### State Flow Pattern
The pipeline uses `ExtractionState` (TypedDict) as a mutable state object that passes through each processing stage. Each stage updates specific fields:
- `contentStream`/`imageStream`: Downloaded media
- `captionText`: Post caption/description
- `audioStream`: Extracted audio
- `transcriptionText`: STT output
- `ocrText`: Video text (currently disabled)
- `result`: Final extracted places
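
The state flow pattern can be illustrated with a small, self-contained sketch. The field names are taken from the list above, but the field types, the two stage functions, and `run_stages` are assumptions; the real `ExtractionState` and `workflow.py` may differ.

```python
from io import BytesIO
from typing import TypedDict


class ExtractionState(TypedDict, total=False):
    # Field names follow the list above; the types are assumptions.
    contentStream: BytesIO
    captionText: str
    audioStream: BytesIO
    transcriptionText: str
    result: list


def download_stage(state: ExtractionState) -> ExtractionState:
    # A real stage would download media; this one fills in fake data.
    state["contentStream"] = BytesIO(b"fake-video-bytes")
    state["captionText"] = "Great cafe in Seoul"
    return state


def transcription_stage(state: ExtractionState) -> ExtractionState:
    # A real stage would run STT on the audio; this one fakes the output.
    state["transcriptionText"] = "we visited a cafe"
    return state


def run_stages(state: ExtractionState, stages) -> ExtractionState:
    # Each stage receives the same dict and adds the fields it owns.
    for stage in stages:
        state = stage(state)
    return state
```

Declaring the TypedDict with `total=False` lets the state start empty and gain fields stage by stage, which matches the accumulate-as-you-go description.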

## Configuration

Environment variables required in `.env`:
- `GOOGLE_API_KEY`: Gemini API key
- `AI_SERVER_API_KEY`: API key for this service
- `YOUTUBE_API_KEY`: YouTube Data API key
- `INSTAGRAM_POST_DOC_ID`, `INSTAGRAM_APP_ID`: Instagram API settings
- `BACKEND_CALLBACK_URL`, `BACKEND_API_KEY`: Callback endpoint settings
- `SMB_*`: SMB file server settings (optional)
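
A quick way to check that the required variables are present is sketched below. It is a standalone helper using only `os.environ`, not the repository's `config.py` (which loads settings from `.env`); `find_missing_variables` is a hypothetical name, and the optional `SMB_*` settings are deliberately left out.

```python
import os

# Required names as listed above; SMB_* is optional and therefore omitted.
REQUIRED_VARIABLES = (
    "GOOGLE_API_KEY",
    "AI_SERVER_API_KEY",
    "YOUTUBE_API_KEY",
    "INSTAGRAM_POST_DOC_ID",
    "INSTAGRAM_APP_ID",
    "BACKEND_CALLBACK_URL",
    "BACKEND_API_KEY",
)


def find_missing_variables(environment=None) -> list:
    """Return the required variable names that are unset or empty."""
    environment = os.environ if environment is None else environment
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]
```

Calling `find_missing_variables()` with no argument inspects the current process environment; passing a dict makes the check easy to test.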

## Notes

- OCR functionality is currently disabled (marked with comments throughout the code)
- The service uses in-memory BytesIO streams for media processing
- Faster-Whisper runs on CPU with int8 quantization by default
- LLM responses are validated against Pydantic schemas using `response_json_schema`
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
# MapSee-AI

<!-- Do not edit; this is synchronized automatically -->
## Latest version : v0.0.0
## Latest version : v0.0.4 (2026-01-11)

[View the full version history](CHANGELOG.md)

1 change: 1 addition & 0 deletions pyproject.toml
@@ -10,6 +10,7 @@ dependencies = [
"google-genai>=1.39.1",
"imagehash>=4.3.2",
"pillow>=12.0.0",
"playwright>=1.49.0",
"pydantic-settings>=2.11.0",
"uvicorn[standard]>=0.37.0",
"yt-dlp>=2025.10.22",
35 changes: 35 additions & 0 deletions src/apis/test_router.py
@@ -0,0 +1,35 @@
"""src.apis.test_router
Test API router - for testing SNS scraping
"""
import logging
from fastapi import APIRouter
from pydantic import BaseModel

from src.services.scraper.scrape_router import route_and_scrape

logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/test", tags=["Test API"])


class ScrapeRequest(BaseModel):
    url: str


@router.post("/scrape", status_code=200)
async def scrape_url(request: ScrapeRequest):
    """
    Scrape metadata from an SNS URL with Playwright

    - POST /api/test/scrape
    - Body: {"url": "https://www.instagram.com/p/..."}
    - Success: 200 + metadata
    - Failure: 4xx/5xx + error message
    """
    logger.info(f"Scrape request: {request.url}")
    return await route_and_scrape(request.url)


@router.get("/health", status_code=200)
async def health_check():
    """Check the status of the scraping test API"""
    return {"status": "ok"}
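
For trying the new endpoint by hand, a request could be built as sketched below. This assumes the development server from CLAUDE.md is running locally on port 8001; `BASE_URL` and `build_scrape_request` are illustrative names, and the function only constructs the request without sending it.

```python
import json
from urllib.request import Request

# Assumes the dev server from CLAUDE.md is running locally on port 8001.
BASE_URL = "http://localhost:8001"


def build_scrape_request(target_url: str) -> Request:
    """Build (but do not send) the POST request for /api/test/scrape."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/test/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. A 4xx/5xx reply surfaces as `urllib.error.HTTPError`, which fits the rule that failure is signalled by the status code.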
2 changes: 2 additions & 0 deletions src/main.py
@@ -7,6 +7,7 @@
from fastapi import FastAPI, Request
from src.core.logging import setup_logging
from src.apis.place_router import router as place_router
from src.apis.test_router import router as test_router

# Initialize logging
setup_logging(log_level="INFO")
@@ -45,6 +46,7 @@ async def lifespan(app: FastAPI):

# Register routers
app.include_router(place_router)
app.include_router(test_router)


@app.middleware("http")
6 changes: 6 additions & 0 deletions src/services/scraper/__init__.py
@@ -0,0 +1,6 @@
"""src.services.scraper
SNS scraping service package
"""
from src.services.scraper.scrape_router import route_and_scrape

__all__ = ["route_and_scrape"]
7 changes: 7 additions & 0 deletions src/services/scraper/platforms/__init__.py
@@ -0,0 +1,7 @@
"""src.services.scraper.platforms
Per-platform scraper package
"""
from src.services.scraper.platforms.instagram_scraper import InstagramScraper
from src.services.scraper.platforms.youtube_scraper import YouTubeScraper

__all__ = ["InstagramScraper", "YouTubeScraper"]