
📄 [Docs] Impact analysis of Instagram blocking on scraping with Playwright #6

@Cassiiopeia


Analysis of the Likelihood of Instagram Blocking Playwright Scraping

📌 Analysis Overview

This analysis examines the Instagram scraping block risks claimed in the SociaVault blog post ("How to Scrape Instagram Without Getting Blocked") and reviews how far they apply to the MapSee-AI project.

Analysis date: 2026-01-11
Related project: MapSee-AI (Instagram post metadata extraction)


🔍 Analysis of the SociaVault Blog Post

Nature of the post

  • Marketing content: written to sell the company's own API service (SociaVault)
  • Target readers: users who need bulk scraping
  • Bias: exaggerates the risks of DIY scraping to steer readers toward the API

Claimed reasons for blocking

| Blocking method | Description |
| --- | --- |
| Rate limiting | 429 errors on excessive requests |
| Fingerprinting | Browser/device detection |
| Behavior analysis | Detection of non-human patterns |
| IP reputation | Blocking of datacenter IPs |
| Session validation | Verification of login state |

Problems with the post

  1. It ignores differences in context

    • The post assumes full-profile crawling and bulk data collection
    • That is a completely different situation from processing a single post URL
  2. Its cost comparison is biased

    • DIY: quotes the maximum cost ($200-500/month)
    • API: quotes the minimum cost ($49-199/month)
  3. Its success rates are exaggerated

    • DIY reliability is presented as low, at 20-70%
    • API reliability is presented as high, at 99%+
    • In practice, reliability varies widely with the usage pattern

🎯 Current State of the MapSee-AI Project

Current implementation status

| Implementation | File location | Status | Purpose |
| --- | --- | --- | --- |
| yt-dlp | src/services/preprocess/sns.py | Active | Main pipeline |
| Playwright | src/services/scraper/ | Incomplete | Test API |

How the Playwright implementation (InstagramScraper) works

URL input → launch Playwright browser → load page → extract og meta tags → extract image URLs

Extracted data:

  • og:title, og:description, og:image, og:url (meta tags)
  • Author, caption, like count, comment count, hashtags (regex parsing)
  • CDN image URLs (DOM query)
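
The og meta tag and hashtag steps above can be illustrated without a browser. The following is a minimal sketch using only the Python standard library, not the project's actual `InstagramScraper` code; `OpenGraphParser`, `extract_og_tags`, `extract_hashtags`, and `SAMPLE_HTML` are hypothetical names and data invented for illustration.

```python
import re
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dict (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first occurrence of each og property.
            self.tags.setdefault(prop, attr["content"])


def extract_og_tags(html):
    """Return a dict mapping og property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def extract_hashtags(text):
    """Return hashtags without the leading '#'; \\w also matches non-ASCII letters."""
    return re.findall(r"#(\w+)", text)


# Made-up sample markup; real Instagram pages are assumed to carry similar tags.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example post" />
<meta property="og:description" content="A cafe in Seoul #cafe #seoul" />
<meta property="og:image" content="https://example.com/image.jpg" />
<meta property="og:url" content="https://www.instagram.com/p/XXXXX/" />
</head></html>
"""

og = extract_og_tags(SAMPLE_HTML)
hashtags = extract_hashtags(og["og:description"])
```

In a Playwright-based flow, the HTML string would presumably come from `page.content()` after the page has loaded; the parsing itself does not depend on the browser.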

Usage pattern characteristics

  • ✅ Accesses public posts only (no login required)
  • ✅ Processes a post only when a user provides its URL (no bulk requests)
  • ✅ og meta tags are public information that Instagram exposes intentionally
  • ✅ Processing is done one post at a time

📊 Block Likelihood Assessment

Risk matrix

| Evaluation item | Current implementation | Risk |
| --- | --- | --- |
| Access method | Single public post | 🟢 Low |
| Data extraction | og meta tags | 🟢 Very low |
| Request frequency | Only on user request | 🟢 Low |
| Browser detection | headless=True (default) | 🟡 Medium |
| Login requirement | Not required | 🟢 Low |

Overall assessment: low likelihood of being blocked

Reasons:

  1. Nature of og meta tags

    • Instagram provides them intentionally for sharing on Facebook/Twitter
    • Access is at the level of a link preview
    • Reading them is closer to "looking up public information" than to "scraping"
  2. Request pattern

    • Sporadic requests, not dozens per minute
    • Driven by user actions (not automated bulk collection)
  3. No login required

    • No attempt to access private content
    • No account that could be tracked

⚠️ Scenarios That Could Lead to Blocking

The risk of being blocked increases in the following situations:

| Scenario | Risk | Mitigation |
| --- | --- | --- |
| 50+ requests per minute | 🔴 High | Implement rate limiting |
| Sustained requests from the same IP | 🟡 Medium | Random delay between requests |
| Headless browser detection | 🟡 Medium | Apply stealth mode |
| Use of a datacenter IP | 🔴 High | Use a residential proxy |
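
As one way to read the "implement rate limiting" mitigation, here is a minimal sliding-window limiter sketch that caps how many outbound requests are allowed per time window. `SlidingWindowLimiter` and the 10-per-minute figure are assumptions chosen for illustration, not values taken from the project or from Instagram.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True


# Hypothetical budget: 10 scrape requests per 60 seconds, well under 50 per minute.
limiter = SlidingWindowLimiter(max_requests=10, window_seconds=60)
```

A caller would check `limiter.try_acquire()` before each scrape and either queue or reject the request when it returns `False`.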

🔧 Recommendations

1. Test the current implementation first

# How to test
# 1. Start the server
uv run uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload

# 2. Call the test API (10-20 posts)
curl -X POST http://localhost:8001/api/test/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.instagram.com/p/XXXXX/"}'

What to check:

  • Whether 429 errors occur on consecutive requests
  • Whether the response data is extracted correctly
  • How often page loads fail
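
The three checks above could be tallied with a small script instead of by hand. This is a sketch under assumptions: it targets the test endpoint from the curl example, and it assumes the API returns a JSON body with status 200 on success and surfaces rate limiting as HTTP 429, neither of which is confirmed here. `post_scrape` and `run_check` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "http://localhost:8001/api/test/scrape"  # taken from the curl example


def post_scrape(url, endpoint=ENDPOINT, timeout=60):
    """POST one post URL to the test API; returns (status_code, parsed_body_or_None)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # Non-2xx responses (including 429) arrive here.
        return error.code, None


def run_check(urls, send=post_scrape):
    """Send each URL in order and count successes, 429s, and other failures."""
    summary = {"total": 0, "ok": 0, "rate_limited": 0, "failed": 0}
    for url in urls:
        summary["total"] += 1
        try:
            status, body = send(url)
        except OSError:
            # Connection errors and timeouts are counted as load failures.
            summary["failed"] += 1
            continue
        if status == 429:
            summary["rate_limited"] += 1
        elif status == 200 and body:
            summary["ok"] += 1
        else:
            summary["failed"] += 1
    return summary
```

Calling `run_check` with a list of 10-20 public post URLs against the running server returns the counts as a dict; passing a different `send` callable allows a dry run without network access.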

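2. Add anti-detection measures if problems occur

Add the following logic only if it becomes necessary:

# 1. Add a random delay
import asyncio
import random

async def scrape_with_delay(url):
    # `scrape` stands for the project's existing async scraping coroutine
    await asyncio.sleep(random.uniform(2, 5))  # wait 2-5 seconds
    return await scrape(url)

# 2. Apply stealth mode (playwright-stealth)
# 3. Use a realistic User-Agent
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync  # API of playwright-stealth 1.x

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    # Set the User-Agent on the context so that pages created from it inherit it
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ..."
    )
    page = context.new_page()
    stealth_sync(page)  # patch the page before navigating
    browser.close()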

3. Consider alternatives (if problems persist)

| Alternative | Pros | Cons |
| --- | --- | --- |
| Keep yt-dlp | Already implemented, stable | Limited metadata |
| Instagram Basic Display API | Official API | Requires user authentication |
| oEmbed API | Official, free | Limited data |

📌 Conclusion

Assessment of the SociaVault post

  • Useful information for bulk scraping scenarios
  • Overstates the risk for single-post processing
  • Biased content written for marketing purposes

Recommendations for MapSee-AI

  1. Test first - test 10-20 posts with the current implementation
  2. Respond if problems occur - add anti-detection logic
  3. No need for heavy preemptive measures - the usage pattern makes blocking unlikely

Key takeaway

Reading Instagram's og meta tags is closer to a "link preview" than to "scraping".
When processing single post URLs, the likelihood of being blocked is low,
and it is more efficient to respond only once a problem actually occurs.


Author: Claude (AI Assistant)
Reference: SociaVault Blog - "How to Scrape Instagram Without Getting Blocked (2025 Guide)"
