Oskarowski/sentimentator

🎥 Sentimentator - Review Sentiment Analyzer

🧠 Overview

Sentimentator is a distributed, event-driven system that automatically analyzes smartphone video reviews from YouTube, Instagram, and TikTok and produces a synthesized sentiment summary of opinions about individual device features (e.g., camera, battery, screen, performance).

๐ŸŒ High-Level Architecture

The system is composed of several microservices that communicate asynchronously via a message queue.

graph TB
    subgraph L1["Client Layer"]
        User[👤 User]
        Browser[🌐 Web Browser]
        User --> Browser
    end

    subgraph L2["Frontend Layer"]
        direction TB
            Frontend[Vue.js Frontend<br/>• URL Submission<br/>• Results Display]
    end

    subgraph L3["API Gateway Layer"]
        direction TB
        Orchestrator[C# Core Orchestrator<br/>• REST API<br/>• Job Management<br/>• Auth & Validation]
    end

    subgraph L4["Message Queue Layer"]
        direction TB
        RabbitMQ[🐰 RabbitMQ<br/>Port: 5672<br/>Management: 15672<br/>• Event Routing<br/>• Async Communication]
    end

    subgraph L5["Processing Services"]
        VideoProc[Go Video Processor<br/>• Multi-Platform Extraction<br/>• Transcript Generation<br/>• DLQ Error Handling<br/>• Admin Web UI]
        NLP[Python NLP Analyzer<br/>• Feature Detection<br/>• Sentiment Analysis]
    end

    subgraph L6["External APIs"]
        YouTube[📺 YouTube]
        Instagram[📺 Instagram]
        TikTok[📺 TikTok]
        Whisper[🎙️ OpenAI Whisper API<br/>Speech-to-Text]
    end

    subgraph L7["Data Layer"]
        OrchestratorDB[(🗄️ PostgreSQL<br/>orchestrator DB<br/>──────────<br/>• Analysis<br/>• User & Identity)]
        VideoProcDB[(🗄️ PostgreSQL<br/>video_processing DB<br/>──────────<br/>• Content Metadata<br/>• Transcripts)]
    end

    Browser --> Frontend
    Frontend <-->|REST API| Orchestrator

    Orchestrator <-->|EF Core| OrchestratorDB
    Orchestrator -->|Publish: video.transcript.requested| RabbitMQ
    RabbitMQ -->|Consume: video.analysis.completed| Orchestrator

    RabbitMQ -->|Consume: video.transcript.requested| VideoProc
    VideoProc <-->|GORM| VideoProcDB
    VideoProc -->|Extract| YouTube
    VideoProc -->|Extract| Instagram
    VideoProc -->|Extract| TikTok
    VideoProc -->|Transcribe| Whisper
    VideoProc -->|Publish: video.transcript.completed| RabbitMQ
    VideoProc -->|DLQ: Failed Messages| VideoProcDB

    RabbitMQ -->|Consume: video.transcript.completed| NLP
    NLP -->|Publish: video.analysis.completed| RabbitMQ

    style Frontend fill:#42b883
    style Orchestrator fill:#9b42f5
    style VideoProc fill:#00add8
    style NLP fill:#3776ab
    style RabbitMQ fill:#ff6600
    style OrchestratorDB fill:#336791
    style VideoProcDB fill:#336791
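The diagram lists "DLQ Error Handling" for the Video Processor and routes failed messages to its database. The following is a minimal Python sketch of that idea, not the project's Go implementation: `MAX_RETRIES`, `DeadLetterStore`, and `handle_with_dlq` are hypothetical names, and the retry limit is an assumption. The dead-letter record reuses the field names of the Transcript Failed payload from the Event Details table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

MAX_RETRIES = 3  # assumption: the real limit is not stated in this README


@dataclass
class DeadLetterStore:
    """In-memory stand-in for wherever failed messages are persisted."""
    records: list = field(default_factory=list)


def handle_with_dlq(message: dict, handler: Callable[[dict], None],
                    dlq: DeadLetterStore) -> bool:
    """Run handler up to MAX_RETRIES times; dead-letter the message if all fail."""
    last_error = ""
    for _ in range(MAX_RETRIES):
        try:
            handler(message)
            return True
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = str(exc)
    # Every attempt failed: record the failure instead of dropping the message.
    dlq.records.append({
        "analysis_id": message["analysis_id"],
        "error": last_error,
        "retry_count": MAX_RETRIES,
        "failed_at": datetime.now(timezone.utc).isoformat(),
    })
    return False
```

A real consumer would likely add a delay between attempts and rely on the broker's dead-letter exchange rather than an in-process list.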

📈 System Flow Overview

This sequence diagram illustrates the detailed, step-by-step flow of a request through the system.

sequenceDiagram
    participant User
    participant Frontend as Vue.js Frontend
    participant Orchestrator as Core Orchestrator (C#)
    participant Exchange as RabbitMQ Exchange<br/>(sentimentator.topic)
    participant Q1 as Queue: video.processing
    participant VideoProcessor as Video Processor (Go)
    participant YouTube
    participant WhisperAPI as OpenAI Whisper API
    participant Q2 as Queue: video.transcript
    participant NlpAnalyzer as NLP Analyzer (Python)
    participant Q3 as Queue: video.analysis
    participant Q4 as Queue: video.metadata
    participant Database as PostgreSQL

    User->>Frontend: Paste YouTube URL
    Frontend->>Orchestrator: POST /analyses/{url}
    Orchestrator->>Database: Create Analysis record
    Orchestrator->>Exchange: Publish video.transcript.requested
    Exchange->>Q1: Route to video.processing
    Q1->>VideoProcessor: Consume message

    VideoProcessor->>YouTube: Fetch metadata + download audio via yt-dlp
    YouTube-->>VideoProcessor: Metadata + audio stream/file
    VideoProcessor->>Exchange: Publish video.metadata.completed
    Exchange->>Q4: Route to video.metadata
    Q4->>Orchestrator: Consume message
    Orchestrator->>Database: Update analysis with video metadata
    VideoProcessor->>WhisperAPI: Request transcription
    WhisperAPI-->>VideoProcessor: Return transcript text
    VideoProcessor->>Database: Store video & transcript
    VideoProcessor->>Exchange: Publish video.transcript.completed
    Exchange->>Q2: Route to video.transcript

    Q2->>NlpAnalyzer: Consume message
    NlpAnalyzer->>Database: Fetch transcript
    NlpAnalyzer->>NlpAnalyzer: Perform NLP + Sentiment Analysis
    NlpAnalyzer->>Exchange: Publish video.analysis.completed
    Exchange->>Q3: Route to video.analysis

    Q3->>Orchestrator: Consume message
    Orchestrator->>Database: Update analysis with results

    User->>Frontend: Poll for results
    Frontend->>Orchestrator: GET /analyses/{id}
    Orchestrator->>Database: Query results
    Database-->>Orchestrator: Return results
    Orchestrator-->>Frontend: Return analysis
    Frontend-->>User: Display sentiment results
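The last steps of the sequence have the client poll `GET /analyses/{id}` until results are available. A minimal Python sketch of such a polling loop follows. The `status` field and its values, as well as the function and parameter names, are assumptions for illustration; the README does not document the response shape. The HTTP call is injected as `fetch` so the loop can be exercised without a running Orchestrator.

```python
import time
from typing import Callable


def poll_analysis(fetch: Callable[[str], dict], analysis_id: str,
                  interval_s: float = 2.0, max_attempts: int = 30) -> dict:
    """Call fetch(analysis_id) until the analysis reaches a terminal state."""
    for attempt in range(max_attempts):
        analysis = fetch(analysis_id)
        # "status", "completed" and "failed" are assumed names, not a documented API.
        if analysis.get("status") in ("completed", "failed"):
            return analysis
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(
        f"analysis {analysis_id} not finished after {max_attempts} attempts"
    )
```

In practice `fetch` would wrap an HTTP GET against the Orchestrator; a fixed interval keeps the sketch short, though a backoff may be kinder to the API.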

📨 RabbitMQ Event Architecture

This diagram shows the detailed RabbitMQ topology with exchanges, queues, routing keys, and event flows:

graph TB
    subgraph "Publishers"
        Orchestrator[C# Core Orchestrator]
        VideoProc[Go Video Processor]
        NLP[Python NLP Analyzer]
    end

    subgraph "RabbitMQ Broker"
        Exchange[📮 Topic Exchange<br/>`sentimentator.topic`]

        subgraph "Queues"
            Q1[📦 video.processing<br/>Bound: video.transcript.requested]
            Q2[📦 video.transcript<br/>Bound: video.transcript.completed]
            Q3[📦 video.analysis<br/>Bound: video.analysis.completed]
            Q4[📦 video.metadata<br/>Bound: video.metadata.completed]
        end
    end

    subgraph "Consumers"
        VideoProcConsumer[Go Video Processor]
        NLPConsumer[Python NLP Analyzer]
        OrchestratorConsumer[C# Core Orchestrator]
    end

    Orchestrator -->|"Publish<br/>video.transcript.requested"| Exchange
    Exchange -->|"Route by key"| Q1
    Q1 --> VideoProcConsumer

    VideoProc -->|"Publish<br/>video.transcript.completed"| Exchange
    Exchange -->|"Route by key"| Q2
    Q2 --> NLPConsumer

    NLP -->|"Publish<br/>video.analysis.completed"| Exchange
    Exchange -->|"Route by key"| Q3
    Q3 --> OrchestratorConsumer

    VideoProc -->|"Publish<br/>video.metadata.completed"| Exchange
    Exchange -->|"Route by key"| Q4
    Q4 --> OrchestratorConsumer

    style Exchange fill:#ff6600,color:#fff
    style Q1 fill:#00add8,color:#fff
    style Q2 fill:#3776ab,color:#fff
    style Q3 fill:#9b42f5,color:#fff
    style Q4 fill:#9b42f5,color:#fff
    style Orchestrator fill:#9b42f5,color:#fff
    style VideoProc fill:#00add8,color:#fff
    style VideoProcConsumer fill:#00add8,color:#fff
    style NLP fill:#3776ab,color:#fff
    style NLPConsumer fill:#3776ab,color:#fff
    style OrchestratorConsumer fill:#9b42f5,color:#fff
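To make the "Route by key" edges concrete, here is a small pure-Python model of how a topic exchange selects queues for a routing key. It is an illustrative sketch, not broker code: the `BINDINGS` table copies the queue bindings from the diagram, while `topic_matches` and `route` are hypothetical helper names. The matcher follows the usual topic-exchange wildcard rules, where `*` stands for exactly one word and `#` for zero or more words.

```python
# Queue name -> binding key, as shown in the topology diagram.
BINDINGS = {
    "video.processing": "video.transcript.requested",
    "video.transcript": "video.transcript.completed",
    "video.analysis": "video.analysis.completed",
    "video.metadata": "video.metadata.completed",
}


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a dot-separated routing key matches a binding pattern."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)  # pattern consumed: key must be consumed too
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route(routing_key: str) -> list[str]:
    """List the queues whose binding key matches the routing key."""
    return [q for q, pattern in BINDINGS.items() if topic_matches(pattern, routing_key)]
```

With the exact-match bindings above, each event lands in one queue. A binding such as `video.#` (hypothetical, not used in the diagram) would receive every event.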

Event Details

| Event Name | Routing Key | Publisher | Queue | Consumer | Payload |
| --- | --- | --- | --- | --- | --- |
| Transcript Requested | `video.transcript.requested` | C# Orchestrator | video.processing | Go Video Processor | `{analysis_id, url, timestamp, open_ai_user_key}` |
| Transcript Completed | `video.transcript.completed` | Go Video Processor | video.transcript | Python NLP Analyzer | `{analysis_id, video_id, transcript_id, text, language, duration, created_at}` |
| Transcript Failed | `video.transcript.failed` | Go Video Processor | N/A (DLQ) | C# Orchestrator | `{analysis_id, error, retry_count, failed_at}` |
| Analysis Completed | `video.analysis.completed` | Python NLP Analyzer | video.analysis | C# Orchestrator | ??? |
| Metadata Completed | `video.metadata.completed` | Go Video Processor | video.metadata | C# Orchestrator | `{analysis_id, platform, title, thumbnail_url}` |
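As an illustration of the payloads listed above, the Transcript Requested event could be modelled and serialized roughly as follows. This is a sketch under assumptions: the field names come from the table, but the field types, the JSON encoding, and the names `TranscriptRequested`, `encode_event`, and `decode_event` are hypothetical and not taken from the services' code.

```python
import json
from dataclasses import asdict, dataclass

ROUTING_KEY = "video.transcript.requested"


@dataclass(frozen=True)
class TranscriptRequested:
    # Field names follow the Event Details table; the types are assumptions.
    analysis_id: str
    url: str
    timestamp: str  # assumed to be an ISO 8601 string
    open_ai_user_key: str


def encode_event(event: TranscriptRequested) -> bytes:
    """Serialize the event to a UTF-8 JSON message body."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode_event(body: bytes) -> TranscriptRequested:
    """Parse a message body; missing or unexpected fields raise TypeError."""
    return TranscriptRequested(**json.loads(body))
```

Keeping the payload in one typed structure per event makes it easier for the C#, Go, and Python services to agree on field names.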

🔄 Data Flow Diagram

This diagram illustrates the data transformation pipeline:

flowchart LR
    A[📹 YouTube/Instagram/TikTok URL] --> B[🎵 Audio Extraction]
    B --> C[📝 Raw Transcript]
    C --> D[🔍 Feature Detection]
    D --> E[💭 Sentiment Analysis]
    E --> F[📊 Structured Results]

    subgraph "Video Processing Service"
        B
        C
    end

    subgraph "NLP Analyzer Service"
        D
        E
    end

    subgraph "Core Orchestrator"
        F
    end

    F --> G[(PostgreSQL)]
    F --> H[📱 Frontend Display]

    style A fill:#ff0000
    style C fill:#ffeb3b
    style F fill:#4caf50
    style G fill:#336791
    style H fill:#42b883
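The transcript-to-results half of this pipeline (Feature Detection, then Sentiment Analysis, then Structured Results) can be sketched with a toy example. This is a deliberately naive keyword and word-list approach chosen for brevity; it is not the NLP Analyzer's actual method, which this README does not describe. The feature names come from the Overview, and the word lists and function names are made up for illustration.

```python
FEATURES = ("camera", "battery", "screen", "performance")
POSITIVE = {"great", "excellent", "good", "amazing"}  # illustrative word list
NEGATIVE = {"bad", "poor", "terrible", "weak"}        # illustrative word list


def detect_features(transcript: str) -> dict[str, list[str]]:
    """Feature Detection: map each feature to the sentences mentioning it."""
    sentences = [s.strip() for s in transcript.lower().split(".") if s.strip()]
    return {f: [s for s in sentences if f in s] for f in FEATURES}


def score_sentiment(sentences: list[str]) -> str:
    """Sentiment Analysis: count positive minus negative words."""
    score = 0
    for sentence in sentences:
        words = set(sentence.replace(",", " ").split())
        score += len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(transcript: str) -> dict[str, str]:
    """Structured Results: one sentiment label per mentioned feature."""
    mentions = detect_features(transcript)
    return {f: score_sentiment(s) for f, s in mentions.items() if s}
```

For example, `analyze("The camera is great. The battery is poor.")` labels the camera as positive and the battery as negative, and omits features that are never mentioned.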

📊 Technology Stack Overview

| Layer | Technologies |
| --- | --- |
| Frontend | Vue.js 3, Vite, Tailwind CSS v4, Pinia, OpenAPI/fetch |
| Backend API | C#, ASP.NET Core, Entity Framework Core |
| Video Processing | Go 1.25+, Gin, Templ, HTMX, GORM, yt-dlp, FFmpeg, OpenAI Whisper API, faster-whisper-server |
| NLP & ML | Python ??? |
| Message Queue | RabbitMQ 4.1.4 with Management Plugin, Dead Letter Exchange (DLX) |
| Databases | PostgreSQL 18 |
| DevOps | Docker, Docker Compose (CPU/GPU profiles) |
| External APIs | YouTube, Instagram & TikTok (via yt-dlp), OpenAI Whisper API |

๐Ÿค Contributing

Each service has its own README with detailed information:

About

Sentimentator is an event-driven microservices platform that automatically extracts and analyzes smartphone video reviews from platforms like YouTube, Instagram, and TikTok. It leverages tools like yt-dlp for media extraction and the OpenAI Whisper API to generate transcripts of the video content.
