InsightLens is a dual-pipeline deep learning tool designed to audit social media content. It analyzes image aesthetics and caption engagement potential to provide creators with a pre-publish quality check.
- Visual Audit: assesses the technical quality of an image (sharpness, exposure, composition) using MobileNetV2.
- Caption Audit: estimates the engagement potential of caption text using a Bidirectional LSTM.
- Full-Stack Deployment: Served via FastAPI with a vanilla JS/CSS frontend for zero-dependency usage.
InsightLens does not use a multimodal approach (like CLIP). Instead, it uses a decoupled architecture to evaluate visual and textual signals independently.
Image Quality Model (Visual Audit)
- Model: MobileNetV2 (Transfer Learning).
- Training Data: KonIQ-10k (10,073 images with human quality ratings).
- Target: Regresses a Mean Opinion Score (MOS) normalized to 0-100.
- Focus: It judges technical aesthetics, not content context (e.g., a high-quality photo of a wall scores higher than a blurry photo of a celebrity). A minimal model sketch follows.
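As a rough illustration, the vision pipeline can be sketched in Keras as a frozen MobileNetV2 backbone with a small regression head. The head width and hyperparameters below are assumptions for illustration, not the project's actual training configuration:

```python
# Hypothetical sketch of the vision model: frozen MobileNetV2 backbone
# plus a single-unit regression head for the 0-100 MOS target.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: reuse ImageNet features

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),  # assumed head width
    layers.Dense(1),                       # regressed MOS (0-100)
])
model.compile(optimizer="adam", loss="mae", metrics=["mae"])
```

MAE is a natural training loss here, since the reported metric (see Performance below) is MAE on the same 0-100 scale.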
Caption Engagement Model (Caption Audit)
- Model: Embedding Layer + Bidirectional LSTM.
- Training Data: Twitter Engagement Dataset (~120k tweets).
- Target: Classification (Low/Medium/High engagement) based on 'Like' counts.
- Focus: It judges lexical patterns associated with high engagement, not semantic coherence with the image. A minimal model sketch follows.
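A comparable sketch of the text pipeline, assuming an illustrative vocabulary size, sequence length, and layer widths (none of these are confirmed by the project):

```python
# Hypothetical sketch of the text model: Embedding + Bidirectional LSTM
# with a 3-way softmax over Low/Medium/High engagement classes.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN = 20_000, 64  # illustrative assumptions

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, 128)(inputs)
x = layers.Bidirectional(layers.LSTM(64))(x)  # reads the caption in both directions
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```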
- No Contextual Awareness: The system does not know if the caption matches the image. A picture of a cat with the caption "Nice car" will be scored purely on the photo quality of the cat and the engagement history of the words "Nice car".
- Twitter Bias: The text model is trained on Twitter data. It prioritizes "Like-getting" patterns (short, punchy text) over "Comment-getting" patterns (questions).
- Subjectivity: "Quality" is subjective. The Image model predicts the average human opinion (MOS), which may not align with specific artistic styles.
KonIQ-10k (Image Quality)
Contains:
- 10,073 natural images
- Human-rated aesthetic/quality scores (MOS)
- Large diversity in lighting, noise, and composition

Purpose in this project: train a model to estimate aesthetic quality in a way that aligns with human judgment.
Twitter Engagement Dataset (Text)
Contains:
- Captions/tweets
- Associated engagement metrics (likes, retweets, replies)

Used to derive: Low, Medium, or High engagement classes based on percentile thresholds (see the binning sketch below).
Purpose in this project: teach the model to recognize textual patterns that correlate with user engagement.
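For concreteness, percentile binning might look like the following. The 33rd/66th percentile cutoffs and the toy data are assumptions; the README does not state the exact thresholds used in training:

```python
# Illustrative percentile binning of like counts into engagement classes.
import numpy as np

like_counts = np.array([3, 15, 240, 7, 98, 1200])  # toy data

# Assumed cutoffs at the 33rd and 66th percentiles of the training set.
low_cut, high_cut = np.percentile(like_counts, [33, 66])

def engagement_class(likes: int) -> str:
    """Map a raw like count to a Low/Medium/High label."""
    if likes <= low_cut:
        return "Low"
    if likes <= high_cut:
        return "Medium"
    return "High"

print([engagement_class(n) for n in like_counts])
```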
Requirements:
- Python 3.8+
- TensorFlow 2.x
```bash
# Clone the repository
git clone https://github.com/Mugilan1309/insightlens.git
cd insightlens

# Install dependencies
pip install -r requirements.txt
```
- Run the application. We use FastAPI to serve the models:
```bash
python src/api.py
```
Open your browser to http://localhost:8000.
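For reference, a minimal FastAPI server in the spirit of src/api.py might look like this. The endpoint names, request shapes, and returned fields are illustrative assumptions, not the repository's actual API:

```python
# Minimal sketch of a two-endpoint FastAPI server (hypothetical routes).
# File uploads additionally require the python-multipart package.
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="InsightLens")

class Caption(BaseModel):
    text: str

@app.post("/score-image")
async def score_image(file: UploadFile = File(...)):
    image_bytes = await file.read()
    # ...preprocess image_bytes and run the MobileNetV2 regressor here...
    return {"mos": 72.4}  # placeholder score

@app.post("/score-caption")
def score_caption(caption: Caption):
    # ...tokenize caption.text and run the BiLSTM classifier here...
    return {"engagement": "High"}  # placeholder class

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```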
📂 Project Structure
```
InsightLens/
├── models/                    # Pre-trained .h5 artifacts
├── src/
│   ├── api.py                 # FastAPI Backend
│   ├── train_image_model.py   # Vision Training Pipeline
│   └── train_text_model.py    # NLP Training Pipeline
├── public/                    # Frontend (HTML/JS/CSS)
└── README.md
```
📊 Performance
Vision Model: MAE ~10.6 (on a 0-100 scale).
Text Model: Accuracy ~56% (vs. a 33% random baseline) on 3-class engagement binning.
NOTE: InsightLens currently evaluates visual aesthetic quality and textual engagement potential as separate modules. In future work, these could be fused for deeper content quality analysis.
Author: Mugilan Y