Reddit Persona Generator

Built for the AI/LLM Engineer Internship Assignment @ BeyondChats

A local-first, end-to-end pipeline that transforms real Reddit activity into structured user personas, using the Reddit API, Python, and the Mistral LLM running via Ollama.


📖 Project Description

Imagine being able to understand any user deeply — just by reading how they talk online.

Reddit Persona Generator does exactly that.
It takes a Reddit profile, scrapes public posts/comments, and uses a local LLM to generate a structured persona — complete with:

  • Behavior patterns
  • Frustrations
  • Goals
  • Motivations
  • Tone
  • Interests

…and even citations for every insight from real Reddit activity.
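One way to picture the structured output is a small data model in which every insight carries the Reddit URLs that back it. The sketch below is an assumption for illustration only: the class names `Insight` and `Persona` and the `uncited` helper are hypothetical and are not taken from the repository, which saves personas as plain text files.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """One persona statement plus the Reddit URLs that support it."""
    text: str
    citations: list[str] = field(default_factory=list)


@dataclass
class Persona:
    """Hypothetical container for the six categories listed above."""
    username: str
    behavior_patterns: list[Insight] = field(default_factory=list)
    frustrations: list[Insight] = field(default_factory=list)
    goals: list[Insight] = field(default_factory=list)
    motivations: list[Insight] = field(default_factory=list)
    tone: list[Insight] = field(default_factory=list)
    interests: list[Insight] = field(default_factory=list)

    def uncited(self) -> list[str]:
        """Return the text of every insight that has no citation."""
        sections = (
            self.behavior_patterns, self.frustrations, self.goals,
            self.motivations, self.tone, self.interests,
        )
        return [i.text for section in sections for i in section if not i.citations]
```

A check like `persona.uncited()` could be used to flag insights the model produced without a supporting link.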


💡 Why This Project Matters

LLMs are powerful — but without real-world grounding, they're just words.
This project shows how LLMs can extract genuine human insight from everyday conversations online — for use cases like:

  • Personalized marketing
  • User segmentation
  • Behavioral analysis
  • Agent customization

And the best part? All LLM inference runs locally, using Mistral with Ollama. Only the Reddit scraping step needs network access.


Key Features

  • Full Reddit post + comment scraping via praw
  • Structured personas with 6 rich categories
  • Citations for every insight (Reddit URLs)
  • LLM-based generation using Mistral via Ollama (local)
  • Invalid profile handling (e.g. 404s)
  • Support for multiple personas per user (prompt depth demo)
  • PEP8-compliant, modular Python code
  • No OpenAI or cloud dependency
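As a rough sketch of the scraping step, the function below collects a user's recent posts and comments together with the URLs later used as citations. It assumes `redditor` behaves like the object returned by `praw.Reddit(...).redditor(name)`; the function name `collect_activity`, the record layout, and the `limit` default are illustrative assumptions, not the contents of `reddit_scraper.py`.

```python
REDDIT_BASE = "https://www.reddit.com"


def collect_activity(redditor, limit=50):
    """Gather a redditor's recent posts and comments as plain dicts.

    `redditor` is expected to behave like the object returned by
    `praw.Reddit(...).redditor(name)`.
    """
    records = []
    # Newest submissions first; PRAW permalinks are site-relative paths.
    for post in redditor.submissions.new(limit=limit):
        records.append({
            "kind": "post",
            "text": f"{post.title}\n{post.selftext}".strip(),
            "url": REDDIT_BASE + post.permalink,
        })
    for comment in redditor.comments.new(limit=limit):
        records.append({
            "kind": "comment",
            "text": comment.body,
            "url": REDDIT_BASE + comment.permalink,
        })
    return records
```

Keeping the URL next to each piece of text is what makes per-insight citations possible downstream.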

🔧 Setup Instructions

1. Clone the Repo

git clone https://github.com/Divyansh-git10/Reddit-Persona-Generator.git
cd Reddit-Persona-Generator

2. Install Dependencies

pip install -r requirements.txt

3. Add .env file with Reddit credentials

CLIENT_ID=your_client_id
CLIENT_SECRET=your_client_secret
USER_AGENT=your_user_agent

To create Reddit API keys, register a "script" app at https://www.reddit.com/prefs/apps.
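The three variables above map directly onto the keyword arguments of `praw.Reddit`. A minimal sketch of loading them, assuming `python-dotenv` and `praw` are installed; the helper names `reddit_kwargs` and `make_reddit_client` are hypothetical and may not match the repository's code.

```python
import os


def reddit_kwargs(env=None):
    """Map the .env variable names onto praw.Reddit keyword arguments."""
    env = os.environ if env is None else env
    mapping = {
        "client_id": "CLIENT_ID",
        "client_secret": "CLIENT_SECRET",
        "user_agent": "USER_AGENT",
    }
    missing = [name for name in mapping.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in mapping.items()}


def make_reddit_client():
    """Load .env and build a read-only PRAW client (needs praw, python-dotenv)."""
    # Imported lazily so reddit_kwargs() stays usable without these packages.
    import praw
    from dotenv import load_dotenv

    load_dotenv()  # locates a .env file and loads it into os.environ
    return praw.Reddit(**reddit_kwargs())
```

Failing early on a missing variable gives a clearer message than the authentication error Reddit would otherwise return.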

4. Start Ollama and pull the Mistral model

ollama run mistral
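Once the model is available, Ollama serves a local HTTP API on port 11434. The sketch below sends a prompt to its `/api/generate` endpoint using only the standard library. Whether the repository talks to Ollama this way, through the `ollama` Python package, or via the CLI is not stated here, so treat the function names and the timeout as assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="mistral"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="mistral", timeout=300):
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        # With "stream": False the server returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

Calling `generate("Say hello")` requires the Ollama server to be running locally with the `mistral` model pulled.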

5. Run the Script

Update username = "..." in main.py and run:

python main.py

Output is saved to output/<username>_persona.txt (for example, output/kojied_persona.txt).
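The output naming convention can be sketched with `pathlib`. The helper name `save_persona` and its `out_dir` parameter are assumptions for illustration; the repository's own code may organize this differently.

```python
from pathlib import Path


def save_persona(username, persona_text, out_dir="output"):
    """Write the persona to <out_dir>/<username>_persona.txt and return the path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create output/ on first run
    path = directory / f"{username}_persona.txt"
    path.write_text(persona_text, encoding="utf-8")
    return path
```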


Processed Reddit Users

Reddit Username        Persona Type
kojied                 Urban, reflective tech worker
Hungry-Move-6603       Civic voice, realism, political concern
St0rmCh4ser            Tech & crypto guide
drywallwizard          DIY expert, knowledge sharer
TalesOfTheLost (x2)    Spiritual / WWII historian
GallowBoob             Meme king, pet lover, humorous content creator

Multiple personas for the same user (TalesOfTheLost) demonstrate prompt adaptability and persona diversity.
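One plausible way to get different personas from the same activity is to vary the prompt. The sketch below adds an optional `focus` hint to an otherwise fixed prompt. The wording of the prompt, the `build_prompt` name, and the record shape (dicts with `text` and `url` keys) are all assumptions and do not reproduce the prompts actually used in `persona_generator.py`.

```python
CATEGORIES = [
    "Behavior patterns", "Frustrations", "Goals",
    "Motivations", "Tone", "Interests",
]


def build_prompt(username, records, focus=None):
    """Assemble a persona prompt from scraped records.

    `records` is a list of dicts with "text" and "url" keys (an assumed
    shape); `focus` optionally steers the model toward one theme.
    """
    lines = [
        f"Create a user persona for the Reddit user '{username}'.",
        "Cover these sections: " + ", ".join(CATEGORIES) + ".",
        "After every insight, cite the URL of the post or comment it is based on.",
    ]
    if focus:
        lines.append(f"Emphasize the user's activity related to: {focus}.")
    lines.append("")
    lines.append("Reddit activity:")
    # Number each record and keep its URL inline so the model can cite it.
    for number, record in enumerate(records, start=1):
        lines.append(f"[{number}] {record['text']} (source: {record['url']})")
    return "\n".join(lines)
```

Running the same records through `build_prompt` with two different `focus` values would produce two prompts, and potentially two distinct personas for one user.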


⚠️ Invalid Profile Handling

The script gracefully exits if a profile doesn't exist or has no data:

Error fetching data for MasterOfNone_92: received 404 HTTP response
No data found. Skipping persona generation.
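The control flow behind those two messages might look like the sketch below, where `fetch` stands in for whatever function scrapes a user's activity. The name `fetch_or_skip` is hypothetical, and catching a broad `Exception` is a simplification: a real script using PRAW would more likely catch the specific `prawcore` exceptions raised for missing profiles.

```python
def fetch_or_skip(fetch, username):
    """Call fetch(username); return its records, or None if nothing usable."""
    try:
        records = fetch(username)
    except Exception as exc:  # a real script might catch prawcore exceptions only
        print(f"Error fetching data for {username}: {exc}")
        records = []
    if not records:
        # Covers both failed lookups and valid profiles with no activity.
        print("No data found. Skipping persona generation.")
        return None
    return records
```

The caller can then skip the LLM step whenever `fetch_or_skip` returns `None`.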

File Structure

Reddit-Persona-Generator/
├── main.py
├── reddit_scraper.py
├── persona_generator.py
├── requirements.txt
├── README.md
├── .env                   # (excluded)
├── raw/
│   ├── kojied_raw.txt
│   └── ...
└── output/
    ├── kojied_persona.txt
    ├── TalesOfTheLost_spiritual.txt
    ├── GallowBoob_persona.txt
    └── ...

🤖 Tech Stack

  • Python 3.11+
  • praw for Reddit scraping
  • ollama for local model runtime
  • Mistral (7B open-weight LLM)
  • .env config via python-dotenv

🙌 Author

Divyansh Gautam
Internship Applicant @ BeyondChats


📌 Submission Context

This project was built in under 48 hours as part of the BeyondChats Generative AI Internship Assignment.

The goal was to demonstrate:

  • Real-world LLM application
  • Prompt design intuition
  • Local-first inference
  • Strong persona modeling

Thanks for the opportunity 🙏
