Sentio+ is an AI-powered decision-support platform that transforms large-scale, unstructured customer review data into actionable business insights using a Retrieval-Augmented Generation (RAG) architecture. It is designed as an internal intelligence tool for Product, CX, Strategy, and Leadership teams to understand why customers feel the way they do and what actions should be taken as a result.
Unlike traditional sentiment dashboards that stop at positive vs. negative classification, Sentio+ enables aspect-level reasoning over customer feedback, grounding every insight directly in real review evidence.
Most sentiment analysis systems answer what customers feel, but fail to explain:
- Why customers are dissatisfied
- Which product aspects (e.g., usability, payments, performance, pricing) are driving ratings
- What teams should fix first to improve outcomes
Sentio+ addresses this gap by combining:
- Structured sentiment signals (ratings, categories, segments)
- Semantic retrieval over raw review text
- LLM-based synthesis grounded in retrieved evidence
This allows teams to ask business-critical questions such as:
- "What usability issues are driving 1-star reviews in Finance apps?"
- "How have payment-related complaints evolved over the last 6 months?"
- "What features do users in the 'Everyone' content segment value most?"
The system translates unstructured feedback into decision-ready insights that inform product prioritization, roadmap planning, and customer experience improvements.
Sentio+ follows a modular Retrieval-Augmented Generation (RAG) architecture designed for scalability, traceability, and business interpretability. The system is structured to ensure that every generated insight is grounded in real customer evidence and aligned with decision-making needs.
Data → Embeddings → Retrieval → LLM Reasoning → Business Insight
- Evidence-first answers (no hallucinated insights)
- Clear separation of preprocessing, retrieval, and generation
- Metadata-aware retrieval for precise filtering and analysis
- Python for ETL and preprocessing logic
- Pandas / NumPy for data cleaning and transformation
- Jupyter Notebooks (`/notebooks`) for exploration, validation, and iterative development
- KaggleHub for dataset retrieval
- ChromaDB for vector persistence and semantic search
- Metadata-aware indexing (category, rating, date, segment)
- Cosine similarity–based retrieval
- AWS Bedrock LLMs for grounded text generation
- Retrieval-Augmented Generation (RAG) with top-K semantic search
- Two-step reasoning: retrieval → synthesis
- Python RAG services for embedding, retrieval, and response assembly
- API-based LLM invocation
- Next.js web application (`/web`)
- Chat-style interface for natural language queries
- Amazon S3 for raw and processed dataset storage
- Local execution for development with cloud-based model services
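The retrieval → synthesis step listed above can be illustrated with a small prompt-assembly sketch. It only shows how retrieved reviews might be packed into a grounded prompt; the function name `build_grounded_prompt`, the prompt wording, and the `review_id` / `text` keys are assumptions made for illustration, and the actual Bedrock invocation is left out.

```python
def build_grounded_prompt(question: str, reviews: list[dict]) -> str:
    """Assemble a prompt that restricts the model to the retrieved reviews.

    Each review is a dict with hypothetical keys `review_id` and `text`.
    """
    # One evidence line per review, prefixed with its id so the model can cite it.
    evidence = "\n".join(f"[{r['review_id']}] {r['text']}" for r in reviews)
    return (
        "Answer the question using only the customer reviews below.\n"
        "Cite the review_id in square brackets after each claim.\n"
        "If the reviews do not contain the answer, say so.\n\n"
        f"Reviews:\n{evidence}\n\n"
        f"Question: {question}\n"
    )
```

Keeping prompt assembly in its own function keeps retrieval and generation separate, so the prompt can be inspected or logged before any model call is made.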
Sentio+ is powered by the Google Play Store Reviews dataset.
- Source: Kaggle – Google Play Market Reviews
- Scale: ~1M reviews across ~500 app titles (subset used during prototyping)
- Sampling: 50,000 reviews
- Key Fields:
  - Review text
  - Star rating (1–5)
  - App category
  - Review date
  - Content rating (Everyone, Teen, etc.)
- Purpose: Enable fine-grained analysis of customer sentiment, feature requests, and recurring pain points across app categories.
Rather than relying on naive random sampling, Sentio+ implements a hybrid stratified sampling strategy to maximize signal quality.
- Breadth (Coverage): Reviews are balanced across all categories (Finance, Social, Productivity, etc.) and ratings (1–5 stars)
- Depth (Signal Quality): Within each category/rating bucket, the sampler prioritizes:
  - Long reviews (>150 characters) for detailed evidence
  - Helpful reviews (high helpful_count) for peer-vetted insights
  - Recent reviews (~60% from last 12 months) for current relevance
- The resulting dataset treats each review as high-information testimony, avoiding low-signal noise such as one-word feedback ("Good app"). This improves downstream retrieval and LLM reasoning quality.
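The breadth and depth rules above could be sketched roughly as follows. This is a simplified, standard-library illustration rather than the project's actual sampler: the record keys (`category`, `rating`, `text`, `helpful_count`, `date`), the function names, and the ranking order are assumptions, and the sketch does not enforce the ~60% recency share.

```python
from collections import defaultdict
from datetime import date, timedelta

def signal_score(review: dict, today: date, min_length: int = 150, recent_days: int = 365):
    """Rank key for the depth criteria: length, recency, then helpful votes."""
    is_long = len(review["text"]) > min_length
    is_recent = (today - review["date"]) <= timedelta(days=recent_days)
    # Tuples compare element by element, so long reviews outrank short ones,
    # then recent outranks old, then higher helpful_count wins.
    return (is_long, is_recent, review["helpful_count"])

def stratified_sample(reviews: list[dict], per_bucket: int, today: date) -> list[dict]:
    """Keep the `per_bucket` highest-signal reviews from each (category, rating) bucket."""
    buckets = defaultdict(list)
    for review in reviews:
        buckets[(review["category"], review["rating"])].append(review)

    sample = []
    for key in sorted(buckets):  # deterministic bucket order
        ranked = sorted(buckets[key], key=lambda r: signal_score(r, today), reverse=True)
        sample.extend(ranked[:per_bucket])
    return sample
```

Taking the same number of reviews from every bucket gives the breadth guarantee, while the sort inside each bucket pushes short, low-signal reviews out of the sample first.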
- Merge review data (`apps_reviews.csv`) with app metadata (`apps_info.csv`)
- Clean categories, filter for quality (length, helpfulness)
- Create enriched text with context headers:
[APP: Google Wallet | CAT: Finance | RATING: 1/5 | DATE: 2024-09 | SEGMENT: Everyone]
USER REVIEW: The payment gateway keeps timing out.
- Load into ChromaDB with dual metadata (structured fields + enriched text)
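As an illustration of the enrichment step, the sketch below builds the context header shown above and pairs it with structured metadata. The function name `build_record`, its parameters, and the metadata keys are assumptions for the example; only the header layout is taken from the sample text.

```python
from datetime import date

def build_record(review_id: str, app: str, category: str, rating: int,
                 review_date: date, segment: str, text: str) -> dict:
    """Return the enriched document plus structured metadata for one review."""
    header = (
        f"[APP: {app} | CAT: {category} | RATING: {rating}/5 | "
        f"DATE: {review_date:%Y-%m} | SEGMENT: {segment}]"
    )
    return {
        "id": review_id,
        # Enriched text: the header gives the embedding model explicit context.
        "document": f"{header}\nUSER REVIEW: {text.strip()}",
        # Structured fields are kept separately so they can be used as hard filters.
        "metadata": {
            "category": category,
            "rating": rating,
            "date": review_date.isoformat(),
            "segment": segment,
        },
    }
```

Records shaped like this map onto ChromaDB's `collection.add(ids=..., documents=..., metadatas=...)` call, with one list entry per review.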
- User asks a natural language question
- System performs semantic search in ChromaDB (with optional hard filters by category, date, rating)
- Rerank results by helpfulness/recency
- Retrieve top-K most relevant review chunks
- LLM synthesizes insights from retrieved reviews
- Response includes citations linking back to specific review_ids
- UI displays original review excerpts as evidence
- Users can drill down to full review context
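The search, rerank, and top-K steps above might look something like the sketch below. The cosine similarity function is the standard formula; the blending weights, the recency decay, and the candidate keys (`similarity`, `helpful_count`, `date`) are illustrative assumptions, not values taken from the project.

```python
import math
from datetime import date

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rerank(candidates: list[dict], today: date, top_k: int = 5,
           w_sim: float = 0.7, w_helpful: float = 0.2, w_recent: float = 0.1) -> list[dict]:
    """Blend semantic similarity with helpfulness and recency, then keep the top K."""
    # Normalise helpful votes against the best candidate; avoid dividing by zero.
    max_helpful = max((c["helpful_count"] for c in candidates), default=0) or 1

    def score(c: dict) -> float:
        age_days = max((today - c["date"]).days, 0)
        recency = 1.0 / (1.0 + age_days / 365)  # 1.0 today, 0.5 after one year
        helpful = c["helpful_count"] / max_helpful
        return w_sim * c["similarity"] + w_helpful * helpful + w_recent * recency

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

Weighting similarity most heavily keeps the results on topic, while the smaller helpfulness and recency terms break ties in favour of peer-vetted, current reviews.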
- Large-scale review ingestion via S3
- Aspect-level sentiment reasoning
- Metadata-aware semantic search
- Evidence-grounded natural language insights
- Trend detection across time and categories
- Business-ready summaries for non-technical stakeholders
Example query: "Why are 1-star reviews increasing for Finance apps in the last 6 months?"
Retrieval:
- Filters applied: category = Finance, rating = 1, date >= last 6 months
- Top-K reviews retrieved mentioning payment failures, login issues, and crashes
Generated insight: "Recent 1-star reviews in Finance apps are primarily driven by payment gateway timeouts and authentication failures following recent updates. Multiple users report being unable to complete transactions, leading to trust and reliability concerns."
Supporting evidence:
- Review A (2024-09): "The payment gateway keeps timing out during checkout"
- Review B (2024-10): "After the update, I can't log in anymore"
This ensures every insight is explainable, auditable, and trusted.
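The hard filters in this example (category, rating, and a date cutoff) could be expressed as a ChromaDB metadata filter along these lines. The helper name `build_where` and the `date_ts` key are assumptions: the sketch presumes the review date is also stored as a numeric Unix timestamp, since range operators such as `$gte` compare numbers.

```python
from datetime import date, datetime, timezone

def build_where(category: str, rating: int, since: date) -> dict:
    """Build a Chroma-style `where` filter for category, rating, and a start date."""
    # Convert the cutoff date to a UTC Unix timestamp for numeric comparison
    # against the hypothetical `date_ts` metadata field.
    since_ts = int(datetime(since.year, since.month, since.day, tzinfo=timezone.utc).timestamp())
    return {
        "$and": [
            {"category": {"$eq": category}},
            {"rating": {"$eq": rating}},
            {"date_ts": {"$gte": since_ts}},
        ]
    }
```

The resulting dictionary can be passed as the `where` argument of `collection.query(...)` alongside `query_texts` and `n_results`, so the semantic search only ranks reviews that already satisfy the filters.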
- "What are the most common reasons for 1-star reviews in Finance apps?"
- "Which features are users requesting most in the last quarter?"
- "How do Teen-rated app complaints differ from Everyone-rated apps?"
- "What issues are driving churn-related feedback this year?"
- Identify top recurring bugs and UX pain points
- Prioritize features based on real customer impact
- Detect systemic issues across product lines
- Inform roadmap and investment decisions
- Understand root causes of negative sentiment
- Track shifts in customer perception over time
git clone https://github.com/Carlomos7/sentio-plus.git
This project requires Python 3.13 or higher and Docker.
While Sentio+ prioritizes decision support over raw accuracy metrics, quality is evaluated through:
- Retrieval relevance: Are returned reviews actually answering the question?
- Groundedness: Are all claims supported by cited evidence?
- Business usefulness: Do insights translate into clear action items?
- Consistency: Do similar queries yield stable themes over time?
Future work may introduce quantitative evaluation (e.g., retrieval precision@K, human-in-the-loop validation).
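As a rough idea of what such quantitative checks might look like, the sketch below computes precision@K against a hand-labelled set of relevant reviews and flags citations that do not correspond to any retrieved review. Both function names and the bracketed `[review_id]` citation format are assumptions made for illustration.

```python
import re

def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved reviews that were labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for rid in retrieved_ids[:k] if rid in relevant_ids)
    return hits / k

def unknown_citations(answer: str, retrieved_ids: set[str]) -> set[str]:
    """Return cited ids (written as [id]) that are not among the retrieved reviews."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - retrieved_ids
```

An empty result from `unknown_citations` only shows that every citation points at a retrieved review; checking that the cited text actually supports the claim would still need human review.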
Sentio+ is intentionally designed as a consulting-grade internal analytics tool, not a consumer chatbot. Its primary value lies in converting raw customer feedback into strategic, explainable insights that organizations can act on with confidence.
- Product Prioritization: "What are the top 3 recurring bugs users want fixed in Finance apps?"
- Competitive Analysis: "How do user complaints about subscription pricing compare across categories?"
- Roadmap Planning: "What features are users requesting most in the last quarter?"
- CX Improvements: "Why are 1-star reviews spiking for our top apps?"
- Audience Insights: "What do Teen-rated app users complain about vs Everyone-rated apps?"
This project was originally collaborated on in Google Colab by:

