Industrial RAG System for Machine Maintenance Analysis

Overview

This project implements a Retrieval-Augmented Generation (RAG) system designed to analyze structured industrial data such as machine maintenance logs, incident reports, production reports, and SOPs (Standard Operating Procedures).

Unlike generic chatbot-style RAG demos, this system focuses on data correctness, auditability, and retrieval reliability. The design prioritizes retrieval validation before generation and avoids unnecessary complexity.

Key Features

Structured Data Ingestion from JSONL files
Metadata-Preserving Vector Storage using ChromaDB
Retrieval-First Design (retrieval can be verified independently of generation)
Strict Hallucination Control via prompt constraints
Deterministic & Auditable Outputs
Terminal-Based Workflow (no notebooks required)

Dataset Structure

The dataset is split into four JSONL files, each representing a distinct document type:

logs.jsonl — Machine maintenance logs
incident.jsonl — Incident and anomaly descriptions
report.jsonl — Daily production reports
sop.jsonl — Standard Operating Procedures

Each record follows this structure:

{
  "id": "unique-id",
  "document_type": "maintenance_log | incident | production_report | sop",
  "text": "RAG-optimized natural language content",
  "metadata": {
    "machine_id": "optional",
    "date": "optional",
    "severity": "optional",
    "system": "optional"
  }
}
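
As a rough illustration of the record structure above, the sketch below parses one JSONL line and checks the required fields. The function name parse_record, the constant names, and the sample record are hypothetical and not taken from the project code; the allowed document types and metadata keys are the ones listed above.

```python
import json

# Values taken from the record structure above.
ALLOWED_TYPES = {"maintenance_log", "incident", "production_report", "sop"}
METADATA_KEYS = ("machine_id", "date", "severity", "system")


def parse_record(line: str) -> dict:
    """Parse one JSONL line and check it against the documented structure."""
    record = json.loads(line)
    for key in ("id", "document_type", "text"):
        if not isinstance(record.get(key), str) or not record[key]:
            raise ValueError(f"missing or empty field: {key}")
    if record["document_type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown document_type: {record['document_type']}")
    metadata = record.get("metadata") or {}
    # Keep only the optional keys that are actually present.
    record["metadata"] = {
        key: metadata[key] for key in METADATA_KEYS if metadata.get(key) is not None
    }
    return record


# Made-up sample record, for illustration only.
sample = json.dumps({
    "id": "log-001",
    "document_type": "maintenance_log",
    "text": "Replaced bearing on conveyor motor.",
    "metadata": {"machine_id": "M-12", "date": "2024-01-15"},
})
parsed = parse_record(sample)
```

Rejecting malformed records at parse time keeps bad data out of the vector store, where it would be harder to trace.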

System Architecture

JSONL Files
   ↓
Custom Loader (preserves metadata)
   ↓
SentenceTransformer Embeddings
   ↓
Chroma Vector Database
   ↓
Similarity Search (Retrieval)
   ↓
Prompt Construction
   ↓
Local LLM (Generation)

Retrieval can be executed independently without invoking the language model.

Technologies Used

Python 3.10+
LangChain (document abstraction & prompt templates)
ChromaDB (vector storage)
SentenceTransformers (embeddings)
Hugging Face Transformers (local LLM)
PyTorch

Usage

1. Ingest Data

python db.py

This loads all JSONL files, embeds documents, and persists vectors in the chroma/ directory.
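
The loading step might look roughly like the sketch below, which covers only the reading part and leaves out embedding and persistence. It is not the actual contents of db.py: the function name load_jsonl_dir, the source_file metadata key, and the demo records are assumptions made for illustration. The parallel lists of ids, texts, and metadata mirror the shape that Chroma's collection.add accepts.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl_dir(data_dir):
    """Read every *.jsonl file in data_dir into parallel lists."""
    ids, texts, metadatas = [], [], []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                metadata = dict(record.get("metadata") or {})
                # Carry the document type and source file along with each
                # vector so that retrieved chunks stay auditable.
                metadata["document_type"] = record["document_type"]
                metadata["source_file"] = path.name  # hypothetical extra key
                ids.append(record["id"])
                texts.append(record["text"])
                metadatas.append(metadata)
    return ids, texts, metadatas


# Demo on made-up records written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "logs.jsonl").write_text(
        json.dumps({
            "id": "log-001",
            "document_type": "maintenance_log",
            "text": "Replaced bearing on conveyor motor.",
            "metadata": {"machine_id": "M-12"},
        }) + "\n",
        encoding="utf-8",
    )
    Path(tmp, "sop.jsonl").write_text(
        json.dumps({
            "id": "sop-003",
            "document_type": "sop",
            "text": "Lock out power before servicing.",
        }) + "\n",
        encoding="utf-8",
    )
    ids, texts, metadatas = load_jsonl_dir(tmp)
```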

2. Retrieval-Only Testing

python test.py

This script performs similarity search and saves results to a text file for inspection.
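
To show what a retrieval-only check verifies, here is a minimal, dependency-free sketch of cosine-similarity ranking. It is a stand-in, not the contents of test.py: in the project the vectors come from SentenceTransformers and the search runs in Chroma, while the three-dimensional toy vectors, the ids, and the function names below are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Rank documents by similarity. indexed maps document id to vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index with hand-written vectors, for illustration only.
toy_index = {
    "log-001": [1.0, 0.0, 0.0],
    "incident-007": [0.8, 0.6, 0.0],
    "sop-003": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}\t{score:.3f}")
```

Inspecting ranked ids and scores like these before any generation step makes it possible to tell a retrieval failure apart from a generation failure.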

3. Retrieval + Generation

python main.py

The system retrieves relevant documents and generates an answer strictly grounded in retrieved context.

If no answer is present in the data, the system responds with:

No root cause found in available documents.
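
One way to enforce this behavior is sketched below, under the assumption that the constraint is expressed in the prompt and that empty retrieval short-circuits generation. The prompt wording, the function names, and the generate callable are hypothetical and do not reflect the actual main.py; only the fallback sentence is taken from this README.

```python
FALLBACK = "No root cause found in available documents."


def build_prompt(question, retrieved):
    """Build a grounded prompt. retrieved is a list of dicts with 'id' and 'text'."""
    context = "\n\n".join(f"[{doc['id']}] {doc['text']}" for doc in retrieved)
    return (
        "Answer using only the context below. "
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieved, generate):
    """Return the fallback without calling the model when nothing was retrieved.

    generate is any callable that maps a prompt string to a completion string
    (a hypothetical wrapper around the local LLM).
    """
    if not retrieved:
        return FALLBACK
    return generate(build_prompt(question, retrieved))


# Made-up retrieved document, for illustration only.
docs = [{"id": "log-001", "text": "Replaced bearing on conveyor motor."}]
prompt = build_prompt("Why did the conveyor stop?", docs)
```

Tagging each context chunk with its document id lets a reviewer trace an answer back to the records it was built from.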

Design Decisions

Why RAG instead of Fine-Tuning?

Data is frequently updated
Answers must be traceable to source documents and explainable
Lower cost and easier iteration

Why No Reranking?

Small, clean dataset
Strong embeddings
Metadata-driven retrieval is sufficient
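
As an illustration of metadata-driven retrieval, the sketch below narrows the candidate set by exact metadata match before any similarity ranking. The helper names and sample records are hypothetical; in Chroma, the same kind of restriction can be expressed with the where argument of a collection query.

```python
def matches(metadata, filters):
    """True when every filter key is present in metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filter_candidates(records, **filters):
    """Keep only records whose metadata satisfies all filters."""
    return [record for record in records if matches(record["metadata"], filters)]


# Made-up records, for illustration only.
records = [
    {"id": "log-001", "metadata": {"machine_id": "M-12", "severity": "high"}},
    {"id": "log-002", "metadata": {"machine_id": "M-07", "severity": "low"}},
    {"id": "incident-007", "metadata": {"machine_id": "M-12", "severity": "low"}},
]
same_machine = filter_candidates(records, machine_id="M-12")
high_on_machine = filter_candidates(records, machine_id="M-12", severity="high")
```

On a small dataset, filtering like this can shrink the candidate pool enough that a separate reranking stage adds little.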

Why No Chain-of-Thought (yet)?

Avoids masking retrieval errors
Keeps outputs concise and auditable
