AI-Augmented SOC Detection Engine

A modular Security Operations Center (SOC) detection engine that combines supervised machine learning, anomaly detection, and rule-based logic to detect cyber threats in real time. The system is designed with evaluation rigor, temporal validation, and deployment constraints in mind.

🚀 Key Capabilities

- Hybrid Detection Pipeline: Combines ML-based classification with anomaly scoring and deterministic rules.
- Multi-Layer Threat Detection:
  - Structured ML model (LightGBM) for flow-based intrusion detection
  - Sentence-BERT for semantic log understanding
  - Isolation Forest for anomaly detection
  - Rule engine for deterministic threat signatures
- MITRE ATT&CK Mapping: Automatically maps detections to TTPs (e.g., T1110 – Brute Force).
- Temporal Evaluation Framework: Supports both random and chronological data splits to measure generalization under distribution shift.
- Drift Monitoring: Tracks embedding and feature drift to detect changes in traffic behavior.
- Production-Oriented Design: Async FastAPI microservice, structured logging, Dockerized deployment.
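
The README does not say which statistic the drift monitor uses. As a minimal sketch of the idea, assuming a Population Stability Index (PSI) over a single numeric feature, the comparison between a reference window and a current window might look like this. The function name, bin count, and the PSI choice itself are illustrative assumptions, not the project's actual implementation.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference window and a current window of one feature.

    Hypothetical helper: bins are equal-width over the reference range.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A PSI near zero means the two windows are distributed alike; larger values suggest the feature has shifted. Any alerting threshold would need to be tuned on the project's own traffic.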

🏗 System Architecture

graph TD
    A[Log Ingestion] -->|Async Queue| B(API Gateway / FastAPI);
    B --> C{Detection Core};
    C -->|ML Classifier| D[LightGBM IDS];
    C -->|Semantic Analysis| E[Sentence-BERT];
    C -->|Statistical Check| F[Isolation Forest];
    C -->|Rule Check| G[Rule Engine];
    D --> H[Threshold Calibration];
    C --> I[Result Aggregator];
    I --> J[MITRE Mapper];
    J --> K[JSON Response];
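
The last three stages of the diagram (Result Aggregator, MITRE Mapper, JSON Response) can be illustrated with a small sketch. The layer names, label strings, and lookup table below are hypothetical; only T1110 (Brute Force) is taken from this README, and the other technique IDs are added purely for illustration.

```python
import json
from typing import Dict, Optional

# Hypothetical label -> ATT&CK technique table. T1110 comes from the README;
# the remaining entries are illustrative assumptions.
MITRE_MAP: Dict[str, Dict[str, str]] = {
    "brute_force": {"id": "T1110", "name": "Brute Force"},
    "port_scan": {"id": "T1046", "name": "Network Service Discovery"},
    "ddos": {"id": "T1498", "name": "Network Denial of Service"},
}


def aggregate(layer_results: Dict[str, Optional[str]]) -> str:
    """Merge per-layer labels and attach ATT&CK techniques as a JSON string.

    layer_results maps a detection layer name to the label it raised,
    or None when that layer saw nothing suspicious.
    """
    labels = sorted({label for label in layer_results.values() if label})
    techniques = [MITRE_MAP[label] for label in labels if label in MITRE_MAP]
    return json.dumps(
        {"malicious": bool(labels), "labels": labels, "techniques": techniques}
    )
```

For example, `aggregate({"lightgbm": "brute_force", "rules": "brute_force", "isolation_forest": None})` yields a single `brute_force` label mapped to T1110. A real aggregator would likely also weigh per-layer scores rather than only taking the union of labels.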

📂 Project Structure

src/
├── api/              # FastAPI application & endpoints
├── models/           # LightGBM IDS, BERT, Isolation Forest
├── detection/        # Core detection orchestration
├── rules/            # Rule-based detection logic
├── mitre/            # MITRE ATT&CK mapping
├── monitoring/       # Drift detection & calibration
├── evaluation/       # Benchmarking & temporal validation
└── utils/            # Preprocessing & helpers
scripts/              # Load testing & utilities

🛠 Installation & Setup

Prerequisites

- Python 3.8+
- Docker (optional)

Local Setup

1. Clone the repository
2. Install dependencies:

pip install -r requirements.txt

3. Train the model:

python train_siem.py

4. Start the API:

uvicorn src.api.main:app --reload

Docker Deployment

docker-compose up --build -d

⚡ Performance

Intrusion Detection (CIC-IDS2017)

Evaluation performed under two strategies:

Split Strategy	ROC-AUC	Detection Rate	False Positive Rate
Random Flow-Level	~0.999	~99.98%	~0.1%
Chronological (Time-Based)	Evaluated to measure real-world generalization

Note: The chronological split simulates deployment by training on earlier capture days and testing on later traffic, which reduces temporal leakage between the training and test sets.
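
To make the difference between the two strategies concrete, here is a minimal sketch of both splits in plain Python. The function names and the 70/30 default are assumptions for illustration; the project's own splitting code in `src/evaluation/` may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], timestamps: Sequence[float], train_fraction: float = 0.7
) -> Tuple[List[T], List[T]]:
    """Train on the earliest rows and test on the latest ones."""
    order = sorted(range(len(rows)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    return [rows[i] for i in order[:cut]], [rows[i] for i in order[cut:]]


def random_split(
    rows: Sequence[T], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle rows before splitting, ignoring when each flow was captured."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With the random split, flows from the same attack session can land on both sides of the cut, which is one way metrics get inflated. The chronological split guarantees every test row is at least as late as every training row.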

Inference Benchmark

- Single Sample Latency: ~3–5 ms (CPU)
- Throughput (Batch 32): ~4000 samples/sec
- Async API Throughput: ~200+ logs/sec per worker
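
Figures like these are typically obtained by timing the model's predict call directly. The sketch below shows one way to measure single-sample latency and batched throughput; `benchmark` and `dummy_predict` are hypothetical names, and the stand-in model is only there so the snippet runs on its own. It does not reproduce the numbers above.

```python
import time
from typing import Callable, Dict, List, Sequence


def benchmark(
    predict: Callable[[Sequence[List[float]]], Sequence[float]],
    samples: List[List[float]],
    batch_size: int = 32,
    repeats: int = 50,
) -> Dict[str, float]:
    """Measure median single-sample latency and batched throughput."""
    # Single-sample latency: time one-row calls and take the median.
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(samples[:1])
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    median_ms = latencies[len(latencies) // 2] * 1000.0

    # Batched throughput: total rows processed divided by wall-clock time.
    start = time.perf_counter()
    processed = 0
    for _ in range(repeats):
        for i in range(0, len(samples), batch_size):
            batch = samples[i : i + batch_size]
            predict(batch)
            processed += len(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {"median_latency_ms": median_ms, "samples_per_sec": processed / elapsed}


def dummy_predict(batch: Sequence[List[float]]) -> List[float]:
    """Stand-in for a real model; replace with the trained classifier."""
    return [sum(row) for row in batch]
```

Calling `benchmark(dummy_predict, samples)` returns a dictionary with both measurements. Reporting the median rather than the mean keeps a single slow call from skewing the latency figure.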

🛡 Detection Capabilities

Detection Layer	Technique	Example
Flow-Based IDS	LightGBM	DDoS, DoS, PortScan, Brute Force
Semantic	Sentence-BERT	Suspicious command patterns in logs
Statistical	Isolation Forest	Traffic volume anomalies
Rule-Based	Threshold/Pattern	5 failed logins in 10s
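
The rule-based example in the table (5 failed logins in 10s) is a sliding-window threshold. A minimal sketch, assuming events arrive in timestamp order per source, is shown below; the class name and its parameters are hypothetical and not taken from the project's `src/rules/` module.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginRule:
    """Fire when a source accumulates too many failed logins in a time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 10.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # One queue of recent failure timestamps per source (e.g. an IP address).
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, source: str, timestamp: float) -> bool:
        """Record one failed login; return True if the threshold is reached."""
        events = self._events[source]
        events.append(timestamp)
        # Drop failures that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_failures
```

Each call to `observe` costs amortized constant time, since every timestamp is appended and removed at most once. A detection that fires here would map naturally to T1110 (Brute Force).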

📊 Evaluation Philosophy

This project emphasizes:

- Impact of data splitting strategy on IDS performance
- Performance inflation under naive random splits
- Temporal validation to approximate deployment behavior
- Threshold calibration (Youden-J vs Max-F1)
- Per-class detection analysis for rare attacks

The goal is not just high metrics, but defensible and reproducible evaluation.
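
The two threshold calibration criteria named above can pick different operating points, especially on imbalanced data. The sketch below computes both over a set of scores in plain Python; it is an illustration under the standard definitions (Youden's J = TPR − FPR, F1 = 2TP / (2TP + FP + FN)), and the function name is an assumption rather than the project's API.

```python
from typing import Sequence, Tuple


def calibrate_thresholds(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (youden_j_threshold, max_f1_threshold).

    A sample is predicted positive when its score is >= the threshold.
    Candidate thresholds are the distinct observed scores.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Each "best" holds (metric value, threshold); ties favor the higher threshold.
    best_j = (float("-inf"), 0.0)
    best_f1 = (float("-inf"), 0.0)
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = positives - tp
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        best_j = max(best_j, (tpr - fpr, threshold))
        best_f1 = max(best_f1, (f1, threshold))
    return best_j[1], best_f1[1]
```

With 2 attacks among 22 flows, for instance, Youden-J tends to accept a few false positives to catch both attacks, while Max-F1 may prefer a stricter cut-off because each false positive weighs heavily against so few true positives. This loop is quadratic in the number of samples, which is fine for illustration but not for full datasets.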

📈 Running Evaluation

Benchmark:

python src/evaluation/benchmark.py

Load test:

python scripts/load_test.py

🗺 Roadmap

- Cross-dataset validation (UNSW-NB15 / CIC-IDS2018)
- Adaptive thresholding under drift
- Online learning module
- Entity graph anomaly detection

👨‍💻 Authors

Rishit Sharma, Kokkula Srinivas
Detection Engineering | ML for Cyber Defense
