Dakshin10/GridMind

title GridMind
emoji ⚡
colorFrom blue
colorTo green
sdk gradio
sdk_version 6.13.0
python_version 3.10
app_file app.py
pinned false

⚡ GridMind: Teaching an AI to Prevent Power Grid Blackouts

[Image: GridMind Gradio dashboard banner]

Hugging Face Space | Training Notebook | GitHub Repo | Python 3.10 | OpenEnv Compliant

OpenEnv Hackathon 2026 (India Submission)
A reinforcement learning system and custom environment that simulates decentralised power distribution under strategic misreporting, delayed cascading failures, and grid overloads.


📖 Project Overview

Modern electrical grids are highly vulnerable to cascading failures, in which a localized overload trips lines, shifts more load onto the remaining lines, and can end in a city-wide blackout.

GridMind addresses this stability challenge. Human operators prevent blackouts with "defensive curtailment": strategically cutting power to low-priority zones to shield critical infrastructure such as hospitals. We use Reinforcement Learning (Recurrent PPO with LSTM) and LLM-Native Alignment (GRPO) to train autonomous agents to perform this curtailment.

In decentralised smart grids, local distribution zones also act as self-interested agents that often misreport (exaggerate) demand to secure more power. GridMind implements a Reputation System that penalizes dishonest reports and incentivizes zones to cooperate.


🎮 Interactive Live Demo

GridMind includes a fully interactive web interface built with Gradio. The demo offers two modes:

  1. Manual Mode: Act as the human operator, adjusting load sliders for the residential, commercial, and hospital zones to keep the grid balanced without tripping faults.
  2. AI Mode: Deploy the trained RecurrentPPO agent to handle dynamic demands, stochastic failures, and load fluctuations autonomously.

[Image: Live grid state visualization]

[Image: Mobile zone controls view]


🌍 Environment Architecture: GridOpsEnv

The environment simulates a 3-zone grid:

  • Zone 1 (Residential): Low priority (criticality weight: 0.5)
  • Zone 2 (Commercial): Medium priority (criticality weight: 0.75)
  • Zone 3 (Hospital / Critical): High priority (criticality weight: 1.0)
graph TD
    A[Environment Reset] --> B[Generate Demands & Total Power]
    B --> C[Format State Description state_to_text]
    C --> D[PPO LSTM / GRPO LLM Decision]
    D --> E[Normalize Allocation Action]
    E --> F[Stochastic Dynamics & Overload Detection]
    F --> G[Queue Delayed Failures]
    G --> H[Update Reputation System]
    H --> I[Evaluate Composable Rubric]
    I --> J[Check Episode Termination]
    J -- No --> B
    J -- Yes --> K[Generate Episode Summary]

🧠 Key Environmental Dynamics

1. Delayed Cascade Failure Queue

If the supply allocated to a zone exceeds $1.3 \times$ its demand, that zone's line overloads. Overloads do not cause an immediate blackout. Instead, an overload at step $t$ enters a Failure Queue with a stochastic delay $k$, and when the delay expires at step $t+k$ the grid suffers a sudden capacity reduction.

Important

Because the effect of an overload is delayed, the agent faces a temporal credit assignment problem. A standard feedforward (MLP) policy sees only the current observation and has no memory of previous steps, so it struggles to connect a blackout to the overload that caused it.
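
The queue mechanics described above can be sketched as follows. This is a minimal illustration, not the code in gridops_env.py: the class name `FailureQueue`, the `min_delay`/`max_delay` parameters, and the uniform delay distribution are all assumptions made for the example. Only the $1.3\times$ overload threshold comes from the text.

```python
import random
from dataclasses import dataclass

OVERLOAD_FACTOR = 1.3  # from the text: supply > 1.3 x demand overloads the line


@dataclass
class PendingFailure:
    zone: int      # index of the overloaded zone
    fires_at: int  # step at which the capacity reduction takes effect


class FailureQueue:
    """Hypothetical helper that holds overloads until their delay expires."""

    def __init__(self, min_delay=1, max_delay=3, seed=None):
        # Delay bounds are illustrative; the real environment may differ.
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rng = random.Random(seed)
        self.pending = []

    def record_overloads(self, step, supply, demand):
        """Queue one delayed failure for every zone overloaded at `step`."""
        queued = []
        for zone, (s, d) in enumerate(zip(supply, demand)):
            if s > OVERLOAD_FACTOR * d:
                k = self.rng.randint(self.min_delay, self.max_delay)
                failure = PendingFailure(zone=zone, fires_at=step + k)
                self.pending.append(failure)
                queued.append(failure)
        return queued

    def pop_due(self, step):
        """Remove and return the failures whose delay has expired."""
        due = [f for f in self.pending if f.fires_at <= step]
        self.pending = [f for f in self.pending if f.fires_at > step]
        return due


# Usage: zone 0 receives 0.50 against a demand of 0.30 (threshold 0.39).
queue = FailureQueue(min_delay=2, max_delay=2, seed=0)
queue.record_overloads(step=0, supply=[0.50, 0.30, 0.20], demand=[0.30, 0.30, 0.40])
print(queue.pop_due(1))  # nothing is due yet
print(queue.pop_due(2))  # the zone 0 failure fires two steps after the overload
```

The point of the sketch is that nothing in the observation at step 2 says which earlier action caused the failure, which is why the agent needs some form of memory.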

2. Reputation System

If a zone overbids, that is, reports more than $1.3\times$ its actual demand, the environment flags the report as dishonest:

  • Reputation Decay: The zone's reputation drops by 0.1 for each step on which it misreports.
  • Reputation Recovery: Each honest step restores 0.02 (reputation is clamped between 0.2 and 2.0).
  • Coordinated Weighting: Grid allocation weights each zone by priority * demand * reputation, so lying leads to lower future allocations.
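
A minimal sketch of these rules is shown below. The function names are hypothetical, and two details are assumptions rather than facts from the repository: the clamp is applied after both decay and recovery, and power is split in proportion to the priority * demand * reputation weight using the reported demand. The constants and zone priorities are taken from the text.

```python
MISREPORT_FACTOR = 1.3        # report > 1.3 x actual demand counts as a lie
REP_DECAY = 0.1               # reputation lost per dishonest step
REP_RECOVERY = 0.02           # reputation regained per honest step
REP_MIN, REP_MAX = 0.2, 2.0   # clamp range
PRIORITY = [0.5, 0.75, 1.0]   # residential, commercial, hospital


def update_reputation(reputation, reported, actual):
    """Return the new reputation of each zone after one step."""
    updated = []
    for rep, claimed, true_demand in zip(reputation, reported, actual):
        lied = claimed > MISREPORT_FACTOR * true_demand
        rep = rep - REP_DECAY if lied else rep + REP_RECOVERY
        updated.append(min(REP_MAX, max(REP_MIN, rep)))
    return updated


def coordinated_allocation(reported, reputation, total_power=1.0):
    """Split total_power in proportion to priority * demand * reputation."""
    weights = [p * d * r for p, d, r in zip(PRIORITY, reported, reputation)]
    total = sum(weights)
    if total <= 0:
        # Degenerate case (assumption): fall back to an equal split.
        return [total_power / len(weights)] * len(weights)
    return [total_power * w / total for w in weights]


# Usage: zone 2 (index 1) reports double its actual demand.
reputation = update_reputation(
    reputation=[1.0, 1.0, 1.0],
    reported=[0.30, 0.60, 0.40],
    actual=[0.30, 0.30, 0.40],
)
print(reputation)
print(coordinated_allocation([0.30, 0.30, 0.40], reputation))
```

Because decay (0.1) is five times larger than recovery (0.02), a single detected lie takes five honest steps to undo under these rules.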

3. Composable Reward Rubric

In line with OpenEnv design principles, the reward is a composable rubric of weighted terms rather than a monolithic score:

$$\text{Reward} = 2.0 \cdot \text{ServedRatio} + 0.5 \cdot \text{Stability} - 0.1 \cdot \text{UnmetDemand} - 0.05 \cdot \text{WastedPower} - 1.0 \cdot \text{BlackoutPenalty}$$
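
The rubric translates directly into a small function. The sketch below takes each term as an input, because the text does not say how ServedRatio, Stability, and the other terms are computed inside the environment. The function name and the returned breakdown dictionary are illustrative choices, while the weights are the ones in the formula.

```python
RUBRIC_WEIGHTS = {
    "served_ratio": 2.0,
    "stability": 0.5,
    "unmet_demand": -0.1,
    "wasted_power": -0.05,
    "blackout_penalty": -1.0,
}


def rubric_reward(served_ratio, stability, unmet_demand, wasted_power, blackout_penalty):
    """Return (total reward, per-term contributions) for one step."""
    terms = {
        "served_ratio": served_ratio,
        "stability": stability,
        "unmet_demand": unmet_demand,
        "wasted_power": wasted_power,
        "blackout_penalty": blackout_penalty,
    }
    # Keeping the weighted terms separate makes each one easy to log or ablate.
    contributions = {name: RUBRIC_WEIGHTS[name] * value for name, value in terms.items()}
    return sum(contributions.values()), contributions


# Usage: a step with good service, slight unmet demand, and no blackout.
total, parts = rubric_reward(
    served_ratio=0.9, stability=0.8, unmet_demand=0.1, wasted_power=0.05, blackout_penalty=0.0
)
print(round(total, 4), parts)
```

Returning the per-term contributions alongside the total is one way to keep the rubric composable: an individual term can be inspected or zeroed out without touching the others.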


📊 Empirical Results

1. Game-Theoretic Simulation Modes

We evaluated four different execution dynamics under extreme demand peaks and line stress conditions:

  • Baseline: Random load distribution.
  • Selfish: Zones constantly overreport demand, maximizing local short-term allocations.
  • Coordinated: Allocations utilize zone priority and demand weighting.
  • Advanced: The full stack, integrating coalition bonuses, reputation tracking, and global stability constraints.

Evaluation Metrics across Scenario Modes (50-step episodes)

| Simulation Mode | Avg Reward/Step | Avg Blackouts | Avg Stability | Avg Misreporting Rate | Coalition Activation |
|---|---|---|---|---|---|
| Baseline | -1.495 | 2.111 | 0.333 | 11.1% | 47.1% |
| Selfish | 1.544 | 0.111 | 0.944 | 0.0% | 44.8% |
| Coordinated | 1.288 | 0.333 | 0.778 | 0.0% | 48.3% |
| Advanced (Ours) | 1.666 | 0.000 | 0.944 | 0.0% | 54.9% |

Tip

In this evaluation, Advanced mode averaged 0.000 blackouts even in high-stress, unstable grid states, which it achieves by forming active coalitions and suppressing strategic misreporting.


2. Reinforcement Learning: Recurrent PPO (LSTM) vs. Random Policy

We trained a RecurrentPPO (PPO + LSTM) agent using Stable-Baselines3 and sb3-contrib to handle the grid's temporal dependencies.

We evaluated the trained agent over 50 full episodes against a Random baseline in the high-stress environment:

| Metric | Random Policy | PPO LSTM Agent | Improvement / Reduction |
|---|---|---|---|
| Avg Reward / Episode | -2399.724 | -914.028 | +61.9% |
| Avg Blackout Penalty | 88.253 | 46.482 | -47.3% (reduction) |
| Avg Grid Stability | 0.331 | 0.641 | +93.5% |

Why LSTM is Critical

Standard MLP networks have no memory. Because grid overloading has a delayed cascading impact, the MLP policy cannot associate an overload action on step $t$ with a blackout on step $t+2$. The LSTM policy stores a hidden state tracking prior allocations and current overload flags, allowing it to curtail load before the queue triggers.


🧠 LLM-Native Integration & GRPO Training

GridMind supports LLM-native decision-making via a structured text serializer:

  • Prompt Generation: The state_to_text() method translates floats (demands, supplies, reputations, faults) into structured natural language:
    === POWER GRID STATE (Step 12/50 — 24% complete) ===
    
    Grid Zones:
      Zone 1 [Residential (low)]: demand=0.345, supply=0.333, reputation=1.00, status=✅ Healthy
      Zone 2 [Commercial (medium)]: demand=0.290, supply=0.333, reputation=0.90, status=✅ Healthy
      Zone 3 [Hospital/Critical (HIGH)]: demand=0.365, supply=0.334, reputation=1.00, status=⚠️ FAULT DETECTED
    
    Episode so far:
      Blackouts: 0
      Total unmet demand: 0.120
      Total reward: 14.50
    
    Task: Allocate power to 3 zones as fractions summing to 1.0.
    Priority: Serve Zone 3 (Hospital) first. Avoid overloads — they cascade into blackouts.
    Reply with exactly 3 space-separated floats. Example: 0.20 0.30 0.50
    
  • GRPO Optimization: Using Group Relative Policy Optimization (GRPO) with a small language model such as Qwen2-0.5B, the LLM is trained directly on reward feedback from the environment. The pipeline runs in Google Colab (see the Training Notebook linked above).
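
The prompt asks the model to reply with exactly three space-separated floats, so the reply has to be turned back into an allocation before the environment can score it. The sketch below shows one way to do that. The function name `parse_allocation` and the decision to return `None` for malformed replies are assumptions; how GridMind actually handles invalid LLM output is not stated here.

```python
import math


def parse_allocation(reply, n_zones=3):
    """Parse an LLM reply into n_zones fractions that sum to 1.0.

    Returns None when the reply is malformed, so the caller can decide
    how to penalise it (for example with a fixed negative reward).
    """
    tokens = reply.strip().split()
    if len(tokens) != n_zones:
        return None
    try:
        values = [float(token) for token in tokens]
    except ValueError:
        return None
    # Reject NaN, infinities, and negative allocations.
    if not all(math.isfinite(v) and v >= 0 for v in values):
        return None
    total = sum(values)
    if total <= 0:
        return None
    # Normalize so the fractions sum to 1.0 even if the model was slightly off.
    return [v / total for v in values]


# Usage with the example reply from the prompt, then a malformed reply.
print(parse_allocation("0.20 0.30 0.50"))
print(parse_allocation("I would give the hospital most of the power"))
```

Normalizing instead of rejecting replies that do not sum exactly to 1.0 mirrors the "Normalize Allocation Action" step in the environment diagram, while replies that cannot be parsed at all are kept distinct so they can be penalised separately.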

📁 Repository Structure

  • env/: Core custom Gymnasium environment.
    • gridops_env.py: Grid simulator logic, delayed failure queues, reputation dynamics, and reward composability.
  • train/: Simulation analysis, plotting and helper code.
    • train.py: Multi-seed environment baseline evaluator and standard PPO setup.
    • plots.py: Plot generator for reward curves, blackout accumulation, and reputation trends.
    • analyze.py: Research analysis layer including ablation studies and cascade delay tracking.
  • models/: Pre-trained RL checkpoints.
    • recurrent_ppo_grid.zip: Trained RecurrentPPO model.
    • vecnorm.pkl: Observation normalization statistics.
  • plots/: Training graphs, ablation charts, and visual comparison plots.
  • app.py: Gradio app code for local execution and Hugging Face hosting.
  • requirements.txt: Python dependencies.
  • openenv.yaml: Environment metadata manifest.

🚀 How to Run Locally

1. Clone the Repository

git clone https://github.com/TechLearnr4S/GridMind
cd GridMind

2. Install Dependencies

Make sure you have a Python virtual environment set up (recommended: Python 3.10):

pip install -r requirements.txt

3. Run the Gradio Web Application

Launch the local web server:

python app.py

Open http://127.0.0.1:7860 in your web browser to play manual mode or deploy the AI coordinator.

4. Run Analysis & Generate Charts

To re-run the simulation baselines, ablation studies, and save all analysis plots:

python train/train.py

5. Evaluate the Pre-trained LSTM Model

Run the GRPO-aligned recurrent model evaluation script:

python eval_grpo.py

👥 Contributors

  • Dakshin (Dakshin10): Lead Reinforcement Learning Engineer & Environment Designer.

About

RL-based AI system that learns to allocate power across zones to prevent cascading blackouts
