This repository implements RLiG, a reinforcement learning agent for Bayesian Network (BN) structure learning.
Unlike classical structure-learning methods (greedy score-based search such as Hill-Climbing and GES, or the constraint-based PC algorithm), RLiG optimizes a hybrid objective that balances:
- Structural fitness (BIC)
- Generative fidelity (simulating data from the learned BN and comparing against held-out samples)
To keep the generative cost tractable, the agent evaluates generative metrics only at selected steps within fixed-length tiles, while relying on BIC at other steps.
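
The tile-based schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the names `is_generative_step` and `hybrid_reward` are hypothetical, the generative offsets are assumed to be zero-based positions inside a tile, and the sign convention (a higher BIC term is better, a divergence-style generative metric is subtracted) is assumed as well.

```python
from typing import Callable, Sequence


def is_generative_step(t: int, tile_len: int, gen_offsets: Sequence[int]) -> bool:
    """Return True if step t lands on a generative-evaluation offset of its tile.

    Assumption: offsets are zero-based positions within a tile of length tile_len.
    """
    return (t % tile_len) in gen_offsets


def hybrid_reward(
    t: int,
    bic_term: float,
    gen_metric_fn: Callable[[], float],
    tile_len: int = 4,
    gen_offsets: Sequence[int] = (1, 3),
    alpha: float = 1.0,
    beta: float = 0.05,
) -> float:
    """Combine a BIC term with an occasionally evaluated generative penalty.

    gen_metric_fn is called only on generative steps, so its (expensive)
    simulation cost is paid a fixed number of times per tile.
    """
    reward = alpha * bic_term
    if is_generative_step(t, tile_len, gen_offsets):
        # Assumed convention: the metric is a divergence, so lower is better.
        reward -= beta * gen_metric_fn()
    return reward


# Count how often the expensive metric is actually evaluated over two tiles.
calls = []


def fake_divergence() -> float:
    calls.append(1)
    return 0.2


rewards = [hybrid_reward(t, 0.1, fake_divergence) for t in range(8)]
print(len(calls), rewards)
```

With a tile length of 4 and two offsets per tile, the stand-in metric is evaluated on half of the steps; the remaining steps are scored from the BIC term alone.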
- Tabular Q-learning agent for BN structure learning.
- Hybrid reward function combining BIC and generative metrics (held-out log-likelihood, MMD, JS divergence).
- Budgeted meta-controller (GateLearn) to decide when to trigger generative evaluations.
- Flexible config system via YAML for datasets, environment parameters, and training settings.
- Evaluation on the ASIA benchmark dataset with classical baselines (Hill-Climb, GES, PC).
- Visualization of learned DAGs and training curves.
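
For readers unfamiliar with tabular Q-learning, the core update behind the first feature can be sketched as below. This is an illustrative sketch, not the code in qlearn_run.py: the state encoding (a frozenset of directed edges), the action tuples, the helper names, and the multiplicative per-episode epsilon decay are all assumptions. The default values mirror the example configuration.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, eps, rng):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha_lr=0.25, gamma=0.99):
    """One-step Q-learning update on a dict-backed Q-table."""
    # Terminal states have no actions, so their bootstrap value is 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha_lr * (td_target - q[(state, action)])


def decayed_eps(episode, eps_start=1.0, eps_end=0.05, eps_decay=0.995):
    """Assumed schedule: multiplicative decay per episode, floored at eps_end."""
    return max(eps_end, eps_start * eps_decay ** episode)


# Hypothetical encoding: a state is the current edge set, an action edits one edge.
q = defaultdict(float)
empty_dag = frozenset()
one_edge = frozenset({("A", "B")})
add_ab = ("add", "A", "B")

q_update(q, empty_dag, add_ab, reward=1.0, next_state=one_edge, next_actions=[])
print(q[(empty_dag, add_ab)])

rng = random.Random(0)
print(epsilon_greedy(q, empty_dag, [("add", "B", "A"), add_ab], eps=0.0, rng=rng))
```

Under this assumed schedule, epsilon reaches its floor of 0.05 after roughly 600 episodes, so most of a 3000-episode run would be spent near-greedy.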
.
├── config.py # Config loader + typed helpers
├── core.py # Core RL environment (RLiGEnv, scoring, actions, rewards)
├── dag.py # DAG class with edge ops + visualization
├── gatelearn_run.py # Meta-controller (GateLearn) training & evaluation
├── qlearn_run.py # Q-learning training & evaluation
├── data/
│ └── asia_10000.csv # Example dataset (ASIA benchmark)
├── reports/
│ ├── figures/ # Plots of returns, BIC, DAGs
│ └── tables/ # Training logs, ablation tables
└── README.md # Project documentation
Clone the repo and install requirements:
git clone https://github.com/<YOUR_USERNAME>/<YOUR_REPO>.git
cd <YOUR_REPO>
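pip install -r requirements.txt
Train and evaluate the Q-learning agent:
python qlearn_run.py --config configs/asia.yaml
Train and evaluate the GateLearn meta-controller:
python gatelearn_run.py --config configs/asia.yaml
Example configuration (YAML):
data: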
  path: data/asia_10000.csv
  train_frac: 0.8
  seed: 42
env:
  k: 2
  L: 4
  T: 20
  I_g: [1, 3]
score:
  alpha: 1.0
  beta: 0.05
  Ns: 1000
  gen_metric: "mmd"
  alpha_dirichlet: 1.0
  bic_per_sample: true
qlearn:
  episodes: 3000
  eps_start: 1.0
  eps_end: 0.05
  eps_decay: 0.995
  gamma: 0.99
  alpha_lr: 0.25
  optimistic_q: 0.0
  snapshot_every: 100
  start_mode: "hc"
  warm_start_best: true
On the ASIA dataset:
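- The RL agent learns structures that are competitive with those found by HC, GES, and PC.
- The GateLearn meta-controller reduces the cost of generative evaluation while preserving accuracy.
- Reported metrics: SHD, precision, recall, F1, BIC, and the generative metrics (held-out log-likelihood, MMD, JS divergence).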
- DAGs can be visualized using DAG.visualize().
- Training curves (returns, BIC) and gate usage are automatically saved under reports/figures/.
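
The structural metrics listed above can be computed from two sets of directed edges. The sketch below shows one common convention, in which a reversed edge adds 1 to the SHD and counts as an error for precision and recall; whether the repository uses exactly this convention is an assumption, and the function name `structural_metrics` and the toy graphs are hypothetical.

```python
def structural_metrics(true_edges, learned_edges):
    """Compare two directed edge sets and return SHD, precision, recall, F1."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)

    # Precision/recall/F1 on exactly matching directed edges.
    tp = len(true_edges & learned_edges)
    precision = tp / len(learned_edges) if learned_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # SHD: extra + missing edges, with each reversal counted once, not twice.
    extra = learned_edges - true_edges
    missing = true_edges - learned_edges
    reversed_edges = {(u, v) for (u, v) in extra if (v, u) in missing}
    shd = len(extra) + len(missing) - len(reversed_edges)

    return {"shd": shd, "precision": precision, "recall": recall, "f1": f1}


# Toy graphs for illustration only (not the ASIA network).
true_dag = {("A", "B"), ("B", "C"), ("C", "D")}
learned_dag = {("A", "B"), ("C", "B"), ("A", "D")}

metrics = structural_metrics(true_dag, learned_dag)
print(metrics)
```

In the toy example the learned graph has one correct edge, one reversed edge (C to B), one spurious edge (A to D), and one missing edge (C to D), giving an SHD of 3.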
Developed by Sunain Mushtaq (Computer Science, Deakin University) as part of Advanced Algorithms coursework.