feat(gnn): GraphSAGE fraud + GAE anomaly showcase by mivertowski · Pull Request #190 · mivertowski/SyntheticData

mivertowski · 2026-05-09T08:49:51Z

Summary

End-to-end GNN showcase on the v5.9.0 Method-A je_network — a customer-driven trained-model deliverable that demonstrates how the synthetic dataset enables real graph-ML work.

Edge fraud classifier: GraphSAGE 2-layer encoder + edge-head MLP. Test AUC 0.914, F1 0.78, AUC-PR 0.79; per-business-process AUC 0.886–0.951.
Edge / node anomaly scorer: attribute-reconstruction GAE (encoder + MLP decoder predicting edge_attr). Per-edge anomaly AUC 0.654 unsupervised.

Live artefacts

Model: https://huggingface.co/VynFi/je-fraud-gnn
Gradio Space: https://huggingface.co/spaces/VynFi/fraud-gnn-demo

What's in this PR

scripts/ml/build_je_pyg_dataset.py — reproducible PyG Data builder (499 nodes × 17 feat, 61,656 edges × 22 feat). Adds weekend + round-dollar features that capture the v5.x fraud-bias signatures.
scripts/ml/train_je_fraud_gnn.py — sklearn LR baseline + GraphSAGE training script.
scripts/ml/train_je_anomaly_gae.py — attribute-reconstruction GAE training script.
scripts/ml/inference.py — shared InferenceBundle used by Space + notebook.
scripts/ml/package_for_hf.py — packages weights/preprocessor/metadata for HF model repo.
spaces/fraud-gnn-demo/ — Gradio Space with 3 tabs (edge predictor / node anomaly / live check).
notebooks/gnn_fraud_demo.ipynb — reproducible end-to-end notebook.
requirements-ml.txt — torch + torch-geometric stack (CPU-friendly).

Honest framing

DataSynth's fraud_bias injects strong local signals into edge attributes (round-dollar 378× lift, weekend 77× lift), so a vanilla LogReg on edge features already gets to AUC 0.91. GraphSAGE adds +0.13 AUC pts on the supervised task; the unsupervised attribute-GAE is where graph methods earn their keep here. The model card discusses this trade-off explicitly.

Test plan

Reproducibility: dataset build is deterministic (ChaCha8 seed 20260509, sklearn random_state=0)
Edge fraud classifier metrics match training log
GAE per-edge anomaly AUC > 0.5 (real signal, not random)
Inference bundle round-trips (load_bundle → predict_fraud matches training)
HF model repo VynFi/je-fraud-gnn pushed and pullable
Gradio Space builds and renders all three tabs locally

🤖 Generated with Claude Code

End-to-end GNN showcase on the v5.9.0 Method-A je_network from VynFi/vynfi-journal-entries-1m: * scripts/ml/build_je_pyg_dataset.py — reproducible PyG Data builder (499 nodes × 17 feat, 61,656 edges × 22 feat) with weekend + round-dollar features that capture the v5.x fraud bias signatures. * scripts/ml/train_je_fraud_gnn.py — sklearn LR baseline + GraphSAGE 2-layer encoder + edge-head MLP. Test AUC 0.914, F1 0.78, AUC-PR 0.79; per-process AUC ranges 0.886–0.951. * scripts/ml/train_je_anomaly_gae.py — attribute-reconstruction GAE (GraphSAGE encoder + MLP decoder predicting edge_attr). Per-edge anomaly AUC 0.654 unsupervised. * scripts/ml/inference.py — shared InferenceBundle for the demo Space and downstream consumers. * scripts/ml/package_for_hf.py — packages weights + preprocessor + metadata for HF Hub upload. * spaces/fraud-gnn-demo/ — Gradio Space with three tabs (edge fraud predictor / node anomaly explorer / live check). * notebooks/gnn_fraud_demo.ipynb — reproducible end-to-end run. * requirements-ml.txt — torch + torch-geometric stack. Live: * Model: https://huggingface.co/VynFi/je-fraud-gnn * Space: https://huggingface.co/spaces/VynFi/fraud-gnn-demo Honest framing: LR baseline already hits 0.91 because v5.x fraud bias writes strong local signals (round-dollar 378x lift, weekend 77x lift) that land directly in edge attributes. GraphSAGE adds +0.13 AUC pts on the supervised task; the unsupervised GAE is where graph methods earn their keep here.

mivertowski added 2 commits May 9, 2026 10:49

docs(readme): add Showcases section linking VynFi Spaces + GNN model

cf36111

mivertowski merged commit e14e2bc into main May 9, 2026
16 checks passed

mivertowski deleted the feat/gnn-fraud-showcase branch May 9, 2026 10:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(gnn): GraphSAGE fraud + GAE anomaly showcase#190

feat(gnn): GraphSAGE fraud + GAE anomaly showcase#190
mivertowski merged 2 commits intomainfrom
feat/gnn-fraud-showcase

mivertowski commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mivertowski commented May 9, 2026

Summary

Live artefacts

What's in this PR

Honest framing

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant