feat(gnn): GraphSAGE fraud + GAE anomaly showcase #190

Merged
mivertowski merged 2 commits into main from feat/gnn-fraud-showcase
May 9, 2026

Conversation

@mivertowski
Owner

Summary

End-to-end GNN showcase on the v5.9.0 Method-A je_network — a customer-driven trained-model deliverable that demonstrates how the synthetic dataset enables real graph-ML work.

  • Edge fraud classifier: GraphSAGE 2-layer encoder + edge-head MLP. Test AUC 0.914, F1 0.78, AUC-PR 0.79; per-business-process AUC 0.886–0.951.
  • Edge / node anomaly scorer: attribute-reconstruction GAE (encoder + MLP decoder predicting edge_attr). Per-edge anomaly AUC 0.654 unsupervised.
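
As a rough illustration of how an attribute-reconstruction model can yield a per-edge anomaly score and how that score might be evaluated with AUC, here is a minimal stdlib-only sketch. It is not the code in scripts/ml/train_je_anomaly_gae.py: the function names, the mean-squared-error scoring rule, and the toy values are all assumptions made for illustration.

```python
from typing import Sequence


def reconstruction_scores(
    edge_attr: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> list[float]:
    """Mean squared reconstruction error per edge; higher means more anomalous."""
    scores = []
    for original, decoded in zip(edge_attr, reconstructed):
        squared = [(o - d) ** 2 for o, d in zip(original, decoded)]
        scores.append(sum(squared) / len(squared))
    return scores


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: four edges with two attributes each (all values are made up).
edge_attr = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.5]]
# Hypothetical decoder output; the third edge is reconstructed poorly.
reconstructed = [[0.1, 0.9], [0.9, 0.1], [1.0, 1.0], [0.5, 0.4]]
# Labels are used only for evaluation, never for training the scorer.
is_anomaly = [0, 0, 1, 0]

scores = reconstruction_scores(edge_attr, reconstructed)
auc = roc_auc(is_anomaly, scores)
```

The pairwise AUC loop is quadratic and is only meant to make the definition explicit; on 61,656 edges a rank-based implementation such as sklearn's `roc_auc_score` would be the practical choice.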

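Live artefacts

  • Model: https://huggingface.co/VynFi/je-fraud-gnn
  • Space: https://huggingface.co/spaces/VynFi/fraud-gnn-demo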

What's in this PR

  • scripts/ml/build_je_pyg_dataset.py — reproducible PyG Data builder (499 nodes × 17 feat, 61,656 edges × 22 feat). Adds weekend + round-dollar features that capture the v5.x fraud-bias signatures.
  • scripts/ml/train_je_fraud_gnn.py — sklearn LR baseline + GraphSAGE training script.
  • scripts/ml/train_je_anomaly_gae.py — attribute-reconstruction GAE training script.
  • scripts/ml/inference.py — shared InferenceBundle used by Space + notebook.
  • scripts/ml/package_for_hf.py — packages weights/preprocessor/metadata for HF model repo.
  • spaces/fraud-gnn-demo/ — Gradio Space with 3 tabs (edge predictor / node anomaly / live check).
  • notebooks/gnn_fraud_demo.ipynb — reproducible end-to-end notebook.
  • requirements-ml.txt — torch + torch-geometric stack (CPU-friendly).
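
The weekend and round-dollar features mentioned for the dataset builder could be derived along the following lines. This is a hedged sketch, not the logic in scripts/ml/build_je_pyg_dataset.py: the function names and the 1000-unit definition of "round" are assumptions, and the real builder may use different thresholds or encodings.

```python
from datetime import date
from decimal import Decimal


def is_weekend(posting_date: date) -> bool:
    """True for Saturday or Sunday (date.weekday() returns 5 and 6 for those)."""
    return posting_date.weekday() >= 5


def is_round_dollar(amount: Decimal, unit: Decimal = Decimal("1000")) -> bool:
    """True when a non-zero amount is an exact multiple of `unit` (assumed threshold)."""
    return amount != 0 and amount % unit == 0


def edge_flags(posting_date: date, amount: Decimal) -> list[float]:
    """Two binary edge features as floats, ready to append to an edge attribute row."""
    return [float(is_weekend(posting_date)), float(is_round_dollar(amount))]


# Toy usage: a Saturday posting of exactly 5000.00, then a Monday posting of 4999.99.
flagged = edge_flags(date(2026, 5, 9), Decimal("5000.00"))
unflagged = edge_flags(date(2026, 5, 11), Decimal("4999.99"))
```

Using `Decimal` avoids floating-point remainders when testing whether an amount is an exact multiple.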

Honest framing

DataSynth's fraud_bias injects strong local signals into edge attributes (round-dollar 378× lift, weekend 77× lift), so a plain logistic regression (LR) on edge features alone already reaches AUC 0.91. GraphSAGE adds only +0.13 AUC pts on the supervised task; the unsupervised attribute-GAE is where graph methods earn their keep here. The model card discusses this trade-off explicitly.
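
For readers unfamiliar with lift, one common definition is the fraud rate among flagged edges divided by the overall fraud rate. The sketch below uses that definition on made-up data; whether the 378× and 77× figures were computed in exactly this way is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def lift(flags: Sequence[int], labels: Sequence[int]) -> float:
    """P(fraud | flag set) / P(fraud), using 0/1 flags and 0/1 fraud labels."""
    if len(flags) != len(labels):
        raise ValueError("flags and labels must have the same length")
    flagged_labels = [y for f, y in zip(flags, labels) if f]
    if not flagged_labels or sum(labels) == 0:
        raise ValueError("need at least one flagged edge and one fraud label")
    base_rate = sum(labels) / len(labels)
    flagged_rate = sum(flagged_labels) / len(flagged_labels)
    return flagged_rate / base_rate


# Toy data: 10 edges, 2 of them fraudulent, 2 of them flagged (values are made up).
round_dollar_flag = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
is_fraud = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Base fraud rate is 2/10, fraud rate among flagged edges is 1/2.
toy_lift = lift(round_dollar_flag, is_fraud)
```

A lift far above 1 for a single edge attribute means a linear model on edge features can separate much of the fraud without any message passing, which is the point this section makes.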

Test plan

  • Reproducibility: dataset build is deterministic (ChaCha8 seed 20260509, sklearn random_state=0)
  • Edge fraud classifier metrics match training log
  • GAE per-edge anomaly AUC > 0.5 (real signal, not random)
  • Inference bundle round-trips (load_bundle → predict_fraud matches training)
  • HF model repo VynFi/je-fraud-gnn pushed and pullable
  • Gradio Space builds and renders all three tabs locally
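
A determinism check like the first item above can be approximated by building the dataset twice with the same seed and comparing content digests. The sketch below is only an analogy: `build_toy_edges` is a hypothetical stand-in for the real builder, and Python's `random.Random` (Mersenne Twister) stands in for the ChaCha8 generator named in the test plan.

```python
import hashlib
import random


def build_toy_edges(seed: int, n_edges: int = 100) -> list[tuple[int, int, float]]:
    """Hypothetical stand-in for a seeded dataset build: (src, dst, amount) rows."""
    rng = random.Random(seed)
    return [
        (rng.randrange(499), rng.randrange(499), round(rng.uniform(0, 1e4), 2))
        for _ in range(n_edges)
    ]


def digest(edges: list[tuple[int, int, float]]) -> str:
    """SHA-256 over a canonical text serialisation of the edge list."""
    h = hashlib.sha256()
    for src, dst, amount in edges:
        h.update(f"{src},{dst},{amount:.2f}\n".encode("utf-8"))
    return h.hexdigest()


# Two builds with the same seed should hash identically; a different seed should not.
first = digest(build_toy_edges(20260509))
second = digest(build_toy_edges(20260509))
other = digest(build_toy_edges(1))
```

Hashing a canonical serialisation rather than comparing in-memory objects makes the check usable across processes and machines.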

🤖 Generated with Claude Code

End-to-end GNN showcase on the v5.9.0 Method-A je_network from
VynFi/vynfi-journal-entries-1m:

  * scripts/ml/build_je_pyg_dataset.py — reproducible PyG Data
    builder (499 nodes × 17 feat, 61,656 edges × 22 feat) with
    weekend + round-dollar features that capture the v5.x fraud
    bias signatures.
  * scripts/ml/train_je_fraud_gnn.py — sklearn LR baseline +
    GraphSAGE 2-layer encoder + edge-head MLP. Test AUC 0.914,
    F1 0.78, AUC-PR 0.79; per-process AUC ranges 0.886–0.951.
  * scripts/ml/train_je_anomaly_gae.py — attribute-reconstruction
    GAE (GraphSAGE encoder + MLP decoder predicting edge_attr).
    Per-edge anomaly AUC 0.654 unsupervised.
  * scripts/ml/inference.py — shared InferenceBundle for the demo
    Space and downstream consumers.
  * scripts/ml/package_for_hf.py — packages weights + preprocessor
    + metadata for HF Hub upload.
  * spaces/fraud-gnn-demo/ — Gradio Space with three tabs
    (edge fraud predictor / node anomaly explorer / live check).
  * notebooks/gnn_fraud_demo.ipynb — reproducible end-to-end run.
  * requirements-ml.txt — torch + torch-geometric stack.

Live:
  * Model: https://huggingface.co/VynFi/je-fraud-gnn
  * Space: https://huggingface.co/spaces/VynFi/fraud-gnn-demo

Honest framing: LR baseline already hits 0.91 because v5.x fraud
bias writes strong local signals (round-dollar 378x lift, weekend
77x lift) that land directly in edge attributes. GraphSAGE adds
+0.13 AUC pts on the supervised task; the unsupervised GAE is
where graph methods earn their keep here.
mivertowski merged commit e14e2bc into main on May 9, 2026
16 checks passed
mivertowski deleted the feat/gnn-fraud-showcase branch on May 9, 2026 10:23