feat(gnn): GraphSAGE fraud + GAE anomaly showcase #190
Merged
mivertowski merged 2 commits into main on May 9, 2026
End-to-end GNN showcase on the v5.9.0 Method-A je_network from
VynFi/vynfi-journal-entries-1m:
* scripts/ml/build_je_pyg_dataset.py — reproducible PyG Data
builder (499 nodes × 17 feat, 61,656 edges × 22 feat) with
weekend + round-dollar features that capture the v5.x fraud
bias signatures.
* scripts/ml/train_je_fraud_gnn.py — sklearn LR baseline +
GraphSAGE 2-layer encoder + edge-head MLP. Test AUC 0.914,
F1 0.78, AUC-PR 0.79; per-process AUC ranges 0.886–0.951.
* scripts/ml/train_je_anomaly_gae.py — attribute-reconstruction
GAE (GraphSAGE encoder + MLP decoder predicting edge_attr).
Per-edge anomaly AUC 0.654 unsupervised.
* scripts/ml/inference.py — shared InferenceBundle for the demo
Space and downstream consumers.
* scripts/ml/package_for_hf.py — packages weights + preprocessor
+ metadata for HF Hub upload.
* spaces/fraud-gnn-demo/ — Gradio Space with three tabs
(edge fraud predictor / node anomaly explorer / live check).
* notebooks/gnn_fraud_demo.ipynb — reproducible end-to-end run.
* requirements-ml.txt — torch + torch-geometric stack.
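The unsupervised per-edge anomaly AUC reported for train_je_anomaly_gae.py can be illustrated with a small, dependency-free sketch. It makes two assumptions, neither confirmed by the PR:

* Each edge's anomaly score is the mean squared error between its observed edge_attr row and the decoder's reconstruction.
* The reported AUC is the rank-based (Mann-Whitney) ROC AUC of those scores against the fraud labels, which are used only for evaluation.

The helper names edge_anomaly_scores and roc_auc are hypothetical. In the real pipeline the reconstruction would come from the trained GraphSAGE encoder + MLP decoder, probably as tensors rather than Python lists.

```python
from typing import Sequence


def edge_anomaly_scores(
    edge_attr: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> list[float]:
    """Hypothetical scoring rule: per-edge mean squared reconstruction error."""
    if len(edge_attr) != len(reconstructed):
        raise ValueError("edge_attr and reconstruction must cover the same edges")
    scores = []
    for observed, predicted in zip(edge_attr, reconstructed):
        if len(observed) != len(predicted):
            raise ValueError("feature dimensions differ")
        squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
        scores.append(sum(squared) / len(squared))
    return scores


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) ROC AUC for 0/1 labels; ties get their average rank."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must align")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos + n_neg != len(labels):
        raise ValueError("labels must be 0 (clean) or 1 (fraud)")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one fraud edge and one clean edge")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic of the fraud edges, normalised to a probability in [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 4 edges with 2 features each; only edge 2 is labelled fraud.
edge_attr = [[1.0, 0.0], [0.5, 0.5], [2.0, 1.0], [0.0, 0.0]]
reconstructed = [[0.9, 0.1], [0.5, 0.4], [0.5, 0.0], [0.0, 0.1]]
labels = [0, 0, 1, 0]
scores = edge_anomaly_scores(edge_attr, reconstructed)
print(roc_auc(scores, labels))  # -> 1.0
```

The rank formulation is O(n log n), so it stays cheap at the 61,656-edge scale. An AUC of 0.5 would mean the reconstruction error carries no fraud signal at all.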
Live:
* Model: https://huggingface.co/VynFi/je-fraud-gnn
* Space: https://huggingface.co/spaces/VynFi/fraud-gnn-demo
Honest framing: the LR baseline already reaches AUC 0.91 because the
v5.x fraud bias writes strong local signals (round-dollar 378x lift,
weekend 77x lift) directly into the edge attributes. GraphSAGE adds
about +0.13 AUC percentage points on the supervised task; the
unsupervised GAE is where graph methods earn their keep here.
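The lift figures above say how much more common fraud is among flagged entries than overall. A minimal sketch of that calculation follows. It assumes the common definition lift = P(fraud | flag) / P(fraud) and two illustrative flag rules:

* Weekend: the posting date falls on a Saturday or Sunday.
* Round-dollar: the amount is a non-zero exact multiple of 1,000.

The rules, the 1,000 unit and the helper names are hypothetical, not the definitions in build_je_pyg_dataset.py. The toy rows are invented and do not reproduce the 378x and 77x figures.

```python
from datetime import date
from decimal import Decimal


def is_weekend(posting_date: date) -> bool:
    """Hypothetical weekend flag: Saturday (weekday 5) or Sunday (weekday 6)."""
    return posting_date.weekday() >= 5


def is_round_dollar(amount: Decimal, unit: Decimal = Decimal("1000")) -> bool:
    """Hypothetical round-dollar flag: a non-zero exact multiple of `unit`."""
    return amount != 0 and amount % unit == 0


def lift(flags: list[bool], is_fraud: list[bool]) -> float:
    """Lift = P(fraud | flagged) / P(fraud)."""
    if len(flags) != len(is_fraud) or not flags:
        raise ValueError("flags and labels must be non-empty and aligned")
    base_rate = sum(is_fraud) / len(is_fraud)
    flagged_labels = [y for f, y in zip(flags, is_fraud) if f]
    if base_rate == 0 or not flagged_labels:
        raise ValueError("need at least one fraud row and one flagged row")
    return (sum(flagged_labels) / len(flagged_labels)) / base_rate


# Toy journal lines: (posting date, amount, is_fraud).
entries = [
    (date(2024, 1, 6), Decimal("5000.00"), True),   # Saturday, round amount
    (date(2024, 1, 8), Decimal("1234.56"), False),  # Monday
    (date(2024, 1, 9), Decimal("980.10"), False),
    (date(2024, 1, 10), Decimal("3000.00"), True),  # round amount
    (date(2024, 1, 11), Decimal("45.00"), False),
    (date(2024, 1, 12), Decimal("712.40"), False),
    (date(2024, 1, 7), Decimal("88.20"), False),    # Sunday
    (date(2024, 1, 15), Decimal("310.75"), False),
]
labels = [fraud for _, _, fraud in entries]
round_flags = [is_round_dollar(amount) for _, amount, _ in entries]
weekend_flags = [is_weekend(day) for day, _, _ in entries]
print(f"round-dollar lift: {lift(round_flags, labels):.1f}x")  # -> round-dollar lift: 4.0x
print(f"weekend lift: {lift(weekend_flags, labels):.1f}x")     # -> weekend lift: 2.0x
```

Decimal is used so the modulo check is exact for currency amounts. With binary floats, a value like 0.1 + 0.2 would not compare cleanly.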
Summary
End-to-end GNN showcase on the v5.9.0 Method-A je_network — a customer-driven trained-model deliverable that demonstrates how the synthetic dataset enables real graph-ML work.
What's in this PR
* scripts/ml/build_je_pyg_dataset.py: reproducible PyG Data builder (499 nodes × 17 feat, 61,656 edges × 22 feat). Adds weekend + round-dollar features that capture the v5.x fraud-bias signatures.
* scripts/ml/train_je_fraud_gnn.py: sklearn LR baseline + GraphSAGE training script.
* scripts/ml/train_je_anomaly_gae.py: attribute-reconstruction GAE training script.
* scripts/ml/inference.py: shared InferenceBundle used by the Space and notebook.
* scripts/ml/package_for_hf.py: packages weights/preprocessor/metadata for the HF model repo.
* spaces/fraud-gnn-demo/: Gradio Space with 3 tabs (edge predictor / node anomaly / live check).
* notebooks/gnn_fraud_demo.ipynb: reproducible end-to-end notebook.
* requirements-ml.txt: torch + torch-geometric stack (CPU-friendly).
Honest framing
DataSynth's fraud_bias injects strong local signals into edge attributes (round-dollar 378× lift, weekend 77× lift), so a vanilla LogReg on edge features already reaches AUC 0.91. GraphSAGE adds about +0.13 AUC percentage points on the supervised task; the unsupervised attribute-GAE is where graph methods earn their keep here. The model card discusses this trade-off explicitly.
Test plan
* … 20260509, sklearn random_state=0)
* … load_bundle → predict_fraud matches training)
* VynFi/je-fraud-gnn pushed and pullable

🤖 Generated with Claude Code