
Add sedi() — Symmetric Extremal Dependence Index for binary classification #630

Open
SimonDedman wants to merge 2 commits into tidymodels:main from SimonDedman:feature/sedi-metric

@SimonDedman

Summary

  • Adds sedi() and sedi_vec(): a prevalence-independent skill metric for binary classification
  • SEDI uses a logarithmic transformation of sensitivity and false alarm rate, keeping it reliable at extreme class imbalance where j_index() (TSS) and mcc() degrade
  • Binary only (errors informatively on multiclass input)
  • Full interface parity: data.frame, table, matrix, and vec methods
  • 18 tests including prevalence-independence verification
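For reference, this is the index as defined by Ferro & Stephenson (2011), written in terms of the hit rate H (sensitivity) and the false alarm rate F (1 − specificity) from the 2×2 confusion matrix, with TSS alongside for comparison:

```math
\begin{aligned}
H &= \frac{TP}{TP + FN}, \qquad F = \frac{FP}{FP + TN}, \qquad \mathrm{TSS} = H - F, \\
\mathrm{SEDI} &= \frac{\ln F - \ln H - \ln(1 - F) + \ln(1 - H)}{\ln F + \ln H + \ln(1 - F) + \ln(1 - H)}.
\end{aligned}
```

For 0 < H, F < 1 every log term is negative, so SEDI lies strictly between −1 and 1. It is 0 when H = F (no skill) and tends to 1 as H → 1 and F → 0. Swapping which class counts as the event maps H to 1 − F and F to 1 − H, which leaves both the numerator and the denominator unchanged, so the score does not depend on the choice of event class.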

Motivation

Species distribution modelling, rare disease detection, fraud detection, and extreme weather forecasting routinely deal with prevalence below 2.5%. At these levels:

  • TSS (j_index) converges to the hit rate alone and loses discriminatory power (Wunderlich et al. 2019)
  • MCC exhibits denominator suppression, producing misleadingly low values for genuinely good models (Racz et al. 2024)
  • SEDI remains discriminating because its log transform prevents collapse at the extremes (Ferro & Stephenson 2011)
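The TSS point is plain arithmetic: TSS = H − F, and F = FP / (FP + TN) shrinks toward zero as true negatives pile up, so TSS approaches H. A minimal Python sketch with made-up counts (the numbers are illustrative, not from the PR's tests; `rates()` and `tss()` are hypothetical helpers) holds TP, FN, and FP fixed and grows only TN:

```python
def rates(tp, fn, fp, tn):
    """Hit rate H and false alarm rate F from 2x2 confusion-matrix counts."""
    hit_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return hit_rate, false_alarm_rate


def tss(tp, fn, fp, tn):
    """True skill statistic (yardstick's j_index): H - F."""
    hit_rate, false_alarm_rate = rates(tp, fn, fp, tn)
    return hit_rate - false_alarm_rate


# Illustrative counts: 40 presences, 30 of them detected, 20 false alarms.
# Only the number of true negatives grows, so prevalence keeps falling.
for tn in (100, 1_000, 10_000, 100_000):
    hit_rate, false_alarm_rate = rates(30, 10, 20, tn)
    print(
        f"TN={tn:>6}  H={hit_rate:.3f}  F={false_alarm_rate:.5f}  "
        f"TSS={tss(30, 10, 20, tn):.5f}"
    )
```

Once F is tiny, TSS is effectively just H and says little about the 20 false alarms.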

SEDI is increasingly recommended in the ecological modelling literature as a replacement for TSS at low prevalence, but has no implementation in tidymodels, requiring users to compute it manually from sensitivity and specificity.

References

  • Ferro, C.A.T. and Stephenson, D.B. (2011). "Extremal Dependence Indices: Improved Verification Measures for Deterministic Forecasts of Rare Binary Events". Weather and Forecasting. 26(5): 699–713.
  • Wunderlich, R.F. et al. (2019). "Two alternative evaluation metrics to replace the true skill statistic in the assessment of species distribution models". Nature Conservation. 35: 97–116.
  • Racz, A. et al. (2024). "Mind your prevalence!". Journal of Cheminformatics. 16: 49.

Test plan

  • Known-value test against Altman pathology data
  • Perfect predictions yield SEDI = 1
  • Random predictions yield SEDI near 0
  • Multiclass input rejected with informative error
  • All interfaces (data.frame, table, matrix, vec) agree
  • NA handling (na_rm = TRUE/FALSE)
  • Case weights (importance and frequency)
  • event_level = "second" equivalence
  • Prevalence-independence: same sens/spec at different N gives same SEDI
  • Range values correct (direction = maximize, range [-1, 1])
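Several of these properties follow directly from the formula and can be checked in a few lines. The sketch below is Python rather than the PR's testthat code. `sedi_from_rates()` and `sedi()` are hypothetical helpers, and clipping the rates with `eps` is an assumption made to keep the logs finite in the perfect-prediction case; the PR may handle that edge differently.

```python
import math


def sedi_from_rates(h, f, eps=1e-9):
    """SEDI (Ferro & Stephenson 2011) from hit rate h and false alarm rate f.

    Assumption: both rates are clipped to [eps, 1 - eps] so every log stays
    finite when a confusion-matrix cell is empty; this is illustrative only.
    """
    h = min(max(h, eps), 1 - eps)
    f = min(max(f, eps), 1 - eps)
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den


def sedi(tp, fn, fp, tn):
    """SEDI from the four cells of a 2x2 confusion matrix."""
    return sedi_from_rates(tp / (tp + fn), fp / (fp + tn))


# Perfect predictions: H = 1 and F = 0, so SEDI is 1 up to the clipping.
perfect = sedi(tp=50, fn=0, fp=0, tn=950)

# No skill: H = F = 0.2, so the numerator is exactly zero.
no_skill = sedi(tp=10, fn=40, fp=190, tn=760)

# Same H = 0.8 and F = 0.1 at 50% prevalence and at 1% prevalence.
balanced = sedi(tp=80, fn=20, fp=10, tn=90)
rare = sedi(tp=8, fn=2, fp=99, tn=891)

# Bounds: every (H, F) pair on a grid gives a value in [-1, 1].
grid = [i / 20 for i in range(1, 20)]
values = [sedi_from_rates(h, f) for h in grid for f in grid]

print(perfect, no_skill, balanced, rare, min(values), max(values))
```

The prevalence check works because SEDI depends on the counts only through H and F, each normalised within its own class (positives for H, negatives for F), so rescaling either class leaves the score unchanged.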

🤖 Generated with Claude Code

…ation

Prevalence-independent skill metric that remains reliable at extreme
class imbalance (< 2.5%) where j_index (TSS) and mcc degrade.
Based on Ferro & Stephenson (2011), recommended by Wunderlich et al. (2019).

Includes sedi(), sedi_vec(), data.frame/table/matrix methods, full roxygen
docs, 18 tests, NEWS entry, and pkgdown listing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EmilHvitfeldt
Member

Hello @SimonDedman 👋

Before I do a full review, can you explain to me whether this metric only works on binary classification or if it works with any number of outcome classes?

@SimonDedman
Author

Hi @EmilHvitfeldt!

Binary only: sensitivity and false alarm rate (F = 1 − specificity) are inherently binary quantities, defined on the 2×2 confusion matrix. The original papers are Ferro & Stephenson (2011) and Wunderlich et al. (2019).

This implementation enforces binary only: sedi_vec() calls finalize_estimator(truth, estimator = "binary"), and the test suite includes a test (test-class-sedi.R:34-41) confirming that multiclass input raises an error message mentioning "binary". This is the same pattern as other binary yardstick metrics, e.g. sens(), spec(), and mcc() (which also handles multiclass). There's no established multiclass extension of SEDI in the literature.
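A Python sketch (not the PR's R code) of a binary-only guard and of what the event level does. `binary_counts()` is a hypothetical helper; treating the sorted distinct labels as the factor levels, and the error wording, are assumptions. The example also shows that switching `event_level` to `"second"` leaves SEDI unchanged, since that swap maps H to 1 − F and F to 1 − H.

```python
import math
from collections import Counter


def binary_counts(truth, estimate, event_level="first"):
    """Tally (tp, fn, fp, tn) for a two-class outcome.

    Hypothetical helper: the sorted distinct labels stand in for R factor
    levels, and event_level chooses which of the two is the event.
    """
    levels = sorted(set(truth) | set(estimate))
    if len(levels) != 2:
        raise ValueError(
            f"SEDI is only defined for a binary outcome; found {len(levels)} levels."
        )
    if event_level not in ("first", "second"):
        raise ValueError("event_level must be 'first' or 'second'")
    event = levels[0] if event_level == "first" else levels[1]
    # Key each observation by (truth is event, prediction is event).
    cells = Counter((t == event, e == event) for t, e in zip(truth, estimate))
    return (cells[(True, True)], cells[(True, False)],
            cells[(False, True)], cells[(False, False)])


def sedi(truth, estimate, event_level="first"):
    """SEDI for two-class labels; assumes no empty cell drives H or F to 0 or 1."""
    tp, fn, fp, tn = binary_counts(truth, estimate, event_level)
    h, f = tp / (tp + fn), fp / (fp + tn)
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den


truth = ["abnormal"] * 6 + ["normal"] * 14
estimate = ["abnormal"] * 4 + ["normal"] * 2 + ["abnormal"] * 3 + ["normal"] * 11
print(sedi(truth, estimate), sedi(truth, estimate, event_level="second"))
```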

Cheers!

SEDI now supports macro, macro-weighted, and micro averaging for
multiclass classification problems. Each class is treated as a
binary one-vs-all problem; per-class SEDI values are averaged
(macro, macro-weighted) or per-class counts are pooled (micro).
Macro (unweighted) is the default, preserving SEDI's prevalence
independence across classes with varying frequencies.

- Remove binary-only restriction
- Add sedi_multiclass() with one-vs-all decomposition
- Support macro, macro_weighted, and micro estimators
- Add 6 new multiclass tests (3-class macro/weighted/micro,
  auto-selection, bounds, prevalence independence)
- Update roxygen docs with Multiclass section
- Update NEWS.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@SimonDedman
Author

Also, I just pushed a commit extending sedi() to support multiclass classification via one-vs-all decomposition, following the pattern used by sens(), spec(), and j_index() in yardstick:

  • Macro (default): collapse the K×K matrix to K binary problems (class k vs. all others), compute per-class SEDI, take the unweighted mean. The unweighted average is deliberate — since SEDI's log transform already handles prevalence internally, weighting by class frequency would reintroduce the bias SEDI is designed to eliminate.
  • Macro-weighted: same decomposition, weighted by class prevalence.
  • Micro: pool TP/FP/FN/TN counts across all classes, compute a single SEDI from the pooled rates.

The estimator auto-selects "binary" for 2-level factors and "macro" for 3+ levels, consistent with other yardstick class metrics.
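A hypothetical Python sketch of the one-vs-all decomposition and the estimator auto-selection described here. It is not the PR's implementation: the function names are invented for illustration, and the `eps` clipping and NaN for a class with no positives or no negatives are assumptions. Each level k becomes the event in turn; macro averages the per-class scores, macro_weighted weights them by each class's count in `truth`, and micro pools the four counts before computing one SEDI.

```python
import math
from collections import Counter


def sedi_from_counts(tp, fn, fp, tn, eps=1e-9):
    """Binary SEDI from one 2x2 table; NaN if H or F is undefined.

    Assumption: rates are clipped to [eps, 1 - eps] to keep the logs finite.
    """
    if tp + fn == 0 or fp + tn == 0:
        return math.nan
    h = min(max(tp / (tp + fn), eps), 1 - eps)
    f = min(max(fp / (fp + tn), eps), 1 - eps)
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den


def one_vs_all_counts(truth, estimate, levels):
    """(tp, fn, fp, tn) for each level k, treating k as the event."""
    pairs = Counter(zip(truth, estimate))  # K x K confusion matrix as a dict
    n = len(truth)
    counts = {}
    for k in levels:
        tp = pairs[(k, k)]
        fn = sum(v for (t, e), v in pairs.items() if t == k and e != k)
        fp = sum(v for (t, e), v in pairs.items() if t != k and e == k)
        counts[k] = (tp, fn, fp, n - tp - fn - fp)
    return counts


def sedi_multiclass(truth, estimate, estimator=None):
    """Hypothetical binary / macro / macro_weighted / micro SEDI."""
    levels = sorted(set(truth) | set(estimate))
    if estimator is None:  # auto-select: binary for 2 levels, macro for 3+
        estimator = "binary" if len(levels) == 2 else "macro"
    counts = one_vs_all_counts(truth, estimate, levels)
    if estimator == "binary":
        return sedi_from_counts(*counts[levels[0]])  # first level is the event
    if estimator == "micro":
        pooled = [sum(c[i] for c in counts.values()) for i in range(4)]
        return sedi_from_counts(*pooled)
    per_class = {k: sedi_from_counts(*counts[k]) for k in levels}
    if estimator == "macro":
        return sum(per_class.values()) / len(levels)
    if estimator == "macro_weighted":
        support = Counter(truth)  # weight = number of true cases per class
        return sum(per_class[k] * support[k] for k in levels) / len(truth)
    raise ValueError(f"unknown estimator: {estimator!r}")


truth = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
estimate = ["a", "a", "c", "a", "b", "b", "b", "a", "c", "b"]
for est in ("macro", "macro_weighted", "micro"):
    print(est, sedi_multiclass(truth, estimate, est))
```

One property of micro averaging worth knowing: with single-label data, every misclassified observation is a false negative for its true class and a false positive for its predicted class, so the pooled FN and FP counts are always equal (4 each in this example).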

All 22 tests pass (6 new multiclass tests covering macro, macro-weighted, micro, auto-selection, bounds, and multiclass prevalence independence).
