The Shapley value constitutes a widely recognized tool to assess individual contributions of cooperating entities from a game-theoretic perspective. Within the field of machine learning it is frequently applied to conduct feature attribution or data valuation. Aiming to overcome its deficiencies in capturing intricate interplay between entities, the Shapley interaction index represents a natural extension of the Shapley value, maintaining axiomatic uniqueness. As their complexity renders the exact computation of both quantities impractical, recent work transfers approximation approaches from the Shapley value to Shapley interactions. We propose Permutation-IQ, a domain-independent approximation method based on a novel representation that traces Shapley interactions back to the Shapley value's fine-grained building blocks of marginal contributions. Sampling these instead of the coarser discrete derivatives of which Shapley interactions are composed, allows to utilize the collected information more efficiently.
This repository contains the source code to reproduce the experiments of the paper "Approximating Shapley Interactions with Marginal Contributions" (accepted at ECML PKDD 2026) by Patrick Kolpaczki, Felix Edelmann, Maximilian Muschalik and Eyke Hüllermeier.
It consists of:
- Implementation of the Permutation-IQ method using the
shapiqframework.permutation_iq.py:- Class
PermutationIQ, inheriting fromshapiq.approximator.Approximator, implementing the Permutation-IQ method. - Class
PermutationIQStratified, implementing a stratified sampling variant of Permutation-IQ.
- Class
- Benchmark setup to evaluate the approximation quality of Permutation-IQ against the state-of-the-art methods on synthetic and real-world datasets, using the benchmark framework provided by
shapiq.main.py: Script to run the experiments.slurmjob.bash: SLURM job configuration to be run on a compute cluster.results/: Benchmark results stored as CSV files, along with the corresponding log files.pyproject.toml,.python-version,uv.lock: Python project configuration and dependencies.
- Setup for processing and visualizing the benchmark results using R and
tidyversepackages (most notablyggplot2).analyze.R: Script to process the raw benchmark results, compute summary statistics and plots.plots/: Plots as PNG files generated byanalyze.R, along with the corresponding plot data as CSV files..Rprofile,renv.lock,renv/: R project configuration and dependencies.
Install the Python package manager uv, e.g. via:
pipx install uv
Now, you can either call the benchmark script via uv run main.py [command] [options] or by first syncing and activating the virtual environment (venv):
uv sync
source .venv/bin/activate
python main.py [command] [options]
Three commands are available to run the benchmarks:
help: Print help message with available commands and options.debug: Run all approximators on a single dataset with a single budget and print results. Useful when developing approximators and wanting to quickly check their performance.benchmark [name]: Run a specific benchmark configuration identified bynameand save results toresults/. Benchmark configurations are defined inmain.py.
Install the R package manager renv and use it to install the required R packages:
install.packages("renv")
renv::restore()
Now, you can run the analysis script to process the raw benchmark results and generate summary statistics and plots:
Rscript analyze.R
It is advisable to comment out the code blocks in analyze.R that are not relevant for the specific benchmark results you want to analyze, as the script is currently set up to process all benchmark results at once, which can be time-consuming.
Implementation of the Permutation-IQ method can be found in permutation_iq.py, which defines the PermutationIQ class, inheriting from the abstact class shapiq.approximator.Approximator. The class needs to implements the approximate method, which takes as input a budget, i.e. the maximum number of function evaluations, and a cooperative game.
It is in the nature of Permutation-IQ that for a single interaction set K consisting of k players, we receive k sibling estimates that need to be aggregated in some way to obtain the final estimate of the Shapley interaction index for K. The following aggregation strategies are implemented in the PermutationIQ class:
mean: Arithmetic mean of the sibling estimatesinverse_variance_weighting: Weighted mean of the sibling estimates, where the weights are inversely proportional to the estimated variance of the estimates. Variances are estimated via Welford's online algorithm.inverse_variance_weighting_optimal: Optional. Same asinverse_variance_weighting, but using pre-computed exact variances instead. Only feasable when the datasets are small enough for exact variance calculation, and only useful for benchmarking purposes.
As the aggregation step is only the last step of the approximation process and computationally cheap, the aggregation strategy is not a configuration option of the approximator but all strategies are applied to the same set of sibling estimates, which allows for a direct comparison of the strategies. These estimates by aggregation strategy (called "variant" in the code) are returned by the approximate_variants function.
PermutationIQStratified implements a stratified sampling variant of Permutation-IQ, where samples are stratified by coalition size. Note that this method is not included in the above-mentioned paper, but kept for future work.
The benchmark framework is given by the shapiq package, which provides state-of-the-art approximation methods for the Shapley interaction index, as well as cooperative game implementations with pre-computed exact Shapley interaction indices for some configurations. The benchmark setup is defined in main.py, which contains a function for each benchmark configuration. Each function defines the cooperative games to be used, the approximators to be evaluated, and the budgets to be tested. The results are saved as CSV files in the results/ directory, along with log files containing additional information about the benchmark run.
As some state-of-the-art methods exhibit infeasible runtimes for larger datasets (usually larger sum of unanimity games, SOUGs), some benchmarks define a hard-coded maximum runtime for each approximator run. If this timeout is hit, that approximator will be cancelled and not be included in further runs with the same or larger budgets, as it is assumed that it will also not finish within the budget for those runs. This is done to save computational resources and to be able to run the benchmarks in a reasonable time frame.