23 commits
- 83df0f5 [ ADD ] Sparse modifiers (setday, Feb 26, 2026)
- d9f15a1 [ ADD ] GELUSquared and squared clipped functions (setday, Apr 3, 2026)
- 8ad06e5 [ ADD ] BatchNorm pre-stop (setday, Apr 3, 2026)
- 6f31592 [ FIX ] Gradient for inplace clamping and relative path for modifiers (setday, Apr 3, 2026)
- 8c419a5 [ UPD ] Turn of analytics by default (setday, Apr 3, 2026)
- 6082aff [ ADD ] Tests (setday, Apr 3, 2026)
- 195ec07 [ ADD ] Proper parameter clonning on modification (setday, Apr 3, 2026)
- 194be79 [ UPD ] __init__.py with all the modifiers (setday, Apr 3, 2026)
- 91b02e8 [ UPD ] Replace quantile with kth value + make evenly spaced selectio… (setday, Apr 3, 2026)
- 7a97c01 [ UPD ] make top k with sparsity in [0, 1] + replace quantile with kt… (setday, Apr 3, 2026)
- 2a86c58 [ UPD ] Make input and output cloning to avoid changing by inplace fu… (setday, Apr 3, 2026)
- fe723c3 [ UPD ] make less memory load using clamp_ instead of masking (setday, Apr 3, 2026)
- e721819 [ UPD ] Add inplace arg for SUGARBSiLU (setday, Apr 3, 2026)
- 7aa163a [ FIX ] Fixing NoisyReLU broken logic (it was working like ReLU all t… (setday, Apr 3, 2026)
- 6f53af9 [ ADD ] Linear sparse activation layers (setday, May 6, 2026)
- 2f584c6 [ ADD ] LayerNormPreStop (setday, May 6, 2026)
- 36a869f [ ADD ] Normalizations with Accumulation Stop (setday, May 6, 2026)
- bc16af8 [ UPD ] modifiers with better param copy and layer detection (setday, May 6, 2026)
- 2f85a94 [ ADD ] Linear modifier and analitical layer creators (setday, May 6, 2026)
- 24dfff9 [ UPD ] Better debug info handelling for activations (setday, May 6, 2026)
- b518377 [ ADD ] RS for top k sparsity (setday, May 6, 2026)
- ebc177d [ ADD ] More activations (setday, May 6, 2026)
- 75df3c0 [ FIX ] Convert RS from buffers to parameters (setday, May 6, 2026)
48 changes: 48 additions & 0 deletions README.md
@@ -1 +1,49 @@
# Sparse-Activations

## Modifiers

The `modifiers` package is a collection of drop-in replacements for standard PyTorch activation functions and normalization layers, with a focus on sparsity-inducing variants.

### Activations

Most of these are decorated with `@analytical_activation_module`, which can optionally store the input and output tensors of each forward pass.

- **ReLUSquared** - Just ReLU followed by squaring: $f(x) = (\max(0, x))^2$.
- **BSiLU** - A shifted SiLU variant: $f(x) = (x + \alpha) \cdot \sigma(x) - \alpha/2$, where $\sigma$ is the logistic sigmoid. Comes from [this paper](https://arxiv.org/html/2505.22074v1). It has smoother gradients than ReLU, and `alpha` is configurable.
- **SUGARBSiLU** - Uses ReLU in the forward pass but BSiLU's gradient in the backward pass (surrogate gradient trick). Same paper as above.
- **NoisyReLU** - Adds learnable noise during training based on the negative part of the input. The noise scale is controlled by a parameter `p` and a constant `c`. Based on [this paper](https://arxiv.org/pdf/1603.00391).
- **QuantileReLU** - Zeros out activations below a given quantile threshold instead of just below zero. Supports several modes: shifted sparsity, unsigned, continuous, etc.
- **TopKSparseGELU** - GELU, but only the top-k activations survive and the rest are zeroed, with the zeroed fraction set by the sparsity level. Uses the `@topk_sparse_module` decorator under the hood.
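For reference, the ReLUSquared, BSiLU and SUGARBSiLU formulas above can be written out as scalar functions. This is a minimal dependency-free sketch, not the package's tensor implementation: `relu_squared`, `bsilu`, `bsilu_grad` and `sugar_bsilu` are hypothetical helper names, and `alpha=1.67` is only an illustrative value (an assumption; check the package's default). The gradient follows from differentiating $(x + \alpha)\sigma(x) - \alpha/2$ using $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.

```python
import math
from typing import Tuple

# Illustrative value only (assumption); the package's default `alpha` may differ.
ALPHA = 1.67


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu_squared(x: float) -> float:
    """ReLUSquared: (max(0, x))^2."""
    return max(0.0, x) ** 2


def bsilu(x: float, alpha: float = ALPHA) -> float:
    """BSiLU: (x + alpha) * sigmoid(x) - alpha / 2."""
    return (x + alpha) * sigmoid(x) - alpha / 2


def bsilu_grad(x: float, alpha: float = ALPHA) -> float:
    """d/dx BSiLU = sigmoid(x) + (x + alpha) * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + (x + alpha) * s * (1.0 - s)


def sugar_bsilu(x: float, alpha: float = ALPHA) -> Tuple[float, float]:
    """SUGAR: ReLU value in the forward pass, BSiLU derivative as the surrogate gradient."""
    return max(0.0, x), bsilu_grad(x, alpha)
```

In a PyTorch module the surrogate-gradient trick would typically live in a custom `torch.autograd.Function` whose `backward` uses the BSiLU derivative instead of the ReLU step; this sketch only returns the forward value and the surrogate gradient side by side. Unlike ReLU, the surrogate gradient is non-zero for negative inputs such as `x = -1.0`.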

All activations are also accessible by string name (e.g. `'ReLUSquared'`, `'TopKSparseGELU-50'`) through a built-in name map, so you don't have to import the classes manually. Presets with common sparsity levels (10%, 25%, 50%, 75%, 90%) are included; the numeric suffix of a preset name is its sparsity level in percent.
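The name map itself is not part of this diff, so the following only sketches one plausible way the `'Name-NN'` convention could be handled. `parse_preset` and `preset_names` are hypothetical helpers, and reading the numeric suffix as a sparsity percentage is an assumption based on `'QuantileBatchNorm2d-50'` being described as 50% sparsity.

```python
from typing import List, Optional, Tuple

# Sparsity presets listed in the README, in percent.
PRESET_LEVELS = (10, 25, 50, 75, 90)


def preset_names(base: str) -> List[str]:
    """Build preset names such as 'TopKSparseGELU-50' for one base class name."""
    return [f"{base}-{level}" for level in PRESET_LEVELS]


def parse_preset(name: str) -> Tuple[str, Optional[float]]:
    """Split 'Base-NN' into ('Base', NN / 100); plain names get no sparsity level."""
    base, sep, suffix = name.rpartition("-")
    if sep and suffix.isdigit():
        return base, int(suffix) / 100
    return name, None
```

A lookup would then resolve the base name to a class and pass the parsed level (if any) to its constructor.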

### Normalizations

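Custom normalization layers that shift the centering point from the mean to a quantile. Centering at the $q$-quantile leaves roughly a fraction $q$ of the normalized values at or below zero (ignoring any affine scale and shift), so a following ReLU zeroes about that fraction; this biases the normalization toward sparser outputs.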

- **QuantileBatchNorm2d** - Like BatchNorm2d, but uses a quantile of the activations as the mean estimate. Supports global, batchwise, or channelwise quantile computation.
- **QuantileMeanBatchNorm2d** - Same quantile-based mean, but keeps the standard variance calculation, making it a middle ground between `BatchNorm2d` and **QuantileBatchNorm2d**.
- **QuantileLayerNorm** - LayerNorm variant with quantile-based centering. Also supports running stats tracking and configurable quantile search modes.
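To see why quantile centering produces sparsity, here is a list-based sketch in the spirit of **QuantileMeanBatchNorm2d**: center on the $q$-quantile, scale by the ordinary standard deviation, and count how many values a following ReLU would zero. The helper names are hypothetical, the affine scale and shift are omitted, and linear interpolation between order statistics is an assumption (the commit history mentions replacing quantile with kth value, so the package may pick an actual element instead).

```python
import math
from typing import List


def quantile(values: List[float], q: float) -> float:
    """q-quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def quantile_center_norm(values: List[float], q: float, eps: float = 1e-5) -> List[float]:
    """Center on the q-quantile, scale by the ordinary standard deviation (no affine step)."""
    center = quantile(values, q)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - center) / math.sqrt(var + eps) for v in values]


def relu_zero_fraction(values: List[float]) -> float:
    """Fraction of entries a following ReLU would set to zero."""
    return sum(1 for v in values if v <= 0.0) / len(values)


data = [float(i) for i in range(100)]
for q in (0.1, 0.5, 0.9):
    # For this evenly spaced data, the zero fraction matches q exactly.
    print(q, relu_zero_fraction(quantile_center_norm(data, q)))
```

With real activations the match is only approximate, and a learned bias after normalization can move it further.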

Same deal as activations - string names with sparsity presets are available (e.g. `'QuantileBatchNorm2d-50'`).

### Decorators

Module decorators that can wrap any `nn.Module`:

- `@analytical_activation_module` - Adds `in_activation` / `out_activation` attributes that capture tensors during the forward pass. Toggle with `debug_info=True/False`. A linear-layer counterpart, `@analytical_linear_module`, is exported as well.
- `@topk_sparse_module` - Adds top-k sparsity to any activation. Set `sparsity_level` (0 to 1) and choose whether sparsity is applied before or after the base activation with `post_sparsity`.
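The capture and top-k decorators described above can be approximated without PyTorch, using plain lists in place of tensors. In this sketch `capture_io`, `topk_sparsify` and `TopKSparseActivation` are hypothetical names, and three details are assumptions: top-k keeps the largest values (not the largest magnitudes), the number of zeroed entries is `round(sparsity_level * n)`, and selection runs over the whole input rather than along a specific dimension. The input copy is taken before `forward` runs, so in-place operations cannot alter what gets recorded.

```python
import functools
import math
from typing import Callable, List


def topk_sparsify(values: List[float], sparsity_level: float) -> List[float]:
    """Zero the round(sparsity_level * n) smallest entries and keep the rest."""
    if not 0.0 <= sparsity_level <= 1.0:
        raise ValueError("sparsity_level must be in [0, 1]")
    n_zero = round(sparsity_level * len(values))
    # Indices ordered from smallest to largest value; ties broken by position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    dropped = set(order[:n_zero])
    return [0.0 if i in dropped else v for i, v in enumerate(values)]


def gelu(x: float) -> float:
    """Exact (erf-based) GELU."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def capture_io(cls):
    """Record copies of forward()'s input and output when self.debug_info is set."""
    original_forward = cls.forward

    @functools.wraps(original_forward)
    def forward(self, x):
        captured_in = list(x)  # copy before forward() can modify x in place
        out = original_forward(self, x)
        if getattr(self, "debug_info", False):
            self.in_activation = captured_in
            self.out_activation = list(out)
        return out

    cls.forward = forward
    return cls


@capture_io
class TopKSparseActivation:
    """Applies `activation` element-wise, with top-k sparsity before or after it."""

    def __init__(self, activation: Callable[[float], float], sparsity_level: float,
                 post_sparsity: bool = True, debug_info: bool = False):
        self.activation = activation
        self.sparsity_level = sparsity_level
        self.post_sparsity = post_sparsity
        self.debug_info = debug_info

    def forward(self, x: List[float]) -> List[float]:
        if self.post_sparsity:
            return topk_sparsify([self.activation(v) for v in x], self.sparsity_level)
        return [self.activation(v) for v in topk_sparsify(x, self.sparsity_level)]


layer = TopKSparseActivation(gelu, sparsity_level=0.5, debug_info=True)
out = layer.forward([-3.0, -0.5, 1.0, 2.0])
```

For GELU the two orderings happen to agree on this input, because GELU maps the zeroed inputs back to zero; for activations with a non-zero value at the origin, `post_sparsity` changes the result.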

### Replacing layers in an existing model

```python
import torch.nn as nn

from modifiers import replace_activation, replace_normalization

# A small example model; any nn.Module works
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.BatchNorm2d(16),
    nn.GELU(),
)

# Swap all GELU activations for ReLUSquared
new_activations = replace_activation(model, original_activation='GELU', replaced_activation='ReLUSquared')

# Swap BatchNorm2d for the quantile-based variant at 50% sparsity
new_norms = replace_normalization(model, original_normalization='BatchNorm2d', replaced_normalization='QuantileBatchNorm2d-50')
```

Both functions walk the module tree recursively and return a list of the newly created layers, in case you need to track them.
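The recursive walk can be sketched as follows. To stay dependency-free, a tiny `Module` stand-in replaces `torch.nn.Module`; with real PyTorch the same loop would iterate over `named_children()` and assign the replacement with `setattr(parent, name, new_layer)`. `replace_by_class_name`, `Module`, `GELU` and `ReLUSquared` here are hypothetical illustrations, not the package's code, and matching on the class name is an assumption suggested by the string arguments in the example.

```python
from typing import Callable, List


class Module:
    """Tiny stand-in for torch.nn.Module that only tracks named children."""

    def __init__(self, **children: "Module"):
        self._children = dict(children)

    def named_children(self):
        # Return a list copy so children can be replaced while iterating.
        return list(self._children.items())

    def set_child(self, name: str, child: "Module") -> None:
        # torch.nn.Module would use setattr(parent, name, child) here.
        self._children[name] = child


class GELU(Module):
    pass


class ReLUSquared(Module):
    pass


def replace_by_class_name(module: Module, original: str,
                          make_new: Callable[[Module], Module]) -> List[Module]:
    """Recursively swap children whose class name is `original`; return the new layers."""
    created: List[Module] = []
    for name, child in module.named_children():
        if type(child).__name__ == original:
            new_layer = make_new(child)  # the old layer is passed in so settings can be copied
            module.set_child(name, new_layer)
            created.append(new_layer)
        else:
            created.extend(replace_by_class_name(child, original, make_new))
    return created


model = Module(block=Module(act=GELU(), inner=Module()), act=GELU())
new_layers = replace_by_class_name(model, "GELU", lambda old: ReLUSquared())
```

Passing the old layer to `make_new` leaves room for copying its settings (for example the `num_features` of a `BatchNorm2d`) into the replacement.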
75 changes: 0 additions & 75 deletions activations.py

This file was deleted.

90 changes: 90 additions & 0 deletions modifiers/__init__.py
@@ -0,0 +1,90 @@
from .decorators import analytical_activation_module, analytical_linear_module, topk_sparse_module
from .activations import (
    ReLUSquared,
    ReLUSquaredClipped,
    GELUSquared,
    GELUSquaredClipped,

    QuantileReLU,
    NoisyReLU,

    BSiLU,
    SUGARBSiLU,

    TopKSparseGELU,

    ActivationClass,
)
from .normalizations import (
    BatchNorm2dPreStop,
    LayerNormPreStop,

    QuantileBatchNorm2d,
    QuantileLayerNorm,
    QuantileMeanBatchNorm2d,

    NormalizationClass,
)
from .linears import (
    TopKSparseLinear,
    TopKSparseConv2d,
    TopKSparseConv1d,

    LinearClass,
)
from .modify import (
    replace_activation,
    replace_normalization,
    replace_linear,

    make_analytical_activation,
    make_analytical_linear,
)

__all__ = [
    # Decorators
    'topk_sparse_module',
    'analytical_activation_module',
    'analytical_linear_module',

    # Activations
    'ReLUSquared',
    'ReLUSquaredClipped',
    'GELUSquared',
    'GELUSquaredClipped',

    'QuantileReLU',
    'NoisyReLU',

    'BSiLU',
    'SUGARBSiLU',

    'TopKSparseGELU',

    'ActivationClass',

    # Normalizations
    'BatchNorm2dPreStop',
    'LayerNormPreStop',

    'QuantileBatchNorm2d',
    'QuantileLayerNorm',
    'QuantileMeanBatchNorm2d',

    'NormalizationClass',

    # Linears
    'TopKSparseLinear',
    'TopKSparseConv2d',
    'TopKSparseConv1d',

    'LinearClass',

    # Modifiers
    'replace_activation',
    'replace_normalization',
    'replace_linear',

    'make_analytical_activation',
    'make_analytical_linear',
]