BlockRank makes LLMs efficient for document ranking by using structured sparse attention and attention-based inference, achieving 2-4× faster inference with competitive accuracy on BEIR benchmarks.
- Linear Complexity: O(n) attention instead of O(n²) via block-sparse attention patterns
- Fast Inference: Skips autoregressive decoding by reading relevance scores directly from attention maps
- Strong Performance: Matches or outperforms state-of-the-art listwise rankers
- Easy Integration: Existing LLMs (Qwen, Mistral, Llama, etc.) can easily be converted into BlockRank models
| Model | Train Data | Climate FEVER | DBPedia | FEVER | FiQA | Hotpot QA | MS MARCO | NF Corpus | NQ | Sci-Docs | Sci-Fact | TREC-COVID | Avg BEIR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BlockRank-Mistral-7B | 10% MS MARCO (50K) | 29.0 | 51.0 | 87.8 | 44.4 | 75.2 | 47.6 | 36.4 | 62.6 | 18.5 | 75.2 | 75.9 | 54.9 |
```shell
pip install git+https://github.com/nilesh2797/BlockRank.git
```

Or clone for development:

```shell
git clone https://github.com/nilesh2797/BlockRank.git
cd BlockRank
pip install -r requirements.txt
pip install -e .
```

```python
import blockrank

# Import dataset utilities
from blockrank.dataset import load_icr_dataset_hf, calculate_accuracy

# Import attention modules
from blockrank import blockrank_std_attention
from blockrank import blockrank_triton_kernel_attention

# Standard SDPA-based and torch-compiled BlockRank
blockrank_std_attention.register_blockrank_attention()

# Triton-kernel based BlockRank - only supports inference at the moment!
blockrank_triton_kernel_attention.register_triton_blockrank_attention()

# Import training components
from blockrank.losses import compute_auxiliary_attention_loss
from blockrank.trainer import BlockRankAuxLossTrainer
```

Prepare your data in JSONL format (see docs/DATA_FORMAT.md).
Configure training (see src/configs/ for examples):

```shell
# Single GPU
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --config your_config.yaml

# Multi-GPU
accelerate launch --config_file src/configs/accelerate_config.yaml \
    scripts/train.py --config your_config.yaml
```

Details in docs/TRAINING.md.
```shell
# Fast attention-based inference (recommended)
python scripts/eval_attn.py \
    --config src/configs/eval_beir.yaml \
    --checkpoint your-model \
    --attn_layer 20

# Standard decode-based inference
python scripts/eval_decode.py \
    --config src/configs/eval_beir.yaml \
    --checkpoint your-model
```

BlockRank introduces three changes to standard transformer LLMs:
1. Structured Sparse Attention: Document tokens attend only to the instruction and, causally, to their own document, while the query attends to all tokens. This reduces attention complexity from O(n²) to O(n).
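As an illustration only (not the library's fused implementation), the block-sparse mask for a layout `[instruction | doc_1 … doc_m | query]` can be sketched as a boolean attention mask. Each document token attends to at most `inst_len + doc_len` positions, which is what makes total attention cost linear in the number of documents:

```python
import torch

def blockrank_mask(inst_len, doc_lens, query_len):
    """Illustrative BlockRank attention mask (True = may attend).

    Layout: [instruction | doc_1 | ... | doc_m | query].
    - Instruction tokens: causal within the instruction.
    - Document tokens: attend to the instruction plus causally within
      their own document only (no cross-document attention).
    - Query tokens: full causal attention over the whole sequence.
    """
    n = inst_len + sum(doc_lens) + query_len
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    mask = torch.zeros(n, n, dtype=torch.bool)

    # Instruction block: causal self-attention.
    mask[:inst_len, :inst_len] = causal[:inst_len, :inst_len]

    # Each document: instruction + causal within itself.
    start = inst_len
    for length in doc_lens:
        end = start + length
        mask[start:end, :inst_len] = True
        mask[start:end, start:end] = causal[:length, :length]
        start = end

    # Query: attends to everything up to its own position.
    mask[start:, :] = causal[start:, :]
    return mask

m = blockrank_mask(inst_len=2, doc_lens=[3, 3], query_len=2)
# Document 2 (rows 5-7) never attends to document 1 (cols 2-4).
assert not m[5:8, 2:5].any()
```

A mask like this can be passed as the boolean `attn_mask` of `torch.nn.functional.scaled_dot_product_attention`; the released Triton kernels avoid materializing the n×n mask altogether.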
2. Auxiliary Contrastive Loss: A mid-layer InfoNCE loss on the attention patterns strengthens query-document relevance signals:
L = L_lm + λ * L_aux
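A minimal sketch of what an InfoNCE-style auxiliary attention loss could look like. The aggregation and the `temperature` parameter are illustrative assumptions, not necessarily the exact `blockrank.losses.compute_auxiliary_attention_loss` implementation:

```python
import torch
import torch.nn.functional as F

def aux_attention_loss(attn, doc_spans, gold_idx, temperature=0.1):
    """InfoNCE over per-document attention mass (illustrative).

    attn:      (heads, query_len, seq_len) mid-layer attention probs,
               rows restricted to the query tokens.
    doc_spans: list of (start, end) token spans, one per document.
    gold_idx:  index of the relevant document (the positive).
    """
    # Attention mass on each document, averaged over heads and
    # query tokens, then temperature-scaled into logits.
    logits = torch.stack(
        [attn[:, :, s:e].sum(dim=-1).mean() for s, e in doc_spans]
    ) / temperature
    # Contrastive objective: the gold document should win the softmax.
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([gold_idx]))

# Total training loss, matching the formula above
# (lam is the weighting hyperparameter lambda):
# loss = lm_loss + lam * aux_attention_loss(attn, doc_spans, gold_idx)
```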
3. Attention-Based Inference: Relevance scores are extracted directly from attention maps during the prefill stage:

score_i = Σ attention[layer, head, query_token, doc_i_tokens]

| Model | Base | Training Data | Download |
|---|---|---|---|
| blockrank-msmarco-mistral-7B | Mistral-7B-Instruct-v0.3 | 10% MS MARCO (50K) | 🤗 HuggingFace |
More models coming soon...
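The attention-based inference in change 3 above can be sketched as follows. This is a hypothetical standalone helper over one layer's attention maps (the released `scripts/eval_attn.py` is the actual implementation):

```python
import torch

def attention_rank(attn, doc_spans, query_span):
    """Rank documents by attention mass (illustrative sketch of the
    score_i formula: sum attention from query tokens to doc i tokens).

    attn:       (heads, seq_len, seq_len) attention probabilities from
                one mid layer of the prefill forward pass.
    doc_spans:  list of (start, end) token spans, one per document.
    query_span: (start, end) span of the query tokens.
    """
    qs, qe = query_span
    # Total attention the query tokens place on each document's tokens,
    # summed over heads; no autoregressive decoding is needed.
    scores = [attn[:, qs:qe, s:e].sum().item() for s, e in doc_spans]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return scores, order
```

Because the scores fall out of the prefill pass itself, ranking costs a single forward pass instead of generating an ordered list token by token.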
| Model | Avg. BEIR | Climate FEVER | DBPedia | FEVER | FiQA | Hotpot QA | MS MARCO | NF Corpus | NQ | Sci-Docs | Sci-Fact | TREC-COVID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| blockrank-msmarco-mistral-7B | 54.9 | 29.0 | 51.0 | 87.8 | 44.4 | 75.2 | 47.6 | 36.4 | 62.6 | 18.5 | 75.2 | 75.9 |
- Training Guide - Detailed training instructions, hyperparameters, and best practices
- Data Format - Data preparation and format specifications
- Paper - Full technical details and benchmarks
```
BlockRank/
├── src/blockrank/                           # Python package
│   ├── blockrank_std_attention.py           # PyTorch attention implementations
│   ├── blockrank_triton_kernel_attention.py # Triton kernels (fastest, but inference-only)
│   ├── dataset.py                           # Data loading and collation
│   ├── losses.py                            # Auxiliary contrastive loss
│   ├── trainer.py                           # Custom trainer with aux loss
│   └── utils.py                             # Utilities (metrics, formatting)
├── scripts/                                 # CLI scripts
│   ├── train.py                             # Training script
│   ├── eval_attn.py                         # Attention-based evaluation
│   └── eval_decode.py                       # Decode-based evaluation
├── configs/                                 # Training & eval configs
├── docs/                                    # Documentation
├── data/                                    # Downloaded datasets
└── quickstart.ipynb                         # Quickstart notebook
```
```bibtex
@article{gupta2025blockrank,
  title={Scalable In-context Ranking with Generative Models},
  author={Gupta, Nilesh and You, Chong and Bhojanapalli, Srinadh and Kumar, Sanjiv and Dhillon, Inderjit and Yu, Felix},
  journal={arXiv preprint arXiv:2510.05396},
  year={2025}
}
```

MIT License - see LICENSE for details.
- Paper: arXiv:2510.05396
- Issues: GitHub Issues
- Author: Nilesh Gupta
⭐ Star us on GitHub if BlockRank is useful for your research!
