
BlockRank: Scalable In-context Ranking with Generative Models

Paper · Model · Colab · License: MIT · Python 3.9+

(Figure: BlockRank architecture overview)

BlockRank makes LLMs efficient for in-context document ranking by combining structured sparse attention with attention-based inference, achieving 2-4× faster inference at competitive accuracy on BEIR benchmarks.

Key Features

  • Linear Complexity: O(n) attention instead of O(n²) via block-sparse attention patterns
  • Fast Inference: Skips autoregressive decoding by reading relevance scores directly from attention maps
  • Strong Performance: Matches or outperforms state-of-the-art listwise rankers
  • Easy Integration: Existing LLMs (Qwen, Mistral, Llama, etc.) can easily be converted into BlockRank models

Results

| Model | Train Data | Climate FEVER | DBPedia | FEVER | FiQA | Hotpot QA | MS MARCO | NF Corpus | NQ | Sci-Docs | Sci-Fact | TREC-COVID | Avg BEIR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BlockRank-Mistral-7B | 10% MS MARCO (50K) | 29.0 | 51.0 | 87.8 | 44.4 | 75.2 | 47.6 | 36.4 | 62.6 | 18.5 | 75.2 | 75.9 | 54.9 |

Installation

pip install git+https://github.com/nilesh2797/BlockRank.git

Or clone for development:

git clone https://github.com/nilesh2797/BlockRank.git
cd BlockRank
pip install -r requirements.txt
pip install -e .

Quick Start

Interactive Demo

Try the Colab notebook for an interactive walkthrough.

As a Library

import blockrank

# Import dataset utilities
from blockrank.dataset import load_icr_dataset_hf, calculate_accuracy

# Import attention modules
from blockrank import blockrank_std_attention
from blockrank import blockrank_triton_kernel_attention

# Standard SDPA-based, torch.compile-friendly BlockRank attention
blockrank_std_attention.register_blockrank_attention()
# Triton-kernel-based BlockRank attention (inference-only for now)
blockrank_triton_kernel_attention.register_triton_blockrank_attention()

# Import training components
from blockrank.losses import compute_auxiliary_attention_loss
from blockrank.trainer import BlockRankAuxLossTrainer
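In-context ranking prompts pack an instruction, the candidate documents, and the query into a single context. The formatter below is purely illustrative (the actual layout and field names are specified in docs/DATA_FORMAT.md):

```python
def format_icr_prompt(instruction, documents, query):
    """Assemble a toy in-context ranking prompt: instruction first,
    then numbered candidate documents, then the query.
    Illustrative only; see docs/DATA_FORMAT.md for the real format."""
    doc_block = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"{instruction}\n\n{doc_block}\n\nQuery: {query}"

prompt = format_icr_prompt(
    "Rank the documents below by relevance to the query.",
    ["BlockRank uses block-sparse attention.", "Unrelated text."],
    "How does BlockRank scale attention?",
)
```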

Training

# Prepare your data (JSONL format - see docs/DATA_FORMAT.md)
# Configure training (see src/configs/ for examples)

# Single GPU
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --config your_config.yaml

# Multi-GPU
accelerate launch --config_file src/configs/accelerate_config.yaml \
    scripts/train.py --config your_config.yaml

Details in docs/TRAINING.md.

Evaluation

# Fast attention-based inference (recommended)
python scripts/eval_attn.py \
    --config src/configs/eval_beir.yaml \
    --checkpoint your-model \
    --attn_layer 20

# Standard decode-based inference
python scripts/eval_decode.py \
    --config src/configs/eval_beir.yaml \
    --checkpoint your-model

How It Works

BlockRank introduces three changes to standard transformer LLMs:

1. Structured Sparse Attention: Documents attend only to the instruction and to themselves (causally), while the query attends to the full context. This reduces attention complexity from O(n²) to O(n).
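The attention pattern can be sketched as a boolean mask. The segment layout and helper below are my own illustration (not the package's API), assuming a context of [instruction | doc_1 … doc_k | query]:

```python
import numpy as np

def blockrank_mask(instr_len, doc_lens, query_len):
    """Boolean attention mask (True = may attend) for BlockRank-style
    structured sparsity: each document attends to the instruction and
    causally to itself; the query attends causally to everything."""
    n = instr_len + sum(doc_lens) + query_len
    mask = np.zeros((n, n), dtype=bool)
    causal = np.tril(np.ones((n, n), dtype=bool))

    # Instruction prefix: plain causal attention.
    mask[:instr_len, :instr_len] = causal[:instr_len, :instr_len]

    # Each document: full instruction + causal self-attention only.
    start = instr_len
    for dl in doc_lens:
        end = start + dl
        mask[start:end, :instr_len] = True
        mask[start:end, start:end] = causal[:dl, :dl]
        start = end

    # Query tokens: causal attention over the whole sequence.
    mask[start:, :] = causal[start:, :]
    return mask

mask = blockrank_mask(instr_len=2, doc_lens=[3, 3], query_len=2)
```

Since every non-query row attends to at most the instruction plus its own block, the number of nonzero entries grows linearly in the number of documents, which is where the O(n) complexity comes from.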

2. Auxiliary Contrastive Loss: A mid-layer InfoNCE loss on attention patterns strengthens query-document relevance signals:

L = L_lm + λ * L_aux
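A minimal sketch of how the combined objective could be assembled; the temperature, the weight λ, and the use of per-document attention mass as logits are illustrative assumptions, not the repo's exact implementation:

```python
import numpy as np

def info_nce(scores, positive_idx, temperature=0.05):
    """InfoNCE over per-document scores (e.g., mid-layer query->document
    attention mass): push the positive document's score above the rest."""
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()                           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[positive_idx]

l_lm = 2.1                        # language-modeling loss (illustrative value)
l_aux = info_nce([0.9, 0.1, 0.2], positive_idx=0)
lam = 0.5                         # λ, the auxiliary-loss weight
total = l_lm + lam * l_aux        # L = L_lm + λ * L_aux
```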

3. Attention-Based Inference: Relevance scores are read directly from attention maps during the prefill stage, so no decoding is needed:

score_i = Σ attention[layer, head, query_token, doc_i_tokens]
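This scoring rule can be sketched directly from an attention map captured at prefill time; the shapes and head-averaging below are illustrative choices (the eval script lets you pick the layer via --attn_layer):

```python
import numpy as np

def attention_scores(attn, doc_spans):
    """Score documents from one attention map.

    attn: (num_heads, seq_len) attention weights of a chosen query
          token at a chosen layer.
    doc_spans: (start, end) token index ranges, one per document.
    Returns attention mass summed over each document's tokens,
    averaged over heads.
    """
    return [float(attn[:, s:e].sum(axis=1).mean()) for s, e in doc_spans]

# Toy map: 2 heads, 8 tokens; doc_1 = tokens 2-4, doc_2 = tokens 5-7.
attn = np.zeros((2, 8))
attn[:, 2:5] = 0.2   # more mass on doc_1's tokens
attn[:, 5:8] = 0.1
scores = attention_scores(attn, [(2, 5), (5, 8)])
best = int(np.argmax(scores))
```

Because the scores fall out of the prefill pass, ranking requires no autoregressive decoding at all.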

Model Zoo

| Model | Base | Training Data | Download |
|---|---|---|---|
| blockrank-msmarco-mistral-7B | Mistral-7B-Instruct-v0.3 | 10% MS MARCO (50K) | 🤗 HuggingFace |

More models coming soon.


Documentation

  • Training Guide - Detailed training instructions, hyperparameters, and best practices
  • Data Format - Data preparation and format specifications
  • Paper - Full technical details and benchmarks

Project Structure

BlockRank/
├── src/blockrank/          # Python package
│   ├── blockrank_std_attention.py      # PyTorch attention implementations
│   ├── blockrank_triton_kernel_attention.py  # Triton kernels (fastest, but inference-only)
│   ├── dataset.py          # Data loading and collation
│   ├── losses.py           # Auxiliary contrastive loss
│   ├── trainer.py          # Custom trainer with aux loss
│   └── utils.py            # Utilities (metrics, formatting)
├── scripts/                # CLI scripts
│   ├── train.py            # Training script
│   ├── eval_attn.py        # Attention-based evaluation
│   └── eval_decode.py      # Decode-based evaluation
├── configs/                # Training & eval configs
├── docs/                   # Documentation
├── data/                   # Downloaded datasets
└── quickstart.ipynb        # Quickstart notebook

Citation

@article{gupta2025blockrank,
  title={Scalable In-context Ranking with Generative Models},
  author={Gupta, Nilesh and You, Chong and Bhojanapalli, Srinadh and Kumar, Sanjiv and Dhillon, Inderjit and Yu, Felix},
  journal={arXiv preprint arXiv:2510.05396},
  year={2025}
}

License

MIT License - see LICENSE for details.


⭐ Star us on GitHub if BlockRank is useful for your research!
