its-hub: A Python library for inference-time scaling

its_hub is a Python library for inference-time scaling of LLMs, focusing on mathematical reasoning tasks.

📚 Documentation

For comprehensive documentation, including installation guides, tutorials, and API reference, visit:

https://ai-innovation.team/its_hub

Installation

its_hub provides a minimal core focused on algorithms, with optional language model implementations.

Core Installation (Algorithms Only)

For gateway integration - just algorithms and interfaces, minimal dependencies:

pip install its_hub

This includes:

  • ✓ Self-Consistency and Best-of-N algorithms
  • ✓ Abstract base classes (AbstractLanguageModel, AbstractOutcomeRewardModel)
  • ✓ Only 2 dependencies: numpy, typing-extensions

With Language Model Support

For standalone use - includes OpenAI-compatible language model implementation:

pip install its_hub[lm]

Adds: OpenAICompatibleLanguageModel, LLMJudge, StepGeneration (requires openai, aiohttp, backoff)

With Experimental Algorithms

For experimental features - includes beam search and particle filtering:

pip install its_hub[experimental]

Adds: Process reward models, beam search, particle filtering algorithms
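
Beam search, at a high level, keeps only the top-k highest-scoring partial solutions at each generation step. The toy sketch below illustrates that pruning logic with hand-written per-step candidate scores; it is purely illustrative and is not the its_hub implementation or API.

```python
# Toy illustration of beam search (not the its_hub API): at each step,
# every surviving beam is extended with every candidate, and only the
# top `beam_width` extensions by cumulative score are kept.

def beam_search(candidates_per_step, beam_width=2):
    beams = [([], 0.0)]  # (sequence, cumulative score)
    for candidates in candidates_per_step:
        extended = [
            (seq + [tok], score + tok_score)
            for seq, score in beams
            for tok, tok_score in candidates
        ]
        # prune: keep only the highest-scoring beams
        beams = sorted(extended, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams

steps = [
    [("a", 0.6), ("b", 0.4)],
    [("c", 0.9), ("d", 0.1)],
]
print(beam_search(steps))  # best beam: (["a", "c"], 1.5)
```

In the real algorithms, the per-step scores would come from a process reward model rather than a fixed table.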

Development Installation

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
# or using uv:
uv sync --extra dev

Quick Start

Example 1: Gateway Integration (Core Installation)

Installation required: pip install its_hub (core only, minimal dependencies)

Gateway integration requires implementing two interfaces: AbstractLanguageModel for LM calls and AbstractOrchestrator for managing parallel execution with concurrency control and rate limiting.

import asyncio

from its_hub import AbstractLanguageModel, AbstractOrchestrator, SelfConsistency

# Step 1: Implement AbstractLanguageModel with your gateway's LM client
class MyGatewayLM(AbstractLanguageModel):
    def __init__(self, gateway_client):
        self.client = gateway_client

    async def agenerate_single(self, messages, stop=None, **kwargs):
        response = await self.client.generate(messages, stop=stop, **kwargs)
        return {"role": "assistant", "content": response}

# Step 2: Implement AbstractOrchestrator for concurrency control
# (or use the built-in LMOrchestrator from its_hub[lm])
class MyGatewayOrchestrator(AbstractOrchestrator):
    async def agenerate(self, lm, messages_lst, **kwargs):
        # Manage parallel calls with your gateway's rate limits
        ...

async def main():
    lm = MyGatewayLM(your_gateway_client)
    orchestrator = MyGatewayOrchestrator()
    algorithm = SelfConsistency(orchestrator=orchestrator)
    result = await algorithm.ainfer(lm, "What is 2+2?", budget=5)
    print(result)  # {"role": "assistant", "content": "4", ...}

asyncio.run(main())

The AbstractOrchestrator is the central coordination point — it controls how algorithms fan out parallel LM calls, enforces rate limits, and provides structured error handling. See Orchestration for details.
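
The core concurrency pattern an orchestrator implements can be sketched with the standard library alone: fan out N coroutines while a semaphore caps how many run at once. This is a minimal illustration of the idea, not the its_hub implementation.

```python
# Sketch of semaphore-bounded fan-out: gather N coroutines while an
# asyncio.Semaphore limits how many are in flight at any moment.
import asyncio

async def bounded_gather(coros, max_concurrency=4):
    sem = asyncio.Semaphore(max_concurrency)

    async def run(coro):
        async with sem:  # blocks when max_concurrency calls are in flight
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(c) for c in coros))

async def fake_lm_call(i):
    await asyncio.sleep(0.01)  # stand-in for a network round trip
    return f"response {i}"

results = asyncio.run(
    bounded_gather([fake_lm_call(i) for i in range(8)], max_concurrency=2)
)
print(results)  # ['response 0', 'response 1', ..., 'response 7']
```

A production orchestrator would layer rate limiting, retries, and error propagation on top of this pattern.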

Example 2: Standalone Use with OpenAI-Compatible LM

Installation required: pip install its_hub[lm]

import asyncio

from its_hub import OpenAICompatibleLanguageModel, SelfConsistency

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

algorithm = SelfConsistency()
result = algorithm.infer(lm, "What is the capital of France?", budget=3)
print(result)  # Most common answer from 3 generations

# Close lm for resource cleanup
asyncio.run(lm.close())
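
Conceptually, Self-Consistency samples several answers and returns the most common one. The toy sketch below shows just that voting step on pre-sampled strings; the real algorithm also extracts and normalizes answers before voting.

```python
# Majority voting over sampled answers -- the core of self-consistency.
from collections import Counter

def majority_vote(answers):
    # most_common(1) returns [(answer, count)] for the top answer
    return Counter(answers).most_common(1)[0][0]

samples = ["Paris", "Paris", "Lyon"]  # e.g. 3 generations for budget=3
print(majority_vote(samples))  # Paris
```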

Example 3: Best-of-N with LLM Judge

Installation required: pip install its_hub[lm]

import asyncio

from its_hub import BestOfN, LLMJudge, OpenAICompatibleLanguageModel

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

judge = LLMJudge(lm=lm, fallback_score=5.0)
algorithm = BestOfN(orm=judge)
result = algorithm.infer(lm, "Write a sorting function", budget=5)
print(result)  # Best response as judged by LLM

# Close lm for resource cleanup
asyncio.run(lm.close())
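
Best-of-N reduces to: draw N candidate responses, score each with a reward model or judge, and return the highest-scoring one. The sketch below shows that selection step with a hypothetical length-based scorer standing in for LLMJudge.

```python
# Best-of-N selection: score every candidate and keep the argmax.
def best_of_n(candidates, score):
    return max(candidates, key=score)

candidates = [
    "def sort(xs): return sorted(xs)",
    "def sort(xs): return xs",
]
# Toy scorer: prefer the longer response (a real scorer would be a
# reward model or LLM judge, not len).
print(best_of_n(candidates, score=len))
```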

Key Features

  • 🔬 Multiple Algorithms: Self-Consistency, Best-of-N, Beam Search (experimental), Particle Filtering (experimental)
  • 🚀 Gateway Integration: Clean abstractions (AbstractLanguageModel, AbstractOrchestrator) for easy integration with AI gateways
  • 🔄 Orchestration: AbstractOrchestrator provides structured concurrency, rate limiting, and error propagation for parallel LM calls — essential for production gateway deployments
  • 🧮 Math-Optimized: Built for mathematical reasoning tasks
  • ⚡ Async-First: ainfer() is the primary method and infer() is a sync wrapper; generation runs concurrently with concurrency limits and error handling
  • 🎯 Minimal Core: Only 2 dependencies (numpy, typing-extensions) for core install

For detailed documentation, visit: https://ai-innovation.team/its_hub
