This document describes all environment variables used throughout BitNet-rs for configuration, testing, and development.
- BITNET_GGUF / CROSSVAL_GGUF: Path to test model
- BITNET_CPP_DIR: Path to C++ implementation
- HF_TOKEN: Hugging Face token for private repos
- BITNET_DETERMINISTIC: Enable deterministic mode for testing
- BITNET_SEED: Set seed for reproducible runs
- BITNET_STRICT_MODE: Prevent mock inference fallbacks and validate LayerNorm gamma statistics ("1" enables strict mode for production)
  - Prevents all mock inference paths
  - Validates LayerNorm gamma weights have mean ≈ 1.0
  - Fails immediately on suspicious LayerNorm statistics (mean outside [0.5, 2.0])
  - In non-strict mode (default), issues warnings but continues
- BITNET_CORRECTION_POLICY: Path to YAML policy file defining model-specific corrections
  - Value: Absolute or relative path to policy YAML file (e.g., /path/to/policy.yml)
  - Purpose: Enable runtime corrections for known-bad models with fingerprinted, auditable fixes
  - Format: YAML file specifying model fingerprints and correction parameters
  - Usage:

        # Enable policy-driven corrections
        export BITNET_CORRECTION_POLICY=/path/to/correction-policy.yml
        export BITNET_ALLOW_RUNTIME_CORRECTIONS=1
        cargo run -p bitnet-cli -- run --model model.gguf

  - Important: Both BITNET_CORRECTION_POLICY and BITNET_ALLOW_RUNTIME_CORRECTIONS must be set
- BITNET_ALLOW_RUNTIME_CORRECTIONS: Enable runtime corrections (must be used with BITNET_CORRECTION_POLICY)
  - Value: "1" to enable (disabled by default)
  - Purpose: Safety gate preventing accidental application of corrections
  - Warning: CI blocks correction flags - runtime corrections are for known-bad models only
  - Proper fix: Always prefer regenerating GGUF with LayerNorm weights in FP16/FP32 (not quantized)
  - Usage:

        # Inspect model statistics first
        cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

        # Apply corrections if needed (temporary workaround)
        export BITNET_CORRECTION_POLICY=./model-corrections.yml
        export BITNET_ALLOW_RUNTIME_CORRECTIONS=1
        cargo run -p bitnet-cli -- run --model model.gguf
- BITNET_RESCALE_GAMMA_ON_LOAD: Experimental - Rescale LayerNorm gamma by √hidden_size during loading
  - Value: "1" to enable (disabled by default)
  - Purpose: Test hypothesis that bitnet.cpp rescales pre-scaled gamma weights on load
  - Algorithm: For LayerNorm tensors, applies gamma' = gamma * sqrt(hidden_size)
  - Use case: If gamma RMS ≈ 0.018 (roughly 1/√2560 ≈ 0.0198), this rescales to RMS ≈ 0.9, close to 1.0
  - Safety: Disabled in strict mode (BITNET_STRICT_MODE=1)
  - Status: Experimental feature for investigating activation magnitude discrepancies
  - Usage:

        # Enable experimental gamma rescaling
        export BITNET_RESCALE_GAMMA_ON_LOAD=1
        cargo run -p bitnet-cli --features cpu,full-cli -- run \
          --model model.gguf \
          --tokenizer tokenizer.json \
          --prompt "Test" \
          --max-tokens 16

        # Check rescaling logs (look for "EXPERIMENTAL: Rescaled" messages)
        RUST_LOG=info BITNET_RESCALE_GAMMA_ON_LOAD=1 \
          cargo run -p bitnet-cli --features cpu,full-cli -- run \
          --model model.gguf --tokenizer tokenizer.json --prompt "Test"

  - Important: This is an experimental diagnostic tool, not a production fix. Always prefer regenerating GGUF with correct LayerNorm weights.
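
The LayerNorm checks and the experimental rescale described in this list can be sketched in a few lines of Rust. This is an illustrative sketch only, not the BitNet-rs loader: the names GammaCheck, gamma_mean, gamma_rms, check_gamma, and rescale_gamma are hypothetical. Only the [0.5, 2.0] bounds on the mean, the warn-versus-fail behaviour, the strict-mode guard, and the formula gamma' = gamma * sqrt(hidden_size) are taken from the text.

```rust
/// Outcome of a LayerNorm gamma sanity check (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum GammaCheck {
    Ok,
    /// Non-strict mode: report the problem and continue loading.
    Warn(String),
    /// Strict mode: abort immediately.
    Fail(String),
}

/// Arithmetic mean of the gamma weights.
fn gamma_mean(gamma: &[f32]) -> f32 {
    gamma.iter().sum::<f32>() / gamma.len() as f32
}

/// Root-mean-square of the gamma weights.
fn gamma_rms(gamma: &[f32]) -> f32 {
    (gamma.iter().map(|g| g * g).sum::<f32>() / gamma.len() as f32).sqrt()
}

/// Accept a mean inside [0.5, 2.0]; otherwise fail (strict) or warn (default).
fn check_gamma(gamma: &[f32], strict: bool) -> GammaCheck {
    let mean = gamma_mean(gamma);
    if (0.5..=2.0).contains(&mean) {
        return GammaCheck::Ok;
    }
    let msg = format!("suspicious LayerNorm gamma mean {mean:.4} (expected within [0.5, 2.0])");
    if strict {
        GammaCheck::Fail(msg)
    } else {
        GammaCheck::Warn(msg)
    }
}

/// Apply gamma' = gamma * sqrt(hidden_size) in place.
/// Returns false without touching the data when strict mode is on.
fn rescale_gamma(gamma: &mut [f32], hidden_size: usize, strict: bool) -> bool {
    if strict {
        return false;
    }
    let scale = (hidden_size as f32).sqrt();
    for g in gamma.iter_mut() {
        *g *= scale;
    }
    true
}
```

With hidden_size = 2560, weights stored as 1/√2560 come back with an RMS of about 1.0 after rescale_gamma, which is the situation the experimental flag is meant to probe.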
- RAYON_NUM_THREADS: Control CPU parallelism (Rayon thread pool)
- BITNET_CPU_THREADS: CPU thread count for inference (overrides CLI config)
- BITNET_NUM_THREADS: Alternative thread count setting (used in some crates)

- BITNET_DEVICE: Device for inference, one of cpu, cuda, metal, or vulkan (default: cpu)
- BITNET_LOG_LEVEL: Log level, one of trace, debug, info, warn, or error

These override CLI arguments and config file values:
- BITNET_MODEL_PATH: Path to model file (GGUF or SafeTensors)
- BITNET_MODEL_FORMAT: Model format (gguf, safetensors)
- BITNET_ARCHITECTURE: Model architecture hint (bitnet, llama, phi, etc.)
- BITNET_HIDDEN_SIZE: Hidden dimension size (e.g., 2560)
- BITNET_NUM_LAYERS: Number of transformer layers
- BITNET_NUM_HEADS: Number of attention heads
- BITNET_VOCAB_SIZE: Vocabulary size
- BITNET_BLOCK_SIZE: Transformer block size
- BITNET_REQUIRE_LAYER_NORM_BIAS: Require LayerNorm bias tensors (1 to enable)

- BITNET_MAX_TOKENS: Maximum number of tokens to generate
- BITNET_MAX_NEW_TOKENS: Maximum new tokens (alias for MAX_TOKENS in some contexts)
- BITNET_MAX_LENGTH: Maximum total sequence length
- BITNET_TEMPERATURE: Sampling temperature (e.g., 0.7)
- BITNET_TOP_K: Top-k sampling parameter (e.g., 40)
- BITNET_TOP_P: Top-p (nucleus) sampling parameter (e.g., 0.9)
- BITNET_BATCH_SIZE: Batch size for inference
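
Because these variables override CLI arguments and config file values, resolving a single setting amounts to a three-level lookup. The sketch below illustrates that idea under stated assumptions: the helper names resolve and resolve_from_env are hypothetical, the ordering of CLI over config file is assumed (the text only says the environment wins over both), and treating an unparsable environment value as an error rather than silently ignoring it is a design choice of this example, not documented BitNet-rs behaviour.

```rust
use std::str::FromStr;

/// Resolve one setting: environment value first, then CLI, then config file.
/// `env_value` is the raw string from the environment, if the variable is set.
fn resolve<T: FromStr>(
    env_value: Option<&str>,
    cli: Option<T>,
    config: Option<T>,
) -> Result<Option<T>, String> {
    match env_value {
        // A set variable always wins, so a bad value is reported, not skipped.
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map(Some)
            .map_err(|_| format!("invalid environment value: {raw:?}")),
        // Unset variable: fall back to the CLI value, then the config value.
        None => Ok(cli.or(config)),
    }
}

/// Convenience wrapper that reads the process environment by variable name.
fn resolve_from_env<T: FromStr>(
    name: &str,
    cli: Option<T>,
    config: Option<T>,
) -> Result<Option<T>, String> {
    let raw = std::env::var(name).ok();
    resolve(raw.as_deref(), cli, config)
}
```

For example, resolve_from_env::<f32>("BITNET_TEMPERATURE", cli_temperature, config_temperature) would return the environment value whenever the variable is set, where cli_temperature and config_temperature stand in for whatever the caller has already parsed.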

- BITNET_TOKENIZER: Path to tokenizer file (tokenizer.json)
- BITNET_OFFLINE: Disable network access for tokenizer downloads (1 to enable)
- BITNET_CACHE_DIR: Cache directory for downloaded tokenizers

- BITNET_QUANTIZATION_TYPE: Force quantization type (i2s, tl1, tl2, qk256)
- BITNET_IQ2S_IMPL: I2S implementation selector
- BITNET_QUANT_SANITY: Enable quantization sanity checks (1 to enable)
- BITNET_DISABLE_MINIMAL_LOADER: Disable minimal GGUF loader (1 to disable)
- BITNET_PARITY: Enable parity checking mode

- BITNET_TRACE_DIR: Directory for tensor activation trace output
- BITNET_TRACE_QUANT: Enable quantization tracing (1 to enable)
- BITNET_TRACE_RMS: Enable RMS norm tracing
- BITNET_TRACE_TIMING: Enable timing trace output
- BITNET_DEBUG_LOGITS: Enable logits debugging output
- BITNET_DEBUG_ATTN_SCALE: Debug attention scaling
- BITNET_DEBUG_GQA: Debug grouped-query attention
- BITNET_DEBUG_MLP: Debug MLP/FFN layer
- BITNET_DEBUG_RMSNORM: Debug RMS normalization
- BITNET_DEBUG_ROPE: Debug rotary position embeddings
- BITNET_DEBUG_TIMEOUT_SECS: Debug timeout in seconds

- BITNET_USE_GPU: Enable GPU acceleration (1 to enable)
- BITNET_GPU_MEMORY_LIMIT: GPU memory limit in bytes
- BITNET_GPU_CACHE: GPU kernel cache directory
- BITNET_GPU_DEBUG: Enable GPU debug output (1 to enable)
- BITNET_ENABLE_NPU: Enable NPU backend (1 to enable)
- BITNET_NPU_BACKEND: NPU backend selection
- BITNET_ENABLE_ROCM: Enable ROCm backend (1 to enable)
- BITNET_GPU_FAKE: Override GPU detection for deterministic testing and device-aware fallback validation
  - Values:
    - none: Disable GPU detection (test CPU fallback paths)
    - cuda or gpu: Enable fake GPU detection (test GPU code paths without hardware)
    - metal, rocm: Simulate specific GPU backends
    - Multiple backends: cuda,rocm (comma-separated)
  - Usage with Preflight:

        # Test CPU fallback behavior
        BITNET_GPU_FAKE=none cargo run -p xtask -- preflight
        # Expected: "✗ GPU: Not available at runtime"

        # Test GPU path without hardware
        BITNET_GPU_FAKE=cuda cargo run -p xtask -- preflight
        # Expected: "✓ GPU: Available"

  - Device-Aware Testing:

        # Test quantization device selection with fake GPU
        BITNET_GPU_FAKE=cuda cargo test --no-default-features --features gpu -p bitnet-quantization

        # Test CPU fallback in GPU-compiled binary
        BITNET_GPU_FAKE=none cargo test --no-default-features --features gpu -p bitnet-inference
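
The value grammar for BITNET_GPU_FAKE listed above (none, a single backend, or a comma-separated list) could be parsed roughly as follows. This is a sketch, not the detection code in bitnet-kernels: the types FakeBackend and GpuOverride and the function parse_gpu_fake are hypothetical, and several behaviours are assumptions made for the example: treating gpu as an alias for cuda, matching case-insensitively, de-duplicating repeated backends, and rejecting unknown or empty values.

```rust
/// Simulated GPU backends named in the documentation (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FakeBackend {
    Cuda,
    Metal,
    Rocm,
}

/// How GPU detection should behave for this process (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum GpuOverride {
    /// Variable unset: probe real hardware.
    RealDetection,
    /// "none": report that no GPU is available.
    NoGpu,
    /// One or more simulated backends.
    Fake(Vec<FakeBackend>),
}

/// Parse the raw BITNET_GPU_FAKE value; `None` means the variable is unset.
fn parse_gpu_fake(raw: Option<&str>) -> Result<GpuOverride, String> {
    let Some(raw) = raw else {
        return Ok(GpuOverride::RealDetection);
    };
    let value = raw.trim().to_ascii_lowercase();
    if value == "none" {
        return Ok(GpuOverride::NoGpu);
    }
    let mut backends = Vec::new();
    for part in value.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let backend = match part {
            // Assumption: "gpu" is treated as a synonym for "cuda".
            "cuda" | "gpu" => FakeBackend::Cuda,
            "metal" => FakeBackend::Metal,
            "rocm" => FakeBackend::Rocm,
            other => return Err(format!("unknown BITNET_GPU_FAKE backend: {other}")),
        };
        if !backends.contains(&backend) {
            backends.push(backend);
        }
    }
    if backends.is_empty() {
        Err("BITNET_GPU_FAKE is set but names no backend".to_string())
    } else {
        Ok(GpuOverride::Fake(backends))
    }
}
```

A caller would typically feed it std::env::var("BITNET_GPU_FAKE").ok().as_deref() and branch on the result before running any real hardware probe.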
These variables prevent "Potemkin passes" (false positives) in performance and integration tests by eliminating mock inference paths and ensuring honest quantized computation:
- BITNET_STRICT_MODE=1: Primary strict mode - Prevents ALL mock inference fallbacks and FP32 quantization fallbacks, essential for production deployment and accurate performance measurement
  - Tier 1 (Development): Debug assertions catch fallbacks in debug builds (panics immediately)
  - Tier 2 (Production): Strict mode returns Err(BitNetError::StrictMode(...)) in release builds
  - Tier 3 (Verification): Receipt validation ensures honest computation claims
  - Enables fail_on_mock, require_quantization, enforce_quantized_inference, and validate_performance checks
  - Fails fast when mock computation or FP32 fallback is detected
  - Validates performance metrics to reject suspicious values (>150 tok/s flagged as potentially mock)
  - Required for production deployments to ensure real quantized inference
  - Usage:

        # Production inference with strict mode
        BITNET_STRICT_MODE=1 \
          cargo run --release -p bitnet-cli --no-default-features --features cpu -- \
          infer --model model.gguf --prompt "Test" --max-tokens 16

        # If FP32 fallback would occur:
        # Error: Strict mode: FP32 fallback rejected - qtype=I2S, device=Cpu, layer_dims=[2048, 2048], reason=kernel_unavailable
- BITNET_STRICT_FAIL_ON_MOCK=1: Fail immediately when mock computation is detected in inference pipeline
  - Activated automatically when BITNET_STRICT_MODE=1
  - Can be enabled independently for targeted testing
  - Validates all tensor operations and kernel calls for mock usage
  - Usage:

        # Fail on mock detection only
        BITNET_STRICT_FAIL_ON_MOCK=1 \
          cargo test -p bitnet-inference test_inference_real_computation
- BITNET_STRICT_REQUIRE_QUANTIZATION=1: Require real quantization kernels (I2S/TL1/TL2) to be available and used (Issue #453 - AC3)
  - Activated automatically when BITNET_STRICT_MODE=1
  - Prevents fallback to FP32 dequantization staging when quantization expected
  - Validates device-aware quantization kernel selection
  - Rejects FP32 fallback in QuantizedLinear::forward with detailed error
  - Error includes: quantization type, device, layer dimensions, fallback reason
  - Usage:

        # Require quantization kernels only
        BITNET_STRICT_REQUIRE_QUANTIZATION=1 \
          cargo test -p bitnet-quantization test_quantization_kernel_integration

        # If kernel unavailable:
        # Error: Strict mode: FP32 fallback rejected - qtype=I2S, device=Cpu,
        #        layer_dims=[2048, 2048], reason=kernel_unavailable
- BITNET_STRICT_VALIDATE_PERFORMANCE=1: Validate performance metrics for realistic values (Issue #453 - AC6)
  - Activated automatically when BITNET_STRICT_MODE=1
  - Rejects performance metrics from mock computation paths
  - Flags unrealistic throughput (>150 tok/s) as suspicious
  - Validates tokens_per_second against baseline thresholds
  - Usage:

        # Validate performance metrics only
        BITNET_STRICT_VALIDATE_PERFORMANCE=1 \
          cargo run -p xtask -- benchmark --model model.gguf --tokens 128

        # Then verify receipt
        cargo run -p xtask -- verify-receipt --validate-performance ci/inference.json
- BITNET_CI_ENHANCED_STRICT=1: Enhanced strict mode for CI environments (Issue #453 - AC6)
  - Activates when both CI environment variable and this flag are set
  - Enables ci_enhanced_mode, log_all_validations, and fail_fast_on_any_mock
  - Provides comprehensive logging for CI pipeline debugging
  - Ensures production-grade validation in automated testing
  - Usage:

        # .github/workflows/strict-mode-ci.yml
        - name: Run strict mode tests
          env:
            CI: "1"
            BITNET_CI_ENHANCED_STRICT: "1"
            BITNET_STRICT_MODE: "1"
          run: cargo test --workspace --no-default-features --features cpu
- BITNET_STRICT_TOKENIZERS=1: Forbid mock tokenizer fallbacks in perf/integration tests (includes SPM tokenizer fallbacks)
- BITNET_STRICT_NO_FAKE_GPU=1: Forbid fake GPU backends in perf/integration tests
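
The relationship between the master switch and the individual strict flags can be summarised in code. The sketch below is an illustration under assumptions, not the BitNet-rs strict-mode implementation: the struct StrictConfig and the functions strict_config and throughput_is_suspicious are hypothetical, and the environment is passed in as a lookup closure so the logic stays testable. The field names, the rule that BITNET_STRICT_MODE=1 activates the sub-checks, the requirement that CI and BITNET_CI_ENHANCED_STRICT both be set, and the 150 tok/s threshold come from the list above.

```rust
/// Resolved strict-mode checks (hypothetical struct; field names follow the text).
#[derive(Debug, Default, PartialEq)]
struct StrictConfig {
    fail_on_mock: bool,
    require_quantization: bool,
    enforce_quantized_inference: bool,
    validate_performance: bool,
    ci_enhanced_mode: bool,
}

/// Build the configuration from an environment lookup function.
/// In real use: `strict_config(|name| std::env::var(name).ok())`.
fn strict_config(get: impl Fn(&str) -> Option<String>) -> StrictConfig {
    // A flag counts as enabled only when its value is exactly "1".
    let on = |name: &str| get(name).as_deref().map(str::trim) == Some("1");
    let master = on("BITNET_STRICT_MODE");
    StrictConfig {
        // Each sub-check is on if the master switch or its own flag is set.
        fail_on_mock: master || on("BITNET_STRICT_FAIL_ON_MOCK"),
        require_quantization: master || on("BITNET_STRICT_REQUIRE_QUANTIZATION"),
        enforce_quantized_inference: master,
        validate_performance: master || on("BITNET_STRICT_VALIDATE_PERFORMANCE"),
        // Enhanced CI mode needs the CI variable present and the flag set.
        ci_enhanced_mode: get("CI").is_some() && on("BITNET_CI_ENHANCED_STRICT"),
    }
}

/// Throughput above 150 tok/s is flagged as potentially coming from a mock path.
fn throughput_is_suspicious(tokens_per_second: f64) -> bool {
    tokens_per_second > 150.0
}
```

Passing the lookup as a closure keeps the function free of global state, so a test can supply a fixed set of variables instead of mutating the process environment.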
For Git metadata capture (used by bitnet-server crate with vergen-gix):
- VERGEN_GIT_SHA: Override Git SHA (useful in CI/Docker without .git)
- VERGEN_GIT_BRANCH: Override Git branch
- VERGEN_GIT_DESCRIBE: Override Git describe output
- VERGEN_IDEMPOTENT: Set to "1" for reproducible builds

# GCC (default)
export CC=gcc CXX=g++
# Clang
export CC=clang CXX=clang++

# Linux FFI
export LD_LIBRARY_PATH=target/release
# macOS FFI
export DYLD_LIBRARY_PATH=target/release

Environment variables for bitnet-server (axum HTTP server). All are optional with sensible defaults.

- BITNET_SERVER_HOST: Bind address (default: 0.0.0.0)
- BITNET_SERVER_PORT: Port number (default: 3000)
- BITNET_SERVER_WORKERS: Worker thread count (default: auto-detected)
- BITNET_REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
- BITNET_DEFAULT_MODEL_PATH: Default model file path
- BITNET_DEFAULT_TOKENIZER_PATH: Default tokenizer file path
- BITNET_DEFAULT_DEVICE: Default inference device (cpu or cuda)

- BITNET_MAX_CONCURRENT_LOADS: Maximum concurrent model loads (default: 2)
- BITNET_MODEL_CACHE_SIZE: Model cache capacity (default: 4)
- BITNET_MEMORY_LIMIT_GB: Memory limit in GB for model loading
- BITNET_MODEL_VALIDATION: Enable model validation on load (true/false)

- BITNET_DEVICE_STRATEGY: Device selection strategy (auto, cpu, gpu, hybrid)
- BITNET_FALLBACK_ENABLED: Enable CPU fallback when GPU fails (true/false)
- BITNET_BENCHMARK_ON_STARTUP: Run benchmark on startup for routing decisions (true/false)

- BITNET_MAX_BATCH_SIZE: Maximum batch size (default: 32)
- BITNET_BATCH_TIMEOUT_MS: Batch collection timeout in milliseconds
- BITNET_MAX_CONCURRENT_BATCHES: Maximum concurrent batch executions
- BITNET_ADAPTIVE_BATCHING: Enable adaptive batch sizing (true/false)
- BITNET_QUANTIZATION_AWARE: Enable quantization-aware batching (true/false)

- BITNET_MAX_CONCURRENT_REQUESTS: Maximum concurrent requests
- BITNET_MAX_REQUESTS_PER_SECOND: Rate limit (requests/second)
- BITNET_MAX_REQUESTS_PER_MINUTE: Rate limit (requests/minute)
- BITNET_BACKPRESSURE_THRESHOLD: Backpressure activation threshold
- BITNET_CIRCUIT_BREAKER_ENABLED: Enable circuit breaker (true/false)
- BITNET_PER_IP_RATE_LIMIT: Per-IP rate limit

- BITNET_JWT_SECRET: JWT signing secret for authentication
- BITNET_REQUIRE_AUTHENTICATION: Require authentication (true/false)
- BITNET_MAX_PROMPT_LENGTH: Maximum prompt length (characters)
- BITNET_MAX_TOKENS_PER_REQUEST: Maximum tokens per request
- BITNET_ALLOWED_ORIGINS: Allowed CORS origins (comma-separated)
- BITNET_BLOCKED_IPS: Blocked IP addresses (comma-separated)
- BITNET_INPUT_SANITIZATION: Enable input sanitization (true/false)
- BITNET_CONTENT_FILTERING: Enable content filtering (true/false)
- BITNET_ALLOWED_MODEL_DIRECTORIES: Allowed model directories (comma-separated)

- BITNET_PROMETHEUS_ENABLED: Enable Prometheus metrics (true/false)
- BITNET_OPENTELEMETRY_ENABLED: Enable OpenTelemetry tracing (true/false)
- BITNET_OTLP_ENDPOINT: OTLP collector endpoint URL
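
As a rough illustration of how the documented server defaults might be applied, the sketch below turns raw variable values into typed settings. It is not the bitnet-server configuration code: the functions bind_addr, request_timeout, and parse_bool are hypothetical, the host is assumed to be an IP literal (a hostname such as localhost would be rejected here), and accepting only the exact strings true and false is an assumption based on how the flags are written above. The defaults 0.0.0.0, 3000, and 30 seconds are the ones listed.

```rust
use std::net::{IpAddr, SocketAddr};
use std::time::Duration;

/// Combine BITNET_SERVER_HOST and BITNET_SERVER_PORT into a socket address,
/// falling back to the documented defaults when a value is unset (`None`).
fn bind_addr(host: Option<&str>, port: Option<&str>) -> Result<SocketAddr, String> {
    let host = host.unwrap_or("0.0.0.0").trim();
    let ip = host
        .parse::<IpAddr>()
        .map_err(|e| format!("BITNET_SERVER_HOST {host:?}: {e}"))?;
    let port = port
        .unwrap_or("3000")
        .trim()
        .parse::<u16>()
        .map_err(|e| format!("BITNET_SERVER_PORT: {e}"))?;
    Ok(SocketAddr::new(ip, port))
}

/// Parse BITNET_REQUEST_TIMEOUT (seconds), defaulting to 30.
fn request_timeout(raw: Option<&str>) -> Result<Duration, String> {
    let secs = raw
        .unwrap_or("30")
        .trim()
        .parse::<u64>()
        .map_err(|e| format!("BITNET_REQUEST_TIMEOUT: {e}"))?;
    Ok(Duration::from_secs(secs))
}

/// Parse a true/false flag such as BITNET_PROMETHEUS_ENABLED.
fn parse_bool(raw: &str) -> Result<bool, String> {
    match raw.trim() {
        "true" => Ok(true),
        "false" => Ok(false),
        other => Err(format!("expected true or false, got {other:?}")),
    }
}
```

Each function takes the raw value as an Option so that callers can pass std::env::var(...).ok().as_deref() directly and tests can exercise the defaults without touching the process environment.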

- BITNET_SKIP_SLOW_TESTS: Skip slow tests; set to 1 (used in CI Core)
- BITNET_RUN_SLOW_TESTS: Explicitly opt-in to slow tests
- BITNET_FAST_TESTS: Run only fast tests
- BITNET_RUN_E2E: Enable end-to-end tests
- BITNET_FORCE_GPU_TESTS: Force GPU tests even without hardware
- BITNET_GENERATE_FIXTURES: Generate test fixtures (1 to enable)
- BITNET_QUIET_BACKEND: Suppress backend output in tests
- BITNET_TEST_ENV: Test environment identifier
- BITNET_TEST_SCENARIO: Test scenario selector
- BITNET_MOCK_DETECTION_THRESHOLD: Threshold for mock computation detection
- BITNET_VALIDATION_LEVEL: Validation strictness level
- BITNET_VALIDATION_TOLERANCE: Numerical validation tolerance

- BITNET_CROSSVAL_ENABLED: Enable cross-validation against C++ reference
- BITNET_CROSSVAL_WEIGHTS: Path to cross-validation weights
- BITNET_CPP_PATH: Path to bitnet.cpp binary

For GPU development, testing, and mock scenarios:
# Test GPU backend detection
cargo test --no-default-features --features cpu -p bitnet-kernels test_gpu_info_summary
# Mock GPU scenarios for testing
BITNET_GPU_FAKE="cuda" cargo test --no-default-features --features cpu -p bitnet-kernels test_gpu_info_mocked_scenarios
BITNET_GPU_FAKE="metal" cargo run -p xtask -- download-model --dry-run
BITNET_GPU_FAKE="cuda,rocm" cargo test --no-default-features -p bitnet-kernels --features gpu

For reproducible builds and testing:
# Force stable runs with strict mode (no mock fallbacks)
export BITNET_STRICT_MODE=1
export BITNET_DETERMINISTIC=1
export BITNET_SEED=42
# Single-threaded CPU determinism for testing
export RAYON_NUM_THREADS=1
# Production deterministic inference with real quantization
BITNET_STRICT_MODE=1 BITNET_DETERMINISTIC=1 BITNET_SEED=42 \
cargo run -p xtask -- infer --model model.gguf --prompt "Test"
# Local performance builds (not CI)
export RUSTFLAGS="-C target-cpu=native"

# Primary strict mode - prevents ALL mock inference fallbacks
BITNET_STRICT_MODE=1 cargo test --no-default-features -p bitnet-inference --features cpu
BITNET_STRICT_MODE=1 cargo run -p xtask -- infer --model model.gguf --prompt "Test"
# Production inference with strict mode (SIMD-optimised CPU, GPU-accelerated alpha)
BITNET_STRICT_MODE=1 cargo run -p xtask -- infer \
--model models/bitnet-model.gguf \
--prompt "Explain quantum computing" \
--deterministic

# Fail immediately on mock detection
BITNET_STRICT_FAIL_ON_MOCK=1 \
cargo test -p bitnet-inference --no-default-features --features cpu test_inference_real_computation
# Require real quantization kernels (I2S/TL1/TL2)
BITNET_STRICT_REQUIRE_QUANTIZATION=1 \
cargo test -p bitnet-quantization --no-default-features --features cpu test_quantization_kernel_integration
# Validate performance metrics for realistic values
BITNET_STRICT_VALIDATE_PERFORMANCE=1 \
cargo run -p xtask -- benchmark --model model.gguf --tokens 128
# CI enhanced strict mode (comprehensive validation)
CI=1 BITNET_CI_ENHANCED_STRICT=1 BITNET_STRICT_MODE=1 \
cargo test --workspace --no-default-features --features cpu

# CPU baseline with real quantization (no mocks)
BITNET_STRICT_MODE=1 \
cargo bench --no-default-features --features cpu -p bitnet-quantization --bench simd_comparison
# GPU performance with strict hardware validation
BITNET_STRICT_NO_FAKE_GPU=1 \
BITNET_STRICT_MODE=1 \
cargo bench -p bitnet-kernels --bench mixed_precision_bench --features gpu
# Realistic CPU performance baselines (Issue #261 - AC7)
# Expected: SIMD-optimised throughput (hardware-dependent)
BITNET_STRICT_MODE=1 \
BITNET_DETERMINISTIC=1 \
BITNET_SEED=42 \
cargo run -p xtask -- benchmark --features cpu --quantization i2s
# Realistic GPU performance baselines (Issue #261 - AC8)
# Expected: GPU-accelerated (alpha), GPU utilization >80%
BITNET_STRICT_MODE=1 \
BITNET_DETERMINISTIC=1 \
cargo run -p xtask -- benchmark --features gpu --quantization i2s

# Strict tokenizer tests (no mock fallbacks)
BITNET_STRICT_TOKENIZERS=1 \
BITNET_STRICT_MODE=1 \
cargo test --features cpu -p bitnet-tokenizers -- --quiet
# Strict GPU kernel tests (real hardware only)
BITNET_STRICT_NO_FAKE_GPU=1 \
BITNET_STRICT_MODE=1 \
cargo test --no-default-features -p bitnet-kernels --features gpu -- --quiet
# Combined strict testing for production validation
BITNET_STRICT_MODE=1 \
BITNET_STRICT_TOKENIZERS=1 \
BITNET_STRICT_NO_FAKE_GPU=1 \
scripts/verify-tests.sh
# Cross-validation with strict mode (Issue #261 - AC9)
# Validates quantization accuracy: I2S ≥99.8%, TL1/TL2 ≥99.6% vs FP32
BITNET_STRICT_MODE=1 \
BITNET_DETERMINISTIC=1 \
BITNET_SEED=42 \
cargo run -p xtask -- crossval

For server monitoring and system metrics collection:
# Test system metrics collection in server
cargo test --no-default-features -p bitnet-server --features prometheus test_system_metrics_collection
# Run server with system metrics enabled
cargo run -p bitnet-server --features prometheus --bin server &
curl http://localhost:8080/metrics | grep "system_"
# Test memory tracking integration with system metrics
cargo test --no-default-features -p bitnet-kernels --features cpu test_memory_tracking_comprehensive
# Validate system metrics in monitoring stack
cd monitoring && docker-compose up -d
curl "http://localhost:9090/api/v1/query?query=system_cpu_usage_percent"

For more information on specific topics, see:
- GPU Development Guide - GPU-specific environment variables and testing
- Test Suite Guide - Testing configuration and variables
- Performance Benchmarking Guide - Performance testing variables