GPU Solver Architecture

!!! note "Historical document" This document describes an early GPU solver architecture. The codebase has since been restructured: DC analysis is in analysis/dc.py, transient analysis is in analysis/transient/, and Jacobians are now computed analytically via OpenVAF. The design principles and trade-offs described here remain relevant.

This document describes the GPU-native solver architecture for VAJAX, including DC operating point and transient analysis implementations.

Overview

VAJAX provides two complementary GPU-native solvers:

DC Solver (analysis/dc.py) - Computes steady-state operating point
Transient Solver (analysis/transient/) - Time-domain simulation with adaptive BDF2

Jacobians are computed analytically via OpenVAF-compiled Verilog-A models.

Key Design Decisions

Why Transient Instead of DC for Digital Circuits?

For digital MOSFET circuits (like the C6288 multiplier), transient analysis with icmode='uic' is preferred over DC operating point analysis. This matches the approach used by VACASK and other production simulators.

The problem with DC analysis:

Digital circuits have many possible stable states (high/low combinations)
Newton-Raphson can converge to wrong states or oscillate between them
Floating gate nodes and feedback paths make convergence difficult
Source stepping and GMIN stepping help but don't always work

Why transient works better:

Natural settling behavior from initial conditions
No ambiguity about which state to converge to
The circuit "finds" its own solution through time evolution
Capacitances (intrinsic or explicit) provide natural damping

VACASK Example:

// From c6288.sim - VACASK uses transient, not DC
analysis tranmul tran stop=2n step=2p icmode="uic"
// analysis op1 op  <- DC is commented out

Initial Condition Modes (icmode)

The transient solver supports two initial condition modes:

icmode='op' (default): Compute DC operating point first, then run transient
- Good for analog circuits where DC solution is well-defined
- Can fail for digital circuits with multiple stable states
icmode='uic': Use Initial Conditions directly (zeros + supply voltages)
- Skip DC computation entirely
- Let circuit settle through transient simulation
- Preferred for digital logic circuits
- Matches VACASK's behavior for benchmarks

Solver Architecture

Data Flow

MNASystem (devices)
      │
      ▼
build_device_groups()  ◄── Groups devices by type (MOSFET, R, V, C)
      │
      ▼
VectorizedDeviceGroup  ◄── Pre-computed JAX arrays per device type
      │
      ├──────────────────┐
      ▼                  ▼
build_gpu_residual_fn()  build_transient_circuit_data_fast()
      │                  │
      ▼                  ▼
  residual_fn()      TransientCircuitData
      │                  │
      ▼                  ▼
sparsejac.jacrev()   build_transient_residual_fn()
      │                  │
      ▼                  ▼
Sparse Jacobian      residual_fn(V_curr, V_prev, dt)
(BCOO format)             │
      │                   ▼
      ▼              sparsejac.jacrev()
Newton-Raphson            │
Iteration                 ▼
      │              Newton-Raphson per timestep
      ▼                   │
Solution                  ▼
                    Time-domain solution

Key Components

1. VectorizedDeviceGroup

Groups devices of the same type with pre-computed JAX arrays:

class VectorizedDeviceGroup:
    device_type: DeviceType       # MOSFET, RESISTOR, VSOURCE, etc.
    device_names: List[str]
    node_indices: Array           # Shape: (n_devices, n_terminals)
    params: Dict[str, Array]      # Device parameters as batched arrays

This enables vectorized device evaluation instead of per-device Python loops.

2. GPU Residual Function

The residual function computes KCL violations at each node:

def gpu_residual_fn(V: Array) -> Array:
    """Compute f(V) where f=0 at solution.

    Returns:
        residual: Array of shape (n_nodes - 1,) - current imbalance at each node
    """
    # For each device type, compute currents vectorized:
    # - Voltage sources: I = G_big * (V_actual - V_target)
    # - MOSFETs: Ids = f(Vgs, Vds, W, L, model_params)
    # - Resistors: I = G * (Vp - Vn)
    # - Capacitors: I = C/dt * (V - V_prev)  [transient only]

    # Sum contributions at each node
    residual = sum_at_nodes(device_currents)

    # Add GMIN to ground (numerical stability)
    residual += gmin * V

    return residual

3. Sparse Jacobian via sparsejac

Instead of manually computing Jacobian entries, we use automatic differentiation:

# Create sparsity pattern (which entries are non-zero)
sparsity = jsparse.BCOO((data, indices), shape=(n, n))

# Create sparse Jacobian function via graph coloring
jacobian_fn = sparsejac.jacrev(residual_fn, sparsity=sparsity)

# Evaluate at a point
J_sparse = jacobian_fn(V)  # Returns BCOO sparse matrix

Benefits:

No need for analytical derivatives
Correct derivatives even for complex models
Efficient evaluation via graph coloring

Key insight: The sparsity pattern comes from circuit topology - a device only contributes to Jacobian entries connecting its terminal nodes.

4. Newton-Raphson Iteration

Both solvers use Newton-Raphson with voltage limiting:

for iteration in range(max_iterations):
    f = residual_fn(V)
    if max(abs(f)) < abstol:
        break  # Converged

    J = jacobian_fn(V)
    delta_V = solve(J, -f)  # Linear solve

    # Voltage limiting (prevent overshooting)
    max_step = 0.5 * vdd
    if max(abs(delta_V)) > max_step:
        delta_V *= max_step / max(abs(delta_V))

    V = V + delta_V
    V = clip(V, -2*vdd, 2*vdd)  # Safety clamp

MOSFET Models

Transient Solver: Level-1 Model

The transient solver uses a simplified level-1 MOSFET model:

beta = kp * W / L
Vgst = Vgs - Vth0

# Smooth subthreshold (avoids discontinuity at Vth)
Vgst_eff = log1p(exp(alpha * Vgst)) / alpha

# Saturation with soft clipping
Vdsat = max(Vgst_eff, epsilon)
Vds_eff = Vdsat * tanh(abs(Vds) / Vdsat)

# Drain current
Ids = beta * Vgst_eff * Vds_eff * (1 + lambda * abs(Vds))

DC Solver: BSIM-like Model

The DC solver uses a more sophisticated BSIM-like model from mosfet_simple.py:

Body effect (gamma, phiB)
Velocity saturation (vsat)
Subthreshold conduction (n_sub)
Short channel effects (theta, a0)
Temperature dependence

Performance Characteristics

JIT Compilation

Both solvers benefit from JAX's JIT compilation:

First timestep (includes JIT): ~1.5s
Subsequent timesteps: ~0.002s (750x faster)

Tip: For benchmarking, always exclude the first iteration or use pre-warming.

CPU vs GPU

Circuit Size	CPU Time	GPU Time	Speedup
Small (< 100 nodes)	Faster	Slower	0.5x
Medium (1K nodes)	Similar	Similar	1x
Large (5K+ nodes)	Slower	Faster	3-10x

GPU acceleration becomes beneficial at ~1000+ nodes due to data transfer overhead.

Memory Usage

Sparse Jacobian memory scales with O(devices * terminals²) instead of O(nodes²):

C6288 (5123 nodes, 10112 MOSFETs):
- Dense Jacobian: ~200 MB
- Sparse Jacobian: ~1.3 MB
- Savings: 150x

Usage Examples

DC Operating Point

from vajax import CircuitEngine

engine = CircuitEngine("circuit.sim")
engine.parse()

# DC operating point is computed automatically before transient
# Or run DC explicitly:
# engine.run_dcinc()

Transient Analysis

from vajax import CircuitEngine

engine = CircuitEngine("circuit.sim")
engine.parse()

# For digital circuits - use icmode='uic' to skip DC
engine.prepare(t_stop=2e-9, dt=2e-12)
result = engine.run_transient()

# For analog circuits with sparse solver
engine.prepare(t_stop=1e-6, dt=1e-9, use_sparse=True)
result = engine.run_transient()

Current State

Since this document was written, major improvements have been made:

Adaptive timestepping: BDF2 with LTE control is implemented
Unified device models: All devices use OpenVAF-compiled Verilog-A (PSP103, etc.)
GPU-resident time loops: Transient uses lax.scan for full GPU execution
Sparse solver: BCOO/BCSR with spsolve for large circuits
AC and noise analysis: Frequency-domain analyses are available

References

VACASK simulator behavior for digital circuit analysis
sparsejac: https://github.com/mfschubert/sparsejac
JAX documentation: https://jax.readthedocs.io
Newton-Raphson for circuit simulation (Nagel, 1975)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Solver Architecture

Overview

Key Design Decisions

Why Transient Instead of DC for Digital Circuits?

Initial Condition Modes (icmode)

Solver Architecture

Data Flow

Key Components

1. VectorizedDeviceGroup

2. GPU Residual Function

3. Sparse Jacobian via sparsejac

4. Newton-Raphson Iteration

MOSFET Models

Transient Solver: Level-1 Model

DC Solver: BSIM-like Model

Performance Characteristics

JIT Compilation

CPU vs GPU

Memory Usage

Usage Examples

DC Operating Point

Transient Analysis

Current State

References

FilesExpand file tree

gpu_solver_architecture.md

Latest commit

History

gpu_solver_architecture.md

File metadata and controls

GPU Solver Architecture

Overview

Key Design Decisions

Why Transient Instead of DC for Digital Circuits?

Initial Condition Modes (icmode)

Solver Architecture

Data Flow

Key Components

1. VectorizedDeviceGroup

2. GPU Residual Function

3. Sparse Jacobian via sparsejac

4. Newton-Raphson Iteration

MOSFET Models

Transient Solver: Level-1 Model

DC Solver: BSIM-like Model

Performance Characteristics

JIT Compilation

CPU vs GPU

Memory Usage

Usage Examples

DC Operating Point

Transient Analysis

Current State

References