Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 9 additions & 9 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -437,34 +437,34 @@ jobs:
build_artifacts/6_final.v build_artifacts/result.gemparts \
--top-module top

- name: Run GPU simulation with SDF timing (100K ticks)
- name: Run GPU co-simulation with SDF timing (100K ticks)
timeout-minutes: 10
run: |
cargo run --release --features metal --bin gpu_sim -- \
cargo run --release --features metal --bin loom -- cosim \
build_artifacts/6_final.v build_artifacts/result.gemparts \
--config tests/mcu_soc/sim_config.json --top-module top \
--sdf build_artifacts/6_final.sdf --sdf-corner typ \
--max-cycles 100000 \
build_artifacts/6_final.v build_artifacts/result.gemparts \
2>&1 | tee gpu_sim_output.txt
2>&1 | tee cosim_output.txt

- name: Verify UART boot output
run: |
if grep -q "nyaa" gpu_sim_output.txt; then
if grep -q "nyaa" cosim_output.txt; then
echo "MCU SoC booted successfully - UART output detected"
else
echo "ERROR: Expected UART output 'nyaa' not found"
echo "--- Last 50 lines of simulation output ---"
tail -50 gpu_sim_output.txt
tail -50 cosim_output.txt
exit 1
fi

- name: Report simulation results
if: always()
run: |
{
echo "## MCU SoC Metal Simulation (with SDF timing)"
echo "## MCU SoC Metal Co-simulation (with SDF timing)"
echo "\`\`\`"
tail -20 gpu_sim_output.txt
tail -20 cosim_output.txt
echo "\`\`\`"
} >> "$GITHUB_STEP_SUMMARY"

Expand All @@ -474,7 +474,7 @@ jobs:
with:
name: mcu-soc-results
path: |
gpu_sim_output.txt
cosim_output.txt
tests/mcu_soc/sim_config.json

# Build documentation (cargo doc + mdbook) and deploy to GitHub Pages
Expand Down
4 changes: 0 additions & 4 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -60,10 +60,6 @@ required-features = ["cuda"]
name = "metal_test"
required-features = ["metal"]

[[bin]]
name = "gpu_sim"
required-features = ["metal"]

[dev-dependencies]
criterion = "0.5"

Expand Down
77 changes: 73 additions & 4 deletions docs/timing-violations.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,8 @@ Setup and hold violations occur when data arrives too late (setup) or too early
design.gv design.gemparts \
--config testbench.json \
--sdf design.sdf \
--sdf-corner typ
--sdf-corner typ \
--clock-uncertainty-ps 500
```

### CLI Flags Reference
Expand All @@ -61,6 +62,7 @@ Setup and hold violations occur when data arrives too late (setup) or too early
| `--sdf <path>` | all | Path to SDF file with back-annotated delays |
| `--sdf-corner <min\|typ\|max>` | all | Which SDF corner to use (default: `typ`) |
| `--sdf-debug` | all | Print unmatched SDF instances for debugging |
| `--clock-uncertainty-ps <ps>` | `gpu_sim` | Clock uncertainty/jitter margin in picoseconds (default: 0). Added to both setup and hold constraints, tightening the timing window. |
| `--enable-timing` | `cuda_test` | Enable timing analysis (arrival + violation checks) |
| `--timing-clock-period <ps>` | `cuda_test` | Clock period in picoseconds (default: 1000) |
| `--timing-report-violations` | `cuda_test` | Report all violations, not just summary |
Expand Down Expand Up @@ -113,14 +115,52 @@ cargo run -r --features metal --bin metal_test -- \
| **hold** | DFF hold time constraint from SDF/Liberty (picoseconds) |
| **slack** | `arrival - hold`. Negative = violation amount |

### Summary Statistics
### Timing Analysis Summary

At the end of simulation, `gpu_sim` prints a timing analysis summary with standard signoff metrics:

```
=== Timing Analysis ===
Clock period: 1200ps
Clock uncertainty: 500ps
Setup violations: 12 (8 unique endpoints)
Hold violations: 3 (2 unique endpoints)
WNS (setup): -450ps
TNS (setup): -2340ps
WHS (hold): -120ps
THS (hold): -280ps
TIMING: VIOLATIONS DETECTED
```

| Metric | Meaning |
|--------|---------|
| **WNS** (Worst Negative Slack) | The most negative setup slack across the entire simulation. Indicates the single worst timing path. |
| **TNS** (Total Negative Slack) | Sum of all negative setup slacks. Indicates overall timing health — a large TNS means many paths are failing. |
| **WHS** (Worst Hold Slack) | The most negative hold slack. |
| **THS** (Total Hold Slack) | Sum of all negative hold slacks. |
| **Unique endpoints** | Number of distinct state words (each covering 32 DFF data inputs) that had at least one violation. |

At the end of simulation, GEM prints totals:
If no violations are detected, the summary shows:

```
Simulation complete: 1000 cycles, 5 setup violations, 0 hold violations
=== Timing Analysis ===
Clock period: 1200ps
Setup violations: 0 (0 unique endpoints)
Hold violations: 0 (0 unique endpoints)
TIMING: PASSED
```

**Event buffer overflow**: The GPU-side event buffer holds up to 1024 events per batch. If a design has very many violations per batch, some events may be dropped. When this happens, a warning is printed and WNS/TNS metrics may be approximate (but never optimistic — missed events only mean the true TNS is worse).

### Clock Uncertainty

The `--clock-uncertainty-ps` flag models clock jitter and skew by adding a margin to both setup and hold constraints. This effectively tightens the timing window:

- **Setup check**: `arrival + (setup + uncertainty) > clock_period` triggers a violation
- **Hold check**: `arrival < (hold + uncertainty)` triggers a violation

Use this when your design has known clock tree uncertainty from P&R reports. A typical value for SKY130 is 100-500ps depending on clock tree quality.

## Tracing Violations to Source Signals

When you see a violation on a specific word, follow this workflow to identify the offending signals and their logic cone.
Expand Down Expand Up @@ -186,6 +226,35 @@ If a violation is reported but you suspect it's a false positive from the approx
1. **Use `timing_sim_cpu`** for per-signal accuracy (see [Detailed CPU Timing Analysis](#4-detailed-cpu-timing-analysis) above).
2. **Timing-aware bit packing** groups signals with similar arrival times into the same thread, reducing the approximation error. See `docs/timing-simulation.md` § "Timing-Aware Bit Packing" for details.

## Multi-Corner Analysis

A single SDF corner only catches one class of violations: the **max** (slow) corner reveals setup violations, while the **min** (fast) corner reveals hold violations. For complete timing signoff, run both corners:

```bash
# Max corner: catches setup violations (slow paths)
cargo run -r --features metal --bin gpu_sim -- \
design.gv design.gemparts \
--config testbench.json \
--sdf design.sdf --sdf-corner max \
--clock-uncertainty-ps 500

# Min corner: catches hold violations (fast paths)
cargo run -r --features metal --bin gpu_sim -- \
design.gv design.gemparts \
--config testbench.json \
--sdf design.sdf --sdf-corner min \
--clock-uncertainty-ps 500
```

Both runs use the same `.gemparts` file (compilation is cached), so the overhead is just the simulation time. In CI, add both as separate steps:

```yaml
- name: Timing check (max corner - setup)
run: cargo run -r --features metal --bin gpu_sim -- ... --sdf-corner max
- name: Timing check (min corner - hold)
run: cargo run -r --features metal --bin gpu_sim -- ... --sdf-corner min
```

## Common Scenarios

**Setup violations on many words, same cycle**: The clock period is likely too tight for the design. The combinational logic depth exceeds what can settle in one clock period. Try increasing the clock period.
Expand Down
89 changes: 68 additions & 21 deletions src/bin/loom.rs
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,10 @@ struct CosimArgs {
/// Enable SDF debug output.
#[clap(long)]
sdf_debug: bool,

/// Clock uncertainty in picoseconds (added to setup/hold constraints to model jitter).
#[clap(long, default_value = "0")]
clock_uncertainty_ps: u64,
}

/// Invoke the mt-kahypar partitioner.
Expand Down Expand Up @@ -360,11 +364,12 @@ fn cmd_sim(args: SimArgs) {
sdf: args.sdf.clone(),
sdf_corner: args.sdf_corner.clone(),
sdf_debug: args.sdf_debug,
clock_period_ps: None,
};

#[allow(unused_mut)]
let mut design = setup::load_design(&design_args);
let timing_constraints = setup::build_timing_constraints(&design.script);
let timing_constraints = setup::build_timing_constraints(&design.script, 0);

// Parse input VCD
let input_vcd = std::fs::File::open(&args.input_vcd).unwrap();
Expand Down Expand Up @@ -1038,26 +1043,68 @@ fn cmd_cosim(args: CosimArgs) {

#[cfg(feature = "metal")]
{
// The co-simulation logic is complex (SPI flash, UART, batch GPU encoding).
// For now, delegate to gpu_sim binary. Full integration planned for a future release.
eprintln!(
"loom cosim is not yet fully integrated.\n\
\n\
The co-simulation logic (SPI flash model, UART, GPU batch encoding)\n\
is currently available in the gpu_sim binary:\n\
\n cargo run -r --features metal --bin gpu_sim -- \\\n\
{:?} {:?} --config {:?} \\\n\
--num-blocks {}{}{}\n",
args.netlist_verilog,
args.gemparts,
args.config,
args.num_blocks,
args.max_cycles
.map_or(String::new(), |c| format!(" --max-cycles {}", c)),
args.sdf
use gem::sim::cosim_metal::CosimOpts;
use gem::sim::setup;
use gem::testbench::TestbenchConfig;

// Load testbench config
let file = std::fs::File::open(&args.config).expect("Failed to open config file");
let reader = std::io::BufReader::new(file);
let config: TestbenchConfig =
serde_json::from_reader(reader).expect("Failed to parse config JSON");
clilog::info!("Loaded testbench config: {:?}", config);

// Determine clock period for SDF loading
let clock_period_ps = args
.clock_period
.or(config.clock_period_ps)
.or(config.timing.as_ref().map(|t| t.clock_period_ps));

// Determine SDF path: CLI --sdf takes priority, then config.timing.sdf_file
let sdf = args.sdf.clone().or_else(|| {
config
.timing
.as_ref()
.map_or(String::new(), |s| format!(" --sdf {:?}", s)),
);
std::process::exit(1);
.map(|t| std::path::PathBuf::from(&t.sdf_file))
});
let sdf_corner = if args.sdf.is_some() {
args.sdf_corner.clone()
} else if let Some(ref t) = config.timing {
t.sdf_corner.clone()
} else {
"typ".to_string()
};
let sdf_debug = args.sdf_debug;

let design_args = DesignArgs {
netlist_verilog: args.netlist_verilog.clone(),
top_module: args.top_module.clone(),
level_split: args.level_split.clone(),
gemparts: args.gemparts.clone(),
num_blocks: args.num_blocks,
json_path: None,
sdf,
sdf_corner,
sdf_debug,
clock_period_ps,
};

let mut design = setup::load_design(&design_args);
let timing_constraints =
setup::build_timing_constraints(&design.script, args.clock_uncertainty_ps as u16);

let opts = CosimOpts {
max_cycles: args.max_cycles,
num_blocks: args.num_blocks,
flash_verbose: args.flash_verbose,
check_with_cpu: args.check_with_cpu,
gpu_profile: args.gpu_profile,
clock_period: args.clock_period,
clock_uncertainty_ps: args.clock_uncertainty_ps,
};

let result =
gem::sim::cosim_metal::run_cosim(&mut design, &config, &opts, &timing_constraints);
std::process::exit(if result.passed { 0 } else { 1 });
}
}
Loading
Loading