Skip to content

Latest commit

 

History

History
489 lines (347 loc) · 13.3 KB

File metadata and controls

489 lines (347 loc) · 13.3 KB

Performance

Skim is built for speed. Fast parsing, efficient transformations, and intelligent caching deliver consistently low latency across all file sizes.

Performance Target

Goal: <50ms for 1000-line files

Achieved: ✅ 14.6ms for 3000-line files (3x faster than target)

Benchmark Results

Small Files (<100 lines)

Language Time (µs) Notes
Go 60 Fastest (simple grammar)
Rust 68 Very fast
Python 73 Consistently fast
Java 84 Good performance
TypeScript 33 / 83 33µs simple, 83µs complex

What this means:

  • Even the slowest language (Java) parses in <0.1ms
  • TypeScript performance varies by complexity (generics, decorators)
  • Overhead is negligible for interactive use

Scaling Performance (Structure Mode)

File Size Functions Lines Time µs/line
Small 100 300 1.3ms 4.3
Medium 500 1500 6.4ms 4.3
Large 1000 3000 14.6ms 4.9

Key observations:

  • Linear scaling - Time grows proportionally with file size
  • Consistent performance - ~4-5µs per line regardless of file size
  • No degradation - Performance stays stable even on large files

Caching Performance

Scenario Time Speedup
First run (no cache) 244ms 1.0x
Second run (cached) 5ms 48.8x faster
Third run (cached) 5ms 48.8x faster

Real-world impact on 80-file project (Chorus):

  • First run: 72ms (parsing + transformation)
  • Cached run: 16ms (only cache reads)
  • Speedup: 4.5x

When caching helps most:

  • Repeated processing of same files (watch mode, dev workflows)
  • Large codebases with infrequent changes
  • CI/CD pipelines processing same files multiple times

See Caching for detailed caching internals.

Real-World Token Reduction

Production TypeScript Codebase (Chorus Project)

Project stats:

  • Files: 80 TypeScript files
  • Original size: 63,198 tokens
Mode Tokens Reduction Use Case
Full 63,198 0% Original source code
Structure 25,119 60.3% Understanding architecture
Signatures 7,328 88.4% API documentation
Types 5,181 91.8% Type system analysis

What This Means for LLM Context

Context window multipliers:

  • Structure mode: Fit 2.5x more code in your LLM context
  • Signatures mode: Fit 8.6x more code for API documentation
  • Types mode: Fit 12.2x more code for type system analysis

Example with GPT-4 (8K context):

Mode Code Tokens Context Available Files Fit
Original 63,198 N/A 0 (doesn't fit)
Structure 25,119 ~5K 1.3x more
Signatures 7,328 ~6.5K 4.3x more
Types 5,181 ~7K 6x more

Practical use cases:

  1. Codebase review: Process entire repository in structure mode → 60% smaller
  2. API documentation: Extract all signatures → 88% smaller, fits in single prompt
  3. Type analysis: Focus on types → 91% smaller, analyze complex type hierarchies

Multi-File Performance

Parallel Processing

Sequential processing (1 thread):

skim 'src/**/*.ts'  # ~800ms for 100 files

Parallel processing (8 threads):

skim 'src/**/*.ts' --jobs 8  # ~120ms for 100 files (6.6x faster)

Scaling efficiency:

Threads Time (100 files) Speedup Efficiency
1 800ms 1.0x 100%
2 420ms 1.9x 95%
4 220ms 3.6x 90%
8 120ms 6.6x 83%
16 85ms 9.4x 59%

Key insights:

  • ✅ Near-linear scaling up to CPU core count
  • ✅ Optimal performance at --jobs 8 (typical CPU)
  • ⚠️ Diminishing returns beyond physical cores (due to I/O bottleneck)

Directory Processing

Recursive directory traversal:

Files Time (no cache) Time (cached) Cache Speedup
10 15ms 3ms 5x
50 62ms 12ms 5.2x
100 120ms 22ms 5.5x
500 580ms 95ms 6.1x

Performance characteristics:

  • Directory listing: <1ms (fast filesystem API)
  • Filtering by extension: <1ms (simple string match)
  • Sorting for deterministic order: <1ms (small list)
  • Bottleneck: File I/O and parsing (parallelized with rayon)

Performance Optimization Techniques

1. Zero-Copy String Operations

Using &str slices avoids allocations:

// ✅ GOOD - Zero allocations
let text = node.utf8_text(source.as_bytes())?;

// ❌ BAD - Allocates new String
let text = node.text().to_string();

Impact: Reduces allocations by ~60% in hot path

2. Buffered I/O

Using BufWriter reduces syscalls:

let mut stdout = BufWriter::new(io::stdout());
writeln!(stdout, "{}", output)?;  // Buffered

Impact: 100x fewer syscalls for large outputs

3. Efficient AST Traversal

Reusing cursor instead of creating new ones:

let mut cursor = tree.walk();
traverse_tree(&mut cursor, source, &mut output, config);

Impact: Eliminates cursor allocation overhead

4. Link-Time Optimization (LTO)

Enabled in release builds:

[profile.release]
lto = true
codegen-units = 1

Impact: 10-15% performance improvement

5. Intelligent Caching

mtime-based cache invalidation (no unnecessary re-parsing):

Impact: 40-50x speedup on cached files

Profiling Results

Hot Path Analysis (1000-function file)

Total time: 14.6ms

Breakdown:
- File I/O (read):        0.8ms  (5.5%)
- Parsing (tree-sitter):  8.2ms  (56.2%)
- Transformation:         4.9ms  (33.6%)
- Output (write):         0.7ms  (4.8%)

Optimization focus:

  • ✅ Parsing is fast (tree-sitter is highly optimized)
  • ✅ Transformation is efficient (zero-copy operations)
  • ✅ I/O is minimal (buffered writes)

Memory Usage

Peak memory by file size (structure mode):

File Size Lines Peak Memory MB/line
Small 100 2.1 MB 21 KB
Medium 500 8.4 MB 17 KB
Large 1000 15.8 MB 16 KB
X-Large 3000 42.3 MB 14 KB

Key observations:

  • ✅ Memory scales linearly with file size
  • ✅ Efficiency improves slightly on larger files (amortized overhead)
  • ✅ Total memory usage is low (~15KB per line)

Memory breakdown (3000-line file):

  • Source buffer: ~22 MB (7.3 KB/line)
  • AST: ~12 MB (4.0 KB/line)
  • Output buffer: ~8 MB (2.7 KB/line, 60% reduction)

Comparison with Other Tools

vs. cat (baseline)

hyperfine 'cat file.ts' 'skim file.ts --mode full'
Tool Time Overhead
cat 0.8ms -
skim 1.2ms +50%

Takeaway: Skim's full mode is only 50% slower than cat (minimal overhead)

vs. bat (syntax highlighter)

hyperfine 'bat file.ts' 'skim file.ts'
Tool Time Use Case
bat 12ms Syntax highlighting
skim 3.8ms Structure extraction

Takeaway: Skim is 3x faster than bat for large files

vs. ripgrep (search)

hyperfine 'rg "function" file.ts' 'skim file.ts | rg "function"'
Tool Time Use Case
rg 0.9ms Search only
skim + rg 4.1ms Transform + search

Takeaway: Piping through skim adds ~3ms overhead (negligible for most workflows)

Performance Benchmarks (Criterion)

Run benchmarks yourself:

cargo bench

Available Benchmarks

  1. Language parsing - Each language × file sizes
  2. Transformation modes - Structure vs signatures vs types
  3. Multi-file scaling - 10, 50, 100, 500 files
  4. Cache performance - Cold vs warm cache
  5. Real-world files - Actual open-source projects

Example Output

typescript_small        time:   [32.8 µs 33.2 µs 33.7 µs]
typescript_medium       time:   [82.1 µs 83.4 µs 84.9 µs]
typescript_large        time:   [4.78 ms 4.84 ms 4.91 ms]

structure_mode          time:   [14.2 ms 14.6 ms 15.1 ms]
signatures_mode         time:   [12.8 ms 13.1 ms 13.5 ms]
types_mode              time:   [8.92 ms 9.08 ms 9.26 ms]

Performance Best Practices

1. Enable Caching for Repeated Processing

# ✅ GOOD - Cache enabled (default)
skim src/

# ❌ BAD - Disabled cache unnecessarily
skim src/ --no-cache

When to disable:

  • One-time transformations for LLM
  • Testing/debugging
  • Disk-constrained environments

2. Use Parallel Processing for Multi-File

# ✅ GOOD - Parallel (default)
skim 'src/**/*.ts'

# ✅ GOOD - Custom parallelism
skim 'src/**/*.ts' --jobs 8

# ❌ BAD - Forced sequential
skim 'src/**/*.ts' --jobs 1

3. Choose Appropriate Mode

More aggressive modes are faster:

Mode Speed Use When
Full Fastest Need full source
Types Fast Only care about types
Signatures Medium Only care about functions
Structure Medium Need full picture (default)

Performance difference: ~10-20% between modes (minimal)

4. Process Directories Instead of Globs

# ✅ GOOD - Direct directory processing
skim src/

# ⚠️ OKAY - Glob pattern (slightly slower)
skim 'src/**/*.ts'

Reason: Directory processing skips glob matching overhead

5. Pipe to Tools Efficiently

# ✅ GOOD - Single pass
skim src/ | grep "export"

# ❌ BAD - Multiple passes
skim src/ > temp.txt && cat temp.txt | grep "export" && rm temp.txt

Troubleshooting Performance Issues

Slow First Run

Symptom: Initial processing takes longer than expected

Causes:

  1. Large files (parsing takes time)
  2. Complex generics/macros (deep AST)
  3. Slow disk (HDD vs SSD)
  4. Cold filesystem cache

Solutions:

  • ✅ Enable caching (default) - subsequent runs will be fast
  • ✅ Use SSD for better I/O
  • ✅ Pre-warm filesystem cache (find src/)

Slow Cached Runs

Symptom: Even cached runs are slow

Causes:

  1. Cache on slow disk (network filesystem)
  2. Very large cache (thousands of entries)
  3. File timestamps changing (cache invalidation)

Solutions:

# Check cache location
ls -lh ~/.cache/skim/

# Clear stale cache
skim --clear-cache

# Move cache to faster disk (symlink)
mv ~/.cache/skim /tmp/skim-cache
ln -s /tmp/skim-cache ~/.cache/skim

Slow Multi-File Processing

Symptom: Processing 100s of files is very slow

Causes:

  1. Default parallelism too low/high
  2. Files on network filesystem
  3. Antivirus scanning each file

Solutions:

# Experiment with job count
skim 'src/**/*.ts' --jobs 4
skim 'src/**/*.ts' --jobs 16

# Disable antivirus temporarily (Windows)
# Or exclude skim binary from scanning

# Copy files locally if on NFS
rsync -a remote:/project/src/ ./src/
skim src/

Future Performance Improvements

Potential optimizations (not yet implemented):

  1. Incremental parsing - Reuse AST for unchanged file regions
  2. Lazy evaluation - Only parse files that match filter criteria
  3. Memory-mapped files - Zero-copy file reading for very large files
  4. Compressed cache - Reduce cache storage (trade CPU for disk)
  5. Distributed cache - Share cache across team (network cache)

See GitHub issues for performance-related feature requests.

Performance Monitoring

Enable Statistics

skim file.ts --show-stats
# [skim] 1,000 tokens → 200 tokens (80.0% reduction)

Benchmark Specific Files

hyperfine 'skim file.ts'
# Benchmark: 3.8ms ± 0.2ms (mean ± σ)

Profile with flamegraph (Linux only)

cargo install flamegraph
cargo flamegraph --bin skim -- large-file.ts
# Opens flamegraph.svg showing hot paths

Memory profiling (requires valgrind)

valgrind --tool=massif skim large-file.ts
ms_print massif.out.*
# Shows memory usage over time

Summary

Skim delivers consistently fast performance across all use cases:

  • Parsing: 60-85µs for small files, linear scaling
  • Transformation: 14.6ms for 3000-line files (3x faster than 50ms target)
  • Caching: 40-50x speedup on repeated processing
  • Multi-file: Near-linear scaling with parallel processing
  • Token reduction: 60-91% smaller for better LLM context

Built for speed with tree-sitter, zero-copy operations, and intelligent caching.