[0.1.3] - 2025-06-06
- Configurable quantization and device offloading when loading local models.
- Past-key-value caching in the provider to reuse model state across tokens.
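
The caching idea above can be illustrated with a dependency-free sketch. It is not the project's provider: `ToyCachedProvider`, `encode`, and `_step` are hypothetical names, and the integer "state" merely stands in for a model's past key values. The point is only that a prompt sharing a prefix with an earlier one should pay for its new tokens alone.

```python
from typing import Dict, List, Tuple


class ToyCachedProvider:
    """Hypothetical provider; the cached int stands in for past key values."""

    def __init__(self) -> None:
        self.cache: Dict[Tuple[int, ...], int] = {}
        self.tokens_processed = 0  # number of single-token steps actually run

    def _step(self, state: int, token: int) -> int:
        # Stand-in for one forward pass over a single token.
        self.tokens_processed += 1
        return (state * 31 + token) % 1_000_003

    def encode(self, tokens: List[int]) -> int:
        key = tuple(tokens)
        # Find the longest prefix whose state is already cached.
        start = len(key)
        while start > 0 and key[:start] not in self.cache:
            start -= 1
        state = self.cache[key[:start]] if start > 0 else 0
        # Process only the tokens after that prefix, caching each new state.
        for i in range(start, len(key)):
            state = self._step(state, key[i])
            self.cache[key[: i + 1]] = state
        return state


provider = ToyCachedProvider()
provider.encode([5, 7, 9])      # three steps, nothing cached yet
provider.encode([5, 7, 9, 11])  # one step, the first three are reused
print(provider.tokens_processed)  # prints 4
```

With a real model the cached object would be the attention key/value tensors rather than an integer, but the bookkeeping follows the same pattern.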
[0.1.2] - 2025-06-05
- Multi-token generation.
- Structure for recursively adding probabilities.
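
One plausible reading of the two entries above is a tree in which each node carries the cumulative probability of a token sequence and children are added recursively, one generation step per level. The sketch below assumes that reading. `ProbNode`, `expand`, `leaves`, and `toy_next_probs` are hypothetical names, and the fixed two-token distribution replaces a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

Prefix = Tuple[str, ...]


@dataclass
class ProbNode:
    token: str
    prob: float  # cumulative probability of the path ending at this node
    children: List["ProbNode"] = field(default_factory=list)


def expand(node: ProbNode, prefix: Prefix,
           next_probs: Callable[[Prefix], Dict[str, float]],
           depth: int) -> None:
    """Recursively add one child per candidate next token, `depth` levels deep."""
    if depth == 0:
        return
    for token, p in next_probs(prefix).items():
        child = ProbNode(token, node.prob * p)
        node.children.append(child)
        expand(child, prefix + (token,), next_probs, depth - 1)


def leaves(node: ProbNode, prefix: Prefix = ()) -> Iterator[Tuple[Prefix, float]]:
    """Yield (token sequence, cumulative probability) for every leaf."""
    if not node.children:
        yield prefix, node.prob
    for child in node.children:
        yield from leaves(child, prefix + (child.token,))


def toy_next_probs(prefix: Prefix) -> Dict[str, float]:
    # Assumed stand-in for a model call: a fixed distribution over two tokens.
    return {"a": 0.5, "b": 0.5}


root = ProbNode(token="", prob=1.0)
expand(root, (), toy_next_probs, depth=2)
for sequence, prob in leaves(root):
    print(sequence, prob)
```

Because every level multiplies by the conditional probability of the next token, the leaf values are the joint probabilities of the full multi-token sequences.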