feat: extend simulated path to multi-token decode #41

Open
hulohot wants to merge 3 commits into main from feat/llm-sim-multitoken

hulohot commented Feb 26, 2026

Summary

  • extend the simulated path in python/run_tiny_llm_sim.py from first-token-only to multi-token decode
  • add separate CLI controls for simulated decoding (--sim-max-new-tokens, --sim-temperature, --sim-top-k, --sim-top-p, --sim-repetition-penalty, --sim-seed)
  • preserve the reference HF generation path and the backward-compatible first-token fields
  • keep the output explicit that simulated decode is a software approximation using HF hidden states, not full RTL/hardware decode
  • update the README usage examples and limitations language
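
The simulated decoding controls listed above map onto a fairly standard sampling pipeline. The sketch below is not taken from this PR's diff; it shows one common way to combine a repetition penalty, temperature, top-k, top-p, and a seed. The function name `sample_next_token` is hypothetical, and treating a temperature of 0.0 as greedy decoding is an assumption.

```python
import math
import random


def sample_next_token(logits, prev_tokens, temperature=1.0, top_k=0,
                      top_p=1.0, repetition_penalty=1.0, rng=None):
    """Pick one token id from raw logits (hypothetical helper, not from the PR)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    if repetition_penalty != 1.0:
        for tok in set(prev_tokens):
            s = scores[tok]
            scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Assumption: temperature <= 0 means greedy (argmax) decoding.
    if temperature <= 0.0:
        return max(range(len(scores)), key=scores.__getitem__)

    scores = [s / temperature for s in scores]

    # Candidate ids sorted by score, best first; top-k keeps the first k.
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Numerically stable softmax over the remaining candidates.
    best = scores[order[0]]
    weights = [math.exp(scores[i] - best) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        cumulative, keep = 0.0, 0
        for p in probs:
            cumulative += p
            keep += 1
            if cumulative >= top_p:
                break
        order, probs = order[:keep], probs[:keep]

    return rng.choices(order, weights=probs, k=1)[0]
```

Passing `rng=random.Random(seed)` makes a run reproducible, which is the role a flag like --sim-seed would plausibly play.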

Closes #38

Validation

  • python3 -m py_compile python/run_tiny_llm_sim.py
  • python3 -m python.run_tiny_llm_sim --prompt "Hello tiny NPU" --max-new-tokens 4 --temperature 0.9 --top-k 40 --top-p 0.95 --seed 42 --sim-max-new-tokens 4 --sim-temperature 0.0 --sim-seed 123 (fails in this environment because the torch dependency is not installed)
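
Because the end-to-end run above stops on a missing dependency, a quick pre-flight check can report that before the script is launched. This is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list of modules to check (here just `torch`, the one named above) is an assumption about what the script imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch"])
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All checked dependencies are importable.")
```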

Hulobot added 3 commits February 26, 2026 12:54
…den states

- First token: INT8 projection from initial hidden state (original behavior)
- Tokens 2+: Use cached reference hidden states from prior HF forward steps
- _run_reference_generation now returns hidden_states_list
- _run_simulated_generation accepts hidden_states parameter
- Updated note to reflect hybrid approach
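
The hybrid flow in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names (`quantize_int8`, `int8_project`, `simulated_decode`) are hypothetical, symmetric per-tensor quantization is assumed, and every cached hidden state is assumed to go through the same INT8 projection followed by greedy selection.

```python
def quantize_int8(values, scale=None):
    """Symmetric INT8 quantization; returns (quantized ints, scale)."""
    if scale is None:
        max_abs = max((abs(v) for v in values), default=0.0)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_project(hidden, weight_rows):
    """Project one hidden vector to logits using integer dot products."""
    q_hidden, s_hidden = quantize_int8(hidden)
    # One shared scale for the whole weight matrix (per-tensor quantization).
    _, s_weight = quantize_int8([w for row in weight_rows for w in row])
    logits = []
    for row in weight_rows:
        q_row, _ = quantize_int8(row, scale=s_weight)
        acc = sum(a * b for a, b in zip(q_hidden, q_row))  # integer accumulate
        logits.append(acc * s_hidden * s_weight)           # dequantize
    return logits


def simulated_decode(hidden_states, weight_rows, max_new_tokens):
    """hidden_states[0] is the initial hidden state; later entries stand in for
    the cached reference hidden states from prior forward steps."""
    tokens = []
    for hidden in hidden_states[:max_new_tokens]:
        logits = int8_project(hidden, weight_rows)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens
```

In this sketch the simulated path never runs its own forward pass; it only re-projects hidden states produced by the reference model, which is why the output is described as a software approximation rather than hardware decode.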

Development

Successfully merging this pull request may close these issues.

LLM: extend simulated path to multi-token decode (beyond projection-only first token)
