CLI flags --max-tokens and --temp ignored in interactive mode #7

@GavinPalmer1984

Description

Bug

When using dlgo run in interactive mode (piping input), the --max-tokens and --temp flags appear to be ignored.

Reproduction

echo "Explain what a compiler does in one paragraph." | \
  dlgo run tinyllama-1.1b-chat-v1.0.Q4_0.gguf --temp 0 --max-tokens 64 --no-stream

Expected: sampling runs at temperature 0.0 and generation stops after 64 tokens.

Actual:

  • Banner shows temp=0.70 despite --temp 0
  • Generated 246 tokens despite --max-tokens 64
  Model:     llama
  Params:    22 layers, 2048 dim, 32 heads, vocab 32000
  Context:   2048 tokens
  Backend:   CPU (4 threads)
  Sampling:  temp=0.70 top-k=40 top-p=0.90   <-- should be temp=0.00

>>> A compiler is a software tool that translates ...
  9.1 tok/s | 246 tokens | 37.4s               <-- should stop at 64

Across multiple runs with --max-tokens 128, dlgo generated 51, 82, and 102 tokens — the counts vary from run to run and never reflect the requested limit of 128.

Environment

  • dlgo built from main (commit around 2026-03-19)
  • Linux 6.6.87 (WSL2), Go 1.26.0
  • Model: TinyLlama 1.1B Chat v1.0 Q4_0
