Commit 706c594

server: add descriptions for rope/yarn settings
The `rope_scaling_type` and `yarn_*` server settings had no `description=`, so they rendered blank in `python -m llama_cpp.server --help` and in the auto-generated server settings reference (cli.py forwards `field.description` to argparse `help=`). Fill in accurate descriptions. Addresses #635
1 parent 3850aff commit 706c594

1 file changed

Lines changed: 19 additions & 6 deletions

llama_cpp/server/settings.py

@@ -84,17 +84,30 @@ class ModelSettings(BaseSettings):
         description="The number of threads to use when batch processing. Use -1 for max cpu threads",
     )
     rope_scaling_type: int = Field(
-        default=llama_cpp.LLAMA_ROPE_SCALING_TYPE_UNSPECIFIED
+        default=llama_cpp.LLAMA_ROPE_SCALING_TYPE_UNSPECIFIED,
+        description="RoPE frequency scaling method. Defaults to the type defined by the model (unspecified).",
     )
     rope_freq_base: float = Field(default=0.0, description="RoPE base frequency")
     rope_freq_scale: float = Field(
         default=0.0, description="RoPE frequency scaling factor"
     )
-    yarn_ext_factor: float = Field(default=-1.0)
-    yarn_attn_factor: float = Field(default=1.0)
-    yarn_beta_fast: float = Field(default=32.0)
-    yarn_beta_slow: float = Field(default=1.0)
-    yarn_orig_ctx: int = Field(default=0)
+    yarn_ext_factor: float = Field(
+        default=-1.0,
+        description="YaRN extrapolation mix factor. -1.0 uses the value from the model.",
+    )
+    yarn_attn_factor: float = Field(
+        default=1.0, description="YaRN magnitude scaling factor for attention."
+    )
+    yarn_beta_fast: float = Field(
+        default=32.0, description="YaRN low correction dim (beta fast)."
+    )
+    yarn_beta_slow: float = Field(
+        default=1.0, description="YaRN high correction dim (beta slow)."
+    )
+    yarn_orig_ctx: int = Field(
+        default=0,
+        description="YaRN original context size of the model. 0 uses the model's training context size.",
+    )
     mul_mat_q: bool = Field(
         default=True, description="if true, use experimental mul_mat_q kernels"
     )
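
The commit message notes that cli.py forwards `field.description` to argparse `help=`, which is why a field without a description shows up blank in `--help`. The sketch below illustrates that pattern only; it is not the project's actual cli.py. To stay runnable with the standard library alone, it swaps pydantic's `Field(description=...)` for `dataclasses.field(metadata=...)`. `DemoSettings`, `build_parser`, and the `demo-server` program name are hypothetical.

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class DemoSettings:
    # Hypothetical stand-in for ModelSettings (the real class is a pydantic
    # BaseSettings); the description lives in dataclass metadata here.
    yarn_ext_factor: float = field(
        default=-1.0,
        metadata={
            "description": "YaRN extrapolation mix factor. "
            "-1.0 uses the value from the model."
        },
    )
    # No description, mirroring the fields as they were before this commit.
    yarn_orig_ctx: int = field(default=0)


def build_parser(settings_cls) -> argparse.ArgumentParser:
    """Create one --<field_name> option per settings field."""
    parser = argparse.ArgumentParser(prog="demo-server")
    for f in fields(settings_cls):
        parser.add_argument(
            f"--{f.name}",
            type=f.type,  # float or int, taken from the annotation
            default=f.default,
            # A missing description becomes help=None, so the option is
            # listed in --help with no explanatory text.
            help=f.metadata.get("description"),
        )
    return parser


parser = build_parser(DemoSettings)
help_text = parser.format_help()
print(help_text)
```

In the printed help, `--yarn_ext_factor` carries its description while `--yarn_orig_ctx` is listed with none, which is the gap this commit closes for the real settings.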
