 def apply_llm_prompt_strategy(
     text: str,
     mapping: Mapping[str, List[str]],
-    model_name: str = "google/gemma-2-1b-it",
+    model_name: str = "google/gemma-3-1b-it",
     device: Optional[str] = None,
     max_length: int = 512,
     temperature: float = 0.0,
@@ -37,7 +37,7 @@ def apply_llm_prompt_strategy(
         mapping (Mapping[str, List[str]]): A mapping from original characters to
             their possible homoglyph replacements.
         model_name (str): The HuggingFace model name to load.
-            Defaults to "google/gemma-2-1b-it".
+            Defaults to "google/gemma-3-1b-it".
         device (Optional[str]): Device to run the model on ('cuda', 'cpu', etc.).
             Defaults to cuda if available, otherwise cpu.
         max_length (int): Maximum length of text segments to process. Longer text will be split.
@@ -118,11 +118,11 @@ def apply_llm_prompt_strategy(
 For example: {homoglyph_info}
 
 Your task is to read the provided text which may contain homoglyphs (visually similar characters from different scripts)
-and produce a normalized version with standard Latin characters.
+and produce a normalized version with the correct characters.
 
 Important instructions:
 1. Identify any homoglyphs or suspicious characters that might be replacements
-2. Replace them with their standard Latin equivalents
+2. Replace them with their correct characters (which are often in the same alphabet/script as the surrounding text)
 3. Preserve the exact wording, spacing, and punctuation of the original text
 4. If you're uncertain about a character, keep it as is
 5. Return ONLY the normalized text without any explanations or additional comments
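
The docstring says that text longer than `max_length` is split, and the prompt interpolates a `homoglyph_info` string, but the diff shows neither step. The following is a minimal sketch of how those two pieces might look, not the repository's implementation. The helper names `format_homoglyph_info`, `split_into_segments`, and `build_prompt` are hypothetical, `max_length` is assumed to count characters (the real code may count tokens), and the trailing "Text to normalize" line is an assumption about how the segment is attached to the prompt.

```python
from typing import List, Mapping


def format_homoglyph_info(mapping: Mapping[str, List[str]], limit: int = 5) -> str:
    """Render the first few mapping entries as a short, human-readable hint."""
    parts = []
    for original, replacements in list(mapping.items())[:limit]:
        shown = ", ".join(repr(r) for r in replacements)
        parts.append(f"{original!r} may appear as {shown}")
    return "; ".join(parts)


def split_into_segments(text: str, max_length: int = 512) -> List[str]:
    """Split text into chunks of at most max_length characters.

    Chunks break after a space where possible, and concatenating the
    chunks reproduces the input exactly.
    """
    segments = []
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        if end < len(text):
            # Back up to the last space so words are not cut in half.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space + 1  # keep the space with the left-hand chunk
        segments.append(text[start:end])
        start = end
    return segments


def build_prompt(segment: str, mapping: Mapping[str, List[str]]) -> str:
    """Assemble a prompt for one segment (assumed layout, abbreviated wording)."""
    homoglyph_info = format_homoglyph_info(mapping)
    return (
        f"For example: {homoglyph_info}\n\n"
        "Your task is to read the provided text which may contain homoglyphs "
        "and produce a normalized version with the correct characters.\n\n"
        f"Text to normalize:\n{segment}"
    )
```

Because the chunks concatenate back to the input, the normalized outputs can be joined without re-inserting separators, which fits instruction 3 of the prompt (preserve spacing and punctuation).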