Feat te itn stretch goals#4
Conversation
Signed-off-by: jaya-TN <seruganti@nvidia.com>
…-999), and native Telugu numerals Signed-off-by: jaya-TN <seruganti@nvidia.com>
Signed-off-by: jaya-TN <seruganti@nvidia.com>
mayuris-00
left a comment
There was a problem hiding this comment.
Core task is signed off - all 28 core cases pass and are correct.
Stretch goals (optional): all three attempted, which is commendable. Main follow-ups: complete and linguistically verify the hundreds (ideally via composition), and add matching tests.
Cleanup: remove the unnecessary inline # comments in the tagger and verbalizer.
There was a problem hiding this comment.
Correctly left empty as a package marker, per Section 5. No action required.
There was a problem hiding this comment.
Correctly left empty as a package marker, per Section 5. No action required.
There was a problem hiding this comment.
Status: Approved.
0 → సున్న matches the spec; native 0 → ౦ correctly added for the stretch goal.
There was a problem hiding this comment.
Status: Approved. All eighteen rows (10–20 and round tens) match the spec spellings exactly. No action required.
There was a problem hiding this comment.
Rows 1–9 match the spec spellings exactly.
Native Telugu numerals (౧–౯) are correctly added as additional rows mapping to the same digits — this satisfies stretch goal 3. No action required.
There was a problem hiding this comment.
Status: Approved; minor cleanup.
-TODO 3 is implemented correctly: pynini.closure(NEMO_NOT_QUOTE, 1) correctly captures the digit value between the quotes, and the token deletion logic is correct.
-Remove the inline comment. # Keep digits between quotes (1+ non-quote chars) just restates the line below it. Please delete it.
There was a problem hiding this comment.
Coverage is complete - all 72 compound values (21–29, 31–39, …, 91–99) are present, space-joined as the brief describes. Correct output.
There was a problem hiding this comment.
Status: Incomplete - primary item to address.
-Coverage gaps: the file enumerates 100–120, then mostly only X00–X10 per hundred (with a fuller 900 block and 999). Values such as 156, 372, 467, 543, 678 are absent and will be silently rejected.
-Linguistic accuracy: the combining form is questionable. Standard Telugu uses నూట before a remainder (101 = నూట ఒకటి, 110 = నూట పది) and the plural వందల(ు) (200 = రెండు వందలు, 201 = రెండు వందల ఒకటి), not వంద ఒకటి.
Recommendation: replace this file with a composed hundreds rule, which resolves both the coverage gaps and the surface-form consistency in one place.
There was a problem hiding this comment.
Matches the spec's checker script and the required root location. No action required.
There was a problem hiding this comment.
Status: Core approved; stretch coverage insufficient.
The 28 core cases match Section 8 exactly and use the correct input~expected format.
Stretch testing is thin: ~6 stretch cases were added for 250+ added entries. Section 9 requires a matching test line for every stretch number added; the missing hundreds (e.g. 156, 372, 543) were never tested, which is why the coverage gaps went unnoticed. Please add representative tests across the hundreds range.
* Add hi_en Code Switched (NVIDIA#415) * Add hi_en Code Switched Signed-off-by: RajanPutty <rputty@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address PR NVIDIA#415 review: restore ko, dedupe whitelists, expand hi_en tests, add hi_en CI Signed-off-by: Rajan Putty <rputty@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: RajanPutty <rputty@nvidia.com> Signed-off-by: Rajan Putty <rputty@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update jenkins Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * Add hi_en Code Switched (NVIDIA#415) * Add hi_en Code Switched Signed-off-by: RajanPutty <rputty@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address PR NVIDIA#415 review: restore ko, dedupe whitelists, expand hi_en tests, add hi_en CI Signed-off-by: Rajan Putty <rputty@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: RajanPutty <rputty@nvidia.com> Signed-off-by: Rajan Putty <rputty@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update jenkins Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * fix jenkins bug Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * separate cache dirs Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * refresh cache dirs Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * refresh cache dir ko Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> --------- Signed-off-by: RajanPutty <rputty@nvidia.com> Signed-off-by: Rajan Putty <rputty@nvidia.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Co-authored-by: RajanPutty <rputty@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* ar money bugfix and sh tests Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * address review Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* change normalization of alphanum terms Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Before your PR is "Ready for review"
Pre checks:
git commit -sto sign.pytestor (if your machine does not have GPU)pytest --cpufrom the root folder (given you marked your test cases accordingly@pytest.mark.run_only_on('CPU')).bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...pytestand Sparrowhawk here.__init__.pyfor every folder and subfolder, includingdatafolder which has .TSV files?Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.to all newly added Python files?Copyright 2015 and onwards Google, Inc.. See an example here.try import: ... except: ...) if not already done.PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.