Conversation
Co-authored-by: myieye <12587509+myieye@users.noreply.github.com>
Co-authored-by: myieye <12587509+myieye@users.noreply.github.com>
…ve bulk operations Co-authored-by: myieye <12587509+myieye@users.noreply.github.com>
…ormalization Co-authored-by: myieye <12587509+myieye@users.noreply.github.com>
Co-authored-by: myieye <12587509+myieye@users.noreply.github.com>
- Replace manual test fixtures with reflection-based NfcTestData helpers - Add NormalizationAssert with AssertAllNfc/AssertAllNfd for object graph validation - Ensure all normalizable properties are populated with NFC test data - Add coverage tests that verify all IMiniLcmWriteApi methods are categorized - Add test that verifies every normalizing method has a corresponding test - Remove old WriteNormalizationCoverageTests (merged into single file) Key improvements: - Test data creation methods populate ALL normalizable properties - Assertion helpers use reflection to walk entire object graph - Coverage tests fail if new methods are added without being categorized - Tests verify pre-condition (NFC) and post-condition (NFD) explicitly https://claude.ai/code/session_01MYAMRqK5NJmFkQ7a7tVkTd
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
📝 WalkthroughWalkthroughThe PR introduces Unicode NFD string normalization for write operations in MiniLCM by adding a new normalization wrapper, updating constructor signatures across multiple services and hubs to include write-normalization dependencies, refactoring the base hub class from primary constructor pattern, and providing comprehensive test coverage and documentation for the normalization infrastructure. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
- Remove redundant MethodsThatNormalizeText list - calculate dynamically - Remove Wrapper_ImplementsAllWriteMethods - compiler enforces this - Detect JsonPatch overloads via UpdateObjectInput<T> parameter reflection - Count overloads properly to ensure all non-excluded methods have tests - Add ExcludedMethodNames_AllExistOnInterface to catch typos/stale entries - Add NormalizationAssertTests to verify assertion helpers detect issues: - AssertAllNfc throws when given NFD data - AssertAllNfd throws when given NFC data - Empty MultiStrings are detected - Nested NFD data in object graph is detected https://claude.ai/code/session_01MYAMRqK5NJmFkQ7a7tVkTd
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs`:
- Around line 66-70: The CRDT API wrapper order is inconsistent with
MiniLcmApiHubBase: replace the current nested creation that inserts
readNormalization between writeNormalization and validation and instead wrap
crdtApi so writeNormalization directly wraps validation (i.e., use
writeNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi)))
to match MiniLcmApiHubBase; keep fwdataApi as-is (readNormalization +
validation). Update the line that assigns crdtApi which currently calls
writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi))).
In `@backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs`:
- Line 26: The wrapper factories are applied in reverse by WrapWith, causing
validation to run before write normalization; update the factory list passed to
WrapWith when constructing _wrappedApi so the runtime order becomes Client →
Write Normalization → Read Normalization → Validation → Core API by using the
factory sequence [notificationWrapperFactory, validationWrapperFactory,
readNormalizationWrapperFactory, writeNormalizationWrapperFactory] (so after
WrapWith reversal it yields the documented write→read→validation order); ensure
this change in IMiniLcmApi _wrappedApi construction mirrors the ordering used in
MiniLcmApiHubBase.
In `@backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs`:
- Around line 536-557: The test exclusion for JsonPatch overloads leaves
NormalizePatch untested; remove or adjust the IsJsonPatchOverload filter in
GetMethodsThatNeedTests so UpdateObjectInput<> overloads are included, and add a
test (e.g., UpdateWritingSystem) that sends a JsonPatchDocument containing
non-normalized strings through the same path the
MiniLcmWriteApiNormalizationWrapper uses (or invoke the wrapper directly) and
asserts the underlying API receives normalized values; reference
UpdateObjectInput<>, IsJsonPatchOverload, GetMethodsThatNeedTests,
MiniLcmWriteApiNormalizationWrapper, NormalizePatch, and the UpdateWritingSystem
test name when implementing the change.
In `@backend/FwLite/MiniLcm/Normalization/README.md`:
- Line 189: Replace the placeholder issue link "Issue:
https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number]" in the
README by updating the bracketed token to the real GitHub issue number (or the
correct URL) so the reference resolves; locate the exact line containing that
string in Normalization/README.md and substitute [issue-number] with the actual
numeric ID (or remove the line if no issue exists).
🧹 Nitpick comments (4)
backend/FwLite/MiniLcm/Normalization/README.md (2)
24-26: Replace ASCII diagram with Mermaid.As per coding guidelines, use Mermaid diagrams instead of ASCII art for flowcharts and architecture documentation.
📊 Proposed Mermaid diagram
-``` -Client → Write Normalization → Read Normalization → Validation → Core API → Database -``` +```mermaid +flowchart LR + Client --> WN[Write Normalization] + WN --> RN[Read Normalization] + RN --> V[Validation] + V --> CA[Core API] + CA --> DB[Database] +```
186-189: Convert bare URLs to markdown links.Static analysis flagged bare URLs. Consider using proper markdown link syntax for consistency.
📝 Proposed fix
## References -- Unicode Normalization: https://unicode.org/reports/tr15/ -- Issue: https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number] +- [Unicode Normalization (TR15)](https://unicode.org/reports/tr15/) +- [Related Issue](https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number]) - FwLite Architecture: `backend/FwLite/AGENTS.md`backend/FwLite/MiniLcm/Normalization/StringNormalizer.cs (1)
53-65: Inconsistent handling of null RichString values in RichMultiString.When
Normalize(value)returns null (which happens when the input RichString is null per lines 41), the entry is silently dropped from the result. This differs fromNormalize(MultiString)which preserves all keys including those with empty/null values.Consider whether this asymmetry is intentional. If null RichStrings should be preserved, the logic should be updated.
🔄 Proposed fix to preserve null entries
public static RichMultiString Normalize(RichMultiString richMultiString) { var normalized = new RichMultiString(richMultiString.Count); foreach (var (key, value) in richMultiString) { var normalizedRichString = Normalize(value); - if (normalizedRichString is not null) - { - normalized[key] = normalizedRichString; - } + // Preserve all keys, even if the value is null + normalized[key] = normalizedRichString; } return normalized; }backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs (1)
928-1171: Consider handlingIEnumerable<string>collections.Currently only
string[]is validated;List<string>/IReadOnlyList<string>would be skipped. If such fields exist now or later, normalization checks could miss them.♻️ Possible enhancement
- // Check collections of model objects - else if (value is IEnumerable enumerable && value is not string) + // Check collections of strings + else if (value is IEnumerable<string> stringEnumerable) + { + int index = 0; + foreach (var item in stringEnumerable) + { + CheckString(item, $"{propPath}[{index++}]", expectNfc, issues); + } + } + // Check collections of model objects + else if (value is IEnumerable enumerable && value is not string) { int index = 0; foreach (var item in enumerable)
| // Apply normalization to CRDT API (needs NFD normalization) | ||
| crdtApi = writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi))); | ||
|
|
||
| // FwData already normalizes strings internally via LCModel, so only apply read normalization and validation | ||
| fwdataApi = readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(fwdataApi)); |
There was a problem hiding this comment.
Wrapper order inconsistency for CRDT API.
The CRDT API wrapper order here differs from MiniLcmApiHubBase:
Here (line 67):
writeNormalization(readNormalization(validation(crdtApi)))
→ Call flow: Client → WriteNorm → ReadNorm → Validation → API
In MiniLcmApiHubBase (lines 14-20):
writeNormalization(validation(api))
→ Call flow: Client → WriteNorm → Validation → API
The hub applies write normalization before validation, while this sync service applies validation before write normalization. If validators expect normalized data, this inconsistency could cause validation to behave differently between the two code paths.
🔧 Proposed fix to align wrapper order
// Apply normalization to CRDT API (needs NFD normalization)
- crdtApi = writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi)));
+ crdtApi = writeNormalizationWrapperFactory.Create(validationWrapperFactory.Create(readNormalizationWrapperFactory.Create(crdtApi)));
// FwData already normalizes strings internally via LCModel, so only apply read normalization and validation
- fwdataApi = readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(fwdataApi));
+ fwdataApi = validationWrapperFactory.Create(readNormalizationWrapperFactory.Create(fwdataApi));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Apply normalization to CRDT API (needs NFD normalization) | |
| crdtApi = writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi))); | |
| // FwData already normalizes strings internally via LCModel, so only apply read normalization and validation | |
| fwdataApi = readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(fwdataApi)); | |
| // Apply normalization to CRDT API (needs NFD normalization) | |
| crdtApi = writeNormalizationWrapperFactory.Create(validationWrapperFactory.Create(readNormalizationWrapperFactory.Create(crdtApi))); | |
| // FwData already normalizes strings internally via LCModel, so only apply read normalization and validation | |
| fwdataApi = validationWrapperFactory.Create(readNormalizationWrapperFactory.Create(fwdataApi)); |
🤖 Prompt for AI Agents
In `@backend/FwLite/FwLiteProjectSync/CrdtFwdataProjectSyncService.cs` around
lines 66 - 70, The CRDT API wrapper order is inconsistent with
MiniLcmApiHubBase: replace the current nested creation that inserts
readNormalization between writeNormalization and validation and instead wrap
crdtApi so writeNormalization directly wraps validation (i.e., use
writeNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi)))
to match MiniLcmApiHubBase; keep fwdataApi as-is (readNormalization +
validation). Update the line that assigns crdtApi which currently calls
writeNormalizationWrapperFactory.Create(readNormalizationWrapperFactory.Create(validationWrapperFactory.Create(crdtApi))).
| ) : IDisposable | ||
| { | ||
| private readonly IMiniLcmApi _wrappedApi = api.WrapWith([normalizationWrapperFactory, validationWrapperFactory, notificationWrapperFactory], project); | ||
| private readonly IMiniLcmApi _wrappedApi = api.WrapWith([writeNormalizationWrapperFactory, readNormalizationWrapperFactory, validationWrapperFactory, notificationWrapperFactory], project); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find WrapWith implementation
find backend -name "*.cs" -type f | xargs grep -l "WrapWith" | head -5Repository: sillsdev/languageforge-lexbox
Length of output: 4552
🏁 Script executed:
# Search for WrapWith extension method definition
rg -B2 -A10 "public static.*WrapWith" backend/ --type csRepository: sillsdev/languageforge-lexbox
Length of output: 1073
🏁 Script executed:
# Examine MiniLcmApiHubBase.cs to see the comparison
find backend -name "MiniLcmApiHubBase.cs" -type fRepository: sillsdev/languageforge-lexbox
Length of output: 123
🏁 Script executed:
# Check the context around line 26 in MiniLcmJsInvokable.cs
cat -n backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs | head -40Repository: sillsdev/languageforge-lexbox
Length of output: 2127
🏁 Script executed:
cat -n backend/FwLite/FwLiteWeb/Hubs/MiniLcmApiHubBase.cs | head -40Repository: sillsdev/languageforge-lexbox
Length of output: 1648
🏁 Script executed:
# Check for any tests or documentation about wrapper order
rg -B2 -A5 "WrapWith|wrapper.*order" backend/FwLite/MiniLcm.Tests/ --type csRepository: sillsdev/languageforge-lexbox
Length of output: 712
🏁 Script executed:
# Look for wrapper factory implementations to understand their purpose
rg -B1 -A8 "class MiniLcmApiValidationWrapperFactory|class MiniLcmWriteApiNormalizationWrapperFactory|class MiniLcmApiStringNormalizationWrapperFactory" backend/ --type csRepository: sillsdev/languageforge-lexbox
Length of output: 3182
🏁 Script executed:
# Check for comments in the MiniLcmJsInvokable file explaining the wrapper order
cat -n backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs | head -30Repository: sillsdev/languageforge-lexbox
Length of output: 1523
🏁 Script executed:
# Look for any tests that validate the wrapper order or provide insights
rg -B3 -A10 "writeNormalization|readNormalization" backend/FwLite/MiniLcm.Tests/ --type cs | head -50Repository: sillsdev/languageforge-lexbox
Length of output: 55
🏁 Script executed:
# Search for any git blame or comments explaining the wrapper order choice
rg -B5 -A5 "notificationWrapperFactory" backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs --type csRepository: sillsdev/languageforge-lexbox
Length of output: 1048
🏁 Script executed:
# Check if there's a README or documentation explaining wrapper order
find backend -name "README*" -o -name "*wrapper*readme*" -o -name "*normalization*readme*" | xargs cat 2>/dev/nullRepository: sillsdev/languageforge-lexbox
Length of output: 17006
🏁 Script executed:
# Look for issue discussions or comments about wrapper ordering
rg -i "wrapper.*order|order.*wrapper|validation.*before|normalization.*order" backend/FwLite/ --type csRepository: sillsdev/languageforge-lexbox
Length of output: 55
🏁 Script executed:
# Check the notification wrapper to understand what it does
rg -B2 -A15 "class MiniLcmApiNotifyWrapper" backend/ --type cs | head -40Repository: sillsdev/languageforge-lexbox
Length of output: 2562
Fix wrapper application order to match documented architecture.
The WrapWith extension reverses the factory list, resulting in this execution chain:
notification → validation → readNormalization → writeNormalization → api
However, the documented architecture in backend/FwLite/README.md specifies:
Client → Write Normalization → Read Normalization → Validation → Core API
This means validation is currently running before write normalization, causing validation to see un-normalized input. The correct factory order should be:
[notificationWrapperFactory, validationWrapperFactory, readNormalizationWrapperFactory, writeNormalizationWrapperFactory]
This aligns with MiniLcmApiHubBase (lines 15-21), which correctly applies write normalization before validation.
🤖 Prompt for AI Agents
In `@backend/FwLite/FwLiteShared/Services/MiniLcmJsInvokable.cs` at line 26, The
wrapper factories are applied in reverse by WrapWith, causing validation to run
before write normalization; update the factory list passed to WrapWith when
constructing _wrappedApi so the runtime order becomes Client → Write
Normalization → Read Normalization → Validation → Core API by using the factory
sequence [notificationWrapperFactory, validationWrapperFactory,
readNormalizationWrapperFactory, writeNormalizationWrapperFactory] (so after
WrapWith reversal it yields the documented write→read→validation order); ensure
this change in IMiniLcmApi _wrappedApi construction mirrors the ordering used in
MiniLcmApiHubBase.
| /// <summary> | ||
| /// Checks if a method is a JsonPatch overload (takes UpdateObjectInput parameter). | ||
| /// These are not user-facing and don't need tests. | ||
| /// </summary> | ||
| private static bool IsJsonPatchOverload(MethodInfo method) | ||
| { | ||
| return method.GetParameters().Any(p => | ||
| p.ParameterType.IsGenericType && | ||
| p.ParameterType.GetGenericTypeDefinition() == typeof(UpdateObjectInput<>)); | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Gets all write API methods that need normalization tests. | ||
| /// </summary> | ||
| private static List<MethodInfo> GetMethodsThatNeedTests() | ||
| { | ||
| return typeof(IMiniLcmWriteApi) | ||
| .GetMethods(BindingFlags.Public | BindingFlags.Instance) | ||
| .Where(m => !m.IsSpecialName) | ||
| .Where(m => !ExcludedMethodNames.Contains(m.Name)) | ||
| .Where(m => !IsJsonPatchOverload(m)) | ||
| .ToList(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -n "UpdateObjectInput<" --glob '*.cs' -C 2Repository: sillsdev/languageforge-lexbox
Length of output: 50385
🏁 Script executed:
rg -n "JsonPatchDocument" --glob '*.cs' -C 2Repository: sillsdev/languageforge-lexbox
Length of output: 23379
🏁 Script executed:
# Search for hub definitions and route handlers to see if they use JsonPatch
fd -e cs -path '*/Hubs/*' -o -path '*/Controllers/*' -o -path '*/Routes/*' | head -20Repository: sillsdev/languageforge-lexbox
Length of output: 241
🏁 Script executed:
# Check if IMiniLcmWriteApi is used in hubs or routes
rg -n "IMiniLcmWriteApi" --glob '*.cs' -C 2Repository: sillsdev/languageforge-lexbox
Length of output: 18513
🏁 Script executed:
# Check if there are normalization tests in hub or route tests
find . -name '*Hub*Tests*.cs' -o -name '*Route*Tests*.cs' | xargs grep -l "Normaliz" 2>/dev/null || echo "No normalization in hub/route tests"Repository: sillsdev/languageforge-lexbox
Length of output: 108
🏁 Script executed:
# Check what WriteNormalizationTests actually tests
rg -A 5 "public void.*Test.*Normali" backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs | head -60Repository: sillsdev/languageforge-lexbox
Length of output: 55
🏁 Script executed:
# Verify that the wrapper methods with UpdateObjectInput are actually callable/used
rg "UpdateWritingSystem.*UpdateObjectInput|UpdateSense.*UpdateObjectInput|UpdateExampleSentence.*UpdateObjectInput" backend/FwLite/MiniLcm/Normalization/MiniLcmWriteApiNormalizationWrapper.cs -A 3 | head -40Repository: sillsdev/languageforge-lexbox
Length of output: 871
Patch-based normalization is not tested but should be.
UpdateObjectInput<> overloads are reachable from hub endpoints (MiniLcmApiHubBase, CrdtMiniLcmApiHub) and routes, making them user-facing. The MiniLcmWriteApiNormalizationWrapper applies StringNormalizer.NormalizePatch to these methods before forwarding to the underlying API. Since no normalization tests exist in hub/route test files, excluding these methods from WriteNormalizationTests leaves NormalizePatch behavior untested. Add at least one test (e.g., UpdateWritingSystem with a JsonPatchDocument containing non-normalized strings) to verify patch normalization works correctly.
🤖 Prompt for AI Agents
In `@backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs` around lines 536 -
557, The test exclusion for JsonPatch overloads leaves NormalizePatch untested;
remove or adjust the IsJsonPatchOverload filter in GetMethodsThatNeedTests so
UpdateObjectInput<> overloads are included, and add a test (e.g.,
UpdateWritingSystem) that sends a JsonPatchDocument containing non-normalized
strings through the same path the MiniLcmWriteApiNormalizationWrapper uses (or
invoke the wrapper directly) and asserts the underlying API receives normalized
values; reference UpdateObjectInput<>, IsJsonPatchOverload,
GetMethodsThatNeedTests, MiniLcmWriteApiNormalizationWrapper, NormalizePatch,
and the UpdateWritingSystem test name when implementing the change.
| ## References | ||
|
|
||
| - Unicode Normalization: https://unicode.org/reports/tr15/ | ||
| - Issue: https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number] |
There was a problem hiding this comment.
Update placeholder issue number.
The issue reference contains a placeholder [issue-number] that should be replaced with the actual GitHub issue number.
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 189-189: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In `@backend/FwLite/MiniLcm/Normalization/README.md` at line 189, Replace the
placeholder issue link "Issue:
https://github.com/sillsdev/languageforge-lexbox/issues/[issue-number]" in the
README by updating the bracketed token to the real GitHub issue number (or the
correct URL) so the reference resolves; locate the exact line containing that
string in Normalization/README.md and substitute [issue-number] with the actual
numeric ID (or remove the line if no issue exists).
Change AutoInterface to only forward IMiniLcmReadApi methods. This ensures that if a new write method is added to IMiniLcmWriteApi, the compiler will fail until it's manually implemented in the wrapper. Previously: AutoInterface forwarded all methods from IMiniLcmApi Now: AutoInterface only forwards IMiniLcmReadApi, write methods must be manual https://claude.ai/code/session_01MYAMRqK5NJmFkQ7a7tVkTd
This pull request introduces a robust and explicit system for Unicode NFD string normalization throughout the MiniLCM API, ensuring all user-entered text is consistently normalized before storage or processing. The approach distinguishes between read and write normalization wrappers, applies normalization only where needed (e.g., skips FwData APIs that already normalize internally), and provides comprehensive documentation and utility code to enforce and explain the normalization strategy. The changes also add compile-time and test-time coverage guarantees to prevent regressions.
The most important changes are:
Normalization Infrastructure and Enforcement
MiniLcmWriteApiNormalizationWrapperFactoryandStringNormalizerutility (StringNormalizer.cs) to explicitly normalize all user-entered text to NFD form, with deep normalization for complex types and arrays. This ensures consistent Unicode handling across the API. [1] [2]API and Hub Refactoring
MiniLcmApiHubBaseand its subclasses to support both standard and skip-write-normalization construction, allowing FwData APIs to bypass redundant normalization. Updated constructors and dependency injection accordingly. [1] [2] [3]Documentation and Developer Guidance
README.mdin theMiniLcm/Normalizationdirectory, explaining the rationale, architecture, coverage guarantees, and developer workflow for string normalization. This includes instructions for adding new fields or methods and troubleshooting common issues.Dependency and Import Updates
These changes collectively ensure that all user text is normalized in a controlled, testable, and maintainable way, reducing Unicode-related bugs and improving interoperability with FieldWorks and other systems.