Add count-deces target, standardize GIT_ROOT and improve README #155
Add count-deces target, standardize GIT_ROOT and improve README #155sebfournier95 wants to merge 3 commits intomatchID-project:devfrom
Conversation
WalkthroughRenames Makefile exported variable Changes
Sequence Diagram(s)Not applicable — changes do not modify application control flow. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Makefile (1)
9-9: Variable naming incomplete: GITROOT should be GIT_ROOT per PR objectives.The PR description states the Git variable should be renamed from
GITROOTtoGIT_ROOTfor harmonization, but line 9 still exportsGITROOT, and line 138 still references${GITROOT}. This creates an inconsistency between the stated objective and implementation, and will break any downstream sub-makefiles or external code expecting theGIT_ROOTvariable name mentioned in the PR summary.To complete the rename as described, update:
- export GITROOT = https://github.com/matchid-project + export GIT_ROOT = https://github.com/matchid-projectAnd update line 138:
- @${GIT} clone -q ${GITROOT}/${GIT_BACKEND} + @${GIT} clone -q ${GIT_ROOT}/${GIT_BACKEND}Also applies to: 138-138
🧹 Nitpick comments (1)
README.md (1)
31-33: Fenced code block missing language identifier.The diagram code block starting at line 31 does not specify a language. While this renders in most Markdown viewers, it prevents syntax highlighting and is flagged as a linting issue (MD040).
Apply a language identifier (or use plaintext) to improve clarity:
- ``` + ```plaintext Data.gouv.fr → Stockage S3 → Traitement matchID → Index Elasticsearch → Sauvegarde - ``` + ```
| ## 📋 Table des matières | ||
|
|
||
| - [Vue d'ensemble](#-vue-densemble) | ||
| - [Prérequis](#-prérequis) | ||
| - [Installation](#-installation) | ||
| - [Démarrage rapide](#-démarrage-rapide) | ||
| - [Workflows](#-workflows) | ||
| - [Commandes disponibles](#-commandes-disponibles) | ||
| - [Configuration](#-configuration) | ||
| - [Traitements des données](#-traitements-des-données) | ||
| - [Déploiement distant](#-déploiement-distant) | ||
| - [Dépannage](#-dépannage) | ||
| - [License](#-license) |
There was a problem hiding this comment.
🧩 Analysis chain
Invalid anchor links in table of contents.
The TOC links on lines 11, 17, and 19 reference anchors that don't match the actual section headings due to character encoding or formatting differences. For example:
- Line 11:
[Vue d'ensemble](#-vue-densemble)– the apostrophe character will not match correctly in the anchor - Line 17:
[Déploiement distant](#-déploiement-distant)– accent normalization may fail - Line 19:
[Dépannage](#-dépannage)– similar accent/character handling issue
This breaks TOC navigation in some Markdown renderers.
Verify and correct the anchor links by:
- Testing TOC navigation in your intended rendering environment (GitHub, GitLab, etc.)
- Using proper URL-encoded or normalized anchor IDs that match the heading markdown
Example fix for line 11:
- - [Vue d'ensemble](#-vue-densemble)
+ - [Vue d'ensemble](#-vue-densemble)(Ensure the anchor character is consistent with how the renderer normalizes the heading text.)
🏁 Script executed:
#!/bin/bash
# Locate README.md and extract TOC + corresponding section headings
find . -name "README.md" -type f | head -5Length of output: 11
🏁 Script executed:
#!/bin/bash
# Extract TOC section and section headings to verify anchor links
if [ -f "README.md" ]; then
echo "=== TOC SECTION (lines 9-21) ==="
sed -n '9,21p' README.md
echo ""
echo "=== SECTION HEADINGS (first 50 lines) ==="
head -50 README.md | grep -n "^#"
echo ""
echo "=== SPECIFIC LINES: 31, 346, 348 ==="
echo "Line 31:"
sed -n '31p' README.md
echo ""
echo "Lines 346-348:"
sed -n '346,348p' README.md
fiLength of output: 1078
🏁 Script executed:
#!/bin/bash
# Extract ALL section headings to verify TOC anchor validity
if [ -f "README.md" ]; then
echo "=== ALL SECTION HEADINGS ==="
grep -n "^#" README.md | head -30
echo ""
echo "=== CONTEXT AROUND LINE 31 (lines 26-35) ==="
sed -n '26,35p' README.md
fiLength of output: 1333
🏁 Script executed:
#!/bin/bash
# Test Markdown anchor generation: how GitHub normalizes headings with French accents and apostrophes
# Modern renderers strip accents and replace apostrophes; verify what the actual anchors should be
cat > /tmp/test_anchors.py << 'EOF'
import re
def normalize_anchor(heading_text):
"""
Simulate how GitHub Markdown normalizes headings to anchors.
- Remove emojis
- Lowercase
- Replace spaces and apostrophes with hyphens
- Remove/collapse special characters
"""
# Remove emojis
text = re.sub(r'[\U0001F300-\U0001F9FF]|\s+', ' ', heading_text).strip()
# Lowercase
text = text.lower()
# Replace apostrophes and spaces with hyphens
text = re.sub(r"['\s]+", '-', text)
# Remove any remaining non-alphanumeric except hyphens
text = re.sub(r'[^a-z0-9-]', '', text)
# Collapse multiple hyphens
text = re.sub(r'-+', '-', text)
# Strip leading/trailing hyphens
text = text.strip('-')
return text
headings = [
"Vue d'ensemble",
"Prérequis",
"Installation",
"Démarrage rapide",
"Workflows",
"Commandes disponibles",
"Configuration",
"Traitements des données",
"Déploiement distant",
"Dépannage",
"License"
]
print("Expected anchor normalizations:")
for h in headings:
anchor = normalize_anchor(h)
print(f" {h:30} -> #{anchor}")
EOF
python /tmp/test_anchors.pyLength of output: 596
🏁 Script executed:
#!/bin/bash
# Extract table header and verify column count on lines 346-348
if [ -f "README.md" ]; then
echo "=== TABLE CONTEXT (lines 340-350) ==="
sed -n '340,350p' README.md
fiLength of output: 586
Fix malformed TOC anchor links by removing leading hyphens.
All 10 TOC links (lines 11–20) have erroneous leading hyphens in their anchors (e.g., #-vue-densemble instead of #vue-densemble), causing GitHub and other Markdown renderers to fail navigation. The root cause is the anchor format, not apostrophes or accents.
Correct the anchors by removing the leading hyphen from each link:
- - [Vue d'ensemble](#-vue-densemble)
+ - [Vue d'ensemble](#vue-densemble)
- - [Prérequis](#-prérequis)
+ - [Prérequis](#prérequis)
- - [Installation](#-installation)
+ - [Installation](#installation)
- - [Démarrage rapide](#-démarrage-rapide)
+ - [Démarrage rapide](#démarrage-rapide)
- - [Workflows](#-workflows)
+ - [Workflows](#workflows)
- - [Commandes disponibles](#-commandes-disponibles)
+ - [Commandes disponibles](#commandes-disponibles)
- - [Configuration](#-configuration)
+ - [Configuration](#configuration)
- - [Traitements des données](#-traitements-des-données)
+ - [Traitements des données](#traitements-des-données)
- - [Déploiement distant](#-déploiement-distant)
+ - [Déploiement distant](#déploiement-distant)
- - [Dépannage](#-dépannage)
+ - [Dépannage](#dépannage)
- - [License](#-license)
+ - [License](#license)Minor: Add language specifier to code block at line 32 (currently bare ```).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## 📋 Table des matières | |
| - [Vue d'ensemble](#-vue-densemble) | |
| - [Prérequis](#-prérequis) | |
| - [Installation](#-installation) | |
| - [Démarrage rapide](#-démarrage-rapide) | |
| - [Workflows](#-workflows) | |
| - [Commandes disponibles](#-commandes-disponibles) | |
| - [Configuration](#-configuration) | |
| - [Traitements des données](#-traitements-des-données) | |
| - [Déploiement distant](#-déploiement-distant) | |
| - [Dépannage](#-dépannage) | |
| - [License](#-license) | |
| ## 📋 Table des matières | |
| - [Vue d'ensemble](#vue-densemble) | |
| - [Prérequis](#prérequis) | |
| - [Installation](#installation) | |
| - [Démarrage rapide](#démarrage-rapide) | |
| - [Workflows](#workflows) | |
| - [Commandes disponibles](#commandes-disponibles) | |
| - [Configuration](#configuration) | |
| - [Traitements des données](#traitements-des-données) | |
| - [Déploiement distant](#déploiement-distant) | |
| - [Dépannage](#dépannage) | |
| - [License](#license) |
🧰 Tools
🪛 LanguageTool
[grammar] ~9-~9: Il y a peut-être une erreur ici
Context: ...ecedees/) --- ## 📋 Table des matières - Vue d'ensemble - [Prér...
(QB_NEW_FR)
[grammar] ~21-~21: Il y a peut-être une erreur ici
Context: ...t) - Dépannage - License --- ## 🏗️ Vue d'ensemble Le projet [`deces-da...
(QB_NEW_FR)
🪛 markdownlint-cli2 (0.18.1)
11-11: Link fragments should be valid
(MD051, link-fragments)
17-17: Link fragments should be valid
(MD051, link-fragments)
19-19: Link fragments should be valid
(MD051, link-fragments)
🤖 Prompt for AI Agents
In README.md around lines 9 to 21, the table of contents entries use malformed
anchors with a leading hyphen (e.g., `#-vue-densemble`) so links fail; remove
the leading hyphen from each anchor in lines 11–20 so they read e.g.,
`#vue-densemble`, `#prérequis`, etc., and also update the code block at line 32
to include an appropriate language specifier (for example, use ```bash or ```sh)
to enable syntax highlighting.
|
|
||
| | Variable | Valeur | Description | | ||
| |----------|--------|-------------| | ||
| | [`FILES_TO_SYNC`](Makefile:39) | `fichier-opposition-deces-.*.csv(.gz)?|deces-.*.txt(.gz)?` | Fichiers à synchroniser | |
There was a problem hiding this comment.
Table column count mismatch.
Lines 346 and 348 have tables with more cells than expected header columns:
- Line 346:
FILES_TO_SYNCrow appears to have 4 values but table expects 3 columns - Line 348:
FILES_TO_PROCESSrow similarly has extra data
This causes rendering issues and data loss in strict Markdown parsers (MD056).
Verify and fix the table structure by ensuring consistent column alignment. For example, if the table should have 3 columns (Variable | Valeur | Description), split or reorganize multi-part values:
| Variable | Valeur | Description |
|----------|--------|-------------|
- | [`FILES_TO_SYNC`](Makefile:39) | `fichier-opposition-deces-.*.csv(.gz)?\|deces-.*.txt(.gz)?` | Fichiers à synchroniser |
+ | [`FILES_TO_SYNC`](Makefile:39) | `fichier-opposition-deces-.*.csv(.gz)?` / `deces-.*.txt(.gz)?` | Fichiers à synchroniser |Also applies to: 348-348
🧰 Tools
🪛 LanguageTool
[typographical] ~346-~346: Caractère d’apostrophe incorrect.
Context: ...| |----------|--------|-------------| | FILES_TO_SYNC | `fichier-...
(APOS_INCORRECT)
[grammar] ~346-~346: Il y a peut-être une erreur ici
Context: ...---| | FILES_TO_SYNC | fichier-opposition-deces-.*.csv(.gz)?|deces-.*.txt(.gz)? | Fichiers à synchroniser | | [`FILES...
(QB_NEW_FR)
🪛 markdownlint-cli2 (0.18.1)
346-346: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
🤖 Prompt for AI Agents
In README.md around lines 346 and 348, the table rows for FILES_TO_SYNC and
FILES_TO_PROCESS contain an extra cell causing a 4-column row in a 3-column
table; edit each row so it has exactly three pipe-separated cells (Variable |
Valeur | Description) by moving any multi-part patterns into a single cell
(e.g., combine the regex patterns into one Valeur cell wrapped as inline code or
escape internal pipes), or split the description into the Description cell,
ensuring no stray pipes create extra columns; after editing, verify both rows
match the header column count and render correctly.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Makefile(2 hunks)
🔇 Additional comments (2)
Makefile (2)
141-141: Clarify the purpose of appending a blank line to artifacts.Line 141 appends an empty line to
${GIT_BACKEND}/artifacts. The motivation and functional purpose of this side effect is unclear from the code context alone. Please confirm this is intentional and explain why the blank line is necessary for the provisioning flow.
85-98: Discrepancy between PR objectives and provided code.The PR objectives mention a new
count-decestarget at lines 85–98 that iterates overdeces-*.txt.gzfiles and useszcatfor decompression. However, the provided code snippet shows the${DATA_TAG}target and other unrelated targets at those line ranges. Please verify whether thecount-decestarget was actually added or if the line range references in the PR objectives are incorrect.
🔧 Makefile and Documentation Improvements
Changes
1. 🔄 Git Variable Standardization (line 9)
Change:
GITROOT→GIT_ROOTMotivation:
Impact:
${GIT_BACKEND}target for repository cloning2. 📚 README Documentation Enhancement
Changes:
Benefits:
Type of Change
Tests
GIT_ROOT