fix(supply_chain): parse pyproject.toml with tomllib instead of requirements regex#95
Conversation
rng1995
left a comment
There was a problem hiding this comment.
Good fix — parsing pyproject.toml semantically with tomllib instead of the line-based requirements parser is clearly the right approach, and it resolves the metadata-key false positives. Approving.
Reading only [project].dependencies, [project.optional-dependencies], and [build-system].requires, with the PEP 621 metadata-key guard and TOMLDecodeError -> [], is correct and robust. tomllib is 3.11+ stdlib so no new dependency. The _PKG_NAME_RE is anchored with non-nested quantifiers (no ReDoS concern), and the tests cover the standard/optional/build-system cases plus malformed/empty input and an end-to-end FP regression.
Minor / optional (non-blocking): only PEP 621 / PEP 517 tables are read, so tool-specific dependency tables — Poetry's [tool.poetry.dependencies], PDM, Hatch, etc. — are silently skipped. Previously those lines produced false positives; now they produce false negatives (real deps not scanned) for those projects. Since this is a supply-chain analyzer where misses matter, consider a follow-up to also parse the common tool tables (or at least note the limitation in docs).
|
Thanks for the thorough review @rng1995! The Poetry/PDM/Hatch gap is a fair point. The current implementation reads only the three PEP 621/517 standard tables, which eliminates all the false positives from the old requirements-regex path but does introduce false negatives for projects using tool-specific tables. The common ones to cover in a follow-up:
I'll open a follow-up issue to track this so it doesn't get lost. The current fix is strictly better than the status quo (no false positives on metadata keys, correct parsing of PEP 621 deps), and the tool-table coverage can land as an incremental improvement. |
|
@Shrotriya-lalit - Please resolve merge conflicts so that I can merge the PR. And yes, please open follow-up PR for open items. Thanks! |
|
Conflicts resolved and pushed. The upstream had independently landed a pyproject.toml fix (PR #28) that added PEP 735
The duplicate Regarding the follow-up for Poetry/PDM/Hatch/uv tables — issue #171 is open to track that work. Ready to take that on as a separate PR once this one lands. 880 tests pass, ruff clean. |
|
I still see a conflict arising due to PRs being merged. Sorry, but can you resolve it for the last time? I will merge this first post resolvement |
Build-time dependencies declared in [build-system].requires (setuptools, hatchling, flit-core, etc.) were silently skipped, creating false negatives for compromised build backends. Extend _extract_packages_from_pyproject to collect specs from [build-system].requires alongside the PEP 621 [project].dependencies, [project.optional-dependencies], and PEP 735 [dependency-groups] tables already covered. Signed-off-by: Lalit Shrotriya <shrotriya.lalit@outlook.com>
f48d41d to
a90826e
Compare
|
Conflicts resolved. Rebased onto current main and re-applied the change cleanly. The previous branch had stale commits from an older main snapshot that would have conflicted with the baseline suppression and batch isolation work that landed in the interim. Rebuilt from current main with only the unique contribution: What changed: What stayed the same: all existing pyproject tests pass, ruff clean, 186 tests in |
rng1995
left a comment
There was a problem hiding this comment.
[Automated SkillSpector Review]
Re-review after rebase — re-confirming approval.
The tomllib-based pyproject parsing I originally approved has since landed on main (via #28), and this branch was rebased onto it. The remaining net diff is a small, clean incremental addition: _extract_packages_from_pyproject now also reads [build-system].requires (defensive isinstance guards), with a matching test_pyproject_build_system_requires_extracted. mergeable_state: clean. No new issues. (The Poetry/PDM/Hatch/uv follow-up from my original note is tracked in #171 and implemented in #190.)
Problem
_analyze_dependenciesroutedpyproject.tomlthrough_extract_packages_from_requirements, a line-by-line regex designed forrequirements.txtformat. TOML keys such asrequires-python,name,description, andauthorsall look like dependency strings to the regex,producing false-positive SC5/SC6 findings on standard PEP 621 metadata
fields.
Example: a
pyproject.tomlwithname = "myproject"triggered an SC6(typosquatting) alert for a non-existent package called
myproject.Root Cause
The requirements parser treats every non-comment line as a potential package
specifier. TOML files have a completely different structure and must be parsed
semantically, not line-by-line.
Fix
Add
_extract_packages_from_pyproject(content)usingtomllib(Python 3.11+stdlib, no new dependencies) that reads only the three PEP 621 / PEP 517
dependency arrays:
[project].dependencies[project.optional-dependencies].<group>[build-system].requiresA
frozensetof PEP 621 metadata keys (name,version,description,authors, …) acts as a secondary guard. Malformed TOML is caught byTOMLDecodeErrorand returns[]so the analyzer never crashes.Tests
[project].dependenciesextracted correctly[project.optional-dependencies]groups extracted[build-system].requiresextracted[]without exceptionname,version,description) never treated as packagespyproject.tomlwith metadata fields produces no SC5/SC6 findingsCloses #2
Checklist
make lintpassesgit commit -s)