Skip to content

fix(supply_chain): parse pyproject.toml with tomllib instead of requirements regex#95

Merged
rng1995 merged 1 commit into
NVIDIA:mainfrom
Shrotriya-lalit:fix/issue-2-pyproject-toml-parsing
Jun 24, 2026
Merged

fix(supply_chain): parse pyproject.toml with tomllib instead of requirements regex#95
rng1995 merged 1 commit into
NVIDIA:mainfrom
Shrotriya-lalit:fix/issue-2-pyproject-toml-parsing

Conversation

@Shrotriya-lalit

Copy link
Copy Markdown
Contributor

Problem

_analyze_dependencies routed pyproject.toml through
_extract_packages_from_requirements, a line-by-line regex designed for
requirements.txt format. TOML keys such as requires-python, name,
description, and authors all look like dependency strings to the regex,
producing false-positive SC5/SC6 findings on standard PEP 621 metadata
fields.

Example: a pyproject.toml with name = "myproject" triggered an SC6
(typosquatting) alert for a non-existent package called myproject.

Root Cause

The requirements parser treats every non-comment line as a potential package
specifier. TOML files have a completely different structure and must be parsed
semantically, not line-by-line.

Fix

Add _extract_packages_from_pyproject(content) using tomllib (Python 3.11+
stdlib, no new dependencies) that reads only the three PEP 621 / PEP 517
dependency arrays:

  • [project].dependencies
  • [project.optional-dependencies].<group>
  • [build-system].requires

A frozenset of PEP 621 metadata keys (name, version, description,
authors, …) acts as a secondary guard. Malformed TOML is caught by
TOMLDecodeError and returns [] so the analyzer never crashes.

def _extract_packages_from_pyproject(content: str) -> list[tuple[str, str | None, int]]:
    try:
        data = tomllib.loads(content)
    except tomllib.TOMLDecodeError:
        return []
    # reads only project.dependencies, project.optional-dependencies, build-system.requires
    ...

Tests

  • Standard [project].dependencies extracted correctly
  • [project.optional-dependencies] groups extracted
  • [build-system].requires extracted
  • Malformed TOML returns [] without exception
  • PEP 621 metadata keys (name, version, description) never treated as packages
  • End-to-end false-positive regression: pyproject.toml with metadata fields produces no SC5/SC6 findings

Closes #2

Checklist

  • make lint passes
  • Tests added covering false-positive regression
  • DCO sign-off on commit (git commit -s)

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — parsing pyproject.toml semantically with tomllib instead of the line-based requirements parser is clearly the right approach, and it resolves the metadata-key false positives. Approving.

Reading only [project].dependencies, [project.optional-dependencies], and [build-system].requires, with the PEP 621 metadata-key guard and TOMLDecodeError -> [], is correct and robust. tomllib is 3.11+ stdlib so no new dependency. The _PKG_NAME_RE is anchored with non-nested quantifiers (no ReDoS concern), and the tests cover the standard/optional/build-system cases plus malformed/empty input and an end-to-end FP regression.

Minor / optional (non-blocking): only PEP 621 / PEP 517 tables are read, so tool-specific dependency tables — Poetry's [tool.poetry.dependencies], PDM, Hatch, etc. — are silently skipped. Previously those lines produced false positives; now they produce false negatives (real deps not scanned) for those projects. Since this is a supply-chain analyzer where misses matter, consider a follow-up to also parse the common tool tables (or at least note the limitation in docs).

@Shrotriya-lalit

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @rng1995!

The Poetry/PDM/Hatch gap is a fair point. The current implementation reads only the three PEP 621/517 standard tables, which eliminates all the false positives from the old requirements-regex path but does introduce false negatives for projects using tool-specific tables.

The common ones to cover in a follow-up:

  • [tool.poetry.dependencies] / [tool.poetry.dev-dependencies] / [tool.poetry.group.*.dependencies]
  • [tool.pdm.dev-dependencies]
  • [tool.hatch.envs.*.dependencies]
  • [tool.uv.dev-dependencies]

I'll open a follow-up issue to track this so it doesn't get lost. The current fix is strictly better than the status quo (no false positives on metadata keys, correct parsing of PEP 621 deps), and the tool-table coverage can land as an incremental improvement.

@rng1995

rng1995 commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

@Shrotriya-lalit - Please resolve merge conflicts so that I can merge the PR.

And yes, please open follow-up PR for open items. Thanks!

@Shrotriya-lalit

Copy link
Copy Markdown
Contributor Author

Conflicts resolved and pushed.

The upstream had independently landed a pyproject.toml fix (PR #28) that added PEP 735 [dependency-groups] support but dropped [build-system].requires. Merge resolution keeps the best of both:

  • [project].dependencies
  • [project.optional-dependencies]
  • [dependency-groups] (PEP 735, from upstream) ✓
  • [build-system].requires (from this PR) ✓
  • Empty-content early-return guard ✓

The duplicate _extract_packages_from_pyproject definition (one from each branch) was collapsed into a single function.

Regarding the follow-up for Poetry/PDM/Hatch/uv tables — issue #171 is open to track that work. Ready to take that on as a separate PR once this one lands.

880 tests pass, ruff clean.

@rng1995

rng1995 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

I still see a conflict arising due to PRs being merged. Sorry, but can you resolve it for the last time? I will merge this first post resolvement

Build-time dependencies declared in [build-system].requires (setuptools,
hatchling, flit-core, etc.) were silently skipped, creating false
negatives for compromised build backends.

Extend _extract_packages_from_pyproject to collect specs from
[build-system].requires alongside the PEP 621 [project].dependencies,
[project.optional-dependencies], and PEP 735 [dependency-groups] tables
already covered.

Signed-off-by: Lalit Shrotriya <shrotriya.lalit@outlook.com>
@Shrotriya-lalit Shrotriya-lalit force-pushed the fix/issue-2-pyproject-toml-parsing branch from f48d41d to a90826e Compare June 24, 2026 06:56
@Shrotriya-lalit

Copy link
Copy Markdown
Contributor Author

Conflicts resolved. Rebased onto current main and re-applied the change cleanly.

The previous branch had stale commits from an older main snapshot that would have conflicted with the baseline suppression and batch isolation work that landed in the interim. Rebuilt from current main with only the unique contribution:

What changed: _extract_packages_from_pyproject now also reads [build-system].requires alongside the PEP 621 and PEP 735 tables already covered. New test: test_pyproject_build_system_requires_extracted.

What stayed the same: all existing pyproject tests pass, ruff clean, 186 tests in test_patterns_new.py pass.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated SkillSpector Review]

Re-review after rebase — re-confirming approval.

The tomllib-based pyproject parsing I originally approved has since landed on main (via #28), and this branch was rebased onto it. The remaining net diff is a small, clean incremental addition: _extract_packages_from_pyproject now also reads [build-system].requires (defensive isinstance guards), with a matching test_pyproject_build_system_requires_extracted. mergeable_state: clean. No new issues. (The Poetry/PDM/Hatch/uv follow-up from my original note is tracked in #171 and implemented in #190.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

False positive: 'requires-python' pyproject field flagged as malicious PyPI package (MAL-2025-41747)

2 participants