Conversation
There was a problem hiding this comment.
Code Review
This pull request implements full recipe loading, serializing the complete plan state into a recipe file to enable reproducible runs. A key change involves dynamically deriving the discover-phase attribute from test paths. However, a critical security vulnerability has been identified: the implementation of test path recreation in the discover step is susceptible to Path Traversal. The path attribute from the recipe is used to construct filesystem paths for directory creation without adequate sanitization, potentially allowing an attacker to create directories outside the intended workdir using .. sequences. This requires remediation by validating that the resulting paths do not escape the intended base directory. Additionally, there are a couple of suggestions to improve code robustness and clarity in tmt/recipe.py.
| for test in self._tests: | ||
| if test.path is None: | ||
| continue | ||
| path = self.test_dir / test.path.unrooted().relative_to(Path(self.safe_name) / 'tests') | ||
| if not path.exists(): | ||
| path.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
The discover_from_recipe method recreates test paths using the path attribute from the recipe without proper sanitization. It uses pathlib.Path.relative_to() which, when used lexically, allows directory traversal sequences (e.g., ..) to remain in the resulting path if they follow the specified prefix. An attacker can provide a malicious recipe file with path attributes like /{phase_name}/tests/../../../../etc/evil to create directories in arbitrary locations on the filesystem where the user running tmt has write permissions. If tmt is run with elevated privileges (e.g., as root to provision containers), this could allow creating directories in sensitive system locations.
tmt/recipe.py
Outdated
| def from_serialized(cls, serialized: dict[str, Any], logger: Logger) -> '_RecipeTest': # type: ignore[override] | ||
| raw_checks = serialized.pop('check', []) | ||
| serialized['check'] = [] | ||
| recipe_test = super().from_serialized(serialized) | ||
| recipe_test.check = [Check.from_spec(check, logger) for check in raw_checks] | ||
| return recipe_test |
There was a problem hiding this comment.
This method modifies the serialized dictionary in-place, which can be an unexpected side effect. To improve encapsulation and prevent potential issues, create a copy of the dictionary to work with.
| def from_serialized(cls, serialized: dict[str, Any], logger: Logger) -> '_RecipeTest': # type: ignore[override] | |
| raw_checks = serialized.pop('check', []) | |
| serialized['check'] = [] | |
| recipe_test = super().from_serialized(serialized) | |
| recipe_test.check = [Check.from_spec(check, logger) for check in raw_checks] | |
| return recipe_test | |
| def from_serialized(cls, serialized: dict[str, Any], logger: Logger) -> '_RecipeTest': # type: ignore[override] | |
| serialized = serialized.copy() | |
| raw_checks = serialized.pop('check', []) | |
| recipe_test = super().from_serialized(serialized) | |
| recipe_test.check = [Check.from_spec(check, logger) for check in raw_checks] | |
| return recipe_test |
tmt/recipe.py
Outdated
| for e in [ | ||
| 'environment_from_fmf', | ||
| 'environment_from_importing', | ||
| 'environment_from_cli', | ||
| 'environment_from_intrinsics', | ||
| ]: | ||
| spec.pop(e) |
There was a problem hiding this comment.
Using a hardcoded list of keys to pop is brittle. If a new environment_from_* key is added in the future, it could be missed here. A more robust approach is to dynamically identify and remove these keys based on their prefix.
for key in list(spec.keys()):
if key.startswith('environment_from_'):
spec.pop(key)| return [ | ||
| TestOrigin( | ||
| phase=test.discover_phase, | ||
| phase=test.path.unrooted().parts[0], |
There was a problem hiding this comment.
Uph, this is fragile. Is it really needed?
3c63f94 to
18b7353
Compare
thrix
left a comment
There was a problem hiding this comment.
The full recipe loading direction is good. Three findings, one requiring a change:
-
Run.environmentsilently ignores CLI--environmentwhen recipe is loaded - The recipe env unconditionally overrides CLI options. New--environmentoverrides on replay are silently lost. This should merge recipe env with CLI env, letting CLI take precedence. -
linkfield type mismatch after unserialization removal - Raw data stored whereLinksobject is expected. Not an active runtime bug but incorrect typing that could break on re-serialization paths. -
Unrelated schema change -
display-guestinreport/display.yamlshould be split out.
| options. | ||
| """ | ||
|
|
||
| if self.recipe is not None: |
There was a problem hiding this comment.
This early return completely bypasses _environment_from_cli, meaning CLI options like --environment FOO=bar are silently ignored when running from a recipe.
While the recipe faithfully captures the original run's environment (via run.environment in save()), any new CLI overrides on replay are lost. A user running:
tmt run --recipe recipe.yaml --environment FOO=bar
would expect FOO=bar to be set, but it won't be.
Consider merging instead, letting CLI overrides take precedence:
if self.recipe is not None:
return Environment({
**self.recipe.run.environment,
**self._environment_from_cli,
})| link: Optional['tmt.base.core.Links'] = field( | ||
| serialize=lambda value: value.to_spec() if value else [], | ||
| unserialize=lambda value: _unserialize_links(value), | ||
| ) |
There was a problem hiding this comment.
The unserialize callback was removed but the type annotation still says Optional['tmt.base.core.Links']. When loading from serialized YAML via from_serialized(), the raw value (list of dicts) is stored instead of a Links object.
This doesn't cause runtime errors in the current save path (which creates from Plan objects where link IS a Links object), but the type mismatch could break if the loaded recipe is ever re-serialized, since the serialize callback calls value.to_spec() which would fail on raw data.
Same issue exists for _RecipePlan.link where from_serialized does link=serialized.get('link') (raw data).
| name: | ||
| type: string | ||
|
|
||
| display-guest: |
There was a problem hiding this comment.
This display-guest addition appears unrelated to recipe loading. Should be in a separate commit or PR.
|
Besides other comments, the code appears to assume the recipe file is well-formed, do you consider add some validation to the recipe file provided, say, a schema file? |
|
@therazix please, set the "Size" of this PR. |
thrix
left a comment
There was a problem hiding this comment.
Review: Add support for full recipe loading
Good progress on extending recipe loading from discover-only to all plan steps. The environment simplification and removal of discover_phase are clean.
Issues
Blocking:
- CLI environment silently ignored (
tmt/base/core.py): The early return bypasses_environment_from_cli, so--environment FOO=baris silently dropped when using--recipe. See inline comment for suggested fix.
Latent bug:
_RecipeTest.linkunserialization removed (tmt/recipe.py): Works today because all test links are[](falsy short-circuit), but wouldAttributeErroron any recipe with non-empty test links. See inline comment.
Hygiene:
- Unrelated schema change (
tmt/schemas/report/display.yaml):display-guestaddition is not related to recipe loading — should be a separate commit/PR. - PR checklist: All items are unchecked — docs, spec, schema, version, release note still needed.
Verified non-issues
- Path traversal in
discover_from_recipe: Therelative_to()+resolve()+ parent check is sufficient. The gemini-code-assist security warning is overblown. - Removed "non-existent plan" test: Correct — with
tree.children.clear(), the tree IS the recipe, so the old error case no longer applies. - Only saving
_environment_from_fmfin_RecipePlan: Reasonable —_RecipeRun.environmentcaptures the full merged env, and intrinsics should be regenerated per run.
Generated-by: Claude Code
| if self.recipe is not None: | ||
| return self.recipe.run.environment.copy() | ||
|
|
There was a problem hiding this comment.
This early return completely bypasses _environment_from_cli, meaning CLI options like --environment FOO=bar are silently ignored when running from a recipe.
While the recipe faithfully captures the original run's environment (via run.environment in save()), any new CLI overrides on replay are lost. A user running:
tmt run --recipe recipe.yaml --environment FOO=bar
would expect FOO=bar to be set, but it won't be.
Consider merging instead, letting CLI overrides take precedence:
if self.recipe is not None:
return Environment({
**self.recipe.run.environment,
**self._environment_from_cli,
})| link: Optional['tmt.base.core.Links'] = field( | ||
| serialize=lambda value: value.to_spec() if value else [], | ||
| unserialize=lambda value: _unserialize_links(value), | ||
| ) |
There was a problem hiding this comment.
Latent bug: the unserialize callback was removed, so when loading from YAML via from_serialized(), the raw value (list of dicts) is stored instead of a Links object.
This is fine when link is [] (falsy — the serialize callback short-circuits to []), which is the case for all current test data. However, if a recipe test ever has a non-empty link (e.g., [{"relates": "..."}]), the serialize callback lambda value: value.to_spec() if value else [] would call .to_spec() on a raw list, causing an AttributeError.
Either restore the unserialize callback or handle raw data in the serialize callback.
| display-guest: | ||
| type: boolean | ||
|
|
There was a problem hiding this comment.
This display-guest addition appears unrelated to recipe loading. Should be in a separate commit or PR to keep changes atomic.
Pull Request Checklist