
[container_bench] Podman machine benchmarks #904

Open

Honny1 wants to merge 3 commits into openshift-psap:main from Honny1:podman-machine-benchmarks

Conversation

Honny1 (Collaborator) commented Mar 12, 2026

Summary by CodeRabbit

  • New Features

    • Added two new Podman machine benchmarks: machine_first_start_benchmark and machine_start_benchmark for measuring machine initialization and startup performance.
    • Enhanced comparison visualizations with synthetic benchmark and performance metrics comparison plots.
  • Documentation

    • Added documentation for the new machine benchmarks with parameter descriptions.

openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Mar 12, 2026
openshift-ci Bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ccamacho for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot commented Mar 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Honny1 force-pushed the podman-machine-benchmarks branch from d8293f1 to 1812832 on March 12, 2026 18:04
coderabbitai Bot commented Mar 12, 2026

📝 Walkthrough

This pull request introduces Podman machine lifecycle benchmarking support to the container_bench testing framework. It adds two new benchmark types (machine first-start and subsequent start timing), platform-agnostic Ansible task definitions for Unix/Windows environments, configuration management for machine hardware parameters (CPU, memory, rootful mode), test orchestration logic to route machine benchmarks to a dedicated platform, and visualization components for benchmark comparison reporting.

Changes

Cohort / File(s) | Summary

• Documentation
  docs/toolbox.generated/Container_Bench.machine_first_start_benchmark.rst, docs/toolbox.generated/Container_Bench.machine_start_benchmark.rst, docs/toolbox.generated/index.rst
  Added two new auto-generated RST documentation pages describing the machine first-start and subsequent-start benchmarks, including parameter documentation for exec_props and machine configuration fields.

• Testing Configuration
  projects/container_bench/testing/config.yaml
  Disabled Docker remote host support; added two new machine-focused benchmarks with Podman-only execution and applehv provider configuration; updated platform skipping to exclude Docker; extended the Podman test configuration with machine_config variations (CPU, memory, rootful); introduced a benchmark_machine configuration with multiple provider/hardware combinations; updated matbenchmarking iterable fields to map machine configuration parameters.

• Configuration Management
  projects/container_bench/testing/config_manager.py
  Extended get_benchmark_config() to include an is_machine_benchmark boolean field; added a new get_benchmark_machine_config() method to retrieve machine hardware settings (provider, cpus, memory, rootful).

• Test Execution Logic
  projects/container_bench/testing/test_container_bench.py, projects/container_bench/testing/utils.py
  Added platform separation for test.benchmark_machine* keys into a machine_podman bucket; implemented OS-specific hypervisor filtering; added gating logic to route machine benchmarks only to the machine_podman platform; updated prepare_benchmark_args() to inject machine configuration into exec_props (see the sketch after this list); added Podman machine lifecycle handling (stop before benchmark, restart after); modified the parse_platform() and parse_benchmark() utility functions to support machine benchmark semantics.

• Toolbox Methods
  projects/container_bench/toolbox/container_bench.py
  Added two new public methods, machine_first_start_benchmark() and machine_start_benchmark(), decorated with the corresponding Ansible roles; both accept an exec_props parameter with machine configuration fields.

• Machine Benchmark Ansible Tasks
  projects/container_bench/toolbox/container_bench_machine_first_start_benchmark/defaults/main/config.yml, projects/container_bench/toolbox/container_bench_machine_first_start_benchmark/tasks/main.yml, projects/container_bench/toolbox/container_bench_machine_start_benchmark/defaults/main/config.yml, projects/container_bench/toolbox/container_bench_machine_start_benchmark/tasks/main.yml
  Added Ansible defaults and tasks for the two machine benchmarks; both define execution-property handling, optional provider_env prefix construction, and machine initialization/configuration commands, and conditionally include Unix- or Windows-specific benchmark body tasks.

• Shared Benchmark Tasks
  projects/container_bench/toolbox/shared_tasks/unix_machine_benchmark_body.yml, projects/container_bench/toolbox/shared_tasks/windows_machine_benchmark_body.yml
  Added reusable Ansible task files for machine benchmark body execution on Unix/Linux/Darwin and Windows; both validate required variables (prepare_commands, benchmark_command, cleanup_commands), create the artifacts directory, conditionally execute preparation and cleanup commands with privilege-escalation awareness, and delegate benchmark measurement to run_benchmark.yml.

• Visualization & Reporting
  projects/container_bench/visualizations/benchmark/plotting/__init__.py, projects/container_bench/visualizations/benchmark/plotting/comparison_plots.py, projects/container_bench/visualizations/benchmark/plotting/comparison_report.py, projects/container_bench/visualizations/benchmark/plotting/utils/config.py
  Added a new comparison_plots.py module with SyntheticBenchmarkComparisonPlot and PerformanceComparisonPlot classes for Plotly-based visualizations; integrated the module into the plotting package; extended comparison_report.py to include plot cards alongside tables; enhanced utils/config.py with machine benchmark configuration extraction, setting-key label mapping, and logic to compute shared/differing machine hardware dimensions across configurations.
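
The configuration plumbing above can be pictured with a short sketch. This is illustrative only: the function names (get_benchmark_machine_config, prepare_benchmark_args) and the machine_* exec_props keys follow the walkthrough, but the bodies and the get_config accessor signature are assumptions, not the PR's actual code.

```python
# Illustrative sketch of the machine-config plumbing described above;
# get_config stands in for the project's config accessor (assumed signature).
def get_benchmark_machine_config(benchmark_name, get_config):
    """Retrieve machine hardware settings for a machine benchmark."""
    return {
        "provider": get_config(f"{benchmark_name}.machine.provider"),
        "cpus": get_config(f"{benchmark_name}.machine.cpus"),
        "memory": get_config(f"{benchmark_name}.machine.memory"),
        # Coalesce a missing value to False so Ansible never sees a null boolean.
        "rootful": get_config(f"{benchmark_name}.machine.rootful") or False,
    }


def prepare_benchmark_args(exec_props, machine_config):
    """Inject the machine configuration into exec_props before the toolbox call."""
    exec_props.update({
        "machine_provider": machine_config["provider"],
        "machine_cpus": machine_config["cpus"],
        "machine_memory": machine_config["memory"],
        "machine_rootful": machine_config["rootful"],
    })
    return exec_props
```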

Sequence Diagram

```mermaid
sequenceDiagram
    participant TestOrch as Test Orchestrator
    participant Toolbox as Container_Bench<br/>(Toolbox Method)
    participant AnsibleExec as Ansible Executor
    participant PodmanSys as Podman/System
    participant Artifacts as Artifacts

    TestOrch->>Toolbox: machine_[first_]start_benchmark(exec_props)
    activate Toolbox
    Toolbox->>AnsibleExec: RunAnsibleRole(machine_[first_]start_benchmark)
    deactivate Toolbox

    activate AnsibleExec
    AnsibleExec->>AnsibleExec: Extract exec_props parameters
    AnsibleExec->>AnsibleExec: Build prepare_commands (init, config machine)
    AnsibleExec->>AnsibleExec: Build benchmark_command (run podman machine start)
    AnsibleExec->>AnsibleExec: Build cleanup_commands (stop, remove machine)
    
    AnsibleExec->>PodmanSys: Execute prepare_commands
    activate PodmanSys
    PodmanSys->>PodmanSys: podman machine stop/rm (cleanup)
    PodmanSys->>PodmanSys: podman machine init benchmark-machine
    PodmanSys->>PodmanSys: podman machine set --cpus/--memory/--rootful
    deactivate PodmanSys
    
    AnsibleExec->>Artifacts: Create artifacts directory
    AnsibleExec->>PodmanSys: Execute benchmark_command (measure start time)
    activate PodmanSys
    PodmanSys->>PodmanSys: podman machine start (time measurement)
    deactivate PodmanSys
    
    AnsibleExec->>PodmanSys: Execute cleanup_commands
    activate PodmanSys
    PodmanSys->>PodmanSys: podman machine stop/rm
    deactivate PodmanSys
    
    deactivate AnsibleExec
    
    TestOrch->>TestOrch: Collect benchmark results
```
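
To make the measured step concrete: the benchmark_command boils down to timing `podman machine start` with a wall-clock timer. Below is a minimal, self-contained sketch of that measurement; it stands in for the exec_time.py script referenced in exec_props (whose contents are not shown here) and assumes a machine named benchmark-machine has already been initialized by the prepare step.

```python
import subprocess
import time

MACHINE_NAME = "benchmark-machine"  # name used in the prepare step of the diagram (assumed)


def time_machine_start() -> float:
    """Return the wall-clock duration of `podman machine start`, in seconds."""
    begin = time.monotonic()
    subprocess.run(["podman", "machine", "start", MACHINE_NAME], check=True)
    return time.monotonic() - begin


if __name__ == "__main__":
    print(f"podman machine start took {time_machine_start():.2f}s")
```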

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

ok-to-test

Poem

🐰 A rabbit bounds through code with glee,
Machine benchmarks now flow fast and free!
From Podman's start to metrics bright,
Plots compare configs left and right—
Hop hop, the framework takes its flight! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

• Docstring Coverage ⚠️ Warning: Docstring coverage is 17.86%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

• Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
• Title check ✅ Passed: The title '[container_bench] Podman machine benchmarks' directly reflects the main change: adding two new machine-focused benchmarks (machine_first_start_benchmark and machine_start_benchmark) to the container_bench project.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Honny1 changed the title from Podman machine benchmarks to [container_bench] Podman machine benchmarks on Mar 12, 2026
Honny1 added 3 commits March 12, 2026 19:06
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Honny1 force-pushed the podman-machine-benchmarks branch from 1812832 to 96b09c2 on March 12, 2026 18:06
Honny1 (Collaborator, Author) commented Mar 13, 2026

/test cont_bench-jump-ci
/cluster darwin

@psap-forge-bot

🟢 Test of 'container_bench test test_ci' succeeded after 04 hours 54 minutes 23 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: cont_bench-jump-ci
PR_POSITIONAL_ARG_0: cont_bench-jump-ci

Honny1 marked this pull request as ready for review March 28, 2026 15:56
openshift-ci Bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Mar 28, 2026
coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
projects/container_bench/testing/test_container_bench.py (1)

352-368: ⚠️ Potential issue | 🟡 Minor

Machine stop() is outside the try/finally block; consider error handling.

If machine.stop() (line 355) raises an exception, the finally block won't execute and machine.start() won't be called to restore the original state. This could leave the development machine without a running Podman machine.

Consider wrapping the entire machine lifecycle in the try-finally:

🛡️ Suggested restructuring for safer lifecycle handling
```diff
     exit_code = 1
+    machine = PodmanMachine() if platform.container_engine == "podman" else None
     try:
+        if benchmark.is_machine_benchmark and machine:
+            machine.stop()
-
-    machine = PodmanMachine() if platform.container_engine == "podman" else None
-    if benchmark.is_machine_benchmark and machine:
-        machine.stop()
-
-    exit_code = 1
-    try:
         for _ in range(benchmark.runs):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/container_bench/testing/test_container_bench.py` around lines 352 -
368, The call to machine.stop() occurs before the try/finally so if
machine.stop() raises an exception machine.start() in the finally will never
run; wrap the PodmanMachine lifecycle around the benchmark run by moving the
stop/start inside a try/finally: after creating machine when
benchmark.is_machine_benchmark is true call machine.stop() inside the try block
(or call stop() immediately but ensure a surrounding try/finally that always
calls machine.start() in its finally), keep the loop that calls
run.run_toolbox(...) inside that same try, and ensure machine.start() is invoked
in the finally so PodmanMachine.start() always restores state even if stop() or
run.run_toolbox() throws.
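
Spelled out in full, the restructured flow the comment asks for might read like the sketch below. The names machine, benchmark, and the toolbox call are stand-ins for the file's own objects, and run_machine_benchmark is a hypothetical wrapper introduced only to keep the sketch self-contained; it is not the file's verbatim code.

```python
def run_machine_benchmark(machine, benchmark, run_toolbox) -> int:
    """Stop the machine inside try so the finally always restarts it."""
    exit_code = 1
    try:
        if benchmark.is_machine_benchmark and machine:
            machine.stop()  # a failure here now still reaches the finally
        for _ in range(benchmark.runs):
            run_toolbox()  # one benchmark iteration
        exit_code = 0  # assumed success path; the real file may differ
    finally:
        if benchmark.is_machine_benchmark and machine:
            machine.start()  # always restore the developer's machine state
    return exit_code
```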
🧹 Nitpick comments (3)
projects/container_bench/testing/config_manager.py (1)

188-191: Normalize machine.rootful at config boundary to avoid None propagation.

Line 191 currently forwards raw config and may pass None downstream; defaulting to False here keeps behavior aligned with documented defaults and avoids ambiguous Ansible inputs.

♻️ Suggested patch
```diff
     def get_benchmark_machine_config(benchmark_name):
 @@
         return {
             'provider': config.project.get_config(f"{benchmark_name}.machine.provider", print=False),
             'cpus': config.project.get_config(f"{benchmark_name}.machine.cpus", print=False),
             'memory': config.project.get_config(f"{benchmark_name}.machine.memory", print=False),
-            'rootful': config.project.get_config(f"{benchmark_name}.machine.rootful", print=False),
+            'rootful': config.project.get_config(
+                f"{benchmark_name}.machine.rootful", default_value=False, print=False
+            ) or False,
         }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/container_bench/testing/config_manager.py` around lines 188 - 191,
The machine dict construction forwards the raw value of
config.project.get_config(f"{benchmark_name}.machine.rootful", ...) which can be
None; change the code that builds the machine dict (the block using
benchmark_name and config.project.get_config) to normalize rootful to a boolean
by treating None as False (i.e., coalesce the retrieved value to False before
placing it into the dict) so downstream code/Ansible never receives None.
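
The pitfall in miniature, as a standalone sketch (normalize_rootful is a hypothetical helper, not the repo's code):

```python
def normalize_rootful(raw):
    """Coalesce an unset (None) rootful config value to a real boolean."""
    return bool(raw) if raw is not None else False


# An unset key comes back as None; without normalization, Ansible would
# receive null instead of false.
assert normalize_rootful(None) is False
assert normalize_rootful(True) is True
assert normalize_rootful(False) is False
```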
projects/container_bench/toolbox/shared_tasks/windows_machine_benchmark_body.yml (1)

28-30: Consider adding error resilience to cleanup commands.

The cleanup task may fail if the machine is already stopped or removed. In the main.yml files, cleanup_commands includes 2>/dev/null || true for error suppression in the shell command itself, which handles this at the command level. This is acceptable, but consider adding ignore_errors: true at the task level for additional resilience during teardown.

♻️ Optional: Add ignore_errors for cleanup resilience
```diff
 - name: Execute benchmark cleanup commands
   ansible.windows.win_shell: "{{ cleanup_commands }}"
   when: cleanup_commands | length > 0
+  ignore_errors: true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@projects/container_bench/toolbox/shared_tasks/windows_machine_benchmark_body.yml`
around lines 28 - 30, The cleanup task "Execute benchmark cleanup commands"
using ansible.windows.win_shell with the variable cleanup_commands should be
made resilient to failures by adding ignore_errors: true at the task level;
locate the task that uses ansible.windows.win_shell and cleanup_commands and add
ignore_errors: true so the playbook continues even if cleanup commands fail (the
existing shell-level suppression can remain).
projects/container_bench/toolbox/shared_tasks/unix_machine_benchmark_body.yml (1)

29-32: Consider adding error resilience to cleanup commands.

Same observation as the Windows version - while the shell commands themselves include || true for error suppression, adding ignore_errors: true at the task level provides additional resilience.

♻️ Optional: Add ignore_errors for cleanup resilience
```diff
 - name: Execute benchmark cleanup commands
   become: "{{ rootfull | default(false) }}"
   shell: "{{ cleanup_commands }}"
   when: cleanup_commands | length > 0
+  ignore_errors: true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@projects/container_bench/toolbox/shared_tasks/unix_machine_benchmark_body.yml`
around lines 29 - 32, The "Execute benchmark cleanup commands" Ansible task
currently runs shell: "{{ cleanup_commands }}" with become: "{{ rootfull |
default(false) }}" and only guards on when: cleanup_commands | length > 0; add
task-level error resilience by setting ignore_errors: true on that task so any
nonzero exit status doesn't fail the playbook even if a command slips past the
inline "|| true" in cleanup_commands—locate the task by its name "Execute
benchmark cleanup commands" and add ignore_errors: true alongside the existing
become, shell and when keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/container_bench/testing/config.yaml`:
- Around line 386-389: The CI matrix never includes the
benchmark_machine.rootful axis because the matrix entry
"test.benchmark_machine.rootful" is commented out; uncomment that line and
enable the corresponding value mappings for rootful in the matrix so the test
matrix includes both rootful:true/false cases. Specifically, restore
"test.benchmark_machine.rootful" alongside "test.benchmark_machine.provider",
"test.benchmark_machine.cpus", and "test.benchmark_machine.memory" and re-enable
the value mapping block that defines the rootful options so runs no longer
always use the default rootful:false.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da2f988c-5000-40e4-8d5e-50ed918c0a34

📥 Commits

Reviewing files that changed from the base of the PR and between 1c33d22 and 96b09c2.

📒 Files selected for processing (18)
  • docs/toolbox.generated/Container_Bench.machine_first_start_benchmark.rst
  • docs/toolbox.generated/Container_Bench.machine_start_benchmark.rst
  • docs/toolbox.generated/index.rst
  • projects/container_bench/testing/config.yaml
  • projects/container_bench/testing/config_manager.py
  • projects/container_bench/testing/test_container_bench.py
  • projects/container_bench/testing/utils.py
  • projects/container_bench/toolbox/container_bench.py
  • projects/container_bench/toolbox/container_bench_machine_first_start_benchmark/defaults/main/config.yml
  • projects/container_bench/toolbox/container_bench_machine_first_start_benchmark/tasks/main.yml
  • projects/container_bench/toolbox/container_bench_machine_start_benchmark/defaults/main/config.yml
  • projects/container_bench/toolbox/container_bench_machine_start_benchmark/tasks/main.yml
  • projects/container_bench/toolbox/shared_tasks/unix_machine_benchmark_body.yml
  • projects/container_bench/toolbox/shared_tasks/windows_machine_benchmark_body.yml
  • projects/container_bench/visualizations/benchmark/plotting/__init__.py
  • projects/container_bench/visualizations/benchmark/plotting/comparison_plots.py
  • projects/container_bench/visualizations/benchmark/plotting/comparison_report.py
  • projects/container_bench/visualizations/benchmark/plotting/utils/config.py

Comment thread projects/container_bench/toolbox/container_bench.py
Comment on lines +443 to +445
```text
machine_cpus: number of CPUs to allocate to the benchmark machine (optional)
machine_memory: memory in MB to allocate to the benchmark machine (optional)
machine_rootful: whether the podman machine runs in rootful mode (default: false)
```

⚠️ Potential issue | 🟡 Minor

Document machine_provider in both machine benchmark exec_props docs.

machine_provider is populated by the test harness for machine benchmarks, but it is missing from these method docstrings, which causes generated docs to under-specify the expected exec_props shape.

📝 Suggested docstring update
```diff
         properties of exec_props:
           binary_path:     path to the podman binary
           rootfull:        whether to run Ansible tasks as root
           additional_args: additional arguments to pass to the podman binary
           exec_time_path:  path to the exec_time.py script
+          machine_provider:     podman machine provider (optional)
           machine_cpus:         number of CPUs to allocate to the benchmark machine (optional)
           machine_memory:       memory in MB to allocate to the benchmark machine (optional)
           machine_rootful:      whether the podman machine runs in rootful mode (default: false)
@@
         Args:
           exec_props: dict containing execution properties (binary_path, rootfull, additional_args,
-                      exec_time_path, machine_cpus, machine_memory, machine_rootful)
+                      exec_time_path, machine_provider, machine_cpus, machine_memory, machine_rootful)
@@
         properties of exec_props:
           binary_path:     path to the podman binary
           rootfull:        whether to run Ansible tasks as root
           additional_args: additional arguments to pass to the podman binary
           exec_time_path:  path to the exec_time.py script
+          machine_provider:     podman machine provider (optional)
           machine_cpus:         number of CPUs to allocate to the benchmark machine (optional)
           machine_memory:       memory in MB to allocate to the benchmark machine (optional)
           machine_rootful:      whether the podman machine runs in rootful mode (default: false)
@@
         Args:
           exec_props: dict containing execution properties (binary_path, rootfull, additional_args,
-                      exec_time_path, machine_cpus, machine_memory, machine_rootful)
+                      exec_time_path, machine_provider, machine_cpus, machine_memory, machine_rootful)
```
Also applies to: 452-454, 471-473, 481-483

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/container_bench/toolbox/container_bench.py` around lines 443 - 445,
The docstrings that list exec_props entries (the blocks containing
"machine_cpus", "machine_memory", and "machine_rootful") are missing the
"machine_provider" field; update each of those docstring sections (the ones at
the shown locations and the other occurrences) to include a short line
describing "machine_provider" (e.g., "machine_provider: provider name for the
machine benchmark, populated by the test harness") so generated docs correctly
reflect the expected exec_props shape.

Comment on lines +197 to +204
```python
def _create_exec_time_plot(configurations):
    if not configurations:
        return None

    sorted_configurations = sorted(
        configurations,
        key=lambda c: c.get("execution_time_95th_percentile", 0) or 0
    )
```

⚠️ Potential issue | 🟠 Major

Skip the performance chart when P95 data is absent.

Line 203 coerces missing execution_time_95th_percentile to 0, so synthetic-only comparisons render a misleading all-zero performance plot instead of taking the existing "No performance data available" fallback.

📉 Proposed fix
```diff
 def _create_exec_time_plot(configurations):
     if not configurations:
         return None
+
+    configurations = [
+        config for config in configurations
+        if config.get("execution_time_95th_percentile") is not None
+    ]
+    if not configurations:
+        return None

     sorted_configurations = sorted(
         configurations,
         key=lambda c: c.get("execution_time_95th_percentile", 0) or 0
     )
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```python
def _create_exec_time_plot(configurations):
    if not configurations:
        return None

    configurations = [
        config for config in configurations
        if config.get("execution_time_95th_percentile") is not None
    ]
    if not configurations:
        return None

    sorted_configurations = sorted(
        configurations,
        key=lambda c: c.get("execution_time_95th_percentile", 0) or 0
    )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@projects/container_bench/visualizations/benchmark/plotting/comparison_plots.py`
around lines 197 - 204, The performance plot function _create_exec_time_plot
currently coerces missing execution_time_95th_percentile to 0 which causes
synthetic-only comparisons to produce a misleading zero chart; instead, update
_create_exec_time_plot to first filter configurations to those that have a
non-null execution_time_95th_percentile (e.g., using
c.get("execution_time_95th_percentile") is not None), and if that filtered list
is empty return None so the "No performance data available" fallback runs; then
sort the filtered list by the actual execution_time_95th_percentile value
(remove the "or 0" coercion) and proceed.

Comment on lines +328 to +339
```diff
+        content = [html.H4("Benchmark Results", style=css.STYLE_H4)]
         if synthetic_table:
-            tables.append(html.Div([
-                html.H4("Benchmark Results", style=css.STYLE_H4),
-                synthetic_table
-            ]))
+            content.append(synthetic_table)
+        if synthetic_plot:
+            content.append(html.Div(
+                synthetic_plot,
+                style=css.STYLE_PLOTS_GRID,
+                className='plots-grid-responsive'
+            ))
+
+        if content:
+            tables.append(html.Div(content))
```

⚠️ Potential issue | 🟡 Minor

Condition if content: is always true.

The content list is initialized with html.H4(...) on line 328 before checking if content: on line 338. This means the condition always evaluates to true. The same pattern appears on lines 348/358.

If the intent is to only append a section when there's actual data (table or plot), the check should verify those elements exist.

🔧 Suggested fix
```diff
-        content = [html.H4("Benchmark Results", style=css.STYLE_H4)]
         if synthetic_table:
-            content.append(synthetic_table)
-        if synthetic_plot:
-            content.append(html.Div(
+            content = [html.H4("Benchmark Results", style=css.STYLE_H4)]
+            content.append(synthetic_table)
+            if synthetic_plot:
+                content.append(html.Div(
                 synthetic_plot,
                 style=css.STYLE_PLOTS_GRID,
                 className='plots-grid-responsive'
             ))
-
-        if content:
             tables.append(html.Div(content))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@projects/container_bench/visualizations/benchmark/plotting/comparison_report.py`
around lines 328 - 339, The code builds a content list starting with
html.H4(...) then tests `if content:` which is always true; change the condition
to only append the section when there is actual data by checking
`synthetic_table` or `synthetic_plot` (e.g., use `if synthetic_table or
synthetic_plot:`) before calling `tables.append(html.Div(content))`; apply the
same change to the other identical blocks that initialize a content list with a
header and then later test `if content:` so they also check the presence of the
data variables (like `synthetic_table`, `synthetic_plot`, `real_table`,
`real_plot`, etc.) instead of the `content` list.
