Skip to content

Full parallel content guard test#75

Closed
bbezak wants to merge 2 commits into
masterfrom
full-parallel-content-guard-test
Closed

Full parallel content guard test#75
bbezak wants to merge 2 commits into
masterfrom
full-parallel-content-guard-test

Conversation

@bbezak
Copy link
Copy Markdown
Member

@bbezak bbezak commented May 12, 2026

No description provided.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces batch processing capabilities for container-related resources in Pulp, including distributions, remotes, repositories, and sync operations. New Ansible modules were added to handle these operations concurrently using Python's ThreadPoolExecutor, and existing roles were refactored to utilize these batch modules. Feedback identifies critical bugs in the container_syncs module where missing existence checks for repositories and remotes could lead to runtime errors, and the sync API is incorrectly passed a context object instead of an HREF string. Additionally, there is a recommendation to refactor Ansible tasks that build lists within loops using set_fact to avoid O(N^2) performance degradation.

repository_ctx = PulpContainerRepositoryContext(
pulp_ctx, entity={"name": sync_item["repository"]}
)
repository = repository_ctx.entity
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The code does not check if the repository exists before accessing its attributes. If repository_ctx.entity is None (e.g., if the repository name provided in sync_item is not found), subsequent calls like repository.get("remote") at line 170 or repository.get("latest_version_href") at line 180 will raise an AttributeError.

Suggested change
repository = repository_ctx.entity
repository = repository_ctx.entity
if not repository:
result["failed"] = True
result["msg"] = f"Repository '{sync_item['repository']}' not found."
return result

Comment on lines +175 to +178
remote_ctx = PulpContainerRemoteContext(
pulp_ctx, entity={"name": remote_name}
)
payload["remote"] = remote_ctx
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues here: 1) The code does not check if the remote exists before using it. 2) The sync API expects a string (HREF) for the remote field, but the context object itself is being passed. This should be updated to use the pulp_href from the found entity, consistent with how it's handled in container_distributions.py (line 204).

Suggested change
remote_ctx = PulpContainerRemoteContext(
pulp_ctx, entity={"name": remote_name}
)
payload["remote"] = remote_ctx
remote_ctx = PulpContainerRemoteContext(
pulp_ctx, entity={"name": remote_name}
)
if not remote_ctx.entity:
result["failed"] = True
result["msg"] = f"Remote '{remote_name}' not found."
return result
payload["remote"] = remote_ctx.entity["pulp_href"]

Comment on lines +22 to +45
- name: Build container remotes list
set_fact:
container_remotes_list: "{{ container_remotes_list + [{
'name': item.name + '-remote',
'upstream_name': item.get('upstream_name', item.name),
'url': item.get('url'),
'ca_cert': item.get('ca_cert'),
'client_cert': item.get('client_cert'),
'client_key': item.get('client_key'),
'download_concurrency': item.get('download_concurrency'),
'exclude_tags': item.get('exclude_tags'),
'include_tags': item.get('include_tags'),
'policy': item.get('policy'),
'proxy_url': item.get('proxy_url'),
'proxy_username': item.get('proxy_username'),
'proxy_password': item.get('proxy_password'),
'remote_username': item.get('remote_username'),
'remote_password': item.get('remote_password'),
'tls_validation': item.get('tls_validation'),
'state': item.get('state') } | dict2items | rejectattr('value', 'none') | items2dict ] }}"
loop: "{{ (pulp_repository_container_repos | selectattr('state', 'equalto', 'absent') + pulp_repository_container_repos | selectattr('url', 'defined')) | unique }}"
loop_control:
loop_var: item
label: "{{ item.name }}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Building a list using set_fact inside a loop is an anti-pattern in Ansible due to its O(N^2) performance impact as the list grows (each set_fact creates a full copy of the list). It is recommended to use a single set_fact with a Jinja2 template or a complex filter chain to construct the list in one operation.

Comment on lines +64 to +70
- name: Build container syncs list
set_fact:
container_syncs_list: "{{ container_syncs_list + [{'repository': item.name, 'remote': item.name + '-remote'}] }}"
loop: "{{ pulp_repository_container_repos | selectattr('url', 'defined') | selectattr('state', 'equalto', 'present') }}"
loop_control:
loop_var: item
label: "{{ item.name }}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous task, building container_syncs_list using set_fact in a loop is inefficient. This can be refactored into a single set_fact using a Jinja2 expression to improve performance, especially when dealing with a large number of repositories.

Signed-off-by: Bartosz Bezak <bartosz@stackhpc.com>
@bbezak bbezak force-pushed the full-parallel-content-guard-test branch from 05794cb to e350ac8 Compare May 12, 2026 14:54
@bbezak bbezak closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants