Downstream implementation of parallel pulp container modules in python by Alex-Welsh · Pull Request #73 · stackhpc/ansible-collection-pulp

Alex-Welsh · 2025-12-22T18:33:26Z

Full disclosure: Gen AI was used to create the new modules.
I've spent a while testing the changes against an aio-like environment, and have included new test playbooks.

The code is complex but fairly readable. It's a downstream tool and not at all critical, so if we hit any issues we can easily revert to the old version.

Results for a full container sync/publish:
Old role: 2h 1m 31s
New modules: 10m 13s

The new implementation ran with a concurrency of 10, which could be pushed higher. The test instance was on SMS, using an aufn-like environment to create a seed that hosts pulp, so there is a fair bit of virtualisation overhead.

Depends on #71

gemini-code-assist · 2025-12-22T18:33:52Z

Summary of Changes

Hello @Alex-Welsh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and scalability of managing Pulp container content through Ansible. By introducing new parallelized modules and refactoring existing roles, it enables concurrent operations for repositories, remotes, distributions, and syncs, leading to substantial time savings in deployment and management workflows. The changes aim to provide a more robust and efficient experience for users interacting with Pulp container services.

Highlights

- Performance Improvement: Implemented full parallelization for container sync and publish operations, drastically reducing execution time from over 2 hours to just over 10 minutes for a full container sync/publish, with potential for further optimization by increasing concurrency.
- New Ansible Modules: Introduced new Ansible modules (container_repositories, container_remotes, container_distributions, container_syncs) to manage Pulp container entities in parallel, replacing previous single-entity modules and enhancing efficiency for batch operations.
- Refactored Content Management: The pulp_container_content module has been removed and its functionality refactored into a new process_content.yml task file within the pulp_container_content role, leveraging generic API calls for more flexible and concurrent content unit management.
- Concurrency Support: Added configurable concurrency options to the new container modules and relevant roles, allowing users to specify the maximum number of parallel operations for improved performance.
- Updated Dependencies and Compatibility: The collection's galaxy.yml has been updated to version 0.6.0, including a new author and pinning the pulp.squeezer dependency to 0.2.3. The README.md now reflects compatibility with Ansible 11.
- Enhanced Test Coverage: New test playbooks have been added for the container_distributions, container_remotes, container_repositories, and container_syncs modules to ensure the stability and correctness of the new parallel implementations.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/pull_request.yml

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant and impressive refactoring to parallelize Pulp container operations by introducing new concurrent Ansible modules. The performance gains described are substantial. The new modules for repositories, remotes, distributions, and syncs are well-structured, leveraging concurrent.futures for parallelism and pulp-glue for API interactions. The corresponding roles and tests have been effectively updated to use these new batch modules. My review includes a couple of suggestions for the new task-based implementation of container content management to improve correctness and simplify the code.
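For readers unfamiliar with the pattern the review refers to, the following is a minimal sketch of batch processing with `concurrent.futures`. It is not the PR's code: `process_item` and `run_batch` are hypothetical names, and the worker only returns a placeholder result where a real module would call the Pulp API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_item(item):
    """Hypothetical per-item worker; a real module would call the Pulp API here."""
    return {"name": item["name"], "changed": True}


def run_batch(items, concurrency=10):
    """Process items in parallel and collect one result dict per item."""
    results = []
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        # Map each future back to its input so failures can be attributed.
        futures = {executor.submit(process_item, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # Record the failure and keep going so one bad item
                # does not hide the results of the others.
                results.append(
                    {"name": item["name"], "failed": True, "msg": str(exc)}
                )
    # as_completed yields in completion order, so sort for stable output.
    return sorted(results, key=lambda r: r["name"])


if __name__ == "__main__":
    repos = [{"name": f"repo-{i}"} for i in range(5)]
    for result in run_batch(repos, concurrency=3):
        print(result)
```

Threads suit this kind of workload because each worker spends most of its time waiting on HTTP responses, so `max_workers` acts as the concurrency limit mentioned in the PR description.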

Alex-Welsh · 2026-01-07T11:28:34Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant performance improvement by implementing parallel processing for Pulp container operations. The new Ansible modules for repositories, remotes, distributions, and syncs are well-designed to handle batch operations concurrently, drastically reducing execution time as noted in the description. The code is generally clean and includes new tests for the added functionality. My review includes a few suggestions to improve maintainability, consistency, and security, such as refactoring duplicated code, improving parameter validation, and ensuring all sensitive data is sanitized. Overall, this is a great contribution that will significantly benefit users managing large numbers of container resources in Pulp.

priteau · 2026-02-24T11:57:45Z

I am not keen on having a commit with GPL license and another with Apache 2, could you please squash?

priteau · 2026-02-24T13:36:31Z

My own comparison of the old and new implementation on repo sync/publish (on an already synced Pulp):

pulp-repo-sync.yml: before 4m39.717s, after 1m25.681s
pulp-repo-publish.yml: before 0m31.651s, after 0m27.345s
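As a quick check of what these timings imply, the arithmetic can be sketched as below. The helper `to_seconds` is a hypothetical name introduced for illustration; the durations are copied from the comment above.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '4m39.717s' into seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"([\d.]+)\s*([hms])", duration)
    return sum(float(value) * units[unit] for value, unit in parts)


# (before, after) timings as reported in the comment.
timings = {
    "pulp-repo-sync.yml": ("4m39.717s", "1m25.681s"),
    "pulp-repo-publish.yml": ("0m31.651s", "0m27.345s"),
}

for playbook, (before, after) in timings.items():
    speedup = to_seconds(before) / to_seconds(after)
    print(f"{playbook}: {speedup:.2f}x faster")
```

On these numbers the sync playbook is a little over three times faster, while the publish playbook improves only slightly.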

technowhizz · 2026-04-02T11:53:33Z

@Alex-Welsh Is this ready to be used?

Alex-Welsh · 2026-04-02T12:03:29Z

> @Alex-Welsh Is this ready to be used?

Yeah, I've used it a lot over the last few months, it all seems to work now

grzegorzkoper

Hey, great PR - would like it to get merged asap.
Since it was AI generated, I figured it should be AI reviewed.
My clankers found some minor issues, after that LGTM

grzegorzkoper

Thanks, LGTM

bbezak · 2026-05-12T12:50:56Z

#74

Mark remote/proxy usernames as no_log and strip them from returned container remote data, matching the old pulp.squeezer remote module behavior. Signed-off-by: Bartosz Bezak <bartosz@stackhpc.com>
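The two parts of this change, masking credentials on input and stripping them from returned data, can be illustrated with a small sketch. It is an assumption-laden example rather than the commit's code: `REMOTE_OPTIONS` and `strip_sensitive` are hypothetical names, and the option names are taken from the remote fields quoted elsewhere in this thread. The `no_log` key is the standard Ansible argument spec flag for masking a value.

```python
# Fragment in the style of an Ansible argument spec. Marking an option
# no_log=True asks Ansible to mask its value in logs and task output.
REMOTE_OPTIONS = {
    "remote_username": {"type": "str", "no_log": True},
    "remote_password": {"type": "str", "no_log": True},
    "proxy_username": {"type": "str", "no_log": True},
    "proxy_password": {"type": "str", "no_log": True},
}

# Keys that should never be echoed back in module results.
SENSITIVE_KEYS = frozenset(REMOTE_OPTIONS)


def strip_sensitive(remote):
    """Return a copy of the remote data without credential fields."""
    return {k: v for k, v in remote.items() if k not in SENSITIVE_KEYS}


if __name__ == "__main__":
    remote = {
        "name": "example-remote",
        "url": "https://registry.example.com",
        "remote_username": "alice",
        "remote_password": "secret",
    }
    print(strip_sensitive(remote))
```

Returning a filtered copy leaves the original dictionary untouched, so the credentials remain available for the API call itself.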

bbezak · 2026-05-12T14:35:41Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces several new Ansible modules—container_distributions, container_remotes, container_repositories, and container_syncs—designed to perform batch operations concurrently on a Pulp server. The existing container tasks in the pulp_distribution and pulp_repository roles have been refactored to use these batch modules, and a new prereqs task ensures the required pulp-glue dependencies are installed. Feedback on the PR identifies critical missing existence checks for content guards, repositories, and remotes within the new modules to prevent potential runtime errors. Additionally, there is a recommendation to refactor the list-building logic in the Ansible tasks to avoid the performance overhead associated with looping set_fact.

gemini-code-assist · 2026-05-12T14:37:08Z

+                    content_guard_ctx = PulpContentGuardContext(
+                        pulp_ctx, entity={"name": content_guard_name}
+                    )
+                    desired_attributes["content_guard"] = content_guard_ctx.pulp_href


The module should verify that the content guard exists if a name is provided. Accessing pulp_href on a context with a missing entity may lead to unexpected behavior or errors if the guard cannot be found in Pulp.

Suggested change

-                    content_guard_ctx = PulpContentGuardContext(
-                        pulp_ctx, entity={"name": content_guard_name}
-                    )
-                    desired_attributes["content_guard"] = content_guard_ctx.pulp_href
+                    content_guard_ctx = PulpContentGuardContext(
+                        pulp_ctx, entity={"name": content_guard_name}
+                    )
+                    if not content_guard_ctx.entity:
+                        result["failed"] = True
+                        result["msg"] = f"Content guard '{content_guard_name}' not found."
+                        return result
+                    desired_attributes["content_guard"] = content_guard_ctx.pulp_href

gemini-code-assist · 2026-05-12T14:37:08Z

+            repository_ctx = PulpContainerRepositoryContext(
+                pulp_ctx, entity={"name": sync_item["repository"]}
+            )
+            repository = repository_ctx.entity


The module should verify that the repository exists before attempting to access its attributes. If repository_ctx.entity is None, the subsequent call to repository.get("remote") on line 170 will raise an AttributeError.

Suggested change

-            repository = repository_ctx.entity
+            repository = repository_ctx.entity
+            if not repository:
+                result["failed"] = True
+                result["msg"] = f"Repository '{sync_item['repository']}' not found."
+                return result
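The failure mode described in this comment can be reproduced without Pulp. The sketch below assumes the lookup yields `None` for a missing repository; whether pulp-glue behaves that way or raises its own exception is not shown in this thread. `plan_sync` and the `repositories` mapping are hypothetical stand-ins for the module's per-item function and the Pulp lookup.

```python
def plan_sync(sync_item, repositories):
    """Return a per-item result dict instead of raising on a missing repository.

    `repositories` is a hypothetical name -> entity mapping standing in
    for the Pulp lookup; it is not part of the PR.
    """
    result = {"name": sync_item["repository"], "changed": False, "failed": False}
    repository = repositories.get(sync_item["repository"])
    if not repository:
        # Without this guard, repository.get("remote") below would raise
        # AttributeError because repository is None.
        result["failed"] = True
        result["msg"] = f"Repository '{sync_item['repository']}' not found."
        return result
    result["remote"] = repository.get("remote")
    return result


if __name__ == "__main__":
    known = {"alpine": {"remote": "/pulp/api/v3/remotes/container/container/1/"}}
    print(plan_sync({"repository": "alpine"}, known))
    print(plan_sync({"repository": "missing"}, known))
```

Reporting the failure in the per-item result keeps a single missing repository from aborting the rest of a parallel batch.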

gemini-code-assist · 2026-05-12T14:37:08Z

+                remote_ctx = PulpContainerRemoteContext(
+                    pulp_ctx, entity={"name": remote_name}
+                )
+                payload["remote"] = remote_ctx


The module should verify that the remote exists before using it in the sync payload. Additionally, it is safer to pass the remote's pulp_href (a string) rather than the context object itself to the sync method to ensure compatibility with the underlying API client.

Suggested change

-                remote_ctx = PulpContainerRemoteContext(
-                    pulp_ctx, entity={"name": remote_name}
-                )
-                payload["remote"] = remote_ctx
+                remote_ctx = PulpContainerRemoteContext(
+                    pulp_ctx, entity={"name": remote_name}
+                )
+                if not remote_ctx.entity:
+                    result["failed"] = True
+                    result["msg"] = f"Remote '{remote_name}' not found."
+                    return result
+                payload["remote"] = remote_ctx.pulp_href

gemini-code-assist · 2026-05-12T14:37:09Z

+    container_remotes_list: "{{ container_remotes_list + [{
+      'name': item.name + '-remote',
+      'upstream_name': item.get('upstream_name', item.name),
+      'url': item.get('url'),
+      'ca_cert': item.get('ca_cert'),
+      'client_cert': item.get('client_cert'),
+      'client_key': item.get('client_key'),
+      'download_concurrency': item.get('download_concurrency'),
+      'exclude_tags': item.get('exclude_tags'),
+      'include_tags': item.get('include_tags'),
+      'policy': item.get('policy'),
+      'proxy_url': item.get('proxy_url'),
+      'proxy_username': item.get('proxy_username'),
+      'proxy_password': item.get('proxy_password'),
+      'remote_username': item.get('remote_username'),
+      'remote_password': item.get('remote_password'),
+      'tls_validation': item.get('tls_validation'),
+      'state': item.get('state') } | dict2items | rejectattr('value', 'none') | items2dict ] }}"


Using set_fact within a loop to build a list is an anti-pattern in Ansible because it re-evaluates the entire list on every iteration, leading to $O(N^2)$ performance. Given that this PR aims to improve performance through parallelism, consider refactoring this to use a single set_fact with a complex Jinja2 expression or a custom filter to build the list in one go, which would be significantly more efficient for a large number of repositories.
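One way to read the "custom filter" suggestion is a small Python filter plugin that builds the whole list in a single pass. The sketch below is hypothetical and not part of the PR: `build_container_remotes` is an invented name and only a subset of the optional keys from the snippet above is listed. The `FilterModule` class with a `filters()` method follows the usual Ansible filter plugin convention.

```python
def build_container_remotes(repositories):
    """Build the remotes list in one pass, dropping unset (None) options."""
    # Illustrative subset of the optional keys used in the task above.
    optional_keys = (
        "url", "policy", "include_tags", "exclude_tags",
        "tls_validation", "state",
    )
    remotes = []
    for repo in repositories:
        remote = {
            "name": repo["name"] + "-remote",
            "upstream_name": repo.get("upstream_name", repo["name"]),
        }
        # Copy only the options that were actually set on the repository.
        remote.update(
            {k: repo[k] for k in optional_keys if repo.get(k) is not None}
        )
        remotes.append(remote)
    return remotes


class FilterModule:
    """Expose the function to Jinja2 as a filter."""

    def filters(self):
        return {"build_container_remotes": build_container_remotes}


if __name__ == "__main__":
    repos = [
        {"name": "alpine", "url": "https://registry.example.com"},
        {"name": "busybox", "upstream_name": "library/busybox", "policy": None},
    ]
    print(build_container_remotes(repos))
```

Each repository is visited once, so the cost grows linearly with the number of repositories rather than re-templating the accumulated list on every loop iteration.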

Resolve content guard names to hrefs in the batch container distribution module, matching the old pulp.squeezer behavior. Add a test covering guarded container distributions. Signed-off-by: Bartosz Bezak <bartosz@stackhpc.com>

Hide the set_fact task that builds remotes with passwords and keys. Signed-off-by: Bartosz Bezak <bartosz@stackhpc.com>

gemini-code-assist Bot reviewed Dec 22, 2025

Comment thread roles/pulp_container_content/tasks/process_content.yml

Comment thread roles/pulp_container_content/tasks/process_content.yml Outdated

Alex-Welsh force-pushed the full-parallel branch 16 times, most recently from 7d1881f to a0f69f2 Compare December 23, 2025 10:19

Alex-Welsh mentioned this pull request Jan 2, 2026

Add "fast" mode to container repository updates #69

Closed

Alex-Welsh changed the title ~~Full parallel downstream implementation of pulp container modules~~ Downstream implementation of parallel pulp container modules in python Jan 2, 2026

Alex-Welsh force-pushed the full-parallel branch from a0f69f2 to 5440658 Compare January 7, 2026 11:23

Alex-Welsh marked this pull request as ready for review January 7, 2026 11:23

Alex-Welsh requested a review from a team as a code owner January 7, 2026 11:23

Alex-Welsh force-pushed the full-parallel branch from 5440658 to d2a0b81 Compare January 7, 2026 11:26

gemini-code-assist Bot reviewed Jan 7, 2026

Alex-Welsh force-pushed the full-parallel branch from 9f8f40c to 93589d8 Compare January 7, 2026 14:20

Alex-Welsh force-pushed the full-parallel branch from 93589d8 to e293c07 Compare February 24, 2026 13:18

Alex-Welsh force-pushed the full-parallel branch from e293c07 to 34b7e60 Compare March 16, 2026 11:46

grzegorzkoper requested changes May 8, 2026

owenjones force-pushed the full-parallel branch from 02f17bf to a071ffe Compare May 8, 2026 14:52

Add new parallel container repo modules and tests

4b822ce

owenjones force-pushed the full-parallel branch from a071ffe to 4b822ce Compare May 8, 2026 14:56

owenjones requested a review from grzegorzkoper May 11, 2026 07:20

grzegorzkoper previously approved these changes May 12, 2026

bbezak approved these changes May 12, 2026

bbezak requested changes May 12, 2026

Comment thread plugins/modules/container_remotes.py Outdated

Comment thread plugins/modules/container_remotes.py Outdated

Comment thread plugins/modules/container_remotes.py

Comment thread plugins/modules/container_remotes.py

Preserve remote username no_log handling

f7f3ee2

bbezak dismissed grzegorzkoper’s stale review via b7ac982 May 12, 2026 14:33

gemini-code-assist Bot reviewed May 12, 2026

bbezak force-pushed the full-parallel branch from b7ac982 to a91ca77 Compare May 12, 2026 14:48

bbezak added 2 commits May 12, 2026 16:55

Fix container distribution content guard lookup

b3138a7

Avoid logging container remote secrets

0a15ae1

bbezak force-pushed the full-parallel branch from a91ca77 to 0a15ae1 Compare May 12, 2026 14:55

-            repository = repository_ctx.entity
+            repository = repository_ctx.entity
+            if not repository:
+                result["failed"] = True
+                result["msg"] = f"Repository '{sync_item['repository']}' not found."
+                return result
