fix: issue with resource validation assuming Fargate (#34) by radusuciu · Pull Request #37 · snakemake/snakemake-executor-plugin-aws-batch

radusuciu · 2025-11-15T05:46:31Z

This closes #34, the issue being that validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate specific validation when necessary.

Tested and working with an EC2 queue only.

Summary by CodeRabbit

New Features
- Added automatic platform detection for AWS Batch compute environments (EC2 or FARGATE).
- Implemented platform-specific resource validation with tailored constraints for each compute platform type.
Changes
- Updated default memory allocation to 1024 MiB.

As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate specific validation when necessary.

coderabbitai · 2025-11-15T05:46:36Z

📝 Walkthrough

Walkthrough

The changes introduce platform detection logic to identify whether AWS Batch job queues use EC2 or Fargate compute environments, with corresponding platform-specific resource validation. Two new pass-through methods are added to BatchClient for AWS Batch describe operations.

Changes

Cohort / File(s)	Summary
BatchClient extensions `snakemake_executor_plugin_aws_batch/batch_client.py`	Added two new pass-through methods: `describe_job_queues(kwargs)` and `describe_compute_environments(kwargs)` to delegate AWS Batch operations to underlying boto3 client.
BatchJobBuilder platform support `snakemake_executor_plugin_aws_batch/batch_job_builder.py`	Added platform detection in `__init__` via new `_get_platform_from_queue()` method that queries AWS Batch to determine EC2 vs. Fargate. Refactored `_validate_resources()` to route to platform-specific validators: `_validate_fargate_resources()` (enforces vCPU/memory constraints) and `_validate_ec2_resources()` (basic sanity checks). Updated default memory from 2048 MiB to 1024 MiB. Changed `platformCapabilities` in job definition registration from hardcoded EC2 to dynamic `self.platform`.

Sequence Diagram(s)

sequenceDiagram
    participant BatchJobBuilder
    participant AWS as AWS Batch API
    participant Validator as Resource Validator

    BatchJobBuilder->>+AWS: _get_platform_from_queue()
    AWS-->>-BatchJobBuilder: describe_job_queues response
    BatchJobBuilder->>+AWS: describe_compute_environments
    AWS-->>-BatchJobBuilder: compute environment config
    
    alt computeResources.type is FARGATE/FARGATE_SPOT
        BatchJobBuilder->>BatchJobBuilder: platform = FARGATE
    else
        BatchJobBuilder->>BatchJobBuilder: platform = EC2
    end
    
    Note over BatchJobBuilder: Platform determined and stored
    
    BatchJobBuilder->>+Validator: _validate_resources(vcpu, mem)
    
    alt platform == FARGATE
        Validator->>Validator: _validate_fargate_resources()
        Note over Validator: Check vCPU/memory<br/>combos against<br/>Fargate constraints
    else
        Validator->>Validator: _validate_ec2_resources()
        Note over Validator: Enforce vcpu >= 1<br/>and mem >= 1024 MiB
    end
    
    Validator-->>-BatchJobBuilder: validated vcpu, mem

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Platform detection logic: Review AWS Batch API integration in _get_platform_from_queue(), including error handling, fallback behavior, and logging for missing or invalid data.
Validator routing and logic: Verify that _validate_resources() correctly routes to platform-specific validators and that both _validate_fargate_resources() and _validate_ec2_resources() enforce the correct constraints for their respective platforms.
Integration points: Confirm that the dynamically determined self.platform is correctly propagated to register_job_definition() and that the default memory value change (2048 → 1024 MiB) aligns with product requirements.
Edge cases: Check handling of missing or incomplete AWS Batch responses and whether fallback to EC2 is appropriate for all error scenarios.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly identifies the main fix: addressing overly restrictive resource validation that was assuming Fargate. It directly relates to the primary changeset.
Linked Issues check	✅ Passed	The PR implements all requirements from issue #34: platform detection from job queue, Fargate-specific validation, and flexible EC2 resource handling.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to solving issue #34: delegating AWS Batch operations and implementing platform-aware resource validation.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2525ca0 and 3cc3117.

📒 Files selected for processing (2)

snakemake_executor_plugin_aws_batch/batch_client.py (1 hunks)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

snakemake_executor_plugin_aws_batch/batch_job_builder.py
snakemake_executor_plugin_aws_batch/batch_client.py

🧬 Code graph analysis (1)

snakemake_executor_plugin_aws_batch/batch_job_builder.py (2)

snakemake_executor_plugin_aws_batch/batch_client.py (2)

describe_job_queues (76-83)

describe_compute_environments (85-92)

snakemake_executor_plugin_aws_batch/constant.py (1)

BATCH_JOB_PLATFORM_CAPABILITIES (57-59)

🪛 Ruff (0.14.4)

snakemake_executor_plugin_aws_batch/batch_job_builder.py

102-102: Do not catch blind exception: Exception

(BLE001)

118-118: Avoid specifying long messages outside the exception class

(TRY003)

134-134: Avoid specifying long messages outside the exception class

(TRY003)

136-136: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)

snakemake_executor_plugin_aws_batch/batch_client.py (1)

76-92: LGTM! Clean pass-through wrappers.

Both methods follow the established pattern in the class and enable the downstream platform detection logic.

snakemake_executor_plugin_aws_batch/batch_job_builder.py (7)

33-34: Good approach to detect platform early.

Determining the platform during initialization enables fail-fast behavior and consistent platform-specific validation throughout the job lifecycle.

102-106: Broad exception catch is acceptable here.

While the static analysis tool flags the broad Exception catch, the fallback to EC2 with logging is appropriate for robustness during initialization. EC2 is the safer default as it has more flexible resource constraints than Fargate.

127-137: EC2 validation appropriately minimal.

The basic sanity checks (vcpu >= 1, memory >= 1024 MiB) are appropriate for EC2, which supports flexible resource allocation across various instance types. The 1024 MiB minimum aligns with the default memory value change.

139-150: Clean platform-aware routing.

The validation routing correctly delegates to platform-specific validators based on the detected platform. The integer conversion ensures consistent handling across both validators.

160-160: Verify the default memory reduction is intentional.

The default memory was reduced from 2048 MiB to 1024 MiB. While 1024 MiB is the minimum for EC2, this behavioral change could affect existing users who rely on the previous default.

Please confirm:

Is this default change intentional and documented in release notes?

Are there any implications for existing jobs that don't explicitly specify memory?

205-205: Core fix: dynamic platform capability.

Excellent! This change from a hardcoded EC2 value to the dynamically determined self.platform is the key fix that enables proper platform-specific behavior for both EC2 and Fargate environments.

70-71: ****

The assumption in the review comment is invalid. AWS Batch does not permit mixing EC2 and FARGATE compute environments in the same job queue — all compute environments attached to a single queue must use the same provisioning model (EC2/Spot or FARGATE/FARGATE_SPOT) and architecture. Since the queue's compute environments are guaranteed to have the same platform, checking only the first one is correct and sufficient.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-11-15T18:01:01Z

+    def _validate_fargate_resources(self, vcpu: int, mem: int) -> tuple[str, str]:
+        """Validates vcpu and memory conform to Fargate requirements.
+
+        Fargate requires strict memory/vCPU combinations.
+        https://docs.aws.amazon.com/batch/latest/userguide/fargate.html
+        """
        if mem in VALID_RESOURCES_MAPPING:
            if vcpu in VALID_RESOURCES_MAPPING[mem]:
                return str(vcpu), str(mem)
            else:
-                raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB")
+                raise WorkflowError(f"Invalid vCPU value {vcpu} for memory {mem} MB on Fargate")
        else:
            min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v])
            self.logger.warning(
-                f"Memory value {mem} MB is invalid for vCPU {vcpu}."
+                f"Memory value {mem} MB is invalid for vCPU {vcpu} on Fargate. "
                f"Setting memory to minimum allowed value {min_mem} MB."
            )
            return str(vcpu), str(min_mem)


⚠️ Potential issue | 🟠 Major

Handle edge case: invalid vCPU with no valid memory mapping.

Line 120 could raise ValueError if the vcpu value doesn't exist in any memory mapping. This happens when both vcpu and mem are invalid.

Consider adding a check before line 120:

else: + # Check if vcpu exists in any memory mapping + valid_memories = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v] + if not valid_memories: + raise WorkflowError( + f"Invalid vCPU value {vcpu} for Fargate. " + f"Valid vCPU values are: {sorted(set(v for values in VALID_RESOURCES_MAPPING.values() for v in values))}" + ) - min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v]) + min_mem = min(valid_memories) self.logger.warning(

🧰 Tools

🪛 Ruff (0.14.4)

118-118: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In snakemake_executor_plugin_aws_batch/batch_job_builder.py around lines 108 to 125, when mem is not a key in VALID_RESOURCES_MAPPING the current code computes min_mem = min([m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v]) which will raise a ValueError if the provided vcpu does not appear in any mapping; add a guard: build the list of candidate memory values first, if it's empty raise a WorkflowError with a clear message about invalid vCPU and memory for Fargate, otherwise take min(candidate_memories), log the adjustment as before and return str(vcpu), str(min_mem).

nh13 · 2026-01-25T00:48:45Z

I am hitting this issue. It was a big surprise to use aws batch and memory scaled down when requesting one cpu.

nh13 · 2026-04-29T06:56:06Z

@radusuciu — your fix landed on #40 as commit 404b0d1 with your authorship preserved, alongside several follow-on fixes. Happy to close this on your behalf if you prefer, or leave it for you to close.

nh13

Looking good so far! A few things before merge:

Cache the detection. _get_platform_from_queue() runs in __init__, and BatchJobBuilder is constructed once per job in run_job, so every job makes two Batch describe calls, which is redundant and a throttling risk on large workflows. Memoize per queue.
Add tests. The new validators and platform detection are completely uncovered — tests_mocked_api.py patches BatchJobBuilder.submit out entirely. Worth adding cases (via moto or a mocked batch_client) for: EC2 accepts a combo Fargate rejects, Fargate still enforces the strict matrix, and the fallback-to-EC2 path.
Default memory 2048 → 1024 looks unrelated to the validation fix. Can you explain it in the description or split it out?
The bare except Exception silently defaults to EC2, which could let a Fargate job through that AWS then rejects, Consider narrowing to botocore.exceptions.ClientError (needs importing).
Minor: in _validate_fargate_resources, min([... if vcpu in v]) raises ValueError rather than a WorkflowError when vcpu isn't one of {1, 2, 4, 8, 16}.
Can you fix merge conflicts?

Let me know if you'd like me to push any of the above fixes!

radusuciu · 2026-06-09T19:57:00Z

@nh13 this is good to close or you still want the above review comments addressed?

fix: issue with resource validation assuming Fargate snakemake#34

3cc3117

As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate specific validation when necessary.

radusuciu changed the title ~~fix: issue with resource validation assuming Fargate #34~~ fix: issue with resource validation assuming Fargate (#34) Nov 15, 2025

radusuciu marked this pull request as ready for review November 15, 2025 17:58

coderabbitai Bot reviewed Nov 15, 2025

View reviewed changes

nh13 mentioned this pull request Jan 31, 2026

fix: use threads directive for vCPU and platform-aware resource validation #38

Closed

3 tasks

nh13 requested changes Jun 9, 2026

View reviewed changes

nh13 closed this Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: issue with resource validation assuming Fargate (#34)#37

fix: issue with resource validation assuming Fargate (#34)#37
radusuciu wants to merge 1 commit into
snakemake:mainfrom
radusuciu:fix-overly-restrictive-resource-validation

radusuciu commented Nov 15, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Nov 15, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Nov 15, 2025

Uh oh!

nh13 commented Jan 25, 2026

Uh oh!

nh13 commented Apr 29, 2026

Uh oh!

nh13 left a comment

Uh oh!

radusuciu commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

radusuciu commented Nov 15, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Nov 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Nov 15, 2025

Choose a reason for hiding this comment

Uh oh!

nh13 commented Jan 25, 2026

Uh oh!

nh13 commented Apr 29, 2026

Uh oh!

nh13 left a comment

Choose a reason for hiding this comment

Uh oh!

radusuciu commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

radusuciu commented Nov 15, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Nov 15, 2025 •

edited

Loading