docs(proposals): enhance parallel processing proposal #419

Open
vibhor-5 wants to merge 1 commit into kubeedge:main from vibhor-5:fix/issue-8

@vibhor-5 vibhor-5 commented May 8, 2026

Overview
This PR updates the parallel test case processing proposal (#8) and follows on from PR #308. It consolidates research, refines the architectural design, and strengthens backward compatibility guarantees.

Key Changes

  • Lay-0 Architecture: Reworked the system architecture section into a standardized L1/L2/L3 layered structure (Mermaid diagram) for better alignment with Ianvs standards.
  • Paradigm Deep-Dive:
    • Joint Inference: Analyzed tensor/data partitioning and map-reduce potential for future intra-test-case parallelism.
    • Lifelong Learning: Addressed its multi-module/multi-model nature and its suitability for pipeline/model partitioning.
  • Resource Management:
    • Added a Default Worker Setting Research Plan using the PCB-AOI example to establish safe RAM defaults.
    • Outlined future work for Dynamic Worker Scaling based on real-time resource probing.
  • Compatibility & Validation:
    • Detailed Code Revision Considerations showing how the core execution pipeline is extended without breaking existing serial paths.
    • Introduced a Three-Tier Validation Plan (Backward Compatibility, Parallel Opt-in, and Result Equivalence).
  • DCO Compliance: All commits are signed off.

Consolidation
All parallel-related research findings and design justifications are now consolidated into a single proposal file: docs/proposals/chore/parallel-processing/parallel-testcase-processing-proposal.md.

Checklist

Note: This PR focuses strictly on the proposal documentation. Implementation will follow once the refined design is approved.

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vibhor-5
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot
Collaborator

Welcome @vibhor-5! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot kubeedge-bot requested review from MooreZheng and hsj576 May 8, 2026 09:16
@kubeedge-bot kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a proposal for parallel test case execution in the Ianvs benchmarking framework using Python's `ProcessPoolExecutor`. The design emphasizes backward compatibility and configurable parallelism. Review feedback highlights the need to handle single-core systems in the default worker count calculation, maintain consistency between architectural diagrams and workflow descriptions regarding the `Rank` component, and ensure the impact analysis table comprehensively lists all modified files, such as `__init__.py`.


# Risk Assessment & DoD

- **Risk**: Resource Exhaustion (OOM). **Mitigation**: Conservative `cpu_count-1` default and Phase 1.5 research.

medium

The proposal suggests a default worker count of `cpu_count-1`. On a single-core system this evaluates to 0 workers, and `ProcessPoolExecutor` raises a `ValueError` when `max_workers` is 0. Using `max(1, cpu_count() - 1)` ensures at least one worker is always available.

Suggested change
- **Risk**: Resource Exhaustion (OOM). **Mitigation**: Conservative `cpu_count-1` default and Phase 1.5 research.
- **Risk**: Resource Exhaustion (OOM). **Mitigation**: Conservative `max(1, cpu_count() - 1)` default and Phase 1.5 research.
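
To make the single-core case concrete, here is a minimal sketch of the suggested default. The helper names `default_worker_count` and `zero_workers_rejected` are hypothetical and do not come from the proposal; the sketch also assumes the count comes from `os.cpu_count()`, which can return `None` when the number of CPUs cannot be determined.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def default_worker_count(cpu_count=None):
    """Return a conservative worker count that is never below 1.

    `cpu_count` is injectable for testing (an assumption of this sketch).
    By default it comes from os.cpu_count(), which may return None.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Leave one core free, but never drop to zero workers.
    return max(1, cpu_count - 1)


def zero_workers_rejected():
    """Return True if ProcessPoolExecutor refuses max_workers=0."""
    try:
        ProcessPoolExecutor(max_workers=0)
    except ValueError:
        return True
    return False
```

With this helper, a single-core machine gets one worker instead of zero, while an eight-core machine gets seven.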


## End-to-End Workflow

> CLI parses `--parallel` args $\to$ `BenchmarkingJob` sets internal fields $\to$ `TestCaseController` builds test cases $\to$ `run_testcases()` enters enhanced parallel branch $\to$ `ProcessPoolExecutor` manages workers $\to$ `Rank` saves results.

medium

The workflow description mentions `Rank` saving results, but this component does not appear in the architecture diagram (lines 75-114) or the mapping section (lines 116-123). Consider adding it to the diagram or clarifying its role to maintain consistency.
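
For orientation, the quoted workflow could be read roughly as the sketch below. This is an illustration under stated assumptions, not the proposal's code: `run_one`, `rank`, the `parallel`/`workers` parameters, and the tuple-based test cases are hypothetical stand-ins for the worker function, the `Rank` component, and the CLI options. Only the use of `ProcessPoolExecutor` and the serial/parallel split in `run_testcases()` come from the proposal.

```python
from concurrent.futures import ProcessPoolExecutor


def run_one(testcase):
    """Hypothetical top-level worker: runs one test case, returns (name, score)."""
    name, value = testcase
    return name, value * value


def run_testcases(testcases, parallel=False, workers=1):
    """Run test cases serially by default, or in a process pool when opted in."""
    if not parallel:
        # Serial branch: same behavior as before the parallel option existed.
        return [run_one(tc) for tc in testcases]
    # Parallel branch: Executor.map returns results in input order.
    with ProcessPoolExecutor(max_workers=max(1, workers)) as pool:
        return list(pool.map(run_one, testcases))


def rank(results):
    """Hypothetical stand-in for Rank: order results by score, best first."""
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Because `Executor.map` preserves input order, the serial and parallel branches return results in the same order, which is the property a result-equivalence check would rely on. In this sketch the ranking step runs in the parent process after all workers have finished.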

| `benchmarking.py` | CLI argument parsing | Added `-p` and `-w` as optional. Default maintains current behavior. |
| `benchmarkingjob.py` | Configuration parsing | CLI overrides applied *after* YAML; standard priority. |
| `testcasecontroller.py` | `run_testcases` method | Side-by-side implementation. Serial branch is a direct copy of production. |
| `testcase.py` | Worker function | New top-level function. Existing `TestCase.run` remains unchanged. |

medium

Line 135 states that `__init__.py` is modified for the worker function, but the file is missing from this impact analysis table. Consider including it for completeness.

Suggested change
| `testcase.py` | Worker function | New top-level function. Existing `TestCase.run` remains unchanged. |
| `testcase.py` and `__init__.py` | Worker function | New top-level function in `testcase.py`, exported via `__init__.py`. Existing `TestCase.run` remains unchanged. |
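
One plausible reason for making the worker a top-level function is that a process pool pickles the callable it sends to workers, and pickle stores functions by module and name. The sketch below illustrates that constraint. `run_testcase_worker`, `is_picklable`, and `make_nested_worker` are hypothetical names, and the export line in the comment assumes a package layout that the proposal does not spell out.

```python
import pickle

# In __init__.py (assumed layout), the export might look like:
#     from .testcase import run_testcase_worker


def run_testcase_worker(testcase_id):
    """Hypothetical top-level worker; pickle can reference it by module and name."""
    return f"ran {testcase_id}"


def is_picklable(obj):
    """Return True if pickle can serialize obj, as a process pool requires."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_nested_worker():
    """Build a nested function, which pickle cannot look up by name."""
    def nested(testcase_id):
        return f"ran {testcase_id}"
    return nested
```

Here `is_picklable(make_nested_worker())` returns `False`, as it does for a lambda, which is why nested helpers and lambdas cannot be submitted to a `ProcessPoolExecutor`.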

… reviewer feedback

- Updated Lay-0 architecture with L1/L2/L3 layered structure.
- Enhanced AI paradigm analysis with deeper Ianvs integration details.
- Added default worker count research plan and future dynamic settings.
- Strengthened backward compatibility justifications and validation plan.
- Consolidated all parallel-related research into a single proposal file.

Addresses reviewer comments for PR kubeedge#308.
Signed-off-by: Krrish Biswas <krrish175-byte@users.noreply.github.com>
Signed-off-by: vibhor kumar <vibhork1105@gmail.com>