Fix/llm simple qa pipeline by LakshyaMulchandani · Pull Request #429 · kubeedge/ianvs

LakshyaMulchandani · 2026-05-17T12:55:50Z

Description

This PR restores the functionality of the llm_simple_qa benchmarking pipeline. It addresses upstream API structural drift between KubeEdge/Ianvs and KubeEdge/Sedna, patches interface gaps in the local base model implementation, and refactors absolute directory paths into portable relative paths.

Type of Change

Bug fix (non-breaking change which fixes an issue)
Code cleanup

Problem Analysis & Solution Breakdown

1. Upstream Dependency & Formatting Realignment

-> Problem: The upstream Sedna library expects JSONDataParse to parse legacy computer vision (COCO) formats, triggering a NameError: name 'COCO' is not defined when processing text lines. Additionally, case-sensitivity changes in the library caused JsonlDataParse imports to break.
-> Solution: Designed and injected a dedicated, localized JsonlDataParse class within core/testenvmanager/dataset/dataset.py that reads text queries line-by-line natively via standard JSON, isolating the text evaluation pipeline from legacy computer vision footprints.

2. Paradigm Interface Completion

-> Problem: The singletask_learning paradigm executes a mandatory data prep loop that threw an AttributeError: 'BaseModel' object has no attribute 'preprocess'.
-> Solution: Added a compliant preprocess method to the BaseModel class within examples/llm_simple_qa/testalgorithms/gen/basemodel.py to satisfy the interface lifecycle verification.

3. Path Invariance & Configuration Portability

-> Problem: Configuration files contained hardcoded absolute file paths pointing to external development directories, rendering the example non-functional on new client setups or CI/CD runner nodes.
-> Solution: Normalized all target configurations (benchmarkingjob.yaml, testenv.yaml, gen_algorithm.yaml) to clean, location-invariant relative URLs targeting the root execution environment.

How Has This Been Tested?

Tested locally inside a sandboxed WSL2 Ubuntu virtual environment in Windows on CPU execution mode:

Navigated to the repository root.
Verified localized file structures and initialized dependencies.
Executed the validation framework entry point in the terminal:
python3 benchmarking.py -f examples/llm_simple_qa/benchmarkingjob.yaml

kubeedge-bot · 2026-05-17T12:56:02Z

Welcome @LakshyaMulchandani! It looks like this is your first PR to kubeedge/ianvs 🎉

kubeedge-bot · 2026-05-17T12:56:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LakshyaMulchandani
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request introduces a custom JsonlDataParse class, updates dataset loading logic, and transitions various configuration paths from absolute to relative. Key feedback includes addressing a performance regression caused by hardcoding the device to CPU, improving the robustness and feature parity of the new JSONL parser, and investigating a potential regression when replacing JSONMetaDataParse with JSONDataParse.

gemini-code-assist · 2026-05-17T12:57:45Z

+    def parse(self, filepath):
+        self.x = []
+        self.y = []
+        with open(filepath, "r", encoding="utf-8") as f:
+            for line in f:
+                if line.strip():
+                    raw_data = json.loads(line.strip())
+                    if isinstance(raw_data, dict):
+                        question = raw_data.get("question", raw_data.get("input", ""))
+                        answer = raw_data.get("answer", raw_data.get("output", ""))
+                        self.x.append(str(question))
+                        self.y.append(str(answer))
+                    else:
+                        self.x.append(str(raw_data))
+                        self.y.append("")


The parse method in the new JsonlDataParse class has a few areas for improvement:

Robustness: json.loads() should be wrapped in a try-except block to handle malformed lines gracefully, preventing the entire pipeline from crashing on a single bad line.

Feature Parity: It currently ignores self.func (the feature_process function), which is standard in other Sedna parsers to allow for data preprocessing.

Genericity: The keys question, input, answer, and output are hardcoded. While this works for the current QA pipeline, it makes this core parser less reusable for other text-based tasks. Consider making these keys configurable in the future.

def parse(self, filepath): self.x = [] self.y = [] with open(filepath, "r", encoding="utf-8") as f: for line in f: clean_line = line.strip() if not clean_line: continue try: raw_data = json.loads(clean_line) except json.JSONDecodeError: continue if isinstance(raw_data, dict): question = raw_data.get("question", raw_data.get("input", "")) answer = raw_data.get("answer", raw_data.get("output", "")) self.x.append(str(question)) self.y.append(str(answer)) else: self.x.append(str(raw_data)) self.y.append("") if self.func: self.x = self.func(self.x)

gemini-code-assist · 2026-05-17T12:57:45Z


        if data_format == DatasetFormat.JSONFORLLM.value:
-            data = JSONMetaDataParse(data_type=data_type, func=feature_process)
+            data = JSONDataParse(data_type=data_type, func=feature_process)


Replacing JSONMetaDataParse with JSONDataParse may re-introduce the NameError: name 'COCO' is not defined issue mentioned in the PR description. Since JSONFORLLM (used for metadata.json) is often text-based in LLM tasks, and JSONDataParse reportedly has legacy computer vision dependencies in upstream Sedna, this change might break metadata parsing for LLM pipelines.

… regressions Signed-off-by: Lakshya Mulchandani <mulchandanilakshya@gmail.com>

kubeedge-bot requested review from MooreZheng and hsj576 May 17, 2026 12:56

kubeedge-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 17, 2026

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

kubeedge-bot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 17, 2026

LakshyaMulchandani force-pushed the fix/llm-simple-qa-pipeline branch from aabdca2 to 519e0dd Compare May 17, 2026 18:47

kubeedge-bot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 17, 2026

fix(llm_simple_qa): restore pipeline functionality and resolve device…

644e44f

… regressions Signed-off-by: Lakshya Mulchandani <mulchandanilakshya@gmail.com>

LakshyaMulchandani force-pushed the fix/llm-simple-qa-pipeline branch from 519e0dd to 644e44f Compare May 17, 2026 18:54

ken6078 mentioned this pull request May 21, 2026

Issue summarization for example of examples/llm_simple_qa #452

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/llm simple qa pipeline#429

Fix/llm simple qa pipeline#429
LakshyaMulchandani wants to merge 1 commit into
kubeedge:mainfrom
LakshyaMulchandani:fix/llm-simple-qa-pipeline

LakshyaMulchandani commented May 17, 2026

Uh oh!

kubeedge-bot commented May 17, 2026

Uh oh!

kubeedge-bot commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

LakshyaMulchandani commented May 17, 2026

Description

Type of Change

Problem Analysis & Solution Breakdown

1. Upstream Dependency & Formatting Realignment

2. Paradigm Interface Completion

3. Path Invariance & Configuration Portability

How Has This Been Tested?

Uh oh!

kubeedge-bot commented May 17, 2026

Uh oh!

kubeedge-bot commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants