
implementation of skills rag #2821

Open
blublinsky wants to merge 1 commit into openshift:main from blublinsky:skill-rag

Conversation


@blublinsky blublinsky commented Mar 17, 2026

Description

  1. Add SkillsRAG hybrid retrieval system for selecting the best skill for a user query, mirroring the existing ToolsRAG pattern (dense + BM25 sparse fusion with configurable alpha/top_k/threshold)
  2. Add SkillsConfig Pydantic model (skills_dir, embed_model_path, alpha, top_k, threshold) wired into OLSConfig with YAML parsing, equality, and cache invalidation on reload
  3. Add skills_rag cached property on AppConfig with three guard clauses (config present → directory exists → skills loaded) before committing to embedding model initialization
  4. Add 5 OpenShift troubleshooting skills in Anthropic-style Markdown format: pod-failure-diagnosis, degraded-operator-recovery, node-not-ready, route-ingress-troubleshooting, namespace-troubleshooting
     - load_content reads the entire skill directory tree recursively, filters binary files via a UnicodeDecodeError catch (no suffix allowlist), strips skill.md frontmatter, and separates additional files with ## {relative_path} headers
  5. Add 30 unit tests (27 for SkillsRAG + 3 for SkillsConfig) covering loading, indexing, retrieval, subdirectory recursion, binary file exclusion, and config validation
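
The dense + BM25 fusion described in item 1 can be sketched roughly as an alpha-weighted blend of normalized score maps. This is a hypothetical illustration of the configurable alpha/top_k/threshold behavior, not the PR's actual code; the function and helper names here (`fuse_scores`, `_normalize`) are invented for the example.

```python
def _normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores into [0, 1]; constant scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_scores(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
    top_k: int = 3,
    threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Blend normalized dense and sparse scores: alpha*dense + (1-alpha)*sparse,
    then drop results below threshold and keep the top_k best."""
    dense_n, sparse_n = _normalize(dense), _normalize(sparse)
    fused = {
        key: alpha * dense_n.get(key, 0.0) + (1 - alpha) * sparse_n.get(key, 0.0)
        for key in set(dense_n) | set(sparse_n)
    }
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [(k, s) for k, s in ranked if s >= threshold][:top_k]
```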

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

@openshift-ci openshift-ci bot requested review from bparees and joshuawilson March 17, 2026 13:14

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

description="Path to directory containing skill subdirectories",
)

embed_model_path: Optional[str] = Field(
Contributor:

Should this be configurable? We need to use the same embedding as we have for RAG/ToolsRAG.

Contributor Author:

We are doing exactly what we do in ToolsRAG. It's local configuration only (per Ashutosh's request), not exposed in the CR.
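
The local-only configuration mentioned here can be illustrated with a hypothetical YAML fragment. The field names come from the PR description (skills_dir, embed_model_path, alpha, top_k, threshold), but the exact key nesting and paths are assumptions for illustration only:

```yaml
# Hypothetical local-only SkillsConfig sketch; key nesting and paths are assumed.
ols_config:
  skills:
    skills_dir: /app-root/skills
    embed_model_path: /app-root/embeddings_model
    alpha: 0.5
    top_k: 1
    threshold: 0.3
```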

for child in sorted(skills_path.iterdir()):
if not child.is_dir():
continue
skill_file = child / "skill.md"
Contributor:

While this is consistent with what the PR introduces, the usual format is SKILL.md (uppercase), so some further tuning will be required to make the lookup case-insensitive.

Contributor Author:

fixed

except (UnicodeDecodeError, ValueError):
continue
if entry.name == "skill.md":
parts.insert(0, raw.split("---", 2)[2].strip())
Contributor:

There is a lib for parsing frontmatter:

import frontmatter

post = frontmatter.load("skills/pod-failure-diagnosis/skill.md")
post.metadata  # {"name": "pod-failure-diagnosis", "description": "..."}
post.content   # everything after the frontmatter block

Worth adding it as a new dependency.

Contributor Author:


good idea, fixed
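
For reference, a stdlib-only equivalent of the frontmatter stripping discussed above, roughly what the PR's `raw.split("---", 2)[2]` does but tolerant of files with no frontmatter block at all. This is a sketch, not the PR's final code (which adopted the frontmatter library):

```python
def strip_frontmatter(raw: str) -> str:
    """Return the body of a skill.md, dropping a leading '---'-delimited
    frontmatter block if one is present."""
    if raw.startswith("---"):
        parts = raw.split("---", 2)
        if len(parts) == 3:
            return parts[2].strip()
    return raw.strip()
```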

)


class SkillsRAG:
Contributor:

Can you explore the option of reusing ToolsRAG? This seems like a reimplementation of it. Maybe you can just create a new instance of that index populated with the skills, dynamically wrapping each skill as a tool:

StructuredTool.from_function(
    name="skill_pod_failure_diagnosis",
    description="Diagnose pods stuck in CrashLoopBackOff, ImagePullBackOff, Pending...",
    func=lambda: skill.load_content(),
)

Basically:

  1. Create a ToolsRAG instance
  2. Index skills into it using name + description as the document text
  3. At query time, call retrieve_hybrid(query) to get the best match - future work
  4. Load the matched skill's content and inject into system prompt - future work

Contributor Author:

We never considered straight reuse. The conversation was about reusing the approach.

The reason SkillsRAG is its own class (~80 lines) rather than a ToolsRAG instance is that ToolsRAG carries a lot of tool-specific concerns that don't apply to skills: server filtering in both dense and sparse paths, tool_json metadata, _convert_langchain_tool_to_dict, server-grouped return format, and remove_tools for dynamic MCP updates. We'd have to bypass or stub out most of that.

Contributor:

My point here is that if we wrap each skill as a StructuredTool (just name + description, func never called) and populate a new ToolsRAG instance with them, we can use retrieve_hybrid to find the matching skill and completely avoid creating SkillsRAG.
The retrieval logic is identical between the two classes. The only new code needed is a skill loader that reads directories, parses frontmatter, and returns list[StructuredTool].
It would be just a way of reusing existing components to get the appropriate results for the given query. After that, skills have a different path than tools.

Contributor Author:

A single index mixing skills and tools will require post-filtering to separate them. The shared infrastructure (QdrantStore, BM25Okapi) is already reused; SkillsRAG only adds ~30 lines of domain-specific wiring that doesn't fit ToolsRAG's server-centric model (server filtering, tool_json metadata, grouped-by-server return format).
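
The reviewer's skills-as-tools idea above can be modeled without any framework dependency. The sketch below uses a plain dataclass in place of langchain's StructuredTool (the real proposal would use `StructuredTool.from_function`); `SkillTool` and `skill_to_tool` are invented names for illustration:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SkillTool:
    """Tool-like wrapper: name + description serve as the indexed document
    text; load is only called once a skill has been selected."""
    name: str
    description: str
    load: Callable[[], str]


def skill_to_tool(name: str, description: str, body: str) -> SkillTool:
    """Wrap a skill as a tool-like record, mirroring the suggested
    skill_<name> naming convention."""
    return SkillTool(
        name=f"skill_{name.replace('-', '_')}",
        description=description,
        load=lambda: body,
    )
```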


bparees commented Mar 17, 2026

is the expectation that this functionality is later migrated into LCORE? Or is it going to remain an OLS-specific capability carried within the OLS codebase?


onmete commented Mar 17, 2026

@bparees We are currently focused on the troubleshooting scenario. Any activities regarding OLS adopting LCORE are postponed; thinking about if/how this can be in LCORE too sadly falls into those postponed activities.

But generally speaking, given how "hot topic" the skills are, I would assume that if we have it in OLS, it should be in LCORE before we migrate.

@blublinsky
Contributor Author

> is the expectation that this functionality is later migrated into LCORE? Or is it going to remain an OLS-specific capability carried within the OLS codebase?

Llama Stack supports skills differently. For now it's OLS-only. Let's get some experience first and evaluate how useful it is.

@blublinsky
Contributor Author

/retest


openshift-ci bot commented Mar 17, 2026

@blublinsky: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/e2e-ols-cluster (commit df88c16, required): rerun with /test e2e-ols-cluster

Full PR test history. Your PR dashboard.

