
feat(nodes): add guardrails node for AI safety with input/output validation#534

Open
nihalnihalani wants to merge 10 commits into rocketride-org:develop from nihalnihalani:feature/guardrails-node

Conversation


@nihalnihalani nihalnihalani commented Mar 30, 2026

#Hack-with-bay-2

Contribution Type

Feature — Comprehensive input/output guardrails for AI pipeline safety

Problem / Use Case

RocketRide pipelines have PII anonymization (`nodes/src/nodes/anonymize/`) but no protection against prompt injection attacks, hallucinated outputs, content safety violations, or PII leaks in LLM responses. Per Analytics Vidhya, guardrails are the #1 enterprise blocker for AI agent deployment in 2026.

Proposed Solution

A `guardrails` node with layered defense:

Input guardrails (pre-LLM):

  • Prompt injection detection (15 regex patterns + keyword scoring)
  • Topic restriction (allowed/blocked keyword lists)
  • Input length enforcement (character + token limits)
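
The regex-plus-keyword-scoring approach for the injection check can be sketched as below. The patterns, keywords, weights, and threshold here are invented for illustration; the real ones live in `guardrails_engine.py`.

```python
import re

# Illustrative patterns and weights only; not the node's production values.
INJECTION_PATTERNS = [
    re.compile(r'ignore (all |the )?previous instructions', re.IGNORECASE),
    re.compile(r'you are now (in )?developer mode', re.IGNORECASE),
]
KEYWORD_WEIGHTS = {'jailbreak': 0.8, 'bypass': 0.4, 'override': 0.4}
THRESHOLD = 0.7

def check_prompt_injection(text):
    # Any regex hit is an immediate failure.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return {'passed': False, 'reason': 'pattern'}
    # Otherwise accumulate keyword scores against a threshold, so several
    # individually weak signals can combine into a block.
    lowered = text.lower()
    score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in lowered)
    return {'passed': score < THRESHOLD, 'reason': 'score', 'score': score}
```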

Output guardrails (post-LLM):

  • Hallucination detection (sentence-level grounding against source documents)
  • Content safety (self-harm, violence, illegal activity patterns)
  • PII leak detection (email, phone, SSN, credit card, IP address regex)
  • Format compliance (JSON, markdown, bullet list, numbered list)

Policy modes: block (halts propagation via `preventDefault()`), warn (log + continue), log (silent)
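
The three policy modes can be sketched as a small dispatch helper. The function name and logger usage are illustrative, not the node's actual API:

```python
import logging

logger = logging.getLogger('guardrails')

def apply_policy(mode, violations):
    """Return True if content may continue downstream; sketch only."""
    if not violations:
        return True
    if mode == 'block':
        logger.error('guardrails blocked content: %s', violations)
        return False  # caller then invokes preventDefault()
    if mode == 'warn':
        logger.warning('guardrails violations (continuing): %s', violations)
        return True
    # 'log' mode: record silently at debug level
    logger.debug('guardrails violations (silent): %s', violations)
    return True
```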

Why This Feature Fits This Codebase

The node uses the standard `writeQuestions` (input guard) and `writeAnswers` (output guard) pipeline flow, matching how LLM nodes at `nodes/src/nodes/llm_base/IInstance.py` process data. The `writeDocuments` method collects source documents for hallucination checking — leveraging the same document lane used by vector DB nodes like `nodes/src/nodes/chroma/`.
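
Sentence-level grounding can be approximated with a word-overlap heuristic against the collected source documents. This sketch assumes a naive split-on-period tokenizer and an invented `min_overlap` parameter; the engine's real algorithm may differ:

```python
def check_hallucination(answer, sources, min_overlap=0.5):
    """A sentence counts as grounded when at least min_overlap of its
    words appear somewhere in the source documents. Sketch only."""
    source_words = set()
    for doc in sources:
        source_words.update(doc.lower().split())
    ungrounded = []
    for sentence in (s.strip() for s in answer.split('.')):
        if not sentence:
            continue
        words = sentence.lower().split()
        overlap = sum(1 for w in words if w in source_words) / len(words)
        if overlap < min_overlap:
            ungrounded.append(sentence)
    return {'passed': not ungrounded, 'ungrounded': ungrounded}
```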

The existing `anonymize` node at `nodes/src/nodes/anonymize/IInstance.py` handles PII detection for INPUT text. This guardrails node complements it by detecting PII LEAKS in OUTPUT text (LLM accidentally revealing PII from its training data or context).
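
Output-side PII leak detection boils down to running a family of regexes over the LLM response. The patterns below are simplified illustrations, not the node's production regexes:

```python
import re

# Simplified example patterns; real-world PII regexes need more care
# (international phone formats, IP range validation, etc.).
PII_PATTERNS = {
    'email': re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.]+\b'),
    'phone': re.compile(r'\b\d{3}[-.]\d{3}[-.]\d{4}\b'),
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'ip_address': re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
}

def check_pii_leak(text):
    details = {name: p.findall(text)
               for name, p in PII_PATTERNS.items() if p.search(text)}
    return {'passed': not details, 'details': details}
```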

Policy enforcement uses `self.preventDefault()` from `IInstanceBase` for the block mode — the same control flow mechanism used throughout the pipeline system.
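
A minimal sketch of that control flow, with `IInstanceBase` and the engine reduced to stand-ins (the real framework classes differ):

```python
import logging

logger = logging.getLogger('guardrails')

# Hypothetical stand-in for the framework's IInstanceBase; the real class
# provides preventDefault() as the sanctioned way to halt lane propagation.
class IInstanceBase:
    def __init__(self):
        self._default_prevented = False

    def preventDefault(self):
        self._default_prevented = True

class GuardrailsInstance(IInstanceBase):
    def __init__(self, engine, forward):
        super().__init__()
        self.engine = engine    # callable returning {'action': ..., 'violations': [...]}
        self.forward = forward  # downstream writeQuestions callable

    def writeQuestions(self, text):
        result = self.engine(text)
        if result['action'] == 'block':
            logger.error('blocked: %s', result['violations'])
            self.preventDefault()  # halt propagation instead of raising
            return
        self.forward(text)
```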

Validation

```
87 passed in 0.08s
ruff check: 0 errors
ruff format: all files unchanged
```

87 tests across: 15 injection attack patterns, 8 false-positive checks for safe text, hallucination grounding, content safety, PII regex, format compliance, policy enforcement, IInstance lifecycle, deep copy.

How This Could Be Extended

  • Add ML-based prompt injection detection (fine-tuned classifier)
  • Integrate with Galileo's eval-to-guardrail lifecycle
  • Add configurable response templates for blocked content
  • Add Prometheus metrics for violation rates (pairs with observability PR)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable guardrails system for content safety and validation
    • Monitors inputs for prompt injections, topic restrictions, and length limits
    • Evaluates outputs for hallucinations, PII leakage, content safety violations, and format compliance
    • Supports multiple enforcement modes (block, warn, log) and pre-configured safety profiles
    • Enables topic allowlists/blocklists and document source grounding

Add comprehensive input/output guardrails for AI pipeline safety:

Input guardrails (pre-LLM):
- Prompt injection detection (regex + keyword scoring)
- Topic restriction (allowed/blocked keyword lists)
- Input length enforcement (char + token limits)

Output guardrails (post-LLM):
- Hallucination detection (sentence-level grounding check)
- Content safety (self-harm, violence, illegal activity patterns)
- PII leak detection (email, phone, SSN, credit card, IP)
- Format compliance (JSON, markdown, bullet/numbered lists)

Policy modes: block (reject), warn (log + continue), log (silent).
Profiles: basic (injection + PII), strict (all checks), custom.
87 tests covering all checks, policy enforcement, and IInstance lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the `reviews.auto_review.auto_pause_after_reviewed_commits` setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This pull request introduces a comprehensive guardrails system for the pipeline nodes that performs configurable AI safety evaluations on input prompts and output responses. The system comprises a core engine implementing multiple safety checks (prompt injection, topic restriction, PII detection, hallucination, content safety, format compliance), global lifecycle management for initialization and teardown, and per-instance pipeline interception for document, question, and answer processing with policy-driven enforcement (block, warn, or log violations).
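
The `evaluate()` orchestration described above might look roughly like this; the signature and the shape of the check registry are assumptions for illustration, not the engine's real interface:

```python
def evaluate(text, mode, checks, policy_mode='block'):
    """Run the checks registered for the given mode ('input' or 'output'),
    aggregate violations, and derive a policy-driven action. Sketch only."""
    violations = [name for name, check in checks.get(mode, {}).items()
                  if not check(text)['passed']]
    # No violations passes through; otherwise the configured policy mode
    # ('block', 'warn', or 'log') becomes the action.
    action = 'pass' if not violations else policy_mode
    return {'passed': not violations, 'violations': violations, 'action': action}
```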

Changes

  • Core Guardrails Engine (nodes/src/nodes/guardrails/guardrails_engine.py): Implements GuardrailsEngine class with seven check methods for input/output safety validation including prompt injection, topic restriction, input length, hallucination detection, content safety, PII leak detection, and format compliance. The evaluate() method orchestrates checks based on mode (input or output), aggregates violations, and derives policy-driven action (pass, block, warn, or log).
  • Pipeline Integration (nodes/src/nodes/guardrails/IGlobal.py, nodes/src/nodes/guardrails/IInstance.py): IGlobal manages global setup/teardown and engine instantiation conditional on endpoint mode. IInstance wraps pipeline lanes (writeDocuments, writeQuestions, writeAnswers) to intercept and evaluate content flow, storing source documents, evaluating text through the engine, and preventing propagation on policy violations.
  • Module Infrastructure (nodes/src/nodes/guardrails/__init__.py, nodes/src/nodes/guardrails/requirements.txt): Exports public interfaces (IGlobal, IInstance) and declares standard-library-only dependencies.
  • Node Configuration & Testing (nodes/src/nodes/guardrails/services.json, nodes/test/nodes/test_guardrails.py): services.json defines the guardrails:// node service with three pipeline lanes, configurable profiles (basic, strict, custom), and field definitions for policy mode and feature toggles. test_guardrails.py provides 938 lines of comprehensive pytest coverage validating all check methods, policy modes, control flow interception, lifecycle management, and configuration consistency.

Sequence Diagram(s)

sequenceDiagram
    participant Pipeline as Pipeline/<br/>Client
    participant IInstance
    participant IGlobal
    participant Engine as GuardrailsEngine
    participant Wrapped as Wrapped<br/>Instance

    rect rgba(100, 150, 200, 0.5)
        Note over Pipeline,Wrapped: Question Processing Flow
        Pipeline->>IInstance: writeQuestions(question)
        IInstance->>IInstance: Deep copy question
        IInstance->>IInstance: Extract full_text from<br/>questions + context
        
        alt full_text is blank
            IInstance->>Wrapped: Forward deep-copied<br/>question unchanged
        else full_text has content
            IInstance->>IGlobal: Access engine
            IInstance->>Engine: evaluate(full_text,<br/>mode='input')
            Engine-->>IInstance: Return result with<br/>action & violations
            
            alt action == 'block'
                IInstance->>IInstance: Log violations
                IInstance->>IInstance: preventDefault()
            else action == 'warn'
                IInstance->>IInstance: Log violations
                IInstance->>Wrapped: Forward deep-copied<br/>question
            else other action
                IInstance->>Wrapped: Forward deep-copied<br/>question
            end
        end
    end

    rect rgba(200, 150, 100, 0.5)
        Note over Pipeline,Wrapped: Answer Processing Flow
        Pipeline->>IInstance: writeAnswers(answer)
        IInstance->>IInstance: Deep copy answer
        IInstance->>IInstance: Extract text via<br/>answer.getText()
        
        alt text is blank
            IInstance->>Wrapped: Forward unchanged
        else text has content
            IInstance->>IGlobal: Access engine
            IInstance->>Engine: evaluate(text,<br/>mode='output',<br/>context={source_docs})
            Engine-->>IInstance: Return result with<br/>violations & action
            
            alt action == 'block'
                IInstance->>IInstance: Log violations
                IInstance->>IInstance: preventDefault()
            else action == 'warn'
                IInstance->>IInstance: Log violations
                IInstance->>Wrapped: Forward answer
            else other action
                IInstance->>Wrapped: Forward answer
            end
        end
    end

    rect rgba(150, 200, 100, 0.5)
        Note over Pipeline,Wrapped: Document Collection
        Pipeline->>IInstance: writeDocuments(documents)
        IInstance->>IInstance: Extract & store<br/>non-blank page_content<br/>to source_documents
        IInstance->>Wrapped: Forward original<br/>documents
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰 A guardrail of safety, I tend with great care,
Scanning each prompt and response with flair,
Blocking the bad, and the sensitive too,
Keeping your pipeline both safe and true! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 31.74%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'feat(nodes): add guardrails node for AI safety with input/output validation' clearly and specifically describes the main change: adding a new guardrails node with both input and output validation capabilities for AI safety.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 81-82: The review points out that appending guardrail verdicts
into Question.context via question.addContext leaks internal moderation state
into the prompt; remove the question.addContext(...) call that writes
`[guardrails:input] ...` and instead record result (result["passed"],
result["action"], len(result["violations"])) in non-prompt metadata or logs
(e.g., the existing tracing/logging mechanism used by the node or a dedicated
trace/metadata field) so the LLM prompt remains unchanged; locate the
question.addContext invocation in IInstance.py and replace it with a call that
stores the same information to trace/log metadata rather than Question.context.
- Around line 145-150: In the method that builds grounding context in IInstance
(where self.source_documents is appended), skip appending when a document's body
is null/empty: check doc.page_content (or dict['page_content'] or the string
doc) and only append when it's a non-empty, non-None, non-whitespace string; do
not convert None/'' to "None" or "" as that creates fake sources—ensure the
logic surrounding hasattr(doc, 'page_content'), the dict branch, and the
isinstance(doc, str) branch trims/validates content (e.g., strip() and length
check) before calling self.source_documents.append.
- Around line 84-85: Replace the custom exception usage with the framework
sentinel by calling self.preventDefault() wherever block mode is detected: in
IInstance (the code that checks result['action'] == 'block' at the existing
branch around lines with GuardrailsViolation) and the analogous check later (the
second occurrence around the later GuardrailsViolation). Concretely, remove the
raise GuardrailsViolation(result['violations']) calls and invoke
self.preventDefault(), optionally logging or attaching result['violations'] to
the instance/context before returning, so the framework recognizes the block
instead of treating it as an unexpected exception.

In `@nodes/src/nodes/guardrails/services.json`:
- Around line 136-173: The preconfig profiles for Guardrails currently lack the
max_tokens_estimate and expected_format knobs required by GuardrailsEngine; add
these keys to each profile in preconfig.profiles (e.g., "basic", "strict",
"custom") with sensible defaults (number for max_tokens_estimate, null or string
schema for expected_format) and also add corresponding entries to the service
"fields" schema so the UI/serialization exposes them; ensure the keys are named
exactly max_tokens_estimate and expected_format and that their types/metadata
match how GuardrailsEngine expects to read them.
- Around line 180-279: The UI schema uses prefixed field IDs (e.g.,
"guardrails.policy_mode", "guardrails.enable_prompt_injection") but runtime
(Config.getNodeConfig and GuardrailsEngine) expects unprefixed keys (e.g.,
policy_mode, enable_prompt_injection), so per-profile overrides are ignored; fix
by renaming all field keys to the unprefixed runtime keys (policy_mode,
enable_prompt_injection, enable_content_safety, enable_pii_detection,
enable_hallucination_check, max_input_length, blocked_topics, allowed_topics)
and update the "properties" arrays in the guardrails.custom, guardrails.basic,
and guardrails.strict objects to reference those same unprefixed keys so
Config.getNodeConfig picks up the selected profile values.

In `@test/nodes/test_guardrails.py`:
- Around line 73-89: The test helper _make_engine bypasses the real config
wiring, so add a new end-to-end test that loads the node config via the real
path and constructs GuardrailsEngine through Config.getNodeConfig() (reading the
test services.json fixture) to catch field-name mismatches; specifically, write
a test that reads the test services.json, calls
Config.getNodeConfig('guardrails' or the appropriate node key), then passes that
result into GuardrailsEngine and asserts the expected settings (e.g.,
policy_mode/profile, enable_* flags) match the intended values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c93d0d98-3b38-4fc7-ba96-f78d357813df

📥 Commits

Reviewing files that changed from the base of the PR and between b6b2dc1 and d7b1044.

📒 Files selected for processing (7)
  • nodes/src/nodes/guardrails/IGlobal.py
  • nodes/src/nodes/guardrails/IInstance.py
  • nodes/src/nodes/guardrails/__init__.py
  • nodes/src/nodes/guardrails/guardrails_engine.py
  • nodes/src/nodes/guardrails/requirements.txt
  • nodes/src/nodes/guardrails/services.json
  • test/nodes/test_guardrails.py

nihalnihalani and others added 2 commits March 30, 2026 16:31
- Remove addContext call that leaked guardrail verdicts into question context
- Use self.preventDefault() for block mode instead of raising GuardrailsViolation
- Skip appending documents with null/empty body in writeDocuments
- Add max_tokens_estimate and expected_format to preconfig profiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter out null/blank document content in writeDocuments to prevent
  fake 'None' sources breaking hallucination detection
- Update tests to verify preventDefault pattern instead of expecting
  GuardrailsViolation raises (matching current IInstance implementation)
- Add max_tokens_estimate and expected_format field definitions to
  services.json so these engine checks are configurable via the UI
- Remove guardrails. prefix from config field IDs to align with
  unprefixed preconfig profile keys (matching sibling node conventions)
- Expose max_tokens_estimate and expected_format in strict profile UI
- Add end-to-end config wiring tests verifying services.json field IDs
  match engine constructor keys and preconfig profiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 62-70: Replace the manual loops that build text_parts with list
extend calls: instead of iterating over question.questions and appending q.text,
call text_parts.extend(...) to add all question texts at once, and instead of
looping over question.context and appending ctx, call
text_parts.extend(question.context); update the code that constructs full_text
from text_parts accordingly (refer to the variables question, text_parts,
full_text and the places where q.text and ctx are currently appended).

In `@test/nodes/test_guardrails.py`:
- Line 691: The tuple unpacking from _load_iinstance_class() assigns
ViolationClass which is unused and triggers static-analysis warnings; update the
unpacking to prefix unused tuple variables with an underscore (e.g., change
ViolationClass to _ViolationClass) in the line using IInstance, EngineClass,
ViolationClass, FakeQuestion, FakeQuestionText and the other similar unpack at
the later occurrence so unused values are named with a leading underscore while
leaving used names (IInstance, EngineClass, FakeQuestion, FakeQuestionText)
unchanged.
- Around line 867-872: The loop that builds profile_keys should be simplified to
iterate over profiles.values() and over each profile_config's keys directly
rather than using profiles.items() and profile_config.keys(); update the code
that constructs profile_keys (currently using profile_name, profile_config from
profiles.items()) to a single comprehension or union over profiles.values()
skipping the 'title' key, then keep the existing missing = engine_keys -
profile_keys and assertion; refer to the symbols profile_keys, profiles,
profile_config, engine_keys, and missing when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59f2052b-0f30-413c-9ef4-71c16a569a78

📥 Commits

Reviewing files that changed from the base of the PR and between d7b1044 and 9acaa64.

📒 Files selected for processing (3)
  • nodes/src/nodes/guardrails/IInstance.py
  • nodes/src/nodes/guardrails/services.json
  • test/nodes/test_guardrails.py

…ls node

- Use extend() instead of append loops for list construction in IInstance.py
- Prefix unused ViolationClass with underscore in test unpacking
- Use values() and drop .keys() for dict iteration in TestConfigWiring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 144-147: The loop in IInstance that computes content with a nested
ternary is dense; extract that logic into a small helper (e.g., a private method
like _extract_content or a top-level function get_document_content) and replace
the inline expression in the for loop with a call to that helper; the helper
should accept a single doc, return the page content for objects with a
page_content attribute, use doc.get('page_content') for dicts, fallback to the
doc itself otherwise, convert to string, and return None/empty if strip() yields
an empty string so the existing if content check and append to
self.source_documents remain unchanged.
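
The helper this comment suggests could look like the following sketch. The name `get_document_content` is the reviewer's hypothetical, not code from the PR:

```python
def get_document_content(doc):
    """Normalize a document to its text body, returning None when the
    content is missing or blank. Sketch of the reviewer's suggestion."""
    if hasattr(doc, 'page_content'):
        content = doc.page_content
    elif isinstance(doc, dict):
        content = doc.get('page_content')
    else:
        content = doc
    if content is None:
        return None
    # Reject whitespace-only bodies instead of storing fake sources.
    text = str(content).strip()
    return text or None
```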

In `@test/nodes/test_guardrails.py`:
- Around line 894-895: The comprehension building custom_keys uses
custom_profile.keys() unnecessarily; change it to iterate the dict directly by
using "for k in custom_profile" (i.e., replace custom_profile.keys() with
custom_profile) so custom_keys = {k for k in custom_profile if k != 'title'},
leaving custom_profile and custom_keys identifiers unchanged.
- Around line 937-938: The assertion iterates result.keys() unnecessarily;
update the second assertion in the test (the line asserting keys are str) to
iterate the dict directly by replacing result.keys() with result so it becomes
assert all(isinstance(k, str) for k in result) while keeping the preceding
assert isinstance(result, dict) intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3ab15de3-8ad4-435b-a806-728fa1f6c2ba

📥 Commits

Reviewing files that changed from the base of the PR and between 9acaa64 and ca68c92.

📒 Files selected for processing (2)
  • nodes/src/nodes/guardrails/IInstance.py
  • test/nodes/test_guardrails.py

- Extract nested ternary in writeDocuments to if/elif/else for readability
- Remove redundant .keys() calls when iterating dicts in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 61-68: The input evaluation omits Question.instructions so
guardrails validate a different string than the LLM receives; replace the manual
assembly using question.questions and question.context with the composed prompt
from Question.getPrompt() (or otherwise include question.instructions) when
building full_text in IInstance (referencing the variables full_text,
question.questions, question.context and the method Question.getPrompt());
ensure full_text = question.getPrompt() (or append question.instructions to the
assembled parts) so the guardrails run against the exact prompt sent to the
model.

In `@test/nodes/test_guardrails.py`:
- Around line 143-149: The test test_keyword_scoring_threshold currently uses
the high-weight keyword "jailbreak" which alone exceeds the threshold and
therefore never verifies score accumulation; change the test input passed to
engine.check_prompt_injection to a string containing two distinct low-weight
keywords (each below the threshold) so their combined score exceeds the
threshold and triggers the failing branch (i.e., replace the single "jailbreak"
token with two sub-threshold tokens to exercise the combined keyword-score
path).
- Around line 374-380: The test test_multiple_pii_types only asserts that
'email' is reported but should also assert the other PII types; update the
assertions after calling engine.check_pii_leak to verify that result['details']
includes the additional expected keys/markers for phone and ip_address (e.g.,
assert 'phone' in result['details'] and assert 'ip_address' in
result['details']) so the test fails if either phone or IP reporting is dropped.
- Around line 543-675: The sys.modules cleanup in _load_iinstance_class
indiscriminately removes any module name containing 'guardrails', which can
evict unrelated modules; change both cleanup loops to only remove and later
restore the exact injected module names you created (e.g., 'guardrails',
'guardrails.guardrails_engine', 'guardrails.IGlobal', 'guardrails.IInstance')
instead of substring matching, ensuring you save any pre-existing entries for
those exact keys and restore them in the finally block if present; update the
two locations that currently check "if 'guardrails' in mod_name" to iterate that
explicit list of module keys to safely isolate the injected package cleanup.
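
The exact-key cleanup this comment asks for could be sketched as below. The module names come from the comment itself; the helper names are invented:

```python
import sys
import types  # used only by the usage example

# The exact injected module names, instead of substring matching on
# every entry in sys.modules.
GUARDRAILS_MODULES = [
    'guardrails',
    'guardrails.guardrails_engine',
    'guardrails.IGlobal',
    'guardrails.IInstance',
]

def evict_guardrails_modules():
    """Remove only the listed module keys, saving any pre-existing
    entries so a finally block can restore them."""
    saved = {}
    for name in GUARDRAILS_MODULES:
        if name in sys.modules:
            saved[name] = sys.modules.pop(name)
    return saved

def restore_modules(saved):
    for name, mod in saved.items():
        sys.modules[name] = mod
```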

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f35a08c7-6299-429e-82ad-5be98853270c

📥 Commits

Reviewing files that changed from the base of the PR and between ca68c92 and 05e896d.

📒 Files selected for processing (2)
  • nodes/src/nodes/guardrails/IInstance.py
  • test/nodes/test_guardrails.py

Comment on lines +61 to +68
```python
# Collect question text for evaluation
text_parts = []
if question.questions:
    text_parts.extend(q.text for q in question.questions)
if question.context:
    text_parts.extend(question.context)

full_text = ' '.join(text_parts)
```


⚠️ Potential issue | 🟠 Major

Input evaluation is missing Question.instructions.

In this repository, instructions are appended to Question before the model call. Because this block only joins question.questions and question.context, the guardrails run on a different string than the LLM actually sees, leaving upstream instruction text outside the input checks.

♻️ Minimal local fix (prefer the canonical prompt renderer if available)
         text_parts = []
         if question.questions:
             text_parts.extend(q.text for q in question.questions)
+        if getattr(question, 'instructions', None):
+            text_parts.extend(question.instructions)
         if question.context:
             text_parts.extend(question.context)

Based on learnings: node-configured instructions are added to the Question object via addInstruction() before _run, so question.getPrompt() already returns the fully composed prompt with all instructions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```python
# Collect question text for evaluation
text_parts = []
if question.questions:
    text_parts.extend(q.text for q in question.questions)
if question.context:
    text_parts.extend(question.context)
full_text = ' '.join(text_parts)
```

```python
# Collect question text for evaluation
text_parts = []
if question.questions:
    text_parts.extend(q.text for q in question.questions)
if getattr(question, 'instructions', None):
    text_parts.extend(question.instructions)
if question.context:
    text_parts.extend(question.context)
full_text = ' '.join(text_parts)
```

Comment on lines +143 to +149
```python
def test_keyword_scoring_threshold(self):
    """Multiple low-weight keywords should trigger when combined."""
    engine = _make_engine()
    # 'jailbreak' (0.8) alone exceeds threshold of 0.7
    result = engine.check_prompt_injection('I want to jailbreak this system')
    assert not result['passed']
```



⚠️ Potential issue | 🟡 Minor

This test never exercises the combined keyword-score path.

The docstring says multiple low-weight keywords should add up past the threshold, but the input uses jailbreak, which already trips the threshold by itself. A regression in score accumulation would still pass here; use two individually sub-threshold keywords instead.


Comment on lines +374 to +380
```python
def test_multiple_pii_types(self):
    engine = _make_engine()
    result = engine.check_pii_leak('Email: a@b.com, Phone: 555-123-4567, IP: 10.0.0.1')
    assert not result['passed']
    # Should report multiple types
    assert 'email' in result['details']
```



⚠️ Potential issue | 🟡 Minor

Assert the extra PII types this test claims to cover.

This case is named and commented as multi-type reporting, but it only checks for email. If phone or ip_address stops being reported, the suite still passes.

♻️ Tighten the assertion
         assert not result['passed']
         # Should report multiple types
         assert 'email' in result['details']
+        assert 'phone' in result['details']
+        assert 'ip_address' in result['details']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nodes/test_guardrails.py` around lines 374 - 380, The test
test_multiple_pii_types only asserts that 'email' is reported but should also
assert the other PII types; update the assertions after calling
engine.check_pii_leak to verify that result['details'] includes the additional
expected keys/markers for phone and ip_address (e.g., assert 'phone' in
result['details'] and assert 'ip_address' in result['details']) so the test
fails if either phone or IP reporting is dropped.
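As a sanity check, the three pattern types the test input exercises can each be shown to fire on it. The phone and IP regexes below are copied from the PR's PII_PATTERNS; the email regex uses the corrected `[A-Za-z]` TLD class discussed elsewhere in this review:

```python
import re

# Patterns mirroring the PR's PII_PATTERNS for the three types present
# in the test input (email shown with the corrected TLD class).
PII = {
    'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    'phone_us': re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'),
    'ip_address': re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}'
                             r'(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
}

def detected_types(text: str) -> list:
    """Names of every PII pattern that matches the text, sorted."""
    return sorted(name for name, pat in PII.items() if pat.search(text))

found = detected_types('Email: a@b.com, Phone: 555-123-4567, IP: 10.0.0.1')
# All three types fire, so the test can safely assert each one.
assert found == ['email', 'ip_address', 'phone_us']
```

Since every type matches, asserting all three keys in `result['details']` costs nothing and catches a dropped detector.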

Comment on lines +543 to +675
pass

class FakeOPEN_MODE:
CONFIG = 'config'

class FakeQuestion:
def __init__(self, **kwargs):
self.questions = kwargs.get('questions', [])
self.context = kwargs.get('context', [])
self.instructions = kwargs.get('instructions', [])

def addContext(self, ctx):
self.context.append(ctx)

class FakeQuestionText:
def __init__(self, text=''):
self.text = text

class FakeAnswer:
def __init__(self, answer_text=''):
self._text = answer_text
self.answer = answer_text

def getText(self):
return self._text

class FakeConfig:
@staticmethod
def getNodeConfig(logicalType, connConfig):
return {}

stubs['rocketlib'].IInstanceBase = FakeIInstanceBase
stubs['rocketlib'].IGlobalBase = FakeIGlobalBase
stubs['rocketlib'].Entry = FakeEntry
stubs['rocketlib'].OPEN_MODE = FakeOPEN_MODE
stubs['rocketlib'].debug = lambda *a, **kw: None
stubs['rocketlib'].warning = lambda *a, **kw: None
stubs['ai.common.schema'].Question = FakeQuestion
stubs['ai.common.schema'].Answer = FakeAnswer
stubs['ai.common.config'].Config = FakeConfig
stubs['depends'].depends = lambda *a, **kw: None

for name, stub in stubs.items():
saved[name] = sys.modules.get(name)
sys.modules[name] = stub

try:
# Force reimport — clean out any cached guardrails modules
for mod_name in list(sys.modules.keys()):
if 'guardrails' in mod_name:
del sys.modules[mod_name]

# Create the package module first so relative imports resolve
pkg_spec = importlib.util.spec_from_file_location(
'guardrails',
os.path.join(_GUARDRAILS_DIR, '__init__.py'),
submodule_search_locations=[_GUARDRAILS_DIR],
)
pkg_mod = importlib.util.module_from_spec(pkg_spec)
sys.modules['guardrails'] = pkg_mod

# Load guardrails_engine submodule
engine_spec = importlib.util.spec_from_file_location(
'guardrails.guardrails_engine',
os.path.join(_GUARDRAILS_DIR, 'guardrails_engine.py'),
)
engine_mod = importlib.util.module_from_spec(engine_spec)
sys.modules['guardrails.guardrails_engine'] = engine_mod
engine_spec.loader.exec_module(engine_mod)

# Load IGlobal submodule
iglobal_spec = importlib.util.spec_from_file_location(
'guardrails.IGlobal',
os.path.join(_GUARDRAILS_DIR, 'IGlobal.py'),
)
iglobal_mod = importlib.util.module_from_spec(iglobal_spec)
sys.modules['guardrails.IGlobal'] = iglobal_mod
iglobal_spec.loader.exec_module(iglobal_mod)

# Load IInstance submodule
iinst_spec = importlib.util.spec_from_file_location(
'guardrails.IInstance',
os.path.join(_GUARDRAILS_DIR, 'IInstance.py'),
)
iinst_mod = importlib.util.module_from_spec(iinst_spec)
sys.modules['guardrails.IInstance'] = iinst_mod
iinst_spec.loader.exec_module(iinst_mod)

return iinst_mod.IInstance, engine_mod.GuardrailsEngine, engine_mod.GuardrailsViolation, FakeQuestion, FakeQuestionText, FakeAnswer
finally:
for name in stubs:
if saved[name] is None:
sys.modules.pop(name, None)
else:
sys.modules[name] = saved[name]
# Clean up guardrails modules
for mod_name in list(sys.modules.keys()):
if 'guardrails' in mod_name:
sys.modules.pop(mod_name, None)

Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '543,675p' test/nodes/test_guardrails.py

python - <<'PY'
candidates = [
    'guardrails',
    'guardrails.IInstance',
    'test_guardrails',
    'pkg.third_party_guardrails',
]
for name in candidates:
    print(f"{name}: {'guardrails' in name}")
PY

Repository: rocketride-org/rocketride-server

Length of output: 5309


Scope the sys.modules cleanup to the injected package names.

The substring match removes any module name containing guardrails, including the current test module (test_guardrails) and unrelated packages (e.g., pkg.third_party_guardrails), and those originals are never restored. Since _load_iinstance_class() runs across multiple tests, this poisons the import cache for the rest of the session.

♻️ Safer cleanup and restoration
         # Stub rocketlib and ai.common.schema
         saved = {}
+        saved_guardrails = {}
         stubs = {
             'rocketlib': types.ModuleType('rocketlib'),
             'rocketlib.error': types.ModuleType('rocketlib.error'),
             ...
         }
...
         try:
             # Force reimport — clean out any cached guardrails modules
             for mod_name in list(sys.modules.keys()):
-                if 'guardrails' in mod_name:
-                    del sys.modules[mod_name]
+                if mod_name == 'guardrails' or mod_name.startswith('guardrails.'):
+                    saved_guardrails[mod_name] = sys.modules.pop(mod_name)
...
         finally:
             for name in stubs:
                 if saved[name] is None:
                     sys.modules.pop(name, None)
                 else:
                     sys.modules[name] = saved[name]
             # Clean up guardrails modules
             for mod_name in list(sys.modules.keys()):
-                if 'guardrails' in mod_name:
-                    sys.modules.pop(mod_name, None)
+                if mod_name == 'guardrails' or mod_name.startswith('guardrails.'):
+                    sys.modules.pop(mod_name, None)
+            sys.modules.update(saved_guardrails)
🧰 Tools
🪛 Ruff (0.15.7)

[ANN205] line 543: Missing return type annotation for staticmethod _load_iinstance_class
[ANN204] line 565: Missing return type annotation for special method __init__ (add: None)
[ANN202] line 568: Missing return type annotation for private function preventDefault (add: None)
[ANN204] line 582: Missing return type annotation for special method __init__ (add: None)
[ANN003] line 582: Missing type annotation for **kwargs
[ANN202] line 587: Missing return type annotation for private function addContext (add: None)
[ANN204] line 591: Missing return type annotation for special method __init__ (add: None)
[ANN204] line 595: Missing return type annotation for special method __init__ (add: None)
[ANN202] line 599: Missing return type annotation for private function getText
[ANN205] line 604: Missing return type annotation for staticmethod getNodeConfig
[ARG004] line 604: Unused static method argument: logicalType
[ARG004] line 604: Unused static method argument: connConfig
[ARG005] lines 611, 612, 616: Unused lambda arguments: a, kw

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nodes/test_guardrails.py` around lines 543 - 675, The sys.modules
cleanup in _load_iinstance_class indiscriminately removes any module name
containing 'guardrails', which can evict unrelated modules; change both cleanup
loops to only remove and later restore the exact injected module names you
created (e.g., 'guardrails', 'guardrails.guardrails_engine',
'guardrails.IGlobal', 'guardrails.IInstance') instead of substring matching,
ensuring you save any pre-existing entries for those exact keys and restore them
in the finally block if present; update the two locations that currently check
"if 'guardrails' in mod_name" to iterate that explicit list of module keys to
safely isolate the injected package cleanup.
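A runnable sketch of the exact-name approach (the injected module names are taken from the test's loader):

```python
import sys
import types

# Only the exact names injected by the loader are evicted and restored;
# module names that merely contain the substring 'guardrails' survive.
INJECTED = (
    'guardrails',
    'guardrails.guardrails_engine',
    'guardrails.IGlobal',
    'guardrails.IInstance',
)

def with_clean_guardrails(load):
    """Run load() with the injected names absent, then restore them."""
    saved = {}
    for name in INJECTED:
        if name in sys.modules:
            saved[name] = sys.modules.pop(name)
    try:
        return load()
    finally:
        for name in INJECTED:
            sys.modules.pop(name, None)
        sys.modules.update(saved)

# An unrelated module whose name contains 'guardrails' is untouched:
sys.modules['pkg.third_party_guardrails'] = types.ModuleType('pkg.third_party_guardrails')
with_clean_guardrails(lambda: None)
assert 'pkg.third_party_guardrails' in sys.modules
```

With substring matching, the same scenario would evict `pkg.third_party_guardrails` (and the test module itself) and never put them back.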

Collaborator

@asclearuc asclearuc left a comment


Thank you for this contribution!

It is a good candidate for an experimental node once all the issues, including those found by CodeRabbit, are resolved. Please keep in mind that CodeRabbit may identify additional issues in this feature, so you should return to this PR some time after your push.

logger = logging.getLogger(__name__)


class GuardrailsViolation(Exception):
Collaborator


This exception is not used

engine = self.IGlobal.engine

# Deep copy to prevent mutation of the original
question = copy.deepcopy(question)
Collaborator


This is a single-output pipe node. There is never more than one downstream consumer, so there is nothing to protect against. Pass the object directly. In a RAG pipeline a Question may carry large embedding vectors across its documents list — deep-copying on every message is non-trivial overhead for zero benefit. Same applies to copy.deepcopy(answer) in writeAnswers.
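A minimal stub makes the point concrete; the names only echo the node API and this is not the actual IInstance:

```python
# Hypothetical minimal pipe node: forward the message object directly
# instead of deep-copying it, since there is one downstream consumer.
class Sink:
    def __init__(self):
        self.last = None

    def writeQuestions(self, question):
        self.last = question

class PassThroughNode:
    def __init__(self, downstream):
        self.downstream = downstream

    def writeQuestions(self, question):
        # No copy.deepcopy: aliasing is safe with a single consumer and
        # avoids copying any large embedding vectors on the question.
        self.downstream.writeQuestions(question)

sink = Sink()
node = PassThroughNode(sink)
question = {'questions': ['q'], 'documents': [[0.1] * 1536]}
node.writeQuestions(question)
assert sink.last is question  # same object, zero copy overhead
```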

@@ -0,0 +1,938 @@
# =============================================================================
Collaborator


This file should not be at test/nodes but at nodes/test/nodes

# =============================================================================

import copy
import logging
Collaborator


Use rocketlib logging functions instead of logging

RocketRide nodes use warning(), debug(), and error() from rocketlib — not Python's standard logging module. Messages routed through rocketlib are visible in the pipeline UI and engine logs; messages sent to a stdlib logger are not.

Replace:

import logging
logger = logging.getLogger(__name__)
...
logger.warning('Guardrails input blocked: %s — %s', violation['rule'], violation['details'])

With:

from rocketlib import warning
...
warning(f'Guardrails input blocked: {violation["rule"]} — {violation["details"]}')

Apply to both IInstance.py and guardrails_engine.py.

return

# Run input guardrails
result = engine.evaluate(full_text, mode='input')
Collaborator


If IGlobal.beginGlobal() failed before setting self.engine, this raises AttributeError with no clean error. Add a guard:

engine = self.IGlobal.engine
if engine is None:
    self.instance.writeQuestions(question)
    return

Same applies in writeAnswers().

found_count = sum(1 for w in meaningful_words if w in combined_sources)
coverage = found_count / len(meaningful_words) if meaningful_words else 1.0

if coverage < 0.3:
Collaborator


This coverage threshold is a very important magic number! It should either be

  • set as a constant, or
  • provided as a setting

# Extract meaningful words (3+ chars, not stop words)
words = re.findall(r'\b[a-zA-Z]{3,}\b', sentence.lower())
# Filter common stop words
stop_words = {
Collaborator


This should be a class-level constant.
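Both review points on this method can be folded into one sketch, with the threshold and the stop words hoisted to class-level constants. The 0.3 value and the word-extraction regex mirror the diff; the class name and abbreviated word list are illustrative:

```python
import re

class GroundingCheck:
    # Hoisted per review: a named constant instead of a magic 0.3, and
    # stop words as an immutable class-level frozenset.
    HALLUCINATION_COVERAGE_THRESHOLD = 0.3
    STOP_WORDS = frozenset({'the', 'and', 'for', 'that', 'with', 'this', 'are'})

    def coverage(self, sentence: str, combined_sources: str) -> float:
        """Fraction of meaningful sentence words found in the sources."""
        words = re.findall(r'\b[a-zA-Z]{3,}\b', sentence.lower())
        meaningful = [w for w in words if w not in self.STOP_WORDS]
        if not meaningful:
            return 1.0
        found = sum(1 for w in meaningful if w in combined_sources)
        return found / len(meaningful)

    def is_grounded(self, sentence: str, sources: str) -> bool:
        return self.coverage(sentence, sources.lower()) >= self.HALLUCINATION_COVERAGE_THRESHOLD

check = GroundingCheck()
assert check.is_grounded('The rocket engine burns methane', 'rocket engine methane fuel')
assert not check.is_grounded('Quantum telepathy network online', 'rocket engine methane fuel')
```

Named constants also let a future config knob override the threshold without touching the method body.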

// Capabilities are flags that change the behavior of the underlying
// engine
//
"capabilities": [],
Collaborator


Please add "experimental" in "capabilities"

@nihalnihalani
Contributor Author

Hi team — all CodeRabbit fixes have been addressed and 92 tests pass. Would appreciate a re-review when you have a chance. Thanks!

… import

- Move inline stop_words set from check_hallucination() to a STOP_WORDS
  frozenset class constant per asclearuc's review feedback
- Remove unused `from rocketlib import warning` in guardrails_engine.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nihalnihalani
Contributor Author

@asclearuc All 6 review items have been addressed:

  1. Unused exception class — GuardrailsViolation removed (earlier commit)
  2. Unnecessary copy.deepcopy — Removed from both writeQuestions and writeAnswers (earlier commit)
  3. Test file location — Moved from test/nodes/test_guardrails.py to nodes/test/nodes/test_guardrails.py
  4. Replace import logging — Now uses from rocketlib import warning in IInstance.py; removed unused warning import from guardrails_engine.py
  5. Guard for uninitialized engine — Both writeQuestions and writeAnswers now check if engine is None and pass through (earlier commit)
  6. Extract magic numbers — Stop words set moved from inline in check_hallucination() to class-level STOP_WORDS frozenset constant. TOKEN_ESTIMATE_MULTIPLIER, HALLUCINATION_COVERAGE_THRESHOLD, and MIN_SENTENCE_LENGTH were already class constants.

Could you re-review when you get a chance?

nihalnihalani and others added 2 commits April 6, 2026 11:41
- Replace stdlib logging with rocketlib warning() for pipeline UI visibility
- Remove copy.deepcopy — single-output pipe node, no downstream mutation risk
- Add engine None guard in writeQuestions/writeAnswers for safe passthrough
- Remove unused GuardrailsViolation exception class
- Update test tuple unpackings and warn mode test after class removal
- Fix test module path after relocation to nodes/test/nodes/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nihalnihalani
Contributor Author

@asclearuc All 6 review items addressed in commit 2999955:

  1. Unused exception removed — GuardrailsViolation class deleted from guardrails_engine.py
  2. copy.deepcopy removed — Single-output pipe node, objects passed directly now
  3. Test file moved — Relocated from test/nodes/ to nodes/test/nodes/ (done in prior commit)
  4. rocketlib logging — Replaced import logging / logger.warning() with from rocketlib import warning / warning() in IInstance.py
  5. Engine None guard — Added if engine is None: forward and return in both writeQuestions() and writeAnswers()
  6. Magic numbers — Extracted TOKEN_ESTIMATE_MULTIPLIER, HALLUCINATION_COVERAGE_THRESHOLD, MIN_SENTENCE_LENGTH as class constants (done in prior commit), plus STOP_WORDS to class-level frozenset

All 90 tests pass. Would appreciate a re-review when convenient. Thanks!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
nodes/test/nodes/test_guardrails.py (3)

374-380: ⚠️ Potential issue | 🟡 Minor

Assert every PII type this case is supposed to cover.

Right now the test only proves email appears in the report. If phone_us or ip_address stops being reported, this still passes.

💡 Suggested fix
         assert not result['passed']
         # Should report multiple types
         assert 'email' in result['details']
+        assert 'phone_us' in result['details']
+        assert 'ip_address' in result['details']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/nodes/test_guardrails.py` around lines 374 - 380, The
test_multiple_pii_types test only asserts that 'email' is present in
result['details']; update the test to assert that all expected PII types are
reported by the engine.check_pii_leak call (e.g., check that 'email', 'phone_us'
and 'ip_address' appear in result['details'] and that result['passed'] is
False), so if any of those types stop being detected the test will fail; modify
the assertions in test_multiple_pii_types to explicitly verify each expected key
in result['details'].

623-626: ⚠️ Potential issue | 🟠 Major

Limit the sys.modules cleanup to the exact injected names.

Lines 624-626 and 672-674 delete every module containing 'guardrails'. That can evict this test module or unrelated packages and leave later tests with a polluted import cache.

💡 Suggested fix
         saved = {}
+        saved_guardrails = {}
+        injected_guardrails_modules = (
+            'guardrails',
+            'guardrails.guardrails_engine',
+            'guardrails.IGlobal',
+            'guardrails.IInstance',
+        )
         stubs = {
             'rocketlib': types.ModuleType('rocketlib'),
             'rocketlib.error': types.ModuleType('rocketlib.error'),
@@
         try:
             # Force reimport — clean out any cached guardrails modules
-            for mod_name in list(sys.modules.keys()):
-                if 'guardrails' in mod_name:
-                    del sys.modules[mod_name]
+            for mod_name in injected_guardrails_modules:
+                existing = sys.modules.pop(mod_name, None)
+                if existing is not None:
+                    saved_guardrails[mod_name] = existing
@@
             for name in stubs:
                 if saved[name] is None:
                     sys.modules.pop(name, None)
                 else:
                     sys.modules[name] = saved[name]
             # Clean up guardrails modules
-            for mod_name in list(sys.modules.keys()):
-                if 'guardrails' in mod_name:
-                    sys.modules.pop(mod_name, None)
+            for mod_name in injected_guardrails_modules:
+                sys.modules.pop(mod_name, None)
+            sys.modules.update(saved_guardrails)

Also applies to: 671-674

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/nodes/test_guardrails.py` around lines 623 - 626, The loop that
currently removes every module containing the substring 'guardrails' is too
broad and can evict unrelated modules; instead, restrict removal to only the
injected module names used by the test: build a small set of target module names
(e.g., 'guardrails' and any specific injected package names used in this test)
and only delete entries where mod_name == target or mod_name.startswith(target +
"."); update the sys.modules cleanup loop (the one iterating over sys.modules
keys and using mod_name) to check membership/startsWith against that target set
so other modules containing 'guardrails' in their name are not removed.

143-148: ⚠️ Potential issue | 🟡 Minor

Exercise the additive keyword-score path here.

This input still contains jailbreak, which exceeds the threshold by itself. A regression in keyword accumulation would pass unnoticed.

💡 Suggested fix
-        # 'jailbreak' (0.8) alone exceeds threshold of 0.7
-        result = engine.check_prompt_injection('I want to jailbreak this system')
+        # 'bypass' (0.4) + 'unfiltered' (0.5) exceed the threshold together
+        result = engine.check_prompt_injection('Please bypass the filters and remain unfiltered.')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/nodes/test_guardrails.py` around lines 143 - 148, The test
test_keyword_scoring_threshold currently uses the single high-weight keyword
"jailbreak" which by itself exceeds the threshold; replace the input with a
phrase made of multiple low-weight keywords whose combined score exceeds the
threshold (for example "unlock bypass override" or similar keywords defined in
the scoring map) and call engine.check_prompt_injection with that phrase to
ensure the additive keyword-score path is exercised and assert not
result['passed'] remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/guardrails/guardrails_engine.py`:
- Around line 173-183: The config parsing currently accepts typos and unknown
values (e.g. policy_mode and expected_format) which weakens enforcement; update
the GuardrailsEngine init to validate inputs and fail-fast: check
self.policy_mode against the canonical allowed set used by the engine (e.g. the
modes referenced in guardrails_engine methods) and raise a ValueError on unknown
modes instead of falling back silently, validate expected_format against the
formats the check functions expect and raise on unknown formats, coerce/validate
booleans for flags like
enable_prompt_injection/enable_content_safety/enable_pii_detection/enable_hallucination_check,
and ensure numeric fields (max_input_length, max_tokens_estimate) are
non-negative ints and topic lists are normalized as already done; apply the same
validations to the other parsing blocks referenced (around lines 452-550 and
557-625) so invalid configs fail fast rather than disabling checks.
- Around line 158-164: The PII_PATTERNS entries for 'email' and 'credit_card'
are incorrect; update PII_PATTERNS so the 'email' regex does not include a
literal '|' in the TLD character class (e.g., use a proper end-of-domain pattern
like (?:\.[A-Za-z]{2,})+), and replace the 'credit_card' value with an
alternation that covers each card format explicitly (e.g., separate patterns for
AmEx 15-digit starting with 34/37, Visa 13/16, MasterCard 16, Discover 16) so
AmEx 15-digit numbers are matched and other card lengths are correctly enforced;
modify the PII_PATTERNS dictionary (keys 'email' and 'credit_card') accordingly.

In `@nodes/test/nodes/test_guardrails.py`:
- Around line 68-70: The test import fails because _load_engine_module() does
not expose GuardrailsViolation while the test expects it; update the engine API
or the tests to be consistent: either re-export the exception class from the
engine module (add a GuardrailsViolation symbol in guardrails_engine.py and
ensure _load_engine_module() returns it alongside GuardrailsEngine) or
remove/update the stale top-level imports/tests that reference
GuardrailsViolation (the import statements using GuardrailsEngine and
GuardrailsViolation and any tests asserting on GuardrailsViolation); adjust only
the symbol exposure so tests and the engine API match.
- Around line 897-906: The test's required_knobs set is missing the
topic-related knobs that GuardrailsEngine.__init__ actually reads; update the
required_knobs in the test to include 'blocked_topics' and 'allowed_topics' so
the test fails if a custom profile drops those fields—specifically modify the
required_knobs definition near the test for GuardrailsEngine to add
'blocked_topics' and 'allowed_topics' to the set so GuardrailsEngine.__init__'s
expectations match the test's required inputs.

---

Duplicate comments:
In `@nodes/test/nodes/test_guardrails.py`:
- Around line 374-380: The test_multiple_pii_types test only asserts that
'email' is present in result['details']; update the test to assert that all
expected PII types are reported by the engine.check_pii_leak call (e.g., check
that 'email', 'phone_us' and 'ip_address' appear in result['details'] and that
result['passed'] is False), so if any of those types stop being detected the
test will fail; modify the assertions in test_multiple_pii_types to explicitly
verify each expected key in result['details'].
- Around line 623-626: The loop that currently removes every module containing
the substring 'guardrails' is too broad and can evict unrelated modules;
instead, restrict removal to only the injected module names used by the test:
build a small set of target module names (e.g., 'guardrails' and any specific
injected package names used in this test) and only delete entries where mod_name
== target or mod_name.startswith(target + "."); update the sys.modules cleanup
loop (the one iterating over sys.modules keys and using mod_name) to check
membership/startsWith against that target set so other modules containing
'guardrails' in their name are not removed.
- Around line 143-148: The test test_keyword_scoring_threshold currently uses
the single high-weight keyword "jailbreak" which by itself exceeds the
threshold; replace the input with a phrase made of multiple low-weight keywords
whose combined score exceeds the threshold (for example "unlock bypass override"
or similar keywords defined in the scoring map) and call
engine.check_prompt_injection with that phrase to ensure the additive
keyword-score path is exercised and assert not result['passed'] remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 21e0eb4c-a931-44bf-a538-f15a933fde6a

📥 Commits

Reviewing files that changed from the base of the PR and between 05e896d and fb553b8.

📒 Files selected for processing (2)
  • nodes/src/nodes/guardrails/guardrails_engine.py
  • nodes/test/nodes/test_guardrails.py

Comment on lines +158 to +164
PII_PATTERNS: Dict[str, re.Pattern] = {
'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'),
'phone_us': re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'),
'ssn': re.compile(r'\b(?!000|666|9\d{2})\d{3}[-\s]?(?!00)\d{2}[-\s]?(?!0000)\d{4}\b'),
'credit_card': re.compile(r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'),
'ip_address': re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
}
Contributor


⚠️ Potential issue | 🟠 Major

Correct the PII regexes before relying on them for blocking.

The credit_card pattern advertises AmEx via 3[47] but still requires a 16-digit layout, so real 15-digit AmEx numbers slip through. The email TLD class also accepts a literal |, which widens false positives.

💡 Suggested fix
     PII_PATTERNS: Dict[str, re.Pattern] = {
-        'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'),
+        'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
         'phone_us': re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'),
         'ssn': re.compile(r'\b(?!000|666|9\d{2})\d{3}[-\s]?(?!00)\d{2}[-\s]?(?!0000)\d{4}\b'),
-        'credit_card': re.compile(r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'),
+        'credit_card': re.compile(
+            r'\b(?:'
+            r'3[47]\d{2}[- ]?\d{6}[- ]?\d{5}'
+            r'|(?:4\d{3}|5[1-5]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}'
+            r')\b'
+        ),
         'ip_address': re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
     }
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 158-164: Mutable default value for class attribute

(RUF012)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/guardrails/guardrails_engine.py` around lines 158 - 164, The
PII_PATTERNS entries for 'email' and 'credit_card' are incorrect; update
PII_PATTERNS so the 'email' regex does not include a literal '|' in the TLD
character class (e.g., use a proper end-of-domain pattern like
(?:\.[A-Za-z]{2,})+), and replace the 'credit_card' value with an alternation
that covers each card format explicitly (e.g., separate patterns for AmEx
15-digit starting with 34/37, Visa 13/16, MasterCard 16, Discover 16) so AmEx
15-digit numbers are matched and other card lengths are correctly enforced;
modify the PII_PATTERNS dictionary (keys 'email' and 'credit_card') accordingly.
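The AmEx gap and the corrected pattern can be verified directly. The card numbers below are standard published test numbers, not real accounts:

```python
import re

# Old pattern from the PR: advertises AmEx (3[47]) but enforces a
# 4-4-4-4 digit layout, so 15-digit AmEx numbers never match.
OLD_CC = re.compile(
    r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))'
    r'[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'
)
# Corrected pattern: explicit 4-6-5 alternative for AmEx.
NEW_CC = re.compile(
    r'\b(?:'
    r'3[47]\d{2}[- ]?\d{6}[- ]?\d{5}'
    r'|(?:4\d{3}|5[1-5]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}'
    r')\b'
)
# Email with the corrected TLD class (no literal '|' inside it).
NEW_EMAIL = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

amex = '3782 822463 10005'  # 15-digit AmEx test number
assert OLD_CC.search(amex) is None        # the bug: AmEx slips through
assert NEW_CC.search(amex) is not None    # fixed by the 4-6-5 branch
assert NEW_CC.search('4111 1111 1111 1111') is not None  # Visa 16 still matches
assert NEW_EMAIL.search('user@example.com') is not None
```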

Comment on lines +173 to +183
self.policy_mode = config.get('policy_mode', 'warn')
self.enable_prompt_injection = config.get('enable_prompt_injection', True)
self.enable_content_safety = config.get('enable_content_safety', True)
self.enable_pii_detection = config.get('enable_pii_detection', True)
self.enable_hallucination_check = config.get('enable_hallucination_check', True)
self.max_input_length = config.get('max_input_length', 0)
self.max_tokens_estimate = config.get('max_tokens_estimate', 0)
self.blocked_topics = [t.strip().lower() for t in config.get('blocked_topics', []) if t.strip()]
self.allowed_topics = [t.strip().lower() for t in config.get('allowed_topics', []) if t.strip()]
self.expected_format = config.get('expected_format', '')

Contributor


⚠️ Potential issue | 🟠 Major

Reject unsupported config values instead of silently weakening enforcement.

A typo in policy_mode currently falls through to 'log', an unknown mode skips every check and returns passed=True, and an unknown expected_format is treated as a successful check. In a safety node, those should fail fast instead of turning protections off.

💡 Suggested fix
-        self.policy_mode = config.get('policy_mode', 'warn')
+        self.policy_mode = str(config.get('policy_mode', 'warn')).strip().lower()
+        if self.policy_mode not in {'block', 'warn', 'log'}:
+            raise ValueError(f'Unsupported policy_mode: {self.policy_mode}')
-        return {
-            'rule': 'format_compliance',
-            'passed': True,
-            'severity': 'low',
-            'details': f'Unknown format "{expected_format}"; skipping check',
-        }
+        return {
+            'rule': 'format_compliance',
+            'passed': False,
+            'severity': 'medium',
+            'details': f'Unsupported expected_format: {expected_format}',
+        }
-        if mode == 'input':
+        normalized_mode = mode.strip().lower()
+
+        if normalized_mode == 'input':
             # Input guardrails
             if self.enable_prompt_injection:
                 results.append(self.check_prompt_injection(text))
@@
-        elif mode == 'output':
+        elif normalized_mode == 'output':
             # Output guardrails
             if self.enable_hallucination_check:
                 source_docs = context.get('source_documents', [])
                 results.append(self.check_hallucination(text, source_docs))
@@
             expected_fmt = context.get('expected_format', self.expected_format)
             if expected_fmt:
                 results.append(self.check_format_compliance(text, expected_fmt))
+        else:
+            raise ValueError(f'Unsupported guardrails mode: {mode}')
 
         # Aggregate results

Also applies to: 452-550, 557-625

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/guardrails/guardrails_engine.py` around lines 173 - 183, The
config parsing currently accepts typos and unknown values (e.g. policy_mode and
expected_format) which weakens enforcement; update the GuardrailsEngine init to
validate inputs and fail-fast: check self.policy_mode against the canonical
allowed set used by the engine (e.g. the modes referenced in guardrails_engine
methods) and raise a ValueError on unknown modes instead of falling back
silently, validate expected_format against the formats the check functions
expect and raise on unknown formats, coerce/validate booleans for flags like
enable_prompt_injection/enable_content_safety/enable_pii_detection/enable_hallucination_check,
and ensure numeric fields (max_input_length, max_tokens_estimate) are
non-negative ints and topic lists are normalized as already done; apply the same
validations to the other parsing blocks referenced (around lines 452-550 and
557-625) so invalid configs fail fast rather than disabling checks.

Comment on lines +897 to +906
required_knobs = {
'policy_mode',
'enable_prompt_injection',
'enable_content_safety',
'enable_pii_detection',
'enable_hallucination_check',
'max_input_length',
'max_tokens_estimate',
'expected_format',
}

⚠️ Potential issue | 🟡 Minor

The “all engine knobs” set omits the topic-list knobs.

GuardrailsEngine.__init__ reads blocked_topics and allowed_topics, but required_knobs does not. As written, this test can still pass after dropping the topic-restriction inputs from the custom profile.

💡 Suggested fix
         required_knobs = {
             'policy_mode',
             'enable_prompt_injection',
             'enable_content_safety',
             'enable_pii_detection',
             'enable_hallucination_check',
             'max_input_length',
             'max_tokens_estimate',
+            'blocked_topics',
+            'allowed_topics',
             'expected_format',
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/nodes/test_guardrails.py` around lines 897 - 906, The test's
required_knobs set is missing the topic-related knobs that
GuardrailsEngine.__init__ actually reads; update the required_knobs in the test
to include 'blocked_topics' and 'allowed_topics' so the test fails if a custom
profile drops those fields—specifically modify the required_knobs definition
near the test for GuardrailsEngine to add 'blocked_topics' and 'allowed_topics'
to the set so GuardrailsEngine.__init__'s expectations match the test's required
inputs.
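The strengthened coverage check could look like the sketch below. The knob names mirror the set in the review (with the two topic-list knobs added); `sample_profile` is an illustrative custom profile, not one from the repo's test suite:

```python
# Hypothetical sketch: verify a custom profile supplies every knob the
# engine reads. required_knobs mirrors the review's set plus the two
# topic-list knobs; sample_profile is invented for illustration.
required_knobs = {
    'policy_mode',
    'enable_prompt_injection',
    'enable_content_safety',
    'enable_pii_detection',
    'enable_hallucination_check',
    'max_input_length',
    'max_tokens_estimate',
    'blocked_topics',
    'allowed_topics',
    'expected_format',
}

sample_profile = {
    'policy_mode': 'block',
    'enable_prompt_injection': True,
    'enable_content_safety': True,
    'enable_pii_detection': True,
    'enable_hallucination_check': True,
    'max_input_length': 4096,
    'max_tokens_estimate': 1024,
    'blocked_topics': ['weapons'],
    'allowed_topics': [],
    'expected_format': 'json',
}

missing = required_knobs - sample_profile.keys()
assert not missing, f'profile is missing engine knobs: {sorted(missing)}'
```

Because the set difference is computed against everything the engine reads, dropping `blocked_topics` or `allowed_topics` from the profile now fails the assertion instead of passing silently.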

@nihalnihalani
Contributor Author

All 6 review items addressed (unused exception removed, deepcopy eliminated, rocketlib logging, engine null guard, constants extracted). 90 tests pass. Ready for re-review.

@github-actions

github-actions bot commented Apr 7, 2026

No description provided.

Collaborator

@asclearuc asclearuc left a comment


Thank you for this contribution!

It is marked as experimental and is approved.

@nihalnihalani
Contributor Author

Merge Request

This PR has:

  • 2 human approvals (asclearuc + stepmikhaylov)
  • All 15 review issues addressed and confirmed
  • 90+ tests passing
  • All CI checks green (Ubuntu, Windows, macOS)
  • CodeRabbit passed

All review feedback has been implemented. Could a maintainer with merge permissions please merge this? The merge state shows BLOCKED, which appears to be due to a branch protection rule; it may need an admin override or an additional required reviewer.

@asclearuc asclearuc enabled auto-merge (squash) April 9, 2026 14:32

Labels

module:nodes Python pipeline nodes


3 participants