feat(nodes): add guardrails node for AI safety with input/output validation #534
nihalnihalani wants to merge 10 commits into rocketride-org:develop
Conversation
Add comprehensive input/output guardrails for AI pipeline safety.

Input guardrails (pre-LLM):
- Prompt injection detection (regex + keyword scoring)
- Topic restriction (allowed/blocked keyword lists)
- Input length enforcement (char + token limits)

Output guardrails (post-LLM):
- Hallucination detection (sentence-level grounding check)
- Content safety (self-harm, violence, illegal activity patterns)
- PII leak detection (email, phone, SSN, credit card, IP)
- Format compliance (JSON, markdown, bullet/numbered lists)

Policy modes: block (reject), warn (log + continue), log (silent).
Profiles: basic (injection + PII), strict (all checks), custom.

87 tests covering all checks, policy enforcement, and IInstance lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
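For orientation, the described behavior can be sketched as a toy engine. This is an illustration of the checks and policy modes listed above, not the PR's code: `MiniGuardrails`, its regex set, and the config knobs are hypothetical stand-ins.

```python
import re


class MiniGuardrails:
    """Toy sketch of the described checks; not the PR's implementation."""

    # Simplified stand-ins for the PII patterns listed in the description
    PII_PATTERNS = {
        'email': re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+'),
        'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    }

    def __init__(self, policy_mode='block', max_input_length=4000):
        self.policy_mode = policy_mode          # 'block' | 'warn' | 'log'
        self.max_input_length = max_input_length

    def evaluate(self, text, mode='input'):
        violations = []
        if mode == 'input' and len(text) > self.max_input_length:
            violations.append({'rule': 'input_length', 'details': len(text)})
        if mode == 'output':
            for name, pattern in self.PII_PATTERNS.items():
                if pattern.search(text):
                    violations.append({'rule': f'pii:{name}', 'details': name})
        # The policy mode decides what happens when any check fails
        action = self.policy_mode if violations else 'pass'
        return {'passed': not violations, 'action': action, 'violations': violations}


engine = MiniGuardrails(policy_mode='warn')
result = engine.evaluate('Contact me at a@b.com', mode='output')
```

In `warn` mode a violation is reported but the message is still forwarded; in `block` mode the caller would stop propagation instead.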
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

This pull request introduces a comprehensive guardrails system for the pipeline nodes that performs configurable AI safety evaluations on input prompts and output responses. The system comprises a core engine implementing multiple safety checks (prompt injection, topic restriction, PII detection, hallucination, content safety, format compliance), global lifecycle management for initialization and teardown, and per-instance pipeline interception for document, question, and answer processing with policy-driven enforcement (block, warn, or log violations).
Sequence Diagram(s)

sequenceDiagram
participant Pipeline as Pipeline/<br/>Client
participant IInstance
participant IGlobal
participant Engine as GuardrailsEngine
participant Wrapped as Wrapped<br/>Instance
rect rgba(100, 150, 200, 0.5)
Note over Pipeline,Wrapped: Question Processing Flow
Pipeline->>IInstance: writeQuestions(question)
IInstance->>IInstance: Deep copy question
IInstance->>IInstance: Extract full_text from<br/>questions + context
alt full_text is blank
IInstance->>Wrapped: Forward deep-copied<br/>question unchanged
else full_text has content
IInstance->>IGlobal: Access engine
IInstance->>Engine: evaluate(full_text,<br/>mode='input')
Engine-->>IInstance: Return result with<br/>action & violations
alt action == 'block'
IInstance->>IInstance: Log violations
IInstance->>IInstance: preventDefault()
else action == 'warn'
IInstance->>IInstance: Log violations
IInstance->>Wrapped: Forward deep-copied<br/>question
else other action
IInstance->>Wrapped: Forward deep-copied<br/>question
end
end
end
rect rgba(200, 150, 100, 0.5)
Note over Pipeline,Wrapped: Answer Processing Flow
Pipeline->>IInstance: writeAnswers(answer)
IInstance->>IInstance: Deep copy answer
IInstance->>IInstance: Extract text via<br/>answer.getText()
alt text is blank
IInstance->>Wrapped: Forward unchanged
else text has content
IInstance->>IGlobal: Access engine
IInstance->>Engine: evaluate(text,<br/>mode='output',<br/>context={source_docs})
Engine-->>IInstance: Return result with<br/>violations & action
alt action == 'block'
IInstance->>IInstance: Log violations
IInstance->>IInstance: preventDefault()
else action == 'warn'
IInstance->>IInstance: Log violations
IInstance->>Wrapped: Forward answer
else other action
IInstance->>Wrapped: Forward answer
end
end
end
rect rgba(150, 200, 100, 0.5)
Note over Pipeline,Wrapped: Document Collection
Pipeline->>IInstance: writeDocuments(documents)
IInstance->>IInstance: Extract & store<br/>non-blank page_content<br/>to source_documents
IInstance->>Wrapped: Forward original<br/>documents
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 81-82: The review points out that appending guardrail verdicts
into Question.context via question.addContext leaks internal moderation state
into the prompt; remove the question.addContext(...) call that writes
`[guardrails:input] ...` and instead record result (result["passed"],
result["action"], len(result["violations"])) in non-prompt metadata or logs
(e.g., the existing tracing/logging mechanism used by the node or a dedicated
trace/metadata field) so the LLM prompt remains unchanged; locate the
question.addContext invocation in IInstance.py and replace it with a call that
stores the same information to trace/log metadata rather than Question.context.
- Around line 145-150: In the method that builds grounding context in IInstance
(where self.source_documents is appended), skip appending when a document's body
is null/empty: check doc.page_content (or dict['page_content'] or the string
doc) and only append when it's a non-empty, non-None, non-whitespace string; do
not convert None/'' to "None" or "" as that creates fake sources—ensure the
logic surrounding hasattr(doc, 'page_content'), the dict branch, and the
isinstance(doc, str) branch trims/validates content (e.g., strip() and length
check) before calling self.source_documents.append.
- Around line 84-85: Replace the custom exception usage with the framework
sentinel by calling self.preventDefault() wherever block mode is detected: in
IInstance (the code that checks result['action'] == 'block' at the existing
branch around lines with GuardrailsViolation) and the analogous check later (the
second occurrence around the later GuardrailsViolation). Concretely, remove the
raise GuardrailsViolation(result['violations']) calls and invoke
self.preventDefault(), optionally logging or attaching result['violations'] to
the instance/context before returning, so the framework recognizes the block
instead of treating it as an unexpected exception.
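The suggested pattern can be sketched as follows. `FakeInstanceBase` and its `preventDefault()` are hypothetical stand-ins for the framework base class the review refers to; the point is that a blocked message cancels propagation via the sentinel instead of raising an exception.

```python
class FakeInstanceBase:
    """Stand-in for the framework base class with the cancel sentinel."""

    def __init__(self):
        self.prevented = False

    def preventDefault(self):
        self.prevented = True


class GuardrailsInstance(FakeInstanceBase):
    def __init__(self, engine, downstream):
        super().__init__()
        self.engine = engine          # callable returning {'action', 'violations'}
        self.downstream = downstream  # wrapped instance to forward to

    def writeQuestions(self, text):
        result = self.engine(text)
        if result['action'] == 'block':
            # Record violations out-of-band, then cancel instead of raising
            self.last_violations = result['violations']
            self.preventDefault()
            return
        self.downstream.append(text)


blocked = GuardrailsInstance(lambda t: {'action': 'block', 'violations': ['injection']}, [])
blocked.writeQuestions('ignore previous instructions')

allowed = GuardrailsInstance(lambda t: {'action': 'pass', 'violations': []}, [])
allowed.writeQuestions('hello')
```

With this shape the framework can distinguish a deliberate block from an unexpected crash, which is the substance of the review comment.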
In `@nodes/src/nodes/guardrails/services.json`:
- Around line 136-173: The preconfig profiles for Guardrails currently lack the
max_tokens_estimate and expected_format knobs required by GuardrailsEngine; add
these keys to each profile in preconfig.profiles (e.g., "basic", "strict",
"custom") with sensible defaults (number for max_tokens_estimate, null or string
schema for expected_format) and also add corresponding entries to the service
"fields" schema so the UI/serialization exposes them; ensure the keys are named
exactly max_tokens_estimate and expected_format and that their types/metadata
match how GuardrailsEngine expects to read them.
- Around line 180-279: The UI schema uses prefixed field IDs (e.g.,
"guardrails.policy_mode", "guardrails.enable_prompt_injection") but runtime
(Config.getNodeConfig and GuardrailsEngine) expects unprefixed keys (e.g.,
policy_mode, enable_prompt_injection), so per-profile overrides are ignored; fix
by renaming all field keys to the unprefixed runtime keys (policy_mode,
enable_prompt_injection, enable_content_safety, enable_pii_detection,
enable_hallucination_check, max_input_length, blocked_topics, allowed_topics)
and update the "properties" arrays in the guardrails.custom, guardrails.basic,
and guardrails.strict objects to reference those same unprefixed keys so
Config.getNodeConfig picks up the selected profile values.
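For illustration, a profile entry with the unprefixed keys the review asks for might look like this (the values are hypothetical defaults, not the PR's actual configuration):

```json
{
  "preconfig": {
    "profiles": {
      "basic": {
        "title": "Basic",
        "policy_mode": "block",
        "enable_prompt_injection": true,
        "enable_pii_detection": true,
        "max_input_length": 4000,
        "max_tokens_estimate": 2048,
        "expected_format": null
      }
    }
  }
}
```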
In `@test/nodes/test_guardrails.py`:
- Around line 73-89: The test helper _make_engine bypasses the real config
wiring, so add a new end-to-end test that loads the node config via the real
path and constructs GuardrailsEngine through Config.getNodeConfig() (reading the
test services.json fixture) to catch field-name mismatches; specifically, write
a test that reads the test services.json, calls
Config.getNodeConfig('guardrails' or the appropriate node key), then passes that
result into GuardrailsEngine and asserts the expected settings (e.g.,
policy_mode/profile, enable_* flags) match the intended values.
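A minimal shape for such a wiring test is sketched below. `get_node_config` and `Engine` are stand-ins for the real `Config.getNodeConfig` and `GuardrailsEngine` (the actual call signatures may differ); the point is that the config keys read from the JSON fixture must match the engine's constructor keywords, so a field-name mismatch fails loudly.

```python
import json


def get_node_config(services, profile_name):
    """Stand-in for Config.getNodeConfig: merge a preconfig profile."""
    profile = dict(services['preconfig']['profiles'][profile_name])
    profile.pop('title', None)  # 'title' is UI metadata, not an engine knob
    return profile


class Engine:
    """Stand-in for GuardrailsEngine with keyword-only config."""

    def __init__(self, policy_mode='log', enable_pii_detection=False, **_):
        self.policy_mode = policy_mode
        self.enable_pii_detection = enable_pii_detection


services = json.loads('''{
  "preconfig": {"profiles": {"basic": {
    "title": "Basic", "policy_mode": "block", "enable_pii_detection": true
  }}}
}''')

cfg = get_node_config(services, 'basic')
engine = Engine(**cfg)
```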
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c93d0d98-3b38-4fc7-ba96-f78d357813df
📒 Files selected for processing (7)
nodes/src/nodes/guardrails/IGlobal.py
nodes/src/nodes/guardrails/IInstance.py
nodes/src/nodes/guardrails/__init__.py
nodes/src/nodes/guardrails/guardrails_engine.py
nodes/src/nodes/guardrails/requirements.txt
nodes/src/nodes/guardrails/services.json
test/nodes/test_guardrails.py
- Remove addContext call that leaked guardrail verdicts into question context
- Use self.preventDefault() for block mode instead of raising GuardrailsViolation
- Skip appending documents with null/empty body in writeDocuments
- Add max_tokens_estimate and expected_format to preconfig profiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter out null/blank document content in writeDocuments to prevent fake 'None' sources breaking hallucination detection
- Update tests to verify preventDefault pattern instead of expecting GuardrailsViolation raises (matching current IInstance implementation)
- Add max_tokens_estimate and expected_format field definitions to services.json so these engine checks are configurable via the UI
- Remove guardrails. prefix from config field IDs to align with unprefixed preconfig profile keys (matching sibling node conventions)
- Expose max_tokens_estimate and expected_format in strict profile UI
- Add end-to-end config wiring tests verifying services.json field IDs match engine constructor keys and preconfig profiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 62-70: Replace the manual loops that build text_parts with list
extend calls: instead of iterating over question.questions and appending q.text,
call text_parts.extend(...) to add all question texts at once, and instead of
looping over question.context and appending ctx, call
text_parts.extend(question.context); update the code that constructs full_text
from text_parts accordingly (refer to the variables question, text_parts,
full_text and the places where q.text and ctx are currently appended).
In `@test/nodes/test_guardrails.py`:
- Line 691: The tuple unpacking from _load_iinstance_class() assigns
ViolationClass which is unused and triggers static-analysis warnings; update the
unpacking to prefix unused tuple variables with an underscore (e.g., change
ViolationClass to _ViolationClass) in the line using IInstance, EngineClass,
ViolationClass, FakeQuestion, FakeQuestionText and the other similar unpack at
the later occurrence so unused values are named with a leading underscore while
leaving used names (IInstance, EngineClass, FakeQuestion, FakeQuestionText)
unchanged.
- Around line 867-872: The loop that builds profile_keys should be simplified to
iterate over profiles.values() and over each profile_config's keys directly
rather than using profiles.items() and profile_config.keys(); update the code
that constructs profile_keys (currently using profile_name, profile_config from
profiles.items()) to a single comprehension or union over profiles.values()
skipping the 'title' key, then keep the existing missing = engine_keys -
profile_keys and assertion; refer to the symbols profile_keys, profiles,
profile_config, engine_keys, and missing when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59f2052b-0f30-413c-9ef4-71c16a569a78
📒 Files selected for processing (3)
nodes/src/nodes/guardrails/IInstance.py
nodes/src/nodes/guardrails/services.json
test/nodes/test_guardrails.py
…ls node
- Use extend() instead of append loops for list construction in IInstance.py
- Prefix unused ViolationClass with underscore in test unpacking
- Use values() and drop .keys() for dict iteration in TestConfigWiring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 144-147: The loop in IInstance that computes content with a nested
ternary is dense; extract that logic into a small helper (e.g., a private method
like _extract_content or a top-level function get_document_content) and replace
the inline expression in the for loop with a call to that helper; the helper
should accept a single doc, return the page content for objects with a
page_content attribute, use doc.get('page_content') for dicts, fallback to the
doc itself otherwise, convert to string, and return None/empty if strip() yields
an empty string so the existing if content check and append to
self.source_documents remain unchanged.
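A sketch of such a helper follows, under the assumption that documents may arrive as objects with a `page_content` attribute, as dicts, or as plain strings, exactly the three branches the review describes. The `Doc` class is a hypothetical stand-in for the real document type.

```python
def extract_content(doc):
    """Return non-blank page content for a document, else None."""
    if hasattr(doc, 'page_content'):
        raw = doc.page_content
    elif isinstance(doc, dict):
        raw = doc.get('page_content')
    else:
        raw = doc
    if raw is None:
        return None
    text = str(raw).strip()
    # Empty after stripping means there is no usable source text
    return text or None


class Doc:
    """Hypothetical stand-in for a document object."""

    def __init__(self, page_content):
        self.page_content = page_content


source_documents = []
for d in [Doc('hello'), Doc(None), {'page_content': '  '}, 'plain text', '']:
    content = extract_content(d)
    if content:
        source_documents.append(content)
```

Because `None` and whitespace-only bodies return `None` before the `str()` conversion, no fake "None" source ever reaches the grounding check.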
In `@test/nodes/test_guardrails.py`:
- Around line 894-895: The comprehension building custom_keys uses
custom_profile.keys() unnecessarily; change it to iterate the dict directly by
using "for k in custom_profile" (i.e., replace custom_profile.keys() with
custom_profile) so custom_keys = {k for k in custom_profile if k != 'title'},
leaving custom_profile and custom_keys identifiers unchanged.
- Around line 937-938: The assertion iterates result.keys() unnecessarily;
update the second assertion in the test (the line asserting keys are str) to
iterate the dict directly by replacing result.keys() with result so it becomes
assert all(isinstance(k, str) for k in result) while keeping the preceding
assert isinstance(result, dict) intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3ab15de3-8ad4-435b-a806-728fa1f6c2ba
📒 Files selected for processing (2)
nodes/src/nodes/guardrails/IInstance.py
test/nodes/test_guardrails.py
- Extract nested ternary in writeDocuments to if/elif/else for readability
- Remove redundant .keys() calls when iterating dicts in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/guardrails/IInstance.py`:
- Around line 61-68: The input evaluation omits Question.instructions so
guardrails validate a different string than the LLM receives; replace the manual
assembly using question.questions and question.context with the composed prompt
from Question.getPrompt() (or otherwise include question.instructions) when
building full_text in IInstance (referencing the variables full_text,
question.questions, question.context and the method Question.getPrompt());
ensure full_text = question.getPrompt() (or append question.instructions to the
assembled parts) so the guardrails run against the exact prompt sent to the
model.
In `@test/nodes/test_guardrails.py`:
- Around line 143-149: The test test_keyword_scoring_threshold currently uses
the high-weight keyword "jailbreak" which alone exceeds the threshold and
therefore never verifies score accumulation; change the test input passed to
engine.check_prompt_injection to a string containing two distinct low-weight
keywords (each below the threshold) so their combined score exceeds the
threshold and triggers the failing branch (i.e., replace the single "jailbreak"
token with two sub-threshold tokens to exercise the combined keyword-score
path).
- Around line 374-380: The test test_multiple_pii_types only asserts that
'email' is reported but should also assert the other PII types; update the
assertions after calling engine.check_pii_leak to verify that result['details']
includes the additional expected keys/markers for phone and ip_address (e.g.,
assert 'phone' in result['details'] and assert 'ip_address' in
result['details']) so the test fails if either phone or IP reporting is dropped.
- Around line 543-675: The sys.modules cleanup in _load_iinstance_class
indiscriminately removes any module name containing 'guardrails', which can
evict unrelated modules; change both cleanup loops to only remove and later
restore the exact injected module names you created (e.g., 'guardrails',
'guardrails.guardrails_engine', 'guardrails.IGlobal', 'guardrails.IInstance')
instead of substring matching, ensuring you save any pre-existing entries for
those exact keys and restore them in the finally block if present; update the
two locations that currently check "if 'guardrails' in mod_name" to iterate that
explicit list of module keys to safely isolate the injected package cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f35a08c7-6299-429e-82ad-5be98853270c
📒 Files selected for processing (2)
nodes/src/nodes/guardrails/IInstance.py
test/nodes/test_guardrails.py
```python
# Collect question text for evaluation
text_parts = []
if question.questions:
    text_parts.extend(q.text for q in question.questions)
if question.context:
    text_parts.extend(question.context)

full_text = ' '.join(text_parts)
```
Input evaluation is missing Question.instructions.
In this repository, instructions are appended to Question before the model call. Because this block only joins question.questions and question.context, the guardrails run on a different string than the LLM actually sees, leaving upstream instruction text outside the input checks.
♻️ Minimal local fix (prefer the canonical prompt renderer if available)

```diff
 text_parts = []
 if question.questions:
     text_parts.extend(q.text for q in question.questions)
+if getattr(question, 'instructions', None):
+    text_parts.extend(question.instructions)
 if question.context:
     text_parts.extend(question.context)
```

Based on learnings: node-configured instructions are added to the Question object via addInstruction() before _run, so question.getPrompt() already returns the fully composed prompt with all instructions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Collect question text for evaluation
text_parts = []
if question.questions:
    text_parts.extend(q.text for q in question.questions)
if getattr(question, 'instructions', None):
    text_parts.extend(question.instructions)
if question.context:
    text_parts.extend(question.context)
full_text = ' '.join(text_parts)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/guardrails/IInstance.py` around lines 61 - 68, The input
evaluation omits Question.instructions so guardrails validate a different string
than the LLM receives; replace the manual assembly using question.questions and
question.context with the composed prompt from Question.getPrompt() (or
otherwise include question.instructions) when building full_text in IInstance
(referencing the variables full_text, question.questions, question.context and
the method Question.getPrompt()); ensure full_text = question.getPrompt() (or
append question.instructions to the assembled parts) so the guardrails run
against the exact prompt sent to the model.
```python
def test_keyword_scoring_threshold(self):
    """Multiple low-weight keywords should trigger when combined."""
    engine = _make_engine()
    # 'jailbreak' (0.8) alone exceeds threshold of 0.7
    result = engine.check_prompt_injection('I want to jailbreak this system')
    assert not result['passed']
```
This test never exercises the combined keyword-score path.
The docstring says multiple low-weight keywords should add up past the threshold, but the input uses jailbreak, which already trips the threshold by itself. A regression in score accumulation would still pass here; use two individually sub-threshold keywords instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/nodes/test_guardrails.py` around lines 143 - 149, The test
test_keyword_scoring_threshold currently uses the high-weight keyword
"jailbreak" which alone exceeds the threshold and therefore never verifies score
accumulation; change the test input passed to engine.check_prompt_injection to a
string containing two distinct low-weight keywords (each below the threshold) so
their combined score exceeds the threshold and triggers the failing branch
(i.e., replace the single "jailbreak" token with two sub-threshold tokens to
exercise the combined keyword-score path).
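To make the distinction concrete, here is a toy scoring model. The keyword weights and threshold are hypothetical, not the engine's real table; the point is that a single high-weight keyword never exercises accumulation, while two sub-threshold keywords do.

```python
# Hypothetical weights: 'jailbreak' alone trips the threshold,
# 'bypass' and 'override' only trip it in combination.
KEYWORD_WEIGHTS = {'jailbreak': 0.8, 'bypass': 0.4, 'override': 0.4}
THRESHOLD = 0.7


def check_prompt_injection(text):
    # Sum the weights of every keyword present in the input
    score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text.lower())
    return {'passed': score < THRESHOLD, 'score': score}


single = check_prompt_injection('please bypass this')          # score 0.4, passes
combined = check_prompt_injection('bypass and override this')  # score 0.8, fails
```

A regression that broke the summation (say, taking only the maximum weight) would still block a `jailbreak` input, so only the combined case catches it.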
```python
def test_multiple_pii_types(self):
    engine = _make_engine()
    result = engine.check_pii_leak('Email: a@b.com, Phone: 555-123-4567, IP: 10.0.0.1')
    assert not result['passed']
    # Should report multiple types
    assert 'email' in result['details']
```
Assert the extra PII types this test claims to cover.
This case is named and commented as multi-type reporting, but it only checks for email. If phone or ip_address stops being reported, the suite still passes.
♻️ Tighten the assertion

```diff
 assert not result['passed']
 # Should report multiple types
 assert 'email' in result['details']
+assert 'phone' in result['details']
+assert 'ip_address' in result['details']
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/nodes/test_guardrails.py` around lines 374 - 380, The test
test_multiple_pii_types only asserts that 'email' is reported but should also
assert the other PII types; update the assertions after calling
engine.check_pii_leak to verify that result['details'] includes the additional
expected keys/markers for phone and ip_address (e.g., assert 'phone' in
result['details'] and assert 'ip_address' in result['details']) so the test
fails if either phone or IP reporting is dropped.
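A simplified sketch of multi-type reporting follows; the patterns are illustrative stand-ins for the engine's real regexes, but they show why each expected type deserves its own assertion.

```python
import re

# Simplified stand-ins for the engine's PII patterns
PII_PATTERNS = {
    'email': re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+'),
    'phone': re.compile(r'\b\d{3}-\d{3}-\d{4}\b'),
    'ip_address': re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
}


def check_pii_leak(text):
    # Report every PII type that matched, keyed by type name
    details = {name: pat.findall(text)
               for name, pat in PII_PATTERNS.items() if pat.search(text)}
    return {'passed': not details, 'details': details}


result = check_pii_leak('Email: a@b.com, Phone: 555-123-4567, IP: 10.0.0.1')
```

If the phone or IP pattern were silently dropped from the table, only assertions on those specific keys would fail.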
```python
    pass


class FakeOPEN_MODE:
    CONFIG = 'config'


class FakeQuestion:
    def __init__(self, **kwargs):
        self.questions = kwargs.get('questions', [])
        self.context = kwargs.get('context', [])
        self.instructions = kwargs.get('instructions', [])

    def addContext(self, ctx):
        self.context.append(ctx)


class FakeQuestionText:
    def __init__(self, text=''):
        self.text = text


class FakeAnswer:
    def __init__(self, answer_text=''):
        self._text = answer_text
        self.answer = answer_text

    def getText(self):
        return self._text


class FakeConfig:
    @staticmethod
    def getNodeConfig(logicalType, connConfig):
        return {}


stubs['rocketlib'].IInstanceBase = FakeIInstanceBase
stubs['rocketlib'].IGlobalBase = FakeIGlobalBase
stubs['rocketlib'].Entry = FakeEntry
stubs['rocketlib'].OPEN_MODE = FakeOPEN_MODE
stubs['rocketlib'].debug = lambda *a, **kw: None
stubs['rocketlib'].warning = lambda *a, **kw: None
stubs['ai.common.schema'].Question = FakeQuestion
stubs['ai.common.schema'].Answer = FakeAnswer
stubs['ai.common.config'].Config = FakeConfig
stubs['depends'].depends = lambda *a, **kw: None

for name, stub in stubs.items():
    saved[name] = sys.modules.get(name)
    sys.modules[name] = stub

try:
    # Force reimport — clean out any cached guardrails modules
    for mod_name in list(sys.modules.keys()):
        if 'guardrails' in mod_name:
            del sys.modules[mod_name]

    # Create the package module first so relative imports resolve
    pkg_spec = importlib.util.spec_from_file_location(
        'guardrails',
        os.path.join(_GUARDRAILS_DIR, '__init__.py'),
        submodule_search_locations=[_GUARDRAILS_DIR],
    )
    pkg_mod = importlib.util.module_from_spec(pkg_spec)
    sys.modules['guardrails'] = pkg_mod

    # Load guardrails_engine submodule
    engine_spec = importlib.util.spec_from_file_location(
        'guardrails.guardrails_engine',
        os.path.join(_GUARDRAILS_DIR, 'guardrails_engine.py'),
    )
    engine_mod = importlib.util.module_from_spec(engine_spec)
    sys.modules['guardrails.guardrails_engine'] = engine_mod
    engine_spec.loader.exec_module(engine_mod)

    # Load IGlobal submodule
    iglobal_spec = importlib.util.spec_from_file_location(
        'guardrails.IGlobal',
        os.path.join(_GUARDRAILS_DIR, 'IGlobal.py'),
    )
    iglobal_mod = importlib.util.module_from_spec(iglobal_spec)
    sys.modules['guardrails.IGlobal'] = iglobal_mod
    iglobal_spec.loader.exec_module(iglobal_mod)

    # Load IInstance submodule
    iinst_spec = importlib.util.spec_from_file_location(
        'guardrails.IInstance',
        os.path.join(_GUARDRAILS_DIR, 'IInstance.py'),
    )
    iinst_mod = importlib.util.module_from_spec(iinst_spec)
    sys.modules['guardrails.IInstance'] = iinst_mod
    iinst_spec.loader.exec_module(iinst_mod)

    return iinst_mod.IInstance, engine_mod.GuardrailsEngine, engine_mod.GuardrailsViolation, FakeQuestion, FakeQuestionText, FakeAnswer
finally:
    for name in stubs:
        if saved[name] is None:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved[name]
    # Clean up guardrails modules
    for mod_name in list(sys.modules.keys()):
        if 'guardrails' in mod_name:
            sys.modules.pop(mod_name, None)
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
sed -n '543,675p' test/nodes/test_guardrails.py
python - <<'PY'
candidates = [
    'guardrails',
    'guardrails.IInstance',
    'test_guardrails',
    'pkg.third_party_guardrails',
]
for name in candidates:
    print(f"{name}: {'guardrails' in name}")
PY
```

Repository: rocketride-org/rocketride-server
Length of output: 5309
Scope the sys.modules cleanup to the injected package names.
The substring match removes any module name containing guardrails, including the current test module (test_guardrails) and unrelated packages (e.g., pkg.third_party_guardrails), and those originals are never restored. Since _load_iinstance_class() runs across multiple tests, this poisons the import cache for the rest of the session.
♻️ Safer cleanup and restoration

```diff
 # Stub rocketlib and ai.common.schema
 saved = {}
+saved_guardrails = {}
 stubs = {
     'rocketlib': types.ModuleType('rocketlib'),
     'rocketlib.error': types.ModuleType('rocketlib.error'),
     ...
 }
 ...
 try:
     # Force reimport — clean out any cached guardrails modules
     for mod_name in list(sys.modules.keys()):
-        if 'guardrails' in mod_name:
-            del sys.modules[mod_name]
+        if mod_name == 'guardrails' or mod_name.startswith('guardrails.'):
+            saved_guardrails[mod_name] = sys.modules.pop(mod_name)
 ...
 finally:
     for name in stubs:
         if saved[name] is None:
             sys.modules.pop(name, None)
         else:
             sys.modules[name] = saved[name]
     # Clean up guardrails modules
     for mod_name in list(sys.modules.keys()):
-        if 'guardrails' in mod_name:
-            sys.modules.pop(mod_name, None)
+        if mod_name == 'guardrails' or mod_name.startswith('guardrails.'):
+            sys.modules.pop(mod_name, None)
+    sys.modules.update(saved_guardrails)
```

🧰 Tools
🪛 Ruff (0.15.7)
[warning] 543-543: Missing return type annotation for staticmethod _load_iinstance_class
(ANN205)
[warning] 565-565: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 568-568: Missing return type annotation for private function preventDefault
Add return type annotation: None
(ANN202)
[warning] 582-582: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 582-582: Missing type annotation for **kwargs
(ANN003)
[warning] 587-587: Missing return type annotation for private function addContext
Add return type annotation: None
(ANN202)
[warning] 591-591: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 595-595: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 599-599: Missing return type annotation for private function getText
(ANN202)
[warning] 604-604: Missing return type annotation for staticmethod getNodeConfig
(ANN205)
[warning] 604-604: Unused static method argument: logicalType
(ARG004)
[warning] 604-604: Unused static method argument: connConfig
(ARG004)
[warning] 611-611: Unused lambda argument: a
(ARG005)
[warning] 611-611: Unused lambda argument: kw
(ARG005)
[warning] 612-612: Unused lambda argument: a
(ARG005)
[warning] 612-612: Unused lambda argument: kw
(ARG005)
[warning] 616-616: Unused lambda argument: a
(ARG005)
[warning] 616-616: Unused lambda argument: kw
(ARG005)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/nodes/test_guardrails.py` around lines 543 - 675, The sys.modules
cleanup in _load_iinstance_class indiscriminately removes any module name
containing 'guardrails', which can evict unrelated modules; change both cleanup
loops to only remove and later restore the exact injected module names you
created (e.g., 'guardrails', 'guardrails.guardrails_engine',
'guardrails.IGlobal', 'guardrails.IInstance') instead of substring matching,
ensuring you save any pre-existing entries for those exact keys and restore them
in the finally block if present; update the two locations that currently check
"if 'guardrails' in mod_name" to iterate that explicit list of module keys to
safely isolate the injected package cleanup.
asclearuc left a comment
Thank you for this contribution!
It is a good candidate for an experimental node once all the issues, including those found by CodeRabbit, are resolved. Please keep in mind that CodeRabbit may identify additional issues in this feature, so you should return to this PR some time after your push.
```python
logger = logging.getLogger(__name__)


class GuardrailsViolation(Exception):
```
There was a problem hiding this comment.
This exception is not used
```python
engine = self.IGlobal.engine
...
# Deep copy to prevent mutation of the original
question = copy.deepcopy(question)
```

This is a single-output pipe node. There is never more than one downstream consumer, so there is nothing to protect against. Pass the object directly. In a RAG pipeline a Question may carry large embedding vectors across its documents list — deep-copying on every message is non-trivial overhead for zero benefit. The same applies to `copy.deepcopy(answer)` in `writeAnswers`.
```
@@ -0,0 +1,938 @@
# =============================================================================
```

This file should not be at `test/nodes` but at `nodes/test/nodes`.
```python
# =============================================================================

import copy
import logging
```

Use rocketlib logging functions instead of `logging`.
RocketRide nodes use `warning()`, `debug()`, and `error()` from rocketlib — not Python's standard logging module. Messages routed through rocketlib are visible in the pipeline UI and engine logs; messages sent to a stdlib logger are not.

Replace:

```python
import logging
logger = logging.getLogger(__name__)
...
logger.warning('Guardrails input blocked: %s — %s', violation['rule'], violation['details'])
```

With:

```python
from rocketlib import warning
...
warning(f'Guardrails input blocked: {violation["rule"]} — {violation["details"]}')
```

Apply to both `IInstance.py` and `guardrails_engine.py`.
```python
return
...
# Run input guardrails
result = engine.evaluate(full_text, mode='input')
```

If `IGlobal.beginGlobal()` failed before setting `self.engine`, this raises AttributeError with no clean error. Add a guard:

```python
engine = self.IGlobal.engine
if engine is None:
    self.instance.writeQuestions(question)
    return
```

Same applies in `writeAnswers()`.
```python
found_count = sum(1 for w in meaningful_words if w in combined_sources)
coverage = found_count / len(meaningful_words) if meaningful_words else 1.0

if coverage < 0.3:
```

Very important magic number! It should either be:
- set as a constant
- come as a setting
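A minimal sketch of that pattern, assuming a hypothetical `hallucination_coverage_threshold` config key (the class and key names here are illustrative, not the node's actual API):

```python
class HallucinationCheck:
    # Named default replaces the inline 0.3; a config knob can override it.
    DEFAULT_COVERAGE_THRESHOLD = 0.3

    def __init__(self, config=None):
        config = config or {}
        # Hypothetical setting name; the real node may expose a different key.
        self.coverage_threshold = float(
            config.get('hallucination_coverage_threshold',
                       self.DEFAULT_COVERAGE_THRESHOLD)
        )

    def is_grounded(self, coverage):
        # A sentence counts as grounded when enough of its meaningful
        # words appear in the source documents.
        return coverage >= self.coverage_threshold


print(HallucinationCheck().is_grounded(0.4))  # True with the 0.3 default
print(HallucinationCheck({'hallucination_coverage_threshold': 0.6}).is_grounded(0.4))  # False
```

Either route removes the unexplained literal from the check body and lets a strict profile tighten grounding without a code change.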
```python
# Extract meaningful words (3+ chars, not stop words)
words = re.findall(r'\b[a-zA-Z]{3,}\b', sentence.lower())
# Filter common stop words
stop_words = {
```

This should be a constant at class level.
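A sketch of that refactor, with an illustrative word list (the actual stop words in the PR may differ):

```python
import re


class GroundingWords:
    # Built once at class definition time instead of on every call;
    # frozenset also makes the intent (immutable lookup table) explicit.
    STOP_WORDS = frozenset({
        'the', 'and', 'for', 'that', 'this', 'with', 'from', 'have',
        'are', 'was', 'were', 'has', 'had', 'not', 'but', 'can',
    })

    @classmethod
    def meaningful_words(cls, sentence):
        # Extract meaningful words (3+ chars, not stop words)
        words = re.findall(r'\b[a-zA-Z]{3,}\b', sentence.lower())
        return [w for w in words if w not in cls.STOP_WORDS]


print(GroundingWords.meaningful_words('The cat sat with the hat'))  # ['cat', 'sat', 'hat']
```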
```
// Capabilities are flags that change the behavior of the underlying
// engine
//
"capabilities": [],
```

Please add "experimental" to "capabilities".
Hi team — all CodeRabbit fixes have been addressed and 92 tests pass. Would appreciate a re-review when you have a chance. Thanks!
… import
- Move inline stop_words set from check_hallucination() to a STOP_WORDS frozenset class constant per asclearuc's review feedback
- Remove unused `from rocketlib import warning` in guardrails_engine.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@asclearuc All 6 review items have been addressed:

Could you re-review when you get a chance?
- Replace stdlib logging with rocketlib warning() for pipeline UI visibility
- Remove copy.deepcopy — single-output pipe node, no downstream mutation risk
- Add engine None guard in writeQuestions/writeAnswers for safe passthrough
- Remove unused GuardrailsViolation exception class
- Update test tuple unpackings and warn mode test after class removal
- Fix test module path after relocation to nodes/test/nodes/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@asclearuc All 6 review items addressed in commit 2999955:

All 90 tests pass. Would appreciate a re-review when convenient. Thanks!
Actionable comments posted: 4
♻️ Duplicate comments (3)
nodes/test/nodes/test_guardrails.py (3)
374-380: ⚠️ Potential issue | 🟡 Minor

Assert every PII type this case is supposed to cover.

Right now the test only proves `email` is reported; if `phone_us` or `ip_address` stops being detected, this still passes.

💡 Suggested fix

```diff
 assert not result['passed']  # Should report multiple types
 assert 'email' in result['details']
+assert 'phone_us' in result['details']
+assert 'ip_address' in result['details']
```
Verify each finding against the current code and only fix it if needed. In `@nodes/test/nodes/test_guardrails.py` around lines 374 - 380, The test_multiple_pii_types test only asserts that 'email' is present in result['details']; update the test to assert that all expected PII types are reported by the engine.check_pii_leak call (e.g., check that 'email', 'phone_us' and 'ip_address' appear in result['details'] and that result['passed'] is False), so if any of those types stop being detected the test will fail; modify the assertions in test_multiple_pii_types to explicitly verify each expected key in result['details'].
623-626: ⚠️ Potential issue | 🟠 Major

Limit the `sys.modules` cleanup to the exact injected names.

Lines 624-626 and 672-674 delete every module containing `'guardrails'`. That can evict this test module or unrelated packages and leave later tests with a polluted import cache.

💡 Suggested fix

```diff
 saved = {}
+saved_guardrails = {}
+injected_guardrails_modules = (
+    'guardrails',
+    'guardrails.guardrails_engine',
+    'guardrails.IGlobal',
+    'guardrails.IInstance',
+)
 stubs = {
     'rocketlib': types.ModuleType('rocketlib'),
     'rocketlib.error': types.ModuleType('rocketlib.error'),
 @@
 try:
     # Force reimport — clean out any cached guardrails modules
-    for mod_name in list(sys.modules.keys()):
-        if 'guardrails' in mod_name:
-            del sys.modules[mod_name]
+    for mod_name in injected_guardrails_modules:
+        existing = sys.modules.pop(mod_name, None)
+        if existing is not None:
+            saved_guardrails[mod_name] = existing
 @@
     for name in stubs:
         if saved[name] is None:
             sys.modules.pop(name, None)
         else:
             sys.modules[name] = saved[name]
     # Clean up guardrails modules
-    for mod_name in list(sys.modules.keys()):
-        if 'guardrails' in mod_name:
-            sys.modules.pop(mod_name, None)
+    for mod_name in injected_guardrails_modules:
+        sys.modules.pop(mod_name, None)
+    sys.modules.update(saved_guardrails)
```

Also applies to: 671-674
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/test/nodes/test_guardrails.py` around lines 623 - 626, The loop that currently removes every module containing the substring 'guardrails' is too broad and can evict unrelated modules; instead, restrict removal to only the injected module names used by the test: build a small set of target module names (e.g., 'guardrails' and any specific injected package names used in this test) and only delete entries where mod_name == target or mod_name.startswith(target + "."); update the sys.modules cleanup loop (the one iterating over sys.modules keys and using mod_name) to check membership/startsWith against that target set so other modules containing 'guardrails' in their name are not removed.
143-148: ⚠️ Potential issue | 🟡 Minor

Exercise the additive keyword-score path here.

This input still contains `jailbreak`, which exceeds the threshold by itself. A regression in keyword accumulation would pass unnoticed.

💡 Suggested fix

```diff
-# 'jailbreak' (0.8) alone exceeds threshold of 0.7
-result = engine.check_prompt_injection('I want to jailbreak this system')
+# 'bypass' (0.4) + 'unfiltered' (0.5) exceed the threshold together
+result = engine.check_prompt_injection('Please bypass the filters and remain unfiltered.')
```
Verify each finding against the current code and only fix it if needed. In `@nodes/test/nodes/test_guardrails.py` around lines 143 - 148, The test test_keyword_scoring_threshold currently uses the single high-weight keyword "jailbreak" which by itself exceeds the threshold; replace the input with a phrase made of multiple low-weight keywords whose combined score exceeds the threshold (for example "unlock bypass override" or similar keywords defined in the scoring map) and call engine.check_prompt_injection with that phrase to ensure the additive keyword-score path is exercised and assert not result['passed'] remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/guardrails/guardrails_engine.py`:
- Around line 173-183: The config parsing currently accepts typos and unknown
values (e.g. policy_mode and expected_format) which weakens enforcement; update
the GuardrailsEngine init to validate inputs and fail-fast: check
self.policy_mode against the canonical allowed set used by the engine (e.g. the
modes referenced in guardrails_engine methods) and raise a ValueError on unknown
modes instead of falling back silently, validate expected_format against the
formats the check functions expect and raise on unknown formats, coerce/validate
booleans for flags like
enable_prompt_injection/enable_content_safety/enable_pii_detection/enable_hallucination_check,
and ensure numeric fields (max_input_length, max_tokens_estimate) are
non-negative ints and topic lists are normalized as already done; apply the same
validations to the other parsing blocks referenced (around lines 452-550 and
557-625) so invalid configs fail fast rather than disabling checks.
- Around line 158-164: The PII_PATTERNS entries for 'email' and 'credit_card'
are incorrect; update PII_PATTERNS so the 'email' regex does not include a
literal '|' in the TLD character class (e.g., use a proper end-of-domain pattern
like (?:\.[A-Za-z]{2,})+), and replace the 'credit_card' value with an
alternation that covers each card format explicitly (e.g., separate patterns for
AmEx 15-digit starting with 34/37, Visa 13/16, MasterCard 16, Discover 16) so
AmEx 15-digit numbers are matched and other card lengths are correctly enforced;
modify the PII_PATTERNS dictionary (keys 'email' and 'credit_card') accordingly.
In `@nodes/test/nodes/test_guardrails.py`:
- Around line 68-70: The test import fails because _load_engine_module() does
not expose GuardrailsViolation while the test expects it; update the engine API
or the tests to be consistent: either re-export the exception class from the
engine module (add a GuardrailsViolation symbol in guardrails_engine.py and
ensure _load_engine_module() returns it alongside GuardrailsEngine) or
remove/update the stale top-level imports/tests that reference
GuardrailsViolation (the import statements using GuardrailsEngine and
GuardrailsViolation and any tests asserting on GuardrailsViolation); adjust only
the symbol exposure so tests and the engine API match.
- Around line 897-906: The test's required_knobs set is missing the
topic-related knobs that GuardrailsEngine.__init__ actually reads; update the
required_knobs in the test to include 'blocked_topics' and 'allowed_topics' so
the test fails if a custom profile drops those fields—specifically modify the
required_knobs definition near the test for GuardrailsEngine to add
'blocked_topics' and 'allowed_topics' to the set so GuardrailsEngine.__init__'s
expectations match the test's required inputs.
---
Duplicate comments:
In `@nodes/test/nodes/test_guardrails.py`:
- Around line 374-380: The test_multiple_pii_types test only asserts that
'email' is present in result['details']; update the test to assert that all
expected PII types are reported by the engine.check_pii_leak call (e.g., check
that 'email', 'phone_us' and 'ip_address' appear in result['details'] and that
result['passed'] is False), so if any of those types stop being detected the
test will fail; modify the assertions in test_multiple_pii_types to explicitly
verify each expected key in result['details'].
- Around line 623-626: The loop that currently removes every module containing
the substring 'guardrails' is too broad and can evict unrelated modules;
instead, restrict removal to only the injected module names used by the test:
build a small set of target module names (e.g., 'guardrails' and any specific
injected package names used in this test) and only delete entries where mod_name
== target or mod_name.startswith(target + "."); update the sys.modules cleanup
loop (the one iterating over sys.modules keys and using mod_name) to check
membership/startsWith against that target set so other modules containing
'guardrails' in their name are not removed.
- Around line 143-148: The test test_keyword_scoring_threshold currently uses
the single high-weight keyword "jailbreak" which by itself exceeds the
threshold; replace the input with a phrase made of multiple low-weight keywords
whose combined score exceeds the threshold (for example "unlock bypass override"
or similar keywords defined in the scoring map) and call
engine.check_prompt_injection with that phrase to ensure the additive
keyword-score path is exercised and assert not result['passed'] remains.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 21e0eb4c-a931-44bf-a538-f15a933fde6a
📒 Files selected for processing (2)

- nodes/src/nodes/guardrails/guardrails_engine.py
- nodes/test/nodes/test_guardrails.py
```python
PII_PATTERNS: Dict[str, re.Pattern] = {
    'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'),
    'phone_us': re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'),
    'ssn': re.compile(r'\b(?!000|666|9\d{2})\d{3}[-\s]?(?!00)\d{2}[-\s]?(?!0000)\d{4}\b'),
    'credit_card': re.compile(r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'),
    'ip_address': re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
}
```
Correct the PII regexes before relying on them for blocking.
The credit_card pattern advertises AmEx via 3[47] but still requires a 16-digit layout, so real 15-digit AmEx numbers slip through. The email TLD class also accepts a literal |, which widens false positives.
💡 Suggested fix
```diff
 PII_PATTERNS: Dict[str, re.Pattern] = {
-    'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'),
+    'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
     'phone_us': re.compile(r'\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'),
     'ssn': re.compile(r'\b(?!000|666|9\d{2})\d{3}[-\s]?(?!00)\d{2}[-\s]?(?!0000)\d{4}\b'),
-    'credit_card': re.compile(r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'),
+    'credit_card': re.compile(
+        r'\b(?:'
+        r'3[47]\d{2}[- ]?\d{6}[- ]?\d{5}'
+        r'|(?:4\d{3}|5[1-5]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}'
+        r')\b'
+    ),
     'ip_address': re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'),
 }
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 158-164: Mutable default value for class attribute
(RUF012)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/guardrails/guardrails_engine.py` around lines 158 - 164, The
PII_PATTERNS entries for 'email' and 'credit_card' are incorrect; update
PII_PATTERNS so the 'email' regex does not include a literal '|' in the TLD
character class (e.g., use a proper end-of-domain pattern like
(?:\.[A-Za-z]{2,})+), and replace the 'credit_card' value with an alternation
that covers each card format explicitly (e.g., separate patterns for AmEx
15-digit starting with 34/37, Visa 13/16, MasterCard 16, Discover 16) so AmEx
15-digit numbers are matched and other card lengths are correctly enforced;
modify the PII_PATTERNS dictionary (keys 'email' and 'credit_card') accordingly.
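The AmEx gap is easy to demonstrate with a couple of well-known test card numbers. A quick sanity sketch (not the PR's test suite) comparing the original pattern with the AmEx-aware alternation suggested above:

```python
import re

# Original pattern: advertises AmEx via 3[47] but still demands 16 digits.
OLD_CC = re.compile(
    r'\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))'
    r'[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'
)
# Suggested pattern: separate 15-digit AmEx branch (4-6-5 grouping).
NEW_CC = re.compile(
    r'\b(?:'
    r'3[47]\d{2}[- ]?\d{6}[- ]?\d{5}'
    r'|(?:4\d{3}|5[1-5]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}'
    r')\b'
)

amex = '378282246310005'   # 15-digit AmEx test number
visa = '4111111111111111'  # 16-digit Visa test number

print(bool(OLD_CC.search(amex)))  # False — 15-digit AmEx slips through
print(bool(NEW_CC.search(amex)))  # True
print(bool(NEW_CC.search(visa)))  # True — 16-digit formats still match
```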
```python
self.policy_mode = config.get('policy_mode', 'warn')
self.enable_prompt_injection = config.get('enable_prompt_injection', True)
self.enable_content_safety = config.get('enable_content_safety', True)
self.enable_pii_detection = config.get('enable_pii_detection', True)
self.enable_hallucination_check = config.get('enable_hallucination_check', True)
self.max_input_length = config.get('max_input_length', 0)
self.max_tokens_estimate = config.get('max_tokens_estimate', 0)
self.blocked_topics = [t.strip().lower() for t in config.get('blocked_topics', []) if t.strip()]
self.allowed_topics = [t.strip().lower() for t in config.get('allowed_topics', []) if t.strip()]
self.expected_format = config.get('expected_format', '')
```
Reject unsupported config values instead of silently weakening enforcement.
A typo in policy_mode currently falls through to 'log', an unknown mode skips every check and returns passed=True, and an unknown expected_format is treated as a successful check. In a safety node, those should fail fast instead of turning protections off.
💡 Suggested fix
```diff
-self.policy_mode = config.get('policy_mode', 'warn')
+self.policy_mode = str(config.get('policy_mode', 'warn')).strip().lower()
+if self.policy_mode not in {'block', 'warn', 'log'}:
+    raise ValueError(f'Unsupported policy_mode: {self.policy_mode}')
```

```diff
-        return {
-            'rule': 'format_compliance',
-            'passed': True,
-            'severity': 'low',
-            'details': f'Unknown format "{expected_format}"; skipping check',
-        }
+        return {
+            'rule': 'format_compliance',
+            'passed': False,
+            'severity': 'medium',
+            'details': f'Unsupported expected_format: {expected_format}',
+        }
```

```diff
-        if mode == 'input':
+        normalized_mode = mode.strip().lower()
+
+        if normalized_mode == 'input':
             # Input guardrails
             if self.enable_prompt_injection:
                 results.append(self.check_prompt_injection(text))
 @@
-        elif mode == 'output':
+        elif normalized_mode == 'output':
             # Output guardrails
             if self.enable_hallucination_check:
                 source_docs = context.get('source_documents', [])
                 results.append(self.check_hallucination(text, source_docs))
 @@
             expected_fmt = context.get('expected_format', self.expected_format)
             if expected_fmt:
                 results.append(self.check_format_compliance(text, expected_fmt))
+        else:
+            raise ValueError(f'Unsupported guardrails mode: {mode}')
         # Aggregate results
```

Also applies to: 452-550, 557-625
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/guardrails/guardrails_engine.py` around lines 173 - 183, The
config parsing currently accepts typos and unknown values (e.g. policy_mode and
expected_format) which weakens enforcement; update the GuardrailsEngine init to
validate inputs and fail-fast: check self.policy_mode against the canonical
allowed set used by the engine (e.g. the modes referenced in guardrails_engine
methods) and raise a ValueError on unknown modes instead of falling back
silently, validate expected_format against the formats the check functions
expect and raise on unknown formats, coerce/validate booleans for flags like
enable_prompt_injection/enable_content_safety/enable_pii_detection/enable_hallucination_check,
and ensure numeric fields (max_input_length, max_tokens_estimate) are
non-negative ints and topic lists are normalized as already done; apply the same
validations to the other parsing blocks referenced (around lines 452-550 and
557-625) so invalid configs fail fast rather than disabling checks.
```python
required_knobs = {
    'policy_mode',
    'enable_prompt_injection',
    'enable_content_safety',
    'enable_pii_detection',
    'enable_hallucination_check',
    'max_input_length',
    'max_tokens_estimate',
    'expected_format',
}
```
The “all engine knobs” set omits the topic-list knobs.
GuardrailsEngine.__init__ reads blocked_topics and allowed_topics, but required_knobs does not include them. As written, this test can still pass after dropping the topic-restriction inputs from the custom profile.
💡 Suggested fix
```diff
 required_knobs = {
     'policy_mode',
     'enable_prompt_injection',
     'enable_content_safety',
     'enable_pii_detection',
     'enable_hallucination_check',
     'max_input_length',
     'max_tokens_estimate',
+    'blocked_topics',
+    'allowed_topics',
     'expected_format',
 }
```
Verify each finding against the current code and only fix it if needed.
In `@nodes/test/nodes/test_guardrails.py` around lines 897 - 906, The test's
required_knobs set is missing the topic-related knobs that
GuardrailsEngine.__init__ actually reads; update the required_knobs in the test
to include 'blocked_topics' and 'allowed_topics' so the test fails if a custom
profile drops those fields—specifically modify the required_knobs definition
near the test for GuardrailsEngine to add 'blocked_topics' and 'allowed_topics'
to the set so GuardrailsEngine.__init__'s expectations match the test's required
inputs.
All 6 review items addressed (unused exception removed, deepcopy eliminated, rocketlib logging, engine null guard, constants extracted). 90 tests pass. Ready for re-review.
asclearuc
left a comment
Thank you for this contribution!
It is marked as experimental and is approved.
Merge Request

This PR has:

All review feedback has been implemented. Could a maintainer with merge permissions please merge this? The merge state shows BLOCKED, which appears to be a branch protection rule — may need an admin override or additional required reviewer.
#Hack-with-bay-2
Contribution Type
Feature — Comprehensive input/output guardrails for AI pipeline safety
Problem / Use Case
RocketRide pipelines have PII anonymization (`nodes/src/nodes/anonymize/`) but no protection against prompt injection attacks, hallucinated outputs, content safety violations, or PII leaks in LLM responses. Per Analytics Vidhya, guardrails are the #1 enterprise blocker for AI agent deployment in 2026.
Proposed Solution
A `guardrails` node with layered defense:
Input guardrails (pre-LLM):
Output guardrails (post-LLM):
Policy modes: block (raises exception), warn (log + continue), log (silent)
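The three modes reduce to a small dispatch. This is an illustrative sketch only: the function name and violation shape are hypothetical, and the real node enforces block mode through the pipeline's own mechanism rather than a return value.

```python
def apply_policy(policy_mode, violation, log):
    """Return True if the message may continue downstream."""
    if policy_mode == 'block':
        log.append(f"blocked: {violation['rule']}")
        return False  # reject the message
    if policy_mode == 'warn':
        log.append(f"warn: {violation['rule']}")
        return True   # log and continue
    if policy_mode == 'log':
        log.append(f"log: {violation['rule']}")
        return True   # record silently and continue
    raise ValueError(f'Unsupported policy_mode: {policy_mode}')


log = []
print(apply_policy('warn', {'rule': 'pii_leak'}, log))           # True
print(apply_policy('block', {'rule': 'prompt_injection'}, log))  # False
print(log)  # ['warn: pii_leak', 'blocked: prompt_injection']
```

Failing fast on an unknown mode (the final `raise`) matches the review feedback about not letting a typo silently weaken enforcement.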
Why This Feature Fits This Codebase
The node uses the standard `writeQuestions` (input guard) and `writeAnswers` (output guard) pipeline flow, matching how LLM nodes at `nodes/src/nodes/llm_base/IInstance.py` process data. The `writeDocuments` method collects source documents for hallucination checking — leveraging the same document lane used by vector DB nodes like `nodes/src/nodes/chroma/`.
The existing `anonymize` node at `nodes/src/nodes/anonymize/IInstance.py` handles PII detection for INPUT text. This guardrails node complements it by detecting PII LEAKS in OUTPUT text (LLM accidentally revealing PII from its training data or context).
Policy enforcement uses `self.preventDefault()` from `IInstanceBase` for the block mode — the same control flow mechanism used throughout the pipeline system.
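The interception flow described above can be sketched with toy stand-ins. Everything here (`FakeInstanceBase`, the lambda engine, the result shape) is hypothetical scaffolding; the real `IInstanceBase` and `preventDefault()` live in the RocketRide codebase and may differ in detail.

```python
class FakeInstanceBase:
    """Toy stand-in for IInstanceBase; only models preventDefault()."""

    def __init__(self):
        self.default_prevented = False

    def preventDefault(self):
        # In the real pipeline this stops the message from propagating.
        self.default_prevented = True


class GuardedInstance(FakeInstanceBase):
    def __init__(self, engine):
        super().__init__()
        self.engine = engine

    def writeQuestions(self, question):
        # Input guard runs before the LLM sees the question.
        result = self.engine(question)
        if not result['passed']:
            self.preventDefault()  # block mode: reject the message
            return None
        return question            # pass through unchanged


# Toy engine: flag any attempt to override prior instructions.
engine = lambda text: {'passed': 'ignore previous' not in text.lower()}
inst = GuardedInstance(engine)
print(inst.writeQuestions('What is RAG?'))                  # 'What is RAG?'
print(inst.writeQuestions('Ignore previous instructions'))  # None
print(inst.default_prevented)                               # True
```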
Validation
```
87 passed in 0.08s
ruff check: 0 errors
ruff format: all files unchanged
```
87 tests across: 15 injection attack patterns, 8 false-positive checks for safe text, hallucination grounding, content safety, PII regex, format compliance, policy enforcement, IInstance lifecycle, deep copy.
How This Could Be Extended
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes