fix: prevent crashes from uninitialized LightRAG, env-var stripping, and parser cleanup#240
Open
jwchmodx wants to merge 4 commits intoHKUDS:mainfrom
Open
fix: prevent crashes from uninitialized LightRAG, env-var stripping, and parser cleanup#240jwchmodx wants to merge 4 commits intoHKUDS:mainfrom
jwchmodx wants to merge 4 commits intoHKUDS:mainfrom
Conversation
- processor.py: move _ensure_lightrag_initialized() before first self.lightrag access in _process_multimodal_content() and process_document_complete_lightrag_api(); raise on init failure instead of silently ignoring the return value - config.py: strip whitespace from SUPPORTED_FILE_EXTENSIONS and CONTEXT_FILTER_CONTENT_TYPES env-var splits so that values like ".pdf, .jpg" match correctly - parser.py: hoist platform.system() check to module-level constant _IS_WINDOWS, removing five redundant local `import platform` + repeated calls; remove unused TypeVar T Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ustness
processor.py:
- process_document_complete_lightrag_api(): the finally block
unconditionally called `async with pipeline_status_lock` even when
`pipeline_status_lock` is None (e.g. when LightRAG initialization
fails before lines 1733-1734 execute). This caused an unhandled
AttributeError that masked the original error. Added None guard and
wrapped in try/except so the finally block never itself raises.
parser.py (DoclingParser):
- read_from_block(): base64 image URI was split on "," and the second
element accessed unconditionally. When the URI has no comma (rare but
valid — some exporters omit the "data:...;base64," prefix) this raises
IndexError silently swallowed by the outer except. Now uses split(",", 1)
with a len-check.
- read_from_block_recursive(): $ref values like "#/body/0" were split
by "/" without checking the resulting length, and int() was called
on the third part without ValueError handling. Malformed refs now
log a warning and are skipped instead of crashing the whole parse.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atch
query.py (aquery_with_multimodal, aquery_vlm_enhanced) and batch.py
(process_folder_complete, process_documents_with_rag_batch) all called
`await self._ensure_lightrag_initialized()` without checking the return
value. _ensure_lightrag_initialized() returns {"success": False, "error":
"..."} on failure (e.g. missing llm_model_func, parser not installed),
so silently ignoring it caused a cryptic AttributeError later when
self.lightrag was accessed while still None.
All four call sites now capture the result and raise RuntimeError with
the initialization error message, consistent with processor.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
processor.py (process_document_complete_lightrag_api):
- current_doc_status UnboundLocalError: the variable was first assigned
inside the try block (line 1706). If self.lightrag.doc_status.get_by_id()
raises before the assignment completes, the except handler at line 1829
references current_doc_status before it was ever bound, causing
UnboundLocalError. Fixed by pre-initialising current_doc_status = {}
before the try block so the **-unpack always has a valid mapping.
- Unsafe dict access on _ensure_lightrag_initialized() result: line 1688
used result["success"] with direct key access, which raises TypeError
if the method returns None. Changed to the safe pattern
`not result or not result.get("success")` already used elsewhere.
- MineruExecutionError.error_msg type coercion: lines 1760-1762 only
handled list and silently left non-string, non-list values as-is.
Now uses str() conversion in both branches so the stored error_msg
is always a plain string.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
|
Thank you for your contribution! I used Codex to review PR #240 against main, focusing on P0 and P1 issues. No P0 issues were found, but one P1 regression was identified.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes three categories of bugs found in
processor.py,config.py, andparser.py.Bug fixes
processor.py— LightRAG accessed before initialization (crash risk)_process_multimodal_content():self.lightrag.doc_statuswas accessed at the top of the function (inside atry/except) before_ensure_lightrag_initialized()was called further down. Ifself.lightragisNoneat that point theAttributeErroris silently swallowed and the function continues in a broken state. Fix: call_ensure_lightrag_initialized()first and return early on failure.process_document_complete_lightrag_api(): same pattern —self.lightrag.doc_status.get_by_id(doc_pre_id)was called on line 1678 before thetryblock that initialises LightRAG on line 1682. Fix: move thedoc_statusfetch after initialisation (it is already fetched again at line 1707, so the early call was redundant).process_document_complete()andinsert_content_list():_ensure_lightrag_initialized()return value was ignored (await self._ensure_lightrag_initialized()). A failed initialisation would silently fall through and crash later on an unrelated line. Fix: capture and check the result, raisingRuntimeErroron failure.config.py— missing.strip()on env-var splitsSUPPORTED_FILE_EXTENSIONSandCONTEXT_FILTER_CONTENT_TYPESare split on","without stripping whitespace. A value like".pdf, .jpg"produces[".pdf", " .jpg"]; the leading space means" .jpg"never matches any file extension check. Fix: add a list-comprehension with.strip().parser.py— redundantplatform.system()calls and unusedTypeVarplatform.system() == "Windows"appeared in five separate functions, each preceded by a localimport platform. This is evaluated at runtime on every call. Fix: hoist it to a single module-level constant_IS_WINDOWSand remove the five local imports. Also removed the unusedTypeVar Timport (it was imported and assigned but never referenced).Test plan
pytest tests/RAGAnythingstill initializes and processes documents correctly whenlightragis pre-providedRAGAnythingraises a clear error (not a crypticAttributeError) whenllm_model_funcis missingSUPPORTED_FILE_EXTENSIONS=".pdf, .jpg"(with space) and confirm both extensions are recognized🤖 Generated with Claude Code