Helper API Reference -- DeepExtractIDA Agent Analysis Runtime

This reference documents the public API for the 30+ modules in .claude/helpers/. These modules provide the foundational data access, analysis taxonomies, and infrastructure used by skills, agents, and commands.

1. Database Access

individual_analysis_db

Provides read-only access to per-binary analysis SQLite databases.

  • IndividualAnalysisDB(db_path: str)
    • get_file_info() -> FileInfoRecord
    • get_function_by_id(func_id: int) -> FunctionRecord
    • get_function_by_name(name: str) -> FunctionRecord
    • search_functions(name_contains: str, limit: int = 50, offset: int = 0) -> Page[FunctionRecord]
    • get_all_functions(limit: int = 100, offset: int = 0) -> Page[FunctionRecord]
  • FunctionRecord(dataclass)
    • Fields: id, name, mangled_name, signature, decompiled_code, assembly_code, inbound_xrefs, outbound_xrefs, strings, globals, dangerous_api_calls, loop_analysis, stack_frame
  • FileInfoRecord(dataclass)
    • Fields: file_name, file_hash, architecture, security_features, exports, entry_points, tls_callbacks
  • parse_json_safe(data: str | dict | None) -> dict | list | None
    • Parses JSON strings from DB columns into Python objects.

analyzed_files_db

Manages the tracking database for all analyzed modules.

  • AnalyzedFilesDB(db_path: str)
    • get_by_file_name(name: str) -> list[AnalyzedFileRecord]
    • get_complete() -> list[AnalyzedFileRecord]
    • get_by_hash(file_hash: str) -> AnalyzedFileRecord | None
  • AnalyzedFileRecord(dataclass)
    • Fields: file_name, file_hash, analysis_db_path, status, created_at

db_paths

Centralized resolution for database file paths.

  • resolve_db_path(db_path: str, workspace_root: Path) -> str
    • Resolves a DB path relative to workspace_root, falling back to extracted_dbs/.
  • resolve_db_path_auto(db_path: str) -> str
    • Same as above but auto-detects workspace root from helpers/ location.
  • resolve_module_db(module_name_or_path: str, workspace_root: Path, *, require_complete: bool = True) -> str | None
    • Resolves a module name or .db path to an absolute DB path.
  • resolve_module_db_auto(module_name_or_path: str, *, require_complete: bool = True) -> str | None
    • Same as above but auto-detects workspace root.
  • resolve_tracking_db(workspace_root: Path) -> str | None
    • Returns the path to analyzed_files.db (checks extracted_dbs/ then root).
  • resolve_tracking_db_auto() -> str | None
    • Same as above but auto-detects workspace root.

2. Function Resolution

function_index

High-performance function-to-file resolution using function_index.json.

  • load_function_index(module_name: str) -> dict
    • Loads the function index for a specific module.
  • lookup_function(index: dict, name: str) -> dict | None
    • Resolves a function name to its metadata (file, library tag, ID).
  • filter_by_library(index: dict, functions: list[str]) -> list[str]
    • Filters out functions tagged as library code (WIL, STL, CRT, etc.).
  • is_application_function(index: dict, name: str) -> bool
    • Returns True if the function is not tagged as library code.

function_resolver

Unified function lookup across multiple modules.

  • resolve_function(db: IndividualAnalysisDB, identifier: str | int) -> FunctionRecord | None
    • Resolves a function by its name or integer ID.
  • search_functions_by_pattern(db: IndividualAnalysisDB, pattern: str) -> list[FunctionRecord]
    • Searches functions using substring or regex patterns.

unified_search

Multi-dimensional search across module databases.

  • unified_search.py (Standalone Script)
    • Dimensions: name, signature, string, api, dangerous, class, export
    • Match Modes: substring, regex, fuzzy
    • Usage: python .claude/helpers/unified_search.py <db> --query <term>

3. Analysis Taxonomies

api_taxonomy

Classification of Win32/NT APIs into functional and security categories.

  • classify_api(api_name: str) -> str | None
    • Returns the functional category (e.g., file_io, registry, network).
  • classify_api_security(api_name: str) -> str | None
    • Returns the security impact category (e.g., privilege_escalation, data_leakage).
  • classify_api_fingerprint(api_name: str) -> str | None
    • Returns a coarse fingerprint bucket ("com", "rpc", "security", "crypto") for module-level density counting.
  • get_dangerous_api_set() -> set[str]
    • Returns a set of all APIs classified as security-sensitive.
  • DISPATCH_KEYWORDS: tuple of function-name substrings suggesting dispatch/routing behaviour.
  • strip_import_prefix(api_name: str) -> str
    • Removes IDA import-thunk prefixes (__imp_, _imp_, j_, cs:) from an API name.
  • IMP_PREFIX_RE -- Compiled regex for import-prefix stripping.
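The prefix stripping described above can be sketched roughly as follows. The regex here is an assumption built from the four prefixes listed in this reference; the actual IMP_PREFIX_RE pattern may differ (for example in how it treats stacked prefixes).

```python
import re

# Assumed pattern: one or more of the documented prefixes at the start of the name.
IMP_PREFIX_RE = re.compile(r"^(?:__imp_|_imp_|j_|cs:)+")


def strip_import_prefix(api_name: str) -> str:
    """Remove leading IDA import-thunk prefixes, including stacked ones."""
    return IMP_PREFIX_RE.sub("", api_name)
```

Normalizing names this way before calling classify_api or classify_api_security would let a lookup table keyed on bare API names match thunked references as well.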

type_constants

Mappings for C/C++ type sizes and IDA-to-C type conversions.

  • TYPE_SIZES: dict[str, int] (e.g., BYTE: 1, DWORD: 4)
  • IDA_TO_C_TYPE: dict[str, str] (e.g., _BYTE: unsigned char)
  • SIZE_TO_C_TYPE: dict[int, str] (e.g., 4: uint32_t)

4. Graph & Topology

callgraph

Directed graph construction and traversal for function xrefs.

  • CallGraph
    • from_functions(functions: list[FunctionRecord]) -> CallGraph
    • reachable_from(func_id: int, max_depth: int = 5) -> set[int]
    • callers_of(func_id: int, max_depth: int = 5) -> set[int]
    • find_path(start_id: int, end_id: int) -> list[int] | None
    • get_stats() -> dict (node count, edge count, SCC count)
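A depth-bounded traversal like reachable_from can be sketched as a breadth-first search. This is a standalone illustration, not the CallGraph implementation: the `edges` adjacency dict is a hypothetical stand-in for the graph's internal storage, and excluding the start node from the result is an assumption.

```python
from collections import deque


def reachable_from(edges: dict, func_id: int, max_depth: int = 5) -> set:
    """Return IDs reachable from func_id within max_depth call edges."""
    seen = set()
    queue = deque([(func_id, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for callee in edges.get(node, ()):
            if callee != func_id and callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen
```

Under the same assumptions, callers_of would be the same search run over the reversed edge map.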

cross_module_graph

Resolution of external function calls across analyzed modules.

  • CrossModuleGraph(tracking_db: AnalyzedFilesDB)
    • resolve_external_call(caller_module: str, callee_name: str) -> FunctionWithModuleInfo | None
    • build_cross_module_chain(start_func: str, start_module: str, depth: int = 3) -> list

5. Infrastructure

cache

Filesystem-based result caching with DB mtime validation.

  • get_cached(db_path: str, operation: str, params: dict = None) -> dict | None
    • Retrieves a cached result if the DB mtime matches and TTL is valid.
  • cache_result(db_path: str, operation: str, data: dict, params: dict = None) -> None
    • Atomically writes a result to the cache directory.
  • clear_cache(module_name: str = None) -> None
    • Clears the cache for a specific module or the entire runtime.
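The mtime-validated caching described above might look roughly like this. It is a minimal sketch under stated assumptions: the cache directory, the one-hour TTL, the key derivation, and the entry layout are all hypothetical, since this reference does not specify them.

```python
import hashlib
import json
import os
import tempfile
import time
from pathlib import Path

# Hypothetical location and TTL; the real helper's values are not documented here.
CACHE_DIR = Path(tempfile.gettempdir()) / "helpers_cache_sketch"
TTL_SECONDS = 3600


def _cache_file(db_path, operation, params):
    # Derive a stable file name from the DB path, operation, and parameters.
    key = json.dumps([os.path.abspath(db_path), operation, params or {}], sort_keys=True)
    return CACHE_DIR / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")


def cache_result(db_path, operation, data, params=None):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    entry = {
        "db_mtime_ns": os.stat(db_path).st_mtime_ns,
        "cached_at": time.time(),
        "data": data,
    }
    # Write to a temp file, then rename, so readers never see a partial entry.
    fd, tmp_path = tempfile.mkstemp(dir=CACHE_DIR, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entry, handle)
    os.replace(tmp_path, _cache_file(db_path, operation, params))


def get_cached(db_path, operation, params=None):
    try:
        text = _cache_file(db_path, operation, params).read_text(encoding="utf-8")
        entry = json.loads(text)
    except (OSError, ValueError):
        return None  # missing or corrupt entry counts as a miss
    if entry["db_mtime_ns"] != os.stat(db_path).st_mtime_ns:
        return None  # the DB changed after the result was cached
    if time.time() - entry["cached_at"] > TTL_SECONDS:
        return None  # entry expired
    return entry["data"]
```

Comparing the stored mtime against the current one means a re-extracted database invalidates its cached results without any explicit clear_cache call.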

script_runner

Subprocess management and dynamic module loading.

  • run_skill_script(skill: str, script: str, args: list[str]) -> dict
    • Executes a skill script as a subprocess and returns the parsed JSON output.
  • load_skill_module(skill: str, script: str) -> module
    • Dynamically imports a skill script as a Python module.
  • find_skill_script(skill: str, script: str) -> Path
    • Resolves the absolute path to a skill script.

json_output

Standardized JSON output for skill scripts.

  • emit_json(data: dict, *, status: str = "ok") -> None
    • Writes a dict to stdout wrapped with a "status" key.
  • emit_json_list(key: str, items: list, *, extra: dict = None) -> None
    • Writes a list payload under key, wrapped with "status".
  • should_force_json(args) -> bool
    • Returns True when --json was passed or --workspace-dir is set.

errors

Structured JSON error reporting to stderr.

  • emit_error(message: str, code: str) -> None
    • Writes {"error": message, "code": code} to stderr and exits with code 1.
  • log_error(message: str, code: str) -> None
    • Writes the error JSON to stderr without exiting.
  • log_warning(message: str) -> None
    • Writes a warning message to stderr.
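The error contract above (JSON on stderr, exit code 1 for emit_error) can be illustrated with a short sketch. The payload shape follows the description in this section; everything else, including the use of print and sys.exit, is an assumption about how it might be written.

```python
import json
import sys


def log_error(message: str, code: str) -> None:
    """Write the structured error to stderr without exiting."""
    print(json.dumps({"error": message, "code": code}), file=sys.stderr)


def emit_error(message: str, code: str) -> None:
    """Write the structured error to stderr, then exit with status 1."""
    log_error(message, code)
    sys.exit(1)
```

Keeping errors on stderr leaves stdout free for the JSON payload that json_output emits, so a caller can parse each stream independently.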

progress

Throttled progress reporting for long-running operations.

  • ProgressReporter(total: int, operation: str)
    • update(current: int) -> None
    • status_message(msg: str) -> None
  • progress_iter(iterable, total: int = None, operation: str = "") -> iterator
    • Wraps an iterable with automatic progress reporting.

6. Utilities

mangled_names

Parsing of Microsoft C++ mangled names.

  • parse_class_from_mangled(mangled_name: str) -> dict | None
    • Extracts class_name, method_name, namespaces, and role (e.g., constructor).

module_profile

Access to pre-computed module fingerprints.

  • load_module_profile(module_name: str) -> dict
  • get_noise_ratio(profile: dict) -> float
  • get_technology_flags(profile: dict) -> dict[str, bool] (e.g., com, rpc, security)

batch_operations

Efficient loading of multiple function records.

  • batch_extract_function_data(db: IndividualAnalysisDB, func_ids: list[int]) -> list[dict]
  • batch_resolve_functions(db: IndividualAnalysisDB, identifiers: list[str | int]) -> list[FunctionRecord]

7. WinRT Index

winrt_index

WinRT server index built from extraction data across four access contexts (caller IL x server privilege).

  • WinrtAccessContext(enum) -- HIGH_IL_ALL, HIGH_IL_PRIVILEGED, MEDIUM_IL_ALL, MEDIUM_IL_PRIVILEGED
  • WinrtMethod(dataclass) -- access, type, name, file; properties: short_name, class_name, binary_name
  • WinrtInterface(dataclass) -- name, guid, methods: list[WinrtMethod], pseudo_idl: list[str]; property: method_count
  • WinrtServer(dataclass) -- server class metadata with computed properties:
    • is_out_of_process, is_in_process, runs_as_system, has_permissive_sddl, is_remote_activatable, is_base_trust
    • risk_tier(context) -> str -- compute risk tier for a given access context
    • best_risk_tier -> str -- highest risk across all contexts
    • to_dict() -> dict
  • WinrtIndex -- queryable index:
    • load(data_root) -- load all four access contexts
    • get_servers_for_module(name) -> list[WinrtServer]
    • get_servers_by_class(class_name) -> WinrtServer | None
    • get_procedures_for_module(name) -> list[str]
    • is_winrt_procedure(module, func_name) -> bool
    • get_interfaces_for_module(name) -> list[WinrtInterface]
    • get_methods_for_class(class_name) -> list[WinrtMethod]
    • search_methods(pattern) -> list[WinrtMethod]
    • get_access_contexts_for_class(class_name) -> set[WinrtAccessContext]
    • get_privileged_surface(caller_il) -> list[WinrtServer]
    • get_servers_by_risk(tier) -> list[WinrtServer]
    • summary() -> dict
  • get_winrt_index(force_reload: bool = False) -> WinrtIndex -- cached singleton
  • invalidate_winrt_index() -- clear cached index

com_index

COM server index built from extraction data across four access contexts (caller IL x server privilege).

  • ComAccessContext(enum) -- HIGH_IL_ALL, HIGH_IL_PRIVILEGED, MEDIUM_IL_ALL, MEDIUM_IL_PRIVILEGED
  • ComMethod(dataclass) -- access, type, name, file, interface_name; properties: short_name, class_name, binary_name
  • ComInterface(dataclass) -- name, guid, methods: list[ComMethod], pseudo_idl: list[str]; property: method_count
  • ComServer(dataclass) -- CLSID metadata with computed properties:
    • is_out_of_process (includes DLL surrogate), is_in_process, runs_as_system, has_permissive_launch, has_permissive_access
    • is_remote_activatable, is_trusted_marshaller, can_elevate, auto_elevation
    • risk_tier(context) -> str -- compute risk tier for a given access context
    • best_risk_tier -> str -- highest risk across all contexts
    • to_dict() -> dict
  • ComIndex -- queryable index:
    • load(data_root) -- load all four access contexts
    • get_servers_for_module(name) -> list[ComServer]
    • get_server_by_clsid(clsid) -> ComServer | None
    • get_procedures_for_module(name) -> list[str]
    • is_com_procedure(module, func_name) -> bool
    • get_interfaces_for_module(name) -> list[ComInterface]
    • get_methods_for_clsid(clsid) -> list[ComMethod]
    • search_methods(pattern) -> list[ComMethod]
    • get_access_contexts_for_clsid(clsid) -> set[ComAccessContext]
    • get_privileged_surface(caller_il) -> list[ComServer]
    • get_servers_by_risk(tier) -> list[ComServer]
    • get_elevatable_servers() -> list[ComServer]
    • get_servers_by_service(name) -> list[ComServer]
    • find_servers_for_interface(iid) -> list[ComServer]
    • summary() -> dict
  • get_com_index(force_reload: bool = False) -> ComIndex -- cached singleton
  • invalidate_com_index() -- clear cached index

8. Validation & Sessions

validation

Integrity checking for analysis databases.

  • validate_analysis_db(db_path: str) -> ValidationResult
  • quick_validate(db_path: str) -> bool

session_utils

Session ID resolution and scratchpad path management.

  • resolve_session_id(stdin_data: dict) -> str
    • Resolves the current session ID from environment variables or the hook protocol's stdin JSON payload. Resolution priority:
      1. AGENT_SESSION_ID env var
      2. conversation_id from stdin (Cursor)
      3. session_id from stdin (Claude Code)
      4. UUID4 fallback
  • scratchpad_path(session_id: str) -> Path
    • Returns the path to the session-scoped scratchpad file.
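The four-step resolution priority for resolve_session_id can be sketched as a chain of fallbacks. This is an illustration of the documented order only; treating empty strings as missing values is an assumption.

```python
import os
import uuid


def resolve_session_id(stdin_data: dict) -> str:
    """Pick the first available session identifier in the documented order."""
    return (
        os.environ.get("AGENT_SESSION_ID")      # 1. explicit environment override
        or stdin_data.get("conversation_id")    # 2. Cursor hook payload
        or stdin_data.get("session_id")         # 3. Claude Code hook payload
        or str(uuid.uuid4())                    # 4. fresh random fallback
    )
```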

9. Decompiled Code Parsing

decompiled_parser

Regex-based extraction of function calls, arguments, and parameter usage from IDA decompiled C/C++ code.

  • extract_function_calls(code: str, *, keywords: frozenset[str] = _DEFAULT_KEYWORDS) -> list[dict]
    • Extracts call sites from decompiled code. Handles multi-line calls by joining lines when parentheses are unbalanced. Each dict has: function_name, line_number, line, arguments, result_var.
  • discover_calls_with_xrefs(code: str, xrefs: list[dict], *, keywords: frozenset[str] = _DEFAULT_KEYWORDS) -> list[dict]
    • Uses the DB's simple_outbound_xrefs as ground truth for call discovery and enriches each call with argument expressions from the regex parser. Preferred over extract_function_calls alone.
  • split_arguments(args_str: str) -> list[str]
    • Splits comma-delimited argument strings while respecting nested () and [].
  • find_param_in_calls(code: str, param_name: str, *, keywords: frozenset[str] = _DEFAULT_KEYWORDS) -> list[dict]
    • Finds calls where a named parameter appears in an argument expression. Each dict has: function_name, arg_position, arg_expression, line_number, is_direct.
  • extract_balanced_parens(text: str, start: int = 0) -> str | None
    • Extracts content from balanced parentheses starting at text[start].
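Nesting-aware extraction and splitting, as described for extract_balanced_parens and split_arguments, can be sketched with a simple depth counter. This is not the module's code: returning the content without the outer parentheses is an assumption, and the sketch does not handle commas or brackets inside string literals.

```python
from __future__ import annotations


def extract_balanced_parens(text: str, start: int = 0) -> str | None:
    """Return the text between the '(' at text[start] and its matching ')'."""
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i]
    return None  # unbalanced input


def split_arguments(args_str: str) -> list[str]:
    """Split on commas that are not nested inside () or []."""
    args, current, depth = [], [], 0
    for ch in args_str:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args
```

For a decompiled line such as `memcpy(a1 + 8, buf[idx(i, j)], sub_1400(a2, 4))`, the two helpers together yield three top-level argument expressions rather than five comma-separated fragments.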

struct_scanner

Scans x64 assembly code for struct/class field accesses to infer memory layouts. Assembly is the sole evidence source -- it provides deterministic sizes from instruction operands.

  • scan_assembly_struct_accesses(asm: str) -> list[dict]
    • Scans x64 assembly for [reg+offset] memory access patterns with ptr-size inference. Returns dicts with: base, byte_offset, size, param_num, source, line_num.
  • scan_batch_struct_accesses(code: str, type_sizes: dict[str, int]) -> list[dict]
    • Batch-lift style scanning: returns base, offset, size, type_name, pattern for indexed, direct, and zero-offset accesses.
  • merge_struct_fields(fields: list[dict]) -> list[dict]
    • Merges overlapping field accesses from multiple functions into a unified layout sorted by offset.
  • parse_signature_params(signature: str) -> dict[str, str]
    • Parses C-style function signature parameter names and types into a {name: type} dict.
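The [reg+offset] scanning idea can be illustrated with a small regex pass over IDA-style assembly text. This is a simplified sketch, not the scanner itself: the regex, the two-entry stack register set, and the reduced result keys are assumptions, and the size is left as None when no `ptr` qualifier is present (the real module also infers it from operands).

```python
import re

# Width qualifiers as described for ASM_PTR_SIZES; STACK_REGS contents are assumed.
ASM_PTR_SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}
STACK_REGS = frozenset({"rsp", "rbp"})

_ACCESS_RE = re.compile(
    r"(?:\b(?P<width>byte|word|dword|qword)\s+ptr\s+)?"
    r"\[(?P<base>[a-z][a-z0-9]*)\+(?P<offset>[0-9a-f]+h|\d+)\]"
)


def scan_assembly_struct_accesses(asm: str) -> list:
    """Collect [reg+offset] accesses, skipping stack-relative ones."""
    accesses = []
    for line_num, line in enumerate(asm.splitlines(), start=1):
        for match in _ACCESS_RE.finditer(line.lower()):
            base = match.group("base")
            if base in STACK_REGS:
                continue  # locals and spills, not struct fields
            raw = match.group("offset")
            # IDA marks hex literals with an 'h' suffix; bare digits are decimal.
            offset = int(raw[:-1], 16) if raw.endswith("h") else int(raw)
            width = match.group("width")
            accesses.append({
                "base": base,
                "byte_offset": offset,
                "size": ASM_PTR_SIZES[width] if width else None,
                "line_num": line_num,
            })
    return accesses
```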

10. Taint & Data Flow Analysis

11. Finding Normalization & Merging

finding_schema

Unified finding schema for normalizing results across all vulnerability scanners (taint, memory corruption, logic).

  • Finding(dataclass) -- Scanner-agnostic vulnerability finding.
    • Fields: function_name, function_id, module, source_type, source_category, sink, sink_category, severity, score, exploitability_score, exploitability_rating, verification_status, guards, path, evidence_lines, summary, extra
    • to_dict() -> dict
    • dedup_key (property) -> str -- Deduplication key: function_id::sink::source_category.
    • path_signature (property) -> str -- SHA-256 hash prefix of sorted path elements.
  • from_taint_finding(finding: dict, func_info: dict | None = None) -> Finding
    • Converts a taint-analysis finding dict to a unified Finding.
  • from_memory_finding(finding: dict) -> Finding
    • Converts a MemCorruptionFinding dict to a unified Finding.
  • from_logic_finding(finding: dict) -> Finding
    • Converts a LogicFinding dict to a unified Finding.
  • from_verified_finding(verified: dict) -> Finding
    • Converts a VerificationResult dict to a unified Finding (handles both memory and logic verified outputs).
  • normalize_scanner_output(data: dict, source_type: str) -> list[Finding]
    • Extracts findings from a scanner's JSON output and normalizes them. Handles both raw and verified finding lists.

finding_merge

Merges, deduplicates, and ranks findings across multiple scanner outputs.

  • merge_findings(*scanner_outputs: tuple[dict, str]) -> list[Finding]
    • Merges findings from multiple scanners. Each arg is (data_dict, source_type). Returns deduplicated, score-sorted list.
  • deduplicate(findings: list[Finding], *, max_per_key: int = 3) -> list[Finding]
    • Removes duplicate findings (same function + sink + category). Keeps up to max_per_key distinct paths per dedup key, sorted by score.
  • rank(findings: list[Finding]) -> list[Finding]
    • Sorts findings by composite score descending. Uses exploitability_score if available, severity as tiebreaker.
  • findings_summary(findings: list[Finding]) -> dict
    • Produces summary: total count, by_severity, by_source, top_score.
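The deduplication rule (same dedup key, up to max_per_key distinct paths, highest score first) can be sketched with a trimmed-down Finding. The dataclass below keeps only the fields the rule needs; the `|` separator and the 16-character hash prefix in path_signature are assumptions, since this reference only says "SHA-256 hash prefix of sorted path elements".

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Finding:
    # Reduced field set for illustration; the real dataclass has many more fields.
    function_id: int
    sink: str
    source_category: str
    score: float
    path: list = field(default_factory=list)

    @property
    def dedup_key(self) -> str:
        return f"{self.function_id}::{self.sink}::{self.source_category}"

    @property
    def path_signature(self) -> str:
        joined = "|".join(sorted(self.path))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def deduplicate(findings, *, max_per_key: int = 3):
    """Keep the best-scoring distinct paths per dedup key, capped at max_per_key."""
    kept = []
    seen_paths = defaultdict(set)
    for finding in sorted(findings, key=lambda f: f.score, reverse=True):
        signatures = seen_paths[finding.dedup_key]
        if finding.path_signature in signatures or len(signatures) >= max_per_key:
            continue
        signatures.add(finding.path_signature)
        kept.append(finding)
    return kept
```

Sorting before filtering means that when two scanners report the same path, the higher-scored report is the one that survives.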

report_comparison

Cross-report finding comparison for AI vulnerability scanners.

  • discover_reports(reports_dir: Path, scan_type: str | None = None) -> list[ReportMeta]
    • Finds .findings.json companion files in a reports directory, sorted by timestamp (newest first).
  • load_findings_json(path: Path) -> dict
    • Loads and validates a .findings.json companion file. Raises FileNotFoundError or ValueError.
  • compare_findings(current: dict, previous: dict) -> ComparisonResult
    • Compares findings between two scan reports. Matches by vulnerability_type + primary_function. Returns recurring, new, missed, severity changes, verdict conflicts, remediation changes, coverage delta.
  • format_comparison_section(result: ComparisonResult, previous_report_path: str | None, previous_timestamp: str | None) -> str
    • Generates a markdown ## Previous Findings Comparison section for appending to scan reports.

12. Assembly Analysis

calling_conventions

x64 fastcall register mappings and assembly width constants.

  • PARAM_REGISTERS: dict[int, set[str]] -- Parameter number (1-based) to register alias set (e.g., 1: {"rcx", "ecx", "cx", "cl", "ch"}).
  • REGISTER_TO_PARAM: dict[str, int] -- Reverse lookup: register name to parameter number.
  • PARAM_REGS_X64 -- Backward-compatible alias for REGISTER_TO_PARAM.
  • ASM_REG_SIZES: dict[str, int] -- Register name to byte width (e.g., "rax": 8, "eax": 4).
  • ASM_PTR_SIZES: dict[str, int] -- Instruction operand width qualifiers (e.g., "byte": 1, "qword": 8).
  • STACK_REGS: frozenset[str] -- Stack/frame registers excluded from struct-field inference.
  • param_name_for(param_number: int) -> str
    • Returns IDA-style positional parameter name (a1, a2, ...).
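The relationship between PARAM_REGISTERS, REGISTER_TO_PARAM, and param_name_for can be sketched as below. Only the entry for parameter 1 appears in this reference; the rows for parameters 2 through 4 are filled in here from the standard Microsoft x64 calling convention (rdx, r8, r9 and their sub-registers) and may not match the module's exact alias sets.

```python
# Entry 1 is taken from the reference; entries 2-4 are assumed from the x64 ABI.
PARAM_REGISTERS = {
    1: {"rcx", "ecx", "cx", "cl", "ch"},
    2: {"rdx", "edx", "dx", "dl", "dh"},
    3: {"r8", "r8d", "r8w", "r8b"},
    4: {"r9", "r9d", "r9w", "r9b"},
}

# Reverse lookup derived from the forward table, so the two cannot drift apart.
REGISTER_TO_PARAM = {
    reg: number for number, regs in PARAM_REGISTERS.items() for reg in regs
}


def param_name_for(param_number: int) -> str:
    """Return the IDA-style positional name for a 1-based parameter number."""
    return f"a{param_number}"
```

A scanner that sees `mov eax, [rdx+10h]` could then map `rdx` to parameter 2 and label the access as a field of `a2`.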

13. Security Analysis Helpers

param_risk

Parameter surface metadata from C-style signatures.

  • describe_parameter_surface(signature) -> dict with keys: param_count, has_buffer_pointer, has_string_pointer, has_size_param, has_buffer_size_pair, has_handle, has_com_interface, has_struct_pointer, has_flags_param, pointer_param_count, characteristics
  • PARAM_TYPE_PATTERNS -- 11 regex patterns mapping Windows types to categories
  • BUFFER_SIZE_PAIR_PATTERNS -- 3 compiled regexes detecting buffer+size parameter pairs

sddl_parser

SDDL ACE parsing with Deny support and effective permission computation.

  • ParsedACE(dataclass)
    • Fields: ace_type ("A" Allow or "D" Deny), flags, rights, object_guid, inherit_object_guid, account_sid
  • parse_sddl_aces(sddl: str) -> list[ParsedACE]
    • Parses all ACEs from an SDDL string in evaluation order.
  • effective_permissions_for_sid(sddl: str, sid: str, *, permissive_sids: set[str] | None = None) -> tuple[bool, str]
    • Determines whether a SID has effective access after Deny evaluation. Returns (has_access, reason).
  • is_permissive_sddl(sddl: str) -> bool
    • Checks whether any permissive SID (WD, AC, AU, IU) has effective access. Correctly handles Deny overrides.
  • PERMISSIVE_SIDS: frozenset of well-known permissive SID abbreviations.
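ACE parsing and Deny-aware evaluation can be illustrated with a deliberately simplified sketch. The six-field ACE layout matches the ParsedACE fields above; `first_match_access` is a hypothetical stand-in that compares literal SID strings and lets the first matching ACE decide, which is much coarser than the per-right evaluation effective_permissions_for_sid presumably performs. The sketch also does not separate DACL from SACL entries and skips conditional or resource-attribute ACEs.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedACE:
    ace_type: str
    flags: str
    rights: str
    object_guid: str
    inherit_object_guid: str
    account_sid: str


# Matches one parenthesized ACE body that contains no nested parentheses.
_ACE_RE = re.compile(r"\(([^()]*)\)")


def parse_sddl_aces(sddl: str) -> list:
    """Return ACEs in the order they appear in the SDDL string."""
    aces = []
    for body in _ACE_RE.findall(sddl):
        fields = body.split(";")
        if len(fields) < 6:
            continue  # not a standard six-field ACE
        aces.append(ParsedACE(*fields[:6]))
    return aces


def first_match_access(sddl: str, sid: str) -> tuple:
    """Hypothetical simplification: the first ACE naming `sid` decides."""
    for ace in parse_sddl_aces(sddl):
        if ace.account_sid != sid:
            continue
        if ace.ace_type == "D":
            return False, f"denied by ACE with rights {ace.rights!r}"
        if ace.ace_type == "A":
            return True, f"allowed by ACE with rights {ace.rights!r}"
    return False, "no matching ACE"
```

The point of evaluating in order is visible with `O:BAG:BAD:(D;;GA;;;AU)(A;;GRGX;;;WD)`: a check that only looked for Allow ACEs would miss that AU is denied before any Allow entry is reached.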

14. Command & Pipeline Infrastructure

command_validation

Pre-execution validation for slash command arguments.

  • CommandValidationResult(dataclass)
    • Fields: ok, errors, error_codes, warnings, resolved
    • add_error(msg: str, code: ErrorCode | str) -> None
    • add_warning(msg: str) -> None
  • validate_module(module_name: str, workspace_root: Path | None = None, *, allow_code_only: bool = False) -> CommandValidationResult
    • Validates module existence and DB accessibility. On success, result.resolved["db_path"] contains the absolute DB path.
  • validate_function_arg(db_path: str, function_ref: str) -> CommandValidationResult
    • Validates that a function reference resolves in the given DB. On success, result.resolved["function"] contains the resolved record.
  • validate_depth_param(value: Any, max_depth: int = 20) -> CommandValidationResult
    • Validates a depth parameter is a positive integer within bounds.
  • validate_command_args(command_name: str, args: dict[str, Any], workspace_root: Path | None = None) -> CommandValidationResult
    • Dispatches to per-command validators based on command_name. Validates module, function, depth, and command-specific flags.
  • command_preflight(command_name: str, module: str | None = None, function: str | None = None, **kwargs) -> CommandValidationResult
    • Convenience wrapper: validates and resolves all arguments in one call.

pipeline_schema

Schema parsing and validation for headless batch pipeline YAML definitions.

  • StepConfig(dataclass, frozen) -- Metadata for a supported pipeline step.
    • Fields: name, kind (StepKind), description, goal, valid_options
  • StepDef(dataclass, frozen) -- A parsed step entry from the YAML file.
    • Fields: name, options, config
  • PipelineSettings(dataclass, frozen) -- Execution settings after config/YAML merge.
    • Fields: continue_on_error, max_workers, step_timeout, parallel_modules, max_module_workers, no_cache
    • module_workers (property) -> int
  • PipelineDef(dataclass, frozen) -- Fully parsed pipeline definition.
    • Fields: name, source_path, modules, steps, settings, output
  • ResolvedModule(dataclass, frozen) -- Module name resolved to a concrete DB path.
    • Fields: module_name, db_path
  • STEP_REGISTRY: dict[str, StepConfig] -- Registry of all supported pipeline step names.
  • load_pipeline(yaml_path: str | Path) -> PipelineDef
    • Parses a YAML pipeline definition file into a typed PipelineDef.
  • resolve_modules(modules: list[str] | Literal["all"], workspace_root: Path) -> list[ResolvedModule]
    • Resolves module names (or "all") to concrete DB paths.
  • validate_pipeline(definition: PipelineDef, workspace_root: Path) -> ValidationResult
    • Validates a parsed pipeline: checks steps exist in registry, modules resolve, options are valid.
  • render_output_path(template: str, module_name: str, workspace_root: Path) -> str
    • Renders an output directory path from a template with {module}, {timestamp} placeholders.

pipeline_executor

Execution engine for headless batch pipelines.

  • StepResult(dataclass) -- Outcome of one pipeline step.
    • Fields: step_name, status, elapsed_seconds, workspace_path, error, data
  • ModuleResult(dataclass) -- Execution summary for a single module.
    • Fields: module_name, db_path, status, elapsed_seconds, step_results, errors
  • BatchResult(dataclass) -- Execution summary for a full batch run.
    • Fields: pipeline_name, source_path, output_dir, status, dry_run, settings, modules, total_elapsed_seconds
  • execute_module(module: ResolvedModule, steps: list[StepDef], settings: PipelineSettings, batch_dir: str | Path) -> ModuleResult
    • Executes all pipeline steps for a single module.
  • execute_pipeline(definition: PipelineDef, workspace_root: Path) -> BatchResult
    • Executes a full batch pipeline: resolves modules, dispatches steps (with optional module-level parallelism), writes manifest and summary.
  • dispatch_goal_step(step: StepDef, module: ResolvedModule, settings: PipelineSettings, batch_dir: Path) -> StepResult
    • Dispatches a goal-type step (triage, security) to the triage-coordinator agent.
  • dispatch_scan_step(step: StepDef, module: ResolvedModule, settings: PipelineSettings, batch_dir: Path) -> StepResult
    • Dispatches a security-scan step (memory, logic scanners).
  • dispatch_skill_step(step: StepDef, module: ResolvedModule, settings: PipelineSettings, batch_dir: Path) -> StepResult
    • Dispatches a skill-group step (classify, callgraph, taint, dossiers, entrypoints).
  • write_batch_manifest(batch_dir: str | Path, definition: PipelineDef, progress: dict) -> None
    • Writes/updates the batch manifest file with current progress.
  • write_batch_summary(batch_dir: str | Path, batch_result: BatchResult) -> str
    • Writes the final batch summary JSON and returns its path.

15. Workspace & Orchestration

workspace

Run-directory I/O primitives for multi-step workflow handoff.

  • create_run_dir(module_name: str, goal: str) -> str
    • Creates and returns a new workspace run directory path under .claude/workspace/.
  • list_runs(module: str | None = None, goal: str | None = None, limit: int | None = 10) -> list[dict]
    • Lists workspace runs, optionally filtered by module or goal. Returns manifest metadata per run.
  • write_results(run_dir: str | Path, step_name: str, full_data: Any, summary_data: Any) -> dict[str, str]
    • Writes full results.json and summary.json for a step. Returns paths dict.
  • read_results(run_dir: str | Path, step_name: str) -> Any
    • Reads and returns full results JSON for a step (with workspace envelope).
  • read_step_payload(run_dir: str | Path, step_name: str) -> Any
    • Reads and returns the unwrapped skill output payload (envelope stripped).
  • read_summary(run_dir: str | Path, step_name: str) -> Any
    • Reads and returns summary JSON for a step.
  • get_step_paths(run_dir: str | Path, step_name: str) -> dict[str, str]
    • Returns paths for a step's results and summary files (no I/O).
  • update_manifest(run_dir: str | Path, step_name: str, status: str, summary_path: str | Path) -> None
    • Updates the manifest with step status and summary path reference.
  • summarize_json_payload(payload: Any) -> dict
    • Produces a compact preview of a JSON payload (key counts, list lengths, scalar truncation).
  • utc_iso() -> str -- Returns current UTC time as ISO-8601 string.
  • safe_name(value: str, fallback: str = "item") -> str -- Sanitizes a string for use in file paths.
  • coerce_path(value: str | Path) -> Path -- Resolves a path to absolute.
  • MANIFEST_FILE, RESULTS_FILE, SUMMARY_FILE -- Filename constants.

workspace_bootstrap

Bootstraps workspace step setup to reduce boilerplate in skill scripts.

  • prepare_step(run_dir: str | Path, step_name: str) -> dict[str, str]
    • Creates step subdirectory and returns paths for output files (step_name, step_path, results_path, summary_path).
  • complete_step(run_dir: str | Path, step_name: str, full_data: Any, summary_data: Any, status: str = "success") -> dict[str, str]
    • Writes step results + summary and updates the manifest in one call.

workspace_validation

Validates workspace handoff compliance for run directories.

  • WorkspaceValidationResult(dataclass)
    • Fields: valid, run_dir, issues, manifest, step_count
    • to_dict() -> dict
  • validate_workspace_run(run_dir: str | Path) -> WorkspaceValidationResult
    • Validates: run directory exists, manifest.json is valid, steps have status and summary_path, referenced files exist, each step directory has results.json and summary.json.

agent_common

Shared orchestration helpers for agent scripts.

  • AgentStep(dataclass, frozen) -- Description of a skill-script invocation.
    • Fields: name, skill_name, script_name, args, timeout, json_output, workspace_dir, workspace_step, max_retries
  • AgentStepResult(dataclass) -- Execution result for one step.
    • Fields: name, skill_name, script_name, success, elapsed_seconds, exit_code, error, stdout, stderr, json_data
    • to_dict() -> dict
  • AgentBase -- Shared runner wrapper for agent skill invocations.
    • run_skill_script_result(skill_name, script_name, args, *, timeout, json_output, workspace_dir, workspace_step, max_retries, warn_on_failure) -> dict
    • run_skill_script(skill_name, script_name, args, *, timeout, workspace_dir, workspace_step, max_retries) -> dict | list | None
  • AgentOrchestrator -- Lightweight step execution with retry and circuit-breaker.
    • __init__(runner: AgentBase | None, *, max_workers: int = 4, failure_threshold: int | None = None)
    • run_step(step: AgentStep) -> AgentStepResult
    • results (property) -> list[AgentStepResult]
    • summary() -> dict -- Aggregate statistics: total, failed, elapsed, steps.

16. Cross-Module Indexes

import_export_index

PE import/export table index across all analyzed modules for loader-level dependency resolution.

  • ExportEntry(dataclass, frozen)
    • Fields: module, db_path, name, ordinal, is_forwarded, forwarded_to
  • ImportEntry(dataclass, frozen)
    • Fields: importing_module, source_module, function_name, is_delay_loaded, ordinal
  • ImportExportIndex -- Queryable index (context manager).
    • __init__(tracking_db: str | None = None, workspace_root: str | Path | None = None, *, max_workers: int = 8)
    • who_exports(function_name: str) -> list[ExportEntry] -- Modules whose PE export table contains the function.
    • who_imports(function_name: str, *, source_module: str | None = None) -> list[ImportEntry] -- Modules that import the function.
    • module_consumers(module_name: str) -> dict[str, list[str]] -- Modules that import from module_name, grouped by importing module.
    • module_suppliers(module_name: str) -> dict[str, list[str]] -- Modules that module_name imports from, grouped by supplier.
    • resolve_forwarder_chain(module: str, function_name: str) -> list[ExportEntry] -- Follows PE forwarded export chains to the final implementation.
    • dependency_graph() -> dict[str, set[str]] -- Module-to-module dependency edges from PE import tables.
    • module_export_list(module_name: str) -> list[ExportEntry] -- All exports for a module.
    • summary() -> dict -- Aggregate statistics (module count, export count, import count).
    • Context manager: with ImportExportIndex() as idx: ...

type_constants (expanded)

IDA type size mappings and C type translation tables. Expands the brief entry in section 3.

  • TYPE_SIZES: dict[str, int] -- IDA type name to byte size (e.g., "_BYTE": 1, "DWORD": 4, "_QWORD": 8). Covers _BYTE, BYTE, char, _WORD, WORD, short, _DWORD, DWORD, int, LONG, HRESULT, _QWORD, QWORD, __int64, and unsigned variants.
  • IDA_TO_C_TYPE: dict[str, str] -- IDA type name to C standard type for header generation (e.g., "_DWORD": "uint32_t", "HRESULT": "HRESULT").
  • SIZE_TO_C_TYPE: dict[int, str] -- Field byte-size to default C type (e.g., 1: "uint8_t", 4: "uint32_t", 8: "uint64_t").

17. Low-Level Utilities

sql_utils

Shared SQL utilities for safe LIKE queries.

  • escape_like(value: str) -> str
    • Escapes SQL LIKE meta-characters (\, %, _) so the value is matched literally. Callers must append ESCAPE '\' to the LIKE clause.
  • LIKE_ESCAPE: str -- The SQL ESCAPE '\' clause fragment to append to LIKE expressions.
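The escaping contract can be shown end to end with sqlite3. The function bodies below are a sketch of the documented behaviour rather than the module's source, and the `functions` table with a `name` column, along with `find_names_containing`, are hypothetical names used only for the demonstration.

```python
import sqlite3

# SQL fragment to append after the LIKE pattern (a single backslash escape char).
LIKE_ESCAPE = "ESCAPE '\\'"


def escape_like(value: str) -> str:
    """Escape backslash first, then the LIKE wildcards % and _."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_names_containing(conn: sqlite3.Connection, term: str) -> list:
    """Literal substring search; `functions.name` is a hypothetical schema."""
    sql = f"SELECT name FROM functions WHERE name LIKE ? {LIKE_ESCAPE} ORDER BY name"
    pattern = f"%{escape_like(term)}%"
    return [row[0] for row in conn.execute(sql, (pattern,))]
```

Without the escape step, a search for `get_` would also match `getXitem`, because `_` is a single-character wildcard in LIKE.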

logging_config

Centralized logging configuration for the runtime.

  • configure_logging() -> None
    • Sets up the helpers logger hierarchy with a stderr handler. Safe to call multiple times. Level controlled by DEEPEXTRACT_LOG_LEVEL environment variable (default WARNING).
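An idempotent setup of this kind can be sketched as follows. The logger name "helpers", the marker attribute used to detect an already-installed handler, and the log format are assumptions; only the environment variable name and the WARNING default come from this reference.

```python
import logging
import os
import sys

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging() -> None:
    """Attach one stderr handler to the 'helpers' logger; safe to call repeatedly."""
    logger = logging.getLogger("helpers")  # assumed root of the helpers hierarchy
    level_name = os.environ.get("DEEPEXTRACT_LOG_LEVEL", "WARNING").upper()
    logger.setLevel(_LEVELS.get(level_name, logging.WARNING))
    # Skip handler installation if an earlier call already added ours.
    if any(getattr(h, "_sketch_handler", False) for h in logger.handlers):
        return
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    handler._sketch_handler = True  # marker attribute, specific to this sketch
    logger.addHandler(handler)
```

Logging to stderr keeps diagnostic output out of the stdout stream that skill scripts use for their JSON results.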