DeepExtract: Database Schema and Binary Analysis Data Format

This document provides a technical reference for the data architecture, schemas, and analysis heuristics generated by the DeepExtract binary analysis extractor for IDA Pro.

System Overview

DeepExtract is an IDA Pro Python extractor that converts binaries into structured, queryable data. It performs PE metadata extraction and function‑level analysis (xrefs, stack frames, loop analysis, indirect call resolution, vtable context reconstruction) and stores results in SQLite. Optional report generation produces file_info.md/file_info.json and per‑function .cpp files only when --generate-cpp is enabled and at least one function has usable decompiled output (C++ files are emitted only for functions passing the decompilation filter).

Data Architecture

The structured output uses a SQLite‑based storage model supporting concurrent analysis and large‑scale binary repositories.

Tracking Database (analyzed_files.db): A registry tracking status, hashes, and metadata for binaries in the analysis pipeline. analysis_flags stores AnalysisConfig.to_dict() (see key list below; it does not include cpp_output_dir or SQLite pragma overrides).
Individual Analysis Databases: Per‑binary databases containing disassembly/decompilation output, cross‑references, and analysis metadata. In batch mode the default is {filename}_{hash}.db under extracted_dbs/; in single‑file mode this is the explicit --sqlite-db path.
Optional Report/Code Output: When --generate-cpp is enabled and at least one function has valid decompiled output, the extractor writes file_info.md, file_info.json, and per‑function .cpp files under {cpp_output_dir}/{module_name}/ (or {sqlite_db_dir}/extracted_raw_code/{module_name}/ if --cpp-output-dir is not provided).

Headless Batch Extractor Outputs (PowerShell)

When using headless_batch_extractor.ps1, the script orchestrates multiple IDA runs and produces a consistent on‑disk layout in {StorageDir}:

analyzed_files.db — Tracking database used by check_analyzed_files.py
extracted_dbs/ — SQLite analysis databases ({base_name}.db, see naming below)
extracted_code/ — Generated .cpp files plus file_info.md/file_info.json (only when -NoGenerateCpp is not set)
idb_cache/ — IDA database cache files (.i64)
logs/ — IDA run logs ({base_name}_{timestamp}.log), batch execution log (batch_extractor_{timestamp}.log), and optional symbol download logs (symchk_*.log, symchk_*.log.err)
analyzed_modules_list.txt — One absolute input file path per line (UTF-8), for the files that were analyzed
extraction_report.json — JSON summary report for the batch run

`extraction_report.json` schema (generated by `Write-ExtractionReport`)

{
  "extraction_info": {
    "timestamp": "YYYY-MM-DDTHH:mm:ss",
    "mode": "Directory | FileList | Pid",
    "storage_directory": "path",
    "timeout_hours": number,
    "extract_dirs": ["path", ...],
    "extract_dirs_recursive": ["path", ...],
    "target_pids": [number, ...],
    "process_names": ["string", ...]
  },
  "summary": {
    "total_files": number,
    "successful": number,
    "failed": number,
    "timed_out": number,
    "skipped": number,
    "empty_log_files": number
  },
  "successful_extractions": [
    { "FileName": "path", "DbPath": "path", "IdbPath": "path", "LogFile": "path" }
  ],
  "failed_extractions": [
    { "FileName": "path", "ExitCode": number, "LogFile": "path" }
  ],
  "timed_out_extractions": [
    { "FileName": "path", "LogFile": "path", "ElapsedTime": number, "Reason": "string" }
  ],
  "skipped_files": [
    { "FileName": "path", "Reason": "string" }
  ],
  "empty_log_files": [
    { "FileName": "path", "LogFile": "path", "ExitCode": "number | \"unknown\"", "Reason": "string" }
  ]
}

Notes:

target_pids and process_names are arrays included only when mode is Pid.
extract_dirs (non-recursive) and extract_dirs_recursive are arrays included only when mode is Directory. Either or both may be present depending on which parameters were used.
Per-file base name ({base_name}) is computed as: {file.Name.Replace('.', '_')}_{md5_prefix} lower-cased, where md5_prefix is the first 10 characters of the MD5 of the file contents. Example: kernel32.dll -> kernel32_dll_a1b2c3d4e5.

Analysis Heuristics for Prioritization

The extractor records function‑level heuristics in the functions table (JSON strings) and binary‑level heuristics in the file_info table.

1. Loop Analysis (`loop_analysis`)

Algorithm: Tarjan SCC + dominator/back‑edge detection over IDA flow graphs (loop_analysis.py).
Metrics: Per‑loop entries include complexity, block_count, instruction_count, has_function_calls, memory_access, is_infinite, exit_condition_count, cyclomatic_complexity, back_edges, and nesting_level.
Availability: If analysis is disabled or unavailable, the extractor stores { "loops": [], "loop_count": 0, "analysis_available": false }. Large functions may be skipped with skipped, reason, block_count, and limit. Errors may add an error field.

2. Dangerous API Detection (`dangerous_api_calls`)

Algorithm: Matches outbound xref function_name values against dangerous_apis.json via constants.is_dangerous_api() (case‑insensitive, strips import prefixes like __imp_/_imp_, and treats WPP_ as dangerous).
Output: JSON string containing a list of unique API names. Empty list when none are detected or on error.

3. String Reference Extraction (`string_literals`)

Algorithm: Builds a function→strings map by enumerating idautils.Strings() and xrefs to functions (string_analysis.py). A per‑binary cache in %TEMP% is used when available (validated against IDB mtime).
Output: JSON list of unique referenced string literals (capped at 2000 items, order not guaranteed).

4. Stack Frame Analysis (`stack_frame`)

Fields (when available): local_vars_size, args_size, saved_regs_size, exception_handler, frame_pointer_present, has_canary.
Canary detection: Combines frame variable names with instruction‑level patterns (__security_check_cookie, stack_chk_fail, XOR against frame values).
Availability: When disabled or on failure, the extractor stores { "local_vars_size": null, "args_size": null, "has_canary": null, "analysis_available": false } (other keys are omitted).

5. VTable Contexts (`vtable_contexts`)

Source: Inbound xrefs flagged as vtable candidates; surrounding vtable slots are read and resolved.
Output: Entries include reconstructed_classes (C++ skeleton strings), source_ea, and extraction_type. Capped at 100 entries.

6. TLS Callback Analysis (`tls_callbacks`)

Scope: Binary‑level analysis (file‑info) when TLS data exists.
Output: Per‑callback metadata includes function identity, segment info, xrefs, string/API/crypto indicators, anti‑debug findings, analysis_notes, and threat_score/threat_level.

Analysis Systems

The extractor uses analysis engines to resolve binary patterns:

Inter-procedural Data Flow Analysis

Implemented in interprocedural_analysis.py and invoked by indirect call analysis when available:

Summary-based, bounded: max depth 3, limited callers/callees per function, LRU cache (size 1000) for summaries.
Tracks: parameter usage as function pointers, return value sources (constant/parameter/global), and global‑based call targets.
Outputs: analysis warnings when incomplete or when targets cannot be resolved.

Indirect Call & Jump Table Resolution

Indirect calls: Detects indirect call patterns (call reg, call [mem], call [reg+off]), performs intra‑procedural data‑flow to identify targets, then applies inter‑procedural analysis when no targets are found or confidence < 50%.
Jump tables: Uses IDA switch info when available and a manual fallback for indirect jumps; both produce target lists with confidence scoring and detection metadata (minimum acceptance threshold 30%).
Outbound xrefs: Encoded with confidence and validation checks in the outbound_xrefs JSON entries.

Offline Code Generation (CPP Generator)

Hierarchy: Generates a structured directory of .cpp files plus file_info.md/file_info.json.
Categorization: Groups functions into C++ class method files or standalone global function files based on naming conventions.
Cleaned Source: Replaces IDA escape sequences and wraps comments for readability.

SQLite Database Schemas

Schema validation uses deep_extract/schema.py, but the actual table creation occurs in deep_extract/pe_context_extractor.py:init_sqlite_db().

Tracking Database (`analyzed_files.db`)

Purpose: Ledger tracking the analysis status of all binaries.

Table: `analyzed_files`

Column Name	Data Type	Description
`file_path`	TEXT	PRIMARY KEY. Absolute path to the binary file.
`base_dir`	TEXT	The directory containing the binary file.
`file_name`	TEXT	The name of the binary file, including its extension.
`file_extension`	TEXT	The file's extension (e.g., `.exe`, `.dll`).
`md5_hash`	TEXT	The MD5 cryptographic hash for change detection.
`sha256_hash`	TEXT	The SHA256 cryptographic hash for change detection.
`analysis_db_path`	TEXT	Relative path to the individual analysis SQLite database (relative to the common DB directory; forward slashes; may be null until completion).
`status`	TEXT	Status: `PENDING`, `ANALYZING`, or `COMPLETE`.
`analysis_flags`	TEXT	JSON string of the analysis configuration used (see below).
`analysis_start_timestamp`	TIMESTAMP	Recorded when analysis began (used for stale lock recovery).
`analysis_completion_timestamp`	TIMESTAMP	Recorded when analysis completed.

analysis_flags keys (from AnalysisConfig.to_dict):

sqlite_db, extract_dangerous_apis, extract_strings, extract_stack_frame, extract_globals
analyze_loops, extract_pe_info, extract_pe_metadata, extract_advanced_pe, extract_runtime_info
generate_cpp, force_reanalyze, use_interprocedural_analysis, thunk_depth, min_conf

Note: check_analyzed_files.py compares flags with force_reanalyze removed. cpp_output_dir, sqlite_pragmas, and common_db_pragmas are intentionally not stored in analysis_flags.

Indices

Index Name	Column(s)	Purpose
`idx_files_name`	`file_name`	Fast filename lookups.
`idx_files_hash`	`md5_hash`, `sha256_hash`	Hash‑based change detection.
`idx_an_files_status_lower_name`	`status`, `LOWER(file_name)`	Status + case‑insensitive name search.
`idx_an_files_lower_name`	`LOWER(file_name)`	Case‑insensitive filename search.
`idx_an_files_lower_ext`	`LOWER(file_extension)`	Case‑insensitive extension search.

Individual Analysis Database (per binary)

Purpose: Stores results for a single binary.

Table: `file_info`

Column Name	Data Type	Description
`file_path`	TEXT	Absolute path to the binary (PK).
`base_dir`	TEXT	Directory containing the file.
`file_name`	TEXT	Filename with extension.
`file_extension`	TEXT	Extension (e.g., `.exe`, `.dll`).
`file_size_bytes`	BIGINT	Total size in bytes.
`md5_hash`	TEXT	MD5 cryptographic hash.
`sha256_hash`	TEXT	SHA256 cryptographic hash.
`imports`	TEXT	JSON string of imported modules/functions (regular + delay‑load merged; per‑function `is_delay_loaded`).
`exports`	TEXT	JSON string of exported functions.
`entry_point`	JSON	JSON string of detected entry points (from `extract_all_entry_points_with_methods`).
`file_version`	TEXT	Resource file version string.
`product_version`	TEXT	Product version string.
`company_name`	TEXT	Vendor identification.
`file_description`	TEXT	File purpose description.
`internal_name`	TEXT	Internal module name.
`original_filename`	TEXT	Original filename from PE header.
`legal_copyright`	TEXT	Copyright string.
`product_name`	TEXT	Product name.
`time_date_stamp_str`	TEXT	Compilation timestamp.
`file_modified_date_str`	TEXT	Filesystem modification date.
`sections`	TEXT	JSON string of PE sections.
`pdb_path`	TEXT	Path to PDB debug symbols.
`rich_header`	TEXT	JSON string with Rich Header metadata.
`tls_callbacks`	TEXT	JSON string of TLS callback metadata.
`is_net_assembly`	BOOLEAN	Flag for .NET binaries.
`clr_metadata`	TEXT	JSON string with CLR header details.
`idb_cache_path`	TEXT	Relative path to cached `.i64`/`.idb` (from the analysis DB directory; forward slashes).
`dll_characteristics`	TEXT	JSON string of DLL flags.
`security_features`	TEXT	JSON string of feature assessments.
`exception_info`	TEXT	JSON string for exception directory.
`load_config`	TEXT	JSON string for PE Load Config data.
`analysis_timestamp`	TIMESTAMP	Timestamp (`YYYY-MM-DD HH:MM:SS`) recorded when the `file_info` row is inserted.

Table: `functions`

Column Name	Data Type	Description
`function_id`	INTEGER	Unique primary key.
`function_signature`	TEXT	Demangled signature (IDA long form).
`function_signature_extended`	TEXT	Extended signature using IDA type info when present.
`mangled_name`	TEXT	Original decorated name.
`function_name`	TEXT	Raw IDA function name (may be mangled or `sub_…`).
`assembly_code`	TEXT	Disassembly text (capped at `MAX_ASSEMBLY_LINES`).
`decompiled_code`	TEXT	Hex‑Rays output or placeholder text on failure.
`inbound_xrefs`	TEXT	Detailed JSON of inbound references.
`outbound_xrefs`	TEXT	Detailed JSON of outbound calls/jumps.
`simple_inbound_xrefs`	TEXT	Simplified JSON caller list.
`simple_outbound_xrefs`	TEXT	Simplified JSON callee list.
`vtable_contexts`	TEXT	JSON list of reconstructed class skeletons.
`global_var_accesses`	TEXT	JSON list of global read/write accesses.
`dangerous_api_calls`	TEXT	JSON list of dangerous API names (stored as a JSON string).
`string_literals`	TEXT	JSON list of referenced string literals (order not guaranteed).
`stack_frame`	TEXT	JSON object with stack/frame metrics.
`loop_analysis`	TEXT	JSON object with loop analysis data.
`analysis_errors`	TEXT	JSON array of processing errors/warnings.
`created_at`	TIMESTAMP	Record creation timestamp.

Note: JSON fields in file_info/functions are stored as TEXT (serialized JSON). Most fields use json_safety.to_json_safe(), so values may be "null" or include truncation metadata.

Table: `function_xrefs`

Purpose: Normalized, relational cross-reference table that represents the call graph and data-reference graph as individual rows. Each row is a single directed edge between a source function and a target (which may be an internal function, an imported API, a data reference, or a vtable slot).

This table complements the JSON xref columns on the functions table (simple_inbound_xrefs, simple_outbound_xrefs, inbound_xrefs, outbound_xrefs). The JSON columns are optimized for per-function reads (loading one function's context), while function_xrefs is optimized for cross-module graph queries (traversals, reachability, call-chain analysis) using standard SQL joins and index lookups.

Column Name	Data Type	Description
`id`	INTEGER	PRIMARY KEY (autoincrement). Row identifier.
`source_id`	INTEGER	`function_id` of the calling/referencing function. NULL for data-only inbound references with no resolved source function.
`target_id`	INTEGER	`function_id` of the called/referenced function. NULL when the target is an external import, data reference, or unresolved symbol.
`target_name`	TEXT	Name of the target function or symbol (e.g., `"CreateProcessAsUserW"`, `"AiCheckForAdminUser"`, `"_SVCHOST_GLOBAL_DATA * g_pSvchostSharedGlobals"`).
`target_module`	TEXT	Where the target lives: `"internal"` for same-module functions, DLL name for imports (e.g., `"kernelbase.dll"`), `"data"` for globals, `"vtable"` for vtable slots, `"static_library"` for statically linked library functions.
`function_type`	INTEGER	Target classification using IDA function type constants: `FT_UNK=0`, `FT_GEN=1` (internal), `FT_LIB=2` (static library), `FT_API=3` (import), `FT_MEM=4` (data), `FT_VTB=8` (vtable/unknown offset).
`xref_type`	TEXT	IDA cross-reference type: `"Call Near"`, `"Offset"`, `"Read"`, `"Write"`, etc.
`direction`	TEXT	Edge direction relative to the function being processed: `"outbound"` (this function calls/references the target) or `"inbound"` (the target calls/references this function).

Uniqueness constraint: UNIQUE(source_id, target_id, target_name, target_module, xref_type, direction) prevents duplicate edges. Insertion uses INSERT OR IGNORE to silently drop duplicates from IDA's raw xref data.

Relationship to JSON xref columns: The function_xrefs rows are derived from the same raw xref data as simple_inbound_xrefs and simple_outbound_xrefs. Both representations are populated during the same extraction pass. The JSON columns retain the original per-function grouping; the function_xrefs table flattens all edges into a single queryable structure.

Use Cases

Reverse lookup -- find all callers of a specific function:

SELECT DISTINCT f.function_name
FROM function_xrefs x
JOIN functions f ON f.function_id = x.source_id
WHERE x.target_name = 'CreateProcessAsUserW'
  AND x.direction = 'outbound';

Internal call graph reconstruction:

SELECT s.function_name AS caller, t.function_name AS callee
FROM function_xrefs x
JOIN functions s ON s.function_id = x.source_id
JOIN functions t ON t.function_id = x.target_id
WHERE x.direction = 'outbound'
  AND x.target_module = 'internal';

Import dependency surface -- which DLLs does a function depend on:

SELECT DISTINCT x.target_module, x.target_name
FROM function_xrefs x
WHERE x.source_id = ?
  AND x.direction = 'outbound'
  AND x.function_type = 3;  -- FT_API

Dead code detection -- functions with no inbound references:

SELECT f.function_name
FROM functions f
WHERE f.function_id NOT IN (
    SELECT target_id FROM function_xrefs
    WHERE direction = 'inbound' AND target_id IS NOT NULL
);

Transitive call chain -- two-hop callers of a function (who calls the callers):

SELECT DISTINCT f2.function_name AS indirect_caller
FROM function_xrefs x1
JOIN function_xrefs x2 ON x2.target_id = x1.source_id
JOIN functions f2 ON f2.function_id = x2.source_id
WHERE x1.target_name = 'NtDuplicateToken'
  AND x1.direction = 'outbound'
  AND x2.direction = 'outbound'
  AND x2.target_module = 'internal';

Global data access analysis -- find all functions that read a specific global:

SELECT f.function_name
FROM function_xrefs x
JOIN functions f ON f.function_id = x.source_id
WHERE x.target_name LIKE '%g_pSvchostSharedGlobals%'
  AND x.direction = 'outbound'
  AND x.xref_type = 'Read';

Table: `schema_version`

Column Name	Data Type	Description
`version`	INTEGER	PRIMARY KEY. Schema version number.
`description`	TEXT	Description of the schema version.
`applied_timestamp`	TIMESTAMP	When this schema version was applied.
`migration_notes`	TEXT	Details regarding migration from prior version.

Indices

Index Name	Table	Column(s)	Purpose
`idx_file_info_lower_ext`	`file_info`	`LOWER(file_extension)`	Case-insensitive extension search.
`idx_file_info_lower_name`	`file_info`	`LOWER(file_name)`	Case-insensitive filename search.
`idx_functions_mangled`	`functions`	`mangled_name COLLATE NOCASE`	Case-insensitive mangled name lookups.
`idx_functions_name`	`functions`	`function_name COLLATE NOCASE`	Case-insensitive function name lookups.
`idx_functions_signature`	`functions`	`function_signature COLLATE NOCASE`	Case-insensitive signature search.
`idx_fxrefs_source`	`function_xrefs`	`source_id`	Fast caller lookups and outbound joins.
`idx_fxrefs_target`	`function_xrefs`	`target_id`	Fast callee lookups and inbound joins.
`idx_fxrefs_target_name`	`function_xrefs`	`target_name COLLATE NOCASE`	Name-based target search (e.g., imports).
`idx_fxrefs_direction`	`function_xrefs`	`direction`	Filter by inbound/outbound direction.

Function JSON Field Formats

The following JSON fields are stored as strings in the functions table.

inbound_xrefs

Detailed inbound references (callers). Each entry commonly includes:

source_ea (hex string), source_location_details, xref_type, is_code, is_vtable_candidate
source_name_info (object with mangled/short/long or null)
source_function_name, source_function_name_short, source_function_name_long, source_function_name_mangled
source_module_name, source_module_host, source_function_type
call_confidence, validation_checks, validation_warnings, extraction_type
Optional: function_name (for code call/jump/offset xrefs)

outbound_xrefs

Detailed outbound calls/jumps. Base fields include:

source_instruction_ea, target_ea, resolved_target_ea, xref_type
function_name, short_function_name, function_name_short, function_name_long, function_name_mangled
module_name, module_host (pre‑API‑set resolution), function_type
is_tail_call, is_thunk_resolved
call_confidence, validation_checks, validation_warnings, extraction_type
Optional: original_target_name (present when a thunk or jump table target was resolved)

Specialized entries add:

VTable calls: is_vtable_call, vtable_info (vtable address, name, method offset/names, size)
Indirect calls: is_indirect_call, indirect_call_info (call_type, confidence, analysis_method)
Jump table targets: is_jump_table_target, jump_table_detection_confidence, jump_table_detection_method

simple_inbound_xrefs / simple_outbound_xrefs

Simplified xref objects with fields:

function_name, function_id (null for data/external refs), module_name, function_type, xref_type, extraction_type ("script")

function_type values: FT_UNK=0, FT_GEN=1, FT_LIB=2, FT_API=3, FT_MEM=4, FT_VTB=8, FT_SYS=16.

vtable_contexts

List of reconstructed class skeletons derived from vtable slots:

reconstructed_classes (array of C++ class skeleton strings)
source_ea
extraction_type (e.g., detailed_vtable_analysis)

global_var_accesses

List of global variable reads/writes:

address (hex string)
name
access_type (Read or Write)

loop_analysis

Object containing:

loops: list of loop entries with complexity, block_count, instruction_count, has_function_calls, memory_access, is_infinite, exit_condition_count, cyclomatic_complexity, back_edges, nesting_level
loop_count
Optional: analysis_available (false when disabled or unsupported), skipped, reason, block_count, limit, function_name, or error

stack_frame

Object containing:

local_vars_size, args_size, saved_regs_size, exception_handler, frame_pointer_present, has_canary
Optional: analysis_available (false when stack analysis is disabled or fails). When disabled/unavailable the extractor stores { "local_vars_size": null, "args_size": null, "has_canary": null, "analysis_available": false }.

string_literals

List of string literals referenced by the function.

dangerous_api_calls

List of unique API names flagged as dangerous.

analysis_errors

List of error/warning objects. All entries include stage. Entries created from exceptions include error; truncation warnings may include severity, reason, and max_lines.

JSON Serialization Limits

JSON fields are serialized with json_safety.to_json_safe, which enforces size and item limits. When truncation occurs:

Lists may include a trailing {"_truncated": true, "original_count": N, "shown_count": M} entry.
Dicts may include _truncated_keys when key limits are exceeded.
Both lists and dicts receive _serialization_metadata with truncated: true and a transformations list.
If a field exceeds the byte limit (default 50MB), an additional _truncated object (or _error object) may be inserted with fields like reason, original_size_bytes, and max_bytes.

Common per‑field caps (list limits):

entry_point: 100 items
imports: 1000 modules
exports: 5000 entries
sections: 200 entries
tls_callbacks: 256 entries
string_literals: 2000 entries
vtable_contexts: 100 entries
loop_analysis: 500 items per list inside the object
global_var_accesses: 1000 entries
analysis_errors: 100 entries
inbound_xrefs / outbound_xrefs / simple_*_xrefs: capped by EXTRACTOR_MAX_XREFS (env var; default 50,000; clamped to 1,000–500,000)

Usage Notes for AI Agents

Registry Discovery: Query analyzed_files.db to find individual module databases and verify analysis completeness across software packages.
Call Chain Reconstruction: Use function_xrefs for efficient SQL-based call chain queries. For lightweight per-function reads, use simple_outbound_xrefs. For indirect calls/jump tables/vtables, use outbound_xrefs entries with indirect_call_info, jump_table_*, or vtable_info.
Cross-Reference Analysis: Use function_xrefs with SQL joins for graph queries (reverse lookups, transitive callers, import surface analysis). Use inbound_xrefs and outbound_xrefs JSON columns when full xref detail is needed for a single function.
Heuristic Data Retrieval: Use SQL JSON_EXTRACT to filter functions by technical metrics such as loop complexity, stack size, or anti-debug indicators.
Technical Verification: Correlate observations in decompiled_code with the raw assembly_code and stack_frame (canary status) for technical validation.

FilesExpand file tree

data_format_reference.md

Latest commit

History

data_format_reference.md

File metadata and controls

DeepExtract: Database Schema and Binary Analysis Data Format

System Overview

Data Architecture

Headless Batch Extractor Outputs (PowerShell)

extraction_report.json schema (generated by Write-ExtractionReport)

Analysis Heuristics for Prioritization

1. Loop Analysis (loop_analysis)

2. Dangerous API Detection (dangerous_api_calls)

3. String Reference Extraction (string_literals)

4. Stack Frame Analysis (stack_frame)

5. VTable Contexts (vtable_contexts)

6. TLS Callback Analysis (tls_callbacks)

Analysis Systems

Inter-procedural Data Flow Analysis

Indirect Call & Jump Table Resolution

Offline Code Generation (CPP Generator)

SQLite Database Schemas

Tracking Database (analyzed_files.db)

Table: analyzed_files

Indices

Individual Analysis Database (per binary)

Table: file_info

Table: functions

Table: function_xrefs

Use Cases

Table: schema_version

Indices

Function JSON Field Formats

inbound_xrefs

outbound_xrefs

simple_inbound_xrefs / simple_outbound_xrefs

vtable_contexts

global_var_accesses

loop_analysis

stack_frame

string_literals

dangerous_api_calls

analysis_errors

JSON Serialization Limits

Usage Notes for AI Agents

`extraction_report.json` schema (generated by `Write-ExtractionReport`)

1. Loop Analysis (`loop_analysis`)

2. Dangerous API Detection (`dangerous_api_calls`)

3. String Reference Extraction (`string_literals`)

4. Stack Frame Analysis (`stack_frame`)

5. VTable Contexts (`vtable_contexts`)

6. TLS Callback Analysis (`tls_callbacks`)

Tracking Database (`analyzed_files.db`)

Table: `analyzed_files`

Table: `file_info`

Table: `functions`

Table: `function_xrefs`

Table: `schema_version`