This document describes the format, computation, and purpose of module_profile.json, a pre-computed fingerprint summarising a module's identity, scale, library composition, API surface, complexity characteristics, and security posture.
Unlike file_info.json and function_index.json, the module profile is generated unconditionally after function extraction completes -- it does not require --generate-cpp. The profile reads exclusively from the per-module SQLite database (file_info and functions tables) and adds negligible extraction time.
The profile is written to a directory that depends on whether C++ generation is enabled:
- With --generate-cpp: {cpp_output_dir}/{module_name}/module_profile.json (co-located with generated C++ files)
- Without --generate-cpp: {sqlite_db_dir}/module_profile.json (next to the analysis database)
module_name is derived from the input file name as {stem}_{extension} (extension without the dot), then sanitized with CppGenerator.sanitize_filename(): :: becomes _, any character outside [a-zA-Z0-9_.-] is replaced with _, and the result is truncated to 100 characters. For example: kernel32.dll -> kernel32_dll.
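The naming rule above can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual CppGenerator.sanitize_filename() code: the helper names are hypothetical, and the handling of files without an extension is an assumption.

```python
import re
from pathlib import PurePath

MAX_NAME_LEN = 100  # truncation limit stated in the text


def sanitize_filename(name: str) -> str:
    """Hypothetical stand-in for CppGenerator.sanitize_filename()."""
    name = name.replace("::", "_")                 # :: -> _
    name = re.sub(r"[^a-zA-Z0-9_.-]", "_", name)   # disallowed characters -> _
    return name[:MAX_NAME_LEN]                     # truncate to 100 characters


def derive_module_name(file_name: str) -> str:
    """Build {stem}_{extension} (extension without the dot), then sanitize."""
    path = PurePath(file_name)
    extension = path.suffix.lstrip(".")
    # Assumption: a file with no extension keeps just its stem.
    raw = f"{path.stem}_{extension}" if extension else path.stem
    return sanitize_filename(raw)


print(derive_module_name("kernel32.dll"))  # kernel32_dll
```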
Output directory resolution when --generate-cpp is enabled:
- main.py default: if --cpp-output-dir is not provided, output goes to {sqlite_db_dir}/extracted_raw_code/{module_name}/.
- main.py with --cpp-output-dir: output goes to {cpp_output_dir}/{module_name}/.
- headless_batch_extractor.ps1: passes --cpp-output-dir "{StorageDir}/extracted_code", so the profile is written to {StorageDir}/extracted_code/{module_name}/.
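The location rules above reduce to a small decision function. The sketch below is illustrative only: resolve_profile_path and its parameters are hypothetical names, and POSIX-style paths are used so the output is the same on every platform.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_profile_path(
    sqlite_db_dir: str,
    module_name: str,
    generate_cpp: bool,
    cpp_output_dir: Optional[str] = None,
) -> PurePosixPath:
    """Return where module_profile.json would be written under the rules above."""
    if not generate_cpp:
        # Without --generate-cpp: next to the analysis database.
        return PurePosixPath(sqlite_db_dir) / "module_profile.json"
    if cpp_output_dir is None:
        # --generate-cpp without --cpp-output-dir: default under the database dir.
        base = PurePosixPath(sqlite_db_dir) / "extracted_raw_code"
    else:
        base = PurePosixPath(cpp_output_dir)
    return base / module_name / "module_profile.json"


print(resolve_profile_path("out/db", "kernel32_dll", generate_cpp=True))
# out/db/extracted_raw_code/kernel32_dll/module_profile.json
```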
The profile contains six top-level sections, each aggregating a different aspect of the module.
{
"identity": {
"module_name": "string",
"file_name": "string | null",
"description": "string | null",
"company": "string | null",
"version": "string | null"
},
"scale": {
"total_functions": "number",
"named_functions": "number",
"unnamed_sub_functions": "number",
"with_decompiled": "number",
"with_assembly": "number",
"class_count": "number",
"export_count": "number"
},
"library_profile": {
"app_functions": "number",
"library_functions": "number",
"noise_ratio": "number",
"breakdown": {
"<tag>": "number"
}
},
"api_profile": {
"dangerous_api_functions": "number",
"total_dangerous_refs": "number",
"security_api_count": "number",
"crypto_api_count": "number",
"com_api_count": "number",
"rpc_api_count": "number",
"winrt_api_count": "number",
"named_pipe_api_count": "number",
"process_api_count": "number",
"import_surface": {
"com_present": "boolean",
"rpc_present": "boolean",
"winrt_present": "boolean",
"named_pipes_present": "boolean",
"com_modules": ["string"],
"rpc_modules": ["string"],
"winrt_apisets": ["string"],
"named_pipe_functions": ["string"]
}
},
"complexity_profile": {
"functions_with_loops": "number",
"total_loops": "number",
"avg_asm_size": "number",
"max_asm_size": "number",
"functions_over_500_instructions": "number"
},
"security_posture": {
"aslr": "boolean | null",
"dep": "boolean | null",
"cfg": "boolean | null",
"seh": "boolean | null",
"canary_coverage_pct": "number | null"
}
}

The identity section holds basic identification metadata sourced from the file_info table.
- module_name: Sanitised module name (e.g. appinfo_dll).
- file_name: Original file name from the PE header (file_info.file_name).
- description: File description from the PE version resource (file_info.file_description).
- company: Company name from the PE version resource (file_info.company_name).
- version: File version string. Uses file_info.file_version; falls back to file_info.product_version when the file version is absent.
The scale section holds function-level size metrics derived from the functions table.
- total_functions: Total rows in the functions table.
- named_functions: Functions whose function_name does not start with sub_ (IDA's default prefix for unnamed subroutines).
- unnamed_sub_functions: total_functions - named_functions.
- with_decompiled: Functions with usable decompiled output (not null, not "Decompiler not available", and not starting with "Decompilation failed:").
- with_assembly: Functions with a non-null assembly_code column.
- class_count: Distinct class prefixes extracted from function names containing ::.
- export_count: Number of entries in the file_info.exports JSON array.
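As a rough illustration of these rules, the sketch below computes the scale metrics from rows already loaded into memory. Several details are assumptions rather than documented behaviour: the decompiled_code key name, taking the class prefix as everything before the last ::, and treating a missing exports value as zero exports.

```python
import json


def is_usable_decompiled(text):
    """Apply the three exclusion rules listed for with_decompiled."""
    if text is None or text == "Decompiler not available":
        return False
    return not text.startswith("Decompilation failed:")


def compute_scale(functions, exports_json):
    """functions: list of dicts, one per row of the functions table."""
    names = [f["function_name"] for f in functions]
    named = sum(1 for n in names if not n.startswith("sub_"))
    # Assumption: the class prefix is the text before the last "::".
    classes = {n.rsplit("::", 1)[0] for n in names if "::" in n}
    # Assumption: a missing exports column counts as zero exports.
    exports = json.loads(exports_json) if exports_json else []
    return {
        "total_functions": len(functions),
        "named_functions": named,
        "unnamed_sub_functions": len(functions) - named,
        "with_decompiled": sum(
            1 for f in functions if is_usable_decompiled(f.get("decompiled_code"))
        ),
        "with_assembly": sum(
            1 for f in functions if f.get("assembly_code") is not None
        ),
        "class_count": len(classes),
        "export_count": len(exports),
    }
```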
The library_profile section describes the library-vs-application composition of the module, using the same detection logic as function_index.json.
Each function's function_name and mangled_name are checked against CppGenerator._LIBRARY_PATTERNS. Matching functions receive a tag (WIL, STL, WRL, CRT, ETW/TraceLogging); unmatched functions are counted as application code.
- app_functions: Functions with no library tag match.
- library_functions: Functions matching at least one library pattern.
- noise_ratio: library_functions / total_functions, rounded to three decimal places. A value of 0.475 means 47.5% of functions are library boilerplate.
- breakdown: Object mapping each detected library tag to its function count, ordered by frequency (most common first).
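The arithmetic behind these fields is small enough to sketch. The function below assumes tagging has already happened (the contents of CppGenerator._LIBRARY_PATTERNS are not reproduced here) and takes one tag per function, with None meaning application code; the function name is hypothetical.

```python
from collections import Counter


def compute_library_profile(tags):
    """tags: one entry per function, a library tag string or None."""
    total = len(tags)
    breakdown = Counter(tag for tag in tags if tag is not None)
    library = sum(breakdown.values())
    return {
        "app_functions": total - library,
        "library_functions": library,
        # Guard against an empty functions table (no division by zero).
        "noise_ratio": round(library / total, 3) if total else 0.0,
        # most_common() yields tags from most to least frequent.
        "breakdown": dict(breakdown.most_common()),
    }


profile = compute_library_profile(["WIL", "WIL", "STL", None])
print(profile["noise_ratio"])  # 0.75
```

With the figures from the appinfo.dll example, round(554 / 1166, 3) gives the documented 0.475.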
The api_profile section combines two data sources: function-level dangerous-API references and module-level import scanning.
The dangerous-API counts are aggregated from the functions.dangerous_api_calls JSON column. Each row holds a JSON array of API names flagged during extraction.
- dangerous_api_functions: Number of functions with at least one dangerous API reference.
- total_dangerous_refs: Total dangerous API references across all functions.
- security_api_count: References matching security/token/ACL patterns (e.g. AdjustTokenPrivileges, OpenProcessToken, ImpersonateLoggedOnUser, AccessCheck).
- crypto_api_count: References matching cryptographic patterns (e.g. BCrypt*, NCrypt*, CryptEncrypt, CertOpenStore).
- com_api_count: References matching COM patterns (e.g. CoCreateInstance, CoInitializeEx, CLSIDFromProgID).
- rpc_api_count: References matching RPC patterns (e.g. RpcServerListen, NdrClientCall*, UuidCreate).
- winrt_api_count: References matching WinRT patterns (e.g. RoInitialize, RoActivateInstance, WindowsCreateString).
- named_pipe_api_count: References matching named-pipe patterns (e.g. CreateNamedPipe*, ConnectNamedPipe, TransactNamedPipe).
- process_api_count: References matching process/thread patterns (e.g. CreateProcess*, OpenProcess, CreateRemoteThread).
A single API reference can match multiple categories (e.g. an API that is both security-related and process-related).
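A minimal sketch of this aggregation follows. The pattern table contains only the example names listed above, not the extractor's full pattern set, and shell-style matching via fnmatchcase is an assumption about how the wildcards are interpreted. Each reference is tested against every category, so one reference can increment several counters.

```python
import json
from fnmatch import fnmatchcase

# Illustrative subset built from the examples in the text, not the real table.
CATEGORY_PATTERNS = {
    "security_api_count": ["AdjustTokenPrivileges", "OpenProcessToken",
                           "ImpersonateLoggedOnUser", "AccessCheck"],
    "crypto_api_count": ["BCrypt*", "NCrypt*", "CryptEncrypt", "CertOpenStore"],
    "process_api_count": ["CreateProcess*", "OpenProcess", "CreateRemoteThread"],
}


def count_dangerous_refs(rows):
    """rows: dangerous_api_calls column values (JSON text or None), one per function."""
    result = {"dangerous_api_functions": 0, "total_dangerous_refs": 0}
    result.update({category: 0 for category in CATEGORY_PATTERNS})
    for raw in rows:
        refs = json.loads(raw) if raw else []
        if not refs:
            continue
        result["dangerous_api_functions"] += 1
        result["total_dangerous_refs"] += len(refs)
        for api in refs:
            # No early exit: a reference may match more than one category.
            for category, patterns in CATEGORY_PATTERNS.items():
                if any(fnmatchcase(api, pattern) for pattern in patterns):
                    result[category] += 1
    return result
```

Counting happens per reference, so a row listing CreateProcessW twice adds 2 to process_api_count.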
The import_surface sub-object records module-level technology presence derived from the file_info.imports JSON column. Each import entry contains module_name (resolved DLL), raw_module_name (original import or API-set name), and a functions[] array.
- com_present: true if any imported module resolves to combase.dll, ole32.dll, or oleaut32.dll, or if an API-set name contains com-l.
- rpc_present: true if any imported module resolves to RPCRT4.dll.
- winrt_present: true if any API-set name contains winrt (e.g. api-ms-win-core-winrt-l1-1-0).
- named_pipes_present: true if any imported function matches a named-pipe pattern (CreateNamedPipe*, ConnectNamedPipe, CallNamedPipe*, WaitNamedPipe*, TransactNamedPipe, PeekNamedPipe, DisconnectNamedPipe).
- com_modules: Sorted list of COM-related module names detected in imports.
- rpc_modules: Sorted list of RPC-related module names detected in imports.
- winrt_apisets: Sorted list of WinRT-related API-set names detected in imports.
- named_pipe_functions: Sorted list of named-pipe function names detected in imports.
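The import scan can be sketched as a single pass over the import entries. Several points below are assumptions, since the text does not specify them: DLL names are compared case-insensitively, functions[] is treated as a list of plain name strings, and com_modules records the resolved module name (falling back to the raw name).

```python
from fnmatch import fnmatchcase

COM_DLLS = {"combase.dll", "ole32.dll", "oleaut32.dll"}
PIPE_PATTERNS = ["CreateNamedPipe*", "ConnectNamedPipe", "CallNamedPipe*",
                 "WaitNamedPipe*", "TransactNamedPipe", "PeekNamedPipe",
                 "DisconnectNamedPipe"]


def scan_import_surface(imports):
    """imports: list of dicts with module_name, raw_module_name, functions."""
    com, rpc, winrt, pipes = set(), set(), set(), set()
    for entry in imports:
        module = entry.get("module_name") or ""
        raw = entry.get("raw_module_name") or ""
        # Assumption: comparisons ignore case.
        if module.lower() in COM_DLLS or "com-l" in raw.lower():
            com.add(module or raw)
        if module.lower() == "rpcrt4.dll":
            rpc.add(module)
        if "winrt" in raw.lower():
            winrt.add(raw)
        # Assumption: functions[] holds plain function-name strings.
        for name in entry.get("functions") or []:
            if any(fnmatchcase(name, pattern) for pattern in PIPE_PATTERNS):
                pipes.add(name)
    return {
        "com_present": bool(com),
        "rpc_present": bool(rpc),
        "winrt_present": bool(winrt),
        "named_pipes_present": bool(pipes),
        "com_modules": sorted(com),
        "rpc_modules": sorted(rpc),
        "winrt_apisets": sorted(winrt),
        "named_pipe_functions": sorted(pipes),
    }
```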
The complexity_profile section holds structural complexity metrics from loop analysis and assembly size.
- functions_with_loops: Functions whose loop_analysis JSON contains at least one detected loop.
- total_loops: Sum of all detected loops across all functions.
- avg_asm_size: Average assembly line count per function (rounded to nearest integer). Only functions with non-null assembly_code are counted.
- max_asm_size: Largest assembly line count for any single function.
- functions_over_500_instructions: Number of functions exceeding 500 assembly lines.
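The assembly-size metrics can be illustrated as follows. The sketch assumes the size of a function is the number of text lines in its assembly_code value, and that all three metrics are 0 when no function has assembly; neither detail is confirmed by the text. Loop metrics are left out because the shape of the loop_analysis JSON is not described here.

```python
def compute_asm_metrics(assembly_codes):
    """assembly_codes: assembly_code column values (text or None), one per function."""
    # Only functions with non-null assembly_code are counted.
    sizes = [len(code.splitlines()) for code in assembly_codes if code is not None]
    if not sizes:
        # Assumption: metrics fall back to 0 when there is nothing to measure.
        return {"avg_asm_size": 0, "max_asm_size": 0,
                "functions_over_500_instructions": 0}
    return {
        "avg_asm_size": round(sum(sizes) / len(sizes)),
        "max_asm_size": max(sizes),
        "functions_over_500_instructions": sum(1 for size in sizes if size > 500),
    }


codes = ["push rbp\nret", "a\nb\nc\nd", None, "\n".join(["nop"] * 501)]
print(compute_asm_metrics(codes))
# {'avg_asm_size': 169, 'max_asm_size': 501, 'functions_over_500_instructions': 1}
```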
The security_posture section holds binary-level security features from PE metadata, plus stack-canary coverage computed from cross-reference data.
- aslr: true if DYNAMIC_BASE is set in DllCharacteristics. Sourced from file_info.security_features.aslr_enabled.
- dep: true if NX_COMPAT is set. Sourced from file_info.security_features.dep_enabled.
- cfg: true if GUARD_CF is set. Sourced from file_info.security_features.cfg_enabled.
- seh: true if NO_SEH is not set. Sourced from file_info.security_features.seh_enabled.
- canary_coverage_pct: Percentage of functions (with assembly code) that reference __security_check_cookie, __GSHandlerCheck, or __security_cookie in their outbound cross-references. A value of 78.5 means 78.5% of the functions that have assembly code reference one of these stack-canary symbols.
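The canary coverage calculation can be sketched as below. The outbound_xrefs key (a list of referenced symbol names) is a hypothetical representation of the cross-reference data, and rounding to one decimal place is an assumption based on the 78.5 example.

```python
CANARY_SYMBOLS = {"__security_check_cookie", "__GSHandlerCheck", "__security_cookie"}


def canary_coverage_pct(functions):
    """functions: dicts with assembly_code and outbound_xrefs (hypothetical key)."""
    with_asm = [f for f in functions if f.get("assembly_code") is not None]
    if not with_asm:
        return None  # no functions with assembly code
    protected = sum(
        1 for f in with_asm
        if CANARY_SYMBOLS & set(f.get("outbound_xrefs") or [])
    )
    # Assumption: the percentage is rounded to one decimal place.
    return round(100.0 * protected / len(with_asm), 1)


sample = [
    {"assembly_code": "ret", "outbound_xrefs": ["__security_check_cookie"]},
    {"assembly_code": "ret", "outbound_xrefs": ["memcpy", "__GSHandlerCheck"]},
    {"assembly_code": "ret", "outbound_xrefs": ["memcpy"]},
    {"assembly_code": None, "outbound_xrefs": ["__security_cookie"]},
]
print(canary_coverage_pct(sample))  # 66.7
```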
{
"identity": {
"module_name": "appinfo_dll",
"file_name": "appinfo.dll",
"description": "Application Information Service",
"company": "Microsoft Corporation",
"version": "10.0.26100.7824"
},
"scale": {
"total_functions": 1166,
"named_functions": 1050,
"unnamed_sub_functions": 116,
"with_decompiled": 1166,
"with_assembly": 1166,
"class_count": 60,
"export_count": 5
},
"library_profile": {
"app_functions": 598,
"library_functions": 554,
"noise_ratio": 0.475,
"breakdown": {
"WIL": 465,
"WRL": 47,
"ETW/TraceLogging": 40,
"STL": 1,
"CRT": 1
}
},
"api_profile": {
"dangerous_api_functions": 127,
"total_dangerous_refs": 176,
"security_api_count": 45,
"crypto_api_count": 2,
"com_api_count": 12,
"rpc_api_count": 3,
"winrt_api_count": 8,
"named_pipe_api_count": 0,
"process_api_count": 8,
"import_surface": {
"com_present": true,
"rpc_present": true,
"winrt_present": true,
"named_pipes_present": false,
"com_modules": ["combase.dll"],
"rpc_modules": ["RPCRT4.dll"],
"winrt_apisets": [
"api-ms-win-core-winrt-error-l1-1-0",
"api-ms-win-core-winrt-l1-1-0",
"api-ms-win-core-winrt-string-l1-1-0"
],
"named_pipe_functions": []
}
},
"complexity_profile": {
"functions_with_loops": 224,
"total_loops": 488,
"avg_asm_size": 57,
"max_asm_size": 1733,
"functions_over_500_instructions": 9
},
"security_posture": {
"aslr": true,
"dep": true,
"cfg": true,
"seh": true,
"canary_coverage_pct": 78.5
}
}

- The profile is generated by deep_extract/module_profile.py via generate_module_profile(), called from run_analysis_pipeline() in pe_context_extractor.py.
- If the file_info table is missing (e.g. database was only partially populated), the identity and security_posture sections will contain null fields. Other sections still compute from the functions table.
- If dangerous API extraction was disabled (--no-extract-dangerous-apis), api_profile counts default to 0. The import_surface sub-object is still populated from the file_info.imports column when available.
- If loop analysis was disabled (--no-analyze-loops), complexity_profile.functions_with_loops and total_loops default to 0. Assembly-size metrics are still computed.
- noise_ratio is 0.0 when the functions table is empty (no division by zero).
- canary_coverage_pct is null when no functions have assembly code.
- The *_api_count fields count individual references, not unique API names. A single function calling CreateProcessW twice contributes 2 to process_api_count.
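Because several fields can be null, a consumer should separate "disabled" from "unknown". The sketch below is one possible reader, not part of the tool: summarise_posture is a hypothetical helper that takes an already parsed profile and sorts the four mitigation flags into enabled, disabled, and unknown.

```python
import json

MITIGATIONS = ("aslr", "dep", "cfg", "seh")


def summarise_posture(profile):
    """Group the mitigation flags by state, treating null as unknown."""
    posture = profile.get("security_posture") or {}
    summary = {"enabled": [], "disabled": [], "unknown": []}
    for name in MITIGATIONS:
        value = posture.get(name)
        if value is None:
            summary["unknown"].append(name)   # e.g. file_info table missing
        elif value:
            summary["enabled"].append(name)
        else:
            summary["disabled"].append(name)
    # canary_coverage_pct may be null when no function has assembly code.
    summary["canary_coverage_pct"] = posture.get("canary_coverage_pct")
    return summary


text = '{"security_posture": {"aslr": true, "dep": true, "cfg": false, "seh": null, "canary_coverage_pct": null}}'
print(summarise_posture(json.loads(text)))
# {'enabled': ['aslr', 'dep'], 'disabled': ['cfg'], 'unknown': ['seh'], 'canary_coverage_pct': None}
```

In practice the input would come from json.load() on the module_profile.json file at the location described earlier.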