SigHunter is a dual-mode analysis tool written in C++ that scans compiled binaries and live process memory for specific hexadecimal patterns, commonly known as "signatures". Developed as a final project for CS50, it is built for reverse engineers and security analysts who need to identify compiler artifacts, packers, or malicious code fragments within Windows Portable Executable (PE) files and live RAM.
Unlike a simple string search, SigHunter takes the binary's structure into account. It parses the PE header to map every section, but it still runs a Global Deep Scan over the whole file, so structural corruption (such as XOR obfuscation) does not stop it from finding hidden payloads, even in file overlays.
- Dynamic Memory Analysis (RAM Scan): Attach the engine directly to a live Process ID (PID) to extract and scan raw memory using the Windows API (`ReadProcessMemory`), defeating on-disk packers.
- Global Deep Scan: SigHunter scans the entire byte array (from `0x0` to EOF). Matches are automatically tagged with their exact memory section or flagged as `[In: OVERLAY/UNKNOWN]` if found outside mapped regions.
- PE Header Parsing: Automatically detects Windows PE files, validates the `MZ` and `PE` signatures, and calculates the exact bounds of all file sections.
- Wildcard Support: Supports dynamic byte patterns (e.g., `60 E8 ?? ?? ?? ??`) to match instructions with variable memory offsets or addresses.
- Noise Reduction & Logging: Implements a "Smart Trim" feature that limits terminal output for high-frequency matches to prevent flooding, while allowing full data export to a raw log file.
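The section tagging described above could look roughly like the sketch below. It is only an illustration: the `SectionRange` fields, the `tagOffset` helper, and the `[In: <name>]` label for mapped sections are assumptions modeled on the overlay tag, not the project's actual definitions in Scanner.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout; the real SectionRange in Scanner.hpp may differ.
struct SectionRange {
    std::string name;     // e.g. ".text"
    std::uint32_t start;  // raw file offset where the section begins
    std::uint32_t size;   // number of raw bytes in the section
};

// Returns a label for a match at `offset`: the section that contains it,
// or the overlay tag when no mapped section covers that offset.
std::string tagOffset(const std::vector<SectionRange>& sections, std::uint64_t offset) {
    for (const SectionRange& s : sections) {
        // Widen before adding so start + size cannot wrap around.
        const std::uint64_t end = static_cast<std::uint64_t>(s.start) + s.size;
        if (offset >= s.start && offset < end) {
            return "[In: " + s.name + "]";
        }
    }
    return "[In: OVERLAY/UNKNOWN]";
}
```

With this approach, bytes in the headers or after the last section fall through to the overlay tag, which matches the "outside mapped regions" behaviour described in the feature list.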
- Scanner.hpp: Defines the engine's core data structures (`SigByte`, `Signature`, `MatchRecord`, `SectionRange`) and the `Scanner` class blueprint.
- Scanner.cpp: The implementation layer containing the PE parser logic, memory reading, hexadecimal string compilation, and the sliding-window search algorithm.
- SigHunter.cpp: The CLI entry point that handles argument parsing, input validation, and engine initialization.
- signatures.txt: A modular external database following the format: `[CATEGORY]|PATTERN|DESCRIPTION`.
- PoC's/: A folder containing Proof of Concept (PoC) material: Python XOR obfuscators, a UPX-packed program, and a program that, when run, inserts a signature directly into its own memory and reports its PID, its base address, and the address where the payload was injected, so you can analyze it dynamically.
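To make the `signatures.txt` format concrete, here is a minimal sketch of how one line might be split and its hex pattern compiled. The struct fields and the helper names `compilePattern` and `parseSignatureLine` are hypothetical, and the sketch assumes the category field is stored as written, brackets included.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical shapes for illustration; the real ones live in Scanner.hpp.
struct SigByte {
    std::uint8_t value;  // literal byte to compare (unused for wildcards)
    bool wildcard;       // true when the token was "??"
};

struct Signature {
    std::string category;
    std::string description;
    std::vector<SigByte> bytes;
};

// Compiles a pattern such as "60 E8 ?? ??" into SigByte entries.
// Returns false for an empty pattern or a token that is neither "??"
// nor exactly two hex digits.
bool compilePattern(const std::string& pattern, std::vector<SigByte>& out) {
    out.clear();
    std::istringstream tokens(pattern);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "??") {
            out.push_back(SigByte{0, true});
            continue;
        }
        const bool isHexPair = tok.size() == 2 &&
            std::isxdigit(static_cast<unsigned char>(tok[0])) &&
            std::isxdigit(static_cast<unsigned char>(tok[1]));
        if (!isHexPair) return false;
        out.push_back(SigByte{static_cast<std::uint8_t>(std::stoul(tok, nullptr, 16)), false});
    }
    return !out.empty();
}

// Splits "CATEGORY|PATTERN|DESCRIPTION" on the first two '|' characters.
bool parseSignatureLine(const std::string& line, Signature& sig) {
    const std::size_t first = line.find('|');
    if (first == std::string::npos) return false;
    const std::size_t second = line.find('|', first + 1);
    if (second == std::string::npos) return false;
    sig.category = line.substr(0, first);
    sig.description = line.substr(second + 1);
    return compilePattern(line.substr(first + 1, second - first - 1), sig.bytes);
}
```

Rejecting malformed tokens at load time keeps the search loop free of validation work.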
The search logic utilizes a high-performance sliding window approach.
- Static Mode: Loads the target binary into a `std::vector<uint8_t>` in RAM, preventing I/O bottlenecking during multi-signature scans.
- Dynamic Mode: Uses `EnumProcessModules` and `GetModuleInformation` to determine the live image size, then copies the live process memory directly into the scanning buffer.
- Comparison Logic: The engine iterates through the memory buffer. If a wildcard flag (`??`) is detected in the compiled signature, the engine skips that specific byte comparison, so instructions whose operands vary can still match.
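The comparison logic above can be sketched as a naive sliding-window search. This is an illustrative version only: `SigByte` is given a hypothetical two-field layout and `findAll` is an assumed helper name, not necessarily what Scanner.cpp uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout; the real SigByte is defined in Scanner.hpp.
struct SigByte {
    std::uint8_t value;  // literal byte to compare
    bool wildcard;       // true for "??": this position matches any byte
};

// Slides the pattern over the buffer one byte at a time and returns
// the starting offset of every position where all non-wildcard bytes agree.
std::vector<std::size_t> findAll(const std::vector<std::uint8_t>& buffer,
                                 const std::vector<SigByte>& pattern) {
    std::vector<std::size_t> hits;
    if (pattern.empty() || pattern.size() > buffer.size()) return hits;

    const std::size_t last = buffer.size() - pattern.size();
    for (std::size_t i = 0; i <= last; ++i) {
        bool match = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (pattern[j].wildcard) continue;  // skip the comparison entirely
            if (buffer[i + j] != pattern[j].value) {
                match = false;
                break;
            }
        }
        if (match) hits.push_back(i);
    }
    return hits;
}
```

For example, compiling `60 E8 ?? ?? ?? ??` and running it over a buffer reports every `60 E8` that is followed by at least four more bytes, whatever those bytes are.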
To report accurately where each match sits inside the file, SigHunter implements a manual PE parser:
- Validation: Validates the `MZ` signature (`0x4D 0x5A`) and locates the NT Headers via the `e_lfanew` pointer.
- Section Mapping: Iterates through the Section Table to identify all segments (e.g., `.rdata`, `.data`), extracting `PointerToRawData` and `SizeOfRawData`. This allows the engine to report exactly where a malicious signature is hiding.
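The two steps above can be sketched without any Windows headers by reading little-endian fields straight from the byte buffer. The field offsets follow the published PE/COFF layout (`e_lfanew` at `0x3C`, a 20-byte file header after the 4-byte `PE` signature, 40-byte section headers); the names `readLE`, `parseSections`, and the `SectionRange` fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout; the real SectionRange in Scanner.hpp may differ.
struct SectionRange {
    std::string name;
    std::uint32_t rawOffset;  // PointerToRawData
    std::uint32_t rawSize;    // SizeOfRawData
};

// Reads an n-byte (n <= 4) little-endian value with bounds checking.
bool readLE(const std::vector<std::uint8_t>& d, std::size_t off, std::size_t n,
            std::uint32_t& out) {
    assert(n <= 4);
    if (off > d.size() || n > d.size() - off) return false;
    out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out |= static_cast<std::uint32_t>(d[off + i]) << (8 * i);
    }
    return true;
}

// Validates MZ and PE signatures, then collects the raw bounds of each section.
bool parseSections(const std::vector<std::uint8_t>& d, std::vector<SectionRange>& out) {
    std::uint32_t mz = 0, lfanew = 0, pe = 0, count = 0, optSize = 0;
    if (!readLE(d, 0, 2, mz) || mz != 0x5A4D) return false;       // "MZ"
    if (!readLE(d, 0x3C, 4, lfanew)) return false;                // e_lfanew
    const std::size_t nt = lfanew;
    if (!readLE(d, nt, 4, pe) || pe != 0x00004550) return false;  // "PE\0\0"
    if (!readLE(d, nt + 6, 2, count)) return false;               // NumberOfSections
    if (!readLE(d, nt + 20, 2, optSize)) return false;            // SizeOfOptionalHeader

    const std::size_t table = nt + 24 + optSize;  // first section header
    out.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::size_t hdr = table + 40u * static_cast<std::size_t>(i);
        std::uint32_t rawSize = 0, rawPtr = 0;
        if (!readLE(d, hdr + 16, 4, rawSize) || !readLE(d, hdr + 20, 4, rawPtr)) return false;
        std::string name;  // up to 8 bytes, NUL-padded
        for (std::size_t k = 0; k < 8 && d[hdr + k] != 0; ++k) {
            name.push_back(static_cast<char>(d[hdr + k]));
        }
        out.push_back(SectionRange{name, rawPtr, rawSize});
    }
    return true;
}
```

Because every read is bounds-checked, a truncated or deliberately corrupted header makes the parser return `false` instead of reading past the buffer, and a scan of the full byte array can still proceed without section tags.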
To compile the project using a standard C++ compiler (like g++ or MSVC):
```bash
g++ SigHunter.cpp Scanner.cpp -o sighunter.exe -lpsapi
```
(Note: -lpsapi is required for the Windows Process Status API used in RAM scanning).
1. Static Scan (File on Disk)
Run a static scan with full log export:
```bash
./sighunter.exe -t target_binary.exe -w signatures.txt -o full_report.txt
```
2. Dynamic Scan (Live Process Memory)
Run a scan directly against a running Process ID (PID):
```bash
./sighunter.exe -p <PID> -w signatures.txt -o mem_report.txt
```
To comply with security policies and avoid distributing pre-compiled or potentially flagged executables (.exe), the tests/ directory contains the source code and automation scripts to build the test environment locally.
- Navigate to the `tests/` folder.
- Run the provided batch script to generate the targets:
```bash
./setup_tests.bat
```
This script requires `g++` and `Python 3`. It will automatically:
- Compile the `hello.cpp` source code into a base binary (`hello_clean.exe`).
- Execute a custom Python script (`PoC_corrupt.py`) that uses an XOR cipher to corrupt the PE header and injects a custom `0xRobert` payload into the file overlay (`hello_infected.exe`).
- Compile `Poc.cpp` into `Poc.exe` for a dynamic PoC example.
You can then run sighunter.exe against these generated binaries or execute the infected binary and scan its PID to see the dynamic RAM extraction in action.
- Memory vs. Disk: Loading the binary into a RAM buffer was prioritized so that multi-signature scans run entirely against memory instead of re-reading the file for each signature.
- Zero-Dependency Parsing: The PE parser was written from scratch using bitwise operations rather than the `<windows.h>` header, making the static analysis portable and capable of analyzing Windows binaries on Linux systems.
- UI/UX for Analysts: The "Smart Trim" logic was a critical addition; during testing, standard compiler padding generated over 14,000 matches. Trimming this noise in the terminal while logging it to a file keeps the tool usable for human analysts.
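As a rough sketch of the "Smart Trim" idea, the function below writes every match to a log stream but only the first few to the terminal stream. The names `toHex` and `reportMatches`, the caller-supplied limit, and the summary wording are all assumptions for illustration; the project's actual threshold and output format are not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Formats an offset as 0x-prefixed hex without changing the caller's stream flags.
std::string toHex(std::size_t value) {
    std::ostringstream s;
    s << "0x" << std::hex << value;
    return s.str();
}

// Writes every offset to `log`, but only the first `terminalLimit` offsets to
// `terminal`, followed by a one-line summary of how many were trimmed.
void reportMatches(const std::vector<std::size_t>& offsets, std::size_t terminalLimit,
                   std::ostream& terminal, std::ostream& log) {
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        const std::string line = toHex(offsets[i]);
        log << line << '\n';
        if (i < terminalLimit) terminal << line << '\n';
    }
    if (offsets.size() > terminalLimit) {
        terminal << "... " << (offsets.size() - terminalLimit)
                 << " more match(es) written to the log\n";
    }
}
```

Passing `std::cout` as the terminal stream and an `std::ofstream` as the log gives the behaviour described above: a short, readable console view with the complete match list preserved on disk.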
Robert Souza Lages is an undergraduate student in Systems Analysis and Development at UniSenac (Pelotas, Brazil). This project reflects a deep interest in low-level software engineering, C++, memory management, and the internal structures of operating systems.