SigHunter: High-Performance Binary Pattern Analyzer

Project Overview

SigHunter is a dual-mode analysis tool written in C++ that scans compiled binaries and live process memory for specific hexadecimal patterns, commonly known as "signatures". Developed as a final project for CS50, this utility is built for reverse engineers and security analysts who need to identify compiler artifacts, packers, or malicious code fragments within Windows Portable Executable (PE) files and live RAM.

Unlike a simple string search, SigHunter is aware of the binary's structure. It parses the PE header to map all sections, yet still performs a Global Deep Scan over every byte of the file, so payloads are found even when structural data has been corrupted (for example by XOR obfuscation) or when they sit in the file overlay.

Core Features

Dynamic Memory Analysis (RAM Scan): Attach the engine directly to a live Process ID (PID) to extract and scan raw memory using the Windows API (ReadProcessMemory). Because a packed program unpacks itself in memory at run time, this defeats on-disk packers.
Global Deep Scan: SigHunter scans the entire byte array (from 0x0 to EOF). Matches are automatically tagged with their exact memory section or flagged as [In: OVERLAY/UNKNOWN] if found outside mapped regions.
PE Header Parsing: Automatically detects Windows PE files, validates the MZ and PE signatures, and calculates the exact bounds of all file sections.
Wildcard Support: Supports dynamic byte patterns (e.g., 60 E8 ?? ?? ?? ??) to match instructions with variable memory offsets or addresses.
Noise Reduction & Logging: Implements a "Smart Trim" feature that limits terminal output for high-frequency matches to prevent flooding, while allowing full data export to a raw log file.

Technical Architecture

1. Project Structure

Scanner.hpp: Defines the engine's core data structures (SigByte, Signature, MatchRecord, SectionRange) and the Scanner class blueprint.
Scanner.cpp: The implementation layer containing the PE parser logic, memory reading, hexadecimal string compilation, and the sliding-window search algorithm.
SigHunter.cpp: The CLI entry point that handles argument parsing, input validation, and engine initialization.
signatures.txt: A modular external database following the format: [CATEGORY]|PATTERN|DESCRIPTION.
PoC's/: A folder containing Proof of Concept (PoC) material, including Python XOR obfuscators, a UPX-packed program, and a program that, when run, inserts a signature directly into its own memory and prints its PID, its base address, and the address where the payload was injected, so you can analyze it dynamically.
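The `[CATEGORY]|PATTERN|DESCRIPTION` format can be read with a plain split on `|` followed by a token-by-token conversion of the hex pattern. The following is a minimal sketch of that idea, not the code in Scanner.cpp: the fields of `SigByte` and `Signature` and the helper names `compilePattern` and `parseSignatureLine` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical layout: one pattern byte plus a wildcard flag.
struct SigByte {
    uint8_t value;
    bool wildcard;
};

// Hypothetical layout of one compiled signatures.txt entry.
struct Signature {
    std::string category;
    std::string description;
    std::vector<SigByte> bytes;
};

// Convert "60 E8 ?? ??" into SigByte values.
// Returns false if a token is neither "??" nor exactly two hex digits.
bool compilePattern(const std::string& pattern, std::vector<SigByte>& out) {
    out.clear();
    std::istringstream tokens(pattern);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "??") {
            out.push_back({0, true});
            continue;
        }
        if (tok.size() != 2) return false;
        unsigned value = 0;
        for (char c : tok) {
            unsigned digit = 0;
            if (c >= '0' && c <= '9') digit = static_cast<unsigned>(c - '0');
            else if (c >= 'a' && c <= 'f') digit = static_cast<unsigned>(c - 'a') + 10;
            else if (c >= 'A' && c <= 'F') digit = static_cast<unsigned>(c - 'A') + 10;
            else return false;
            value = value * 16 + digit;
        }
        out.push_back({static_cast<uint8_t>(value), false});
    }
    return !out.empty();
}

// Split one line on the first two '|' characters and compile the middle field.
// The category is kept exactly as written in the file.
bool parseSignatureLine(const std::string& line, Signature& sig) {
    const std::size_t first = line.find('|');
    if (first == std::string::npos) return false;
    const std::size_t second = line.find('|', first + 1);
    if (second == std::string::npos) return false;
    sig.category = line.substr(0, first);
    sig.description = line.substr(second + 1);
    return compilePattern(line.substr(first + 1, second - first - 1), sig.bytes);
}
```

With these helpers, a made-up line such as `[PACKER]|60 E8 ?? ?? ?? ??|Example entry` would compile into six `SigByte` values, the last four of them wildcards. Rejecting malformed tokens at load time keeps the search loop free of validation work.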

2. The Scanning Engine

The search logic utilizes a high-performance sliding window approach.

Static Mode: Loads the target binary into a std::vector<uint8_t> in RAM, preventing I/O bottlenecking during multi-signature scans.
Dynamic Mode: Uses EnumProcessModules and GetModuleInformation to determine the live image size, then copies the live process memory directly into the scanning buffer.
Comparison Logic: The engine iterates through the memory buffer. If a wildcard flag (??) is detected in the compiled signature, the engine skips that specific byte comparison, ensuring a match for dynamic instructions.
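The comparison logic described above can be sketched as a naive sliding window that treats wildcard positions as always matching. This is an illustrative sketch under assumptions, not the engine's actual code: the `SigByte` layout and the function name `findAll` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one pattern byte plus a wildcard flag.
struct SigByte {
    uint8_t value;
    bool wildcard;
};

// Return the offset of every position in `buffer` where `sig` matches.
// Wildcard positions are skipped, so any byte is accepted there.
std::vector<std::size_t> findAll(const std::vector<uint8_t>& buffer,
                                 const std::vector<SigByte>& sig) {
    std::vector<std::size_t> hits;
    if (sig.empty() || sig.size() > buffer.size()) return hits;

    // Last window start that still leaves room for the whole signature.
    const std::size_t last = buffer.size() - sig.size();
    for (std::size_t pos = 0; pos <= last; ++pos) {
        bool match = true;
        for (std::size_t i = 0; i < sig.size(); ++i) {
            if (!sig[i].wildcard && buffer[pos + i] != sig[i].value) {
                match = false;
                break;  // first mismatch ends this window early
            }
        }
        if (match) hits.push_back(pos);
    }
    return hits;
}
```

The worst case is proportional to the buffer length times the signature length, but the early exit on the first mismatching byte means most windows are rejected after a single comparison. The same function works for both modes, since the static and dynamic paths each end up with a byte buffer.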

3. Cross-Platform PE Parser

To achieve professional-grade accuracy, SigHunter implements a manual PE Parser:

Validation: Validates the MZ signature (0x4D 0x5A) and locates the NT Headers via the e_lfanew pointer.
Section Mapping: Iterates through the Section Table to identify all segments (e.g., .rdata, .data), extracting PointerToRawData and SizeOfRawData. This allows the engine to accurately report exactly where a malicious signature is hiding.
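The validation and section-mapping steps can be sketched without any Windows headers by reading little-endian fields straight from the file buffer. The offsets used here (e_lfanew at 0x3C, NumberOfSections at PE+6, SizeOfOptionalHeader at PE+20, 40-byte section headers) come from the PE format itself. The `SectionRange` fields, the helper names, and the `[In: <name>]` tag format for in-section matches are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: where a section's raw bytes live in the file.
struct SectionRange {
    std::string name;
    uint32_t rawStart;  // PointerToRawData
    uint32_t rawSize;   // SizeOfRawData
};

// Bounds-checked little-endian reads built from shifts and ORs.
static bool readU16(const std::vector<uint8_t>& b, std::size_t off, uint16_t& out) {
    if (off + 2 > b.size()) return false;
    out = static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
    return true;
}

static bool readU32(const std::vector<uint8_t>& b, std::size_t off, uint32_t& out) {
    if (off + 4 > b.size()) return false;
    out = static_cast<uint32_t>(b[off]) |
          (static_cast<uint32_t>(b[off + 1]) << 8) |
          (static_cast<uint32_t>(b[off + 2]) << 16) |
          (static_cast<uint32_t>(b[off + 3]) << 24);
    return true;
}

// Validate MZ and PE signatures, then collect the raw bounds of each section.
bool parseSections(const std::vector<uint8_t>& file, std::vector<SectionRange>& out) {
    out.clear();
    uint16_t mz = 0;
    if (!readU16(file, 0, mz) || mz != 0x5A4D) return false;       // "MZ"
    uint32_t lfanew = 0;
    if (!readU32(file, 0x3C, lfanew)) return false;                // e_lfanew
    const std::size_t pe = lfanew;
    uint32_t peSig = 0;
    if (!readU32(file, pe, peSig) || peSig != 0x00004550) return false;  // "PE\0\0"

    uint16_t count = 0, optSize = 0;
    if (!readU16(file, pe + 6, count)) return false;               // NumberOfSections
    if (!readU16(file, pe + 20, optSize)) return false;            // SizeOfOptionalHeader

    const std::size_t table = pe + 24 + optSize;                   // first section header
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t hdr = table + 40 * i;
        if (hdr + 40 > file.size()) return false;
        SectionRange s;
        for (std::size_t c = 0; c < 8 && file[hdr + c] != 0; ++c)
            s.name.push_back(static_cast<char>(file[hdr + c]));
        readU32(file, hdr + 16, s.rawSize);                        // SizeOfRawData
        readU32(file, hdr + 20, s.rawStart);                       // PointerToRawData
        out.push_back(s);
    }
    return true;
}

// Label a file offset with its section, or mark it as overlay/unknown.
std::string tagOffset(const std::vector<SectionRange>& sections, std::size_t off) {
    for (const SectionRange& s : sections) {
        if (off >= s.rawStart && off - s.rawStart < s.rawSize)
            return "[In: " + s.name + "]";
    }
    return "[In: OVERLAY/UNKNOWN]";
}
```

Keeping the section lookup separate from the search means a failed parse does not stop the scan: with an empty section list, every match simply falls through to the overlay/unknown label.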

Usage

Build

To compile the project using a standard C++ compiler (like g++ or MSVC):

```bash
g++ SigHunter.cpp Scanner.cpp -o sighunter.exe -lpsapi
```

Note: -lpsapi is required for the Windows Process Status API used in RAM scanning.

Execution

1. Static Scan (File on Disk)

Run a static scan with full log export:

```bash
./sighunter.exe -t target_binary.exe -w signatures.txt -o full_report.txt
```

2. Dynamic Scan (Live Process Memory)

Run a scan directly against a running Process ID (PID), passing the PID of the target process after -p:

```bash
./sighunter.exe -p <PID> -w signatures.txt -o mem_report.txt
```

How to Test (Proof of Concept)

To comply with security policies and avoid distributing pre-compiled or potentially flagged executables (.exe), the tests/ directory contains the source code and automation scripts to build the test environment locally.

Navigate to the tests/ folder.
Run the provided batch script to generate the targets:

```bash
./setup_tests.bat
```

This script requires g++ and Python 3. It will automatically:

Compile the hello.cpp source code into a base binary (hello_clean.exe).
Execute a custom Python script (PoC_corrupt.py) that uses an XOR cipher to corrupt the PE header and injects a custom 0xRobert payload into the file overlay, producing hello_infected.exe.
Compile Poc.cpp into Poc.exe for a dynamic PoC example.

You can then run sighunter.exe against these generated binaries or execute the infected binary and scan its PID to see the dynamic RAM extraction in action.

Design Decisions

Memory vs. Disk: Loading the binary into a RAM buffer was prioritized to ensure that multi-signature scans are almost instantaneous.
Zero-Dependency Parsing: The PE parser was written from scratch, reading header fields with bitwise operations instead of relying on the structures declared in <windows.h>. This keeps the static analysis portable, so Windows binaries can also be analyzed on Linux systems.
UI/UX for Analysts: The "Smart Trim" logic was a critical addition; during testing, standard compiler padding generated over 14,000 matches. Trimming this noise in the terminal while logging it to a file ensures the tool is usable for human analysts.

About the Author

Robert Souza Lages is an undergraduate student in Systems Analysis and Development at UniSenac (Pelotas, Brazil). This project reflects a deep interest in low-level software engineering, C++, memory management, and the internal structures of operating systems.
