This guide will help you build and test the new MLIR-based string and symbol obfuscation integration.
- Previous version used separate
symbol-obfuscatordirectory (never existed in current codebase) - String encryption was done at source level transformation
- String encryption now happens at MLIR level using XOR cipher
- Symbol obfuscation now happens at MLIR level using random hex names
- Cryptographic hashing now available at MLIR level using SHA256/BLAKE2B/SipHash
- Constant obfuscation now obfuscates ALL constants (strings, integers, floats, arrays)
- NEW: ClangIR frontend - High-level MLIR frontend for advanced C/C++ obfuscation (LLVM 22 native)
- Two pipelines available:
- Default (CLANG):
Source → Clang → LLVM IR → MLIR → Obfuscated MLIR → LLVM IR → Binary - New (CLANGIR):
Source → ClangIR → High-level MLIR (CIR) → Obfuscated MLIR → LLVM IR → Binary
- Default (CLANG):
📖 For detailed ClangIR documentation, see CLANGIR_PIPELINE_GUIDE.md
┌─────────────────────────────────────────────────────────────┐
│ Default MLIR Pipeline (CLANG) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Input: source.c → [MLIR Passes] → Output: binary │
│ │
│ Stage 1: MLIR Obfuscation │
│ ├─ Clang: source.c → .mlir │
│ ├─ mlir-opt: Apply passes │
│ │ ├─ string-encrypt: Encrypt string literals │
│ │ ├─ symbol-obfuscate: Randomize symbol names │
│ │ ├─ crypto-hash: Cryptographic symbol hashing │
│ │ └─ constant-obfuscate: All constants (int/float/str) │
│ └─ mlir-translate: .mlir → .ll (LLVM IR) │
│ │
│ Stage 2: OLLVM Obfuscation (Optional) │
│ ├─ Control flow flattening │
│ ├─ Instruction substitution │
│ ├─ Bogus control flow │
│ └─ Basic block splitting │
│ │
│ Stage 3: Compilation │
│ └─ Clang: LLVM IR → Binary │
│ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ ClangIR Pipeline (CLANGIR) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Input: source.c → [High-level MLIR] → Output: binary │
│ │
│ Stage 1: ClangIR Frontend (NEW!) │
│ ├─ ClangIR: source.c → .mlir (CIR dialect) │
│ ├─ mlir-opt: --cir-to-llvm (Lower to LLVM dialect) │
│ └─ Result: LLVM dialect MLIR │
│ │
│ Stage 2: MLIR Obfuscation (Same as default) │
│ ├─ mlir-opt: Apply passes │
│ │ ├─ string-encrypt: Encrypt string literals │
│ │ ├─ symbol-obfuscate: Randomize symbol names │
│ │ ├─ crypto-hash: Cryptographic symbol hashing │
│ │ └─ constant-obfuscate: All constants (int/float/str) │
│ └─ mlir-translate: .mlir → .ll (LLVM IR) │
│ │
│ Stage 3: OLLVM Obfuscation (Optional) │
│ └─ Same as default pipeline │
│ │
│ Stage 4: Compilation │
│ └─ Clang: LLVM IR → Binary │
│ │
└─────────────────────────────────────────────────────────────┘
Advantages of ClangIR:
✅ Preserves high-level C/C++ semantics for better obfuscation
✅ More accurate control flow analysis
✅ Native LLVM 22 support
✅ Better type preservation
✅ Future-proof (official LLVM component)
See CLANGIR_PIPELINE_GUIDE.md for detailed usage.
# 1. Build MLIR library
cd mlir-obs && ./build.sh && cd ..
# 2. Obfuscate with default pipeline
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
--enable-string-encrypt \
--enable-crypto-hash \
--output ./output# 1. Ensure ClangIR is installed (check Docker or build from source)
which clangir || echo "ClangIR not found - see CLANGIR_PIPELINE_GUIDE.md"
# 2. Build MLIR library (same as above)
cd mlir-obs && ./build.sh && cd ..
# 3. Obfuscate with ClangIR pipeline
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
--mlir-frontend clangir \
--enable-string-encrypt \
--enable-crypto-hash \
--enable-constant-obfuscate \
--output ./output_clangirSee CLANGIR_PIPELINE_GUIDE.md for detailed ClangIR setup and usage.
On your VM (with LLVM/MLIR installed):
cd /path/to/oaas/mlir-obs
# Run the build script
chmod +x build.sh test.sh
./build.shExpected output:
==========================================
Building MLIR Obfuscation Library
==========================================
Checking for required tools...
✅ All required tools found
Creating build directory...
Configuring with CMake...
Building library...
==========================================
✅ Build successful!
==========================================
Library location:
./build/lib/libMLIRObfuscation.so
# Test the MLIR passes in isolation
./test.shThis will run 8 tests:
- Create test C file
- C → MLIR conversion
- Symbol obfuscation pass
- String encryption pass
- Combined passes
- MLIR → LLVM IR
- Binary compilation
- Obfuscation verification
cd /path/to/oaas
# Create a test C file
cat > test_auth.c << 'EOF'
#include <stdio.h>
#include <string.h>
const char* MASTER_PASSWORD = "SuperSecret2024!";
const char* API_KEY = "sk_live_abc123xyz";
int authenticate(const char* password) {
return strcmp(password, MASTER_PASSWORD) == 0;
}
int main() {
if (authenticate("SuperSecret2024!")) {
printf("Access granted\n");
} else {
printf("Access denied\n");
}
return 0;
}
EOF
# Test with string encryption only
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
--enable-string-encrypt \
--output ./output_string
# Verify strings are hidden
strings ./output_string/test_auth | grep -i "SuperSecret"
# Should return no results if encryption worked
# Test with symbol obfuscation only
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
--enable-symbol-obfuscate \
--output ./output_symbol
# Verify symbols are obfuscated
nm ./output_symbol/test_auth | grep "authenticate"
# Should return no results if obfuscation worked
# Test with both passes
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
--enable-string-encrypt \
--enable-symbol-obfuscate \
--output ./output_both
# Verify both worked
echo "Checking for strings..."
strings ./output_both/test_auth | grep -iE "SuperSecret|sk_live"
echo "Checking for symbols..."
nm ./output_both/test_auth | grep "authenticate"
# Both should return no results
# Test execution
echo "Testing binary execution..."
./output_both/test_auth
# Should output: Access granted# Combine MLIR passes with OLLVM passes
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
--enable-string-encrypt \
--enable-symbol-obfuscate \
--enable-flattening \
--enable-bogus-cf \
--output ./output_full \
--custom-pass-plugin /path/to/LLVMObfuscationPlugin.so
# Analyze the result
python3 -m cmd.llvm-obfuscator.cli.obfuscate analyze ./output_full/test_auth| Flag | Description | Default |
|---|---|---|
--mlir-frontend <frontend> |
Choose MLIR frontend: clang or clangir |
clang (existing pipeline) |
Examples:
# Use default pipeline (existing, stable)
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile source.c
# Use ClangIR pipeline (new, high-level MLIR)
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile source.c --mlir-frontend clangir| Flag | Description | Pass Applied |
|---|---|---|
--enable-string-encrypt |
Encrypt string literals (basic) | string-encrypt |
--enable-symbol-obfuscate |
Obfuscate function/symbol names (RNG-based) | symbol-obfuscate |
--enable-crypto-hash |
Cryptographically hash symbol names (SHA256/BLAKE2B) | crypto-hash |
--crypto-hash-algorithm <algo> |
Hash algorithm: sha256, blake2b, siphash | Used with crypto-hash |
--crypto-hash-salt <salt> |
Salt for deterministic hashing | Used with crypto-hash |
--crypto-hash-length <N> |
Truncate hash to N characters (default: 12) | Used with crypto-hash |
--enable-constant-obfuscate |
Obfuscate ALL constants (strings, ints, floats, arrays) | constant-obfuscate |
| Flag | Description |
|---|---|
--enable-flattening |
Control flow flattening |
--enable-substitution |
Instruction substitution |
--enable-bogus-cf |
Bogus control flow |
--enable-split |
Basic block splitting |
# Level 1: String encryption only (Default pipeline)
python3 -m cli.obfuscate compile source.c --enable-string-encrypt
# Level 1: String encryption only (ClangIR pipeline)
python3 -m cli.obfuscate compile source.c \
--mlir-frontend clangir \
--enable-string-encrypt
# Level 2: Symbol + String (RNG-based, Default pipeline)
python3 -m cli.obfuscate compile source.c \
--enable-string-encrypt \
--enable-symbol-obfuscate
# Level 2: Crypto Hash + String (Cryptographically secure, Default pipeline)
python3 -m cli.obfuscate compile source.c \
--enable-string-encrypt \
--enable-crypto-hash \
--crypto-hash-algorithm sha256 \
--crypto-hash-salt "my-secret-salt-2024" \
--crypto-hash-length 16
# Level 2: Crypto Hash + String (Cryptographically secure, ClangIR pipeline)
python3 -m cli.obfuscate compile source.c \
--mlir-frontend clangir \
--enable-string-encrypt \
--enable-crypto-hash \
--crypto-hash-algorithm blake2b \
--crypto-hash-salt "my-secret-salt-2024" \
--crypto-hash-length 16
# Level 3: MLIR + Basic OLLVM (Default pipeline)
python3 -m cli.obfuscate compile source.c \
--enable-string-encrypt \
--enable-crypto-hash \
--enable-flattening
# Level 4: Full obfuscation (Default pipeline)
python3 -m cli.obfuscate compile source.c \
--enable-string-encrypt \
--enable-crypto-hash \
--crypto-hash-algorithm blake2b \
--enable-flattening \
--enable-bogus-cf \
--enable-split
# Level 4: Full obfuscation (ClangIR pipeline - RECOMMENDED for C/C++)
python3 -m cli.obfuscate compile source.c \
--mlir-frontend clangir \
--enable-string-encrypt \
--enable-crypto-hash \
--crypto-hash-algorithm blake2b \
--enable-constant-obfuscate \
--enable-flattening \
--enable-bogus-cf \
--enable-splitWhen to use ClangIR pipeline:
- ✅ C/C++ source code (best semantic preservation)
- ✅ Complex C++ with templates and classes
- ✅ You want better obfuscation quality
- ✅ You're using LLVM 22 (native support)
When to use default pipeline:
- ✅ Already working pipeline (proven, stable)
- ✅ Non-C/C++ languages
- ✅ You need maximum compatibility
- ✅ ClangIR is not available in your environment
After running obfuscation, verify the results:
# Before obfuscation
strings original_binary | grep "password"
# Output: MyPassword123
# After obfuscation
strings obfuscated_binary | grep "password"
# Output: (nothing - strings are encrypted)# Before obfuscation
nm original_binary | grep -v ' U '
# Output: Shows function names like "authenticate", "validate_user"
# After obfuscation
nm obfuscated_binary | grep -v ' U '
# Output: Shows hex names like "f_a3b7f8d2", "f_9e2c1d4a"# The obfuscated binary should still work correctly
./obfuscated_binary "test_input"
# Should produce same output as original binaryCause: MLIR library not built
Solution:
cd mlir-obs
./build.shCause: LLVM/MLIR tools not in PATH
Solution:
# Find LLVM installation
find /usr -name "mlir-opt" 2>/dev/null
# Add to PATH (example for LLVM 22)
export PATH="/usr/lib/llvm-22/bin:$PATH"
# Make permanent
echo 'export PATH="/usr/lib/llvm-22/bin:$PATH"' >> ~/.bashrc
source ~/.bashrcPossible causes:
- Pass didn't run (check logs)
- Strings are in a different format (wide strings, etc.)
- Pass needs refinement for specific string types
Debug:
# Check intermediate MLIR to see if encryption applied
clang -emit-llvm -S -emit-mlir source.c -o before.mlir
mlir-opt before.mlir \
--load-pass-plugin=mlir-obs/build/lib/libMLIRObfuscation.so \
--pass-pipeline="builtin.module(string-encrypt)" \
-o after.mlir
# Compare before and after
diff before.mlir after.mlirPossible causes:
- External/system functions not obfuscated (by design)
- Main function preserved (can be configured)
- Pass needs refinement
Debug:
# Check which symbols remain
nm obfuscated_binary | grep -v ' U ' | grep -v '^_'
# See the MLIR transformation
mlir-opt input.mlir \
--load-pass-plugin=mlir-obs/build/lib/libMLIRObfuscation.so \
--pass-pipeline="builtin.module(symbol-obfuscate)" \
--mlir-print-ir-after-all \
-o output.mlir 2>&1 | lessCause: ClangIR is not installed or not in PATH
Solution:
Option 1: Use Docker (Recommended)
# Build Docker image with ClangIR
docker-compose -f docker-compose.test.yaml build
# Run obfuscation in Docker
docker-compose -f docker-compose.test.yaml run --rm test \
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
--mlir-frontend clangir \
--enable-string-encryptOption 2: Build ClangIR from source
# See CLANGIR_PIPELINE_GUIDE.md for detailed build instructions
# Quick summary:
git clone --depth=1 --branch release/22.x https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake ../llvm \
-DLLVM_ENABLE_PROJECTS="clang;mlir" \
-DCLANG_ENABLE_CIR=ON \
-DCMAKE_INSTALL_PREFIX=/opt/clangir
ninja install
export PATH="/opt/clangir/bin:$PATH"Option 3: Fall back to default pipeline
# Simply don't use --mlir-frontend clangir flag
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
--enable-string-encryptCause: ClangIR build doesn't have CIR dialect enabled
Solution:
# Rebuild ClangIR with -DCLANG_ENABLE_CIR=ON flag
# See CLANGIR_PIPELINE_GUIDE.md section "Building ClangIR from Source"Cause: Missing --cir-to-llvm lowering pass
Debug:
# Check if mlir-opt has the lowering pass
mlir-opt --help | grep "cir-to-llvm"
# If not found, ClangIR build is incomplete
# Rebuild with full MLIR integrationname: Build Obfuscated Binary
on: [push, pull_request]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install LLVM/MLIR
run: |
sudo apt-get update
sudo apt-get install -y llvm-22 mlir-22-tools libmlir-22-dev clang-22
- name: Build MLIR Library
run: |
cd mlir-obs
./build.sh
- name: Build Obfuscated Binary
run: |
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile src/main.c \
--enable-string-encrypt \
--enable-symbol-obfuscate \
--output ./release
- name: Verify Obfuscation
run: |
# Check that secrets are not visible
! strings ./release/main | grep -iE "password|secret|api_key"
# Check that symbols are obfuscated
! nm ./release/main | grep -E "authenticate|validate"
- name: Upload Binary
uses: actions/upload-artifact@v3
with:
name: obfuscated-binary
path: ./release/main| Operation | Time | Overhead |
|---|---|---|
| MLIR emission | ~50-100ms | Baseline |
| String encryption pass | ~2-5ms | +2-5% |
| Symbol obfuscation pass | ~1-3ms | +1-3% |
| MLIR to LLVM IR | ~20-50ms | Baseline |
| Total MLIR overhead | ~3-8ms | ~3-8% |
For reference:
- OLLVM passes: +500-2000ms (+50-200%)
- Compiler optimization (-O3): +200-500ms
MLIR passes are significantly faster than OLLVM passes while providing effective obfuscation.
- ✅ Build MLIR library
- ✅ Test standalone passes
- ✅ Test Python CLI integration
- ⏭️ Test on real-world codebase
- ⏭️ Measure performance impact
- ⏭️ Deploy to production
If you encounter issues:
- Check the logs:
python3 -m cli.obfuscate compile ... 2>&1 | tee obfuscation.log - Verify MLIR library is built:
ls -la mlir-obs/build/lib/ - Test passes standalone:
cd mlir-obs && ./test.sh - Check LLVM/MLIR installation:
mlir-opt --version
For bugs or feature requests, open an issue on GitHub.
| Feature | Default (CLANG) | ClangIR (CLANGIR) |
|---|---|---|
| Stability | ✅ Proven, stable | |
| LLVM Version | LLVM 22 | LLVM 22 (native) |
| High-level Semantics | ❌ Lost after LLVM IR | ✅ Preserved in CIR |
| Control Flow Analysis | ✅ Enhanced | |
| Type Preservation | ✅ Advanced | |
| C++ Templates | ✅ Better handling | |
| Obfuscation Quality | ✅ Good | ✅✅ Better |
| Build Time | Fast | Slightly slower |
| Compatibility | ✅ All platforms | |
| Use Case | General purpose | Advanced C/C++ |
Recommendation:
- Start with default pipeline (stable, proven)
- Try ClangIR pipeline for advanced C/C++ obfuscation
- Both pipelines use the same MLIR passes (no change in core obfuscation logic)
- ClangIR Pipeline Guide: CLANGIR_PIPELINE_GUIDE.md - Comprehensive ClangIR documentation
- Main README:
README.md- Project overview - MLIR passes README:
mlir-obs/README.md- MLIR pass details - Test scripts:
mlir-obs/test.sh- Standalone MLIR tests - Example configs:
cmd/llvm-obfuscator/examples/- Configuration examples