Skip to content

Latest commit

 

History

History
647 lines (511 loc) · 20.9 KB

File metadata and controls

647 lines (511 loc) · 20.9 KB

MLIR Integration Quick Start Guide

This guide will help you build and test the new MLIR-based string and symbol obfuscation integration.

What Changed

Old Implementation (Removed)

  • Previous version used separate symbol-obfuscator directory (never existed in current codebase)
  • String encryption was done at source level transformation

New Implementation (MLIR-based)

  • String encryption now happens at MLIR level using XOR cipher
  • Symbol obfuscation now happens at MLIR level using random hex names
  • Cryptographic hashing now available at MLIR level using SHA256/BLAKE2B/SipHash
  • Constant obfuscation now obfuscates ALL constants (strings, integers, floats, arrays)
  • NEW: ClangIR frontend - High-level MLIR frontend for advanced C/C++ obfuscation (LLVM 22 native)
  • Two pipelines available:
    • Default (CLANG): Source → Clang → LLVM IR → MLIR → Obfuscated MLIR → LLVM IR → Binary
    • New (CLANGIR): Source → ClangIR → High-level MLIR (CIR) → Obfuscated MLIR → LLVM IR → Binary

📖 For detailed ClangIR documentation, see CLANGIR_PIPELINE_GUIDE.md

Pipeline Architecture

Default Pipeline (CLANG - Existing)

┌─────────────────────────────────────────────────────────────┐
│              Default MLIR Pipeline (CLANG)                   │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  Input: source.c  →  [MLIR Passes]  →  Output: binary       │
│                                                               │
│  Stage 1: MLIR Obfuscation                                   │
│    ├─ Clang: source.c → .mlir                               │
│    ├─ mlir-opt: Apply passes                                │
│    │   ├─ string-encrypt: Encrypt string literals           │
│    │   ├─ symbol-obfuscate: Randomize symbol names          │
│    │   ├─ crypto-hash: Cryptographic symbol hashing         │
│    │   └─ constant-obfuscate: All constants (int/float/str) │
│    └─ mlir-translate: .mlir → .ll (LLVM IR)                 │
│                                                               │
│  Stage 2: OLLVM Obfuscation (Optional)                      │
│    ├─ Control flow flattening                                │
│    ├─ Instruction substitution                               │
│    ├─ Bogus control flow                                     │
│    └─ Basic block splitting                                  │
│                                                               │
│  Stage 3: Compilation                                        │
│    └─ Clang: LLVM IR → Binary                               │
│                                                               │
└─────────────────────────────────────────────────────────────┘

New Pipeline (CLANGIR - Optional)

┌─────────────────────────────────────────────────────────────┐
│              ClangIR Pipeline (CLANGIR)                      │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  Input: source.c  →  [High-level MLIR]  →  Output: binary  │
│                                                               │
│  Stage 1: ClangIR Frontend (NEW!)                           │
│    ├─ ClangIR: source.c → .mlir (CIR dialect)               │
│    ├─ mlir-opt: --cir-to-llvm (Lower to LLVM dialect)       │
│    └─ Result: LLVM dialect MLIR                              │
│                                                               │
│  Stage 2: MLIR Obfuscation (Same as default)                │
│    ├─ mlir-opt: Apply passes                                │
│    │   ├─ string-encrypt: Encrypt string literals           │
│    │   ├─ symbol-obfuscate: Randomize symbol names          │
│    │   ├─ crypto-hash: Cryptographic symbol hashing         │
│    │   └─ constant-obfuscate: All constants (int/float/str) │
│    └─ mlir-translate: .mlir → .ll (LLVM IR)                 │
│                                                               │
│  Stage 3: OLLVM Obfuscation (Optional)                      │
│    └─ Same as default pipeline                               │
│                                                               │
│  Stage 4: Compilation                                        │
│    └─ Clang: LLVM IR → Binary                               │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Advantages of ClangIR:
✅ Preserves high-level C/C++ semantics for better obfuscation
✅ More accurate control flow analysis
✅ Native LLVM 22 support
✅ Better type preservation
✅ Future-proof (official LLVM component)

See CLANGIR_PIPELINE_GUIDE.md for detailed usage.

Quick Start

Try the Default Pipeline (CLANG)

# 1. Build MLIR library
cd mlir-obs && ./build.sh && cd ..

# 2. Obfuscate with default pipeline
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --output ./output

Try the ClangIR Pipeline (NEW)

# 1. Ensure ClangIR is installed (check Docker or build from source)
which clangir || echo "ClangIR not found - see CLANGIR_PIPELINE_GUIDE.md"

# 2. Build MLIR library (same as above)
cd mlir-obs && ./build.sh && cd ..

# 3. Obfuscate with ClangIR pipeline
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
    --mlir-frontend clangir \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --enable-constant-obfuscate \
    --output ./output_clangir

See CLANGIR_PIPELINE_GUIDE.md for detailed ClangIR setup and usage.

Step-by-Step Setup

1. Build MLIR Obfuscation Library

On your VM (with LLVM/MLIR installed):

cd /path/to/oaas/mlir-obs

# Run the build script
chmod +x build.sh test.sh
./build.sh

Expected output:

==========================================
  Building MLIR Obfuscation Library
==========================================

Checking for required tools...
✅ All required tools found

Creating build directory...
Configuring with CMake...
Building library...
==========================================
✅ Build successful!
==========================================

Library location:
./build/lib/libMLIRObfuscation.so

2. Test MLIR Passes Standalone

# Test the MLIR passes in isolation
./test.sh

This will run 8 tests:

  1. Create test C file
  2. C → MLIR conversion
  3. Symbol obfuscation pass
  4. String encryption pass
  5. Combined passes
  6. MLIR → LLVM IR
  7. Binary compilation
  8. Obfuscation verification

3. Test Integration with Python CLI

cd /path/to/oaas

# Create a test C file
cat > test_auth.c << 'EOF'
#include <stdio.h>
#include <string.h>

const char* MASTER_PASSWORD = "SuperSecret2024!";
const char* API_KEY = "sk_live_abc123xyz";

int authenticate(const char* password) {
    return strcmp(password, MASTER_PASSWORD) == 0;
}

int main() {
    if (authenticate("SuperSecret2024!")) {
        printf("Access granted\n");
    } else {
        printf("Access denied\n");
    }
    return 0;
}
EOF

# Test with string encryption only
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
    --enable-string-encrypt \
    --output ./output_string

# Verify strings are hidden
strings ./output_string/test_auth | grep -i "SuperSecret"
# Should return no results if encryption worked

# Test with symbol obfuscation only
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
    --enable-symbol-obfuscate \
    --output ./output_symbol

# Verify symbols are obfuscated
nm ./output_symbol/test_auth | grep "authenticate"
# Should return no results if obfuscation worked

# Test with both passes
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
    --enable-string-encrypt \
    --enable-symbol-obfuscate \
    --output ./output_both

# Verify both worked
echo "Checking for strings..."
strings ./output_both/test_auth | grep -iE "SuperSecret|sk_live"
echo "Checking for symbols..."
nm ./output_both/test_auth | grep "authenticate"
# Both should return no results

# Test execution
echo "Testing binary execution..."
./output_both/test_auth
# Should output: Access granted

4. Test with OLLVM Passes Combined

# Combine MLIR passes with OLLVM passes
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test_auth.c \
    --enable-string-encrypt \
    --enable-symbol-obfuscate \
    --enable-flattening \
    --enable-bogus-cf \
    --output ./output_full \
    --custom-pass-plugin /path/to/LLVMObfuscationPlugin.so

# Analyze the result
python3 -m cmd.llvm-obfuscator.cli.obfuscate analyze ./output_full/test_auth

CLI Flags Reference

Pipeline Selection Flags (NEW)

Flag Description Default
--mlir-frontend <frontend> Choose MLIR frontend: clang or clangir clang (existing pipeline)

Examples:

# Use default pipeline (existing, stable)
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile source.c

# Use ClangIR pipeline (new, high-level MLIR)
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile source.c --mlir-frontend clangir

MLIR Obfuscation Flags

Flag Description Pass Applied
--enable-string-encrypt Encrypt string literals (basic) string-encrypt
--enable-symbol-obfuscate Obfuscate function/symbol names (RNG-based) symbol-obfuscate
--enable-crypto-hash Cryptographically hash symbol names (SHA256/BLAKE2B) crypto-hash
--crypto-hash-algorithm <algo> Hash algorithm: sha256, blake2b, siphash Used with crypto-hash
--crypto-hash-salt <salt> Salt for deterministic hashing Used with crypto-hash
--crypto-hash-length <N> Truncate hash to N characters (default: 12) Used with crypto-hash
--enable-constant-obfuscate Obfuscate ALL constants (strings, ints, floats, arrays) constant-obfuscate

OLLVM Obfuscation Flags

Flag Description
--enable-flattening Control flow flattening
--enable-substitution Instruction substitution
--enable-bogus-cf Bogus control flow
--enable-split Basic block splitting

Example Combinations

# Level 1: String encryption only (Default pipeline)
python3 -m cli.obfuscate compile source.c --enable-string-encrypt

# Level 1: String encryption only (ClangIR pipeline)
python3 -m cli.obfuscate compile source.c \
    --mlir-frontend clangir \
    --enable-string-encrypt

# Level 2: Symbol + String (RNG-based, Default pipeline)
python3 -m cli.obfuscate compile source.c \
    --enable-string-encrypt \
    --enable-symbol-obfuscate

# Level 2: Crypto Hash + String (Cryptographically secure, Default pipeline)
python3 -m cli.obfuscate compile source.c \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --crypto-hash-algorithm sha256 \
    --crypto-hash-salt "my-secret-salt-2024" \
    --crypto-hash-length 16

# Level 2: Crypto Hash + String (Cryptographically secure, ClangIR pipeline)
python3 -m cli.obfuscate compile source.c \
    --mlir-frontend clangir \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --crypto-hash-algorithm blake2b \
    --crypto-hash-salt "my-secret-salt-2024" \
    --crypto-hash-length 16

# Level 3: MLIR + Basic OLLVM (Default pipeline)
python3 -m cli.obfuscate compile source.c \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --enable-flattening

# Level 4: Full obfuscation (Default pipeline)
python3 -m cli.obfuscate compile source.c \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --crypto-hash-algorithm blake2b \
    --enable-flattening \
    --enable-bogus-cf \
    --enable-split

# Level 4: Full obfuscation (ClangIR pipeline - RECOMMENDED for C/C++)
python3 -m cli.obfuscate compile source.c \
    --mlir-frontend clangir \
    --enable-string-encrypt \
    --enable-crypto-hash \
    --crypto-hash-algorithm blake2b \
    --enable-constant-obfuscate \
    --enable-flattening \
    --enable-bogus-cf \
    --enable-split

When to use ClangIR pipeline:

  • ✅ C/C++ source code (best semantic preservation)
  • ✅ Complex C++ with templates and classes
  • ✅ You want better obfuscation quality
  • ✅ You're using LLVM 22 (native support)

When to use default pipeline:

  • ✅ Already working pipeline (proven, stable)
  • ✅ Non-C/C++ languages
  • ✅ You need maximum compatibility
  • ✅ ClangIR is not available in your environment

Verification Checklist

After running obfuscation, verify the results:

✅ String Encryption Verification

# Before obfuscation
strings original_binary | grep "password"
# Output: MyPassword123

# After obfuscation
strings obfuscated_binary | grep "password"
# Output: (nothing - strings are encrypted)

✅ Symbol Obfuscation Verification

# Before obfuscation
nm original_binary | grep -v ' U '
# Output: Shows function names like "authenticate", "validate_user"

# After obfuscation
nm obfuscated_binary | grep -v ' U '
# Output: Shows hex names like "f_a3b7f8d2", "f_9e2c1d4a"

✅ Binary Execution Test

# The obfuscated binary should still work correctly
./obfuscated_binary "test_input"
# Should produce same output as original binary

Troubleshooting

Issue: "MLIR plugin not found"

Cause: MLIR library not built

Solution:

cd mlir-obs
./build.sh

Issue: "mlir-opt: command not found"

Cause: LLVM/MLIR tools not in PATH

Solution:

# Find LLVM installation
find /usr -name "mlir-opt" 2>/dev/null

# Add to PATH (example for LLVM 22)
export PATH="/usr/lib/llvm-22/bin:$PATH"

# Make permanent
echo 'export PATH="/usr/lib/llvm-22/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Issue: Strings still visible after encryption

Possible causes:

  1. Pass didn't run (check logs)
  2. Strings are in a different format (wide strings, etc.)
  3. Pass needs refinement for specific string types

Debug:

# Check intermediate MLIR to see if encryption applied
clang -emit-llvm -S -emit-mlir source.c -o before.mlir
mlir-opt before.mlir \
    --load-pass-plugin=mlir-obs/build/lib/libMLIRObfuscation.so \
    --pass-pipeline="builtin.module(string-encrypt)" \
    -o after.mlir

# Compare before and after
diff before.mlir after.mlir

Issue: Original function names still visible

Possible causes:

  1. External/system functions not obfuscated (by design)
  2. Main function preserved (can be configured)
  3. Pass needs refinement

Debug:

# Check which symbols remain
nm obfuscated_binary | grep -v ' U ' | grep -v '^_'

# See the MLIR transformation
mlir-opt input.mlir \
    --load-pass-plugin=mlir-obs/build/lib/libMLIRObfuscation.so \
    --pass-pipeline="builtin.module(symbol-obfuscate)" \
    --mlir-print-ir-after-all \
    -o output.mlir 2>&1 | less

Issue: ClangIR pipeline fails with "clangir: command not found"

Cause: ClangIR is not installed or not in PATH

Solution:

Option 1: Use Docker (Recommended)

# Build Docker image with ClangIR
docker-compose -f docker-compose.test.yaml build

# Run obfuscation in Docker
docker-compose -f docker-compose.test.yaml run --rm test \
    python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
    --mlir-frontend clangir \
    --enable-string-encrypt

Option 2: Build ClangIR from source

# See CLANGIR_PIPELINE_GUIDE.md for detailed build instructions
# Quick summary:
git clone --depth=1 --branch release/22.x https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake ../llvm \
    -DLLVM_ENABLE_PROJECTS="clang;mlir" \
    -DCLANG_ENABLE_CIR=ON \
    -DCMAKE_INSTALL_PREFIX=/opt/clangir
ninja install
export PATH="/opt/clangir/bin:$PATH"

Option 3: Fall back to default pipeline

# Simply don't use --mlir-frontend clangir flag
python3 -m cmd.llvm-obfuscator.cli.obfuscate compile test.c \
    --enable-string-encrypt

Issue: ClangIR pipeline produces "CIR dialect not found" error

Cause: ClangIR build doesn't have CIR dialect enabled

Solution:

# Rebuild ClangIR with -DCLANG_ENABLE_CIR=ON flag
# See CLANGIR_PIPELINE_GUIDE.md section "Building ClangIR from Source"

Issue: "Cannot lower CIR to LLVM dialect"

Cause: Missing --cir-to-llvm lowering pass

Debug:

# Check if mlir-opt has the lowering pass
mlir-opt --help | grep "cir-to-llvm"

# If not found, ClangIR build is incomplete
# Rebuild with full MLIR integration

Integration with CI/CD

GitHub Actions Example

name: Build Obfuscated Binary

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install LLVM/MLIR
        run: |
          sudo apt-get update
          sudo apt-get install -y llvm-22 mlir-22-tools libmlir-22-dev clang-22

      - name: Build MLIR Library
        run: |
          cd mlir-obs
          ./build.sh

      - name: Build Obfuscated Binary
        run: |
          python3 -m cmd.llvm-obfuscator.cli.obfuscate compile src/main.c \
            --enable-string-encrypt \
            --enable-symbol-obfuscate \
            --output ./release

      - name: Verify Obfuscation
        run: |
          # Check that secrets are not visible
          ! strings ./release/main | grep -iE "password|secret|api_key"

          # Check that symbols are obfuscated
          ! nm ./release/main | grep -E "authenticate|validate"

      - name: Upload Binary
        uses: actions/upload-artifact@v3
        with:
          name: obfuscated-binary
          path: ./release/main

Performance Impact

Operation Time Overhead
MLIR emission ~50-100ms Baseline
String encryption pass ~2-5ms +2-5%
Symbol obfuscation pass ~1-3ms +1-3%
MLIR to LLVM IR ~20-50ms Baseline
Total MLIR overhead ~3-8ms ~3-8%

For reference:

  • OLLVM passes: +500-2000ms (+50-200%)
  • Compiler optimization (-O3): +200-500ms

MLIR passes are significantly faster than OLLVM passes while providing effective obfuscation.

Next Steps

  1. ✅ Build MLIR library
  2. ✅ Test standalone passes
  3. ✅ Test Python CLI integration
  4. ⏭️ Test on real-world codebase
  5. ⏭️ Measure performance impact
  6. ⏭️ Deploy to production

Support

If you encounter issues:

  1. Check the logs: python3 -m cli.obfuscate compile ... 2>&1 | tee obfuscation.log
  2. Verify MLIR library is built: ls -la mlir-obs/build/lib/
  3. Test passes standalone: cd mlir-obs && ./test.sh
  4. Check LLVM/MLIR installation: mlir-opt --version

For bugs or feature requests, open an issue on GitHub.

Pipeline Comparison

Feature Default (CLANG) ClangIR (CLANGIR)
Stability ✅ Proven, stable ⚠️ New, experimental
LLVM Version LLVM 22 LLVM 22 (native)
High-level Semantics ❌ Lost after LLVM IR ✅ Preserved in CIR
Control Flow Analysis ⚠️ Limited ✅ Enhanced
Type Preservation ⚠️ Basic ✅ Advanced
C++ Templates ⚠️ Instantiated early ✅ Better handling
Obfuscation Quality ✅ Good ✅✅ Better
Build Time Fast Slightly slower
Compatibility ✅ All platforms ⚠️ Requires ClangIR build
Use Case General purpose Advanced C/C++

Recommendation:

  • Start with default pipeline (stable, proven)
  • Try ClangIR pipeline for advanced C/C++ obfuscation
  • Both pipelines use the same MLIR passes (no change in core obfuscation logic)

References

  • ClangIR Pipeline Guide: CLANGIR_PIPELINE_GUIDE.md - Comprehensive ClangIR documentation
  • Main README: README.md - Project overview
  • MLIR passes README: mlir-obs/README.md - MLIR pass details
  • Test scripts: mlir-obs/test.sh - Standalone MLIR tests
  • Example configs: cmd/llvm-obfuscator/examples/ - Configuration examples