Crackernaut

A sophisticated password variant generation and ML security research tool that combines rule-based transformations with machine learning to produce human-like password variants for ethical penetration testing and security analysis.

Overview

Crackernaut is a password security research platform designed for authorized security professionals and researchers. It leverages advanced machine learning approaches including transformers, RNNs, and MLPs to analyze, score, and generate password variants that mimic human behavior patterns.

Recent Updates

🔧 Code Quality Enhancement (June 2025):
Major refactoring to reduce cognitive complexity across training functions, improving maintainability and readability following SOLID principles.
🧠 Cognitive Complexity Optimization:
Refactored bulk_train_on_wordlist function from complexity 39 → <15 and interactive_training function from complexity 23 → <15 using helper function extraction.
📊 Improved Function Modularity:
Breaking down monolithic functions into focused, single-responsibility helper functions for better testing and maintenance.
⚡ Enhanced Training Pipeline:
Streamlined training workflows with dedicated helper functions for batch processing, variant generation, and model-specific training logic.
🎯 Extracted Helper Functions:
- _load_wordlist(), _setup_training_components(), _generate_training_variants() for bulk training
- _train_rnn_batch(), _train_mlp_batch(), _process_training_batch() for model-specific operations
- _display_variants_and_options(), _process_user_input(), _handle_training_iteration() for interactive training
🎯 Major Project Restructuring (June 2025):
Complete reorganization with professional directory structure (src/, scripts/), removal of unused components, and improved maintainability.
🧹 Comprehensive Cleanup:
Removed redundant empty model files, moved utilities to organized directories, and updated all imports and references.
📁 Modern Project Structure:
Implemented src/utils/, src/models/ organization with scripts/ directory for better code organization.
🔧 Enhanced uv Integration:
Fully migrated to uv package manager with automated setup scripts and legacy environment cleanup utilities.
🤖 Advanced Transformer Architecture:
Lightweight transformer-based models for richer password embeddings and more accurate variant scoring.
⚡ Optimized GPU Pipeline:
Enhanced pipeline supporting CUDA acceleration, distributed training, and efficient GPU memory management.
📊 Intelligent Data Processing:
Asynchronous processing of massive password datasets with Mini-Batch K-Means clustering for optimized training.
🚀 High-Performance Batch Processing:
Memory-efficient batch processing with producer-consumer patterns for large-scale operations.

Project Structure

Crackernaut follows a modern, organized project structure for better maintainability and development experience:

Crackernaut/
├── src/                          # Main source code directory
│   ├── utils/                    # Utility modules
│   │   ├── config_utils.py       # Configuration management
│   │   ├── variant_utils.py      # Password variant generation logic
│   │   ├── async_utils.py        # Asynchronous processing utilities
│   │   ├── common_utils.py       # Common helper functions
│   │   └── performance_utils.py  # Performance monitoring utilities
│   ├── models/                   # ML model implementations
│   │   ├── embedding/            # Password embedding models
│   │   └── transformer/          # Transformer architecture models
│   ├── cuda_ml.py                # CUDA GPU acceleration utilities
│   ├── distributed_training.py   # Multi-GPU distributed training
│   └── list_preparer.py          # Password list processing and clustering
├── scripts/                      # Setup and utility scripts
│   ├── setup.ps1                 # Windows PowerShell setup script
│   ├── setup.sh                  # Unix/Linux setup script
│   ├── check_gpu.py              # GPU status verification utility
│   ├── cleanup_old_envs.py       # Legacy environment cleanup tool
│   └── migrate_to_uv.py          # Migration utilities for legacy setups
├── tests/                        # Test files and validation scripts
│   ├── test_variants.py          # Password variant testing and validation
│   ├── test_crackernaut.py       # Main application tests
│   ├── test_torch.py             # PyTorch/CUDA functionality tests
│   └── test_imports.py           # Import and dependency tests
├── docs/                         # Documentation and guides
│   ├── README.md                 # Documentation navigation
│   ├── SETUP.md                  # Setup and configuration guide
│   ├── STRUCTURE.md              # Project structure and organization
│   ├── AGENTS.md                 # AI agent configurations
│   └── MARKDOWN_STYLE_GUIDE.md   # Markdown formatting standards
├── scripts/                      # Setup and utility scripts
│   ├── setup.ps1                 # Windows PowerShell setup script
│   ├── setup.sh                  # Unix/Linux setup script
│   ├── check_gpu.py              # GPU status verification utility
│   ├── cleanup_old_envs.py       # Legacy environment cleanup tool
│   ├── migrate_to_uv.py          # Migration utilities for legacy setups
│   └── dev/                      # Development and debugging utilities
│       ├── check_cuda.py         # Basic CUDA availability check
│       ├── debug_torch.py        # Detailed PyTorch debugging
│       ├── simple_cuda_test.py   # Simple CUDA functionality test
│       └── simple_torch_test.py  # Basic PyTorch installation test
├── crackernaut.py                # Main application entry point
├── crackernaut_train.py          # ML model training pipeline
├── config.json                   # Configuration file for models and processing
├── pyproject.toml                # uv dependency management and project metadata
├── trainingdata/                 # Password datasets (excluded from git)
├── clusters/                     # Processed clustering data
├── .vscode/                      # VS Code configuration and tasks
└── .github/                      # GitHub configuration and Copilot instructions

Key Organizational Benefits

tests/: Centralized test files for better organization and pytest compatibility
docs/: Comprehensive documentation hub with clear navigation
src/: Clean separation of main source code from scripts and configuration
src/utils/: Centralized utility functions for better maintainability and reusability
src/models/: Organized ML model implementations with clear architecture separation
scripts/: Setup and maintenance scripts separate from application logic
scripts/dev/: Development utilities isolated from production scripts
Removed redundancy: Eliminated empty model files and unused dependencies
Professional structure: Follows Python packaging best practices for research projects

├── pyproject.toml                # Project dependencies (uv)
└── trainingdata/                 # Training datasets (not in VCS)

This structure provides:

Clear separation of tests, documentation, core code, utilities, and scripts
Professional organization following Python packaging best practices
Maintainability with logical grouping of related functionality
Scalability with room for growth without cluttering the root directory
Developer experience with dedicated documentation and debugging tools

For detailed information about the structure and recent changes, see docs/STRUCTURE.md.

Purpose

Crackernaut is a sophisticated password guessing utility designed to generate human-like password variants from a given base password. It combines rule-based transformations with machine learning to produce plausible password guesses that reflect common patterns humans use when creating passwords. This tool is intended for security researchers, penetration testers, and anyone needing to test password strength by generating realistic variants for analysis or cracking attempts.

Features

Human-Like Variants:
Generates passwords using transformations like numeric increments, symbol additions, capitalization changes, leet speak, shifts, repetitions, and middle insertions.
Machine Learning Scoring:
Uses PyTorch-based models (including a new transformer model and legacy models such as MLP, RNN, BiLSTM) to score variants based on their likelihood of human use.
GPU Acceleration:
Leverages CUDA for faster computation on compatible hardware with automatic device detection.
Smart List Preparation:
Processes large password datasets using transformer-based embeddings and clustering to create optimized, diverse training sets.
Configurable:
Adjust transformation weights, chain depth, and maximum length via a JSON configuration file.
Multi-Processing:
Employs parallel processing for variant generation and a producer-consumer pattern for efficient processing.
Training Modes:
- Bulk training from wordlists with automated learning.
- Self-supervised learning from password pattern mining.
- Intelligent dataset preparation through clustering.
Hyperparameter Tuning:
Optimizes ML models using Bayesian optimization with the Ax library.
Distributed Training:
Supports distributed data parallelism across multiple GPUs for faster training.
Asynchronous I/O:
Uses asynchronous file operations for efficient data handling.
Flexible Output:
Outputs variants to the console or a file, with options to limit quantity and length.
Multiple Model Options:
- NEW: Transformer model (more accurate, supports batch processing).
- Legacy models: MLP, RNN, BiLSTM.

Requirements

Python 3.11+ (tested with Python 3.12.8)
PyTorch with CUDA support (automatically configured for RTX 3090 and similar GPUs)
NVIDIA GPU with CUDA 12.1+ support (optional, for GPU acceleration)
uv (modern Python package manager for fast, reliable dependency management)

All dependencies are managed via pyproject.toml with optional extras:

cuda: For CUDA/GPU acceleration with PyTorch CUDA 12.1 support
dev: Development tools (black, flake8, mypy, pre-commit, pytest)
CUDA Configuration: The project is pre-configured for NVIDIA RTX 3090 and similar GPUs with CUDA 12.1 support. PyTorch will automatically install with CUDA acceleration when using the cuda extra.

Hardware

Recommended: NVIDIA RTX 3090, RTX 4090, or newer CUDA-capable GPU for optimal performance
Supported: Any NVIDIA GPU with CUDA Compute Capability 7.0+ and CUDA 12.1+ drivers
Minimum: Any CPU (runs without GPU support, though significantly slower for ML operations)

Installation

Install uv (if not already installed):

# On Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

Clone this repository:

git clone <repository-url>
cd crackernaut

Quick Setup (Recommended):

# Windows
.\scripts\setup.ps1

# macOS/Linux/WSL
chmod +x scripts/setup.sh && ./scripts/setup.sh

Manual Installation:
```
# Basic installation
uv sync

# With CUDA support (recommended for GPU acceleration)
uv sync --extra cuda

# With development tools
uv sync --extra dev

# All extras (recommended for full development setup)
uv sync --all-extras
```
PyTorch CUDA Configuration:
- The project automatically installs PyTorch 2.5.1 with CUDA 12.1 support when using --extra cuda
- Includes all necessary NVIDIA CUDA runtime libraries (cublas, cudnn, etc.)
- Pre-configured for RTX 3090 and similar high-end GPUs
- Falls back to CPU mode if CUDA is not available

Verify GPU Setup (if using CUDA):

# Check CUDA availability and GPU detection
uv run python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}'); print(f'GPU Count: {torch.cuda.device_count()}'); [print(f'GPU {i}: {torch.cuda.get_device_name(i)}') for i in range(torch.cuda.device_count())]"

# Or use the built-in script
uv run python scripts/check_gpu.py

Migrating from Legacy pip/venv Setup

If you're upgrading from an older version that used pip and virtual environments, use the automated migration:

# Automated migration (recommended)
uv run python migrate_to_uv.py

# Or use the quick setup scripts (they clean up old environments automatically)
# Windows:
.\scripts\setup.ps1

# macOS/Linux/WSL:
chmod +x scripts/setup.sh && ./scripts/setup.sh

The migration and setup scripts will:

Remove old virtual environment directories (.venv, venv, env)
Install uv if not already present
Set up all dependencies using uv
Update your development workflow to use uv commands

Usage

Basic Usage

Run Crackernaut to generate variants from a base password:

uv run python crackernaut.py --password "mypassword" --model transformer

Command Line Options

--password, -p: Base password to analyze.
--config, -c: Path to configuration file (default: config.json).
--depth, -d: Chain depth for variant generation.
--model, -m: Model type: transformer, rnn, bilstm, mlp (default: transformer).
--prepare: Trigger list preparation.
--lp-dataset: Path to large password dataset for list preparation.
--lp-output: Output directory for clusters (default: clusters).
--lp-chunk-size: Chunk size for list preparation (default: 1000000).

Training the Model

Use crackernaut_train.py to train the ML model and refine configuration.

Bulk Training

Train on a wordlist (one password per line):

uv run python crackernaut_train.py --wordlist <wordlist_file> [-t <iterations>] [--model <model_type>]

Example:

uv run python crackernaut_train.py --wordlist rockyou.txt --times 5 --model bilstm

Intelligent Dataset Preparation

Process large wordlists into optimized training sets:

uv run python crackernaut_train.py --prepare --lp-dataset <path_to_dataset> [--clusters <num_clusters>] [--lp-chunk-size <size>] [--lp-output <output_dir>]

Example:

uv run python crackernaut_train.py --prepare --lp-dataset breach_compilation.txt --clusters 20000 --lp-chunk-size 2000000

Interactive Training

Fine-tune the model with interactive feedback:

uv run python crackernaut_train.py --interactive

Configuration

Customize Crackernaut via the config.json file. Key options include:

model_type: Model for scoring (transformer, rnn, bilstm, mlp)
model_embed_dim: Embedding dimension for the transformer (default: 64)
model_num_heads: Number of attention heads (default: 4)
model_num_layers: Number of transformer layers (default: 3)
model_hidden_dim: Hidden dimension in transformer feed-forward layers (default: 128)
model_dropout: Dropout rate (default: 0.2)
chain_depth: Maximum number of modifications (default: 2)
max_length: Maximum length for generated passwords
transformation_weights: Weights for different transformation types
current_base: Base password for interactive training
learning_rate: Model training learning rate

Ethical and Security Considerations

Always obtain explicit permission before using Crackernaut for security testing.
Handle password datasets securely, using encryption where necessary, and comply with all applicable data protection laws.

Technical Architecture

Modern Project Organization

Crackernaut now features a professionally structured codebase:

Source Code Structure: All core modules organized under src/ with logical separation
Utility Modules: Common functionality grouped in src/utils/ for reusability
Model Architecture: ML models isolated in src/models/ with clear interfaces
Script Organization: Setup and utility scripts separated in scripts/ directory
Clean Dependencies: Modern uv-based dependency management with optional extras

Password Transformations

Crackernaut implements various transformation strategies:

Character Substitution: Replace letters with similar symbols
Case Modification: Alter capitalization patterns
Numeric Manipulation: Change numerical parts intelligently
Symbol Addition: Insert special characters strategically
Pattern Recognition: Apply common password creation patterns

Machine Learning Models

🎯 Transformer Model (Primary):
State-of-the-art architecture providing superior pattern recognition and batch processing capabilities
🔧 Legacy Models:
MLP, RNN, BiLSTM models retained for backward compatibility and research purposes

List Preparation System

📊 Chunked Processing: Efficiently handles massive password datasets without memory overflow
🧠 Transformer Embeddings: Generates sophisticated low-dimensional password representations
🎯 Clustering: Uses Mini‑Batch K‑Means to group similar passwords for optimal training sets
✨ Representative Selection: Intelligently chooses diverse samples for comprehensive training

Processing Pipeline

Input Processing: Base password analysis and preparation
Variant Generation: Rule-based transformation pool creation
ML Scoring: Transformer-based variant likelihood assessment
Intelligent Filtering: Smart ranking and selection algorithms
Optimized Output: Top-scored variants with confidence metrics

Development Environment

GitHub Copilot Configuration

This workspace includes optimized GitHub Copilot configuration for ML security research:

Repository Instructions: .github/copilot-instructions.md provides project-specific context
Workspace Settings: .vscode/settings.json contains Copilot agent configuration
Privacy Protection: .copilotignore excludes sensitive training data from indexing

The configuration ensures Copilot understands:

Password security research context and ethical guidelines
PyTorch/CUDA development patterns and error handling
Async I/O patterns for large dataset processing
Type safety and proper documentation standards

Work in Progress

List preparation module for organizing password datasets
Updated training pipeline for transformer models
Improved test coverage

Disclaimer

Crackernaut is intended for ethical use only. Misuse of this tool for unauthorized access or malicious purposes is strictly prohibited.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.cspell		.cspell
docs		docs
scripts		scripts
src		src
tests		tests
.copilotignore		.copilotignore
.gitattributes		.gitattributes
.gitignore		.gitignore
COPILOT_SETUP.md		COPILOT_SETUP.md
LICENSE		LICENSE
MARKDOWN_FIX_SUMMARY.md		MARKDOWN_FIX_SUMMARY.md
MARKDOWN_GUIDE.md		MARKDOWN_GUIDE.md
MARKDOWN_STYLE_GUIDE.md		MARKDOWN_STYLE_GUIDE.md
README.md		README.md
crackernaut.py		crackernaut.py
crackernaut_train.py		crackernaut_train.py
pyproject.toml		pyproject.toml
test_structure.py		test_structure.py
test_torch_simple.py		test_torch_simple.py
uv.lock		uv.lock
verify_fixes.py		verify_fixes.py

Folders and files

Latest commit

History

Repository files navigation

Crackernaut

Overview

Recent Updates

Project Structure

Key Organizational Benefits

Purpose

Features

Requirements

Hardware

Installation

Migrating from Legacy pip/venv Setup

Usage

Basic Usage

Command Line Options

Training the Model

Bulk Training

Intelligent Dataset Preparation

Interactive Training

Configuration

Ethical and Security Considerations

Technical Architecture

Modern Project Organization

Password Transformations

Machine Learning Models

List Preparation System

Processing Pipeline

Development Environment

GitHub Copilot Configuration

Work in Progress

Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages