A sophisticated password variant generation and ML security research tool that combines rule-based transformations with machine learning to produce human-like password variants for ethical penetration testing and security analysis.
Crackernaut is a password security research platform designed for authorized security professionals and researchers. It leverages advanced machine learning approaches including transformers, RNNs, and MLPs to analyze, score, and generate password variants that mimic human behavior patterns.
- 🔧 Code Quality Enhancement (June 2025): Major refactoring to reduce cognitive complexity across training functions, improving maintainability and readability following SOLID principles.
- 🧠 Cognitive Complexity Optimization: Refactored the `bulk_train_on_wordlist` function from complexity 39 → <15 and the `interactive_training` function from complexity 23 → <15 using helper function extraction.
- 📊 Improved Function Modularity: Broke down monolithic functions into focused, single-responsibility helper functions for better testing and maintenance.
- ⚡ Enhanced Training Pipeline: Streamlined training workflows with dedicated helper functions for batch processing, variant generation, and model-specific training logic.
- 🎯 Extracted Helper Functions:
  - `_load_wordlist()`, `_setup_training_components()`, `_generate_training_variants()` for bulk training
  - `_train_rnn_batch()`, `_train_mlp_batch()`, `_process_training_batch()` for model-specific operations
  - `_display_variants_and_options()`, `_process_user_input()`, `_handle_training_iteration()` for interactive training
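The helper-extraction pattern can be sketched as follows. The function names echo the list above, but the bodies are hypothetical simplifications for illustration, not Crackernaut's actual implementations:

```python
# Sketch of the helper-extraction refactor: one monolithic training loop
# becomes small, single-responsibility helpers that are easy to test.

def _load_wordlist(path):
    """Read one password per line, skipping blank lines."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return [line.strip() for line in fh if line.strip()]

def _generate_training_variants(password):
    """Produce a small variant pool for one base password."""
    return [password + "1", password + "!", password.capitalize()]

def _process_training_batch(batch):
    """Expand a batch of passwords into (base, variant) training pairs."""
    return [(pw, v) for pw in batch for v in _generate_training_variants(pw)]

pairs = _process_training_batch(["summer", "admin"])
print(len(pairs))  # 3 variants per base password -> 6 pairs
```

Each helper can now be unit-tested in isolation, which is what drives the complexity scores down.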
- 🎯 Major Project Restructuring (June 2025): Complete reorganization with a professional directory structure (`src/`, `scripts/`), removal of unused components, and improved maintainability.
- 🧹 Comprehensive Cleanup: Removed redundant empty model files, moved utilities into organized directories, and updated all imports and references.
- 📁 Modern Project Structure: Implemented `src/utils/` and `src/models/` organization with a `scripts/` directory for better code organization.
- 🔧 Enhanced uv Integration: Fully migrated to the uv package manager with automated setup scripts and legacy environment cleanup utilities.
- 🤖 Advanced Transformer Architecture: Lightweight transformer-based models for richer password embeddings and more accurate variant scoring.
- ⚡ Optimized GPU Pipeline: Enhanced pipeline supporting CUDA acceleration, distributed training, and efficient GPU memory management.
- 📊 Intelligent Data Processing: Asynchronous processing of massive password datasets with Mini-Batch K-Means clustering for optimized training.
- 🚀 High-Performance Batch Processing: Memory-efficient batch processing with producer-consumer patterns for large-scale operations.
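The producer-consumer batching pattern mentioned above can be sketched with a bounded queue. The function names and batch logic here are illustrative assumptions, not Crackernaut's actual API:

```python
import queue
import threading

def producer(lines, batch_size, q):
    """Group raw passwords into fixed-size batches and enqueue them."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)
    q.put(None)  # sentinel: no more batches

def consumer(q, results):
    """Drain batches from the queue and process them (here: just count)."""
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(len(batch))

passwords = [f"pass{i}" for i in range(10)]
q = queue.Queue(maxsize=4)  # bounded queue caps memory use
results = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
producer(passwords, batch_size=3, q=q)
t.join()
print(results)  # batch sizes seen by the consumer
```

The bounded queue is what makes this memory-efficient: the producer blocks when the consumer falls behind, so only a few batches are ever resident at once.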
Crackernaut follows a modern, organized project structure for better maintainability and development experience:
Crackernaut/
├── src/ # Main source code directory
│ ├── utils/ # Utility modules
│ │ ├── config_utils.py # Configuration management
│ │ ├── variant_utils.py # Password variant generation logic
│ │ ├── async_utils.py # Asynchronous processing utilities
│ │ ├── common_utils.py # Common helper functions
│ │ └── performance_utils.py # Performance monitoring utilities
│ ├── models/ # ML model implementations
│ │ ├── embedding/ # Password embedding models
│ │ └── transformer/ # Transformer architecture models
│ ├── cuda_ml.py # CUDA GPU acceleration utilities
│ ├── distributed_training.py # Multi-GPU distributed training
│ └── list_preparer.py # Password list processing and clustering
├── tests/ # Test files and validation scripts
│ ├── test_variants.py # Password variant testing and validation
│ ├── test_crackernaut.py # Main application tests
│ ├── test_torch.py # PyTorch/CUDA functionality tests
│ └── test_imports.py # Import and dependency tests
├── docs/ # Documentation and guides
│ ├── README.md # Documentation navigation
│ ├── SETUP.md # Setup and configuration guide
│ ├── STRUCTURE.md # Project structure and organization
│ ├── AGENTS.md # AI agent configurations
│ └── MARKDOWN_STYLE_GUIDE.md # Markdown formatting standards
├── scripts/ # Setup and utility scripts
│ ├── setup.ps1 # Windows PowerShell setup script
│ ├── setup.sh # Unix/Linux setup script
│ ├── check_gpu.py # GPU status verification utility
│ ├── cleanup_old_envs.py # Legacy environment cleanup tool
│ ├── migrate_to_uv.py # Migration utilities for legacy setups
│ └── dev/ # Development and debugging utilities
│ ├── check_cuda.py # Basic CUDA availability check
│ ├── debug_torch.py # Detailed PyTorch debugging
│ ├── simple_cuda_test.py # Simple CUDA functionality test
│ └── simple_torch_test.py # Basic PyTorch installation test
├── crackernaut.py # Main application entry point
├── crackernaut_train.py # ML model training pipeline
├── config.json # Configuration file for models and processing
├── pyproject.toml # uv dependency management and project metadata
├── trainingdata/ # Password datasets (excluded from git)
├── clusters/ # Processed clustering data
├── .vscode/ # VS Code configuration and tasks
└── .github/ # GitHub configuration and Copilot instructions
- tests/: Centralized test files for better organization and pytest compatibility
- docs/: Comprehensive documentation hub with clear navigation
- src/: Clean separation of main source code from scripts and configuration
- src/utils/: Centralized utility functions for better maintainability and reusability
- src/models/: Organized ML model implementations with clear architecture separation
- scripts/: Setup and maintenance scripts separate from application logic
- scripts/dev/: Development utilities isolated from production scripts
- Removed redundancy: Eliminated empty model files and unused dependencies
- Professional structure: Follows Python packaging best practices for research projects
This structure provides:
- Clear separation of tests, documentation, core code, utilities, and scripts
- Professional organization following Python packaging best practices
- Maintainability with logical grouping of related functionality
- Scalability with room for growth without cluttering the root directory
- Developer experience with dedicated documentation and debugging tools
For detailed information about the structure and recent changes, see docs/STRUCTURE.md.
Crackernaut is a sophisticated password guessing utility designed to generate human-like password variants from a given base password. It combines rule-based transformations with machine learning to produce plausible password guesses that reflect common patterns humans use when creating passwords. This tool is intended for security researchers, penetration testers, and anyone needing to test password strength by generating realistic variants for analysis or cracking attempts.
- Human-Like Variants: Generates passwords using transformations like numeric increments, symbol additions, capitalization changes, leet speak, shifts, repetitions, and middle insertions.
- Machine Learning Scoring: Uses PyTorch-based models (including a new transformer model and legacy models such as MLP, RNN, and BiLSTM) to score variants by their likelihood of human use.
- GPU Acceleration: Leverages CUDA for faster computation on compatible hardware with automatic device detection.
- Smart List Preparation: Processes large password datasets using transformer-based embeddings and clustering to create optimized, diverse training sets.
- Configurable: Adjust transformation weights, chain depth, and maximum length via a JSON configuration file.
- Multi-Processing: Employs parallel processing for variant generation and a producer-consumer pattern for efficient throughput.
- Training Modes:
  - Bulk training from wordlists with automated learning.
  - Self-supervised learning from password pattern mining.
  - Intelligent dataset preparation through clustering.
- Hyperparameter Tuning: Optimizes ML models using Bayesian optimization with the Ax library.
- Distributed Training: Supports distributed data parallelism across multiple GPUs for faster training.
- Asynchronous I/O: Uses asynchronous file operations for efficient data handling.
- Flexible Output: Writes variants to the console or a file, with options to limit quantity and length.
- Multiple Model Options:
  - NEW: Transformer model (more accurate, supports batch processing).
  - Legacy models: MLP, RNN, BiLSTM.
- Python 3.11+ (tested with Python 3.12.8)
- PyTorch with CUDA support (automatically configured for RTX 3090 and similar GPUs)
- NVIDIA GPU with CUDA 12.1+ support (optional, for GPU acceleration)
- uv (modern Python package manager for fast, reliable dependency management)
All dependencies are managed via `pyproject.toml` with optional extras:

- `cuda`: CUDA/GPU acceleration with PyTorch CUDA 12.1 support
- `dev`: Development tools (black, flake8, mypy, pre-commit, pytest)

CUDA Configuration: The project is pre-configured for NVIDIA RTX 3090 and similar GPUs with CUDA 12.1 support. PyTorch installs automatically with CUDA acceleration when using the `cuda` extra.
- Recommended: NVIDIA RTX 3090, RTX 4090, or newer CUDA-capable GPU for optimal performance
- Supported: Any NVIDIA GPU with CUDA Compute Capability 7.0+ and CUDA 12.1+ drivers
- Minimum: Any CPU (runs without GPU support, though significantly slower for ML operations)
- Install uv (if not already installed):

  ```bash
  # On Windows (PowerShell)
  powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

  # On macOS/Linux
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Clone this repository:

  ```bash
  git clone <repository-url>
  cd crackernaut
  ```

- Quick Setup (Recommended):

  ```bash
  # Windows
  .\scripts\setup.ps1

  # macOS/Linux/WSL
  chmod +x scripts/setup.sh && ./scripts/setup.sh
  ```

- Manual Installation:

  ```bash
  # Basic installation
  uv sync

  # With CUDA support (recommended for GPU acceleration)
  uv sync --extra cuda

  # With development tools
  uv sync --extra dev

  # All extras (recommended for full development setup)
  uv sync --all-extras
  ```

  PyTorch CUDA Configuration:

  - The project automatically installs PyTorch 2.5.1 with CUDA 12.1 support when using `--extra cuda`
  - Includes all necessary NVIDIA CUDA runtime libraries (cublas, cudnn, etc.)
  - Pre-configured for RTX 3090 and similar high-end GPUs
  - Falls back to CPU mode if CUDA is not available

- Verify GPU Setup (if using CUDA):

  ```bash
  # Check CUDA availability and GPU detection
  uv run python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}'); print(f'GPU Count: {torch.cuda.device_count()}'); [print(f'GPU {i}: {torch.cuda.get_device_name(i)}') for i in range(torch.cuda.device_count())]"

  # Or use the built-in script
  uv run python scripts/check_gpu.py
  ```
If you're upgrading from an older version that used pip and virtual environments, use the automated migration:
```bash
# Automated migration (recommended)
uv run python scripts/migrate_to_uv.py

# Or use the quick setup scripts (they clean up old environments automatically)
# Windows:
.\scripts\setup.ps1

# macOS/Linux/WSL:
chmod +x scripts/setup.sh && ./scripts/setup.sh
```

The migration and setup scripts will:

- Remove old virtual environment directories (`.venv`, `venv`, `env`)
- Install uv if not already present
- Set up all dependencies using uv
- Update your development workflow to use uv commands
Run Crackernaut to generate variants from a base password:
```bash
uv run python crackernaut.py --password "mypassword" --model transformer
```

Options:

- `--password, -p`: Base password to analyze.
- `--config, -c`: Path to the configuration file (default: `config.json`).
- `--depth, -d`: Chain depth for variant generation.
- `--model, -m`: Model type: transformer, rnn, bilstm, mlp (default: transformer).
- `--prepare`: Trigger list preparation.
- `--lp-dataset`: Path to a large password dataset for list preparation.
- `--lp-output`: Output directory for clusters (default: `clusters`).
- `--lp-chunk-size`: Chunk size for list preparation (default: 1000000).
Use `crackernaut_train.py` to train the ML model and refine the configuration.
Train on a wordlist (one password per line):
```bash
uv run python crackernaut_train.py --wordlist <wordlist_file> [-t <iterations>] [--model <model_type>]
```

Example:

```bash
uv run python crackernaut_train.py --wordlist rockyou.txt --times 5 --model bilstm
```

Process large wordlists into optimized training sets:

```bash
uv run python crackernaut_train.py --prepare --lp-dataset <path_to_dataset> [--clusters <num_clusters>] [--lp-chunk-size <size>] [--lp-output <output_dir>]
```

Example:

```bash
uv run python crackernaut_train.py --prepare --lp-dataset breach_compilation.txt --clusters 20000 --lp-chunk-size 2000000
```

Fine-tune the model with interactive feedback:

```bash
uv run python crackernaut_train.py --interactive
```

Customize Crackernaut via the `config.json` file. Key options include:
- `model_type`: Model for scoring (transformer, rnn, bilstm, mlp)
- `model_embed_dim`: Embedding dimension for the transformer (default: 64)
- `model_num_heads`: Number of attention heads (default: 4)
- `model_num_layers`: Number of transformer layers (default: 3)
- `model_hidden_dim`: Hidden dimension in transformer feed-forward layers (default: 128)
- `model_dropout`: Dropout rate (default: 0.2)
- `chain_depth`: Maximum number of modifications (default: 2)
- `max_length`: Maximum length for generated passwords
- `transformation_weights`: Weights for the different transformation types
- `current_base`: Base password for interactive training
- `learning_rate`: Model training learning rate
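As a rough illustration, a `config.json` using these keys might look like the sketch below. The concrete values beyond the documented defaults, and the key names inside `transformation_weights`, are illustrative assumptions, not the shipped configuration:

```json
{
  "model_type": "transformer",
  "model_embed_dim": 64,
  "model_num_heads": 4,
  "model_num_layers": 3,
  "model_hidden_dim": 128,
  "model_dropout": 0.2,
  "chain_depth": 2,
  "max_length": 20,
  "transformation_weights": {
    "numeric": 1.0,
    "symbol": 0.8,
    "capitalization": 0.9,
    "leet": 0.7
  },
  "current_base": "Password123",
  "learning_rate": 0.001
}
```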
- Always obtain explicit permission before using Crackernaut for security testing.
- Handle password datasets securely, using encryption where necessary, and comply with all applicable data protection laws.
Crackernaut now features a professionally structured codebase:
- Source Code Structure: All core modules organized under `src/` with logical separation
- Utility Modules: Common functionality grouped in `src/utils/` for reusability
- Model Architecture: ML models isolated in `src/models/` with clear interfaces
- Script Organization: Setup and utility scripts separated into the `scripts/` directory
- Clean Dependencies: Modern uv-based dependency management with optional extras
Crackernaut implements various transformation strategies:
- Character Substitution: Replace letters with similar symbols
- Case Modification: Alter capitalization patterns
- Numeric Manipulation: Change numerical parts intelligently
- Symbol Addition: Insert special characters strategically
- Pattern Recognition: Apply common password creation patterns
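Two of the strategies above can be sketched in a few lines. The substitution table and function names are hypothetical illustrations, not Crackernaut's actual implementation:

```python
import re

# Hypothetical leet-speak substitution table (character substitution)
LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def leet_variants(password):
    """Yield variants with one character replaced by a similar symbol."""
    for idx, ch in enumerate(password):
        sub = LEET.get(ch.lower())
        if sub:
            yield password[:idx] + sub + password[idx + 1:]

def numeric_increment(password):
    """Bump a trailing number, a common human update pattern."""
    m = re.search(r"(\d+)$", password)
    if not m:
        return password + "1"
    num = m.group(1)
    return password[: m.start()] + str(int(num) + 1).zfill(len(num))

print(sorted(leet_variants("pass")))    # ['p@ss', 'pa$s', 'pas$']
print(numeric_increment("summer2023"))  # summer2024
```

Chaining several such transformations up to `chain_depth` is what produces the combinatorial variant pool.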
- 🎯 Transformer Model (Primary): State-of-the-art architecture providing superior pattern recognition and batch processing capabilities
- 🔧 Legacy Models: MLP, RNN, and BiLSTM models retained for backward compatibility and research purposes
- 📊 Chunked Processing: Efficiently handles massive password datasets without memory overflow
- 🧠 Transformer Embeddings: Generates sophisticated low-dimensional password representations
- 🎯 Clustering: Uses Mini‑Batch K‑Means to group similar passwords for optimal training sets
- ✨ Representative Selection: Intelligently chooses diverse samples for comprehensive training
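The clustering step can be sketched in miniature. Real usage would apply a library implementation (e.g. scikit-learn's `MiniBatchKMeans`) to transformer embeddings; the pure-Python version and 2-D toy points below are purely illustrative:

```python
import random

def mini_batch_kmeans(points, k, batch_size, iters, seed=0):
    """Tiny Mini-Batch K-Means: update centers from random mini-batches."""
    rng = random.Random(seed)
    # simple deterministic init: evenly spaced points from the dataset
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    counts = [0] * k
    for _ in range(iters):
        for p in rng.sample(points, batch_size):
            # assign the point to its nearest center
            j = min(range(k), key=lambda c: sum(
                (p[d] - centers[c][d]) ** 2 for d in range(len(p))))
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate decays over time
            for d in range(len(p)):
                centers[j][d] += eta * (p[d] - centers[j][d])
    return centers

# two obvious clusters, around (0, 0) and (10, 10)
pts = [(0.1 * i, 0.1 * i) for i in range(10)] + \
      [(10 + 0.1 * i, 10 + 0.1 * i) for i in range(10)]
centers = mini_batch_kmeans(pts, k=2, batch_size=8, iters=50)
print(sorted(round(c[0]) for c in centers))  # roughly [0, 10]
```

Because each update touches only a mini-batch, the method scales to password datasets far too large for classic K-Means.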
- Input Processing: Base password analysis and preparation
- Variant Generation: Rule-based transformation pool creation
- ML Scoring: Transformer-based variant likelihood assessment
- Intelligent Filtering: Smart ranking and selection algorithms
- Optimized Output: Top-scored variants with confidence metrics
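The five stages above can be wired together as a minimal end-to-end sketch. The scoring function here is a stand-in for the transformer model, and the transformation pool is a tiny subset; neither reflects Crackernaut's real components:

```python
def generate_variants(base):
    """Stage 2: rule-based transformation pool (tiny illustrative subset)."""
    variants = {base + suffix for suffix in ("1", "!", "123")}
    variants.add(base.capitalize())
    variants.add(base.replace("a", "@"))
    return variants

def score(variant, base):
    """Stage 3 stand-in: prefer short edits over long ones."""
    return 1.0 / (1 + abs(len(variant) - len(base)))

def top_variants(base, k=3):
    """Stages 4-5: rank variants and keep the top-k with their scores."""
    scored = [(v, score(v, base)) for v in generate_variants(base)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]

for variant, s in top_variants("password"):
    print(f"{variant}  score={s:.2f}")
```

In the real pipeline the score comes from the trained PyTorch model, which is what lets the ranking reflect learned human patterns rather than a fixed heuristic.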
This workspace includes optimized GitHub Copilot configuration for ML security research:
- Repository Instructions: `.github/copilot-instructions.md` provides project-specific context
- Workspace Settings: `.vscode/settings.json` contains Copilot agent configuration
- Privacy Protection: `.copilotignore` excludes sensitive training data from indexing
The configuration ensures Copilot understands:
- Password security research context and ethical guidelines
- PyTorch/CUDA development patterns and error handling
- Async I/O patterns for large dataset processing
- Type safety and proper documentation standards
- List preparation module for organizing password datasets
- Updated training pipeline for transformer models
- Improved test coverage
Crackernaut is intended for ethical use only. Misuse of this tool for unauthorized access or malicious purposes is strictly prohibited.