Skip to content

tehw0lf/wlgen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Description

A recursive wordlist generator written in Python. For each string position, custom character sets can be defined.

Prerequisites

Python 3.0 or higher

Installation

From PyPI:

pip install wlgen

From GitHub:

git clone https://github.com/tehw0lf/wlgen.git
cd wlgen
pip install .

Usage

Smart Dispatcher (Recommended)

The generate_wordlist() function automatically selects the optimal algorithm based on problem size:

import wlgen

# Auto-select optimal algorithm
charset = {0: '123', 1: 'ABC', 2: 'xyz'}
wordlist = wlgen.generate_wordlist(charset)

# Use the result (list or iterator depending on size)
for word in wordlist if not isinstance(wordlist, list) else wordlist:
    print(word)

Selection Strategy:

  • Tiny (<1K combinations): Uses gen_wordlist (fastest, low memory)
  • Small (1K-100K): Uses gen_wordlist (fast and convenient)
  • Medium+ (100K+): Uses gen_wordlist_iter (optimal throughput, constant memory)

Options:

# Force memory-efficient mode for any size
wordlist = wlgen.generate_wordlist(charset, prefer_memory_efficient=True)

# Manual algorithm selection
wordlist = wlgen.generate_wordlist(charset, method='iter')  # Force iterator
wordlist = wlgen.generate_wordlist(charset, method='list')  # Force list
wordlist = wlgen.generate_wordlist(charset, method='words') # Force gen_words

# Skip input validation for pre-cleaned data (performance optimization)
wordlist = wlgen.generate_wordlist(charset, method='iter', clean_input=True)

Estimate wordlist size before generation:

size = wlgen.estimate_wordlist_size(charset)
print(f"Will generate {size:,} combinations")

Direct Implementation Access

Three implementations are available for direct use:

  • gen_wordlist_iter: Fast generator using itertools.product (~780-810K comb/s).
  • gen_wordlist: Builds entire list in memory. Fastest for small lists (~900K-1.6M comb/s).
  • gen_words: Memory-efficient recursive generator (~210-230K comb/s).

All implementations calculate the n-ary Cartesian product of input character sets.

import wlgen

charset = {0: '123', 1: 'ABC'}

# Use specific implementation
for word in wlgen.gen_wordlist_iter(charset):
    print(word)

Note: NumPy, CUDA, and multiprocessing were investigated but found to provide no performance benefit for this workload. String operations are fundamentally CPU-bound and too fast for parallelization overhead. See issues #17, #18, #20 for detailed analysis.

Development

This project uses uv for dependency management and development workflow.

Setup

Install dependencies:

uv sync --all-extras --group lint

Testing

Run tests:

uv run python -m unittest discover

Benchmarking

Run performance benchmarks:

uv run python wlgen/benchmarks/benchmark.py

Code Quality

Lint code:

uv run ruff check

Auto-fix linting issues:

uv run ruff check --fix

Format code:

uv run ruff format

Building

Build wheel and source distribution:

uv build

About

A wordlist generator written in Python

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages