4 changes: 3 additions & 1 deletion .gitignore
@@ -4,7 +4,6 @@
*.dll
*.so
*.dylib
cmd/alterx/alterx
dist
.idea
.vscode
@@ -14,6 +13,9 @@ dist

# Output of the go coverage tool, specifically when used with LiteIDE
*.out
coverage.html

# Dependency directories (remove the comment below to include it)
# vendor/
/cmd/alterx/alterx
/alterx
204 changes: 204 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,204 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AlterX is a fast and customizable subdomain wordlist generator built on a DSL (Domain-Specific Language). This is a **Go port** that integrates pattern mining capabilities from [Regulator](https://github.com/cramppet/regulator) by @cramppet into the original [ProjectDiscovery alterx](https://github.com/projectdiscovery/alterx).

**Key Features:**
- Template-based subdomain generation using variables like `{{sub}}`, `{{suffix}}`, `{{word}}`
- Pattern mining mode that automatically discovers subdomain patterns from observed data
- Three operation modes: default (user patterns), discover (mined patterns), both (combined)
- ClusterBomb attack pattern for generating permutations

## Build & Development Commands

```bash
# Build the binary
make build

# Run tests
make test

# Run tests with coverage
make test-coverage

# Run linter (requires golangci-lint)
make lint

# Format code
make fmt

# Clean build artifacts
make clean

# Install to $GOPATH/bin
make install

# Build and run help
make run
```

**Single test execution:**
```bash
go test -v -run TestFunctionName ./path/to/package
```

## Architecture

### Core Components

**1. Entry Point** (`cmd/alterx/main.go`)
- CLI argument parsing via `runner.ParseFlags()`
- Mode selection logic (default/discover/both)
- Pattern mining flow orchestration
- Deduplication between mined and user-defined patterns

**2. Mutator Engine** (`mutator.go`, `algo.go`)
- `Mutator` struct: Core permutation generator
- `ClusterBomb` algorithm: Nth-order payload combination using recursion
- `IndexMap`: Maintains deterministic ordering for payload iteration
- Template replacement using variables extracted from input domains

**3. Input Processing** (`inputs.go`)
- `Input` struct: Parses domains into components (sub, suffix, tld, etld, etc.)
- Variable extraction: `{{sub}}`, `{{sub1}}`, `{{suffix}}`, `{{root}}`, `{{sld}}`, etc.
- Multi-level subdomain support (e.g., `cloud.api.example.com` → `sub=cloud`, `sub1=api`)

**4. Pattern Mining** (`internal/patternmining/`)
- **Three-phase discovery algorithm:**
  1. Edit distance clustering (no prefix enforcement)
  2. N-gram clustering (unigrams/bigrams)
  3. N-gram prefix clustering with edit distance refinement
- **Quality control:** Pattern threshold and quality ratio prevent over-generation
- **Regex generation:** Converts clusters to patterns with alternations `(a|b)` and optional groups `(...)?`
- **Number compression:** Optimizes `[0-9]` ranges automatically

**5. DFA Engine** (`internal/dank/dank.go`)
- Brzozowski's algorithm for DFA minimization
- Thompson NFA construction from regex
- Subset construction for NFA→DFA conversion
- Reverse DFA for minimization (determinize → reverse → determinize → reverse → determinize)
- Fixed-length string generation from automaton
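
As a rough illustration of the last bullet, here is a minimal sketch of fixed-length string generation from an automaton. The `DFA` type, its fields, and `exampleDFA` are hypothetical names invented for this example; they are not the API of `internal/dank`, and the automaton is built by hand rather than derived from a regex.

```go
package main

import (
	"fmt"
	"sort"
)

// DFA is a hypothetical, minimal automaton type used only in this sketch.
type DFA struct {
	Start     int
	Accepting map[int]bool
	Trans     map[int]map[rune]int // state -> symbol -> next state
}

// Generate returns every accepted string of exactly n symbols.
// Symbols are visited in sorted order, so the output is deterministic.
func (d *DFA) Generate(n int) []string {
	var out []string
	var walk func(state int, prefix []rune)
	walk = func(state int, prefix []rune) {
		if len(prefix) == n {
			if d.Accepting[state] {
				out = append(out, string(prefix))
			}
			return
		}
		symbols := make([]rune, 0, len(d.Trans[state]))
		for r := range d.Trans[state] {
			symbols = append(symbols, r)
		}
		sort.Slice(symbols, func(i, j int) bool { return symbols[i] < symbols[j] })
		for _, r := range symbols {
			walk(d.Trans[state][r], append(prefix, r))
		}
	}
	walk(d.Start, nil)
	return out
}

// exampleDFA hand-builds an automaton for the pattern v(1|2).
func exampleDFA() *DFA {
	return &DFA{
		Start:     0,
		Accepting: map[int]bool{2: true},
		Trans: map[int]map[rune]int{
			0: {'v': 1},
			1: {'1': 2, '2': 2},
		},
	}
}

func main() {
	fmt.Println(exampleDFA().Generate(2))
}
```

Walking symbols in sorted order is one simple way to keep the generated list stable between runs, which matters when tests compare output.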

### File Structure

```
cmd/alterx/main.go # Entry point, mode selection, orchestration
internal/runner/
├── runner.go # CLI flag definitions and parsing
├── config.go # Version and config management
└── banner.go # Banner display
internal/patternmining/
├── patternmining.go # Main mining algorithm (3 phases)
├── clustering.go # Edit distance clustering logic
└── regex.go # Tokenization and regex generation
internal/dank/
└── dank.go # DFA-based pattern generation (Brzozowski)
mutator.go # Core Mutator with ClusterBomb algorithm
algo.go # ClusterBomb implementation and IndexMap
inputs.go # Domain parsing and variable extraction
replacer.go # Template variable replacement
config.go # Default patterns and payloads
util.go # Helper functions
```

## Key Concepts

### Variables System
Templates use variables extracted from input domains:
- `{{sub}}`: Leftmost subdomain part (e.g., `api` in `api.example.com`)
- `{{suffix}}`: Everything except leftmost part (e.g., `example.com`)
- `{{root}}`: eTLD+1 (e.g., `example.com`)
- `{{sld}}`: Second-level domain (e.g., `example`)
- `{{tld}}`: Top-level domain (e.g., `com`)
- `{{etld}}`: Extended TLD (e.g., `co.uk`)
- `{{subN}}`: Multi-level support where N is depth (e.g., `{{sub1}}`, `{{sub2}}`)
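
The `{{sub}}` and `{{suffix}}` variables above can be sketched as follows. `splitVars` and `render` are hypothetical helpers written for this example, not functions from `inputs.go` or `replacer.go`. The sketch handles only the leftmost-label split; `{{root}}`, `{{tld}}`, and `{{etld}}` are omitted because they need public suffix data.

```go
package main

import (
	"fmt"
	"strings"
)

// splitVars is a hypothetical helper: it splits a domain at the first dot
// into the leftmost label (sub) and everything after it (suffix).
func splitVars(domain string) map[string]string {
	sub, suffix, found := strings.Cut(domain, ".")
	if !found {
		// No dot at all: treat the whole input as the suffix.
		return map[string]string{"suffix": domain}
	}
	return map[string]string{"sub": sub, "suffix": suffix}
}

// render replaces each {{name}} placeholder with its value.
func render(template string, vars map[string]string) string {
	for name, value := range vars {
		template = strings.ReplaceAll(template, "{{"+name+"}}", value)
	}
	return template
}

func main() {
	vars := splitVars("api.example.com")
	fmt.Println(render("{{sub}}-dev.{{suffix}}", vars))
}
```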

### ClusterBomb Algorithm
Generates all combinations of payloads across variables:
- Uses recursion with vector construction
- Maintains deterministic ordering via IndexMap
- Avoids redundant combinations (e.g., `api-api.example.com`)
- Early exit when no variables present in template
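
A minimal sketch of the recursive combination step, under simplifying assumptions: `clusterBomb` here is a hypothetical standalone function, sorted keys stand in for the role the source assigns to `IndexMap`, and the redundant-combination check is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// clusterBomb expands a template once for every combination of payload
// values. Hypothetical sketch; not the signature used in algo.go.
func clusterBomb(template string, payloads map[string][]string) []string {
	// Keep only variables that appear in the template, in sorted order,
	// so that iteration order does not depend on Go's map ordering.
	keys := make([]string, 0, len(payloads))
	for k := range payloads {
		if strings.Contains(template, "{{"+k+"}}") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	// Early exit: nothing to substitute.
	if len(keys) == 0 {
		return []string{template}
	}

	var out []string
	var recurse func(i int, current string)
	recurse = func(i int, current string) {
		if i == len(keys) {
			out = append(out, current)
			return
		}
		placeholder := "{{" + keys[i] + "}}"
		for _, v := range payloads[keys[i]] {
			recurse(i+1, strings.ReplaceAll(current, placeholder, v))
		}
	}
	recurse(0, template)
	return out
}

func main() {
	payloads := map[string][]string{
		"word":   {"api", "dev"},
		"number": {"1", "2"},
	}
	for _, s := range clusterBomb("{{word}}-{{number}}.example.com", payloads) {
		fmt.Println(s)
	}
}
```

The result size is the product of the payload list lengths, which is why large wordlists combined with several variables grow quickly.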

### Pattern Mining Workflow
1. **Validate input:** Ensure domains share common target (e.g., `.example.com`)
2. **Build distance table:** Compute pairwise Levenshtein distances
3. **Phase 1 - Edit clustering:** Group by edit distance (min to max)
4. **Phase 2 - N-grams:** Generate unigrams/bigrams, cluster by prefix
5. **Phase 3 - Prefix clustering:** Apply edit distance within prefix groups
6. **Quality validation:** Filter patterns using threshold and ratio metrics
7. **Generate subdomains:** Use DFA to produce strings from patterns
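
Step 2 of the workflow can be sketched as below. This is a textbook Levenshtein implementation plus a pairwise table keyed by host pair; `levenshtein`, `min3`, and `distanceTable` are names chosen for this example and may not match the code in `internal/patternmining/`.

```go
package main

import "fmt"

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance between a and b using two rows
// of the dynamic-programming matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// distanceTable computes each pairwise distance once and stores it under
// both orderings, so later lookups never recompute.
func distanceTable(hosts []string) map[[2]string]int {
	table := make(map[[2]string]int)
	for i, a := range hosts {
		for _, b := range hosts[i+1:] {
			d := levenshtein(a, b)
			table[[2]string{a, b}] = d
			table[[2]string{b, a}] = d
		}
	}
	return table
}

func main() {
	table := distanceTable([]string{"api1", "api2", "staging"})
	fmt.Println(table[[2]string{"api1", "api2"}])
}
```

The table holds one entry per ordered pair, so memory grows quadratically with the number of input hosts.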

## Pattern Mining Modes

**Default Mode** (`-m default`, or omit `-m`):
- Original alterx behavior
- Uses user-defined or default patterns from config

**Discover Mode** (`-m discover`):
- Pattern mining only
- Discovers patterns from input domains
- Generates subdomains based only on mined patterns

**Both Mode** (`-m both`):
- Combines user-defined and mined patterns
- Deduplicates results across both sources
- Best for maximum coverage
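
Deduplication across the two sources can be sketched like this. `mergeUnique` is a hypothetical helper for illustration; the actual logic in `cmd/alterx/main.go` may differ.

```go
package main

import "fmt"

// mergeUnique concatenates the given lists and drops repeated entries,
// keeping the first occurrence so the output order stays deterministic.
func mergeUnique(lists ...[]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, list := range lists {
		for _, s := range list {
			if _, ok := seen[s]; ok {
				continue
			}
			seen[s] = struct{}{}
			out = append(out, s)
		}
	}
	return out
}

func main() {
	userGenerated := []string{"api-dev.example.com", "api-prod.example.com"}
	mined := []string{"api-prod.example.com", "api-stage.example.com"}
	fmt.Println(mergeUnique(userGenerated, mined))
}
```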

**Key Flags:**
- `-min-distance 2`: Minimum Levenshtein distance for clustering
- `-max-distance 10`: Maximum Levenshtein distance for clustering
- `-pattern-threshold 500`: Minimum synthetic subdomains before ratio check
- `-quality-ratio 25`: Max ratio of synthetic/observed subdomains
- `-save-rules output.json`: Save discovered patterns and metadata to JSON file
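
One plausible reading of how `-pattern-threshold` and `-quality-ratio` interact is sketched below. The signature is an assumption made for this example, and the real `isGoodRule()` in `internal/patternmining/patternmining.go` may take different arguments or use different comparisons.

```go
package main

import "fmt"

// isGoodRule reports whether a mined pattern should be kept.
// Assumed semantics: patterns that generate fewer than threshold strings
// always pass; larger ones pass only if generated/observed stays below
// maxRatio.
func isGoodRule(generated, observed, threshold int, maxRatio float64) bool {
	if generated < threshold {
		return true
	}
	if observed == 0 {
		// Nothing observed to justify a large pattern.
		return false
	}
	return float64(generated)/float64(observed) < maxRatio
}

func main() {
	fmt.Println(isGoodRule(400, 10, 500, 25))   // small pattern, ratio not checked
	fmt.Println(isGoodRule(1000, 100, 500, 25)) // ratio 10
	fmt.Println(isGoodRule(1000, 10, 500, 25))  // ratio 100
}
```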

## Common Patterns

### Adding New CLI Flags
1. Add field to `Options` struct in `internal/runner/runner.go`
2. Register flag in `ParseFlags()` using appropriate flag group
3. Handle flag value in main logic (`cmd/alterx/main.go`)

### Adding New Variables
1. Parse in `NewInput()` in `inputs.go`
2. Add to `Input.GetMap()` return value
3. Update template validation in `mutator.go`

### Modifying Pattern Mining
- **Clustering logic:** `internal/patternmining/clustering.go`
- **Tokenization rules:** `tokenize()` in `internal/patternmining/regex.go`
- **Quality metrics:** `isGoodRule()` in `internal/patternmining/patternmining.go`

## Testing Strategy

- Unit tests in `*_test.go` files (e.g., `mutator_test.go`, `inputs_test.go`)
- Test individual components before integration
- Use table-driven tests for variable extraction and pattern generation
- Validate pattern mining with known domain sets

## Important Notes

- **Dedupe enabled by default:** `DedupeResults = true` in `mutator.go`
- **Prefix optimization:** ClusterBomb skips words already in leftmost subdomain
- **Pattern quality critical:** Low thresholds generate millions of subdomains
- **Distance memoization:** Pattern mining caches Levenshtein distances for performance
- **DFA minimization:** Three-pass Brzozowski ensures minimal automaton
- **No breaking changes:** All pattern mining is additive; default behavior unchanged

## Credits

- **Original alterx:** [ProjectDiscovery](https://github.com/projectdiscovery/alterx)
- **Pattern mining algorithm:** [Regulator](https://github.com/cramppet/regulator) by @cramppet
- **DFA implementation:** Ported from original regulator/dank library

## Development Guidelines

- Maintain compatibility with original alterx API
- Keep pattern mining as optional feature (don't force on users)
- Preserve deterministic output ordering for testing
- Use `gologger` for all logging (not `fmt.Println`)
- Follow Go naming conventions and project structure
- Add tests for new features
74 changes: 74 additions & 0 deletions Makefile
@@ -0,0 +1,74 @@
.PHONY: all build install test test-coverage clean lint fmt vet deps run help

# Binary name
BINARY_NAME=alterx
BINARY_PATH=./cmd/alterx

# Go parameters
GOCMD=go
GOBUILD=$(GOCMD) build
GOCLEAN=$(GOCMD) clean
GOTEST=$(GOCMD) test
GOGET=$(GOCMD) get
GOMOD=$(GOCMD) mod
GOFMT=$(GOCMD) fmt
GOVET=$(GOCMD) vet

# Build flags
LDFLAGS=-s -w

all: build ## Build the project

build: ## Build the binary
	@echo "Building $(BINARY_NAME)..."
	$(GOBUILD) -ldflags="$(LDFLAGS)" -o $(BINARY_NAME) $(BINARY_PATH)
	@echo "Build complete: ./$(BINARY_NAME)"

install: ## Install the binary to $GOPATH/bin
	@echo "Installing $(BINARY_NAME)..."
	$(GOCMD) install $(BINARY_PATH)
	@echo "Install complete"

test: ## Run tests
	@echo "Running tests..."
	$(GOTEST) -v ./...

test-coverage: ## Run tests with coverage
	@echo "Running tests with coverage..."
	$(GOTEST) -v -coverprofile=coverage.out ./...
	$(GOCMD) tool cover -html=coverage.out -o coverage.html
	@echo "Coverage report generated: coverage.html"

lint: ## Run linter (requires golangci-lint)
	@echo "Running linter..."
	@which golangci-lint > /dev/null || (echo "golangci-lint not installed. Install: https://golangci-lint.run/usage/install/" && exit 1)
	golangci-lint run ./...

fmt: ## Format code
	@echo "Formatting code..."
	$(GOFMT) ./...

vet: ## Run go vet
	@echo "Running go vet..."
	$(GOVET) ./...

clean: ## Clean build artifacts
	@echo "Cleaning..."
	$(GOCLEAN)
	rm -f $(BINARY_NAME)
	rm -f coverage.out coverage.html
	@echo "Clean complete"

deps: ## Download dependencies
	@echo "Downloading dependencies..."
	$(GOMOD) download
	$(GOMOD) tidy

run: build ## Build and run the binary
	./$(BINARY_NAME) -h

help: ## Show this help message
	@echo "Available targets:"
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'


27 changes: 27 additions & 0 deletions README.md
@@ -35,7 +35,9 @@
- **Automatic word enrichment**
- Pre-defined variables
- **Configurable Patterns**
- **Pattern Mining** - Automatically discover subdomain patterns (Go port of [Regulator](https://github.com/cramppet/regulator))
- STDIN / List input
- Multiple operation modes (default, discover, both)

## Installation
To install alterx, you need to have Golang 1.19 installed on your system. You can download Golang from [here](https://go.dev/doc/install). After installing Golang, you can use the following command to install alterx:
@@ -45,6 +47,31 @@ To install alterx, you need to have Golang 1.19 installed on your system. You ca
go install github.com/projectdiscovery/alterx/cmd/alterx@latest
```

### Building from Source
```bash
# Clone the repository
git clone https://github.com/projectdiscovery/alterx.git
cd alterx

# Build using Makefile
make build

# Or build manually
go build ./cmd/alterx
```

Available Makefile targets:
```bash
make help # Show all available targets
make build # Build the binary
make test # Run tests
make test-coverage # Run tests with coverage
make lint # Run linter
make fmt # Format code
make clean # Clean build artifacts
make install # Install to $GOPATH/bin
```

## Help Menu
You can use the following command to see the available flags and options:
