Closed
11 changes: 11 additions & 0 deletions .gitignore
@@ -17,3 +17,14 @@ dist

# Dependency directories (remove the comment below to include it)
# vendor/

# Cross-validation testing (not for upstream)
mining/cross_validation_test.go
mining/clustering_cross_validation_test.go
mining/quality_check_cross_validation_test.go
mining/hierarchical_cross_validation_test.go
mining/test_helpers.go
mining/testData/
mining/cross_validation/
/alterx
/cmd/alterx/alterx
31 changes: 31 additions & 0 deletions Makefile
@@ -0,0 +1,31 @@
.PHONY: all build test lint clean

# Default target
all: build

# Build the alterx binary
build:
	@echo "Building alterx..."
	@go build -o alterx cmd/alterx/main.go
	@echo "Build complete: ./alterx"

# Run tests
test:
	@echo "Running tests..."
	@go test -v ./...

# Run linters
lint:
	@echo "Running linters..."
	@if command -v golangci-lint >/dev/null 2>&1; then \
		golangci-lint run ./...; \
	else \
		echo "golangci-lint not found, running go vet..."; \
		go vet ./...; \
	fi

# Clean build artifacts
clean:
	@echo "Cleaning build artifacts..."
	@rm -f alterx
	@echo "Clean complete"
176 changes: 176 additions & 0 deletions PATTERN_MINING.md
@@ -0,0 +1,176 @@
# Pattern Mining Feature

## Overview

Pattern mining allows alterx to automatically discover patterns from a list of input domains, eliminating the need to manually define patterns and payloads.

## Usage

### Basic Discover Mode

```bash
# Discover patterns from a list of domains
alterx -l domains.txt -mode discover

# Limit the output
alterx -l domains.txt -mode discover -limit 100
```

### Advanced Options

All pattern mining options are grouped under the **"Pattern Mining"** flag group:

| Flag | Default | Description |
|------|---------|-------------|
| `-m, -mode` | `default` | Pattern mode: `default`, `discover`, or `both` |
| `-min-distance` | `2` | Minimum Levenshtein distance for clustering |
| `-max-distance` | `5` | Maximum Levenshtein distance for clustering |
| `-pattern-threshold` | `500` | Pattern threshold for filtering low-quality patterns |
| `-quality-ratio` | `25` | Pattern quality ratio threshold |
| `-ngrams-limit` | `0` | Limit number of n-grams to process (0 = all) |

#### Pattern Modes

The `-mode` flag controls which patterns are used:

- **`default`**: Use user-specified or default patterns only (traditional behavior; also applies when `-mode` is not specified)
- **`discover`**: Use only mined patterns from input domains (no defaults)
- **`both`**: Combine mined patterns with defaults for maximum coverage
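
The three modes amount to choosing between two pattern sources. The sketch below shows one way that choice could look in Go. `selectPatterns`, `manual`, and `mined` are hypothetical names for illustration, not alterx's API, and treating an empty mode as `default` is an assumption based on the description above.

```go
package main

import "fmt"

// selectPatterns is a hypothetical helper (not part of alterx) that picks
// the pattern set for a given -mode value.
func selectPatterns(mode string, manual, mined []string) ([]string, error) {
	switch mode {
	case "", "default":
		// Assumption: an unset mode behaves like "default".
		return manual, nil
	case "discover":
		return mined, nil
	case "both":
		// Build a fresh slice so the callers' slices are not modified.
		out := make([]string, 0, len(mined)+len(manual))
		out = append(out, mined...)
		out = append(out, manual...)
		return out, nil
	default:
		return nil, fmt.Errorf("unknown mode %q", mode)
	}
}

func main() {
	manual := []string{"{{word}}.{{root}}"}
	mined := []string{"api-{{p0}}.{{root}}"}
	for _, mode := range []string{"default", "discover", "both"} {
		patterns, err := selectPatterns(mode, manual, mined)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(mode, patterns)
	}
}
```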

### Examples

**1. Basic discover mode (mined patterns only):**
```bash
# Use -mode discover to mine patterns (no default patterns)
alterx -l subdomains.txt -mode discover -limit 50
```

**2. Combine mined patterns with defaults:**
```bash
# Use -mode both to get maximum coverage
alterx -l subdomains.txt -mode both -limit 100
```

**3. Explicitly use only default patterns:**
```bash
# Use -mode default for traditional behavior
alterx -l subdomains.txt -mode default -limit 50
```

**4. Custom mining parameters:**
```bash
alterx -l subdomains.txt -mode discover \
-min-distance 3 \
-max-distance 6 \
-pattern-threshold 500 \
-quality-ratio 80 \
-limit 100
```

**5. Fast mode (limit n-grams):**
```bash
# Process only the first 100 n-grams for faster results
alterx -l subdomains.txt -mode discover -ngrams-limit 100
```

**6. Discover and save to file:**
```bash
alterx -l subdomains.txt -mode discover -o permutations.txt
```

## Input Requirements

For optimal pattern discovery:
- **Minimum**: 10 domains (warning shown if fewer)
- **Recommended**: 50+ domains for better pattern diversity
- **Best**: 100+ domains with varied structures

## How It Works

The pattern mining algorithm uses two complementary approaches:

1. **Levenshtein Distance Clustering**: Groups similar subdomains based on edit distance
2. **Hierarchical N-gram Clustering**: Analyzes subdomains at multiple granularity levels
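
To make the first approach concrete, here is a minimal sketch of grouping subdomain labels by edit distance. It is a simplified illustration, not the mining package's code: the distance function is a textbook dynamic-programming implementation (the PR itself depends on `github.com/ka-weihe/fast-levenshtein`), and `closure` and `maxDist` are hypothetical names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the byte-wise edit distance between a and b using
// the classic two-row dynamic-programming formulation.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// closure returns every item within maxDist edits of seed, seed included.
func closure(seed string, items []string, maxDist int) []string {
	var out []string
	for _, it := range items {
		if levenshtein(seed, it) <= maxDist {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	labels := []string{"api-prod", "api-staging", "web-prod", "web-staging"}
	for _, seed := range labels {
		fmt.Println(seed, "->", closure(seed, labels, 3))
	}
}
```

Raising the distance bound merges more labels into each group, which is the trade-off the `-min-distance` and `-max-distance` flags expose.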

### Example

Given input domains:
```
api-prod.example.com
api-staging.example.com
web-prod.example.com
web-staging.example.com
```

Discovered patterns:
```
api-{{p0}}.{{root}} → payloads: {"p0": ["prod", "staging"]}
web-{{p0}}.{{root}} → payloads: {"p0": ["prod", "staging"]}
{{p0}}.{{root}} → payloads: {"p0": ["api-prod", "api-staging", "web-prod", "web-staging"]}
```

Generated permutations:
```
api-prod.example.com
api-staging.example.com
web-prod.example.com
web-staging.example.com
(larger inputs yield more patterns and payloads, and therefore many more combinations...)
```
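
The step from discovered patterns to generated permutations can be sketched as plain template substitution. This is an illustration only: `minedPattern` and `expand` are hypothetical names, and it uses `strings.NewReplacer` instead of alterx's own template engine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// minedPattern pairs a template with the payload values for its single
// {{p0}} placeholder. Hypothetical type, for illustration only.
type minedPattern struct {
	template string
	p0       []string
}

// expand substitutes every p0 value and the root into each template and
// returns the de-duplicated results in sorted order.
func expand(patterns []minedPattern, root string) []string {
	seen := map[string]struct{}{}
	for _, p := range patterns {
		for _, v := range p.p0 {
			r := strings.NewReplacer("{{p0}}", v, "{{root}}", root)
			seen[r.Replace(p.template)] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for host := range seen {
		out = append(out, host)
	}
	sort.Strings(out)
	return out
}

func main() {
	patterns := []minedPattern{
		{template: "api-{{p0}}.{{root}}", p0: []string{"prod", "staging"}},
		{template: "web-{{p0}}.{{root}}", p0: []string{"prod", "staging"}},
		{template: "{{p0}}.{{root}}", p0: []string{"api-prod", "api-staging", "web-prod", "web-staging"}},
	}
	for _, host := range expand(patterns, "example.com") {
		fmt.Println(host)
	}
}
```

With only these three patterns, the de-duplicated expansion contains exactly the four hosts listed above.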

## Architecture

The implementation uses a clean interface-based design:

- **`PatternProvider`** interface: Common contract for pattern generation strategies
- **`ManualPatternProvider`**: Traditional mode with user-specified patterns
- **`MinedPatternProvider`**: Discover mode with automatic pattern mining
- **Mutator**: Uses patterns/payloads from provider transparently
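
As a rough picture of this design, the sketch below wires two providers behind one interface. Only the type names come from the list above. The `Provide` method, the struct fields, and `countPatterns` are assumptions made for illustration, since the real method set is not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// PatternProvider is sketched with one hypothetical method; the actual
// interface in alterx may look different.
type PatternProvider interface {
	Provide() (patterns []string, payloads map[string][]string, err error)
}

// ManualPatternProvider returns the patterns and payloads it was given.
type ManualPatternProvider struct {
	Patterns []string
	Payloads map[string][]string
}

func (m ManualPatternProvider) Provide() ([]string, map[string][]string, error) {
	return m.Patterns, m.Payloads, nil
}

// MinedPatternProvider would derive patterns from input domains.
type MinedPatternProvider struct {
	Domains []string
}

func (m MinedPatternProvider) Provide() ([]string, map[string][]string, error) {
	if len(m.Domains) == 0 {
		return nil, nil, errors.New("no input domains to mine")
	}
	// Placeholder result: real mining would cluster m.Domains here.
	return []string{"{{p0}}.{{root}}"}, map[string][]string{"p0": {"api", "web"}}, nil
}

// countPatterns stands in for the mutator: it depends only on the
// interface, so either provider can be passed in.
func countPatterns(p PatternProvider) (int, error) {
	patterns, _, err := p.Provide()
	if err != nil {
		return 0, err
	}
	return len(patterns), nil
}

func main() {
	providers := []PatternProvider{
		ManualPatternProvider{Patterns: []string{"{{word}}.{{root}}"}},
		MinedPatternProvider{Domains: []string{"api.example.com", "web.example.com"}},
	}
	for _, p := range providers {
		n, err := countPatterns(p)
		fmt.Printf("%T: %d pattern(s), err=%v\n", p, n, err)
	}
}
```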

## Backward Compatibility

Manual mode remains unchanged:
```bash
# Traditional usage still works exactly as before
alterx -l domains.txt -p "{{word}}.{{root}}" -pp 'word=words.txt'
```

## Performance Tuning

### For Large Datasets (1000+ domains)

```bash
# Reduce distance ranges
alterx -l large-list.txt -mode discover -min-distance 2 -max-distance 4

# Limit n-grams for faster processing
alterx -l large-list.txt -mode discover -ngrams-limit 200
```

### For Quality over Speed

```bash
# Process all n-grams with strict thresholds
alterx -l domains.txt -mode discover \
-ngrams-limit 0 \
-pattern-threshold 2000 \
-quality-ratio 150
```

## Testing

Run pattern mining tests:
```bash
# Unit tests
go test -v -run TestMinedPatternProvider

# Integration tests
go test -v -run TestMutatorIntegration_DiscoverMode

# Cross-validation tests (requires Python)
cd mining && go test -v -run TestPatternDifferences
```

## Algorithm Details

See [mining/README.md](mining/README.md) for detailed algorithm documentation and Python reference implementation comparison.
9 changes: 9 additions & 0 deletions README.md
@@ -230,6 +230,15 @@ $ alterx -list tesla.txt -enrich -p '{{word}}-{{year}}.{{suffix}}' -pp word=keyw

**For more information, please checkout the release blog** - https://blog.projectdiscovery.io/introducing-alterx-simplifying-active-subdomain-enumeration-with-patterns/

## Pattern Mining

The pattern mining implementation in this project is based on the [regulator](https://github.com/cramppet/regulator) project by [@cramppet](https://github.com/cramppet). Regulator is a subdomain pattern mining tool that uses hierarchical clustering algorithms to automatically discover patterns in subdomain datasets. We've adapted and extended these concepts to provide automatic pattern generation capabilities within AlterX.

### Attribution

The hierarchical n-gram-based clustering approach and the pattern mining algorithms are inspired by and adapted from the [regulator project](https://github.com/cramppet/regulator). Special thanks to [@cramppet](https://github.com/cramppet) for the excellent work on subdomain pattern analysis.

---

Do also check out the below similar open-source projects that may fit in your workflow:

7 changes: 7 additions & 0 deletions algo.go
@@ -18,6 +18,13 @@ func ClusterBomb(payloads *IndexMap, callback func(varMap map[string]interface{}
 	// step 4) At end of recursion len(Vector) == len(payloads).Cap() - 1
 	// which translates that Vn = {r0,r1,...,rn} and only rn is missing
 	// in this case/situation iterate over all possible values of rn i.e payload.GetNth(n)
+
+	// Guard against an empty payload set
+	if payloads.Cap() == 0 {
+		// No payloads to expand: return early so the pattern is left unexpanded
+		return
+	}
+
 	if len(Vector) == payloads.Cap()-1 {
 		// end of vector
 		vectorMap := map[string]interface{}{}
24 changes: 17 additions & 7 deletions cmd/alterx/main.go
@@ -14,12 +14,18 @@ func main() {
 	cliOpts := runner.ParseFlags()

 	alterOpts := alterx.Options{
-		Domains:  cliOpts.Domains,
-		Patterns: cliOpts.Patterns,
-		Payloads: cliOpts.Payloads,
-		Limit:    cliOpts.Limit,
-		Enrich:   cliOpts.Enrich, // enrich payloads
-		MaxSize:  cliOpts.MaxSize,
+		Domains:             cliOpts.Domains,
+		Patterns:            cliOpts.Patterns,
+		Payloads:            cliOpts.Payloads,
+		Limit:               cliOpts.Limit,
+		Enrich:              cliOpts.Enrich, // enrich payloads
+		MaxSize:             cliOpts.MaxSize,
+		Mode:                cliOpts.Mode,
+		MinLDist:            cliOpts.MinLDist,
+		MaxLDist:            cliOpts.MaxLDist,
+		PatternThreshold:    cliOpts.PatternThreshold,
+		PatternQualityRatio: cliOpts.PatternQualityRatio,
+		NgramsLimit:         cliOpts.NgramsLimit,
 	}

 	if cliOpts.PermutationConfig != "" {
@@ -44,7 +50,11 @@ func main() {
 			gologger.Fatal().Msgf("failed to open output file %v got %v", cliOpts.Output, err)
 		}
 		output = fs
-		defer fs.Close()
+		defer func() {
+			if err := fs.Close(); err != nil {
+				gologger.Error().Msgf("failed to close output file: %v", err)
+			}
+		}()
 	} else {
 		output = os.Stdout
 	}
4 changes: 3 additions & 1 deletion examples/main.go
@@ -20,5 +20,7 @@ func main() {
 	if err != nil {
 		gologger.Fatal().Msg(err.Error())
 	}
-	m.ExecuteWithWriter(os.Stdout)
+	if err := m.ExecuteWithWriter(os.Stdout); err != nil {
+		gologger.Fatal().Msgf("failed to execute: %v", err)
+	}
 }
2 changes: 2 additions & 0 deletions go.mod
@@ -3,6 +3,8 @@ module github.com/projectdiscovery/alterx
 go 1.23.0

 require (
+	github.com/armon/go-radix v1.0.0
+	github.com/ka-weihe/fast-levenshtein v0.0.0-20201227151214-4c99ee36a1ba
 	github.com/projectdiscovery/fasttemplate v0.0.2
 	github.com/projectdiscovery/goflags v0.1.72
 	github.com/projectdiscovery/gologger v1.1.45
15 changes: 13 additions & 2 deletions go.sum
@@ -4,6 +4,8 @@ github.com/Masterminds/semver/v3 v3.2.1 h1:RN9w6+7QoMeJVGyfmbcgs28Br8cvmnucEXnY0
github.com/Masterminds/semver/v3 v3.2.1/go.mod h1:qvl/7zhW3nngYb5+80sSMF+FG2BjYrf8m9wsX0PNOMQ=
github.com/VividCortex/ewma v1.2.0 h1:f58SaIzcDXrSy3kWaHNvuJgJ3Nmz59Zji6XoJR/q1ow=
github.com/VividCortex/ewma v1.2.0/go.mod h1:nz4BbCtbLyFDeC9SUHbtcT5644juEuWfUAUnGx7j5l4=
github.com/agnivade/levenshtein v1.1.0 h1:n6qGwyHG61v3ABce1rPVZklEYRT8NFpCMrpZdBUbYGM=
github.com/agnivade/levenshtein v1.1.0/go.mod h1:veldBMzWxcCG2ZvUTKD2kJNRdCk5hVbJomOvKkmgYbo=
github.com/akrylysov/pogreb v0.10.1 h1:FqlR8VR7uCbJdfUob916tPM+idpKgeESDXOA1K0DK4w=
github.com/akrylysov/pogreb v0.10.1/go.mod h1:pNs6QmpQ1UlTJKDezuRWmaqkgUE2TuU0YTWyqJZ7+lI=
github.com/alecthomas/assert/v2 v2.7.0 h1:QtqSACNS3tF7oasA8CU6A6sXZSBDqnm7RfpLl9bZqbE=
@@ -15,6 +17,10 @@ github.com/alecthomas/repr v0.4.0/go.mod h1:Fr0507jx4eOXV7AlPV6AVZLYrLIuIeSOWtW5
github.com/andybalholm/brotli v1.0.1/go.mod h1:loMXtMfwqflxFJPmdbJO0a3KNoPuLBgiu3qAvBg8x/Y=
github.com/andybalholm/brotli v1.0.6 h1:Yf9fFpf49Zrxb9NlQaluyE92/+X7UVHlhMNJN2sxfOI=
github.com/andybalholm/brotli v1.0.6/go.mod h1:fO7iG3H7G2nSZ7m0zPUDn85XEX2GTukHGRSepvi9Eig=
github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0 h1:jfIu9sQUG6Ig+0+Ap1h4unLjW6YQJpKZVmUzxsD4E/Q=
github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0/go.mod h1:t2tdKJDJF9BV14lnkjHmOQgcvEKgtqs5a1N3LNdJhGE=
github.com/armon/go-radix v1.0.0 h1:F4z6KzEeeQIMeLFa97iZU6vupzoecKdU5TX24SNppXI=
github.com/armon/go-radix v1.0.0/go.mod h1:ufUuZ+zHj4x4TnLV4JWEpy2hxWSpsRywHrMgIH9cCH8=
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 h1:DklsrG3dyBCFEj5IhUbnKptjxatkF07cF2ak3yi77so=
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2/go.mod h1:WaHUgvxTVq04UNunO+XhnAqY/wQc+bxr74GqbsZ/Jqw=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
@@ -42,6 +48,9 @@ github.com/cnf/structhash v0.0.0-20201127153200-e1b16c1ebc08/go.mod h1:pCxVEbcm3
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dgryski/trifles v0.0.0-20200323201526-dd97f9abfb48/go.mod h1:if7Fbed8SFyPtHLHbg49SI7NAdJiC5WIA09pe59rfAA=
github.com/dgryski/trifles v0.0.0-20200830180326-aaf60a07f6a3 h1:JibukGTEjdN4VMX7YHmXQsLr/gPURUbetlH4E6KvHSU=
github.com/dgryski/trifles v0.0.0-20200830180326-aaf60a07f6a3/go.mod h1:if7Fbed8SFyPtHLHbg49SI7NAdJiC5WIA09pe59rfAA=
github.com/dlclark/regexp2 v1.11.4 h1:rPYF9/LECdNymJufQKmri9gV604RvvABwgOA8un7yAo=
github.com/dlclark/regexp2 v1.11.4/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/dsnet/compress v0.0.2-0.20210315054119-f66993602bf5 h1:iFaUwBSo5Svw6L7HYpRu/0lE3e0BaElwnNO1qkNQxBY=
@@ -87,6 +96,8 @@ github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnr
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
github.com/julienschmidt/httprouter v1.3.0 h1:U0609e9tgbseu3rBINet9P48AI/D3oJs4dN7jwJOQ1U=
github.com/julienschmidt/httprouter v1.3.0/go.mod h1:JR6WtHb+2LUe8TCKY3cZOxFyyO8IZAc4RVcycCCAKdM=
github.com/ka-weihe/fast-levenshtein v0.0.0-20201227151214-4c99ee36a1ba h1:keZ4vJpYOVm6yrjLzZ6QgozbEBaT0GjfH30ihbO67+4=
github.com/ka-weihe/fast-levenshtein v0.0.0-20201227151214-4c99ee36a1ba/go.mod h1:kaXTPU4xitQT0rfT7/i9O9Gm8acSh3DXr0p4y3vKqiE=
github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
github.com/klauspost/compress v1.11.4/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.17.4 h1:Ej5ixsIri7BrIjBkRZLTo6ghwrEtHFk7ijlczPW4fZ4=
@@ -211,8 +222,6 @@ github.com/tklauser/numcpus v0.6.1 h1:ng9scYS7az0Bk4OZLvrNXNSAO2Pxr1XXRAPyjhIx+F
github.com/tklauser/numcpus v0.6.1/go.mod h1:1XfjsgE2zo8GVw7POkMbHENHzVg3GzmoZ9fESEdAacY=
github.com/ulikunitz/xz v0.5.8/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
github.com/ulikunitz/xz v0.5.9/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
-github.com/ulikunitz/xz v0.5.14 h1:uv/0Bq533iFdnMHZdRBTOlaNMdb1+ZxXIlHDZHIHcvg=
-github.com/ulikunitz/xz v0.5.14/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
github.com/ulikunitz/xz v0.5.15 h1:9DNdB5s+SgV3bQ2ApL10xRc35ck0DuIX/isZvIk+ubY=
github.com/ulikunitz/xz v0.5.15/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6KllzawFIhcdPw=
@@ -231,6 +240,8 @@ github.com/zcalusic/sysinfo v1.0.2 h1:nwTTo2a+WQ0NXwo0BGRojOJvJ/5XKvQih+2RrtWqfx
github.com/zcalusic/sysinfo v1.0.2/go.mod h1:kluzTYflRWo6/tXVMJPdEjShsbPpsFRyy+p1mBQPC30=
go.etcd.io/bbolt v1.3.7 h1:j+zJOnnEjF/kyHlDDgGnVL/AIqIJPq8UoB2GSNfkUfQ=
go.etcd.io/bbolt v1.3.7/go.mod h1:N9Mkw9X8x5fupy0IKsmuqVtoGDyxsaDlbk4Rd05IAQw=
go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0=
go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210220033148-5ea612d1eb83/go.mod h1:jdWPYTVW3xRLrWPugEBEK3UY2ZEsg3UU495nc5E+M+I=
golang.org/x/crypto v0.0.0-20211209193657-4570a0811e8b/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
16 changes: 16 additions & 0 deletions internal/runner/runner.go
@@ -28,6 +28,13 @@ type Options struct {
 	Enrich bool
 	Limit int
 	MaxSize int
+	// Mining/Discovery options
+	Mode                string
+	MinLDist            int
+	MaxLDist            int
+	PatternThreshold    int
+	PatternQualityRatio int
+	NgramsLimit         int
 	// internal/unexported fields
 	wordlists goflags.RuntimeMap
 }
@@ -60,6 +67,15 @@ func ParseFlags() *Options {
 		flagSet.IntVar(&opts.Limit, "limit", 0, "limit the number of results to return (default 0)"),
 	)

+	flagSet.CreateGroup("mining", "Pattern Mining",
+		flagSet.StringVarP(&opts.Mode, "mode", "m", "", "pattern mode: 'default' (user/default patterns), 'discover' (mined only), 'both' (combined)"),
+		flagSet.IntVar(&opts.MinLDist, "min-distance", 2, "minimum levenshtein distance for clustering"),
+		flagSet.IntVar(&opts.MaxLDist, "max-distance", 5, "maximum levenshtein distance for clustering"),
+		flagSet.IntVar(&opts.PatternThreshold, "pattern-threshold", 500, "pattern threshold for filtering low-quality patterns"),
+		flagSet.IntVar(&opts.PatternQualityRatio, "quality-ratio", 25, "pattern quality ratio threshold"),
+		flagSet.IntVar(&opts.NgramsLimit, "ngrams-limit", 0, "limit number of n-grams to process (0 = all)"),
+	)
+
 	flagSet.CreateGroup("update", "Update",
 		flagSet.CallbackVarP(GetUpdateCallback(), "update", "up", "update alterx to latest version"),
 		flagSet.BoolVarP(&opts.DisableUpdateCheck, "disable-update-check", "duc", false, "disable automatic alterx update check"),