Description
Problem
When pattern mining runs in discover mode on inputs with many domains (roughly 900 or more), the subdomain generation step can hang indefinitely. The hang occurs in GenerateAtFixedLength() when a pattern triggers exponential recursion in the DFS traversal.
Example Input
An input file with 898 subdomains under the .tstaging.tools domain causes a hang around pattern 400-500 during the generation phase. Sample domains that contribute to the problematic patterns:
mobile-prod-genymotion-64.ue1.mobile.tstaging.tools
grafana.ue1.s11.tstaging.tools
kibana-logging.eck.ue1.cloudhub.tstaging.tools
gateway.ue1.stg1.tstagingsub-97-35-127.gateway.ue1.stg1.tstaging.tools
The issue is that certain discovered patterns, when fed to DankEncoder's GenerateAtFixedLength(), trigger deep recursion that takes an impractical amount of time to complete.
Root Cause
The internal/dank/dank.go library's GenerateAtFixedLength() function:
- Uses pure recursive DFS without any early-exit conditions
- Has no way to be interrupted or cancelled
- Has no limit on number of results generated
Requested Features
1. Context Support for Early Cancellation
Add context parameter to allow graceful cancellation:
func (d *DankEncoder) GenerateAtFixedLengthWithContext(ctx context.Context, fixedLen int) ([]string, error)
This would allow the caller to set timeouts and cancel expensive operations.
2. Max Results Limit
Add parameter to limit maximum results and exit early:
func (d *DankEncoder) GenerateAtFixedLengthWithLimit(fixedLen int, maxResults int) []string
This would prevent runaway recursion by stopping once a threshold is reached.
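A result cap could be wired into the recursion roughly as follows. As above, this is a sketch over a hypothetical alphabet-based enumerator (generateWithLimit is not part of the dank library); the point is the boolean "keep going" signal that stops every enclosing loop once the cap is hit.

```go
package main

import "fmt"

// generateWithLimit is a hypothetical stand-in for a fixed-length DFS
// generator (NOT the dank library's code). It collects strings of fixedLen
// symbols drawn from alphabet and stops as soon as maxResults have been
// collected. A maxResults of zero or less means "no limit".
func generateWithLimit(alphabet []string, fixedLen, maxResults int) []string {
	var results []string

	// dfs reports whether the traversal should continue.
	var dfs func(prefix string, depth int) bool
	dfs = func(prefix string, depth int) bool {
		if depth == fixedLen {
			results = append(results, prefix)
			return maxResults <= 0 || len(results) < maxResults
		}
		for _, sym := range alphabet {
			if !dfs(prefix+sym, depth+1) {
				return false // cap reached: stop all enclosing loops too
			}
		}
		return true
	}

	dfs("", 0)
	return results
}

func main() {
	// 3^20 candidate strings exist, but only the first 5 are generated.
	capped := generateWithLimit([]string{"a", "b", "c"}, 20, 5)
	fmt.Println(len(capped), capped[0])
}
```

In this toy enumerator every branch yields results, so the cap also bounds the running time. If a traversal can spend a long time in branches that produce no results, a cap alone may not be enough, and it would likely need to be combined with cancellation.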
Benefits
- Prevents hanging on complex patterns
- Allows reasonable timeouts for pattern generation
- Makes discover mode viable for larger input sets
- Maintains backwards compatibility (keep existing functions, add new variants)
Workaround
We currently use a conservative NumWords() estimate check as a guard, but it is not accurate enough to keep every problematic pattern from being processed.
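For illustration only, an estimate-based guard of this kind might look like the sketch below. The estimateWords and shouldSkip helpers are hypothetical and are not the library's NumWords(); the estimate is simply branching^length with overflow saturation, which is an upper bound for a uniform alphabet and says nothing about how long a real traversal takes.

```go
package main

import (
	"fmt"
	"math"
)

// estimateWords is a hypothetical upper-bound estimate (NOT the library's
// NumWords): it returns branching^length, saturating at math.MaxInt64
// instead of overflowing. Non-positive branching or negative length yields 0.
func estimateWords(branching, length int) int64 {
	if branching <= 0 || length < 0 {
		return 0
	}
	total := int64(1)
	for i := 0; i < length; i++ {
		if total > math.MaxInt64/int64(branching) {
			return math.MaxInt64 // would overflow: saturate
		}
		total *= int64(branching)
	}
	return total
}

// shouldSkip reports whether a pattern's estimated output exceeds budget.
func shouldSkip(estimate, budget int64) bool {
	return estimate > budget
}

func main() {
	const budget = 1_000_000 // assumed per-pattern budget, chosen arbitrarily
	for _, length := range []int{3, 20} {
		est := estimateWords(36, length)
		fmt.Println(length, est, shouldSkip(est, budget))
	}
}
```

A guard like this only filters on an estimate made before generation starts, which is why it cannot replace cancellation or a hard result cap inside the traversal itself.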