Official implementation and interactive visualizations of the Position Pure (PP) linear-time ranking/unranking algorithm.
The Position-Pure Algorithm and Circle-Permutation-Algorithm represent a paradigm shift in combinatorial generation by optimizing the fundamental cost of state transitions.
- Heap's Algorithm (The Classical Baseline): Long considered the gold standard due to its swap-based approach. While swaps were once thought to be the minimum-cost operation, they involve multiple memory accesses and conditional logic that limit peak performance.
- Ives' Iterative Algorithm: Improved upon Heap's by utilizing a single-shift and single-assignment pattern, reducing the overhead of element movement.
-
Position-Pure Algorithm (Our Work): Achieves a new frontier by eliminating conditional branching and implementing a highly refined shift-and-assign strategy.
-
Scalability: Optimized for large
$N$ . - Concurrency: Stateless design allows for massive parallel execution, a feat traditional recursive algorithms cannot achieve.
-
Scalability: Optimized for large
- Circle-Permutation-Algorithm : Specifically engineered for circular symmetry. While it introduces controlled spatial overhead, it shatters traditional performance barriers by realizing an O((n-1)!) complexity framework.
- Significant Speedup:
permPure_fullconsistently outperforms Heap's Algorithm by a factor of approximately 7x. - Algorithmic Efficiency: The performance gap highlights the superior memory access patterns and lower computational overhead inherent in the PositionPure algorithm.
- Scalability: As the permutation space grows factorially, the performance gap remains stable, demonstrating excellent algorithmic efficiency for large-scale generation.
The PositionPure algorithm is a high-performance, iterative approach to generating all permutations of a set. By utilizing an iterative state machine rather than traditional recursion, it significantly reduces function call overhead and optimizes CPU branch prediction efficiency.
The algorithm utilizes an iterative state machine to eliminate recursion overhead and leverages hardware affinity binding via the Windows API to minimize context-switching noise, with performance verified to nanosecond-level accuracy through high-precision benchmarking.
Last Run: Fri Feb 06 07:13:05 2026 UTC / Fri Feb 06 15:13:05 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)
| N | Heap (s) | PP (s) | Speedup |
|---|---|---|---|
| 9 | 0.005815 | 0.000984 | 5.90x |
| 10 | 0.058239 | 0.007166 | 8.12x |
| 11 | 0.645273 | 0.074127 | 8.70x |
| 12 | 7.837899 | 0.837278 | 9.36x |
Last Run: Fri Feb 06 07:20:50 2026 UTC / Fri Feb 06 15:20:50 2026 (UTC+8) Environment: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz (GitHub Actions Runner)
| N | Heap (s) | PP (s) | Speedup |
|---|---|---|---|
| 9 | 0.006039 | 0.000871 | 6.93x |
| 10 | 0.059160 | 0.007174 | 8.24x |
| 11 | 0.657323 | 0.075407 | 8.71x |
| 12 | 7.809986 | 1.004821 | 7.77x |
| 13 | 103.446021 | 11.693761 | 8.84x |
| 14 | 1468.660328 | 162.986940 | 9.01x |
At the request of Reddit users and other community members, a performance comparison with Pythonβs built-in itertools was conducted. Results are based on the implementation in Position-Pure-Algorithm/python/pp_iter.py.
- Runtime Environment: The tests were performed using PyPy3. Since the standard
itertoolslibrary is C-optimized, using PyPy3 helps bridge the low-level language gap, allowing for a more authentic comparison of algorithmic efficiency. - Results: Benchmark tests show that this algorithm improves performance by at least 1.4x, with potential gains reaching over 2x compared to the standard library.
- Future Plans: For applications requiring even higher performance, a C-compiled version should be considered.
Last Run: Fri Feb 06 09:07:10 2026 UTC / Fri Feb 06 17:07:10 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)
| N | Total Permutations | Itertools (s) | Position Pure (s) | Speed-up |
|---|---|---|---|---|
| 10 | 3,628,800 | 0.1073s | 0.0667s | 1.61x |
| 11 | 39,916,800 | 1.2113s | 0.5685s | 2.13x |
| 12 | 479,001,600 | 15.0139s | 6.6579s | 2.26x |
| 13 | 6,227,020,800 | 203.8653s | 84.7977s | 2.40x |
| 14 | 87,178,291,200 | 2916.0625s | 1163.6335s | 2.51x |
Last Run: Fri Feb 06 05:40:44 2026 UTC / Fri Feb 06 13:40:44 2026 (UTC+8) Environment: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz (GitHub Actions Runner)
| N | Total Permutations | Itertools (s) | Position Pure (s) | Speed-up |
|---|---|---|---|---|
| 10 | 3,628,800 | 0.1513s | 0.1334s | 1.13x |
| 11 | 39,916,800 | 2.0003s | 1.0969s | 1.82x |
Last Run: Sat Feb 07 11:41:19 2026 UTC / Sat Feb 07 19:41:19 2026 (UTC+8) Environment: AMD EPYC 7763 64-Core Processor (GitHub Actions Runner)
| N | Dist | MR (ns/it) | PP (ns/it) | Speedup |
|---|---|---|---|---|
| 1000 | Random | 746.4 | 629.9 | 1.18x |
| 1000 | Sorted | 677.2 | 628.0 | 1.08x |
| 1000 | Reverse | 740.8 | 628.0 | 1.18x |
| 100000 | Random | 114304.8 | 91588.8 | 1.25x |
| 100000 | Sorted | 67190.6 | 62576.1 | 1.07x |
| 100000 | Reverse | 79229.8 | 67114.2 | 1.18x |
| 1000000 | Random | 1510619.8 | 1208586.6 | 1.25x |
| 1000000 | Sorted | 711168.4 | 625344.8 | 1.14x |
| 1000000 | Reverse | 810659.0 | 663776.2 | 1.22x |
The core C++ implementations of the algorithms can be found here:
- map_perm_algorithms.cpp: Includes
Position_unrank,Position_rank,PositionPure_unrank, andPositionPure_rank. - permPure_full.cpp: Serves as a high-performance reference implementation for generating all permutations of a set using the PositionPure iterative algorithm.
The following commands compile both algorithms using the same aggressive optimization flags to ensure a fair performance comparison:
# Compile options
g++ -O3 -std=c++11 -march=native -ffast-math -fomit-frame-pointerIf you use this algorithm or implementation in your research, please cite it as follows:
HU, Y. (2026). Position Method: A Linear-Time Generation Algorithm for Permutations (Version v1.0.0). GitHub Repository. DOI: 10.5281/zenodo.18170157
@software{hu_yusheng_2026_18170157,
author = {Hu, Yusheng},
title = {Position Method: A Linear-Time Generation Algorithm for Permutations},
year = 2026,
publisher = {Zenodo},
version = {v1.0.0},
doi = {10.5281/zenodo.18170157},
url = {[https://doi.org/10.5281/zenodo.18170157](https://doi.org/10.5281/zenodo.18170157)}
}
Since we can determine any value at position k in O(N) time without generating the full array, we can implement a Lazy-Loaded Permutation Object.
- Use Case: Accessing elements of a massive permutation (N > 10^6) that exceeds RAM capacity.
- Concept: Overload
operator[]to callPP_get_value_at_position(k)on the fly.
The backward-tracing logic if (C[i] == current_target_pos) is highly vectorizable.
- Idea: Use AVX-512 to scan 16 elements of
Csimultaneously. - Goal: Achieve near O(1) perceived latency for individual element lookups.
Leverage the independence of the get_value function to split a single permutation validation task across multiple CPU cores or network nodes without shared state.
The Position Pure (PP) algorithm provides a more intuitive
- Simpler Indexing: Replaces the traditional swap-based unranking with a direct mapping logic.
- Educational Value: The provided visualizations make the complex mapping process easy to understand.
- Preprint: need long time.
-
Status:
$O(n)$ complexity achieved, improving upon the classic Myrvold-Ruskey logic.
Please note: Starting from version v1.1.1, this algorithm has been officially renamed from Position Pro to Position Pure (PP).
- Trademark Consideration: We found that "Position Pro" is a registered trademark in several industrial and commercial sectors. To ensure the algorithm's academic independence and avoid legal confusion, we have transitioned to a unique name.
- Technical Essence (Why "Pure"?): The name "Pure" more accurately describes the algorithm's technical implementation. In computer science, "pure" often implies a clean, branchless execution flow.
- Core Feature - Branchless Logic: Unlike the base version, Position Pure (PP) achieves its linear-time performance through direct element-wise shifting and replacement. It eliminates conditional branching (no
ifstatements in the core loop), leading to a "purer" and faster execution path on modern CPUs.