GitHub - dankswoops/noclvag: modified fork: NOSTR vanity Npub farmer by Alex Gleson

Smelly-noclvag (a NOSTR vanity Npub farmer)

Optimized noclvag for RX 7900 XTX OC.

This could was created by the Gleasonator himself.

I am not claiming any of his work as my own, only using this repo to A/B hardware hack for faster outputs.

Use this software at your own risk, it was already as is software before I played with it.

After upgrading my GPU and 2.5 hours of agentic cooking, I went from 186m key/s to almost 900m keys/s.

Setup

# Choose your name
023456789acdefghjklmnpqrstuvwxyz

# Not availble 
1,b,i,o,

# Open a terminal in the root of the project
cd ~/Documents/Code/noclvag

# Compile if you haven't
make

# Verify both platforms show up
./noclvag-search -l

# Clear the last run if you already ran
rm -f noclvag-secrets.txt noclvag-request.txt noclvag-response.txt

# Generate the search request
./noclvag-tool --step1 npub1danksw00ps
# required "npub1" afterwards is your vanity name.

# Run the search on ROCm at ~898 Mkey/s
./noclvag-search -p 1 -d 0 -w 1024 -2

# Finalize the key once a match is found
./noclvag-tool --step3

Personal conclusion on npub char number

As you can see below, it is a privlage to have a 10 char name. Given moores law, in time this will become more normal. Anything beyond 10 is either a FANNG level company getting into NOSTR for brand reasons or absolute SPOONED rng where you hit 11 and maybe 12 while looking for 10. There will be very few who farm beyond 10 and client devs should not encourage npub vanity beyond 10 chars for reasonable engagement with all users. 10 seems to be the magic number to age into gracefully but not be too exclusive in modern day.

50% probability = 0.693 x 32^N keys. Based on ~900 Mkey/s per RX 7900 XTX.

Time required for the user to hit around 10 days.

Chars	key/s needed	Spec range	Price range
1	instant	Any device	$0
2	instant	Any device	$0
3	instant	Any device	$0
4	~1 /s	Any device	$0
5	~27 /s	Any device	$0
6	~861 /s	Any CPU	$0
7	~28 Kkey/s	Any CPU	$0
8	~882 Kkey/s	Low-end GPU	$200-300
9	~28 Mkey/s	Budget GPU (RX 6600)	$200-350
10	~903 Mkey/s	High-end GPU (RX 7900 XTX)	$800-1500
11	~29 Gkey/s	~32 high-end GPUs	$25K-50K
12	~925 Gkey/s	~1,000 GPUs	Not feasible
13	~30 Tkey/s	~33,000 GPUs	Not feasible
14	~947 Tkey/s	~1,000,000 GPUs	Not feasible

Time required for the user to hit around 1 year.

Chars	key/s needed	Spec range	Price range
1	instant	Any device	$0
2	instant	Any device	$0
3	instant	Any device	$0
4	instant	Any device	$0
5	~1 /s	Any device	$0
6	~24 /s	Any device	$0
7	~755 /s	Any CPU	$0
8	~24 Kkey/s	Multi-core CPU	$0
9	~773 Kkey/s	Low-end GPU	$200-300
10	~25 Mkey/s	Budget GPU (RX 6600)	$200-350
11	~792 Mkey/s	High-end GPU (RX 7900 XTX)	$800-1500
12	~25 Gkey/s	~28 high-end GPUs	$22K-42K
13	~811 Gkey/s	~900 GPUs	Not feasible
14	~26 Tkey/s	~29,000 GPUs	Not feasible

noclvag Optimization Notes

Hardware Profile

GPU: Asus TUF RX 7900 XTX OC 24GB (RDNA 3, navi31, gfx1100, 96 CUs, wave32)
CPU: AMD Ryzen 7 3700X, RAM: 32GB, Mobo: Asus TUF B450M (PCIe 3.0 x16)

Performance (2026-02-11)

Baseline: 595 Mkey/s (rusticl, unmodified code)
Current: 898 Mkey/s (ROCm, optimized code) — +51%
Rusticl fallback: 717 Mkey/s (+20%)

Usage

# Step 1: Generate search request
./noclvag-tool --step1 npub1<pattern>

# Step 2: Search (ROCm, 898 Mkey/s)
./noclvag-search -p 1 -d 0 -w 1024 -2

# Step 3: Finalize key
./noclvag-tool --step3

# Rusticl fallback (no ROCm needed)
./noclvag-search -p 0 -d 0 -2

# List available platforms
./noclvag-search -l

Code Changes (all in oclengine.c)

1. Skip DEEP_VLIW on GCN/RDNA GPUs (line ~428)

VLIW-style 3-pass add/subtract was designed for pre-GCN TeraScale (HD 2000-6000). RDNA 3 is scalar — VLIW codegen is actively harmful. Detection checks device name for navi/gfx9/gfx10/gfx11/Vega/Polaris/RX 5/6/7 strings.

2. Skip VERY_EXPENSIVE_BRANCHES on GCN/RDNA GPUs (same block)

Full unrolling of Montgomery multiplication outer loop causes register spilling on wave32 SIMD, killing occupancy. Same detection logic as above.

3. Default GPU worksize 2048 -> 8192 (line 2442)

Lets auto-tuner pick larger grids before hitting the worksize cap.

4. VRAM utilization 50% -> 75% (line 2453)

memsize /= 2 changed to memsize = memsize * 3 / 4. Uses ~18GB of 24GB.

5. Fixed rekey_max infinite loop bug (lines 1808, 1834-1839)

Root cause: rekey_max was hardcoded to 100M. When grid round size (rows * cols) exceeded it, the condition (npoints + round) < rekey_at was never true, creating an infinite CPU-only rekey loop with 0% GPU work. ROCm auto-selected 12288x8192 = 100.7M which crossed this threshold. Fix: Base increased to 1B, plus safety: if (rekey_max < round * 2) rekey_max = round * 2.

Tuning Rules Learned

ROCm vs Rusticl

ROCm is 25% faster than rusticl for this kernel on RDNA 3
ROCm prefers small worksize (-w 1024 = 898), rusticl prefers large (-w 65536 = 717)
ROCm reports 48 CUs (WGPs), rusticl reports 96 CUs — different grid auto-selection

Grid Size

Auto-selected 6144x4096 is near-optimal for both platforms
Smaller grids (4096x4096) lose ~8% throughput
Larger grids hit rusticl's 2GB CL_DEVICE_MAX_MEM_ALLOC_SIZE limit and crash
Never override -i (iteration count) — defaults are optimal

What Doesn't Help

RUSTICL_FEATURES=fp64 — no measurable improvement
ACCESS_STRIDE changes — current 1024/128 already coalesces for wave32
Manual -i override — auto-selected values beat all manual settings

ROCm Installation

wget https://repo.radeon.com/amdgpu-install/7.1.1/ubuntu/noble/amdgpu-install_7.1.1.70101-1_all.deb
sudo apt install ./amdgpu-install_7.1.1.70101-1_all.deb
sudo apt update
sudo amdgpu-install --usecase=opencl --no-dkms
sudo usermod -a -G render,video $LOGNAME
# Reboot required. Platforms: 0=rusticl, 1=ROCm

Debugging Notes

Progress output uses \r not \n — pipe through tr '\r' '\n' to parse
-v flag spams "GPU idle" lines that hide Mkey/s — omit for clean output
.oclbin files are kernel caches keyed by MD5(platform+device+flags+source) — safe to delete
First run after deleting .oclbin takes minutes for kernel JIT compilation
If GPU hangs after failed CL_INVALID_BUFFER_SIZE, pkill -9 noclvag-search and retry

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.clang-format		.clang-format
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
avl.h		avl.h
calc_addrs.cl		calc_addrs.cl
globals.h		globals.h
noclvag-search.c		noclvag-search.c
noclvag-tool.c		noclvag-tool.c
oclengine.c		oclengine.c
oclengine.h		oclengine.h
pattern.c		pattern.c
pattern.h		pattern.h
util-openssl.c		util-openssl.c
util-openssl.h		util-openssl.h
util.c		util.c
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Smelly-noclvag (a NOSTR vanity Npub farmer)

Setup

Personal conclusion on npub char number

Time required for the user to hit around 10 days.

Time required for the user to hit around 1 year.

noclvag Optimization Notes

Hardware Profile

Performance (2026-02-11)

Usage

Code Changes (all in oclengine.c)

1. Skip DEEP_VLIW on GCN/RDNA GPUs (line ~428)

2. Skip VERY_EXPENSIVE_BRANCHES on GCN/RDNA GPUs (same block)

3. Default GPU worksize 2048 -> 8192 (line 2442)

4. VRAM utilization 50% -> 75% (line 2453)

5. Fixed rekey_max infinite loop bug (lines 1808, 1834-1839)

Tuning Rules Learned

ROCm vs Rusticl

Grid Size

What Doesn't Help

ROCm Installation

Debugging Notes

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Smelly-noclvag (a NOSTR vanity Npub farmer)

Setup

Personal conclusion on npub char number

Time required for the user to hit around 10 days.

Time required for the user to hit around 1 year.

noclvag Optimization Notes

Hardware Profile

Performance (2026-02-11)

Usage

Code Changes (all in oclengine.c)

1. Skip DEEP_VLIW on GCN/RDNA GPUs (line ~428)

2. Skip VERY_EXPENSIVE_BRANCHES on GCN/RDNA GPUs (same block)

3. Default GPU worksize 2048 -> 8192 (line 2442)

4. VRAM utilization 50% -> 75% (line 2453)

5. Fixed rekey_max infinite loop bug (lines 1808, 1834-1839)

Tuning Rules Learned

ROCm vs Rusticl

Grid Size

What Doesn't Help

ROCm Installation

Debugging Notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages