NTK-Mirror

Hassana Labs — Leon Chlon (lc574@cantab.ac.uk)

LoRA-free forward-pass fine-tuning for Hugging Face causal language models.

ntkmirror learns a small signed controller on top of a frozen Transformer. It adds no LoRA modules and makes no permanent weight edits. The controller is a sparse set of shared log-gates on decoder-layer output channels:

h'_{layer, token, channel} = exp(s_{layer, channel}) h_{layer, token, channel}

The gates are learned from teacher-forced examples and then attached to the same Hugging Face model during evaluation or generation.
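The gating rule above can be sketched in plain Python with no model attached. The names `apply_log_gates`, `hidden`, and `log_gates` are illustrative assumptions, not part of the ntkmirror API. The sketch only shows that one gate per (layer, channel) is shared across all tokens, and that an ungated channel (s = 0) passes through unchanged.

```python
import math

def apply_log_gates(hidden, log_gates):
    """Scale the channels of one layer's output by exp(s).

    hidden:    list of tokens, each a list of channel activations.
    log_gates: sparse dict {channel_index: s}; missing channels have s = 0.
    """
    return [
        [h * math.exp(log_gates.get(c, 0.0)) for c, h in enumerate(token)]
        for token in hidden
    ]

# Two tokens, three channels; only channel 1 carries a gate.
hidden = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
gated = apply_log_gates(hidden, {1: 0.05})
```

Because the gate is multiplicative, channel 1 is rescaled by the same factor exp(0.05) for every token, while channels 0 and 2 are untouched.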

Install

git clone https://github.com/leochlon/ntkmirror.git
cd ntkmirror
pip install -e .

Minimal use

Create train.jsonl:

{"prompt":"Question: 14 + 27 = ?\nAnswer:","completion":" 41"}
{"prompt":"Question: 36 + 18 = ?\nAnswer:","completion":" 54"}
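If the examples are built programmatically, a file like the one above can be written with the standard library. This is a minimal sketch; `write_jsonl` is a hypothetical helper, not something ntkmirror provides.

```python
import json

examples = [
    {"prompt": "Question: 14 + 27 = ?\nAnswer:", "completion": " 41"},
    {"prompt": "Question: 36 + 18 = ?\nAnswer:", "completion": " 54"},
]

def write_jsonl(path, rows):
    """Write one JSON object per line; json.dumps escapes the embedded newline."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_jsonl("train.jsonl", examples)
```

Note that the leading space in each completion is kept as written in the examples above.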

Fit a controller:

ntkmirror fit \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train train.jsonl \
  --out controller.pt

Evaluate it:

ntkmirror eval \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --eval eval.jsonl

Generate with it:

ntkmirror generate \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --prompt "Question: 47 + 36 = ?\nAnswer:"

One-command demo

pip install -e .
bash examples/run_demo.sh

For a smaller run:

GATES=512 STEPS=40 bash examples/run_demo.sh

Python API

from transformers import AutoModelForCausalLM, AutoTokenizer
from ntkmirror import ForwardFineTuner, load_jsonl_examples

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").cuda()

tuner = ForwardFineTuner(model, tokenizer, gates=5000)
tuner.fit(load_jsonl_examples("train.jsonl"), steps=240)
tuner.save("controller.pt")

print(tuner.generate("Question: 47 + 36 = ?\nAnswer:"))

Data format

Preferred JSONL schema:

{"prompt":"...context...","completion":"...teacher-forced target..."}

Also accepted:

{"instruction":"...","response":"..."}
{"question":"...","answer":"..."}
{"text":"..."}
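To convert data to the preferred schema before fitting, something like the following sketch may help. The key mapping is an assumption: this README does not state how ntkmirror interprets the alternative schemas internally, and treating a bare `text` record as an all-target example with an empty prompt is a guess. `to_prompt_completion` is a hypothetical helper.

```python
def to_prompt_completion(record):
    """Map one JSONL record to the preferred {"prompt", "completion"} form."""
    pairs = [("prompt", "completion"), ("instruction", "response"), ("question", "answer")]
    for context_key, target_key in pairs:
        if context_key in record and target_key in record:
            return {"prompt": record[context_key], "completion": record[target_key]}
    if "text" in record:
        # Assumption: no separate context, so the whole text is the target.
        return {"prompt": "", "completion": record["text"]}
    raise ValueError(f"unrecognised record keys: {sorted(record)}")

converted = to_prompt_completion({"question": "14 + 27 = ?", "answer": " 41"})
```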

Important defaults

| Option | Default | Meaning |
| --- | --- | --- |
| `--gates` | `5000` | number of layer-channel log-gates |
| `--steps` | `240` | AdamW steps on gate parameters only |
| `--lr` | `5e-3` | controller learning rate |
| `--max-log-gate` | `0.05` | bound on each signed log-gate |
| `--layers` | `all` | decoder layers to score and gate |
| `--score-batches` | `16` | batches used to select gates |
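To get a feel for the `--max-log-gate` default, the gate formula h' = exp(s) h implies that a bound of 0.05 on |s| keeps each channel's scale factor within roughly 5% of 1. The sketch below works that out. `clip_log_gate` is a hypothetical helper, and whether ntkmirror enforces the bound by hard clipping or by some other parameterisation is not stated here.

```python
import math

def clip_log_gate(s, max_log_gate=0.05):
    """Clamp a signed log-gate to [-max_log_gate, +max_log_gate]."""
    return max(-max_log_gate, min(max_log_gate, s))

max_log_gate = 0.05
lower, upper = math.exp(-max_log_gate), math.exp(max_log_gate)
print(f"{lower:.4f} {upper:.4f}")  # prints "0.9512 1.0513"
```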

Compose two task controllers

Controllers are saved in signed log-gate coordinates, so composition is simple: add the signed log-gates, clip to a safe budget, and attach the resulting controller. This is the activation-space analogue of adding task directions, except the addition happens in log-mask/mirror coordinates rather than LoRA weight space.

ntkmirror compose \
  --controllers runs/gsm8k_controller.pt runs/mbpp_controller.pt \
  --out runs/gsm8k_plus_mbpp.pt \
  --report runs/composition_report.json

ntkmirror inspect \
  --controllers runs/gsm8k_controller.pt runs/mbpp_controller.pt runs/gsm8k_plus_mbpp.pt
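The add-then-clip rule described above can be sketched on toy controllers. Everything here is an assumption for illustration: the dict-of-(layer, channel) representation, the `compose` function, and the choice of 0.05 as the budget (borrowed from the `--max-log-gate` default) are not taken from the saved controller format.

```python
def compose(controllers, budget=0.05):
    """Sum signed log-gates per (layer, channel), then clip to +/- budget."""
    total = {}
    for gates in controllers:
        for key, s in gates.items():
            total[key] = total.get(key, 0.0) + s
    return {key: max(-budget, min(budget, s)) for key, s in total.items()}

# Toy controllers: keys are (layer, channel), values are signed log-gates.
task_a = {(3, 17): 0.04, (5, 2): -0.03}
task_b = {(3, 17): 0.03, (8, 40): 0.02}
combined = compose([task_a, task_b])
```

Adding in log-gate coordinates corresponds to multiplying the channel scale factors, since exp(a + b) = exp(a) exp(b). In this toy case the shared gate (3, 17) sums to 0.07 and is clipped to the budget, while gates that appear in only one controller pass through unchanged.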

A disjoint-task runner is included:

pip install -e '.[datasets]'
bash scripts/run_disjoint_composition.sh

It builds GSM8K and MBPP JSONL subsets, fits one controller per task, composes them, and evaluates base / task-A / task-B / composed controllers on both eval sets. See docs/composability.md.

Persistent controller memory

A memory item can be stored as a controller: one controller per conversation, document, user preference, task style, or procedure. At inference time, ntkmirror retrieves relevant items, composes their signed log-gates, and attaches the composed controller before generation. This injects retrieved context through the forward pass without appending the memory text to the prompt.

Fit-and-store a memory controller:

ntkmirror memory add \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --store runs/memory \
  --id arithmetic-carrying \
  --train examples/math_train.jsonl \
  --text "worked addition arithmetic with carrying" \
  --tags math,arithmetic

Or register an existing controller:

ntkmirror memory add \
  --store runs/memory \
  --id arithmetic-carrying \
  --controller runs/arithmetic.pt \
  --text "two-digit addition with carrying: add ones, carry, then tens"

Retrieve, compose, and generate:

ntkmirror memory search \
  --store runs/memory \
  --query "solve an addition problem with carrying"

ntkmirror memory generate \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --store runs/memory \
  --query "addition with carrying" \
  --prompt "Problem: 47 + 36 = ?\nSolution:"

Try the demo:

bash examples/run_memory_demo.sh

The default retriever is a dependency-free lexical TF-IDF scorer. That is intentional for first-run UX: the main bottleneck in controller memory is retrieval quality, not controller storage. For production, replace the retriever with an embedding or hybrid vector-store layer and keep the same compose_states interface. See docs/persistent_memory.md.
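For readers unfamiliar with lexical TF-IDF retrieval, here is a small standard-library sketch of the idea. It is not the repo's retriever: `tokenize`, `tfidf_search`, the smoothed IDF formula, and the second memory item are all assumptions made up for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tfidf_search(query, items, top_k=2):
    """Rank memory items by cosine similarity of TF-IDF vectors."""
    counts = {item_id: Counter(tokenize(text)) for item_id, text in items.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def weigh(c):
        # Terms never seen in the store carry no weight.
        return {t: tf * idf[t] for t, tf in c.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = weigh(Counter(tokenize(query)))
    scored = [(cosine(q, weigh(c)), item_id) for item_id, c in counts.items()]
    return sorted(scored, reverse=True)[:top_k]

store = {
    "arithmetic-carrying": "worked addition arithmetic with carrying",
    "python-style": "prefer list comprehensions in python code",  # hypothetical item
}
results = tfidf_search("addition with carrying", store)
```

The query shares three terms with the first item and none with the second, so `arithmetic-carrying` ranks first. A purely lexical scorer like this misses paraphrases, which is why an embedding or hybrid retriever is suggested for production.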

Activation-control NTK tools

The fit command trains signed log-gates by support NLL and remains the deployable path. A separate research path adds diagnostics and a field-locked fitting harness for the stricter NTK-dual claim: the local activation-control tangent

B_C(s) = d(P_C z(s)) / ds

should realise the full frozen-model weight-SGD projected-logit field

d_C^theta = -eta J_{theta,C} J_{theta,S}^T g_S.

Bv is computed as an exact autograd JVP and B^T y as an exact VJP; the conjugate-gradient (CG) operator is B M^{-1} B^T + ridge I. Reports include adjoint_error, symmetry_error, range_residual, and realized_residual, the residual of the actual forward pass. field_residual is likewise measured on the realised forward pass, not on the local matvec used inside the solve.
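The adjoint and symmetry checks can be illustrated on a tiny explicit matrix. This is only a sketch under stated assumptions: in the tool B is an implicit Jacobian applied through autograd, whereas here it is a hand-written 2x3 matrix, M is taken to be diagonal, and the gap definitions below are guesses at what adjoint_error and symmetry_error measure, not the tool's formulas.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def jvp(B, v):
    """B v: push a gate-space direction forward to the projected outputs."""
    return [dot(row, v) for row in B]

def vjp(B, y):
    """B^T y: pull an output-space vector back to gate space."""
    return [sum(B[i][j] * y[i] for i in range(len(B))) for j in range(len(B[0]))]

B = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]   # toy tangent, outputs x gates
M_diag = [2.0, 1.0, 4.0]                  # assumed diagonal metric on gates
ridge = 1e-3

def cg_operator(y):
    """Apply B M^{-1} B^T + ridge I to y."""
    u = [ui / mi for ui, mi in zip(vjp(B, y), M_diag)]
    return [bu + ridge * yi for bu, yi in zip(jvp(B, u), y)]

v, y1, y2 = [0.5, -1.0, 2.0], [1.5, -0.5], [0.25, 2.0]
adjoint_gap = abs(dot(jvp(B, v), y1) - dot(v, vjp(B, y1)))
symmetry_gap = abs(dot(cg_operator(y1), y2) - dot(y1, cg_operator(y2)))
```

Both gaps should be zero up to floating-point rounding when the JVP and VJP are true adjoints of each other. A large gap would suggest the two passes are not differentiating the same function.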

Audit whether a selected gate basis can realise the full-weight field:

ntkmirror dual-diagnose \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --support train.jsonl \
  --calibration eval.jsonl \
  --controller controller.pt \
  --projection topk --top-k 32 \
  --target-step-size 1e-5 \
  --jvp-mode exact \
  --metric activation

Fit pathwise by matching the full-weight NTK field instead of using support-Adam:

ntkmirror fit-dual \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train train.jsonl \
  --out controller_dual.pt \
  --steps 8 \
  --projection topk --top-k 32 \
  --jvp-mode exact \
  --metric activation

Check whether a finite controller has left the initial gate tangent:

ntkmirror secant \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --controller controller.pt \
  --eval eval.jsonl

The important numbers are range_residual and realized_residual, not raw gate norm. A large secant error only says the initial gate chart is no longer a global linear model; it does not by itself refute pathwise NTK duality. See docs/activation_control_ntk.md for the theory, command details, and failure mode checklist.
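The point about secant error can be seen on a toy problem where the "model" is just the gate itself, z(s) = exp(s) h. The tangent at s = 0 predicts a change of h s, while the actual change is (exp(s) - 1) h. The relative error defined below is an assumption for illustration, not necessarily what `ntkmirror secant` reports.

```python
import math

def secant_error(s, h):
    """Relative gap between the finite change z(s) - z(0) and the tangent B s at s = 0."""
    actual = [(math.exp(si) - 1.0) * hi for si, hi in zip(s, h)]
    linear = [si * hi for si, hi in zip(s, h)]
    gap = math.sqrt(sum((a - l) ** 2 for a, l in zip(actual, linear)))
    size = math.sqrt(sum(a * a for a in actual))
    return gap / size

h = [1.0, -2.0, 0.5]
small = secant_error([0.05, 0.05, 0.05], h)   # gates at the default bound
large = secant_error([1.0, 1.0, 1.0], h)      # far outside the default bound
```

In this toy the error is about 2.5% at |s| = 0.05 and about 42% at |s| = 1. A growing secant error means the straight-line chart from the origin has stopped being accurate, which is a statement about the chart and not about whether each local step matched its target field.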

A safe diffusion scale-gate runner is also included:

python scripts/diffusion/train_scale_gate_adam_m.py \
  --image-dir images \
  --prompts "a photo of sks dog" \
  --out runs/diffusion_scale_gates.pt \
  --steps 1500

It uses Adam with a step-adaptive activation metric and cosh self-damping, and represents channel pruning with finite q_prune hard-dead masks, separate q/shift caps, and non-finite guards.

What this repo is not

The default UX remains the simple deployable support-Adam package. The diagnostic and field-locked commands expose a research harness for NTK-vector diagnostics and field-locked local updates; they are slower than fit and are not the default first-run path.

Notes for benchmark claims

Always report the base model, the controller, and a LoRA baseline on the same train/eval manifest. For exact-answer tasks, report both exact-match accuracy and teacher-forced NLL. For systems claims, report adaptation time and peak memory. See docs/method.md for failure modes.
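The two exact-answer metrics named above can be sketched as follows. The helper names are hypothetical, stripping whitespace before comparison is an assumed normalisation, and the per-token log-probabilities are made-up numbers standing in for values a model would produce under teacher forcing.

```python
def exact_accuracy(predictions, targets):
    """Fraction of predictions that match the target after stripping whitespace."""
    pairs = list(zip(predictions, targets))
    return sum(p.strip() == t.strip() for p, t in pairs) / len(pairs)

def teacher_forced_nll(token_logprobs):
    """Mean negative log-likelihood per target token.

    token_logprobs: one list per example, holding log p(target token | prefix).
    """
    flat = [lp for seq in token_logprobs for lp in seq]
    return -sum(flat) / len(flat)

accuracy = exact_accuracy([" 41", " 55"], [" 41", " 54"])
nll = teacher_forced_nll([[-0.1, -0.3], [-0.2]])
```

Reporting both is useful because NLL can improve while exact accuracy stays flat, and the reverse. Computing them with the same functions for the base model, the controller, and the LoRA baseline keeps the comparison consistent.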

Citation

@software{chlon2026ntkmirror,
  author       = {Leon Chlon},
  title        = {{NTK-Mirror: LoRA-free forward-pass fine-tuning via signed log-mask controllers}},
  year         = {2026},
  organization = {Hassana Labs},
  url          = {https://github.com/leochlon/ntkmirror}
}
