Large language models fail in characteristic ways: they loop on the same phrase (attention locking), hallucinate (semantic drift), or lose track of context mid-sentence (structural fragmentation). Recync is a control framework that detects these failure modes in real time via internal state monitoring and corrects them through principled intervention.
The framework operates at two levels:
- Token-level intervention (Recync v3): Per-step hidden-state monitoring with adaptive sampling control. Achieves robust detection (Cohen's d > 1.3) but modest intervention effects (d = +0.211) due to structural limits in discrete-step control.
- Response-level intervention (Recync v4): Checkpoint restart at pre-crisis points with seed perturbation. Achieves medium-to-large intervention effects (d = +0.494 to +1.020, all p < 0.0001) with zero iatrogenic harm across 5 model architectures (117M--1.5B parameters).
| Model | Params | PIR Cohen's d | 95% CI | Iatrogenic | Crisis-Free |
|---|---|---|---|---|---|
| GPT-2 Small | 117M | +0.494 | [+0.354, +0.687] | 0/137 (0%) | 43.1% |
| Pythia-160M | 160M | +0.958 | [+0.762, +1.185] | 0/177 (0%) | 21.5% |
| GPT-2 Medium | 355M | +0.796 | [+0.591, +1.028] | 0/44 (0%) | 59.1% |
| Qwen2-1.5B | 1,544M | +1.020 | [+0.866, +1.193] | 0/183 (0%) | 14.2% |
| TinyLlama-1.1B | 1,100M | +1.400 | -- | 0% | -- |
- Zero-tuning transfer: Identical detection/intervention parameters across all 5 models -- no per-model calibration
- Zero iatrogenic events: The protocol never makes things worse (0% across all experiments)
- Scale reversal: GPT-2 Medium -- most resistant to token-level control -- becomes a strong responder under response-level restart (d = +0.796)
- Billion-scale validation: Largest effect (d = +1.020) on Qwen2-1.5B (GQA, RoPE, SwiGLU architecture)
- Length-invariant: Effect sizes maintained at T=300; multi-restart (2R) improves crisis-free rate by +10.6pp
- Detection: Cohen's d = 1.83 (fragmentation), d = 1.36 (hallucination)
- Intervention: d = +0.211 (p = 0.037) on GPT-2 Small after extensive optimization
- Structural limits identified: harm threshold, attractor switching, model-specific calibration
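The effect sizes quoted throughout are Cohen's d (standardized mean difference between intervention and control conditions). For reference, a minimal pooled-standard-deviation implementation (illustrative; the repo's own analysis scripts may compute it differently):

```python
import statistics

def cohens_d(treated, control):
    """Cohen's d: mean difference scaled by the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    m1, m2 = statistics.fmean(treated), statistics.fmean(control)
    v1, v2 = statistics.variance(treated), statistics.variance(control)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd
```

By convention, |d| around 0.2 is a small effect, 0.5 medium, and 0.8 large, which is why d = +0.211 reads as modest and d = +1.020 as large.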
This repository contains two companion papers that tell a single story: how far can we push real-time coherence control in Transformers?
Paper 1 -- Token-Level Intervention (paper/token_level/)
K. Sato, "From Monitoring to Intervention: Control-Theoretic Coherence Management in Transformers and the Limits of Discrete Safety Enforcement," 2026.
Recync v3 introduces the theoretical foundations: a 3D order-parameter space Z(t) = [lambda, lambda_sem, z] extracted from Transformer internals via a non-invasive Phi-mapping. The dynamics are governed by a Ginzburg-Landau potential with provable stability, and safety is enforced through stochastic Control Barrier Functions solved via quadratic programming. Over 69 experiments across six phases, the paper systematically discovers that monitoring is solved (d > 1.3 across failure modes and architectures) but token-level intervention hits structural walls: a harm threshold in intervention frequency, semantic attractor switches during hidden-state steering, and model-specific calibration that breaks across architectures. The final configuration -- severity-adaptive temperature with a recovery gate -- achieves the first significant positive result (d = +0.211, p = 0.037), but the gap between detection and intervention motivates a fundamentally different approach.
Paper 2 -- Response-Level Intervention (paper/response_level/)
K. Sato, "Beyond Micro-Control: Response-Level Checkpoint Restart for Safe Coherence Recovery in Transformers," 2026.
Recync v4 resolves the detection-intervention asymmetry by moving from token to response granularity. When a crisis is detected, instead of nudging individual tokens, the protocol rewinds to 3 tokens before crisis onset and regenerates with a different random seed. This simple strategy exploits the model's own representational capacity to find alternative coherent trajectories. The result is a complete scale reversal: GPT-2 Medium, which was the most resistant model under token-level control, becomes a strong responder (d = +0.796, crisis-free rate 59.1%). Effect sizes increase with model size rather than decreasing, zero-tuning transfer holds across all tested architectures (117M to 1.5B), and zero iatrogenic harm is maintained across all 2,070 experimental units. The paper explicitly tests and rejects adaptive complexity in favor of a single fixed protocol.
Cosine similarity between consecutive last-layer hidden states (cosim) serves as a real-time coherence proxy. A crisis is detected when k=3 consecutive cosim values fall below a relative threshold (mean - 0.6*std over a 20-step baseline window). This signal transfers across all tested architectures (GPT-2, GPTNeoX, Qwen2, Llama) without modification.
Per-step intervention during generation: when a crisis is detected, the framework adjusts sampling parameters (temperature, top-p) based on crisis severity, with a recovery gate that skips intervention when the model is self-recovering. 69 experiments systematically characterize the structural limits of this approach -- including harm thresholds in intervention frequency and semantic attractor switches -- culminating in a modest but significant effect (d = +0.211, p = 0.037).
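A sketch of the severity-adaptive adjustment with a recovery gate (the exact severity-to-parameter mapping below is an illustrative assumption, not the published configuration):

```python
def adapt_sampling(cosim_now, threshold, base_temp=1.0, base_top_p=0.9,
                   prev_cosim=None):
    """Severity-adaptive sampling control with a recovery gate (sketch).

    Severity = how far the coherence signal sits below the crisis
    threshold. If cosim is already rising, the model is self-recovering
    and intervention is skipped.
    """
    # Recovery gate: do not intervene while the signal is improving.
    if prev_cosim is not None and cosim_now > prev_cosim:
        return base_temp, base_top_p
    severity = max(0.0, threshold - cosim_now)
    # Cool the sampling distribution in proportion to severity,
    # clamped so generation never becomes fully greedy.
    temp = max(0.5, base_temp - severity)
    top_p = max(0.7, base_top_p - 0.5 * severity)
    return temp, top_p
```

The recovery gate matters because the paper identifies a harm threshold in intervention frequency: intervening on every sub-threshold step does more damage than intervening selectively.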
Generate --> Monitor cosim --> Detect crisis --> Rewind to onset-3 --> Regenerate with new seed
When a crisis is detected, the protocol rewinds to 3 tokens before crisis onset and regenerates with a different random seed (seed + 10000) at natural temperature (T=1.0, top_p=0.9). Rather than fighting the model's dynamics at token granularity, this exploits the model's own representational capacity to find alternative coherent trajectories. The same detection parameters and restart rule work identically across all tested models (117M--1.5B).
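The control loop can be sketched model-agnostically as below, with `generate_step` and `detect` standing in for the decoder step and the cosim detector (names and the single-restart default are illustrative assumptions):

```python
import random

def restart_protocol(generate_step, detect, prompt_tokens,
                     max_new_tokens=100, max_restarts=1,
                     rewind=3, seed=0, seed_offset=10000):
    """Response-level checkpoint restart (sketch).

    generate_step(tokens, rng) -> next token (one decoding step)
    detect(tokens) -> crisis onset index, or None
    On crisis: rewind to `rewind` tokens before onset and regenerate
    the tail with a perturbed seed (seed + seed_offset).
    """
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    restarts = 0
    while len(tokens) - len(prompt_tokens) < max_new_tokens:
        tokens.append(generate_step(tokens, rng))
        onset = detect(tokens)
        if onset is not None and restarts < max_restarts:
            restarts += 1
            # Rewind to just before crisis onset (never past the prompt).
            tokens = tokens[:max(len(prompt_tokens), onset - rewind)]
            rng = random.Random(seed + restarts * seed_offset)
    return tokens, restarts
```

Because the restart replays the model at natural sampling settings rather than steering hidden states, the worst case is a trajectory no worse than the original, which is consistent with the observed zero iatrogenic harm.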
Recync/
├── experiments/
│ ├── gpt2_integration/ # Token-level experiments (Exp 01--69)
│ │ ├── README.md # Experiment guide
│ │ ├── PAPER_RESULTS.md # Full results report
│ │ └── *.py # 72 experiment scripts + result files
│ └── response_level/ # Response-level experiments (Exp 01--14)
│ ├── README.md # Experiment guide
│ ├── PAPER_RESULTS.md # Full results report
│ └── *.py # 15 experiment scripts + result files
├── paper/
│ ├── token_level/ # Paper 1: token-level intervention
│ │ ├── from_monitoring_to_intervention.tex/.md/.pdf
│ │ └── figures/
│ └── response_level/ # Paper 2: response-level intervention
│ ├── beyond_micro_control.tex/.md/.pdf
│ └── figures/
├── pyproject.toml
└── requirements.txt
git clone https://github.com/metaSATOKEN/Recync_framework.git
cd Recync_framework
# Install dependencies
pip install torch transformers scipy numpy
# Run a response-level experiment
python experiments/response_level/03_restart_diff_replication.py
# Results saved as timestamped JSON
ls experiments/response_level/*.json

| Item | Spec |
|---|---|
| Machine | MacBook Pro (Apple Silicon, arm64) |
| RAM | 16 GB |
| GPU | Apple MPS (Metal Performance Shaders) |
| Python | 3.13.5 |
| PyTorch | 2.10.0 |
| Transformers | 5.0.0 |
@article{sato2026token,
title = {From Monitoring to Intervention: Control-Theoretic Coherence
Management in Transformers and the Limits of Discrete Safety
Enforcement},
author = {Sato, Kentaro},
year = {2026},
doi = {10.5281/zenodo.19148449},
url = {https://doi.org/10.5281/zenodo.19148449}
}

@article{sato2026response,
title = {Beyond Micro-Control: Response-Level Checkpoint Restart
for Safe Coherence Recovery in Transformers},
author = {Sato, Kentaro},
year = {2026},
doi = {10.5281/zenodo.19148721},
url = {https://doi.org/10.5281/zenodo.19148721}
}

- Code: Apache License 2.0 -- see LICENSE
- Papers: Creative Commons Attribution 4.0 (CC BY 4.0) -- see paper/LICENSE