Trajectory Drift and Execution Validity in Multi-Step LLM Workflows

Deterministic execution-state analysis for multi-step LLM workflows.

This repository accompanies the paper:

Trajectory Drift and Execution Validity in Multi-Step LLM Workflows

The work introduces a deterministic framework for analyzing execution trajectories across multi-step workflows using replayable lexical and structural signals only.

The analysis focuses on:

continuation behavior
trajectory drift
branching execution
convergence behavior
transition stability
execution persistence over time

The framework intentionally avoids:

embeddings
semantic evaluators
judge models
probabilistic scoring
learned continuation policies

The contribution is a deterministic structural analysis framework for multi-step execution behavior.

Core Finding

Multi-step LLM workflows can remain locally coherent while progressively diverging from their originating execution trajectory.

Adjacent execution steps may continue to appear structurally stable even as long-range trajectory persistence weakens with continuation depth.

This creates measurable local-versus-global mismatch regimes where execution appears locally coherent despite weakening baseline persistence over time.
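A minimal sketch of how such a mismatch could be observed, assuming token-set Jaccard overlap as a stand-in lexical signal. The paper's actual signal definitions are not reproduced here; the function names `local_continuity` and `baseline_alignment` and the toy trajectory are illustrative assumptions.

```python
from __future__ import annotations


def tokens(step: str) -> frozenset[str]:
    """Lowercased whitespace tokens of one step's output (assumed lexical signal)."""
    return frozenset(step.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def local_continuity(steps: list[str]) -> list[float]:
    """Overlap between each step and the step immediately before it."""
    toks = [tokens(s) for s in steps]
    return [jaccard(toks[i - 1], toks[i]) for i in range(1, len(toks))]


def baseline_alignment(steps: list[str]) -> list[float]:
    """Overlap between each later step and the first (originating) step."""
    toks = [tokens(s) for s in steps]
    return [jaccard(toks[0], t) for t in toks[1:]]


# Hypothetical toy trajectory: each step keeps three of four tokens
# from its predecessor and introduces one new token.
steps = [
    "load parse validate emit",
    "parse validate emit retry",
    "validate emit retry cache",
    "emit retry cache shard",
    "retry cache shard merge",
]

local = local_continuity(steps)
baseline = baseline_alignment(steps)
for depth, (lc, ba) in enumerate(zip(local, baseline), start=1):
    print(f"depth={depth} local={lc:.2f} baseline={ba:.2f}")
```

In this toy trajectory the adjacent-step overlap stays constant at every depth, while overlap with the first step falls to zero by depth 4.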

Runtime Relevance

Request-level telemetry alone does not expose whether iterative execution remains structurally aligned over time.

Long-running workflows may continue consuming:

retries
orchestration cycles
tool calls
latency budget
infrastructure resources

while their structural persistence relative to the originating trajectory progressively weakens.

The paper frames this as a runtime execution analysis problem rather than purely:

a token-efficiency problem
a reasoning-compression problem
or a semantic evaluation problem

Repository Contents

This repository contains:

final manuscript
publication figures
deterministic replay notes
methodology documentation
representative trajectory examples
scope and boundary documentation

Internal capture infrastructure and unreleased experimental tooling are intentionally excluded from this release.

Repository Structure

Directory	Purpose
`paper/`	Final manuscript (PDF/DOCX)
`plots/`	Publication figures
`docs/`	Methodology, replay, and scope documentation
`examples/`	Representative trajectory examples

Execution Families

The analysis separates four execution families:

Family	Description
Continuation	Sustained refinement preserving trajectory direction
Drift	Progressive divergence from originating trajectory
Branching	Divergent execution paths from a shared origin
Convergence	Independent trajectories converging structurally over time

These families exhibit distinguishable transition behaviors across deterministic replay analysis.
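For the Branching and Convergence families, the distinction can be pictured as the direction in which cross-path overlap moves with depth. The sketch below assumes whitespace-token Jaccard overlap as the signal and a simple net-change rule; `cross_similarity`, `classify_pair`, and the example paths are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cross_similarity(path_a: list[str], path_b: list[str]) -> list[float]:
    """Token overlap between two paths at each shared depth."""
    return [
        jaccard(set(x.split()), set(y.split()))
        for x, y in zip(path_a, path_b)
    ]


def classify_pair(path_a: list[str], path_b: list[str]) -> str:
    """Label a pair of paths by the net change in cross-path overlap.

    This is an illustrative rule only: it compares the first and last
    shared depth and ignores everything in between.
    """
    sims = cross_similarity(path_a, path_b)
    if len(sims) < 2:
        return "undetermined"
    delta = sims[-1] - sims[0]
    if delta < 0:
        return "diverging"
    if delta > 0:
        return "converging"
    return "parallel"


# Two hypothetical branches that start from the same origin step.
origin = "load parse validate emit"
branch_a = [origin, "load parse validate log", "load parse audit log"]
branch_b = [origin, "load parse emit cache", "load shard emit cache"]

print(classify_pair(branch_a, branch_b))              # shared origin, paths separate
print(classify_pair(branch_a[::-1], branch_b[::-1]))  # reversed: paths come together
```

A first-versus-last comparison keeps the rule easy to replay by hand; a per-depth trend would be a natural refinement if intermediate behavior matters.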

Runtime Diagnostics

The framework derives the following deterministic runtime diagnostics from replayable structural signals:

baseline alignment
local continuity
drift velocity
transition stability
branch divergence
branch convergence
redundancy accumulation

All measurements remain deterministic and replayable.
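Two of these diagnostics can be sketched from a precomputed baseline-alignment series and raw step text. The definitions below (drift velocity as the per-step loss of baseline alignment, redundancy accumulation as a running count of repeated tokens) are assumptions made for illustration, not the paper's formulas.

```python
from __future__ import annotations


def drift_velocity(alignment: list[float]) -> list[float]:
    """Per-step loss of baseline alignment.

    Positive values mean the trajectory moved away from its baseline
    between two consecutive depths; zero means no change.
    """
    return [alignment[i - 1] - alignment[i] for i in range(1, len(alignment))]


def redundancy_accumulation(steps: list[str]) -> list[int]:
    """Running count of tokens that already appeared in an earlier step.

    Tokens repeated within a single step are not counted, because the
    set of seen tokens is updated only after the step is scored.
    """
    seen: set[str] = set()
    total = 0
    running: list[int] = []
    for step in steps:
        toks = step.split()
        total += sum(1 for t in toks if t in seen)
        running.append(total)
        seen.update(toks)
    return running


# Hypothetical inputs: an alignment series and a short trajectory.
alignment = [1.0, 0.75, 0.5, 0.5]
velocity = drift_velocity(alignment)
redundancy = redundancy_accumulation(["a b", "a b c", "a b c"])

print(velocity)
print(redundancy)
```

Both functions depend only on their inputs and use no randomness, so repeated runs over the same trace give the same values.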

Experimental Scope

The corpus includes provider-backed replayable traces captured from:

OpenAI models
Anthropic models

The experimental corpora include an infrastructure validation corpus focused on:

deterministic replay
serialization stability
branch isolation
persistence guarantees
capture integrity
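One common way to check serialization stability and capture integrity is to hash a canonical serialization of each trace and compare digests across replays. The sketch below assumes traces are lists of JSON-compatible dictionaries; the field names (`step`, `branch`, `output`) and the helper names are hypothetical and do not describe the repository's internal capture format.

```python
import hashlib
import json


def canonical_bytes(trace: list[dict]) -> bytes:
    """Serialize with sorted keys and fixed separators so that
    logically equal traces produce identical bytes."""
    text = json.dumps(trace, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def trace_digest(trace: list[dict]) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(trace)).hexdigest()


# Hypothetical captured trace and a replay with the same content
# but a different key order in each record.
captured = [
    {"step": 0, "branch": "main", "output": "load parse validate emit"},
    {"step": 1, "branch": "main", "output": "parse validate emit retry"},
]
replayed = [
    {"output": "load parse validate emit", "branch": "main", "step": 0},
    {"output": "parse validate emit retry", "branch": "main", "step": 1},
]
# A replay whose second step differs by one token.
altered = [dict(captured[0]), {**captured[1], "output": "parse validate emit cache"}]

print(trace_digest(captured) == trace_digest(replayed))  # key order does not matter
print(trace_digest(captured) == trace_digest(altered))   # content change is detected
```

Sorting keys removes dictionary-ordering differences from the comparison, so a digest mismatch points to a change in content rather than in serialization.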

The release also includes an extended trajectory corpus introducing:

deeper continuation depth
branch-separated analysis
transition-resolution diagnostics
family-separated execution analysis

No synthetic traces were used in persisted experimental artifacts.

Relationship to X-Ray

This research establishes deterministic execution primitives that can later be used in continuation-aware runtime systems.

X-Ray extends the deterministic replay and trajectory-analysis primitives explored in this work into continuation-aware execution analysis for multi-step orchestration workflows.

Repository:

https://github.com/veloryn-intel/veloryn-xray

Key Observation

Primary Empirical Observation

Local continuity can remain high while baseline alignment progressively weakens.

This means workflows may continue to appear locally coherent even while long-range trajectory persistence deteriorates over time.

The paper argues that:

continued execution is not sufficient evidence of continued trajectory persistence.

Scope Boundaries

This work does not evaluate:

semantic correctness
factual accuracy
hallucination detection
reasoning quality
task success

The framework analyzes structural execution evolution only.

Research Direction

This repository documents the deterministic trajectory-analysis framework developed across two connected research directions:

Efficiency Collapse in Multi-Step LLM Execution: An Empirical Study of Cost, Redundancy, and Phase Dynamics
- execution redundancy
- phase-transition behavior
- continuation inefficiency
Trajectory Drift and Execution Validity in Multi-Step LLM Workflows
- execution-state evolution
- local/global mismatch
- deterministic trajectory analysis
- continuation-state diagnostics

Positioning

X-Ray is a deterministic execution-state analysis framework for multi-step LLM systems.

The current research focuses on:

execution-state evolution
trajectory drift
continuation behavior
branch divergence and convergence
deterministic replayable analysis

The broader direction is the development of continuation-aware execution infrastructure and runtime control primitives for long-horizon AI systems.

Rather than treating continuation solely as a token-efficiency problem, the framework investigates how execution trajectories evolve structurally over time under iterative workflows.

Citation

If you use this work in research, runtime analysis, or execution instrumentation contexts, please cite:

@report{veloryn2026trajectorydrift,
  title={Trajectory Drift and Execution Validity in Multi-Step LLM Workflows},
  author={P., V.},
  year={2026},
  doi={10.5281/zenodo.20290421}
}

DOI: https://doi.org/10.5281/zenodo.20290421

License

Apache 2.0
