[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450) by Saswatsusmoy · Pull Request #456 · SciML/ReservoirComputing.jl

Saswatsusmoy · 2026-06-16T20:46:18Z

Warning

Draft / discussion only. No real implementation yet — the only commit is a docstring stub that errors at call time. Stacked on top of #450; I'll rebase onto master once that merges.

Hey @MartinuzziFrancesco — PR2 (#450) is sitting in good shape ("MERGEABLE / CLEAN", all 17 checks green) and per the roadmap I'd start PR3 (CTESN model + Mackey-Glass / Lorenz validation + docs) next. Before I write code I wanted to surface a handful of design questions, because the literature on CTESN diverges from the ESN-community defaults in a few non-obvious places. Easier to align upfront than redo the implementation later.

The model

Anantharaman2021 — "Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks" — defines the reservoir as

$$\dot{\mathbf{r}}(t) = \tanh\!\left(A\,\mathbf{r}(t) + W_{\text{hyb}}\,\mathbf{x}(t)\right)$$

with f = tanh, g = id hardcoded, A sparse random, W_hyb random dense. No leak term, no bias. Their training procedure is a parametric surrogate: simulate the full model at n Sobol-sampled parameters, fit one W_out per sample by SVD, then RBF-interpolate W_out(p) across the parameter space (uses Surrogates.jl, mentioned explicitly).

This is genuinely different from the "leaky-integrator continuous ESN" that one naturally expects as a continuous limit of Lukosevicius2012 — ṙ = -α r + tanh(W_in u + W_r r + b). Several of my questions below stem from this gap.

I read the canonical paper's PDF and skimmed Bhatnagar 2024 (LPCTESN / NLPCTESN follow-up); ground-truth sources noted where they matter below.

Open questions

Q1 — Wrapping pattern: mirror ESN, or expose CTESN as the reservoir layer directly?

ESN is @concrete struct ESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)} with the cell sitting in :reservoir. Two options for CTESN:

(a) Mirror ESN. CTESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}, with the :reservoir field holding a concrete <: AbstractSciMLProblemReservoir that PR2's _collectstates(::AbstractSciMLProblemReservoir, …) dispatches on. User-facing call site is CTESN(in_dims, res_dims, out_dims; …).
(b) CTESN is the reservoir layer. CTESN <: AbstractSciMLProblemReservoir, and the user does ReservoirComputer(CTESN(...), modifiers, readout) themselves (the way SciMLProblemReservoir is used in the PR2 tutorial).

(a) is more discoverable but adds a thin wrapper; (b) is closer to what PR2 already does. Your "built-in models get their own types" steer (issuecomment-4604259768) reads to me as (a), but I want to confirm.

Q2 — Default ODE form: paper-canonical or leaky-integrator?

The paper-canonical form has no α and no bias. The leaky-integrator form is what most ESN users would expect from a "continuous ESN" type. My proposal:

Default to paper-canonical (leak_rate = 0.0) for faithfulness to Anantharaman et al.
Expose leak_rate::Real kwarg so users can opt into the leaky form trivially.

OK with you, or should the leaky form be the default?

Q3 — Sampler: does `TerminalStateSampling` cover forecasting?

For the forecasting flavour (single trajectory, fit W_out by Y · R⁺) we need one reservoir-state column per output column. Reading PR2's _collectstates carefully, the window-end saveat grid already produces exactly that (n_samples = size(data, 2) columns), so TerminalStateSampling looks sufficient — I don't think a DenseTrajectorySampling sampler is needed in PR3. Want to sanity-check that with you before assuming.

(The parametric-surrogate flavour from the paper would want dense trajectories. I'd defer that to a future PR — see Q6.)

Q4 — File placement: `src/models/ctesn.jl` or `src/extensions/ctesn.jl`?

The draft commit puts the stub under src/extensions/ctesn.jl, mirroring RECA (the only existing extension-required model). But CTESN is conceptually a sibling of ESN/EuSN/ES2N which all live in src/models/. Happy to move it — what's your preference?

Q5 — Validation benchmarks

The paper validates CTESN on stiff systems (Robertson, HVAC, POLLU). My proposal target for PR3 is Mackey-Glass (τ = 17, Jaeger2001) and Lorenz '63 (NRMSE < 0.1 at N = 300), which are the ESN-community standards and have rich published baselines for sanity-checking (Pathak2017 is the Lorenz-N=300 reference). Neither benchmark has published CTESN numbers — we'd be contributing the first.

Two paths:

(i) Land Mackey-Glass + Lorenz only (matches what's in the proposal, ships faster).
(ii) Add a stiff benchmark too (Robertson is the smallest — 3 equations) to demonstrate the paper's stated regime.

I lean (i) for PR3 and adding (ii) as a follow-up, but I'll do (ii) in PR3 if you'd rather have the paper's own validation domain in there from day one.

Q6 — Scope: LPCTESN only, or also NLPCTESN / parametric surrogate?

Bhatnagar 2024 discusses LPCTESN (linear W_out · r) and NLPCTESN (k-NN polynomial-augmented RBF readout, ~3 orders of magnitude smaller reservoirs at the same accuracy on Robertson). The Anantharaman 2021 paper itself does the parametric-surrogate-over-parameter-space training (Surrogates.jl + RBF interpolation of W_out(p)).

PR3 as currently scoped covers forecasting + LPCTESN only. NLPCTESN needs a polynomial-feature readout + k-NN RBF (extra deps). Parametric-surrogate training needs Surrogates.jl. Both feel out-of-scope for one PR — defer to follow-ups?

Q7 — Spectral-norm warning (open Q5 from #397 / proposal)

The continuous-time echo state property needs ‖A‖_2 < α (operator 2-norm), not the discrete-time ρ(A) < α. Existing scale_radius! (src/inits/inits_components.jl:93) uses maximum(abs.(eigvals(M))) — that's the discrete bound. Three options for the doc warning:

(a) Inline note in CTESN docstring only ("treat ρ as empirical hyperparameter; the proven sufficient condition is ‖A‖_2 < α").
(b) Add a paragraph to scale_radius! docstring flagging that the bound it enforces is the discrete analogue.
(c) Add a new helper scale_opnorm! for users who want the strict continuous bound.

(a) + (b) feels minimal and right; (c) adds API surface area I don't want to commit to without your nod.

Q8 — NRMSE: inline or proper helper?

grep -i nrmse src/ returns zero matches and lib/ReservoirComputingBenchmarks/ is still an empty placeholder, so #414 hasn't landed yet. For PR3 I'll inline NRMSE (Lukoševičius 2012 eq 1: sqrt(mean((ŷ-y)²) / var(y)) per output, averaged) in the test + tutorial code. If #414 lands first I'll source it from there instead. Fine?

Q9 — Pacing

Are you OK with me starting PR3 now stacked on add-ode-reservoir-ext, on the assumption that PR2 will land roughly as-is? If you'd rather I wait for PR2 to merge first, I'll park this draft and come back.

What I'll do once you reply

Convert this stub into the real implementation: concrete CTESN type, in-place ODE RHS with pre-allocated buffers (mul! for both matvecs — the RHS gets called O(10²–10⁴) times per solve), test file under test/models/, and a tutorial under docs/src/tutorials/. I'll rebase onto master after PR2 merges.

For reference / my own notes — the parallel research pass turned up three small factual things I'd like to flag for the project proposal too (paper title is "Stiff Nonlinear Systems" not "Chaotic Dynamical Systems", the canonical-paper authors are Anantharaman, Ma, Gowda, Laughman, Shah, Edelman, Rackauckas, and the 56× speedup figure in the proposal doesn't appear in the abstract — happy to ping you separately about that rather than mix it into the PR thread).

Thanks!

Saswatsusmoy · 2026-06-17T14:13:55Z

Thanks @MartinuzziFrancesco — both the §3.2.6 pointer and the "skip the struct, run it raw" suggestion turned out to land in interesting places.

On §3.2.6 (Lukoševičius 2012, "Leaking Rate")

The relevant ODE is eq (5):

$$\dot{\mathbf{x}} = -\mathbf{x} + \tanh\!\left(\mathbf{W}^{\text{in}}[1;\mathbf{u}] + \mathbf{W}\,\mathbf{x}\right)$$

— no α in the continuous form. The leaking rate appears only when you Euler-discretise: take Δx/Δt ≈ ẋ with step Δt = α and you recover the discrete leaky update x(n+1) = (1-α)x(n) + α·tanh(W_in[1;u] + Wx(n)). So α is the discretisation step, not a separate parameter of the ODE. Equivalently you can fold it into eq (5) as a LHS time constant α·ẋ = … if you want to keep Δt ≡ 1. The section also notes this form is preferred over alternatives "because it guarantees that x(n) never goes outside the (-1, 1) interval" — bounded state by construction, no extra contraction argument needed. Practical advice: tune α to the timescale of u/y_target; can be per-neuron for multi-timescale tasks.

This reframes PR3 quite a bit. The Anantharaman CTESN paper uses ṙ = tanh(Ar + W_hyb x) with no decay term and trains a parametric surrogate via RBF interpolation of W_out(p) — a different approach aimed at stiff systems. The "continuous ESN" §3.2.6 points at is just eq (5), which is exactly what the Lorenz eye-test in PR #450's tutorial implements verbatim. So PR3 collapses from "build a CTESN model mirroring the paper" → "thin convenience constructor around SciMLProblemReservoir that pre-bakes eq (5)". Several of the open questions in the original PR body fall away.

Smoke test — eq (5) raw on Lorenz, no CTESN struct

Ran the tutorial pattern scaled to N_res = 300, train = 5000, predict = 1250 samples (dt = 0.02, σ = 10, ρ = 28, β = 8/3). Spectral scaling ‖W_r‖₂ = 0.9 (continuous-ESP-sufficient bound), ‖W_in‖ ≈ 0.1, ‖b‖ ≈ 0.05, MersenneTwister(17), Tsit5 at reltol = 1e-6. Happy to inline the full script as a follow-up comment if useful.

First pass, train NRMSE = 0.0004 (readout fits beautifully) but autoregressive rollout NRMSE was ~1.5 at all horizons. Tracked it down to the autoregressive _predict path in ext/RCODEReservoirExt.jl:

current_state = res.prob.u0   # → zeros(N) every time

The trained reservoir's terminal state never makes it into the predict-time u0. So the rollout starts cold and produces nothing useful regardless of how well the readout was trained. After manually re-running collectstates, grabbing the last column, and rebuilding the reservoir with remake(prob; u0 = terminal_state), the numbers became:

Horizon (Lyapunov times)	Steps	NRMSE
1 t_λ	55	0.157
2 t_λ	110	0.118
3 t_λ	166	0.112
4 t_λ	221	0.210
6 t_λ	331	0.729
8 t_λ	442	0.981

So eq (5) forecasts Lorenz at N=300 to ~3 Lyapunov times under NRMSE ≈ 0.11 before chaotic divergence dominates. Not quite our PR3 target of NRMSE < 0.1, but within striking distance of it before any hyperparameter tuning (spectral radius vs spectral norm, leak/time-constant, washout length, regularisation). I think the < 0.1 target is reachable.

What I'd like your read on:

The warm-up hole. Is this expected for the current PR feat: continuous-time reservoirs via RCODEReservoirExt #450 predict semantics — i.e., users are meant to remake(prob; u0 = …) themselves before calling predict? Or is this something that should be wired in (e.g. predict(rc, steps, ps, st; initialdata, warmup_data = nothing) that internally runs collectstates on warmup_data and seeds u0)? The current behaviour is correct but really easy to footgun on, as I just demonstrated.
PR3 reframe. Given §3.2.6, would you rather PR3 be:
- (a) Drop the "CTESN" name entirely and ship a LeakyContinuousESN(in_dims, res_dims, out_dims; α, …) thin wrapper that pre-bakes eq (5) — semantically a sibling of ESN. The Anantharaman parametric-surrogate flavour would be a future PR with its own type.
- (b) Keep the CTESN name but make it the eq-(5) wrapper, and treat the parametric-surrogate version as a future extension within the same family.
- (c) Something else entirely.
Validation. Should the in-PR test assert specifically against the eq-(5) Lorenz NRMSE (after tuning to push below 0.1 at some agreed horizon — e.g. 4 t_λ), or do you want a tighter set of unit tests + Mackey-Glass + Lorenz in the docs as eye-tests only?

Will hold off on writing real code until you've had a chance to push back on any of this — easier to redo the design now than after I've committed 600 lines.

MartinuzziFrancesco · 2026-06-17T14:26:13Z

Thanks for running the experiments, they are quite interesting and I think necessary even for merging #450. To address you point number 1) I think having the two code snippets side to side would help in understanding if this is something that could be a gotcha for the user, and therefore if we should find a way to deal with it in a elegant manner. As far as 2), I think we can go with ContinuousESN thin wrapper of eq 5). And as far as validation I think we can go with the eye test of attractor reconstruction following the example of the readme. If you want a more qualitative test we can use the difference in fractal dimensions using FractalDimensions's Grassberger-Procaccia algorithm. I would say that this is a bit more solid wrt to vpt, and it should be enough for PR3

Saswatsusmoy · 2026-06-17T19:49:40Z

Thanks @MartinuzziFrancesco. Re-ran both paths from scratch for the side-by-side, and lining up the other two answers below.

Cold vs warmed predict — side-by-side

Both scripts are identical except for the warmup step. Reproducible numbers on Julia 1.12.5 with MersenneTwister(17), N = 300, train = 5000, predict = 1250, dt = 0.02, Tsit5 at reltol = 1e-6, ‖W_r‖₂ = 0.9, ‖W_in‖ ≈ 0.1.

lorenz_cold.jl — predict's u0 is whatever the user gave the ODEProblem (here zeros(N))

import Pkg; Pkg.activate("/tmp/lorenz_env"; io = devnull)
using Random, Statistics, LinearAlgebra, Printf
using ReservoirComputing, SciMLBase, DataInterpolations, OrdinaryDiffEqTsit5

const SEED, N_RES = 17, 300
const TRAIN_LEN, PREDICT_LEN, DT, SHIFT = 5000, 1250, 0.02, 300
const T_LYAPUNOV = 1 / 0.9056

lorenz!(du, u, p, t) = begin
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
data_prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0],
    (0.0, (SHIFT + TRAIN_LEN + PREDICT_LEN + 100) * DT), [10.0, 28.0, 8 / 3])
data = Array(solve(data_prob, Tsit5(); saveat = DT))
input_data  = data[:, SHIFT:(SHIFT + TRAIN_LEN - 1)]
target_data = data[:, (SHIFT + 1):(SHIFT + TRAIN_LEN)]
test_data   = data[:, (SHIFT + TRAIN_LEN):(SHIFT + TRAIN_LEN + PREDICT_LEN - 1)]

rng = MersenneTwister(SEED); Random.seed!(SEED)
Wr_raw = randn(rng, N_RES, N_RES) ./ sqrt(N_RES)
Wr  = (0.9 / opnorm(Wr_raw)) .* Wr_raw
Win = 0.1 .* randn(rng, N_RES, 3)
b   = 0.05 .* randn(rng, N_RES)

esn_rhs!(dx, x, p, t) = (dx .= .-x .+ tanh.(p.Wr * x .+ p.Win * p.input(t) .+ p.b); nothing)

function build_rc(tspan_len, u0)
    prob = ODEProblem(esn_rhs!, u0, (0.0, Float64(tspan_len)), (Wr = Wr, Win = Win, b = b))
    res = SciMLProblemReservoir(prob, TerminalStateSampling(),
        (0.0, Float64(tspan_len)), Tsit5(); reltol = 1.0e-6, abstol = 1.0e-8)
    return ReservoirComputer(res, LinearReadout(N_RES => 3))
end

rc_train = build_rc(TRAIN_LEN, zeros(N_RES))
ps, st = setup(rng, rc_train)
ps, st = train!(rc_train, input_data, target_data, ps, st)

# COLD: predict's u0 stays at the prob's original u0 (zeros).
rc_pred = build_rc(PREDICT_LEN, zeros(N_RES))
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

nrmse(yhat, y) = mean(sqrt(mean((yhat[i, :] .- y[i, :]) .^ 2)) / std(y[i, :]) for i in 1:size(y, 1))
for lt in (1.0, 2.0, 3.0, 4.0, 6.0, 8.0)
    h = min(round(Int, lt * T_LYAPUNOV / DT), PREDICT_LEN)
    @printf("  horizon %.1f t_λ  (%4d steps): NRMSE = %.4f\n", lt, h, nrmse(output[:, 1:h], test_data[:, 1:h]))
end

lorenz_warm.jl — same script, with the warmup inserted before build_rc(PREDICT_LEN, …)

# … everything above identical to lorenz_cold.jl through `train!` …

# WARMUP: re-run collectstates, grab terminal reservoir state.
trained_states, _ = collectstates(rc_train, input_data, ps, st)
warm_u0 = collect(trained_states[:, end])

# WARMED: predict's u0 = trained reservoir terminal state.
rc_pred = build_rc(PREDICT_LEN, warm_u0)
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

# … NRMSE loop identical to lorenz_cold.jl …

The semantic delta is exactly: collectstates once after train!, take trained_states[:, end], rebuild the predict reservoir with that as u0. Four added lines, one swapped line.

Results (autoregressive rollout NRMSE per Lukoševičius 2012 eq 1, mean over channels of RMSE / std):

Horizon	Cold (`u0 = 0`)	Warmed (`u0 = trained terminal`)
1 t_λ (55 steps)	1.4893	0.1568
2 t_λ (110 steps)	1.4960	0.1183
3 t_λ (166 steps)	1.7431	0.1118
4 t_λ (221 steps)	1.6959	0.2099
6 t_λ (331 steps)	1.6444	0.7294
8 t_λ (442 steps)	1.4708	0.9808

Train NRMSE (teacher-forced one-step on training data) was 0.0004 in both cases — the readout fits cleanly; the cold path's failure is purely the predict-time u0.

Two small caveats on what I ran:

Eq (5) bundles bias as the leading column of W_in[1;u]; my script splits it into a separate +b term — algebraically identical, just different parameter layout.
warm_u0 = trained_states[:, end] is the post-modifier state if any state_modifiers are wired up. None here, so it equals the raw reservoir terminal. If a real warmup API were to expose this, "warm from the raw reservoir terminal" and "warm from the modified terminal" would need to be a deliberate choice.

If you read this as a real footgun, the lightest-touch fix I can see is a warmup_data::Union{Nothing, AbstractMatrix} = nothing kwarg on the SciMLProblemReservoir paths of predict, where a non-nothing value triggers collectstates(rc, warmup_data, ps, st) internally and seeds u0 from the last column. No behaviour change for users who don't pass it. Happy to follow your lead on whether this lands in #450 or in PR3's wrapper.

On the naming

ContinuousESN works for me. I'll rename the stub on add-ctesn → new branch + new file + re-titled PR. The Anantharaman parametric-surrogate flavour can be its own type in its own PR if anyone needs it.

On validation via Grassberger-Procaccia

Will pull in FractalDimensions.jl (v1.9.6 as of writing). Plan is grassberger_proccacia_dim(StateSpaceSet(...)) on both the true Lorenz attractor and the warmed-rollout attractor, then assert the predicted dimension matches reference (Lorenz '63 correlation dimension is reported at ≈ 2.05 in the literature) to some agreed tolerance, alongside the attractor-reconstruction eye-test from the README. I won't quote actual fractal-dim numbers from my run until I've actually computed them — flagging the plan rather than the result.

Francesco's call on SciML#456: drop the CTESN name (which referred to the Anantharaman 2021 parametric-surrogate paper) and ship instead a thin wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator continuous ESN. Stub file moves from `src/extensions/` to `src/models/` to sit alongside ESN as a semantic sibling; docstring rewritten to reflect the new scope (leaky-integrator default, parametric-surrogate CTESN explicitly out of scope).

Skeleton-only placeholder so the draft PR exists as a discussion vehicle. No include, no export, no tests — the body errors with a clear message directing the user to the (not-yet-implemented) extension constructor. Open design questions are tracked in the PR description. Stacked on PR SciML#450; will be rebased onto master once that merges.

Francesco's call on SciML#456: drop the CTESN name (which referred to the Anantharaman 2021 parametric-surrogate paper) and ship instead a thin wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator continuous ESN. Stub file moves from `src/extensions/` to `src/models/` to sit alongside ESN as a semantic sibling; docstring rewritten to reflect the new scope (leaky-integrator default, parametric-surrogate CTESN explicitly out of scope).

Prior commit captured only the file rename — the docstring content update was on disk but not staged. This commit folds in the actual docstring rewrite: references Lukoševičius 2012 §3.2.6 eq (5) as the canonical ODE, explains how the discrete leak rate α maps to the continuous Euler step, and explicitly distinguishes from the Anantharaman 2021 parametric-surrogate CTESN.

`ContinuousESN <: AbstractSciMLProblemReservoir`. Thin wrapper that pre-bakes the Lukoševičius 2012 §3.2.6 eq (5) ODE ẋ(t) = α · (-x(t) + tanh(W_in·u(t) + W_r·x(t) + b)) with `leak_coefficient = α`. Forward-Euler discretisation at Δt = 1 collapses to the discrete leaky ESN exactly. Struct lives in src/models/, real constructor + in-place RHS (mul!, fused tanh broadcast, compile-time bias branch) in `RCODEReservoirExt`. Validates `tspan`, dim positivity, leak positivity, finite endpoints, length-2 tspan, and Inf-on-cast at setup. Test suite covers shape, construction errors, bias toggle, bias under Tsit5, forward determinism, Euler equivalence vs discrete leaky ESN at α ∈ {1.0, 0.3}, both predict modes, modifier composition, and T-kwarg propagation — 43 assertions across 10 testsets. Tutorial added under docs/src/tutorials/continuous_esn.md (Lorenz forecasting walkthrough). Adds Lukoševičius 2012 and Anantharaman 2021 to refs.bib. Drops the stale PR3/PR5 stub comment in PR2's sciml_reservoir.jl now that the first concrete subtype has landed.

MartinuzziFrancesco · 2026-06-25T15:02:51Z

 ```@docs
    AbstractSciMLProblemReservoir
    SciMLProblemReservoir
+    ContinuousESN


this can go on its own page since it's a model

MartinuzziFrancesco · 2026-06-25T15:03:31Z

+```julia
+using ReservoirComputing
+using SciMLBase
+using DataInterpolations
+using OrdinaryDiffEqTsit5
+```


not needed if you ahve the example below

- Move `ContinuousESN` from layers.md to models.md — it is a model (sibling of `ESN`/`EuSN`), not a continuous-time reservoir building block; the `SciMLProblemReservoir` family stays under layers. - Drop the standalone "Loading the extension" block in the tutorial; the `@example continuous-esn-lorenz` block immediately below already carries the `using` lines, so the prose block was redundant.

Saswatsusmoy · 2026-06-25T17:55:28Z

Both done in 7a7a84b5. Moved ContinuousESN into a new "Continuous-time Echo State Networks" section in api/models.md — agreed, it's a model not a layer. Dropped the standalone using block in the tutorial and folded the one-liner about which packages do what into the intro paragraph above the @example block.

MartinuzziFrancesco · 2026-06-29T16:16:37Z

+@concrete struct ContinuousESN <: AbstractSciMLProblemReservoir
+    prob
+    sampler
+    tspan
+    args
+    kwargs
+    in_dims
+    res_dims
+    leak_coefficient
+    use_bias
+    init_reservoir
+    init_input
+    init_bias
+    matrix_type
+end


too much here, the ContinuousESN should just hold the reservoir, state modifires, and readout. the construction is similar to the ESN, we just provide a convenient wrapper for the user bu we use the internal components. so the the esn equations are fed into the SciMLProblemReservoir, the dims into the ESNCell etcetc

Happy to restructure to the 3-field shape — want to make sure I land the composition right before I rewrite. Three things I'd like to confirm:

Where does the cell sit? I see two readings: (a) reservoir::SciMLProblemReservoir with ESNCell used only at construction time to mint prob.p, or (b) introduce a small ContinuousReservoir <: AbstractSciMLProblemReservoir that holds the ESNCell as a field and delegates initialparameters to it (so weights stay in ps.reservoir and the existing _build_solve_params flow keeps working). I lean (b) since (a) bakes the weights into prob.p at construction and they'd no longer live in ps. Which were you picturing?

leak_coefficient — agreed eq (5) doesn't carry α; α is the Euler step. Want me to drop the leak_coefficient kwarg entirely (user controls α implicitly via tspan / n_samples), or keep it as a thin convenience that adjusts tspan under the hood?

ESNCell init signatures — ESNCell's initializers take (rng, dims...) (no eltype), but the current ContinuousESN exposes T and passes (rng, T, dims...). To reuse ESNCell as-is I'd lose the T kwarg and inherit ESNCell's Float32 default. Acceptable, or should I keep T?

I see this raises a good point on adding the ContinuousESNCell as an additional layer. This way we can save those elements in the new cell, which should then also contain the esn equations. something like this maybe

@concrete struct ContinuousESNESNCell <: AbstractEchoStateNetworkCell #maybe change here activation in_dims <: IntegerType out_dims <: IntegerType init_bias init_reservoir init_input #init_feedback::F init_state use_bias <: StaticBool equations end

also make sure to use IntegerType and StaticBool for Int and Bools

Pushed d9b5fda9 with the restructure. ContinuousESN is now 3-field (reservoir, states_modifiers, readout) exactly like ESN; ContinuousESNCell <: AbstractSciMLProblemReservoir so the existing continuous dispatch in RCODEReservoirExt fires on rc.reservoir. initialparameters returns the canonical (input_matrix, reservoir_matrix, [bias]) trio matching ESNCell. Default equations is eq (5) without α — α emerges purely from the Euler step Δt, per your correction.

One field-list deviation worth flagging — the cell carries three more fields than you sketched (tspan, args, kwargs). The reason: since the cell is the dispatch target for _collectstates / _predict, the ext needs cell.tspan to set up the saveat grid and cell.args / cell.kwargs to call solve. If those lived on ContinuousESN instead, the cell would still need a way to reach them at dispatch time, which would mean either threading them through every _collectstates call or duplicating dispatch on the outer model. Adding them to the cell felt like the lowest-friction option, but happy to move them onto ContinuousESN and reach for them via rc.tspan etc. if you'd rather keep the cell strictly ESNCell-shaped — just say the word.

Also renamed u_t → input_t per your other comment; equivalence test now uses Float-exact α ({1.0, 0.5}) so the saveat grid aligns bit-for-bit with the Euler step boundaries (non-FP-exact α introduces a sub-step offset that the Euler interpolant smooths over).

MartinuzziFrancesco · 2026-06-29T16:17:48Z

+# path.
+# ---------------------------------------------------------------------------
+
+function _continuous_esn_rhs!(dx, x, p, t)


try not to use single letter variables

MartinuzziFrancesco · 2026-06-29T16:19:20Z

+# Concrete subtype of `AbstractSciMLProblemReservoir` that pre-bakes the
+# leaky-integrator continuous ESN ODE of Lukoševičius 2012 §3.2.6 eq (5):
+#
+#   ẋ(t) = α · (-x(t) + tanh(W_in·u(t) + W_r·x(t) + b))


Lukoševičius 2012 §3.2.6 eq (5) does not have alpha in it, it comes out as a consequence of discretization of the ode

…3-field ESN shape Address Francesco's review on SciML#456: `ContinuousESN` now shares its struct shape with `ESN` — three fields `(reservoir, states_modifiers, readout)` — and the substance moves into a new `ContinuousESNCell` that mirrors `ESNCell`'s field layout plus an `equations` field carrying the ODE right-hand side. * `src/layers/continuous_esn_cell.jl` (NEW): `ContinuousESNCell <: AbstractSciMLProblemReservoir` so the existing continuous dispatch in `RCODEReservoirExt` fires on `rc.reservoir`. `initialparameters` returns the canonical `(input_matrix, reservoir_matrix, [bias])` trio. Default `_continuous_esn_rhs!` implements Lukoševičius 2012 §3.2.6 eq (5) verbatim — no α in the RHS; α emerges only when the ODE is forward-Euler discretised at step `Δt = α`. Only the `@concrete` inner constructor is exposed; the convenience builder lives in the extension's `ContinuousESN(...)` constructor (avoids a Pair-vs-activation ambiguity with the auto-generated inner ctor that Aqua flagged). * `src/models/continuous_esn.jl`: 13-field flat struct → 3-field container. Stub constructor errors without the extension; real one in `RCODEReservoirExt`. * `ext/RCODEReservoirExt.jl`: drop the bulk constructor + the old α-bearing RHS + the `_typed_zeros_init` helper. Add a slim `ContinuousESN(in_dims, res_dims, out_dims, [activation,] tspan, args...; use_bias, init_*, equations, state_modifiers, ...)` that validates inputs, builds the cell, and wraps it with `state_modifiers` + `LinearReadout`. Add `_collectstates(::ContinuousESNCell, ...)` (builds the `ODEProblem` on the fly from `cell.equations`, sized by `cell.out_dims`, with `ps.reservoir` merged into the solve `p`) and a parallel `_predict(::ContinuousESNCell, ..., steps; initialdata)` for the autoregressive path. Teacher-forced `_predict` inherits the generic `AbstractSciMLProblemReservoir` method unchanged. Rename `u_t` to `input_t` per the same review thread. * Field-list note: Francesco's sketch listed only the ESNCell-like fields. The cell additionally carries `tspan`, `args`, `kwargs` because it IS the dispatch target for the continuous path — without those fields `_collectstates` would have nothing to feed `solve`. Documented as an open question in the PR reply. * `test/test_continuous_esn.jl`: rewritten for the new field shape (`ps.reservoir.input_matrix` / `reservoir_matrix` / `bias`). Euler equivalence test moved from "α in the RHS, dt=1" to "no α in the RHS, dt=α with tspan=(0, T_steps·α)"; α values restricted to `{1.0, 0.5}` so the requested saveat grid aligns bit-for-bit with the Euler step boundaries (non-FP-exact α introduces a sub-step offset that the Euler interpolant smooths over). 9 testsets, 45 assertions, all green at `atol=1e-10` for the equivalence test. * Docs: add `ContinuousESNCell` to the layers API page and rewrite the Lorenz tutorial without the dropped `leak_coefficient` / `T` kwargs.

MartinuzziFrancesco · 2026-07-01T14:22:09Z


 ```@docs
    AdditiveEIESNCell
+    ContinuousESNCell


I would also put this in a different subsection, similar to the models. Put it in a Continuous layers section

MartinuzziFrancesco · 2026-07-01T14:22:30Z

+# Continuous ESN: forecasting Lorenz
+
+[`ContinuousESN`](@ref) is a thin convenience wrapper around a
+[`ContinuousESNCell`](@ref) that pre-bakes the leaky-integrator


~~leaky-integrator~~

MartinuzziFrancesco · 2026-07-01T14:24:51Z

+@article{Anantharaman2021,
+  title = {Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks},
+  url = {https://arxiv.org/abs/2010.04004},
+  author = {Anantharaman, Ranjan and Ma, Yingbo and Gowda, Shashi and Laughman, Christopher and Shah, Viral and Edelman, Alan and Rackauckas, Christopher},
+  year = {2021},
+  journal = {arXiv:2010.04004}
+}


is this necessary now? I think the implementation we have is not the one provided by this paper

MartinuzziFrancesco · 2026-07-01T14:25:32Z

 # `solve` and `remake` come from `SciMLBase`. The user picks the concrete
 # solver type (e.g. `Tsit5()`) and loads its package separately
 # (`OrdinaryDiffEqTsit5`, `OrdinaryDiffEq`, …); dispatch at solve time
 # selects the right method via the type they passed in `res.args[1]`. We
 # deliberately don't list a solver package as a weakdep trigger so users
 # aren't forced to pull the full `OrdinaryDiffEq` meta-package in.


comments not really useful here

MartinuzziFrancesco · 2026-07-01T14:26:31Z

+# ---------------------------------------------------------------------------
+# ContinuousESN — convenience constructor
+#
+# Builds the 3-field `(reservoir, states_modifiers, readout)` model whose
+# `reservoir` is a `ContinuousESNCell`. The cell carries Lukoševičius 2012
+# §3.2.6 eq (5) as its `equations` field. Each of `collectstates`,
+# `predict(rc, data, ...)`, and `predict(rc, steps, ...; initialdata)`
+# dispatches on `rc.reservoir::ContinuousESNCell` below, building the
+# `ODEProblem` lazily so the reservoir weights stay in `ps.reservoir` and
+# can be re-initialised by `setup(rng, rc)` like every other ESN-family
+# model.
+# ---------------------------------------------------------------------------


MartinuzziFrancesco · 2026-07-01T14:29:38Z

+This type is normally constructed by the [`ContinuousESN`](@ref)
+convenience constructor in the `RCODEReservoirExt` extension rather than
+directly. Driving the inner constructor by hand requires placing every
+field in the order above and pre-static-ing `use_bias`.


too low level information for a docstring

MartinuzziFrancesco · 2026-07-01T14:30:44Z

+  - `reservoir_matrix :: (out_dims × out_dims)` — `W_r`
+  - `bias :: (out_dims,)` — present only if `use_bias = True()`
+"""
+@concrete struct ContinuousESNCell <: AbstractSciMLProblemReservoir


we still need a constructor here like for the normal ESNCell

MartinuzziFrancesco · 2026-07-01T14:31:10Z

+                  kwargs...)
+
+Continuous-time leaky-integrator Echo State Network ([Lukosevicius2012](@cite)
+§3.2.6, eq 5). Thin convenience wrapper that composes:


~~§3.2.6, eq 5)~~

MartinuzziFrancesco · 2026-07-01T14:31:41Z

+Structurally identical to [`ESN`](@ref) — three fields
+`(reservoir, states_modifiers, readout)` — so it slots into the same
+training and prediction pipeline. The reservoir field is the
+`ContinuousESNCell` rather than a `StatefulLayer(ESNCell)`, which routes
+`collectstates` / `predict` through the continuous-time
+`RCODEReservoirExt` extension.


Structurally identical to ESN, but the reservoir field is the
ContinuousESNCell rather than a StatefulLayer(ESNCell)

there are too many implementation level details that the user may not be familiar with or not care about

MartinuzziFrancesco · 2026-07-01T14:32:11Z

+No leaking-rate term `α` appears in the ODE. `α` emerges only when the
+ODE is forward-Euler discretised with step `Δt = α`, recovering the
+discrete leaky update of [`ESN`](@ref). To change the effective leak,
+adjust `tspan` so the per-input-window width matches the desired `Δt`.


MartinuzziFrancesco · 2026-07-01T14:39:33Z

I think we are almost there! There are a few points to clarify:

the documentation, especially docstrings, has to be clear to human users. This means that details about the routing or the dispatch or how things work under the hood are not needed. When I use + I am not interested in how the dispatch was made but in what it does for my use case.
Similarly, too many details on sources can be a bit much. It suffices to say which paper the model comes from using the citation.
Comments are not needed, the code usually should be clear enough to roughly understand what is going on. I would remove most of the comments there are

Also, we do need a convenience constructor for the ContinuousESNCell.

Overall it's ok to use llms for coding, but make sure the code is up to standard. Forcing the SciML guide on a prompt usually helps in catching a lot of these issues, as well as having a local .md similar to this one

Saswatsusmoy · 2026-07-01T14:49:31Z

I will have to look into this seriously, I was trying and experimenting documentations and code styling changes through autonomous agents (mostly claude & hermes) will have to fine tune the agents (will force the SciML guide on a project level). For now I will fallback to the prev workflow (without LLM agents) for the PR

…s, and cell constructor Clean up implementation details from docstrings and comments across the ContinuousESN model, cell, tutorial, and extension. Add a Pair-signature convenience constructor for ContinuousESNCell matching the ESNCell pattern. Move ContinuousESNCell to a new Continuous-time Layers subsection in the API docs and drop the unused Anantharaman2021 citation.

Saswatsusmoy · 2026-07-01T16:13:12Z

I kept _continuous_esn_rhs! unexported. It is documented and used as a default kwarg (equations = _continuous_esn_rhs!), so it shows up in signatures and @ref links, but the _ prefix is a Julia convention for internal and no other _-prefixed helper in the codebase is exported. Fine as is, or would you rather export it?

Saswatsusmoy · 2026-07-02T15:44:03Z

Coming back to the warmup discussion. The cold vs warm autoregressive predict gap is still there, and the user currently has to manually collectstates after training and rebuild the reservoir with the terminal state as u0. In the side-by-side scripts the gap was NRMSE ~1.5 (cold) vs ~0.11 (warmed).

Do you want me to wire in automatic warmup as part of this PR, or keep it as-is and defer to a separate PR? Two lightweight approaches on the table:

Add a warmup_data kwarg to predict(rc, steps, ...) for the continuous paths, where a non-nothing value triggers collectstates internally and seeds u0 from the last column. No behaviour change for users who do not pass it.
Keep the current manual workflow as the documented path and just note the cold/warm distinction in the tutorial.

Saswatsusmoy changed the title ~~[WIP/Draft] feat(models): CTESN — PR3 design discussion (stacked on #450)~~ [WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450) Jun 17, 2026

Saswatsusmoy added 3 commits June 23, 2026 02:00

Saswatsusmoy force-pushed the add-ctesn branch from c6c96fe to 6324908 Compare June 22, 2026 20:31

MartinuzziFrancesco reviewed Jun 25, 2026

View reviewed changes

MartinuzziFrancesco requested changes Jun 29, 2026

View reviewed changes

MartinuzziFrancesco requested changes Jul 1, 2026

View reviewed changes

MartinuzziFrancesco marked this pull request as ready for review July 2, 2026 15:02

Uh oh!

Uh oh!

Conversation

Saswatsusmoy commented Jun 16, 2026

The model

Open questions

Q1 — Wrapping pattern: mirror ESN, or expose CTESN as the reservoir layer directly?

Q2 — Default ODE form: paper-canonical or leaky-integrator?

Q3 — Sampler: does TerminalStateSampling cover forecasting?

Q4 — File placement: src/models/ctesn.jl or src/extensions/ctesn.jl?

Q5 — Validation benchmarks

Q6 — Scope: LPCTESN only, or also NLPCTESN / parametric surrogate?

Q7 — Spectral-norm warning (open Q5 from #397 / proposal)

Q8 — NRMSE: inline or proper helper?

Q9 — Pacing

What I'll do once you reply

Uh oh!

Saswatsusmoy commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

On §3.2.6 (Lukoševičius 2012, "Leaking Rate")

Smoke test — eq (5) raw on Lorenz, no CTESN struct

Uh oh!

MartinuzziFrancesco commented Jun 17, 2026

Uh oh!

Saswatsusmoy commented Jun 17, 2026

Cold vs warmed predict — side-by-side

On the naming

On validation via Grassberger-Procaccia

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Saswatsusmoy commented Jun 25, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MartinuzziFrancesco commented Jul 1, 2026

Uh oh!

Saswatsusmoy commented Jul 1, 2026

Uh oh!

Saswatsusmoy commented Jul 1, 2026

Uh oh!

Saswatsusmoy commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Q3 — Sampler: does `TerminalStateSampling` cover forecasting?

Q4 — File placement: `src/models/ctesn.jl` or `src/extensions/ctesn.jl`?

Saswatsusmoy commented Jun 17, 2026 •

edited

Loading