Skip to content

[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450)#456

Open
Saswatsusmoy wants to merge 7 commits into
SciML:masterfrom
Saswatsusmoy:add-ctesn
Open

[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450)#456
Saswatsusmoy wants to merge 7 commits into
SciML:masterfrom
Saswatsusmoy:add-ctesn

Conversation

@Saswatsusmoy

Copy link
Copy Markdown
Contributor

Warning

Draft / discussion only. No real implementation yet — the only commit is a docstring stub that errors at call time. Stacked on top of #450; I'll rebase onto master once that merges.

Hey @MartinuzziFrancesco — PR2 (#450) is sitting in good shape ("MERGEABLE / CLEAN", all 17 checks green) and per the roadmap I'd start PR3 (CTESN model + Mackey-Glass / Lorenz validation + docs) next. Before I write code I wanted to surface a handful of design questions, because the literature on CTESN diverges from the ESN-community defaults in a few non-obvious places. Easier to align upfront than redo the implementation later.

The model

Anantharaman2021"Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks" — defines the reservoir as

$$\dot{\mathbf{r}}(t) = \tanh\!\left(A\,\mathbf{r}(t) + W_{\text{hyb}}\,\mathbf{x}(t)\right)$$

with f = tanh, g = id hardcoded, A sparse random, W_hyb random dense. No leak term, no bias. Their training procedure is a parametric surrogate: simulate the full model at n Sobol-sampled parameters, fit one W_out per sample by SVD, then RBF-interpolate W_out(p) across the parameter space (uses Surrogates.jl, mentioned explicitly).

This is genuinely different from the "leaky-integrator continuous ESN" that one naturally expects as a continuous limit of Lukosevicius2012ṙ = -α r + tanh(W_in u + W_r r + b). Several of my questions below stem from this gap.

I read the canonical paper's PDF and skimmed Bhatnagar 2024 (LPCTESN / NLPCTESN follow-up); ground-truth sources noted where they matter below.

Open questions

Q1 — Wrapping pattern: mirror ESN, or expose CTESN as the reservoir layer directly?

ESN is @concrete struct ESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)} with the cell sitting in :reservoir. Two options for CTESN:

  • (a) Mirror ESN. CTESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}, with the :reservoir field holding a concrete <: AbstractSciMLProblemReservoir that PR2's _collectstates(::AbstractSciMLProblemReservoir, …) dispatches on. User-facing call site is CTESN(in_dims, res_dims, out_dims; …).
  • (b) CTESN is the reservoir layer. CTESN <: AbstractSciMLProblemReservoir, and the user does ReservoirComputer(CTESN(...), modifiers, readout) themselves (the way SciMLProblemReservoir is used in the PR2 tutorial).

(a) is more discoverable but adds a thin wrapper; (b) is closer to what PR2 already does. Your "built-in models get their own types" steer (issuecomment-4604259768) reads to me as (a), but I want to confirm.

Q2 — Default ODE form: paper-canonical or leaky-integrator?

The paper-canonical form has no α and no bias. The leaky-integrator form is what most ESN users would expect from a "continuous ESN" type. My proposal:

  • Default to paper-canonical (leak_rate = 0.0) for faithfulness to Anantharaman et al.
  • Expose leak_rate::Real kwarg so users can opt into the leaky form trivially.

OK with you, or should the leaky form be the default?

Q3 — Sampler: does TerminalStateSampling cover forecasting?

For the forecasting flavour (single trajectory, fit W_out by Y · R⁺) we need one reservoir-state column per output column. Reading PR2's _collectstates carefully, the window-end saveat grid already produces exactly that (n_samples = size(data, 2) columns), so TerminalStateSampling looks sufficient — I don't think a DenseTrajectorySampling sampler is needed in PR3. Want to sanity-check that with you before assuming.

(The parametric-surrogate flavour from the paper would want dense trajectories. I'd defer that to a future PR — see Q6.)

Q4 — File placement: src/models/ctesn.jl or src/extensions/ctesn.jl?

The draft commit puts the stub under src/extensions/ctesn.jl, mirroring RECA (the only existing extension-required model). But CTESN is conceptually a sibling of ESN/EuSN/ES2N which all live in src/models/. Happy to move it — what's your preference?

Q5 — Validation benchmarks

The paper validates CTESN on stiff systems (Robertson, HVAC, POLLU). My proposal target for PR3 is Mackey-Glass (τ = 17, Jaeger2001) and Lorenz '63 (NRMSE < 0.1 at N = 300), which are the ESN-community standards and have rich published baselines for sanity-checking (Pathak2017 is the Lorenz-N=300 reference). Neither benchmark has published CTESN numbers — we'd be contributing the first.

Two paths:

  • (i) Land Mackey-Glass + Lorenz only (matches what's in the proposal, ships faster).
  • (ii) Add a stiff benchmark too (Robertson is the smallest — 3 equations) to demonstrate the paper's stated regime.

I lean (i) for PR3 and adding (ii) as a follow-up, but I'll do (ii) in PR3 if you'd rather have the paper's own validation domain in there from day one.

Q6 — Scope: LPCTESN only, or also NLPCTESN / parametric surrogate?

Bhatnagar 2024 discusses LPCTESN (linear W_out · r) and NLPCTESN (k-NN polynomial-augmented RBF readout, ~3 orders of magnitude smaller reservoirs at the same accuracy on Robertson). The Anantharaman 2021 paper itself does the parametric-surrogate-over-parameter-space training (Surrogates.jl + RBF interpolation of W_out(p)).

PR3 as currently scoped covers forecasting + LPCTESN only. NLPCTESN needs a polynomial-feature readout + k-NN RBF (extra deps). Parametric-surrogate training needs Surrogates.jl. Both feel out-of-scope for one PR — defer to follow-ups?

Q7 — Spectral-norm warning (open Q5 from #397 / proposal)

The continuous-time echo state property needs ‖A‖_2 < α (operator 2-norm), not the discrete-time ρ(A) < α. Existing scale_radius! (src/inits/inits_components.jl:93) uses maximum(abs.(eigvals(M))) — that's the discrete bound. Three options for the doc warning:

  • (a) Inline note in CTESN docstring only ("treat ρ as empirical hyperparameter; the proven sufficient condition is ‖A‖_2 < α").
  • (b) Add a paragraph to scale_radius! docstring flagging that the bound it enforces is the discrete analogue.
  • (c) Add a new helper scale_opnorm! for users who want the strict continuous bound.

(a) + (b) feels minimal and right; (c) adds API surface area I don't want to commit to without your nod.

Q8 — NRMSE: inline or proper helper?

grep -i nrmse src/ returns zero matches and lib/ReservoirComputingBenchmarks/ is still an empty placeholder, so #414 hasn't landed yet. For PR3 I'll inline NRMSE (Lukoševičius 2012 eq 1: sqrt(mean((ŷ-y)²) / var(y)) per output, averaged) in the test + tutorial code. If #414 lands first I'll source it from there instead. Fine?

Q9 — Pacing

Are you OK with me starting PR3 now stacked on add-ode-reservoir-ext, on the assumption that PR2 will land roughly as-is? If you'd rather I wait for PR2 to merge first, I'll park this draft and come back.


What I'll do once you reply

Convert this stub into the real implementation: concrete CTESN type, in-place ODE RHS with pre-allocated buffers (mul! for both matvecs — the RHS gets called O(10²–10⁴) times per solve), test file under test/models/, and a tutorial under docs/src/tutorials/. I'll rebase onto master after PR2 merges.

For reference / my own notes — the parallel research pass turned up three small factual things I'd like to flag for the project proposal too (paper title is "Stiff Nonlinear Systems" not "Chaotic Dynamical Systems", the canonical-paper authors are Anantharaman, Ma, Gowda, Laughman, Shah, Edelman, Rackauckas, and the 56× speedup figure in the proposal doesn't appear in the abstract — happy to ping you separately about that rather than mix it into the PR thread).

Thanks!

@Saswatsusmoy

Saswatsusmoy commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @MartinuzziFrancesco — both the §3.2.6 pointer and the "skip the struct, run it raw" suggestion turned out to land in interesting places.

On §3.2.6 (Lukoševičius 2012, "Leaking Rate")

The relevant ODE is eq (5):

$$\dot{\mathbf{x}} = -\mathbf{x} + \tanh\!\left(\mathbf{W}^{\text{in}}[1;\mathbf{u}] + \mathbf{W}\,\mathbf{x}\right)$$

— no α in the continuous form. The leaking rate appears only when you Euler-discretise: take Δx/Δt ≈ ẋ with step Δt = α and you recover the discrete leaky update x(n+1) = (1-α)x(n) + α·tanh(W_in[1;u] + Wx(n)). So α is the discretisation step, not a separate parameter of the ODE. Equivalently you can fold it into eq (5) as a LHS time constant α·ẋ = … if you want to keep Δt ≡ 1. The section also notes this form is preferred over alternatives "because it guarantees that x(n) never goes outside the (-1, 1) interval" — bounded state by construction, no extra contraction argument needed. Practical advice: tune α to the timescale of u/y_target; can be per-neuron for multi-timescale tasks.

This reframes PR3 quite a bit. The Anantharaman CTESN paper uses ṙ = tanh(Ar + W_hyb x) with no decay term and trains a parametric surrogate via RBF interpolation of W_out(p) — a different approach aimed at stiff systems. The "continuous ESN" §3.2.6 points at is just eq (5), which is exactly what the Lorenz eye-test in PR #450's tutorial implements verbatim. So PR3 collapses from "build a CTESN model mirroring the paper" → "thin convenience constructor around SciMLProblemReservoir that pre-bakes eq (5)". Several of the open questions in the original PR body fall away.

Smoke test — eq (5) raw on Lorenz, no CTESN struct

Ran the tutorial pattern scaled to N_res = 300, train = 5000, predict = 1250 samples (dt = 0.02, σ = 10, ρ = 28, β = 8/3). Spectral scaling ‖W_r‖₂ = 0.9 (continuous-ESP-sufficient bound), ‖W_in‖ ≈ 0.1, ‖b‖ ≈ 0.05, MersenneTwister(17), Tsit5 at reltol = 1e-6. Happy to inline the full script as a follow-up comment if useful.

First pass, train NRMSE = 0.0004 (readout fits beautifully) but autoregressive rollout NRMSE was ~1.5 at all horizons. Tracked it down to the autoregressive _predict path in ext/RCODEReservoirExt.jl:

current_state = res.prob.u0   # → zeros(N) every time

The trained reservoir's terminal state never makes it into the predict-time u0. So the rollout starts cold and produces nothing useful regardless of how well the readout was trained. After manually re-running collectstates, grabbing the last column, and rebuilding the reservoir with remake(prob; u0 = terminal_state), the numbers became:

Horizon (Lyapunov times) Steps NRMSE
1 t_λ 55 0.157
2 t_λ 110 0.118
3 t_λ 166 0.112
4 t_λ 221 0.210
6 t_λ 331 0.729
8 t_λ 442 0.981

So eq (5) forecasts Lorenz at N=300 to ~3 Lyapunov times under NRMSE ≈ 0.11 before chaotic divergence dominates. Not quite our PR3 target of NRMSE < 0.1, but within striking distance of it before any hyperparameter tuning (spectral radius vs spectral norm, leak/time-constant, washout length, regularisation). I think the < 0.1 target is reachable.

What I'd like your read on:

  1. The warm-up hole. Is this expected for the current PR feat: continuous-time reservoirs via RCODEReservoirExt #450 predict semantics — i.e., users are meant to remake(prob; u0 = …) themselves before calling predict? Or is this something that should be wired in (e.g. predict(rc, steps, ps, st; initialdata, warmup_data = nothing) that internally runs collectstates on warmup_data and seeds u0)? The current behaviour is correct but really easy to footgun on, as I just demonstrated.

  2. PR3 reframe. Given §3.2.6, would you rather PR3 be:

    • (a) Drop the "CTESN" name entirely and ship a LeakyContinuousESN(in_dims, res_dims, out_dims; α, …) thin wrapper that pre-bakes eq (5) — semantically a sibling of ESN. The Anantharaman parametric-surrogate flavour would be a future PR with its own type.
    • (b) Keep the CTESN name but make it the eq-(5) wrapper, and treat the parametric-surrogate version as a future extension within the same family.
    • (c) Something else entirely.
  3. Validation. Should the in-PR test assert specifically against the eq-(5) Lorenz NRMSE (after tuning to push below 0.1 at some agreed horizon — e.g. 4 t_λ), or do you want a tighter set of unit tests + Mackey-Glass + Lorenz in the docs as eye-tests only?

Will hold off on writing real code until you've had a chance to push back on any of this — easier to redo the design now than after I've committed 600 lines.

@MartinuzziFrancesco

Copy link
Copy Markdown
Collaborator

Thanks for running the experiments, they are quite interesting and I think necessary even for merging #450. To address you point number 1) I think having the two code snippets side to side would help in understanding if this is something that could be a gotcha for the user, and therefore if we should find a way to deal with it in a elegant manner. As far as 2), I think we can go with ContinuousESN thin wrapper of eq 5). And as far as validation I think we can go with the eye test of attractor reconstruction following the example of the readme. If you want a more qualitative test we can use the difference in fractal dimensions using FractalDimensions's Grassberger-Procaccia algorithm. I would say that this is a bit more solid wrt to vpt, and it should be enough for PR3

@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

Thanks @MartinuzziFrancesco. Re-ran both paths from scratch for the side-by-side, and lining up the other two answers below.

Cold vs warmed predict — side-by-side

Both scripts are identical except for the warmup step. Reproducible numbers on Julia 1.12.5 with MersenneTwister(17), N = 300, train = 5000, predict = 1250, dt = 0.02, Tsit5 at reltol = 1e-6, ‖W_r‖₂ = 0.9, ‖W_in‖ ≈ 0.1.

lorenz_cold.jl — predict's u0 is whatever the user gave the ODEProblem (here zeros(N))
import Pkg; Pkg.activate("/tmp/lorenz_env"; io = devnull)
using Random, Statistics, LinearAlgebra, Printf
using ReservoirComputing, SciMLBase, DataInterpolations, OrdinaryDiffEqTsit5

const SEED, N_RES = 17, 300
const TRAIN_LEN, PREDICT_LEN, DT, SHIFT = 5000, 1250, 0.02, 300
const T_LYAPUNOV = 1 / 0.9056

lorenz!(du, u, p, t) = begin
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
data_prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0],
    (0.0, (SHIFT + TRAIN_LEN + PREDICT_LEN + 100) * DT), [10.0, 28.0, 8 / 3])
data = Array(solve(data_prob, Tsit5(); saveat = DT))
input_data  = data[:, SHIFT:(SHIFT + TRAIN_LEN - 1)]
target_data = data[:, (SHIFT + 1):(SHIFT + TRAIN_LEN)]
test_data   = data[:, (SHIFT + TRAIN_LEN):(SHIFT + TRAIN_LEN + PREDICT_LEN - 1)]

rng = MersenneTwister(SEED); Random.seed!(SEED)
Wr_raw = randn(rng, N_RES, N_RES) ./ sqrt(N_RES)
Wr  = (0.9 / opnorm(Wr_raw)) .* Wr_raw
Win = 0.1 .* randn(rng, N_RES, 3)
b   = 0.05 .* randn(rng, N_RES)

esn_rhs!(dx, x, p, t) = (dx .= .-x .+ tanh.(p.Wr * x .+ p.Win * p.input(t) .+ p.b); nothing)

function build_rc(tspan_len, u0)
    prob = ODEProblem(esn_rhs!, u0, (0.0, Float64(tspan_len)), (Wr = Wr, Win = Win, b = b))
    res = SciMLProblemReservoir(prob, TerminalStateSampling(),
        (0.0, Float64(tspan_len)), Tsit5(); reltol = 1.0e-6, abstol = 1.0e-8)
    return ReservoirComputer(res, LinearReadout(N_RES => 3))
end

rc_train = build_rc(TRAIN_LEN, zeros(N_RES))
ps, st = setup(rng, rc_train)
ps, st = train!(rc_train, input_data, target_data, ps, st)

# COLD: predict's u0 stays at the prob's original u0 (zeros).
rc_pred = build_rc(PREDICT_LEN, zeros(N_RES))
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

nrmse(yhat, y) = mean(sqrt(mean((yhat[i, :] .- y[i, :]) .^ 2)) / std(y[i, :]) for i in 1:size(y, 1))
for lt in (1.0, 2.0, 3.0, 4.0, 6.0, 8.0)
    h = min(round(Int, lt * T_LYAPUNOV / DT), PREDICT_LEN)
    @printf("  horizon %.1f t_λ  (%4d steps): NRMSE = %.4f\n", lt, h, nrmse(output[:, 1:h], test_data[:, 1:h]))
end
lorenz_warm.jl — same script, with the warmup inserted before build_rc(PREDICT_LEN, …)
# … everything above identical to lorenz_cold.jl through `train!` …

# WARMUP: re-run collectstates, grab terminal reservoir state.
trained_states, _ = collectstates(rc_train, input_data, ps, st)
warm_u0 = collect(trained_states[:, end])

# WARMED: predict's u0 = trained reservoir terminal state.
rc_pred = build_rc(PREDICT_LEN, warm_u0)
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

# … NRMSE loop identical to lorenz_cold.jl …

The semantic delta is exactly: collectstates once after train!, take trained_states[:, end], rebuild the predict reservoir with that as u0. Four added lines, one swapped line.

Results (autoregressive rollout NRMSE per Lukoševičius 2012 eq 1, mean over channels of RMSE / std):

Horizon Cold (u0 = 0) Warmed (u0 = trained terminal)
1 t_λ (55 steps) 1.4893 0.1568
2 t_λ (110 steps) 1.4960 0.1183
3 t_λ (166 steps) 1.7431 0.1118
4 t_λ (221 steps) 1.6959 0.2099
6 t_λ (331 steps) 1.6444 0.7294
8 t_λ (442 steps) 1.4708 0.9808

Train NRMSE (teacher-forced one-step on training data) was 0.0004 in both cases — the readout fits cleanly; the cold path's failure is purely the predict-time u0.

Two small caveats on what I ran:

  • Eq (5) bundles bias as the leading column of W_in[1;u]; my script splits it into a separate +b term — algebraically identical, just different parameter layout.
  • warm_u0 = trained_states[:, end] is the post-modifier state if any state_modifiers are wired up. None here, so it equals the raw reservoir terminal. If a real warmup API were to expose this, "warm from the raw reservoir terminal" and "warm from the modified terminal" would need to be a deliberate choice.

If you read this as a real footgun, the lightest-touch fix I can see is a warmup_data::Union{Nothing, AbstractMatrix} = nothing kwarg on the SciMLProblemReservoir paths of predict, where a non-nothing value triggers collectstates(rc, warmup_data, ps, st) internally and seeds u0 from the last column. No behaviour change for users who don't pass it. Happy to follow your lead on whether this lands in #450 or in PR3's wrapper.

On the naming

ContinuousESN works for me. I'll rename the stub on add-ctesn → new branch + new file + re-titled PR. The Anantharaman parametric-surrogate flavour can be its own type in its own PR if anyone needs it.

On validation via Grassberger-Procaccia

Will pull in FractalDimensions.jl (v1.9.6 as of writing). Plan is grassberger_proccacia_dim(StateSpaceSet(...)) on both the true Lorenz attractor and the warmed-rollout attractor, then assert the predicted dimension matches reference (Lorenz '63 correlation dimension is reported at ≈ 2.05 in the literature) to some agreed tolerance, alongside the attractor-reconstruction eye-test from the README. I won't quote actual fractal-dim numbers from my run until I've actually computed them — flagging the plan rather than the result.

Saswatsusmoy added a commit to Saswatsusmoy/ReservoirComputing.jl that referenced this pull request Jun 17, 2026
Francesco's call on SciML#456: drop the CTESN name (which referred to the
Anantharaman 2021 parametric-surrogate paper) and ship instead a thin
wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator
continuous ESN. Stub file moves from `src/extensions/` to `src/models/`
to sit alongside ESN as a semantic sibling; docstring rewritten to
reflect the new scope (leaky-integrator default, parametric-surrogate
CTESN explicitly out of scope).
@Saswatsusmoy Saswatsusmoy changed the title [WIP/Draft] feat(models): CTESN — PR3 design discussion (stacked on #450) [WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450) Jun 17, 2026
Skeleton-only placeholder so the draft PR exists as a discussion
vehicle. No include, no export, no tests — the body errors with a
clear message directing the user to the (not-yet-implemented) extension
constructor. Open design questions are tracked in the PR description.

Stacked on PR SciML#450; will be rebased onto master once that merges.
Francesco's call on SciML#456: drop the CTESN name (which referred to the
Anantharaman 2021 parametric-surrogate paper) and ship instead a thin
wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator
continuous ESN. Stub file moves from `src/extensions/` to `src/models/`
to sit alongside ESN as a semantic sibling; docstring rewritten to
reflect the new scope (leaky-integrator default, parametric-surrogate
CTESN explicitly out of scope).
Prior commit captured only the file rename — the docstring content
update was on disk but not staged. This commit folds in the actual
docstring rewrite: references Lukoševičius 2012 §3.2.6 eq (5) as the
canonical ODE, explains how the discrete leak rate α maps to the
continuous Euler step, and explicitly distinguishes from the
Anantharaman 2021 parametric-surrogate CTESN.
`ContinuousESN <: AbstractSciMLProblemReservoir`. Thin wrapper that
pre-bakes the Lukoševičius 2012 §3.2.6 eq (5) ODE

    ẋ(t) = α · (-x(t) + tanh(W_in·u(t) + W_r·x(t) + b))

with `leak_coefficient = α`. Forward-Euler discretisation at Δt = 1
collapses to the discrete leaky ESN exactly. Struct lives in
src/models/, real constructor + in-place RHS (mul!, fused tanh
broadcast, compile-time bias branch) in `RCODEReservoirExt`.

Validates `tspan`, dim positivity, leak positivity, finite endpoints,
length-2 tspan, and Inf-on-cast at setup. Test suite covers shape,
construction errors, bias toggle, bias under Tsit5, forward
determinism, Euler equivalence vs discrete leaky ESN at α ∈ {1.0, 0.3},
both predict modes, modifier composition, and T-kwarg propagation —
43 assertions across 10 testsets.

Tutorial added under docs/src/tutorials/continuous_esn.md (Lorenz
forecasting walkthrough). Adds Lukoševičius 2012 and Anantharaman 2021
to refs.bib. Drops the stale PR3/PR5 stub comment in PR2's
sciml_reservoir.jl now that the first concrete subtype has landed.
Comment thread docs/src/api/layers.md Outdated
```@docs
AbstractSciMLProblemReservoir
SciMLProblemReservoir
ContinuousESN

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can go on its own page since it's a model

Comment thread docs/src/tutorials/continuous_esn.md Outdated
Comment on lines +31 to +36
```julia
using ReservoirComputing
using SciMLBase
using DataInterpolations
using OrdinaryDiffEqTsit5
```

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed if you ahve the example below

- Move `ContinuousESN` from layers.md to models.md — it is a model
  (sibling of `ESN`/`EuSN`), not a continuous-time reservoir building
  block; the `SciMLProblemReservoir` family stays under layers.
- Drop the standalone "Loading the extension" block in the tutorial;
  the `@example continuous-esn-lorenz` block immediately below already
  carries the `using` lines, so the prose block was redundant.
@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

Both done in 7a7a84b5. Moved ContinuousESN into a new "Continuous-time Echo State Networks" section in api/models.md — agreed, it's a model not a layer. Dropped the standalone using block in the tutorial and folded the one-liner about which packages do what into the intro paragraph above the @example block.

Comment thread src/models/continuous_esn.jl Outdated
Comment on lines +74 to +88
@concrete struct ContinuousESN <: AbstractSciMLProblemReservoir
prob
sampler
tspan
args
kwargs
in_dims
res_dims
leak_coefficient
use_bias
init_reservoir
init_input
init_bias
matrix_type
end

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too much here, the ContinuousESN should just hold the reservoir, state modifires, and readout. the construction is similar to the ESN, we just provide a convenient wrapper for the user bu we use the internal components. so the the esn equations are fed into the SciMLProblemReservoir, the dims into the ESNCell etcetc

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to restructure to the 3-field shape — want to make sure I land the composition right before I rewrite. Three things I'd like to confirm:

  1. Where does the cell sit? I see two readings: (a) reservoir::SciMLProblemReservoir with ESNCell used only at construction time to mint prob.p, or (b) introduce a small ContinuousReservoir <: AbstractSciMLProblemReservoir that holds the ESNCell as a field and delegates initialparameters to it (so weights stay in ps.reservoir and the existing _build_solve_params flow keeps working). I lean (b) since (a) bakes the weights into prob.p at construction and they'd no longer live in ps. Which were you picturing?
  2. leak_coefficient — agreed eq (5) doesn't carry α; α is the Euler step. Want me to drop the leak_coefficient kwarg entirely (user controls α implicitly via tspan / n_samples), or keep it as a thin convenience that adjusts tspan under the hood?
  3. ESNCell init signatures — ESNCell's initializers take (rng, dims...) (no eltype), but the current ContinuousESN exposes T and passes (rng, T, dims...). To reuse ESNCell as-is I'd lose the T kwarg and inherit ESNCell's Float32 default. Acceptable, or should I keep T?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this raises a good point on adding the ContinuousESNCell as an additional layer. This way we can save those elements in the new cell, which should then also contain the esn equations. something like this maybe

@concrete struct ContinuousESNESNCell <: AbstractEchoStateNetworkCell #maybe change here
    activation
    in_dims <: IntegerType
    out_dims <: IntegerType
    init_bias
    init_reservoir
    init_input
    #init_feedback::F
    init_state
    use_bias <: StaticBool
    equations
end

also make sure to use IntegerType and StaticBool for Int and Bools

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed d9b5fda9 with the restructure. ContinuousESN is now 3-field (reservoir, states_modifiers, readout) exactly like ESN; ContinuousESNCell <: AbstractSciMLProblemReservoir so the existing continuous dispatch in RCODEReservoirExt fires on rc.reservoir. initialparameters returns the canonical (input_matrix, reservoir_matrix, [bias]) trio matching ESNCell. Default equations is eq (5) without α — α emerges purely from the Euler step Δt, per your correction.

One field-list deviation worth flagging — the cell carries three more fields than you sketched (tspan, args, kwargs). The reason: since the cell is the dispatch target for _collectstates / _predict, the ext needs cell.tspan to set up the saveat grid and cell.args / cell.kwargs to call solve. If those lived on ContinuousESN instead, the cell would still need a way to reach them at dispatch time, which would mean either threading them through every _collectstates call or duplicating dispatch on the outer model. Adding them to the cell felt like the lowest-friction option, but happy to move them onto ContinuousESN and reach for them via rc.tspan etc. if you'd rather keep the cell strictly ESNCell-shaped — just say the word.

Also renamed u_tinput_t per your other comment; equivalence test now uses Float-exact α ({1.0, 0.5}) so the saveat grid aligns bit-for-bit with the Euler step boundaries (non-FP-exact α introduces a sub-step offset that the Euler interpolant smooths over).

Comment thread ext/RCODEReservoirExt.jl Outdated
# path.
# ---------------------------------------------------------------------------

function _continuous_esn_rhs!(dx, x, p, t)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try not to use single letter variables

Comment thread ext/RCODEReservoirExt.jl Outdated
# Concrete subtype of `AbstractSciMLProblemReservoir` that pre-bakes the
# leaky-integrator continuous ESN ODE of Lukoševičius 2012 §3.2.6 eq (5):
#
# ẋ(t) = α · (-x(t) + tanh(W_in·u(t) + W_r·x(t) + b))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lukoševičius 2012 §3.2.6 eq (5) does not have alpha in it, it comes out as a consequence of discretization of the ode

…3-field ESN shape

Address Francesco's review on SciML#456: `ContinuousESN` now shares its struct
shape with `ESN` — three fields `(reservoir, states_modifiers, readout)`
— and the substance moves into a new `ContinuousESNCell` that mirrors
`ESNCell`'s field layout plus an `equations` field carrying the ODE
right-hand side.

  * `src/layers/continuous_esn_cell.jl` (NEW): `ContinuousESNCell <:
    AbstractSciMLProblemReservoir` so the existing continuous dispatch
    in `RCODEReservoirExt` fires on `rc.reservoir`. `initialparameters`
    returns the canonical `(input_matrix, reservoir_matrix, [bias])`
    trio. Default `_continuous_esn_rhs!` implements Lukoševičius 2012
    §3.2.6 eq (5) verbatim — no α in the RHS; α emerges only when the
    ODE is forward-Euler discretised at step `Δt = α`. Only the
    `@concrete` inner constructor is exposed; the convenience builder
    lives in the extension's `ContinuousESN(...)` constructor (avoids a
    Pair-vs-activation ambiguity with the auto-generated inner ctor
    that Aqua flagged).

  * `src/models/continuous_esn.jl`: 13-field flat struct → 3-field
    container. Stub constructor errors without the extension; real one
    in `RCODEReservoirExt`.

  * `ext/RCODEReservoirExt.jl`: drop the bulk constructor + the old
    α-bearing RHS + the `_typed_zeros_init` helper. Add a slim
    `ContinuousESN(in_dims, res_dims, out_dims, [activation,] tspan,
    args...; use_bias, init_*, equations, state_modifiers, ...)` that
    validates inputs, builds the cell, and wraps it with
    `state_modifiers` + `LinearReadout`. Add
    `_collectstates(::ContinuousESNCell, ...)` (builds the `ODEProblem`
    on the fly from `cell.equations`, sized by `cell.out_dims`, with
    `ps.reservoir` merged into the solve `p`) and a parallel
    `_predict(::ContinuousESNCell, ..., steps; initialdata)` for the
    autoregressive path. Teacher-forced `_predict` inherits the generic
    `AbstractSciMLProblemReservoir` method unchanged. Rename `u_t` to
    `input_t` per the same review thread.

  * Field-list note: Francesco's sketch listed only the ESNCell-like
    fields. The cell additionally carries `tspan`, `args`, `kwargs`
    because it IS the dispatch target for the continuous path — without
    those fields `_collectstates` would have nothing to feed `solve`.
    Documented as an open question in the PR reply.

  * `test/test_continuous_esn.jl`: rewritten for the new field shape
    (`ps.reservoir.input_matrix` / `reservoir_matrix` / `bias`). Euler
    equivalence test moved from "α in the RHS, dt=1" to "no α in the
    RHS, dt=α with tspan=(0, T_steps·α)"; α values restricted to
    `{1.0, 0.5}` so the requested saveat grid aligns bit-for-bit with
    the Euler step boundaries (non-FP-exact α introduces a sub-step
    offset that the Euler interpolant smooths over). 9 testsets, 45
    assertions, all green at `atol=1e-10` for the equivalence test.

  * Docs: add `ContinuousESNCell` to the layers API page and rewrite
    the Lorenz tutorial without the dropped `leak_coefficient` / `T`
    kwargs.
Comment thread docs/src/api/layers.md Outdated

```@docs
AdditiveEIESNCell
ContinuousESNCell

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also put this in a different subsection, similar to the models. Put it in a Continuous layers section

Comment thread docs/src/tutorials/continuous_esn.md Outdated
# Continuous ESN: forecasting Lorenz

[`ContinuousESN`](@ref) is a thin convenience wrapper around a
[`ContinuousESNCell`](@ref) that pre-bakes the leaky-integrator

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaky-integrator

Comment thread docs/src/refs.bib Outdated
Comment on lines +573 to +579
@article{Anantharaman2021,
title = {Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks},
url = {https://arxiv.org/abs/2010.04004},
author = {Anantharaman, Ranjan and Ma, Yingbo and Gowda, Shashi and Laughman, Christopher and Shah, Viral and Edelman, Alan and Rackauckas, Christopher},
year = {2021},
journal = {arXiv:2010.04004}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary now? I think the implementation we have is not the one provided by this paper

Comment thread ext/RCODEReservoirExt.jl Outdated
Comment on lines 7 to 12
# `solve` and `remake` come from `SciMLBase`. The user picks the concrete
# solver type (e.g. `Tsit5()`) and loads its package separately
# (`OrdinaryDiffEqTsit5`, `OrdinaryDiffEq`, …); dispatch at solve time
# selects the right method via the type they passed in `res.args[1]`. We
# deliberately don't list a solver package as a weakdep trigger so users
# aren't forced to pull the full `OrdinaryDiffEq` meta-package in.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments not really useful here

Comment thread ext/RCODEReservoirExt.jl Outdated
Comment on lines +400 to +411
# ---------------------------------------------------------------------------
# ContinuousESN — convenience constructor
#
# Builds the 3-field `(reservoir, states_modifiers, readout)` model whose
# `reservoir` is a `ContinuousESNCell`. The cell carries Lukoševičius 2012
# §3.2.6 eq (5) as its `equations` field. Each of `collectstates`,
# `predict(rc, data, ...)`, and `predict(rc, steps, ...; initialdata)`
# dispatches on `rc.reservoir::ContinuousESNCell` below, building the
# `ODEProblem` lazily so the reservoir weights stay in `ps.reservoir` and
# can be re-initialised by `setup(rng, rc)` like every other ESN-family
# model.
# ---------------------------------------------------------------------------

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

Comment thread src/layers/continuous_esn_cell.jl Outdated
Comment on lines +26 to +29
This type is normally constructed by the [`ContinuousESN`](@ref)
convenience constructor in the `RCODEReservoirExt` extension rather than
directly. Driving the inner constructor by hand requires placing every
field in the order above and pre-static-ing `use_bias`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too low level information for a docstring

- `reservoir_matrix :: (out_dims × out_dims)` — `W_r`
- `bias :: (out_dims,)` — present only if `use_bias = True()`
"""
@concrete struct ContinuousESNCell <: AbstractSciMLProblemReservoir

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still need a constructor here like for the normal ESNCell

Comment thread src/models/continuous_esn.jl Outdated
kwargs...)

Continuous-time leaky-integrator Echo State Network ([Lukosevicius2012](@cite)
§3.2.6, eq 5). Thin convenience wrapper that composes:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§3.2.6, eq 5)

Comment thread src/models/continuous_esn.jl Outdated
Comment on lines +20 to +25
Structurally identical to [`ESN`](@ref) — three fields
`(reservoir, states_modifiers, readout)` — so it slots into the same
training and prediction pipeline. The reservoir field is the
`ContinuousESNCell` rather than a `StatefulLayer(ESNCell)`, which routes
`collectstates` / `predict` through the continuous-time
`RCODEReservoirExt` extension.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structurally identical to ESN, but the reservoir field is the
ContinuousESNCell rather than a StatefulLayer(ESNCell)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are too many implementation level details that the user may not be familiar with or not care about

Comment thread src/models/continuous_esn.jl Outdated
Comment on lines +40 to +43
No leaking-rate term `α` appears in the ODE. `α` emerges only when the
ODE is forward-Euler discretised with step `Δt = α`, recovering the
discrete leaky update of [`ESN`](@ref). To change the effective leak,
adjust `tspan` so the per-input-window width matches the desired `Δt`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

@MartinuzziFrancesco

Copy link
Copy Markdown
Collaborator

I think we are almost there! There are a few points to clarify:

  • the documentation, especially docstrings, has to be clear to human users. This means that details about the routing or the dispatch or how things work under the hood are not needed. When I use + I am not interested in how the dispatch was made but in what it does for my use case.
  • Similarly, too many details on sources can be a bit much. It suffices to say which paper the model comes from using the citation.
  • Comments are not needed, the code usually should be clear enough to roughly understand what is going on. I would remove most of the comments there are

Also, we do need a convenience constructor for the ContinuousESNCell.

Overall it's ok to use llms for coding, but make sure the code is up to standard. Forcing the SciML guide on a prompt usually helps in catching a lot of these issues, as well as having a local .md similar to this one

@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

I will have to look into this seriously, I was trying and experimenting documentations and code styling changes through autonomous agents (mostly claude & hermes) will have to fine tune the agents (will force the SciML guide on a project level). For now I will fallback to the prev workflow (without LLM agents) for the PR

…s, and cell constructor

Clean up implementation details from docstrings and comments across
the ContinuousESN model, cell, tutorial, and extension. Add a
Pair-signature convenience constructor for ContinuousESNCell matching
the ESNCell pattern. Move ContinuousESNCell to a new Continuous-time
Layers subsection in the API docs and drop the unused Anantharaman2021
citation.
@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

I kept _continuous_esn_rhs! unexported. It is documented and used as a default kwarg (equations = _continuous_esn_rhs!), so it shows up in signatures and @ref links, but the _ prefix is a Julia convention for internal and no other _-prefixed helper in the codebase is exported. Fine as is, or would you rather export it?

@MartinuzziFrancesco MartinuzziFrancesco marked this pull request as ready for review July 2, 2026 15:02
@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

Coming back to the warmup discussion. The cold vs warm autoregressive predict gap is still there, and the user currently has to manually collectstates after training and rebuild the reservoir with the terminal state as u0. In the side-by-side scripts the gap was NRMSE ~1.5 (cold) vs ~0.11 (warmed).

Do you want me to wire in automatic warmup as part of this PR, or keep it as-is and defer to a separate PR? Two lightweight approaches on the table:

  • Add a warmup_data kwarg to predict(rc, steps, ...) for the continuous paths, where a non-nothing value triggers collectstates internally and seeds u0 from the last column. No behaviour change for users who do not pass it.

  • Keep the current manual workflow as the documented path and just note the cold/warm distinction in the tutorial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants