avoid parameterizing GeneratedFunctionWrapper on the RGFs #4552

Draft
KristofferC wants to merge 1 commit into SciML:master from KristofferC:kc/nospecialize_RGFW

@KristofferC
Contributor

Parameterizing GeneratedFunctionWrapper (GFW) on the types of the RGFs it wraps causes a large number of functions that take a GFW as an argument to specialize separately for each input model. This has no performance benefit in practice, so we can simply avoid it. The results are also directly converted via u0_eltype (and, as far as I understand, the affected functions are only called in the initialization problem and in remake), so there is no loss of inference precision.
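To illustrate the mechanism, here is a minimal sketch. `ParamWrapper`, `OpaqueWrapper`, and `call_converted` are hypothetical stand-ins invented for this example and are not ModelingToolkit's actual definitions; the two anonymous functions merely stand in for the functions generated for two separately built models.

```julia
# Hypothetical stand-ins; these are NOT ModelingToolkit's actual definitions.
struct ParamWrapper{F}   # wrapper type depends on the wrapped function's type
    f::F
end

struct OpaqueWrapper     # wrapper type is the same for every wrapped function
    f::Any
end

# Two textually identical anonymous functions still have distinct types,
# standing in for the functions generated for two separately built models.
f1 = x -> 2x
f2 = x -> 2x

# Compare the wrapper types produced for the two functions.
println(typeof(ParamWrapper(f1)) == typeof(ParamWrapper(f2)))
println(typeof(OpaqueWrapper(f1)) == typeof(OpaqueWrapper(f2)))

# Converting (and asserting) the result at the call boundary lets callers
# infer a concrete type even though `w.f` itself is not inferable.
call_converted(w::OpaqueWrapper, ::Type{T}, x) where {T} = convert(T, w.f(x))::T

println(call_converted(OpaqueWrapper(f1), Float64, 3))
println(Base.return_types(call_converted, (OpaqueWrapper, Type{Float64}, Int)))
```

The parameterized wrappers should compare as different types while the opaque ones compare as equal, so any method taking an `OpaqueWrapper` only needs to be compiled once. The type assertion after `convert` plays the role that the u0_eltype conversion is described as playing above.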

For numbers, I was running something like the following (basically doing the same thing with two identical models that are instantiated separately):

using Multibody, ModelingToolkit, OrdinaryDiffEqTsit5
using ModelingToolkit: t_nounits as t, D_nounits as D
using OrdinaryDiffEqNonlinearSolve
@time "create model 1    " @named model = ...
@time "create multibody 1" ssys = multibody(model)
@time "create odeprob 1  " prob = ODEProblem(ssys, [
                   ...
               ], (0.0, 2.0));

using InteractiveUtils

@time "create model 2    " @named model = ...
@time "create multibody 2" ssys = multibody(model)
# @trace_compile begin
@time "create odeprob 2  " prob = ODEProblem(ssys, [
                   ...
               ], (0.0, 2.0));
# end

I saw that create odeprob 2 had a large compilation time even though everything it needs should already have been compiled by create odeprob 1. Printing the results with @trace_compile, I could see many functions being specialized on the GFW:

  ┌──────────────────────────────┬───────┬───────────┬────────────┐  
  │           Category           │ Count │ Time (ms) │ % of total │
  ├──────────────────────────────┼───────┼───────────┼────────────┤  
  │ get_A_b_from_LinearFunction  │    34 │      1577 │      56.5% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ drop_expr                    │   165 │       479 │      17.2% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ kwcall (keyword wrappers)    │    75 │       420 │      15.1% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ SymbolicLinearInterface ctor │    34 │        73 │       2.6% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ LinearFunction ctor          │    34 │        69 │       2.5% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ MappingRF (state_values)     │    33 │        44 │       1.6% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ Everything else              │    13 │       127 │       4.5% │
  ├──────────────────────────────┼───────┼───────────┼────────────┤
  │ Total                        │   394 │      2790 │       100% │
  └──────────────────────────────┴───────┴───────────┴────────────┘

With this PR, the numbers change as follows (the most interesting figure is the compilation time for create odeprob 2, the last line):


# Before
create model 1    : 21.411221 seconds (130.82 M allocations: 6.344 GiB, 5.36% gc time, 99.12% compilation time: 3% of which was recompilation)
create multibody 1: 16.331450 seconds (81.40 M allocations: 4.189 GiB, 6.49% gc time, 87.35% compilation time: 11% of which was recompilation)
create odeprob 1  : 20.478440 seconds (122.05 M allocations: 7.064 GiB, 6.32% gc time, 82.90% compilation time: 26% of which was recompilation)
create model 2    : 0.186748 seconds (2.25 M allocations: 84.899 MiB, 20.02% gc time)
create multibody 2: 1.798874 seconds (14.62 M allocations: 920.115 MiB, 21.64% gc time, 0.01% compilation time: 6% of which was recompilation)
create odeprob 2  : 6.550366 seconds (45.56 M allocations: 3.286 GiB, 10.30% gc time, 48.29% compilation time)

# After
create model 1    : 20.308913 seconds (130.82 M allocations: 6.344 GiB, 5.52% gc time, 99.13% compilation time: 3% of which was recompilation)
create multibody 1: 15.464750 seconds (81.21 M allocations: 4.180 GiB, 5.48% gc time, 88.89% compilation time: 11% of which was recompilation)
create odeprob 1  : 16.793652 seconds (110.76 M allocations: 6.519 GiB, 8.53% gc time, 79.62% compilation time: 33% of which was recompilation)
create model 2    : 0.181082 seconds (2.23 M allocations: 84.676 MiB, 19.98% gc time)
create multibody 2: 1.477791 seconds (14.64 M allocations: 919.984 MiB, 5.43% gc time)
create odeprob 2  : 3.115815 seconds (35.94 M allocations: 2.824 GiB, 19.19% gc time)
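
As a quick worked check of the "create odeprob 2" lines, the following sketch computes the speedup from the two logs above. The variable names are made up for this example; the numbers are copied from the logs.

```julia
# Wall-clock times (seconds) for "create odeprob 2", copied from the logs above.
before = 6.550366
after  = 3.115815

# Part of the "before" run that the log reports as compilation time (48.29%).
before_compile = 0.4829 * before

speedup   = before / after        # how many times faster the second build is
reduction = 1 - after / before    # relative drop in wall-clock time

println("speedup:             ", round(speedup; digits = 2), "x")
println("time reduction:      ", round(100 * reduction; digits = 1), "%")
println("compile time before: ", round(before_compile; digits = 2), " s")
println("time saved:          ", round(before - after; digits = 2), " s")
```

Under these numbers the second problem construction is roughly twice as fast, and the time saved is about the same size as the compilation share reported before the change.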

@AayushSabharwal
Member

(Almost) every generated function is put in a GFW, including the ODE RHS. If we unconditionally stop specializing here, then FullSpecialize does nothing, and I think this ends up as a slightly worse version of the function wrapping that OrdinaryDiffEq automatically does in solve. Could this instead be solved by applying @nospecialize to the appropriate functions?
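
A minimal sketch of what that alternative could look like, assuming a hypothetical `ParamWrapper` and `call_converted` (neither is ModelingToolkit's actual code): the wrapper stays fully parameterized, and only selected consumers opt out of specialization.

```julia
# Hypothetical wrapper; NOT ModelingToolkit's actual GeneratedFunctionWrapper.
struct ParamWrapper{F}
    f::F
end

# `@nospecialize` hints that the compiler should not generate a separate
# specialization of this method for every concrete `ParamWrapper{F}`.
function call_converted(@nospecialize(w::ParamWrapper), ::Type{T}, x) where {T}
    # The call to `w.f` is dispatched dynamically; the conversion and
    # type assertion keep the return type concrete for callers.
    return convert(T, w.f(x))::T
end

w1 = ParamWrapper(x -> 2x)
w2 = ParamWrapper(x -> x + 1)

println(call_converted(w1, Float64, 3))
println(call_converted(w2, Float64, 3))
```

Because the wrapper type itself is unchanged in this sketch, hot paths such as the ODE RHS could still specialize on it, while setup-only functions annotated this way would not need to be recompiled for every model. Note that `@nospecialize` is a hint to the compiler, so whether it removes the compilation cost measured above would have to be checked.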

@KristofferC
Contributor Author

KristofferC commented May 22, 2026

Maybe, will try. I have to re-evaluate my numbers on top of #4521.
