diff --git a/_quarto.yml b/_quarto.yml index 9056a93ea..cad276703 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -88,6 +88,7 @@ website: - usage/sampler-visualisation/index.qmd - usage/dynamichmc/index.qmd - usage/varnamedtuple/index.qmd + - usage/vectorisation/index.qmd - usage/external-samplers/index.qmd - usage/troubleshooting/index.qmd @@ -222,6 +223,7 @@ usage-threadsafe-evaluation: usage/threadsafe-evaluation usage-tracking-extra-quantities: usage/tracking-extra-quantities usage-troubleshooting: usage/troubleshooting usage-varnamedtuple: usage/varnamedtuple +usage-vectorisation: usage/vectorisation contributing-start: contributing/start-contributing contributing-documentation: contributing/documentation diff --git a/usage/sampling-options/index.qmd b/usage/sampling-options/index.qmd index fcc13a206..aa23c0ad9 100644 --- a/usage/sampling-options/index.qmd +++ b/usage/sampling-options/index.qmd @@ -119,8 +119,10 @@ There are several options; all the `InitFrom...` structs are re-exported by Turi - `InitFromPrior()`: generate initial parameters by sampling from the prior - `InitFromUniform(lower, upper)`: generate initial parameters by sampling uniformly from the given bounds in linked space -- `InitFromParams(namedtuple_or_dict)`: use the provided initial parameters, supplied either as a `NamedTuple` or a `Dict{<:VarName}` -- `InitFromParams(mode_estimate)`: use the parameters in the optimisation result obtained via `maximum_a_posteriori` or `maximum_likelihood` +- `InitFromParams(vnt)`: use the provided initial parameters, supplied as a [`VarNamedTuple`]({{< meta usage-varnamedtuple >}}). `NamedTuple` and `Dict{VarName}` input is also supported. + +There are also a number of other overloads for `InitFromParams`, such as `InitFromParams(mode_estimate)`, which lets you use the parameters in the optimisation result obtained via `maximum_a_posteriori` or `maximum_likelihood`. +Please see the docstrings for `InitFromParams` for more details (use `?InitFromParams` in the REPL). If `initial_params` is unspecified, each sampler will use its own default initialisation strategy: for most samplers this is `InitFromPrior` but for Hamiltonian samplers it is `InitFromUniform(-2, 2)` (which mimics the behaviour of Stan). @@ -269,7 +271,6 @@ This catches a number of potential errors in a model, such as having repeated va If you wish to disable this you can pass `check_model=false` to `sample()`. - ## Callbacks The `callback` keyword argument can be used to specify a function that is called at the end of each sampler iteration. @@ -279,6 +280,29 @@ If you are performing multi-chain sampling, `kwargs` will additionally contain ` The [TuringCallbacks.jl package](https://github.com/TuringLang/TuringCallbacks.jl) contains a `TensorBoardCallback`, which can be used to obtain live progress visualisations using [TensorBoard](https://www.tensorflow.org/tensorboard). +## Fixed transforms + +Many of Turing's inference algorithms require the use of transformed variables to ensure that they are unconstrained. + +By default, these transforms are determined dynamically each time the model is run. +The reason for this is because the transforms depend on the support of the distributions used, and this can in turn depend on the values of the parameters. +For example, `y` here has a transform that depends on `x`: + +```{julia} +@model function demo_transform() + x ~ Exponential() + y ~ Uniform(0, x) +end +``` + +However, if you are certain that your model's variables always have a fixed support, you can pass the `fix_transforms=true` keyword argument to `sample()`. +This precomputes the transforms once at the start of sampling and reuses them for the rest of the sampling process. + +Note however that while this sounds attractive from a performance perspective, it does not always lead to any noticeable difference! +For many distributions, the computation of the transform is often extremely fast and is comparable to looking up the stored transform. + +Please see [the DynamicPPL documentation on fixed transforms](https://turinglang.org/DynamicPPL.jl/stable/fixed_transforms) for more information. + ## Automatic differentiation Finally, please note that for samplers which use automatic differentiation (e.g., HMC and NUTS), the AD type should be specified in the sampler constructor itself, rather than as a keyword argument to `sample()`. diff --git a/usage/vectorisation/index.qmd b/usage/vectorisation/index.qmd new file mode 100755 index 000000000..3ef2a7649 --- /dev/null +++ b/usage/vectorisation/index.qmd @@ -0,0 +1,269 @@ +--- +title: Vectorisation of Turing models +engine: julia +--- + +```{julia} +#| echo: false +#| output: false +using Pkg; +Pkg.instantiate(); +``` + +The 'native' data structure for a Turing model is [`VarNamedTuple`]({{< meta usage-varnamedtuple >}}), which is a mapping from `VarName`s to values. +For example, `rand(model)` returns a `VarNamedTuple`: + +```{julia} +using Turing +Turing.setprogress!(false) + +@model function demo() + x ~ Normal(0, 1) + y ~ Normal(x, 1) +end +model = demo() + +vnt = rand(model) +``` + +and you can likewise use a `VarNamedTuple` to specify initial parameters to Turing's inference algorithms: + +```{julia} +sample(model, MH(), 100; initial_params=InitFromParams(vnt)); +``` + +Now, `VarNamedTuple` is a _rich_ data structure which faithfully retains names and types according to the structure of the model, which is why it is our default data structure. + +However, many packages in the Julia ecosystem work with array inputs and outputs. +Indeed this is even the case for many of Turing's dependencies, such as AdvancedHMC, Optimization, and AdvancedVI: in these cases, Turing is responsible for performing the 'translation' between the `VarNamedTuple` and the vectorised format that these packages require. + +This page describes how to convert a Turing model into a vectorised format such that it can be passed to these packages, and how to convert the outputs back into a `VarNamedTuple` format for analysis. +This is useful if you need to carry out more advanced analysis with a Turing model with packages that Turing does not have wrappers for. + +## Introduction + +Before we demonstrate any code, we will first explain conceptually what we need to actually accomplish this vectorisation. +Given a VarNamedTuple of parameter values, we need two pieces of information to flatten it into a vector: + +1. How to convert each individual value into a vector; and + +2. How to concatenate those vectors together into a single vector. + +We accomplish this by specifying a *transform function* for each parameter, as well as a *range* for each parameter. +For example, conceptually, the `VarNamedTuple` above with `x` and `y` can be flattened with a representation that looks something like this. + +(Please note that you don't have to define any of this yourself! +Turing has built-in functionality for this, which we'll cover in a moment. +This is just to illustrate the principle behind it.) + +```{julia} +struct TransformPlusRange{F} + transform::F + range::UnitRange{Int} +end + +tovec(f) = [f] +info = @vnt begin + x := TransformPlusRange(tovec, 1:1) + y := TransformPlusRange(tovec, 2:2) +end +``` + +(For more information on the `@vnt` macro, please see [the VarNamedTuple page]({{< meta usage-varnamedtuple >}}).) + +To flatten the parameters, we can then do the following: + +```{julia} +# Begin by allocating a vector of the appropriate length. +params = Vector{Float64}(undef, 2) + +# Iterate through the original `VarNamedTuple`. For each key: +for (vn, value) in pairs(vnt) + vn_info = info[vn] + # Use the transform to vectorise the value + vectorised_value = vn_info.transform(value) + # Store it in the appropriate range of the allocated vector + params[vn_info.range] = vectorised_value +end + +params +``` + +## `LogDensityFunction` + +Turing abstracts all of this away for you by providing a struct, `LogDensityFunction`, which wraps a model plus all of the necessary information above for vectorisation. + +:::{.callout-note} +## More details + +In this page we will only cover a fairly high-level overview of the interface that `LogDensityFunction` exposes. +In particular, we assume that you simply want to construct one of them, pass it to some package, and retrieve the results back in a `VarNamedTuple` format. + +There are many more details of working with `LogDensityFunction` that can be useful if you are, for example, developing your own inference algorithms to work with Turing. We refer the interested reader to [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/ldf/overview/) for more information on this. +::: + +To construct a `LogDensityFunction`, you can simply pass a model: + +```{julia} +@model f() = x ~ Dirichlet(ones(3)) +model = f() + +ldf = LogDensityFunction(model); +``` + +With this single-argument constructor, the vectorised representation of `x` will simply be itself (since `Dirichlet` is a multivariate distribution). +Now, `x` also has the property that its elements must be non-negative and sum to 1. +We can see this in action if we draw a randomised set of vectorised parameters: + +```{julia} +rand(ldf) +``` + +By default, `LogDensityFunction` does not apply any extra transformations to the parameters, which is why the output above is still constrained to the simplex. +However, such constraints can cause problems for some algorithms which expect to work in unconstrained space. +To address this, you can instantiate a `LogDensityFunction` as follows: + +```{julia} +using DynamicPPL + +ldf_linked = LogDensityFunction(model, getlogjoint_internal, LinkAll()); +``` + +Here: + +- `LinkAll()` specifies that we want all parameters in the model to be transformed to unconstrained vectors. +- `getlogjoint_internal` says that the log-density we are interested in is the log-joint density, _with_ any Jacobian adjustments for the transformations used above. + +When we sample from this `LogDensityFunction`, we can see that the output is now unconstrained. +In fact, the output is now a vector of length 2: since `x` must sum to 1, the third element of `x` is not needed since it can be recalculated from the first two elements. + +```{julia} +rand(ldf_linked) +``` + +## The LogDensityProblems.jl interface + +`LogDensityFunction` is not just a way to perform parameter value conversion: it also allows you to compute the log probability density associated with these parameters. + +In particular, the `LogDensityFunction` struct is designed to obey [the LogDensityProblems.jl interface](https://www.tamaspapp.eu/LogDensityProblems.jl/stable/). +In particular, you can use the following functions: + +```{julia} +using LogDensityProblems + +LogDensityProblems.capabilities(ldf_linked) +``` + +```{julia} +N = LogDensityProblems.dimension(ldf_linked) +``` + +```{julia} +LogDensityProblems.logdensity(ldf_linked, randn(N)) +``` + +## Automatic differentiation + +In its current form, the `LogDensityFunction` does not natively know how to compute gradients of the parameters. +To enable integration with automatic differentiation (AD) packages, you can specify the `adtype` keyword argument: + +```{julia} +ldf_ad = LogDensityFunction(model, getlogjoint_internal, LinkAll(); adtype=AutoForwardDiff()); +``` + +:::{.callout-note} +## Gradient preparation + +When constructing a `LogDensityFunction` with an `adtype`, gradient preparation (using DifferentiationInterface.jl) will be performed automatically. +This means that subsequent gradient calculations will be much faster, but also means that depending on the backend used, the construction of the `LogDensityFunction` may take a long time. +::: + +This allows you to compute not only `logdensity`, but also `logdensity_and_gradient`: + +```{julia} +# We now have first-order AD capabilities +LogDensityProblems.capabilities(ldf_ad) +``` + +```{julia} +LogDensityProblems.logdensity_and_gradient(ldf_ad, randn(N)) +``` + +If you need more information on how to choose an AD backend, please see the [automatic differentiation page]({{< meta usage-automatic-differentiation >}}) for more information. + +## Changing the log-density function + +Most of the time, `getlogjoint_internal` is the appropriate kind of log-density function to use: it represents the log-joint density in the transformed space ('internal' is a slightly poor name, but it has stuck), and is what inference algorithms tend to expect. + +However, there are also other options: for example `LogDensityFunction(model, getloglikelihood, ...)` will give you a log-density function that only computes the likelihood of the data given the parameters. + +Please consult [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#DynamicPPL.LogDensityFunction) for more information on this. + +## Using custom transforms + +When instantiated, `LogDensityFunction` will automatically create its own mapping of `VarName`s to ranges and transforms (much like in the example above). +If you are happy with the default transforms, then you would never need to worry about this. +However, advanced users may wish to specify their own transforms. + +In order to do this, you have to use a different constructor for `LogDensityFunction`, namely + +```julia +LogDensityFunction(model, logdensity_function, ranges_and_transforms, sample_vec; adtype) +``` + +Please see the DynamicPPL API documentation for more information on these arguments. + +## Conversions between `VarNamedTuple` and vectors + +Finally, we will cover the functions that allow you to convert between `VarNamedTuple` and vector formats. + +### From `VarNamedTuple` to vector + +We already saw that we can draw random samples from a `LogDensityFunction` in vector format: + +```{julia} +rand(ldf_ad) +``` + +By default, this draws from the prior of the model, using `InitFromPrior` (see [Sampling options]({{< meta usage-sampling-options >}}#specifying-initial-parameters) for more information on this). +If you want vectorised samples for a _specific_ set of parameters, you can provide an initialisation strategy that uses those parameters: + +```{julia} +vnt = rand(model) +``` + +```{julia} +rand(ldf_ad, InitFromParams(vnt)) +``` + +(Note that this does technically mean that the output is no longer truly 'random'.) + +### From vector to `VarNamedTuple` + +Now suppose that you have run an inference algorithm using a `LogDensityFunction` and it has given you back a set of 'best' parameters (as a vector, of course): + +```{julia} +best_params = randn(LogDensityProblems.dimension(ldf_ad)) +``` + +To convert this back into a `VarNamedTuple`, you have to reevaluate the model. +The easiest way to do so is to use `DynamicPPL.ParamsWithStats`: + +```{julia} +pws = ParamsWithStats(best_params, ldf_ad) +``` + +This also helpfully computes the log-probabilities associated with this set of parameters (if you do not need this, you can disable it with `ParamsWithStats(...; include_log_probs=false)`). +You can then access `pws.params` to get the `VarNamedTuple` of parameters. +For example, to retrieve the original value of `x`: + +```{julia} +pws.params[@varname(x)] +``` + +### More fine-grained control + +These wrapper functions are designed for maximum ease of use. +However, as mentioned above, sometimes you may want more fine-grained control. +To do so, you should use the DynamicPPL API and in particular the `DynamicPPL.init!!` function. +This is covered in much more detail in [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/ldf/overview/)!