Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ website:
- usage/sampler-visualisation/index.qmd
- usage/dynamichmc/index.qmd
- usage/varnamedtuple/index.qmd
- usage/vectorisation/index.qmd
- usage/external-samplers/index.qmd
- usage/troubleshooting/index.qmd

Expand Down Expand Up @@ -222,6 +223,7 @@ usage-threadsafe-evaluation: usage/threadsafe-evaluation
usage-tracking-extra-quantities: usage/tracking-extra-quantities
usage-troubleshooting: usage/troubleshooting
usage-varnamedtuple: usage/varnamedtuple
usage-vectorisation: usage/vectorisation

contributing-start: contributing/start-contributing
contributing-documentation: contributing/documentation
Expand Down
30 changes: 27 additions & 3 deletions usage/sampling-options/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -119,8 +119,10 @@ There are several options; all the `InitFrom...` structs are re-exported by Turi

- `InitFromPrior()`: generate initial parameters by sampling from the prior
- `InitFromUniform(lower, upper)`: generate initial parameters by sampling uniformly from the given bounds in linked space
- `InitFromParams(namedtuple_or_dict)`: use the provided initial parameters, supplied either as a `NamedTuple` or a `Dict{<:VarName}`
- `InitFromParams(mode_estimate)`: use the parameters in the optimisation result obtained via `maximum_a_posteriori` or `maximum_likelihood`
- `InitFromParams(vnt)`: use the provided initial parameters, supplied as a [`VarNamedTuple`]({{< meta usage-varnamedtuple >}}). `NamedTuple` and `Dict{VarName}` input is also supported.

There are also a number of other overloads for `InitFromParams`, such as `InitFromParams(mode_estimate)`, which lets you use the parameters in the optimisation result obtained via `maximum_a_posteriori` or `maximum_likelihood`.
Please see the docstrings for `InitFromParams` for more details (use `?InitFromParams` in the REPL).

If `initial_params` is unspecified, each sampler will use its own default initialisation strategy: for most samplers this is `InitFromPrior` but for Hamiltonian samplers it is `InitFromUniform(-2, 2)` (which mimics the behaviour of Stan).

Expand Down Expand Up @@ -269,7 +271,6 @@ This catches a number of potential errors in a model, such as having repeated va

If you wish to disable this you can pass `check_model=false` to `sample()`.


## Callbacks

The `callback` keyword argument can be used to specify a function that is called at the end of each sampler iteration.
Expand All @@ -279,6 +280,29 @@ If you are performing multi-chain sampling, `kwargs` will additionally contain `

The [TuringCallbacks.jl package](https://github.com/TuringLang/TuringCallbacks.jl) contains a `TensorBoardCallback`, which can be used to obtain live progress visualisations using [TensorBoard](https://www.tensorflow.org/tensorboard).

## Fixed transforms

Many of Turing's inference algorithms require the use of transformed variables to ensure that they are unconstrained.

By default, these transforms are determined dynamically each time the model is run.
The reason for this is because the transforms depend on the support of the distributions used, and this can in turn depend on the values of the parameters.
For example, `y` here has a transform that depends on `x`:

```{julia}
@model function demo_transform()
x ~ Exponential()
y ~ Uniform(0, x)
end
```

However, if you are certain that your model's variables always have a fixed support, you can pass the `fix_transforms=true` keyword argument to `sample()`.
This precomputes the transforms once at the start of sampling and reuses them for the rest of the sampling process.

Note however that while this sounds attractive from a performance perspective, it does not always lead to any noticeable difference!
For many distributions, the computation of the transform is often extremely fast and is comparable to looking up the stored transform.

Please see [the DynamicPPL documentation on fixed transforms](https://turinglang.org/DynamicPPL.jl/stable/fixed_transforms) for more information.

## Automatic differentiation

Finally, please note that for samplers which use automatic differentiation (e.g., HMC and NUTS), the AD type should be specified in the sampler constructor itself, rather than as a keyword argument to `sample()`.
Expand Down
269 changes: 269 additions & 0 deletions usage/vectorisation/index.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,269 @@
---
title: Vectorisation of Turing models
engine: julia
---

```{julia}
#| echo: false
#| output: false
using Pkg;
Pkg.instantiate();
```

The 'native' data structure for a Turing model is [`VarNamedTuple`]({{< meta usage-varnamedtuple >}}), which is a mapping from `VarName`s to values.
For example, `rand(model)` returns a `VarNamedTuple`:

```{julia}
using Turing
Turing.setprogress!(false)

@model function demo()
x ~ Normal(0, 1)
y ~ Normal(x, 1)
end
model = demo()

vnt = rand(model)
```

and you can likewise use a `VarNamedTuple` to specify initial parameters to Turing's inference algorithms:

```{julia}
sample(model, MH(), 100; initial_params=InitFromParams(vnt));
```

Now, `VarNamedTuple` is a _rich_ data structure which faithfully retains names and types according to the structure of the model, which is why it is our default data structure.

However, many packages in the Julia ecosystem work with array inputs and outputs.
Indeed this is even the case for many of Turing's dependencies, such as AdvancedHMC, Optimization, and AdvancedVI: in these cases, Turing is responsible for performing the 'translation' between the `VarNamedTuple` and the vectorised format that these packages require.

This page describes how to convert a Turing model into a vectorised format such that it can be passed to these packages, and how to convert the outputs back into a `VarNamedTuple` format for analysis.
This is useful if you need to carry out more advanced analysis with a Turing model with packages that Turing does not have wrappers for.

## Introduction

Before we demonstrate any code, we will first explain conceptually what we need to actually accomplish this vectorisation.
Given a VarNamedTuple of parameter values, we need two pieces of information to flatten it into a vector:

1. How to convert each individual value into a vector; and

2. How to concatenate those vectors together into a single vector.

We accomplish this by specifying a *transform function* for each parameter, as well as a *range* for each parameter.
For example, conceptually, the `VarNamedTuple` above with `x` and `y` can be flattened with a representation that looks something like this.

(Please note that you don't have to define any of this yourself!
Turing has built-in functionality for this, which we'll cover in a moment.
This is just to illustrate the principle behind it.)

```{julia}
struct TransformPlusRange{F}
transform::F
range::UnitRange{Int}
end

tovec(f) = [f]
info = @vnt begin
x := TransformPlusRange(tovec, 1:1)
y := TransformPlusRange(tovec, 2:2)
end
```

(For more information on the `@vnt` macro, please see [the VarNamedTuple page]({{< meta usage-varnamedtuple >}}).)

To flatten the parameters, we can then do the following:

```{julia}
# Begin by allocating a vector of the appropriate length.
params = Vector{Float64}(undef, 2)

# Iterate through the original `VarNamedTuple`. For each key:
for (vn, value) in pairs(vnt)
vn_info = info[vn]
# Use the transform to vectorise the value
vectorised_value = vn_info.transform(value)
# Store it in the appropriate range of the allocated vector
params[vn_info.range] = vectorised_value
end

params
```

## `LogDensityFunction`

Turing abstracts all of this away for you by providing a struct, `LogDensityFunction`, which wraps a model plus all of the necessary information above for vectorisation.

:::{.callout-note}
## More details

In this page we will only cover a fairly high-level overview of the interface that `LogDensityFunction` exposes.
In particular, we assume that you simply want to construct one of them, pass it to some package, and retrieve the results back in a `VarNamedTuple` format.

There are many more details of working with `LogDensityFunction` that can be useful if you are, for example, developing your own inference algorithms to work with Turing. We refer the interested reader to [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/ldf/overview/) for more information on this.
:::

To construct a `LogDensityFunction`, you can simply pass a model:

```{julia}
@model f() = x ~ Dirichlet(ones(3))
model = f()

ldf = LogDensityFunction(model);
```

With this single-argument constructor, the vectorised representation of `x` will simply be itself (since `Dirichlet` is a multivariate distribution).
Now, `x` also has the property that its elements must be non-negative and sum to 1.
We can see this in action if we draw a randomised set of vectorised parameters:

```{julia}
rand(ldf)
```

By default, `LogDensityFunction` does not apply any extra transformations to the parameters, which is why the output above is still constrained to the simplex.
However, such constraints can cause problems for some algorithms which expect to work in unconstrained space.
To address this, you can instantiate a `LogDensityFunction` as follows:

```{julia}
using DynamicPPL

ldf_linked = LogDensityFunction(model, getlogjoint_internal, LinkAll());
```

Here:

- `LinkAll()` specifies that we want all parameters in the model to be transformed to unconstrained vectors.
- `getlogjoint_internal` says that the log-density we are interested in is the log-joint density, _with_ any Jacobian adjustments for the transformations used above.

When we sample from this `LogDensityFunction`, we can see that the output is now unconstrained.
In fact, the output is now a vector of length 2: since `x` must sum to 1, the third element of `x` is not needed since it can be recalculated from the first two elements.

```{julia}
rand(ldf_linked)
```

## The LogDensityProblems.jl interface

`LogDensityFunction` is not just a way to perform parameter value conversion: it also allows you to compute the log probability density associated with these parameters.

In particular, the `LogDensityFunction` struct is designed to obey [the LogDensityProblems.jl interface](https://www.tamaspapp.eu/LogDensityProblems.jl/stable/).
In particular, you can use the following functions:

```{julia}
using LogDensityProblems

LogDensityProblems.capabilities(ldf_linked)
```

```{julia}
N = LogDensityProblems.dimension(ldf_linked)
```

```{julia}
LogDensityProblems.logdensity(ldf_linked, randn(N))
```

## Automatic differentiation

In its current form, the `LogDensityFunction` does not natively know how to compute gradients of the parameters.
To enable integration with automatic differentiation (AD) packages, you can specify the `adtype` keyword argument:

```{julia}
ldf_ad = LogDensityFunction(model, getlogjoint_internal, LinkAll(); adtype=AutoForwardDiff());
```

:::{.callout-note}
## Gradient preparation

When constructing a `LogDensityFunction` with an `adtype`, gradient preparation (using DifferentiationInterface.jl) will be performed automatically.
This means that subsequent gradient calculations will be much faster, but also means that depending on the backend used, the construction of the `LogDensityFunction` may take a long time.
:::

This allows you to compute not only `logdensity`, but also `logdensity_and_gradient`:

```{julia}
# We now have first-order AD capabilities
LogDensityProblems.capabilities(ldf_ad)
```

```{julia}
LogDensityProblems.logdensity_and_gradient(ldf_ad, randn(N))
```

If you need more information on how to choose an AD backend, please see the [automatic differentiation page]({{< meta usage-automatic-differentiation >}}) for more information.

## Changing the log-density function

Most of the time, `getlogjoint_internal` is the appropriate kind of log-density function to use: it represents the log-joint density in the transformed space ('internal' is a slightly poor name, but it has stuck), and is what inference algorithms tend to expect.

However, there are also other options: for example `LogDensityFunction(model, getloglikelihood, ...)` will give you a log-density function that only computes the likelihood of the data given the parameters.

Please consult [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#DynamicPPL.LogDensityFunction) for more information on this.

## Using custom transforms

When instantiated, `LogDensityFunction` will automatically create its own mapping of `VarName`s to ranges and transforms (much like in the example above).
If you are happy with the default transforms, then you would never need to worry about this.
However, advanced users may wish to specify their own transforms.

In order to do this, you have to use a different constructor for `LogDensityFunction`, namely

```julia
LogDensityFunction(model, logdensity_function, ranges_and_transforms, sample_vec; adtype)
```

Please see the DynamicPPL API documentation for more information on these arguments.

## Conversions between `VarNamedTuple` and vectors

Finally, we will cover the functions that allow you to convert between `VarNamedTuple` and vector formats.

### From `VarNamedTuple` to vector

We already saw that we can draw random samples from a `LogDensityFunction` in vector format:

```{julia}
rand(ldf_ad)
```

By default, this draws from the prior of the model, using `InitFromPrior` (see [Sampling options]({{< meta usage-sampling-options >}}#specifying-initial-parameters) for more information on this).
If you want vectorised samples for a _specific_ set of parameters, you can provide an initialisation strategy that uses those parameters:

```{julia}
vnt = rand(model)
```

```{julia}
rand(ldf_ad, InitFromParams(vnt))
```

(Note that this does technically mean that the output is no longer truly 'random'.)

### From vector to `VarNamedTuple`

Now suppose that you have run an inference algorithm using a `LogDensityFunction` and it has given you back a set of 'best' parameters (as a vector, of course):

```{julia}
best_params = randn(LogDensityProblems.dimension(ldf_ad))
```

To convert this back into a `VarNamedTuple`, you have to reevaluate the model.
The easiest way to do so is to use `DynamicPPL.ParamsWithStats`:

```{julia}
pws = ParamsWithStats(best_params, ldf_ad)
```

This also helpfully computes the log-probabilities associated with this set of parameters (if you do not need this, you can disable it with `ParamsWithStats(...; include_log_probs=false)`).
You can then access `pws.params` to get the `VarNamedTuple` of parameters.
For example, to retrieve the original value of `x`:

```{julia}
pws.params[@varname(x)]
```

### More fine-grained control

These wrapper functions are designed for maximum ease of use.
However, as mentioned above, sometimes you may want more fine-grained control.
To do so, you should use the DynamicPPL API and in particular the `DynamicPPL.init!!` function.
This is covered in much more detail in [the DynamicPPL documentation](https://turinglang.org/DynamicPPL.jl/stable/ldf/overview/)!
Loading