`AbstractVarInfo` interface

There are many aspects that make `VarInfo` a very complex data structure:

1. `VarInfo` really consists of multiple mappings: `Metadata` maps variables to their values,
   their distributions, their range in the underlying vector, and additional metadata such as flags,
   GIDs, and orders.  Additionally, there are scalars `logp` and `num_produce` associated with the
   whole thing.
2. These mappings are not normal dictionaries.  First, the order of variables needs to be preserved
   (would an ordered dictionary be enough for that?).  Second, keys are themselves hierarchical,
   since they contain index values, and we need to be able to handle "sub-keys": get out, say,
   `x[1:3]` from the entry `x`.
3. The mapping between keys and values gets complicated even more, since some of the values need to
   be transformed between the distribution supports and Euclidean space.  This is currently
   implemented by special logic in some getters/setters, which automatically call `link`/`invlink`.

Correct me if I'm missing something.

Now, it seems to me that a sensible separation of concerns would involve a separate interface of the
dictionary part, taking care of the mappings and the "sub-key" handling.  Then the actual `VarInfo`
can just wrap this special dictionary (or several thereof), and add the additional metadata and
special logic to it.  Then different implementation of the dictionary essentially replace the
different kinds of `Metadata` that separate `UntypedVarInfo` from `TypedVarInfo`.

I haven’t really worked with traits yet, but they might also have some advantages.  For example, the
recently added `ThreadSafeVarInfo` is really only adding a thin layer to add an orthogonal concept
to the other variants.


## <strike>Dictionary</strike> Trie Interface

Should describe an ordered mapping from `VarName` to whatever: actual values (`<:Real` or
`<:AbstractArray{<:Real}`?), distributions, flags, etc. – but only one such mapping at a time.

- Pure functions: 
  - `iterate`, yielding pairs of `VarName` and the stored value
  - `IteratorEltype == HasEltype()`, `IteratorSize = HasLength()`
  - `keys`, `values`, `pairs`, `length` consistent with `iterate`
  - `eltype`, `keytype`, `valuetype`
  - `get`, `getindex`, `haskey` for indexing by `VarName`
  - `merge` to join two VarInfos
- Mutating functions:
  - `push!`, `merge!` to add and join elements
  - `setindex!`
  - `empty!`, `delete!`
- Conversions:
  - To array, for vectorization/linearization of values.  Could be a method of `Array`, or `copyto!`.
  - To (nested?) named tuple
- Trie-specific functions:
  - Search for matching keys
  - Extract subkeys

David has argued that linearized storage destroys shape information.  I think that is does not need
to: one could have one big array, with reshaped views into it – scalars would then just be
zero-dimensional views (akin to `Ref`s).  On the other hand, a tree-like implementation could have some other advantages, both with respect to ease of implementation and behaviour.

Subsumption, the issue of matching "sub-keys", can behave in different ways depending on the value
type.  For sampled values, using sub-keys should always work: when a value of a vector-of-vectors
`x` is stored, then `x[1][1]` must be a valid key.  However, this does not need to be the case for
distributions: when `S` comes from a matrix valued distribution, then `S[1,1]` is not necessarily
available.

Some pain points from the current implementation: vectorized access, shape preservation, association
of names and values.

Related: Gen's [`ChoiceMap`](https://www.gen.dev/dev/ref/choice_maps/)


## `VarInfo` proper

- `getlogp`, `setlogp!`, `resetlogp!`, and `acclogp!` for handling log probabilities
- `gen_num_produce`, `set_num_produce`, `increment_num_produce`, `reset_num_produce`; although these
  are specific to particle samplers, and would ideally be handled by them (maybe with a custom
  context in Turing?).  A similar argument holds for the order information and certain flags.
- Linking: Since I haven't yet written any HMC samplers or the like, I unfortunately don't know how
  feasible/practical it is to make most of this more explicit.  Mohamed, in a [recent
  PR](https://github.com/TuringLang/DynamicPPL.jl/pull/188/files), has introduced methods for the
  getters/setters that take the distribution as additional argument to figure out the right thing:
  
  ```julia
  vi[vn, dist]
  ```

  Since that indexing in Julia supports keyword arguments, even somthing like the following would be possible:

  ```julia
  vi[vn, link=dist]
  ```
  
  The other getters and setters could then use the same kind of syntax.
  
  Then, there would also have to be (I guess?) additional methods for the conversion methods, to
  perform bijections for the array of vectorized values. 
  
  Also, `link!`/`invlink!` for the complete mapping – is that necessary?
- Probably some resemblance of many of the old getters and setters: `getidx`, `getdist`, `getrange`,
  `getall`, `getval`, `getvals`, `getinds`, `syms`, `getgid`, flags.  Hopefully some of them will
  become redundant.
- There should be one well-defined way to build up a VarInfo object.  Perhaps an empty one + muliple
  calls to `push!`.  What about `push!!`, with the possibility of immutable implementations?
  Alternatively: based on `merge`ing?  Maybe `merge` can be general enough to cover both
  construction of all complex VarInfos, and producing `MixedVarInfo`s when required.
  
  
## `VarName`

This is working pretty well in its current form, but the logic for subsumption could be defined a
bit more explicitely.  Some helper methods are certainly needed in order to properly support
subsumption in the dictionary interface, and would become useful elsewhere.

In the long term: what about more complex things, like `getproperty`-based variables?


## Generated quantities

Different PPLs handle these in different ways – one alternative is to include them in the trace
(e.g., Soss.jl now uses a nested tuple).  If this is considered useful, it needs to included in the
interface.


## Related work/discussions

- https://github.com/TuringLang/DynamicPPL.jl/issues/68
- Mohamed’s refactoring: https://github.com/TuringLang/DynamicPPL.jl/pull/188/files
- Collection of gripes and anti-gripes: https://github.com/TuringLang/DynamicPPL.jl/issues/119
- Splitting up code, refactoring: https://github.com/TuringLang/DynamicPPL.jl/issues/118
- Non-real values: https://github.com/TuringLang/DynamicPPL.jl/issues/107
- Some older gripes: https://github.com/TuringLang/DynamicPPL.jl/issues/5
- Older interface proposal: https://github.com/TuringLang/DynamicPPL.jl/issues/7
- One more interface discussion: https://github.com/TuringLang/DynamicPPL.jl/issues/16
- Generated quantities: https://github.com/TuringLang/DynamicPPL.jl/issues/94

CC @yebai, @cpfiffer, @devmotion, @mohamed82008 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`AbstractVarInfo` interface #1

Dictionary Trie Interface

`VarInfo` proper

`VarName`

Generated quantities

Related work/discussions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

AbstractVarInfo interface #1

Description

Dictionary Trie Interface

VarInfo proper

VarName

Generated quantities

Related work/discussions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`AbstractVarInfo` interface #1

`VarInfo` proper

`VarName`