Skip to content

2nd Order Methods #19

@max-vassili3v

Description

@max-vassili3v

Suppose we have a function $f: \mathbb{R^n} \to \mathbb{R}$. Using dual numbers we can input a dual vector $x + J\epsilon$ and get out a dual number. This induces a function $g: \mathbb{R}^n \to \mathbb{R}^n$ defined as $g(x) \to \text{Dual parts of }f(x + J\epsilon)$. If we now input $y + I\epsilon'$ into $g$, where $\epsilon_i\epsilon'_j \neq 0$ for all $1 \leq i, j \leq n$, the $\epsilon_i\epsilon'_j$ terms give us the Hessian coefficients.

In application, if we allow definitions DualVector(::DualVector, ::AbstractMatrix) and Dual(::Dual, ::DualVector), this should have the same effect, and would open up discussion or examples concerning 2nd order methods in DualArrays.jl.

Example:

  • f(x) = x[1] * x[2]
  • g(x) = f(DualVector(x, I(length(x)))).partials
  • d = DualVector([1, 2], I(2))
  • In order to evaluate g(d), we start with a DualVector(d, I(2))
  • We then pass through f giving us Dual(d[1], [1, 0]) * Dual(d[2], [0, 1])
  • The product rule makes this evaluate to Dual(d[1] * d[2], [d[2], d[1]]) (Note: we can overload operations such that [d[2], d[1]] = d[2] * [1,0] + d[1] * [0,1] returns a DualVector. In general, for the latter argument to be a DualVector instead of a Vector{Dual} operations between Dual and AbstractVector also need to be overloaded)
  • The jacobian of [d[2], d[1]] is actually a matrix of nested dual parts, so this is our Hessian. We verify from the definition of d that this is the correct Hessian, [0 1;1 0].

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions