Issue
Currently the API for predictions can be done by predict
|
function predict( |
|
emulator::Emulator{FT}, |
|
new_inputs::AM; |
|
transform_to_real = false, |
|
mlt_kwargs..., |
|
) where {FT <: AbstractFloat, AM <: AbstractMatrix} |
with transform_to_real=false by default, this is neatly how the MCMC can obtain both the posterior mean $$G(u)$$, and the estimated observational noise $$\Gamma(u)$$, for use in the MCMC objective (here a negative log likelihood):
$$
L(u, y) = (y-G(u))^\top \Gamma(u)^{-1} (y-G(u)) + \log\det(\Gamma(u))
$$
Thus y_pred, y_cov = predict(emulator, new_inputs) is not returning the predicted (GP mean and covariance $$C(u)$$) pair, but in fact the (GP mean, and the predicted observational covariance $$\Gamma(u)$$) pair.
These are typically related in the encoded space as $$\Gamma(u) = C(u) + I$$. But often users also call with transform_to_real=true to get a prediction, and here this harder to disaggregate the additional effect of our (linear encoder E), so $$C(u)$$ is only obtainable as y_cov - E*E' calling our data encoder.
This i think would result in users interpreting the obs covariances as the posterior covariances, and therefore will often see something overly broad in their interpretation.
Possible Solution - change in what predict returns.
I think perhaps we should default to predict always giving the the prediction covariance, with the observational noise variant now being called only internally. (used by MCMC).
Current fix until issue is resolved
If you have called the following
y_pred, y_cov = predict(emulator, new_inputs, transform_to_real=true,...)
then to get the "right" cov. This is exact if the output decoder is
y_cov_new = [similar(yc) for yc in y_cov] # create a new container
encoder_schedule = get_encoder_schedule(emulator)
for (yc_new, yc) in zip(y_cov_new, y_cov)
yc_new .= Matrix(decode_with_schedule(
encoder_schedule,
Matrix(encode_with_schedule(encoder_schedule, yc, "out")) - I,
"out"),
)
end
Issue
Currently the API for predictions can be done by
predictCalibrateEmulateSample.jl/src/Emulator.jl
Lines 237 to 242 in c011a05
with$$G(u)$$ , and the estimated observational noise $$\Gamma(u)$$ , for use in the MCMC objective (here a negative log likelihood):
transform_to_real=falseby default, this is neatly how the MCMC can obtain both the posterior meanThus$$C(u)$$ ) pair, but in fact the (GP mean, and the predicted observational covariance $$\Gamma(u)$$ ) pair.$$\Gamma(u) = C(u) + I$$ . But often users also call with $$C(u)$$ is only obtainable as
y_pred, y_cov = predict(emulator, new_inputs)is not returning the predicted (GP mean and covarianceThese are typically related in the encoded space as
transform_to_real=trueto get a prediction, and here this harder to disaggregate the additional effect of our (linear encoderE), soy_cov - E*E'calling our data encoder.This i think would result in users interpreting the obs covariances as the posterior covariances, and therefore will often see something overly broad in their interpretation.
Possible Solution - change in what
predictreturns.I think perhaps we should default to
predictalways giving the the prediction covariance, with the observational noise variant now being called only internally. (used by MCMC).Current fix until issue is resolved
If you have called the following
then to get the "right" cov. This is exact if the output decoder is