Conversation
- Details are in the README - inversion of Manning field using different field representations:
- Constant
- Region-based
- Independent Points Scheme
- Free specification at all points in the domain (Hessian regularised)
- Also fix hanging in parallel when using weighting of stations in inversion examples
- Update examples accordingly
- The cost function scaling didn't make sense to be passed here and I think was possibly applied 2x so took it out here
- self.control_coeff_list does not get updated from initial values and is just for exporting purposes
- using weighting by station is essentially using NMSE instead of MSE, therefore you shouldn't weight elevation or velocity differently by default - warn if the user does - NMSE includes noise so you might have a smooth observed elevation signal and noisy velocity, so we do want to allow the user to weight it if they want to
- J_elev is ~ 1e-6 * J_vel, would need a better example for both to play a role
- should really add functionality to monitor more than just the observation parameter during optimisation
|
The only issue with the current code is that for real-world applications, we might want to make some datum corrections (as per this publication (Section 3.2) and in this code) which means we need to assess the cost function at the end of the forward run rather than at every timestep - I've got a branch for this as it's an easy fix but I would do this as a follow-up PR. |
stephankramer
left a comment
There was a problem hiding this comment.
Just some comments/queries. Happy for this to go in as is as well
| (f"[{fd.COMM_WORLD.rank}] ERROR: Check for NaNs. Found non-finite variances of " | ||
| f"observation data: {var.dat.data[:]}") | ||
| self.sta_manager.station_weight_0d.interpolate(1 / var) | ||
| # in parallel some mesh partitions will have no stations so we need a conditional to avoid div by 0 |
There was a problem hiding this comment.
Why would that be a problem? Presumably local partitions with no stations, this expression isn't evaluated at all? The var is a variance per station, on a VOM, I think is that right? So in what location/node would we have var==0 where that conditional would apply?
| J = som.eval_cost_function(t) # individual manager will assert initialized | ||
| components[name] = float(J) | ||
| J_total += J | ||
| return J_total, components |
There was a problem hiding this comment.
Why is this returning a components dictionary? The reason I ask is that the only place it seems to expect a second output from eval_cost_function() is in a place where it explicitly checks for isinstance(sta_manager, MultiStationObservationManager) to deal with that - but then it actually throws away the extra components output.
Generally speaking having (many) of these
if isinstance(obj, ClassVersionA):
# call obj.method is some way
else:
# call obj.method in slightly different way because the API is different
are a bit of a code smell. If you want to treat both StationObservationManager and MultiStationObservationManager as the same kind of thing in part of your code, it's better to think of a well defined common API, maybe using an abstract StationObservationManagerBase that both inherit from. It's fine to then extend beyond that common API, e.g. you could have a different named evaluation method, that does return a component dictionary - but the methods in the common API should have the same (required) input arguments and the same outputs.
Currently, the inversion tools are set up to deal with either a scalar or vector quantity (typically elevation or velocity) but we might want to optimise for both simultaneously.
For real cases this is a useful way of preventing control values heading to the extremes where one type of data is sparse. For example, near coastlines we might have some ADCP data, but in the open sea we're more likely to have elevation gauges. I have found that friction values can become exceedingly large in deeper areas which is reflected in gauge harmonic amplitudes.
TODO: