Even with multiprocessing working, one critical operation remains before the segment and ingestion loops can safely run in parallel with each other: catching up the new estimators to the current frame (i.e. to the frames that accrued between starting NMF slicing and updating assets).
Potential points of race condition / frame_idx mismatch include:
NMF: uses residual. Since we're using a single deepcopy for the entire step, it shouldn't cause too much of an issue?
Catalog: uses footprints and traces. traces in particular gets updated live, potentially flushing out the frame_idx range that the new estimators from NMF come from. This means the quality test (trace correlation) needs to be handled (a) in a thread-safe manner and (b) possibly using zarr.
Update: uses all assets. new_footprints and overlaps just get appended, so no biggie. new_traces will have ??? values for the frames that accrued in the ingestion loop without the new estimators. Same with sufficient_statistics. Some of these values cannot be nan, since they get used to predict the next frame_idx values. residuals should get an update here, where the new estimator areas are cleared to zero. However, this is risky, since we may now be removing overlapping estimators from residuals. Maybe I should track the overlapping frame_idx and only clear those? buffer only gets used here, with frame_summary. I'll just make it an attr of FrameSummary.
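A rough sketch of the thread-safe part, assuming a lock-guarded container that tags every snapshot with the frame_idx it was taken at, so segment can later tell how many frames it has to catch up. `AssetStore`, `ingest` and `snapshot` are made-up names for illustration, not existing code, and plain lists stand in for the real trace arrays.

```python
import copy
import threading


class AssetStore:
    """Hypothetical shared container for live-updated assets."""

    def __init__(self):
        self._lock = threading.Lock()
        self.frame_idx = -1   # index of the last ingested frame
        self.traces = {}      # cell id -> list of trace values

    def ingest(self, frame_idx, values):
        # Called from the ingestion loop, once per frame.
        with self._lock:
            for cell_id, value in values.items():
                self.traces.setdefault(cell_id, []).append(value)
            self.frame_idx = frame_idx

    def snapshot(self):
        # Called from segment: copy the traces together with the
        # frame_idx they are valid for, under the same lock.
        with self._lock:
            return self.frame_idx, copy.deepcopy(self.traces)


store = AssetStore()
for i in range(3):
    store.ingest(i, {"cell0": float(i)})

snap_idx, snap_traces = store.snapshot()   # segment starts here
store.ingest(3, {"cell0": 3.0})            # ingestion keeps going

# Number of frames the new estimators would have to catch up on.
lag = store.frame_idx - snap_idx
```

The snapshot is unaffected by later ingestion, and `lag` is the gap that Update has to fill for new_traces and sufficient_statistics.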
Scenario (turn this into a sequence diagram)
sequenceDiagram
participant I as Ingest
participant Asset@{ "type" : "queue" }
participant S as Segment
I <<->> +Asset: Frame 0-99
Asset ->> S: NMF (residual)
I <<->> Asset: Frame 100-149
Asset ->> S: Catalog (footprints, traces)
I <<->> Asset: Frame 150-200
Asset <<->> S: Update (all)
Residual needs to take async into account when updating (the “new” cells may no longer come from the immediately preceding epoch)
Residual needs to be cleared every time segment is run, otherwise we run the risk of duplicate detection
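One possible reading of "track frame_idx and only clear those", as a sketch: zero the new estimator's footprint only in the residual frames it was actually detected on, and skip frames that are no longer held. `clear_residual` and the dict-of-frames layout are assumptions for illustration; the real residual is presumably an array.

```python
def clear_residual(residual, footprint_mask, fitted_frames):
    """Zero the masked pixels, but only in the tracked frames.

    residual: dict mapping frame_idx -> 2D list of floats (assumed layout)
    footprint_mask: 2D list of bools, True inside the new footprint
    fitted_frames: frame_idx values the new estimator came from
    Returns the frame_idx values that were actually cleared.
    """
    cleared = []
    for idx in fitted_frames:
        frame = residual.get(idx)
        if frame is None:
            # Frame is no longer in the residual (e.g. already flushed).
            continue
        for r, row in enumerate(footprint_mask):
            for c, inside in enumerate(row):
                if inside:
                    frame[r][c] = 0.0
        cleared.append(idx)
    return cleared


residual = {
    0: [[1.0, 2.0], [3.0, 4.0]],
    1: [[5.0, 6.0], [7.0, 8.0]],
}
mask = [[True, False], [False, False]]
cleared = clear_residual(residual, mask, fitted_frames=[0, 2])
```

Frame 1 is left untouched here, which is the point: anything that accrued after the NMF slice keeps its residual.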
Stage 1: we have gathered 100 frames
Stage 2:
segment: detect 3
ingest: get 101-150th frames
Stage 3:
segment: catalog needs fp and tr
now we gotta grab 0-100th frames for traces that were flushed to zarr (for comparison with new)
ingest: get 151-200th frames
Stage 4:
segment: update fp, tr, stats, overlaps
fp can be updated fine
tr has a bunch of nan’s. might as well start as nans
probably ok since by the time an overlapping cell is detected, the new traces would have accrued the same number of epochs
the latest value however has to be a real number since the trace_ingestion uses it as a starting value
0?
cc: just start with zeros…? if we don’t wanna deal with zarr-loading older traces
cy: buffer must not be updated during the whole segment loop, otherwise frames 0-100 are not in the buffer anymore!
or it keeps accruing and gets cleared after each detect. maybe it should be a class attribute then.
ingest: locked out
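The Stage 4 trace idea (nans for the missed frames, a real number at the end) could look roughly like this. `catch_up_trace` is a hypothetical helper, the seed of 0 is just the value floated above, and plain lists stand in for the trace arrays.

```python
import math


def catch_up_trace(new_trace, n_missing, seed=0.0):
    """Pad a new trace up to the current frame.

    Frames that accrued while segment was running are filled with nan,
    except the latest one, which gets a real value (seed) because
    trace ingestion uses it as its starting value.
    """
    if n_missing <= 0:
        return list(new_trace)
    return list(new_trace) + [math.nan] * (n_missing - 1) + [seed]


# Trace fitted on 2 frames, 3 frames accrued during segment.
padded = catch_up_trace([1.0, 2.0], n_missing=3)
```

Whether 0 is a reasonable seed depends on how quickly trace ingestion recovers from a bad starting value, which this sketch does not address.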
Alternative
... or something else we could do: keep the ingestion loop accruing frame_update results within its own thread, and only update assets from segment. (Or the opposite: copy the entire asset into the segment loop.)
This avoids possibly having to load zarr in segment (in catalog and update) in case segment takes a long time and frames get flushed.
“Catching up” the new estimators to the latest frame is still an issue, though.
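A minimal sketch of the first variant, assuming the ingestion thread pushes its per-frame results into a `queue.Queue` and segment is the only place that writes to the assets. `ingest_frame`, `drain_into` and the dict-shaped `assets` are illustrative names, not the actual API.

```python
import queue


def ingest_frame(pending, frame_idx, result):
    # Ingestion thread: accrue the result locally, do not touch assets.
    pending.put((frame_idx, result))


def drain_into(pending, assets):
    # Segment side: apply everything that accrued since the last drain.
    applied = 0
    while True:
        try:
            frame_idx, result = pending.get_nowait()
        except queue.Empty:
            break
        assets["traces"].append(result)
        assets["frame_idx"] = frame_idx
        applied += 1
    return applied


pending = queue.Queue()
assets = {"traces": [], "frame_idx": -1}

for i in range(3):
    ingest_frame(pending, i, float(i))

n_applied = drain_into(pending, assets)
```

Since only one side ever writes to `assets`, no lock is needed around it, but the new estimators still have to be caught up over the drained frames.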