Skip to content

Commit b83228e

Browse files
committed
Initial version of gaus_hit_finder example
1 parent 1bc83d5 commit b83228e

30 files changed

Lines changed: 6133 additions & 0 deletions

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,8 @@ find_package(TBB REQUIRED)
88
find_package(spdlog REQUIRED)
99
find_package(phlex REQUIRED)
1010

11+
add_subdirectory(gaus_hit_finder)
12+
1113
# Proxies for framework-agnostic experiment libraries
1214
# N.B. The specified library type should be SHARED.
1315
add_library(my_add SHARED my_add.cpp)

gaus_hit_finder/CMakeLists.txt

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Proxies for framework-agnostic experiment libraries
2+
# N.B. The specified library type should be SHARED.
3+
add_library(find_hits_with_gaussians SHARED
4+
find_hits_with_gaussians.cpp
5+
copied_from_larsoft_minor_edits/Hit.cxx
6+
copied_from_larsoft_minor_edits/geo_types.cxx
7+
copied_from_larsoft_minor_edits/CandHitStandard.cxx
8+
copied_from_larsoft_minor_edits/PeakFitterMrqdt.cxx
9+
copied_from_larsoft_minor_edits/HitFilterAlg.cxx
10+
copied_from_larsoft_minor_edits/MarqFitAlg.cxx
11+
copied_from_larsoft_minor_edits/Wire.cxx
12+
wire_serialization.cpp
13+
print_hits.cpp
14+
)
15+
target_link_libraries(find_hits_with_gaussians PRIVATE TBB::tbb fmt::fmt)
16+
17+
# Register providers for data products
18+
add_library(wires_source MODULE wires_source.cpp)
19+
target_link_libraries(wires_source PRIVATE phlex::module find_hits_with_gaussians)
20+
21+
# Register experiment libraries with Phlex
22+
add_library(find_hits_with_gaussians_hof MODULE register_find_hits_with_gaussians.cpp)
23+
target_link_libraries(find_hits_with_gaussians_hof PRIVATE phlex::module find_hits_with_gaussians)

gaus_hit_finder/README.md

Lines changed: 203 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,203 @@
1+
# Purpose of this directory
2+
3+
This directory holds an example showing how to convert a
4+
module that works with the `art` framework into an
5+
algorithm that works with the `phlex` framework. This
6+
particular example converts the module defined in
7+
the file GausHitFinder_module.cc from the LArSoft larreco
8+
repository.
9+
10+
This is only an example and a prototype to be used in
11+
testing. It is not intended to be the version of this
12+
module used by DUNE or any other experiment
13+
for production or physics. That belongs in some other
14+
repository.
15+
16+
This example is based on version v0.1.0 of `phlex`. It will need
17+
some minor modifications to work with the newest version of `phlex`.
18+
The migration to phlex started with version of GausHitFinder_module.cc
19+
in v10_05_00 of larreco which was the version in use with DUNE software
20+
at the time the migration work was started.
21+
22+
# Work in Progress
23+
24+
This is work in progress. It is not final. It is intended
25+
that this example will change as we learn more about using
26+
`phlex` and as we receive feedback from DUNE. Currently, this
27+
is more like a rough first draft of our first attempts
28+
to use `phlex`. Please don't expect stability. Many features
29+
of `phlex` have not been implemented and the design of `phlex`
30+
itself is changing. Those changes will also cause this example
31+
to change.
32+
33+
As we study this example, it will guide how phlex is used.
34+
As we run tests, it may point to problems in phlex and this
35+
may guide which areas developers should focus on to fix
36+
issues and remove performance bottlenecks.
37+
38+
# Files in this directory
39+
40+
## Interesting files that are the heart of the example
41+
42+
1. find_hits_with_gaussians.hpp
43+
2. find_hits_with_gaussians.cpp
44+
3. register_find_hits_with_gaussians.cpp
45+
4. test_find_hits_with_gaussians.jsonnet
46+
47+
## These files exist only for test purposes
48+
49+
There is one function that will print out reconstructed hits.
50+
This printout can be used to compare results between an
51+
`art` process and a `phlex` process running GausHitFinder.
52+
There are functions that can be used to persistently store
53+
the input from an `art` process (a vector of `Wire`) so
54+
it can be used in the `phlex` process. And there is a `phlex` provider
55+
to read this input. This persistence mechanism is necessary
56+
because there is not an alternative yet. These files are all
57+
temporary and not part of the example.
58+
59+
1. print_hits.hpp
60+
2. print_hits.cpp
61+
3. wire_serialization.hpp
62+
4. wire_serialization.cpp
63+
5. wires_source.cpp
64+
65+
## Files copied from LArSoft
66+
67+
The infrastructure does not yet exist to link the
68+
code here with LArSoft code or include headers.
69+
Necessary files are copied in with minimal changes.
70+
Some changes are necessary because the files are in
71+
a different location and some remove unwanted dependences,
72+
but the modifications are small. When it is possible
73+
to depend on LArSoft code in some other repository,
74+
these files should be deleted to remove duplication.
75+
76+
These files are in the subdirectory:
77+
78+
```phlex-examples/gaus_hit_finder/copied_from_larsoft_minor_edits```
79+
80+
# Where more work is needed
81+
82+
At the moment, this is implemented as a single `phlex`
83+
transform. Does it make sense to split it with unfolds
84+
and folds? Does it make sense to have multiple transforms?
85+
We don't know the answers to these questions yet.
86+
Getting the single transform to work is a good first step.
87+
88+
The next thing we plan to work on is creating a second version
89+
of the migrated module. We will try replacing the
90+
outer `parallel_for` with an `unfold`, `transform`, and `fold`
91+
sequence of algorithms. After that we will create other versions
92+
pushing the division into separate algorithms to lower levels.
93+
Then we plan to run tests and compare the different versions.
94+
95+
`Phlex` will not support the `Tools` feature that existed in
96+
`art`. One possibility is that algorithm nodes scheduled
97+
by `tbb::flow_graph` offer sufficient configurability that we
98+
don't need a separate plugin system like `Tools`.
99+
Another possibility is that we eventually implement a
100+
plugin system for `phlex`. For now, I replaced the two `Tool`
101+
types used in `GausHitFinder` with direct instantiations of one
102+
of the plugin types. I instantiated a vector of objects of type
103+
`CandHitStandard` and one object of type `PeakFitterMrqdt`. They
104+
needed some modification but are as close as reasonably possible
105+
to the original versions.
106+
107+
I do not know yet how to deal with the fact that `GausHitFinder`
108+
can be configured to produce 1 or 2 output data products.
109+
For now, the example can only produce one `std::vector<recob::Hit>`.
110+
It's configurable whether to filter hits or not, but it cannot
111+
produce both outputs at the same time.
112+
113+
For now, this ignores the fact that `GausHitFinder` also can
114+
produce `art::Assns<recob::Wire, recob::Hit>` and
115+
`art::Assns<raw::RawDigit, recob::Hit>`. In the `art` version,
116+
it is configurable whether to produce those or not.
117+
We are ignoring this because `phlex` does not yet support `art::Assns`
118+
or whatever will be used to replace that type. For now, the
119+
example can only produce one `std::vector<recob::Hit>`.
120+
121+
Here is a list of the files that were used as a basis
122+
for this example, but significantly modified from the
123+
LArSoft version:
124+
125+
```
126+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/GausHitFinder_module.cc
127+
```
128+
129+
Some files were copied with minimal modifications. The "tool"
130+
classes were split into .h and .cxx files and are no longer implemented
131+
as `art Tools`, but otherwise the changes are minimal.
132+
Here is a list of the files that were copied from LArSoft
133+
with minimal modifications:
134+
135+
```
136+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFinderTools/CandHitStandard_tool.cc
137+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFinderTools/PeakFitterMrqdt_tool.cc
138+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFilterAlg.h
139+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFilterAlg.cxx
140+
https://github.com/LArSoft/lardataobj/blob/develop/lardataobj/RecoBase/Wire.h
141+
https://github.com/LArSoft/lardataobj/blob/develop/lardataobj/RecoBase/Wire.cxx
142+
https://github.com/LArSoft/lardataobj/blob/develop/lardataobj/RecoBase/Hit.h
143+
https://github.com/LArSoft/lardataobj/blob/develop/lardataobj/RecoBase/Hit.cxx
144+
https://github.com/LArSoft/larcoreobj/blob/develop/larcoreobj/SimpleTypesAndConstants/geo_types.h
145+
https://github.com/LArSoft/larcoreobj/blob/develop/larcoreobj/SimpleTypesAndConstants/geo_types.cxx
146+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFinderTools/ICandidateHitFinder.h
147+
https://github.com/LArSoft/larcoreobj/blob/develop/larcoreobj/SimpleTypesAndConstants/RawTypes.h
148+
https://github.com/LArSoft/lardataobj/blob/develop/lardataobj/Utilities/sparse_vector.h
149+
https://github.com/LArSoft/larvecutils/blob/develop/larvecutils/MarqFitAlg/MarqFitAlg.h
150+
https://github.com/LArSoft/larvecutils/blob/develop/larvecutils/MarqFitAlg/MarqFitAlg.cxx
151+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFinderTools/ICandidateHitFinder.h
152+
https://github.com/LArSoft/larreco/blob/develop/larreco/HitFinder/HitFinderTools/IPeakFitter.h
153+
```
154+
155+
For now, the part of `GausHitFinder` that fills 2 histograms is removed.
156+
Why? Because `phlex` does not yet have a histogramming facility like `TFileService`
157+
yet. Also those histograms were booked in `beginJob`. That entire method
158+
was removed. Booking the histograms was the only thing that was done in `beginJob`.
159+
I'm not sure how to support `beginJob` activity in `phlex` either.
160+
161+
Dropped `fEventCount`. It was not used. Seems like not a good thing
162+
to keep track of in a `phlex` algorithm anyway. Reconstruction algorithms
163+
should not depend on event count. Also removed the related count argument
164+
from the findHitCandidates method of ICandidateHitFinder and its implementations.
165+
It was not used in CandHitStandard anyway.
166+
167+
MessageLogger calls are commented out for now but not deleted. Eventually
168+
we will want to replace those with calls to whatever logging system `phlex`
169+
eventually uses or delete those lines entirely.
170+
171+
For now, calls to the Geometry service are commented out but not deleted.
172+
The current plan is to replace those with a `phlex` provider that provides the geometry
173+
data as data products and these providers will be scheduled by the flow graph.
174+
In the main algorithm the geometry is used to get the plane number for each
175+
wire. For now, I just hard coded the plane number to 0. The signal type in
176+
the `recob::Hit` objects is also from the geometry and for now always set
177+
to 0. Similarly, the `WireID` in the `recob::Hit` objects is from the geometry
178+
and for now set to a default value. `FillOutHitParameterVector` depends
179+
on the geometry also, but it only checks the size of a configuration
180+
vector against the number of planes. If this is skipped but the
181+
configuration vector is sized to match the number of planes, then things
182+
should still work ok. For now, I just skip FillOutHitParameterVector.
183+
184+
I copied in the content of the `TMath::Gaus` function from ROOT to
185+
avoid depending on ROOT in `phlex` code for now (slightly edited).
186+
Are we allowed to depend on ROOT in `phlex` algorithm code? That could
187+
easily be restored if the build system allows it.
188+
189+
# Testing
190+
191+
This was tested by running `GausHitFinder` with the `art` framework and running the modified `GausHitFinder` with the `phlex` framework. The two processes used identical input (the `std::vector<Wire>`). The output `std::vector<recob::Hit>` objects were compared using a text file printed during each process. The output was identical except for the following:
192+
193+
Two output data members of `Hit` were ignored because they depend on the `Geometry` and that is not implemented yet for `phlex`. In the `phlex` version, these data members are filled with default values.
194+
```
195+
geo::SigType_t fSignalType; ///< signal type for the plane of the hit
196+
geo::WireID fWireID; ///< WireID for the hit (Cryostat, TPC, Plane, Wire)
197+
```
198+
199+
The `phlex` process was run with multithreading and that causes the order of `Hit` objects to vary from one execution to the next and also the order of events to vary. In the comparison the `Hit` objects were sorted and we had to be careful to compare matching events.
200+
201+
Contact the `phlex` group if you are interested in details related
202+
to running `art` side of this test or want the input files (or
203+
to regenerate them).
Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
#include <algorithm>
2+
#include <iterator>
3+
4+
#include "CandHitStandard.h"
5+
6+
namespace examples {
7+
8+
CandHitStandard::CandHitStandard(const CandHitStandardCfg& cfg)
9+
: fRoiThreshold(cfg.fRoiThreshold) {}
10+
11+
void CandHitStandard::findHitCandidates(
12+
const recob::Wire::RegionsOfInterest_t::datarange_t& dataRange,
13+
const size_t roiStartTick,
14+
const size_t channel,
15+
HitCandidateVec& hitCandidateVec) const
16+
{
17+
// Recover the actual waveform
18+
const Waveform& waveform = dataRange.data();
19+
20+
// Use the recursive version to find the candidate hits
21+
findHitCandidates(waveform.begin(), waveform.end(), roiStartTick, hitCandidateVec);
22+
}
23+
24+
void CandHitStandard::findHitCandidates(std::vector<float>::const_iterator startItr,
25+
std::vector<float>::const_iterator stopItr,
26+
const size_t roiStartTick,
27+
HitCandidateVec& hitCandidateVec) const
28+
{
29+
// Need a minimum number of ticks to do any work here
30+
if (std::distance(startItr, stopItr) > 4) {
31+
// Find the highest peak in the range given
32+
auto maxItr = std::max_element(startItr, stopItr);
33+
34+
float maxValue = *maxItr;
35+
int maxTime = std::distance(startItr, maxItr);
36+
37+
if (maxValue > fRoiThreshold) {
38+
// backwards to find first bin for this candidate hit
39+
auto firstItr = std::distance(startItr, maxItr) > 2 ? maxItr - 1 : startItr;
40+
41+
while (firstItr != startItr) {
42+
// Check for pathology where waveform goes too negative
43+
if (*firstItr < -fRoiThreshold) break;
44+
45+
// Check both sides of firstItr and look for min/inflection point
46+
if (*firstItr < *(firstItr + 1) && *firstItr <= *(firstItr - 1)) break;
47+
48+
firstItr--;
49+
}
50+
51+
int firstTime = std::distance(startItr, firstItr);
52+
53+
// Recursive call to find all candidate hits earlier than this peak
54+
findHitCandidates(startItr, firstItr + 1, roiStartTick, hitCandidateVec);
55+
56+
// forwards to find last bin for this candidate hit
57+
auto lastItr = std::distance(maxItr, stopItr) > 2 ? maxItr + 1 : stopItr - 1;
58+
59+
while (lastItr != stopItr - 1) {
60+
// Check for pathology where waveform goes too negative
61+
if (*lastItr < -fRoiThreshold) break;
62+
63+
// Check both sides of firstItr and look for min/inflection point
64+
if (*lastItr <= *(lastItr + 1) && *lastItr < *(lastItr - 1)) break;
65+
66+
lastItr++;
67+
}
68+
69+
int lastTime = std::distance(startItr, lastItr);
70+
71+
// Now save this candidate's start and max time info
72+
HitCandidate hitCandidate;
73+
hitCandidate.startTick = roiStartTick + firstTime;
74+
hitCandidate.stopTick = roiStartTick + lastTime;
75+
hitCandidate.maxTick = roiStartTick + firstTime;
76+
hitCandidate.minTick = roiStartTick + lastTime;
77+
hitCandidate.maxDerivative = *(startItr + firstTime);
78+
hitCandidate.minDerivative = *(startItr + lastTime);
79+
hitCandidate.hitCenter = roiStartTick + maxTime;
80+
hitCandidate.hitSigma = std::max(2., float(lastTime - firstTime) / 6.);
81+
hitCandidate.hitHeight = maxValue;
82+
83+
hitCandidateVec.push_back(hitCandidate);
84+
85+
// Recursive call to find all candidate hits later than this peak
86+
findHitCandidates(lastItr + 1,
87+
stopItr,
88+
roiStartTick + std::distance(startItr, lastItr + 1),
89+
hitCandidateVec);
90+
}
91+
}
92+
}
93+
94+
void CandHitStandard::MergeHitCandidates(const recob::Wire::RegionsOfInterest_t::datarange_t&,
95+
const HitCandidateVec& hitCandidateVec,
96+
MergeHitCandidateVec& mergedHitsVec) const
97+
{
98+
// If no hits then nothing to do here
99+
if (hitCandidateVec.empty()) return;
100+
101+
// The idea is to group hits that "touch" so they can be part of common fit, those that
102+
// don't "touch" are fit independently. So here we build the output vector to achieve that
103+
HitCandidateVec groupedHitVec;
104+
int lastTick = hitCandidateVec.front().stopTick;
105+
106+
// Step through the input hit candidates and group them by proximity
107+
for (const auto& hitCandidate : hitCandidateVec) {
108+
// Check condition that we have a new grouping
109+
if (int(hitCandidate.startTick) - lastTick > 1) {
110+
mergedHitsVec.emplace_back(groupedHitVec);
111+
112+
groupedHitVec.clear();
113+
}
114+
115+
// Add the current hit to the current group
116+
groupedHitVec.emplace_back(hitCandidate);
117+
118+
lastTick = hitCandidate.stopTick;
119+
}
120+
121+
// Check end condition
122+
if (!groupedHitVec.empty()) mergedHitsVec.emplace_back(groupedHitVec);
123+
}
124+
}

0 commit comments

Comments
 (0)