I apologise for the National Enquirer-style highlight... the truth is that I am not sure that the move constructor copies, and I am sure this is no more than a sizeable contribution to ICARUS production choking.
While investigating the large memory usage of the ICARUS production reconstruction job, I found a curious memory spike, which Valgrind tracks down to art::Event::put() for a large association object (~2 million pairs).
Digging into it, I ended up with the following debugging trace:
#0 std::vector<std::pair<art::Ptr<recob::Hit>, art::Ptr<recob::SpacePoint> >, std::allocator<std::pair<art::Ptr<recob::Hit>, art::Ptr<recob::SpacePoint> > > >::vector (__x=..., this=<optimized out>)
at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v12_1_0/Linux64bit+3.10-2.17/include/c++/12.1.0/bits/stl_vector.h:598
#1 art::Assns<recob::Hit, recob::SpacePoint, void>::Assns (this=0x1ed437e0)
at /cvmfs/larsoft.opensciencegrid.org/products/canvas/v3_16_04/include/canvas/Persistency/Common/Assns.h:124
#2 art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >::Wrapper (ptr=..., this=0x1ed437d0)
at /cvmfs/larsoft.opensciencegrid.org/products/canvas/v3_16_04/include/canvas/Persistency/Common/Wrapper.h:162
#3 std::make_unique<art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >, std::unique_ptr<art::Assns<recob::Hit, recob::SpacePoint, void>, std::default_delete<art::Assns<recob::Hit, recob::SpacePoint, void> > > > ()
at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v12_1_0/Linux64bit+3.10-2.17/include/c++/12.1.0/bits/unique_ptr.h:1065
#4 art::ProductInserter::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (this=this@entry=0x7ffffffd80b0, edp=..., instance=..., rs=...)
at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/ProductInserter.h:126
#5 0x00007ffee7c7a379 in art::ProductInserter::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (instance=..., edp=..., this=0x7ffffffd80b0)
at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/ProductInserter.h:104
#6 art::Event::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (instance=..., edp=..., this=0x7ffffffd8030)
at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/Event.h:80
#7 lar_cluster3d::Cluster3DICARUS::ArtOutputHandler::outputObjects (spacePointsOnly=<optimized out>, this=0x7ffffffd7cc0)
at /scratch/workspace/icaruscode-release-build/BUILDTYPE/prof/label1/swarm/label2/buildservice104/temp/srcs/icaruscode/icaruscode/TPC/Tracking/cluster3D/Cluster3DICARUS_module.cc:246
#8 lar_cluster3d::Cluster3DICARUS::produce (this=0xa8a6440, evt=...)
at /scratch/workspace/icaruscode-release-build/BUILDTYPE/prof/label1/swarm/label2/buildservice104/temp/srcs/icaruscode/icaruscode/TPC/Tracking/cluster3D/Cluster3DICARUS_module.cc:678
'nuff said.
GDB also tells me that the __x argument of the vector constructor at the top of the stack is a constant reference, i.e. we are in vector's copy constructor. art::Wrapper did its due diligence in passing an r-value reference to the art::Assns constructor, which is compiler-generated (so we don't really see it in the source code). What remains puzzling is that the signature in the backtrace is that of the default constructor of Assns, which then tries to copy a vector (the aforementioned __x) whose origin I cannot identify and which might as well be empty (prof builds are no fun). Still, Valgrind associates this stack with a large allocation. In addition, the data in the association has a different address before and after art::Event::put(), which is not the case for a companion std::vector data product (the address of the vector object changes, but that of its elements does not).
There are many moving parts here (for example, Assns moving back and forth between its transient and persistent representations may also cause a change of address and memory spikes... it's just not apparent that this is happening here).
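For reference, the address check described above can be reduced to a few lines of standalone C++. This is only a sketch of the diagnostic, not a statement about how art::Assns is actually implemented: CopyOnlyProduct, MovableProduct and moveKeepsBuffer are hypothetical names made up for illustration. It shows that a true move of a std::vector keeps the element buffer at the same address, while a type whose implicit move constructor is suppressed (here by a user-declared copy constructor) silently copies when constructed from an r-value, so the buffer address changes.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data product holding a large vector of pairs.
// Declaring the copy constructor suppresses the implicit move constructor,
// so constructing from an r-value falls back to a copy.
struct CopyOnlyProduct {
  std::vector<std::pair<int, int>> pairs;
  CopyOnlyProduct() = default;
  CopyOnlyProduct(CopyOnlyProduct const& other) : pairs(other.pairs) {}
};

// Same payload, but all special members are implicitly generated,
// so the move constructor moves the vector.
struct MovableProduct {
  std::vector<std::pair<int, int>> pairs;
};

// Note: this trait is true for BOTH types, because a copy constructor
// taking a const reference also accepts r-values. It cannot tell a real
// move from a copy in disguise.
static_assert(std::is_move_constructible_v<CopyOnlyProduct>);
static_assert(std::is_move_constructible_v<MovableProduct>);

// Returns true if constructing T from an r-value reuses the source's
// element buffer (same check as comparing data addresses before and
// after art::Event::put()).
template <typename T>
bool moveKeepsBuffer() {
  T source;
  source.pairs.assign(1000, std::make_pair(1, 2));
  auto const* before = source.pairs.data();
  T dest(std::move(source));
  return dest.pairs.data() == before;
}
```

With these definitions, moveKeepsBuffer<MovableProduct>() should return true and moveKeepsBuffer<CopyOnlyProduct>() should return false. Whether anything comparable happens inside Assns (or in its base classes) is exactly the open question of this report.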
To reproduce
With an input file in dCache (which could disappear for all I know), and after setup icaruscode v10_06_00_01p05 -q e26:prof on an SBN-aware machine:
lar -c stage1_run2_icarus.fcl -s /pnfs/sbn/data/sbn_fd/poms_production/data/Reproc_Run2_SBN/reconstructed/icaruscode_v10_06_00_01p05/bnbmajority/stage0_daqPMT_drop/66/12/compressed_data_dl8_fstrmBNBMAJORITY_run9330_20_20221226T215948_20250706T065314-stage0_daqPMTdrop-42df9c2d-b9bc-40d9-83f8-a6ca3f756a36.root --nskip 45 -n 1
The stack trace above was obtained by setting a breakpoint on: art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >::Wrapper (which unfolds into 10 breakpoints).