art::Assns move constructor copies (and chokes ICARUS production) #152

@PetrilloAtWork

Description

I apologise for the National Enquirer-style highlight... the truth is that I am not sure that the move constructor copies, and I am sure this is no more than a sizeable contribution to ICARUS production choking.

While investigating the large memory usage of the ICARUS production reconstruction job, I found a curious memory spike, which Valgrind tracks down to art::Event::put() for a large association object (~2 million pairs).
Digging into it, I ended up with the following debugging trace:

#0  std::vector<std::pair<art::Ptr<recob::Hit>, art::Ptr<recob::SpacePoint> >, std::allocator<std::pair<art::Ptr<recob::Hit>, art::Ptr<recob::SpacePoint> > > >::vector (__x=..., this=<optimized out>)
    at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v12_1_0/Linux64bit+3.10-2.17/include/c++/12.1.0/bits/stl_vector.h:598
#1  art::Assns<recob::Hit, recob::SpacePoint, void>::Assns (this=0x1ed437e0)
    at /cvmfs/larsoft.opensciencegrid.org/products/canvas/v3_16_04/include/canvas/Persistency/Common/Assns.h:124
#2  art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >::Wrapper (ptr=..., this=0x1ed437d0)
    at /cvmfs/larsoft.opensciencegrid.org/products/canvas/v3_16_04/include/canvas/Persistency/Common/Wrapper.h:162
#3  std::make_unique<art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >, std::unique_ptr<art::Assns<recob::Hit, recob::SpacePoint, void>, std::default_delete<art::Assns<recob::Hit, recob::SpacePoint, void> > > > ()
    at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v12_1_0/Linux64bit+3.10-2.17/include/c++/12.1.0/bits/unique_ptr.h:1065
#4  art::ProductInserter::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (this=this@entry=0x7ffffffd80b0, edp=..., instance=..., rs=...)
    at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/ProductInserter.h:126
#5  0x00007ffee7c7a379 in art::ProductInserter::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (instance=..., edp=..., this=0x7ffffffd80b0)
    at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/ProductInserter.h:104
#6  art::Event::put<art::Assns<recob::Hit, recob::SpacePoint, void> > (instance=..., edp=..., this=0x7ffffffd8030)
    at /cvmfs/larsoft.opensciencegrid.org/products/art/v3_14_04/include/art/Framework/Principal/Event.h:80
#7  lar_cluster3d::Cluster3DICARUS::ArtOutputHandler::outputObjects (spacePointsOnly=<optimized out>, this=0x7ffffffd7cc0)
    at /scratch/workspace/icaruscode-release-build/BUILDTYPE/prof/label1/swarm/label2/buildservice104/temp/srcs/icaruscode/icaruscode/TPC/Tracking/cluster3D/Cluster3DICARUS_module.cc:246
#8  lar_cluster3d::Cluster3DICARUS::produce (this=0xa8a6440, evt=...)
    at /scratch/workspace/icaruscode-release-build/BUILDTYPE/prof/label1/swarm/label2/buildservice104/temp/srcs/icaruscode/icaruscode/TPC/Tracking/cluster3D/Cluster3DICARUS_module.cc:678

'nuff said.
GDB also tells me that the __x argument of the vector constructor at the top of the stack is a constant reference, i.e. we are in vector's copy constructor. art::Wrapper did its due diligence in passing an r-value reference to the art::Assns constructor, which is compiler-generated (so we don't really see it in the source code). The one open question is that the signature in the backtrace is that of the default constructor of Assns, which then tries to copy an array (the aforementioned __x) whose origin I cannot identify and which might as well be empty (prof builds are no fun). Still, Valgrind associates this stack with a large allocation. In addition, the data in the association has a different address before and after art::Event::put(), which is not the case for a companion std::vector data product (the address of the vector object changes, but that of its elements does not).
Many moving parts here (for example, Assns moving back and forth between transient and persistent representation may also cause change of address and memory spikes... it's just not apparent that this is happening here).
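
The address comparison described above can be turned into a small standalone check. The following is only a sketch with hypothetical stand-in types (MovableBag, CopyOnlyBag and moveKeepsBuffer are invented for illustration and are not part of canvas or art). It shows one generic C++ mechanism by which construction from an r-value can silently become a copy, namely a user-declared copy constructor suppressing the implicit move constructor. I am not claiming this is what happens in art::Assns.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; neither is the real art::Assns.
struct MovableBag {
  std::vector<int> items; // the implicit move constructor steals this buffer
};

struct CopyOnlyBag {
  std::vector<int> items;
  CopyOnlyBag() = default;
  // A user-declared copy constructor suppresses the implicit move constructor,
  // so construction from an r-value falls back to this copy.
  CopyOnlyBag(CopyOnlyBag const& other) : items(other.items) {}
};

// Returns true if constructing a Bag from an r-value reuses the same heap
// buffer, i.e. the element address is unchanged (a true move).
template <typename Bag>
bool moveKeepsBuffer() {
  Bag source;
  source.items.assign(1000, 42);
  int const* before = source.items.data();
  Bag target(std::move(source)); // overload resolution picks move or copy
  return target.items.data() == before;
}
```

With these definitions, moveKeepsBuffer<MovableBag>() is expected to return true (the element buffer is handed over), while moveKeepsBuffer<CopyOnlyBag>() is expected to return false (a second buffer is allocated, which for ~2 million pairs would look like a memory spike). Note that std::is_move_constructible would report true for both types, so comparing element addresses is the more telling test.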

To reproduce

With an input file in dCache (which could disappear, for all I know), and after setup icaruscode v10_06_00_01p05 -q e26:prof on an SBN-aware machine:

lar -c stage1_run2_icarus.fcl -s /pnfs/sbn/data/sbn_fd/poms_production/data/Reproc_Run2_SBN/reconstructed/icaruscode_v10_06_00_01p05/bnbmajority/stage0_daqPMT_drop/66/12/compressed_data_dl8_fstrmBNBMAJORITY_run9330_20_20221226T215948_20250706T065314-stage0_daqPMTdrop-42df9c2d-b9bc-40d9-83f8-a6ca3f756a36.root --nskip 45 -n 1

The stack trace above was obtained by setting a breakpoint on art::Wrapper<art::Assns<recob::Hit, recob::SpacePoint, void> >::Wrapper (which unfolds into 10 breakpoints).

Labels: bug
