Skip to content

Enhanced support for memory-tracking raft resources#3004

Open
achirkin wants to merge 9 commits into
rapidsai:mainfrom
achirkin:enh-memory-resources
Open

Enhanced support for memory-tracking raft resources#3004
achirkin wants to merge 9 commits into
rapidsai:mainfrom
achirkin:enh-memory-resources

Conversation

@achirkin
Copy link
Copy Markdown
Contributor

@achirkin achirkin commented Apr 24, 2026

  1. Change the host memory resource to have the same owning semantics as the device memory resources as of Migrate RMM usage to CCCL MR design #2996
  2. Add a workaround to statistics_adaptor.hpp to compile via nvcc (see code comment)
  3. Add memory_stats_resources. In constrast to memory_tracking_resources this doesn't stream the memory usage online, but only reports the overall usage metrics.

breaking change: moved raft/util/memory_tracking_resources.hpp -> raft/core/memory_tracking_resources.hpp

@achirkin achirkin self-assigned this Apr 24, 2026
@achirkin achirkin requested review from a team as code owners April 24, 2026 14:09
@achirkin achirkin added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 24, 2026
@achirkin achirkin moved this to In Progress in Unstructured Data Processing Apr 24, 2026
@achirkin achirkin changed the title Enhanced support for memory tracking raft resources Enhanced support for memory-tracking raft resources Apr 24, 2026
Comment on lines +75 to +79
// NVCC injects __host__ __device__ on std::shared_ptr special members,
// which makes the *implicit* or *defaulted* special members __host__
// __device__ too. That conflicts with Upstream types whose special
// members are __host__ only (e.g. rmm::device_async_resource_ref).
// User-defined bodies (not = default) force plain __host__ execution space.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. Are there changes you would suggest for RMM or CCCL?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think, CCCL/rrm are fine, because the cuda::mr::shared_resource defines the copy/move constructors explicitly.

Comment thread cpp/include/raft/core/memory_stats_resources.hpp
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the PR! Overall looks good, I have only one question regarding Thrust policy handling.

}

// --- Device (global) ---
// Invalidate the cached thrust policy (the resource_ref it captured
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to invalidate this again during destruction of the memory_stats_resources?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, unlike the device memory resource, the thrust policy is a local resource object, so it dies out with its owning raft::resources handle.

Comment thread cpp/include/raft/core/memory_stats_resources.hpp
@achirkin achirkin changed the base branch from main to release/26.06 May 15, 2026 14:33
@achirkin achirkin requested review from a team as code owners May 15, 2026 14:33
@achirkin achirkin requested a review from msarahan May 15, 2026 14:33
@achirkin achirkin force-pushed the enh-memory-resources branch from aa09273 to 053b0c5 Compare May 15, 2026 14:44
@achirkin achirkin removed request for a team and msarahan May 15, 2026 15:06
@achirkin achirkin moved this from In Progress to Done in Unstructured Data Processing May 15, 2026
@achirkin achirkin moved this from Done to In Progress in Unstructured Data Processing May 15, 2026
@achirkin achirkin requested a review from cjnolet May 19, 2026 10:47
@achirkin achirkin added breaking Breaking change and removed non-breaking Non-breaking change labels May 19, 2026
@achirkin achirkin changed the base branch from release/26.06 to main May 21, 2026 04:25
@achirkin
Copy link
Copy Markdown
Contributor Author

Moving this to 26.08 as 26.06 deadline is missed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change improvement Improvement / enhancement to an existing function

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants