Skip to content

Conversation

@Binyang2014
Copy link
Contributor

@Binyang2014 Binyang2014 commented Nov 19, 2025

Reorganize current native algorithm implementation and DSL algorithm implementation.
Provide unified API for DSL algo and native algo and provide interface to tune the algo
Provide interface for pytorch integration with native API and DSL

Comment on lines 135 to 145
std::vector<RegisteredMemory> registeredMemories;
std::vector<MemoryChannel> memoryChannels;
std::vector<SwitchChannel> switchChannels;
std::vector<PortChannel> portChannels;
std::vector<std::shared_ptr<NvlsConnection>> nvlsConnections;
std::shared_ptr<DeviceHandle<MemoryChannel>> memoryChannelDeviceHandles;
std::shared_ptr<DeviceHandle<SwitchChannel>> switchChannelDeviceHandles;
std::shared_ptr<DeviceHandle<PortChannel>> portChannelDeviceHandles;
std::vector<std::shared_ptr<MemoryDevice2DeviceSemaphore>> memorySemaphores;
std::vector<std::shared_ptr<Host2DeviceSemaphore>> hostSemaphores;
std::unordered_map<std::string, std::shared_ptr<void>> extras;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they need to be public? Note this is a user-facing header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users need this to implement customized algorithm, for example: https://github.com/microsoft/mscclpp/blob/105239fc6c8c246e77a6f54504cb2836a67ca234/examples/customized-collective-algorithm/customized_allgather.cu

Also, this structure is not stable, will change to use BaseChannel instead Channel which memory attached in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to the internal header folder

Copy link
Contributor

@chhwang chhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting all my minor comments below.

examples/torch-integration/customized_allgather.cu: "MIT license" -> "MIT License"
python/csrc/ext/algorithm_collection_builder.cpp: rename the file into python/csrc/ext/algorithm_collection_builder_py.cpp
python/mscclpp/_core/__init__.py: "MIT license" -> "MIT License"
python/mscclpp/_core/buffer.py: "MIT license" -> "MIT License"
include/mscclpp/ext/collectives/algorithm_collection_builder.hpp: missing license
src/core/include/logger.hpp: let case LogSubsys::COUNT: case fallback to default (return "UNKNOWN"), since it's only for counting the number of LogSubsys types and using this directly for logging is not a valid behavior. Let's put a regarding comment here since copilot keeps suggesting this.
src/core/algorithm.cc: "MIT license" -> "MIT License"
src/core/CMakeLists.txt: "MIT license" -> "MIT License"
src/core/gpu_utils.cc: fix the comment "!defined(HIP_PLATFORM_AMD)" -> "!defined(MSCCLPP_USE_ROCM)"
src/ext/collectives/allgather/allgather_fullmesh2.cu: rename the file into src/ext/collectives/allgather/allgather_fullmesh_2.cu (would be better if we give a proper name rather than "_2")
src/ext/collectives/allreduce/allreduce_nvls_with_copy2.cu: rename the file into src/ext/collectives/allreduce/allreduce_nvls_with_copy_2.cu (would be better if we give a proper name rather than "_2")
src/ext/collectives/include/allgather/allgather_fullmesh.hpp: missing #ifdef guard in this header file - use #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP
src/ext/collectives/include/allgather/allgather_fullmesh2.hpp: rename the file into src/ext/collectives/include/allgather/allgather_fullmesh_2.hpp (would be better if we give a proper name rather than "_2"). Also, missing #ifdef guard in this header file - use #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP
src/ext/collectives/include/allreduce/allreduce_allpair_packet.hpp: "MIT license" -> "MIT License"
src/ext/collectives/include/allreduce/allreduce_nvls_packet.hpp: "MIT license" -> "MIT License"
src/ext/collectives/include/allreduce/allreduce_nvls_with_copy2.hpp: rename the file into src/ext/collectives/include/allreduce/allreduce_nvls_with_copy_2.hpp (would be better if we give a proper name rather than "_2")
src/ext/collectives/include/collective_utils.hpp: follow #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP format for the header guard
src/ext/collectives/algorithm_collection_builder.cc: missing license
src/ext/collectives/CMakeLists.txt: "MIT license" -> "MIT License"
src/ext/nccl/CMakeLists.txt: "MIT license" -> "MIT License"
src/ext/CMakeLists.txt: missing license
include/mscclpp/algorithm.hpp: enum value names of CommResult (e.g., commSuccess) is inconsistent with other enums. But it would be too long if we use CAPITAL_CASE style, so I'd recommend CamelCase style, including in CommResult and in other enums in this header.

Overall: Review the documentations under docs/ to see if all the texts are still aligned with the changed paths. Especially, we need to update the binary paths (such as mp_unit_tests build path) mentioned in documentations.

@chhwang
Copy link
Contributor

chhwang commented Jan 15, 2026

Putting all my minor comments below.

examples/torch-integration/customized_allgather.cu: "MIT license" -> "MIT License" python/csrc/ext/algorithm_collection_builder.cpp: rename the file into python/csrc/ext/algorithm_collection_builder_py.cpp python/mscclpp/_core/__init__.py: "MIT license" -> "MIT License" python/mscclpp/_core/buffer.py: "MIT license" -> "MIT License" include/mscclpp/ext/collectives/algorithm_collection_builder.hpp: missing license src/core/include/logger.hpp: let case LogSubsys::COUNT: case fallback to default (return "UNKNOWN"), since it's only for counting the number of LogSubsys types and using this directly for logging is not a valid behavior. Let's put a regarding comment here since copilot keeps suggesting this. src/core/algorithm.cc: "MIT license" -> "MIT License" src/core/CMakeLists.txt: "MIT license" -> "MIT License" src/core/gpu_utils.cc: fix the comment "!defined(HIP_PLATFORM_AMD)" -> "!defined(MSCCLPP_USE_ROCM)" src/ext/collectives/allgather/allgather_fullmesh2.cu: rename the file into src/ext/collectives/allgather/allgather_fullmesh_2.cu (would be better if we give a proper name rather than "_2") src/ext/collectives/allreduce/allreduce_nvls_with_copy2.cu: rename the file into src/ext/collectives/allreduce/allreduce_nvls_with_copy_2.cu (would be better if we give a proper name rather than "_2") src/ext/collectives/include/allgather/allgather_fullmesh.hpp: missing #ifdef guard in this header file - use #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP src/ext/collectives/include/allgather/allgather_fullmesh2.hpp: rename the file into src/ext/collectives/include/allgather/allgather_fullmesh_2.hpp (would be better if we give a proper name rather than "_2"). Also, missing #ifdef guard in this header file - use #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP src/ext/collectives/include/allreduce/allreduce_allpair_packet.hpp: "MIT license" -> "MIT License" src/ext/collectives/include/allreduce/allreduce_nvls_packet.hpp: "MIT license" -> "MIT License" src/ext/collectives/include/allreduce/allreduce_nvls_with_copy2.hpp: rename the file into src/ext/collectives/include/allreduce/allreduce_nvls_with_copy_2.hpp (would be better if we give a proper name rather than "_2") src/ext/collectives/include/collective_utils.hpp: follow #ifdef MSCCLPP_EXT_<FILE_NAME>_HPP format for the header guard src/ext/collectives/algorithm_collection_builder.cc: missing license src/ext/collectives/CMakeLists.txt: "MIT license" -> "MIT License" src/ext/nccl/CMakeLists.txt: "MIT license" -> "MIT License" src/ext/CMakeLists.txt: missing license include/mscclpp/algorithm.hpp: enum value names of CommResult (e.g., commSuccess) is inconsistent with other enums. But it would be too long if we use CAPITAL_CASE style, so I'd recommend CamelCase style, including in CommResult and in other enums in this header.

Overall: Review the documentations under docs/ to see if all the texts are still aligned with the changed paths. Especially, we need to update the binary paths (such as mp_unit_tests build path) mentioned in documentations.

@copilot work on this

Copy link
Contributor

Copilot AI commented Jan 15, 2026

@chhwang I've opened a new pull request, #725, to work on those changes. Once the pull request is ready, I'll request review from you.

@chhwang
Copy link
Contributor

chhwang commented Jan 15, 2026

/azp run mscclpp-ut

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI and others added 2 commits January 15, 2026 09:46
…d style consistency (#725)

- [x] Fix license text: "MIT license" → "MIT License" in multiple files
- [x] Rename files with "_2" suffix and update references
- [x] Add missing license headers
- [x] Fix header guards to follow MSCCLPP_EXT_<FILE_NAME>_HPP_ format
- [x] Fix enum naming consistency
  - [x] CommResult enum to CamelCase
- [x] CollectiveBufferMode enum to CamelCase (Any, InPlace, OutOfPlace)
  - [x] AlgorithmType enum to CamelCase (Native, DSL)
- [x] Fix comment in src/core/gpu_utils.cc
- [x] Fix LogSubsys::COUNT case in src/core/include/logger.hpp
  - [x] Add explanatory comment
  - [x] Add [[fallthrough]] attribute
- [x] Apply clang-format
- [x] Remove _codeql_detected_source_root file and add to .gitignore
- [x] Update documentation paths
  - [x] Fix NCCL library paths: build/apps/nccl/ → build/lib/
  - [x] Fix test binary paths: ./test/ → ./bin/

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 We'd love your input! Share your thoughts on Copilot coding agent in
our [2 minute survey](https://gh.io/copilot-coding-agent-survey).

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
@Binyang2014
Copy link
Contributor Author

/azp run mscclpp-ut

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run mscclpp-ut

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run mscclpp-ut

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Copy link
Contributor Author

/azp run mscclpp-ut

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants