[Feature] Migrate AICPU launch to new rtsLaunchCpuKernel interface (BUILD_WITH_NEW_CANN) #356

@hw-native-sys-bot

Description

Summary

Migrate the AICPU kernel launch path from the legacy rtAicpuKernelLaunchExWithArgs API to the new interface (rtsBinaryLoadFromFile, rtsFuncGetByName, rtsLaunchCpuKernel) available in newer CANN versions. The pypto codebase already gates this path behind BUILD_WITH_NEW_CANN; simpler should adopt it as well for forward compatibility.

Motivation / Use Case

Current state (simpler):

Both a2a3 and a5 platform backends use the legacy launch path in device_runner.cpp:

// src/a2a3/platform/onboard/host/device_runner.cpp:607
rtAicpuKernelLaunchExWithArgs(
    rtKernelType_t::KERNEL_TYPE_AICPU_KFC, "AST_DYN_AICPU",
    aicpu_num, &rt_args, nullptr, stream, 0);

This requires manually constructing an rtAicpuArgsEx_t, filling in kernelNameAddrOffset and soNameAddrOffset by hand, and embedding the kernel and SO names as fixed-size char arrays in the argument struct.
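To make the bookkeeping concrete, here is a minimal sketch of the offset-based packing pattern. The struct layout, buffer sizes, and helper name are hypothetical and chosen only for illustration; they are not the actual layout used with rtAicpuArgsEx_t in device_runner.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical argument block: payload followed by two fixed-size name buffers.
// The real struct in device_runner.cpp may be laid out differently.
struct LegacyAicpuArgs {
    std::uint64_t payload[4];  // illustrative kernel arguments
    char kernel_name[32];      // fixed-size buffer for the kernel name
    char so_name[32];          // fixed-size buffer for the SO name
};

// Hypothetical holder for the two offsets the legacy path has to supply.
struct LegacyOffsets {
    std::uint32_t kernel_name_offset;
    std::uint32_t so_name_offset;
};

// Zero the block, copy both names (truncating to fit, always NUL-terminated),
// and report where each name lives relative to the start of the block.
inline LegacyOffsets fill_legacy_args(LegacyAicpuArgs& args,
                                      const char* kernel_name,
                                      const char* so_name) {
    std::memset(&args, 0, sizeof(args));
    std::strncpy(args.kernel_name, kernel_name, sizeof(args.kernel_name) - 1);
    std::strncpy(args.so_name, so_name, sizeof(args.so_name) - 1);
    return LegacyOffsets{
        static_cast<std::uint32_t>(offsetof(LegacyAicpuArgs, kernel_name)),
        static_cast<std::uint32_t>(offsetof(LegacyAicpuArgs, so_name))};
}
```

Any change to the struct layout or buffer sizes silently changes the offsets, which is the kind of coupling the new interface is expected to remove.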

New interface (pypto, under BUILD_WITH_NEW_CANN):

pypto has migrated to a cleaner approach using LoadAicpuOp:

  1. Load: rtsBinaryLoadFromFile(jsonPath, &optionCfg, &binHandle) — load AICPU op info from a JSON descriptor
  2. Resolve: rtsFuncGetByName(binHandle, opName, &funcHandle) — get function handle by name
  3. Launch: rtsLaunchCpuKernel(funcHandle, blockDim, stream, &launchCfg, &argInfo) — launch with typed args
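The three steps above can be sketched as a single load, resolve, launch sequence with early exit on failure. Everything in the `sketch` namespace is a stand-in written for this issue: the `fake_*` functions only mimic the call shapes listed above, and the real types, status codes, and signatures in rts/rts_kernel.h may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace sketch {

using Status = int;  // assumption: 0 means success in this sketch
using Stream = void*;

struct LoadOption {};
struct LaunchCfg {};
struct BinHandle { std::string json_path; };
struct FuncHandle { std::string op_name; };
struct ArgInfo { const void* args; std::size_t size; };

// Stand-in for rtsBinaryLoadFromFile(jsonPath, &optionCfg, &binHandle).
inline Status fake_binary_load(const char* json_path, const LoadOption*, BinHandle* out) {
    if (json_path == nullptr || out == nullptr) return 1;
    out->json_path = json_path;
    return 0;
}

// Stand-in for rtsFuncGetByName(binHandle, opName, &funcHandle).
inline Status fake_func_get(const BinHandle& bin, const char* op_name, FuncHandle* out) {
    if (bin.json_path.empty() || op_name == nullptr || out == nullptr) return 1;
    out->op_name = op_name;
    return 0;
}

// Stand-in for rtsLaunchCpuKernel(funcHandle, blockDim, stream, &launchCfg, &argInfo).
inline Status fake_launch(const FuncHandle& func, std::uint32_t block_dim, Stream,
                          const LaunchCfg*, const ArgInfo*) {
    return (func.op_name.empty() || block_dim == 0) ? 1 : 0;
}

// Load -> resolve -> launch, returning the first non-zero status.
inline Status launch_aicpu_op(const char* json_path, const char* op_name,
                              std::uint32_t block_dim, Stream stream, const ArgInfo& args) {
    LoadOption option{};
    BinHandle bin{};
    Status s = fake_binary_load(json_path, &option, &bin);
    if (s != 0) return s;

    FuncHandle func{};
    s = fake_func_get(bin, op_name, &func);
    if (s != 0) return s;

    LaunchCfg cfg{};
    return fake_launch(func, block_dim, stream, &cfg, &args);
}

}  // namespace sketch
```

In a real integration the load and resolve steps would likely run once at init time, with only the launch call on the hot path.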

Benefits:

  • Cleaner API: No manual offset calculations (kernelNameAddrOffset, soNameAddrOffset), no embedded char arrays
  • Forward compatibility: The legacy rtAicpuKernelLaunchExWithArgs may be deprecated in future CANN versions
  • Consistency: Aligns simpler's host launch path with pypto's approach
  • Custom op support: The new interface supports both built-in ops (LaunchBuiltInOp) and custom ops (LaunchCustomOp) through a unified LoadAicpuOp class
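As a rough idea of what a LoadAicpuOp-style wrapper could look like on the simpler side, the sketch below caches function handles by op name so that the resolve step runs once per op. This is not pypto's implementation: the class name, the integer handle type, and the fake resolve and launch bodies are all hypothetical placeholders for the runtime calls.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical wrapper; handle values and status codes are invented for the sketch.
class AicpuOpLoaderSketch {
public:
    using FuncHandle = int;  // stand-in for the runtime's function handle type

    // Return the cached handle for op_name, resolving it on first use only.
    FuncHandle resolve(const std::string& op_name) {
        auto it = cache_.find(op_name);
        if (it != cache_.end()) return it->second;
        FuncHandle handle = fake_resolve(op_name);
        cache_.emplace(op_name, handle);
        return handle;
    }

    // Launch by name; returns 0 on success in this sketch.
    int launch(const std::string& op_name, std::uint32_t block_dim) {
        FuncHandle handle = resolve(op_name);
        return (handle > 0 && block_dim > 0) ? 0 : 1;  // placeholder for the launch call
    }

    // Number of times the (fake) resolve step actually ran.
    int resolve_calls() const { return resolve_calls_; }

private:
    // Placeholder for the by-name lookup against a loaded binary handle.
    FuncHandle fake_resolve(const std::string&) {
        ++resolve_calls_;
        return next_handle_++;
    }

    std::unordered_map<std::string, FuncHandle> cache_;
    FuncHandle next_handle_ = 1;
    int resolve_calls_ = 0;
};
```

Whether built-in and custom ops should share one cache or be loaded from separate descriptors is left open here; the pypto reference implementation is the place to check.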

Proposed API / Behavior

Add BUILD_WITH_NEW_CANN compile flag support and a LoadAicpuOp-style abstraction:

// New path (when BUILD_WITH_NEW_CANN is defined):
// 1. Generate op info JSON at init time
// 2. Load binary handle: rtsBinaryLoadFromFile(...)
// 3. Resolve function handles: rtsFuncGetByName(...)
// 4. Launch: rtsLaunchCpuKernel(funcHandle, blockDim, stream, &cfg, &args)

// Legacy path (fallback):
// Existing rtAicpuKernelLaunchExWithArgs code unchanged
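A minimal sketch of the compile-time dispatch, assuming the flag is passed as a plain preprocessor define. The two `launch_with_*` helpers and the `LaunchResult` type are hypothetical placeholders; in device_runner.cpp they would wrap the real new-interface and legacy calls respectively.

```cpp
#include <cstdint>

enum class LaunchPath { kLegacy, kNewCann };

struct LaunchResult {
    LaunchPath path;  // which branch was compiled in
    int status;       // 0 means success in this sketch
};

// Placeholder for the rtsLaunchCpuKernel-based path.
inline int launch_with_new_interface(std::uint32_t aicpu_num) {
    return aicpu_num > 0 ? 0 : 1;
}

// Placeholder for the existing rtAicpuKernelLaunchExWithArgs-based path.
inline int launch_with_legacy_interface(std::uint32_t aicpu_num) {
    return aicpu_num > 0 ? 0 : 1;
}

// Single entry point: callers do not need to know which path was built.
inline LaunchResult launch_aicpu_kernel(std::uint32_t aicpu_num) {
#ifdef BUILD_WITH_NEW_CANN
    return LaunchResult{LaunchPath::kNewCann, launch_with_new_interface(aicpu_num)};
#else
    return LaunchResult{LaunchPath::kLegacy, launch_with_legacy_interface(aicpu_num)};
#endif
}
```

Keeping the `#ifdef` inside one function means the rest of the host code stays identical across both builds, and the legacy branch remains untouched.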

Scope:

  • src/a2a3/platform/onboard/host/device_runner.cpp — AICPU launch in launch_aicpu_kernel()
  • src/a5/platform/onboard/host/device_runner.cpp — same pattern
  • New dependency: the rts/rts_kernel.h header from the CANN toolkit, included only under #ifdef BUILD_WITH_NEW_CANN

Reference implementation: pypto/framework/src/machine/runtime/load_aicpu_op.cpp and load_aicpu_op.h

Alternatives Considered

  • Keep legacy API only: Works for now, but risks breakage if CANN deprecates the old interface
  • Conditional compilation (recommended): Use #ifdef BUILD_WITH_NEW_CANN to support both old and new paths, matching pypto's approach. This allows gradual migration without breaking existing builds

Additional Context

  • pypto reference: framework/src/machine/runtime/load_aicpu_op.{h,cpp} and device_runner.cpp
  • The new API requires rts/rts_kernel.h header from the CANN toolkit
  • AICore launch (rtKernelLaunchWithHandleV2) is unaffected — only AICPU launch changes

Metadata

Assignees: none
Labels: enhancement (New feature or request)
Project status: Ready
Milestone: none