[CK_BUILDER] Improve forward declarations for reflections

## Background

Currently, we forward declare CK device operation templates in CK-Builder's reflection code:

https://github.com/ROCm/composable_kernel/blob/9b168082b7aa19bcf50fd9991baf10a0c77d105b/experimental/builder/include/ck_tile/builder/reflect/instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp#L13-L57

This is mainly required to break a circular dependency in reflection. The architecture of that is as follows:
1. `MyDeviceOp` implements `GetInstanceString()`. This is typically defined directly in the class definition (no forward declaration).
2. `GetInstanceString()` calls `instance_string<MyDeviceOp>()`
3. `instance_string<MyDeviceOp>()` calls `InstanceTraits<MyDeviceOp>::instance_string()`
4. `InstanceTraits` has a specialization for `MyDeviceOp` which implements `instance_string()`

So order for `GetInstanceString()` to work properly, `InstanceTraits` must already be defined. And for `InstanceTraits` to be defined, the device op needs to be defined. In order to do _that_, we are currently using aforementioned forward declaration.

## The Problem

There are multiple problems with this forward declaration: First, its annoying to maintain both definitions of the device op, as these can get very long (see above example). Second, the actual device op has a bunch of defaulted template parameters, and these cannot be _also defined in the forward definition_: C++ only allows you to default each template parameter only once (though that can be either in a forward declaration or the actual definition, but if its the latter, then code that only sees the former would obviously not know about those defaulted parameters). But if they are not forward defined, you cannot use the `InstanceTraits` reflection _without explicitly specifying those defaults_. This caused a breakage on the `develop` branch recently: #3648 introduced a new parameter, and a dependent test was not updated (Its unclear to me how CI for #3648 passed in the first place), see #3677.

## Proposed Fix

In my opinion we should try to get rid of the forward declaration in the `instance_traits` files entirely. I think the best option is to use C++'s lazy template evaluation by calling into an as-of-yet undefined function static member function of `InstanceTraits<MyDeviceOp>` in `GetInstanceString()`, and then specializing `InstanceTraits` only _after that_. The caveat here is that both the device op itself as well as the instance traits specialization must be in scope, otherwise there would be an undefined function error. In practise, we can solve that either by placing the instance traits directly into the file that defines `MyDeviceOp`, or possibly by using a `.inc` file to keep the concerns separated. That would look something like this: 

```c++
// instance_traits.hpp
template <typename T>
struct Traits;
```

```c++
// include/ck/tensor_operation/gpu/device.impl/device_my_op.hpp

#ifdef CK_EXPERIMENTAL_BUILDER
#include "experimental/builder/include/ck_tile/builder/reflect/instance_traits.hpp"
#endif

template <int X>
struct MyDeviceOp {
#ifdef CK_EXPERIMENTAL_BUILDER
    std::string instance_string() const {
        return Traits<MyDeviceOp>::instance_string();
    }
#endif
};

#ifdef CK_EXPERIMENTAL_BUILDER
#include "experimental/builder/include/ck_tile/builder/reflect/reflect_device_my_op.inc"
#endif
```

```c++
// experimental/builder/include/ck_tile/builder/reflect/reflect_device_my_op.inc

template <int X>
struct Traits<MyDeviceOp<X>> {
    static std::string instance_string() {
        return "test " + std::to_string(X);
    }
};
```

```c++
#include <ck/tensor_operation/gpu/device.impl/device_my_op.hpp>

// user code
std::string test() {
    MyDeviceOp<10> op;
    return op.instance_string();
}
```

The main downside is that this does not allow reflecting without including the possibly giant header that defines `MyDeviceOp`, but I think that this is rare in practise and so is an acceptable loss.

#### Alternative

If the above is not acceptable, then the alternative is to split out the forward declaration of `MyDeviceOp` into a separate file, where the defaults are defined, and include this both from the relevant `instance_traits` and the `include/ck/tensor_operation/gpu/device.impl/device_my_op.hpp` header. In the latter, all defaults template arguments would be removed. This still requires a forward declaration so my preference is the previous solution.

	template <ck::index_t NDimSpatial,
	typename InLayout,
	typename WeiLayout,
	typename OutLayout,
	typename InDataType,
	typename WeiDataType,
	typename OutDataType,
	typename AccDataType,
	typename InElementwiseOperation,
	typename WeiElementwiseOperation,
	typename OutElementwiseOperation,
	ck::tensor_operation::device::ConvolutionBackwardWeightSpecialization
	ConvBackwardWeightSpecialization,
	ck::index_t BlockSize,
	ck::index_t MPerBlock,
	ck::index_t NPerBlock,
	ck::index_t K0PerBlock,
	ck::index_t K1,
	ck::index_t MPerXDL,
	ck::index_t NPerXDL,
	ck::index_t MXdlPerWave,
	ck::index_t NXdlPerWave,
	typename ABlockTransferThreadClusterLengths_K0_M_K1,
	typename ABlockTransferThreadClusterArrangeOrder,
	typename ABlockTransferSrcAccessOrder,
	ck::index_t ABlockTransferSrcVectorDim,
	ck::index_t ABlockTransferSrcScalarPerVector,
	ck::index_t ABlockTransferDstScalarPerVector_K1,
	bool ABlockLdsAddExtraM,
	typename BBlockTransferThreadClusterLengths_K0_N_K1,
	typename BBlockTransferThreadClusterArrangeOrder,
	typename BBlockTransferSrcAccessOrder,
	ck::index_t BBlockTransferSrcVectorDim,
	ck::index_t BBlockTransferSrcScalarPerVector,
	ck::index_t BBlockTransferDstScalarPerVector_K1,
	bool BBlockLdsAddExtraN,
	ck::index_t CShuffleMXdlPerWavePerShuffle,
	ck::index_t CShuffleNXdlPerWavePerShuffle,
	typename CBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock,
	ck::index_t CBlockTransferScalarPerVector_NWaveNPerXdl,
	typename ComputeTypeA,
	typename ComputeTypeB,
	ck::index_t MaxTransposeTransferSrcScalarPerVector,
	ck::index_t MaxTransposeTransferDstScalarPerVector>
	struct DeviceGroupedConvBwdWeight_Xdl_CShuffle;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CK_BUILDER] Improve forward declarations for reflections #3682

Background

The Problem

Proposed Fix

Alternative

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[CK_BUILDER] Improve forward declarations for reflections #3682

Description

Background

The Problem

Proposed Fix

Alternative

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions