-
Notifications
You must be signed in to change notification settings - Fork 270
Description
Background
Currently, we forward declare CK device operation templates in CK-Builder's reflection code:
Lines 13 to 57 in 9b16808
| template <ck::index_t NDimSpatial, | |
| typename InLayout, | |
| typename WeiLayout, | |
| typename OutLayout, | |
| typename InDataType, | |
| typename WeiDataType, | |
| typename OutDataType, | |
| typename AccDataType, | |
| typename InElementwiseOperation, | |
| typename WeiElementwiseOperation, | |
| typename OutElementwiseOperation, | |
| ck::tensor_operation::device::ConvolutionBackwardWeightSpecialization | |
| ConvBackwardWeightSpecialization, | |
| ck::index_t BlockSize, | |
| ck::index_t MPerBlock, | |
| ck::index_t NPerBlock, | |
| ck::index_t K0PerBlock, | |
| ck::index_t K1, | |
| ck::index_t MPerXDL, | |
| ck::index_t NPerXDL, | |
| ck::index_t MXdlPerWave, | |
| ck::index_t NXdlPerWave, | |
| typename ABlockTransferThreadClusterLengths_K0_M_K1, | |
| typename ABlockTransferThreadClusterArrangeOrder, | |
| typename ABlockTransferSrcAccessOrder, | |
| ck::index_t ABlockTransferSrcVectorDim, | |
| ck::index_t ABlockTransferSrcScalarPerVector, | |
| ck::index_t ABlockTransferDstScalarPerVector_K1, | |
| bool ABlockLdsAddExtraM, | |
| typename BBlockTransferThreadClusterLengths_K0_N_K1, | |
| typename BBlockTransferThreadClusterArrangeOrder, | |
| typename BBlockTransferSrcAccessOrder, | |
| ck::index_t BBlockTransferSrcVectorDim, | |
| ck::index_t BBlockTransferSrcScalarPerVector, | |
| ck::index_t BBlockTransferDstScalarPerVector_K1, | |
| bool BBlockLdsAddExtraN, | |
| ck::index_t CShuffleMXdlPerWavePerShuffle, | |
| ck::index_t CShuffleNXdlPerWavePerShuffle, | |
| typename CBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock, | |
| ck::index_t CBlockTransferScalarPerVector_NWaveNPerXdl, | |
| typename ComputeTypeA, | |
| typename ComputeTypeB, | |
| ck::index_t MaxTransposeTransferSrcScalarPerVector, | |
| ck::index_t MaxTransposeTransferDstScalarPerVector> | |
| struct DeviceGroupedConvBwdWeight_Xdl_CShuffle; |
This is mainly required to break a circular dependency in reflection. The architecture of that is as follows:
MyDeviceOpimplementsGetInstanceString(). This is typically defined directly in the class definition (no forward declaration).GetInstanceString()callsinstance_string<MyDeviceOp>()instance_string<MyDeviceOp>()callsInstanceTraits<MyDeviceOp>::instance_string()InstanceTraitshas a specialization forMyDeviceOpwhich implementsinstance_string()
So order for GetInstanceString() to work properly, InstanceTraits must already be defined. And for InstanceTraits to be defined, the device op needs to be defined. In order to do that, we are currently using aforementioned forward declaration.
The Problem
There are multiple problems with this forward declaration: First, its annoying to maintain both definitions of the device op, as these can get very long (see above example). Second, the actual device op has a bunch of defaulted template parameters, and these cannot be also defined in the forward definition: C++ only allows you to default each template parameter only once (though that can be either in a forward declaration or the actual definition, but if its the latter, then code that only sees the former would obviously not know about those defaulted parameters). But if they are not forward defined, you cannot use the InstanceTraits reflection without explicitly specifying those defaults. This caused a breakage on the develop branch recently: #3648 introduced a new parameter, and a dependent test was not updated (Its unclear to me how CI for #3648 passed in the first place), see #3677.
Proposed Fix
In my opinion we should try to get rid of the forward declaration in the instance_traits files entirely. I think the best option is to use C++'s lazy template evaluation by calling into an as-of-yet undefined function static member function of InstanceTraits<MyDeviceOp> in GetInstanceString(), and then specializing InstanceTraits only after that. The caveat here is that both the device op itself as well as the instance traits specialization must be in scope, otherwise there would be an undefined function error. In practise, we can solve that either by placing the instance traits directly into the file that defines MyDeviceOp, or possibly by using a .inc file to keep the concerns separated. That would look something like this:
// instance_traits.hpp
template <typename T>
struct Traits;// include/ck/tensor_operation/gpu/device.impl/device_my_op.hpp
#ifdef CK_EXPERIMENTAL_BUILDER
#include "experimental/builder/include/ck_tile/builder/reflect/instance_traits.hpp"
#endif
template <int X>
struct MyDeviceOp {
#ifdef CK_EXPERIMENTAL_BUILDER
std::string instance_string() const {
return Traits<MyDeviceOp>::instance_string();
}
#endif
};
#ifdef CK_EXPERIMENTAL_BUILDER
#include "experimental/builder/include/ck_tile/builder/reflect/reflect_device_my_op.inc"
#endif// experimental/builder/include/ck_tile/builder/reflect/reflect_device_my_op.inc
template <int X>
struct Traits<MyDeviceOp<X>> {
static std::string instance_string() {
return "test " + std::to_string(X);
}
};#include <ck/tensor_operation/gpu/device.impl/device_my_op.hpp>
// user code
std::string test() {
MyDeviceOp<10> op;
return op.instance_string();
}The main downside is that this does not allow reflecting without including the possibly giant header that defines MyDeviceOp, but I think that this is rare in practise and so is an acceptable loss.
Alternative
If the above is not acceptable, then the alternative is to split out the forward declaration of MyDeviceOp into a separate file, where the defaults are defined, and include this both from the relevant instance_traits and the include/ck/tensor_operation/gpu/device.impl/device_my_op.hpp header. In the latter, all defaults template arguments would be removed. This still requires a forward declaration so my preference is the previous solution.