
Commit 05c15bf

dropout: ompsimd + CUDA implementation (#44)
1 parent bea5115 commit 05c15bf

File tree

14 files changed: +440 −138 lines

CODE_OF_CONDUCT.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@

# DeepX Code of Conduct

## Our Pledge

As contributors and maintainers, we pledge to provide an open and welcoming environment for everyone.

## Our Standards

Behaviors that help create a positive environment include, but are not limited to:

- Using friendly and inclusive language
- Respecting differing viewpoints and experiences
- Accepting constructive criticism with patience
- Focusing on what is best for the community
- Treating other community members with kindness

Unacceptable behaviors include, but are not limited to:

- The use of sexualized language or imagery, and unwelcome sexual attention or advances
- Trolling, inflammatory, insulting, or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as physical or electronic addresses, without explicit permission
- Other conduct that could reasonably be considered unprofessional

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that do not align with this Code of Conduct, and to temporarily or permanently ban any contributor they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within project spaces and in public spaces when an individual is representing the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team through the channels below. All complaints will be reviewed and investigated and will result in a response deemed necessary and appropriate.

## Contact Information

Please contact us via [your contact information].

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4.

CONTRIBUTING.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@

# deepx Contribution Guide

Development of the deepx framework falls into six main areas:

+ front: adding new models, modules, Python class functions, etc.
+ middle layer: the computation-graph optimizer, the plugin system (automatic KV-cache), automatic distribution, automatic release of stack tensors, automatic in-place rewriting, and similar passes
+ adding or modifying an excuter
+ adding or modifying operators, which further divide into leaftensorfunc (indivisible basic operators) and fusedtensorfunc (fused operators)
+ enriching the documentation
+ operations automation

Pick whichever direction suits you.

## Steps

First contribution:

1. Fork the main branch of this repository (github.com/array2d/deepx) to your github/yourname/deepx
2. Clone github/yourname/deepx locally
3. Commit and push your changes to your GitHub: `git commit -m 'Add some feature'`
4. Open a Pull Request.

Nth contribution:

1. Make sure the pull requests from your local copy and from github/yourname/deepx have all been merged
2. Sync fork on github/yourname/deepx [caution: this deletes code you have added] to pull the latest code from the (github.com/array2d/deepx) main branch
3. Clone github/yourname/deepx locally
4. Commit and push your changes to your GitHub: `git commit -m 'Add some feature'`
5. Open a Pull Request.

README.md

Lines changed: 0 additions & 23 deletions
@@ -66,29 +66,6 @@ DeepX can integrate existing tensor-computation frameworks as executors, making full use of existing

This architecture lets DeepX incorporate a variety of advanced computation frameworks as execution engines while providing unified distributed scheduling and execution, giving users more flexible choices and higher performance.

Removed:

## 2. Contribution Guide

You can also refer to the guide in the official documentation:

https://deepx.array2d.com

Contributions are welcome in the following ways:

1. **Code contributions**
   - Please open an Issue describing your change before submitting a PR
   - The front project is currently Python-centric
   - excuter: three kinds of executors are currently planned; see [excuter](doc/excuter/excuter.md) for how to add a new operator to an excuter
     - cpu:
     - cuda:
     - jax:
2. **Documentation improvements**
   - Submit documentation updates to the `doc/` directory
3. **Issue feedback**
   - The project is iterating rapidly; problems can be reported via issues

### Official Documentation

doc/excuter/op-mem-cuda/list.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
| dropout | miaobyte | T1.dropout(p,seed)->T3 | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
| switch | miaobyte | C=switch(tensors,cases) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> result) |
| greaterscalar | miaobyte | mask=compare(T1, scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
| equalscalar | miaobyte | mask=compare(T1, scalar) | equalscalar(tensor<any> A, var<any> scalar, var<float64> epsilon)->(tensor<bool> mask) |

doc/excuter/op-mem-ompsimd/list.md

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@

| Operation | Author | Math Formula | IR Instruction |
|-----------|--------|--------------|----------------|
| dropout | miaobyte | A.dropout(p,seed)->C | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
| switch | miaobyte | C=switch([tensors],case) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> C) |
| greaterscalar | miaobyte | mask=greater(T1,scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
| equalscalar | miaobyte | mask=equal(T1,scalar) | equalscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |

excuter/cpp-common/src/deepx/tensorfunc/elementwise.hpp

Lines changed: 13 additions & 0 deletions
@@ -412,6 +412,19 @@ namespace deepx::tensorfunc

        invertDispatcher<Author, T>::invert(input, output);
    }

    // dropout(A, p) => C
    template <typename Author, typename T>
    struct dropoutDispatcher
    {
        static void dropout(const Tensor<T> &input, const float p, const unsigned int seed, Tensor<T> &output) = delete;
    };

    template <typename Author, typename T>
    void dropout(const Tensor<T> &input, const float p, const unsigned int seed, Tensor<T> &output)
    {
        dropoutDispatcher<Author, T>::dropout(input, p, seed, output);
    }
} // namespace deepx::tensorfunc

#endif // DEEPX_TENSORFUNC_ELEMENTWISE_HPP

excuter/op-mem-cuda/src/client/tfs.cpp

Lines changed: 11 additions & 0 deletions
@@ -462,6 +462,17 @@ namespace deepx::tf

        {
            Param("result", DataCategory::Tensor, Precision::Any),
        })));
        // dropout
        tffactory.add_tf(std::make_shared<Dropout<miaobyte>>(vector<Param>(
            {
                Param("A", DataCategory::Tensor, Precision::Any),
                Param("p", DataCategory::Var, Precision::Float32),
                Param("seed", DataCategory::Var, Precision::Int32),
            }),
            vector<Param>(
            {
                Param("C", DataCategory::Tensor, Precision::Any),
            })));
    }
    // matmul
    void register_matmul(TfFactory &tffactory)

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cu

Lines changed: 45 additions & 0 deletions
@@ -3,6 +3,9 @@

#include <cuda_bf16.h>
#include <cuda_fp16.h>
#include <curand_kernel.h>

#include "deepx/tensorfunc/cuda.hpp"
#include "deepx/tensorfunc/authors.hpp"
#include "deepx/tensorfunc/cuda_math.cuh"

@@ -404,6 +407,48 @@ namespace deepx::tensorfunc

    template void launch_invert<int16_t>(const int16_t *a, int16_t *c, const int size);
    template void launch_invert<int8_t>(const int8_t *a, int8_t *c, const int size);

    // dropout
    template <typename T>
    __global__ void dropout_kernel(const T *A, const float p, const unsigned int seed, T *C, const int size)
    {
        int stride = blockDim.x * gridDim.x;
        curandState state;
        // Initialize once per thread; use the global thread id as the
        // subsequence so threads in different blocks draw independent streams.
        curand_init(seed, blockIdx.x * blockDim.x + threadIdx.x, 0, &state);

        for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < size; idx += stride)
        {
            float rand = curand_uniform(&state);
            if (rand < p)
            {
                C[idx] = 0;
            }
            else
            {
                C[idx] = A[idx];
            }
        }
    }

    template <typename T>
    void launch_dropout(const T *a, const float p, const unsigned int seed, T *c, const int size)
    {
        auto [numBlocks, blockSize] = BestDims(size);
        dropout_kernel<<<numBlocks, blockSize>>>(a, p, seed, c, size);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
        {
            throw std::runtime_error("Failed to launch dropout kernel: " +
                                     std::string(cudaGetErrorString(err)));
        }
    }
    template void launch_dropout<double>(const double *a, const float p, const unsigned int seed, double *c, const int size);
    template void launch_dropout<float>(const float *a, const float p, const unsigned int seed, float *c, const int size);
    template void launch_dropout<half>(const half *a, const float p, const unsigned int seed, half *c, const int size);
    template void launch_dropout<nv_bfloat16>(const nv_bfloat16 *a, const float p, const unsigned int seed, nv_bfloat16 *c, const int size);
    template void launch_dropout<int64_t>(const int64_t *a, const float p, const unsigned int seed, int64_t *c, const int size);
    template void launch_dropout<int32_t>(const int32_t *a, const float p, const unsigned int seed, int32_t *c, const int size);
    template void launch_dropout<int16_t>(const int16_t *a, const float p, const unsigned int seed, int16_t *c, const int size);
    template void launch_dropout<int8_t>(const int8_t *a, const float p, const unsigned int seed, int8_t *c, const int size);
}

#endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAOBYTE_BASIC_CU

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cuh

Lines changed: 6 additions & 0 deletions
@@ -89,6 +89,12 @@ namespace deepx::tensorfunc

    template <typename T>
    void launch_invert(const T *a, T *c, const int size);

    // dropout
    template <typename T>
    __global__ void dropout_kernel(const T *A, const float p, const unsigned int seed, T *C, const int size);

    template <typename T>
    void launch_dropout(const T *a, const float p, const unsigned int seed, T *c, const int size);
}

#endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_CUH

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.hpp

Lines changed: 9 additions & 0 deletions
@@ -140,6 +140,15 @@ namespace deepx::tensorfunc

            launch_invert(A.data, C.data, A.shape.size);
        }
    };

    template <typename T>
    struct dropoutDispatcher<miaobyte, T>
    {
        static void dropout(const Tensor<T> &A, const float p, const unsigned int seed, Tensor<T> &C)
        {
            launch_dropout(A.data, p, seed, C.data, A.shape.size);
        }
    };
}

#endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_HPP
