dropout: ompsimd+cuda implementation (#44)

miaobyte · web-flow · commit 05c15bf22000 · 2025-04-28T21:38:50.000+08:00
* dropout: ompsimd+cuda implementation

* dropout: ompsimd+cuda implementation
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,45 @@
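+# DeepX Code of Conduct
+
+## Our Pledge
+
+As contributors and maintainers, we pledge to provide an open and welcoming environment for everyone.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment include, but are not limited to:
+
+- Using welcoming and inclusive language
+- Being respectful of differing viewpoints and experiences
+- Gracefully accepting constructive criticism
+- Focusing on what is best for the community
+- Showing kindness toward other community members
+
+Examples of unacceptable behavior include, but are not limited to:
+
+- The use of sexualized language or imagery and unwelcome sexual attention or advances
+- Trolling, inflammatory, insulting, or derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic address, without explicit permission
+- Other conduct that could reasonably be considered inappropriate in a professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct, and may temporarily or permanently ban any contributor for behavior they deem inappropriate, threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within project spaces and in public spaces when an individual is representing the project or its community.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team as described below. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate.
+
+## Contact Information
+
+Please contact us at [your contact information].
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4.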
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,28 @@
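+# deepx Contribution Guide
+
+Development of the deepx framework spans six main directions:
+
++ front: new models, modules, Python class functions, etc.
++ Middle layer: computation-graph optimizer, plugin system (automatic KV cache system), automatic distribution, automatic release of stack tensors, automatic in-place conversion, etc.
++ Adding or modifying an excuter
++ Adding or modifying operators, further divided into leaftensorfunc (indivisible basic operators) and fusedtensorfunc (fused operators)
++ Documentation enrichment
++ Operations (DevOps) automation
+
+Pick one direction to work on.
+
+## Steps
+
+First submission
+  1. Fork the main branch of this repository (github.com/array2d/deepx) to your own github/yourname/deepx
+  2. Clone github/yourname/deepx locally
+  3. Commit your changes and push them to your GitHub fork: `git commit -m 'Add some feature'`, then `git push`
+  4. Create a Pull Request.
+
+Nth submission 
+
+  1. Make sure everything in your local copy and in github/yourname/deepx has already been submitted as a pull request and merged
+  2. In github/yourname/deepx, use Sync fork [dangerous: this deletes code you added that has not been merged] to pull the latest code from the main branch of github.com/array2d/deepx
+  3. Clone github/yourname/deepx locally
+  4. Commit your changes and push them to your GitHub fork: `git commit -m 'Add some feature'`, then `git push`
+  5. Create a Pull Request.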
diff --git a/README.md b/README.md
@@ -66,29 +66,6 @@ DeepX can integrate existing tensor computing frameworks as executors, making full use of existing
 
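 This architecture lets DeepX integrate a variety of advanced computing frameworks as execution engines while providing unified distributed scheduling and execution, giving users more flexible choices and higher performance.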
 
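-## 2. Contribution Guide
-
-You can also refer to the guide in the official documentation:
-
-https://deepx.array2d.com 
-
-Contributions to the project are welcome in the following ways:
-
-1. **Code contributions**
-   - Before submitting a PR, please create an Issue describing the change
-   - The front project is currently centered on Python
-   - excuter: three kinds of executors are currently planned; see [excuter](doc/excuter/excuter.md) for how to add a new operator to an excuter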
-      - cpu:
-      - cuda:
-      - jax:
- 
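-2. **Documentation improvements**
-   - Submit documentation updates to the `doc/` directory
-
-3. **Issue reports**
-   - The project is iterating rapidly; please report problems via issues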
- 
-
  
  ### Official Documentation
  
diff --git a/doc/excuter/op-mem-cuda/list.md b/doc/excuter/op-mem-cuda/list.md
@@ -48,6 +48,7 @@
 
 | Operation | Author |  Math Formula | IR Instruction |
 |-----------|--------|--------------|----------------|
+| dropout | miaobyte | T1.dropout(p,seed)->T3 | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
 | switch | miaobyte | C=switch(tensors,cases) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> result) |
 | greaterscalar | miaobyte | mask=compare(T1, scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
 | equalscalar | miaobyte | mask=compare(T1, scalar) | equalscalar(tensor<any> A, var<any> scalar, var<float64> epsilon)->(tensor<bool> mask) |
diff --git a/doc/excuter/op-mem-ompsimd/list.md b/doc/excuter/op-mem-ompsimd/list.md
@@ -49,6 +49,7 @@
 
 | Operation | Author |  Math Formula | IR Instruction |
 |-----------|--------|--------------|----------------|
+| dropout | miaobyte | A.dropout(p,seed)->C | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
 | switch | miaobyte | C=switch([tensors],case) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> C) |
 | greaterscalar | miaobyte | mask=greater(T1,scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
 | equalscalar | miaobyte | mask=equal(T1,scalar) | equalscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
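Read from the IR signature and the CUDA `dropout_kernel` in this commit, each output element follows the rule below. This is a sketch of the semantics: in the CUDA kernel $u_i$ comes from `curand_uniform`, which is uniform on $(0, 1]$, and the ompsimd generator is assumed (not confirmed) to be uniform as well. Kept elements are copied unchanged, so the expected output is scaled by $(1-p)$; the common "inverted dropout" convention would instead write $A_i/(1-p)$ for kept elements so that $\mathbb{E}[C_i] = A_i$.

```latex
C_i =
\begin{cases}
0,   & u_i < p \\
A_i, & u_i \ge p
\end{cases}
\qquad u_i \sim \mathcal{U}(0, 1],
\qquad
\Pr[C_i = 0 \mid A_i \neq 0] = p,
\qquad
\mathbb{E}[C_i] = (1 - p)\,A_i
```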
diff --git a/excuter/cpp-common/src/deepx/tensorfunc/elementwise.hpp b/excuter/cpp-common/src/deepx/tensorfunc/elementwise.hpp
@@ -412,6 +412,19 @@ namespace deepx::tensorfunc
         invertDispatcher<Author, T>::invert(input, output);
     }
 
+    // dropout(A, p, seed) => C
+    template <typename Author, typename T>
+    struct dropoutDispatcher
+    {
+        static void dropout(const Tensor<T> &input, const float p,const unsigned int seed, Tensor<T> &output) = delete;
+    };
+
+    template <typename Author, typename T>
+    void dropout(const Tensor<T> &input, const float p,const unsigned int seed, Tensor<T> &output)
+    {
+        dropoutDispatcher<Author, T>::dropout(input, p, seed, output);
+    }
+    
 } // namespace deepx::tensorfunc
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_HPP
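The header follows a tag-dispatch pattern: the primary `dropoutDispatcher` deletes `dropout`, so an `Author` without a backend specialization fails at compile time, and each backend supplies a partial specialization. The self-contained sketch below reproduces that pattern under stated assumptions: `Tensor` is a hypothetical stand-in (just a `std::vector`), `refcpu` is a hypothetical author tag, and `std::mt19937` stands in for whatever RNG a real backend uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

namespace sketch {

// Hypothetical stand-in for deepx's Tensor<T>.
template <typename T>
struct Tensor {
    std::vector<T> data;
};

struct refcpu {};  // hypothetical author tag

// Primary template: deleted, so an Author with no specialization
// is rejected at compile time.
template <typename Author, typename T>
struct dropoutDispatcher {
    static void dropout(const Tensor<T> &input, float p, unsigned int seed,
                        Tensor<T> &output) = delete;
};

// Public entry point forwards to the specialization selected by Author.
template <typename Author, typename T>
void dropout(const Tensor<T> &input, float p, unsigned int seed, Tensor<T> &output) {
    dropoutDispatcher<Author, T>::dropout(input, p, seed, output);
}

// Partial specialization for the hypothetical refcpu author:
// zero each element with probability p, otherwise copy it unchanged.
template <typename T>
struct dropoutDispatcher<refcpu, T> {
    static void dropout(const Tensor<T> &input, float p, unsigned int seed,
                        Tensor<T> &output) {
        assert(p >= 0.0f && p <= 1.0f);
        std::mt19937 gen(seed);  // same seed -> same mask
        std::uniform_real_distribution<double> uni(0.0, 1.0);  // [0, 1)
        output.data.resize(input.data.size());
        for (std::size_t i = 0; i < input.data.size(); ++i) {
            output.data[i] = (uni(gen) < p) ? T(0) : input.data[i];
        }
    }
};

}  // namespace sketch
```

Usage would look like `sketch::dropout<sketch::refcpu>(a, 0.5f, 42u, c);`. Swapping `refcpu` for a tag that has no specialization turns the call into a compile error rather than a runtime failure, which is the point of deleting the primary.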
diff --git a/excuter/op-mem-cuda/src/client/tfs.cpp b/excuter/op-mem-cuda/src/client/tfs.cpp
@@ -462,6 +462,17 @@ namespace deepx::tf
                                                                 {
                                                                     Param("result", DataCategory::Tensor, Precision::Any),
                                                                 })));
+        // dropout
+        tffactory.add_tf(std::make_shared<Dropout<miaobyte>>(vector<Param>(
+                                                                 {
+                                                                     Param("A", DataCategory::Tensor, Precision::Any),
+                                                                     Param("p", DataCategory::Var, Precision::Float32),
+                                                                     Param("seed", DataCategory::Var, Precision::Int32),
+                                                                 }),
+                                                                 vector<Param>(
+                                                                     {
+                                                                         Param("C", DataCategory::Tensor, Precision::Any),
+                                                                     })));
     }
     // matmul
     void register_matmul(TfFactory &tffactory)
diff --git a/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cu b/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cu
@@ -3,6 +3,9 @@
 
 #include <cuda_bf16.h>
 #include <cuda_fp16.h>
+#include <curand_kernel.h>
+
+
 #include "deepx/tensorfunc/cuda.hpp"
 #include "deepx/tensorfunc/authors.hpp"
 #include "deepx/tensorfunc/cuda_math.cuh"
@@ -404,6 +407,48 @@ namespace deepx::tensorfunc
     template void launch_invert<int16_t>(const int16_t *a, int16_t *c, const int size);
     template void launch_invert<int8_t>(const int8_t *a, int8_t *c, const int size);
 
+    //dropout
+    template <typename T>
+    __global__ void dropout_kernel(const T *A, const float p,const unsigned int seed, T *C, const int size)
+    {
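+        const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+        const int stride = blockDim.x * gridDim.x;
+        curandState state;
+        // Initialize once per thread; the global thread index selects the subsequence,
+        // so threads with the same threadIdx.x in different blocks draw independent streams
+        curand_init(seed, tid, 0, &state);
+
+        for (int idx = tid; idx < size; idx += stride)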
+        {
+            float rand = curand_uniform(&state);
+            if (rand < p)
+            {
+                C[idx] = 0;
+            }
+            else
+            {
+                C[idx] = A[idx];
+            }
+        }
+    }
+
+    template <typename T>
+    void launch_dropout(const T *a, const float p,const unsigned int seed, T *c, const int size)
+    {
+        auto [numBlocks, blockSize] = BestDims(size);
+        dropout_kernel<<<numBlocks, blockSize>>>(a, p, seed, c, size);
+        cudaError_t err = cudaGetLastError();
+        if (err != cudaSuccess)
+        {
+            throw std::runtime_error("Failed to launch dropout kernel: " +
+                                     std::string(cudaGetErrorString(err)));
+        }
+    }
+    template void launch_dropout<double>(const double *a, const float p,const unsigned int seed, double *c, const int size);
+    template void launch_dropout<float>(const float *a, const float p,const unsigned int seed, float *c, const int size);
+    template void launch_dropout<half>(const half *a, const float p,const unsigned int seed, half *c, const int size);
+    template void launch_dropout<nv_bfloat16>(const nv_bfloat16 *a, const float p,const unsigned int seed, nv_bfloat16 *c, const int size);
+    template void launch_dropout<int64_t>(const int64_t *a, const float p,const unsigned int seed, int64_t *c, const int size);
+    template void launch_dropout<int32_t>(const int32_t *a, const float p,const unsigned int seed, int32_t *c, const int size);
+    template void launch_dropout<int16_t>(const int16_t *a, const float p,const unsigned int seed, int16_t *c, const int size);
+    template void launch_dropout<int8_t>(const int8_t *a, const float p,const unsigned int seed, int8_t *c, const int size);
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAOBYTE_BASIC_CU
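The subsequence argument of `curand_init` determines which random stream a thread draws from. Keying it on `threadIdx.x` alone makes every block reuse the same set of streams, so thread 0 of block 0 and thread 0 of block 1 make identical keep/drop decisions; keying it on the global thread index gives each thread its own stream. The host-side sketch below imitates the kernel's grid-stride loop in plain C++ to make this visible. `simulate_mask` is a hypothetical helper, and `std::mt19937` seeded through `std::seed_seq{seed, subsequence}` is only a stand-in for `curandState`.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Host-side imitation of a grid-stride dropout loop.
// Each simulated thread owns one RNG stream chosen by `subseq`
// (std::mt19937 + std::seed_seq stand in for curand_init/curandState).
// Returns keep(1)/drop(0) decisions for `size` elements.
std::vector<int> simulate_mask(unsigned int seed, float p, int size,
                               int grid, int block, bool use_global_id) {
    assert(grid > 0 && block > 0);
    std::vector<int> keep(size, 1);
    const int stride = grid * block;
    for (int b = 0; b < grid; ++b) {
        for (int t = 0; t < block; ++t) {
            const int tid = b * block + t;
            // Subsequence: global thread id, or (buggy) per-block thread id.
            const unsigned int subseq = use_global_id ? tid : t;
            std::seed_seq ss{seed, subseq};
            std::mt19937 gen(ss);
            std::uniform_real_distribution<double> uni(0.0, 1.0);
            for (int idx = tid; idx < size; idx += stride) {
                keep[idx] = uni(gen) < p ? 0 : 1;
            }
        }
    }
    return keep;
}
```

With `use_global_id = false`, every block produces the same pattern, so a 1024-element mask under a 4 x 256 launch is just one 256-element pattern repeated four times. Note also that in either variant the mask depends on the launch configuration (here `grid` and `block`, in the kernel whatever `BestDims(size)` returns), so the same seed reproduces the same mask only when the launch dimensions match.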
diff --git a/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cuh b/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cuh
@@ -89,6 +89,12 @@ namespace deepx::tensorfunc
     template <typename T>
     void launch_invert(const T* a, T* c,const int size);
 
+    //dropout
+    template <typename T>
+    __global__ void dropout_kernel(const T* A, const float p,const unsigned int seed, T* C,const int size);
+
+    template <typename T>
+    void launch_dropout(const T* a, const float p,const unsigned int seed, T* c,const int size);
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_CUH
diff --git a/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.hpp b/excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.hpp
@@ -140,6 +140,15 @@ namespace deepx::tensorfunc
             launch_invert( A.data, C.data, A.shape.size);
         }
     };
+
+    template <typename T>
+    struct dropoutDispatcher<miaobyte, T>
+    {
+        static void dropout(const Tensor<T> &A, const float p,const unsigned int seed, Tensor<T> &C)
+        {
+            launch_dropout(A.data, p, seed, C.data, A.shape.size);
+        }           
+    };
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_HPP
diff --git a/excuter/op-mem-cuda/src/deepx/tf/elementwise_basic.hpp b/excuter/op-mem-cuda/src/deepx/tf/elementwise_basic.hpp
@@ -1033,6 +1033,74 @@ namespace deepx::tf
             return 0;
         }
     };
+
+    // dropout
+    template <typename Author>
+    class Dropout : public TF
+    {
+    public:
+        Dropout(const vector<Param> &args, const vector<Param> &returns)
+        {
+            this->name = "dropout";
+            this->metadata.author = Author::name();
+            this->tftype = "elementwise";
+            this->args = args;
+            this->returns = returns;
+        }
+        string math_formula() const override
+        {
+            return "T1.dropout(p,seed)->T3";
+        }
+        shared_ptr<TF> clone() const override
+        {
+            return make_shared<Dropout<Author>>(*this);
+        }
+        int run(shared_ptr<MemBase> mem, string &error) override
+        {
+            if (!checktensors({this->args[0].textvalue, this->returns[0].textvalue}, mem, error))
+            {
+                return 1;
+            }
+            Precision a_type = mem->gettensor(this->args[0].textvalue).get()->shape.dtype;
+            Precision c_type = mem->gettensor(this->returns[0].textvalue).get()->shape.dtype;
+            if (a_type != c_type)
+            {
+                error = "Type mismatch: " + precision_str(a_type) + " != " + precision_str(c_type);
+                return 1;
+            }
+            switch (a_type)
+            {
+            case Precision::Float64:
+                tensorfunc::dropout<Author>(*mem->gettensor<double>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<double>(this->returns[0].textvalue));
+                break;
+            case Precision::Float32:
+                tensorfunc::dropout<Author>(*mem->gettensor<float>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<float>(this->returns[0].textvalue));
+                break;
+            case Precision::Float16:
+                tensorfunc::dropout<Author>(*mem->gettensor<half>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<half>(this->returns[0].textvalue));
+                break;
+            case Precision::BFloat16:
+                tensorfunc::dropout<Author>(*mem->gettensor<nv_bfloat16>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<nv_bfloat16>(this->returns[0].textvalue));
+                break;  
+            case Precision::Int64:
+                tensorfunc::dropout<Author>(*mem->gettensor<int64_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int64_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int32:
+                tensorfunc::dropout<Author>(*mem->gettensor<int32_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int32_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int16:
+                tensorfunc::dropout<Author>(*mem->gettensor<int16_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int16_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int8:
+                tensorfunc::dropout<Author>(*mem->gettensor<int8_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int8_t>(this->returns[0].textvalue));
+                break;
+            default:
+                error = "Unsupported dtype: " + precision_str(a_type);
+                return 1;
+            }
+            return 0;
+        }
+    };
 };
 
 #endif // DEEPX_TF_ELEMENTWISE_BASIC_HPP
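`Dropout::run` repeats the same call once per `Precision` case. One way to keep the runtime-dtype to template-type mapping in a single place is a small visitor that invokes a generic lambda with a type tag. This is a sketch, not part of deepx: the reduced `Precision` enum, `TypeTag`, and `dispatch_dtype` are all hypothetical, and the 0/1 return value only mirrors the `run()` convention.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, reduced version of deepx's Precision enum.
enum class Precision { Float64, Float32, Int64, Int32, Int16, Int8, Bool };

// Carries a compile-time type through a runtime switch.
template <typename T>
struct TypeTag {
    using type = T;
};

// Map a runtime dtype to a C++ type once, then call fn(TypeTag<T>{}).
// Returns 0 on success, 1 (with error set) for unsupported dtypes.
template <typename Fn>
int dispatch_dtype(Precision dtype, Fn &&fn, std::string &error) {
    switch (dtype) {
    case Precision::Float64: fn(TypeTag<double>{}); return 0;
    case Precision::Float32: fn(TypeTag<float>{}); return 0;
    case Precision::Int64:   fn(TypeTag<std::int64_t>{}); return 0;
    case Precision::Int32:   fn(TypeTag<std::int32_t>{}); return 0;
    case Precision::Int16:   fn(TypeTag<std::int16_t>{}); return 0;
    case Precision::Int8:    fn(TypeTag<std::int8_t>{}); return 0;
    default:
        break;
    }
    error = "Unsupported dtype";
    return 1;
}
```

In `run()` the eight cases could then collapse to a single call along the lines of `return dispatch_dtype(a_type, [&](auto tag) { using T = typename decltype(tag)::type; tensorfunc::dropout<Author>(*mem->gettensor<T>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<T>(this->returns[0].textvalue)); }, error);`, with `half` and `nv_bfloat16` cases added for the CUDA backend. The trade-off is that every supported type is still instantiated, but a new dtype or a new elementwise op touches one switch instead of many.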
diff --git a/excuter/op-mem-ompsimd/src/client/tfs.cpp b/excuter/op-mem-ompsimd/src/client/tfs.cpp
@@ -465,6 +465,17 @@ namespace deepx::tf
                                                                 {
                                                                     Param("C", DataCategory::Tensor, Precision::Any),
                                                                 })));
+        // dropout author=miaobyte
+        tffactory.add_tf(std::make_shared<Dropout<miaobyte>>(vector<Param>(
+                                                                 {
+                                                                     Param("A", DataCategory::Tensor, Precision::Any),
+                                                                     Param("p", DataCategory::Var, Precision::Float32),
+                                                                     Param("seed", DataCategory::Var, Precision::Int32),
+                                                                 }),
+                                                                 vector<Param>(
+                                                                     {
+                                                                         Param("C", DataCategory::Tensor, Precision::Any),
+                                                                     }) ));
     }
     // matmul
     void register_matmul(TfFactory &tffactory)
diff --git a/excuter/op-mem-ompsimd/src/deepx/tensorfunc/elementwise_miaobyte.hpp b/excuter/op-mem-ompsimd/src/deepx/tensorfunc/elementwise_miaobyte.hpp
diff --git a/excuter/op-mem-ompsimd/src/deepx/tf/elementwise.hpp b/excuter/op-mem-ompsimd/src/deepx/tf/elementwise.hpp