deepxir: (#112)

miaobyte · peng.li24 · web-flow · commit 6bdfc47f565d · 2026-01-30T17:29:18.000+08:00
* deepxir: design refinements

* deepxir: design refinements

* deepxir: first-version design nearly finalized

---------

Co-authored-by: peng.li24 <peng.li24@nio.com>
diff --git a/.github/workflows/executor-deepxcore.yml b/.github/workflows/executor-deepxcore.yml
@@ -1,4 +1,4 @@
-name: Excuter/cppcommon Build
+name: executor/deepxcore Build
 on:
   push:
     paths:
diff --git a/.github/workflows/executor-heapmemcuda.yml b/.github/workflows/executor-heapmemcuda.yml
@@ -1,4 +1,4 @@
-name: op/cuda-linux Build
+name: executor/heapmem-cuda Build
 on:
   push:
     paths:
@@ -61,8 +61,8 @@ jobs:
             cp -r include/* /usr/local/include/ && \
             cd /workspace && \
             
-            # Build the common library
-            cd executor/cpp-common && \
+            # Build the deepxcore library
+            cd executor/deepxcore && \
             mkdir -p build && cd build && \
             cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -GNinja .. && \
             ninja && \
diff --git a/.github/workflows/executor-op-cuda-linux.yml b/.github/workflows/executor-op-cuda-linux.yml
@@ -1,4 +1,4 @@
-name: Excuter/cuda-linux Build
+name: executor/op-cuda-linux Build
 on:
   push:
     paths:
diff --git a/.github/workflows/executor-op-ompsimd-linux.yml b/.github/workflows/executor-op-ompsimd-linux.yml
@@ -1,4 +1,4 @@
-name: Excuter/ompsimd-linux Build
+name: executor/op-mem-ompsimd-linux Build
 on:
   push:
     paths:
diff --git a/docs/deepxIR/deepxir.md b/docs/deepxIR/deepxir.md
@@ -0,0 +1,136 @@
+# DeepX IR (deepxir) Specification
+
+## 1. Type System
+
+### Basic Data Types
+```
+type f16, f32, f64, bf16, bf8    // floating-point types
+type i8, i16, i32, i64, u8       // integer types
+type bool                       // boolean type
+```
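The following is a sketch of how the basic types above might map onto a C++ enum, for example behind deepxcore's `DType`. The names `DType`, `dtype_size`, and `parse_dtype` are hypothetical. The sketch assumes `bool` is stored in one byte and `bf8` is an 8-bit format.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical enum for the deepxir basic types (bool is spelled `boolean`
// because `bool` is a C++ keyword).
enum class DType { f16, f32, f64, bf16, bf8, i8, i16, i32, i64, u8, boolean };

// Element size in bytes; bool is assumed to be stored in one byte.
constexpr std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::f64: case DType::i64: return 8;
        case DType::f32: case DType::i32: return 4;
        case DType::f16: case DType::bf16: case DType::i16: return 2;
        case DType::bf8: case DType::i8: case DType::u8: case DType::boolean: return 1;
    }
    return 0;  // not reached for valid enumerators
}

// Maps a deepxir type name to the enum; unknown names yield std::nullopt.
std::optional<DType> parse_dtype(std::string_view name) {
    static constexpr std::pair<std::string_view, DType> table[] = {
        {"f16", DType::f16},   {"f32", DType::f32}, {"f64", DType::f64},
        {"bf16", DType::bf16}, {"bf8", DType::bf8}, {"i8", DType::i8},
        {"i16", DType::i16},   {"i32", DType::i32}, {"i64", DType::i64},
        {"u8", DType::u8},     {"bool", DType::boolean},
    };
    for (const auto& [text, type] : table)
        if (text == name) return type;
    return std::nullopt;
}
```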
+
+### Variable-Length Types
+```
+list<type>   // a list can be combined with any of the basic types above
+```
+
+### Type Constraints
+```
+f32|f64   // accepts any one of the two (or more) listed types
+```
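A constraint such as `f32|f64` can be checked by splitting on `|` and testing membership. This is a minimal sketch. `split_constraint` and `satisfies` are hypothetical helpers, not part of an existing deepx API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits a constraint such as "f32|f64" into its alternatives.
std::vector<std::string> split_constraint(std::string_view c) {
    std::vector<std::string> alts;
    std::size_t start = 0;
    while (true) {
        std::size_t bar = c.find('|', start);
        // substr clamps the length, so npos - start takes the rest of the string.
        alts.emplace_back(c.substr(start, bar - start));
        if (bar == std::string_view::npos) break;
        start = bar + 1;
    }
    return alts;
}

// True if the concrete type name matches one of the constraint's alternatives.
bool satisfies(std::string_view concrete, std::string_view constraint) {
    for (const auto& alt : split_constraint(constraint))
        if (std::string_view(alt) == concrete) return true;
    return false;
}
```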
+
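+### Tensor Type Template
+```
+type tensor<shape, elem_type>
+```
+- Shape format: `dim1xdim2x...xdimN`, where `?` denotes a dynamic dimension. In a concrete type, the element type (precision) is written after the last `x`.
+- Examples: `tensor<10x20xf32>`, `tensor<?x?xi32>`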
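The shape and element-type convention can be parsed by splitting the text inside `<...>` on `x` and treating the last token as the element type. This is an illustrative sketch:

- `TensorType` and `parse_tensor_type` are hypothetical names.
- Dimension tokens are kept as strings so that `?` and `?N` survive.
- A bare `tensor` is treated as unconstrained.
- Rank-0 forms such as `tensor<f32>` are simply rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Parsed form of "tensor<d1x...xdNxdtype>"; dims stay as text so that
// "?", "?1" and literal sizes can all be represented.
struct TensorType {
    std::vector<std::string> dims;
    std::string dtype;  // empty when unconstrained
};

// Parses "tensor", "tensor<10x20xf32>" or "tensor<?x?xi32>".
std::optional<TensorType> parse_tensor_type(std::string_view s) {
    const std::string_view kw = "tensor";
    if (s.substr(0, kw.size()) != kw) return std::nullopt;
    s.remove_prefix(kw.size());
    if (s.empty()) return TensorType{};  // bare "tensor": any shape, any dtype
    if (s.front() != '<' || s.back() != '>') return std::nullopt;
    s = s.substr(1, s.size() - 2);
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t x = s.find('x', start);
        parts.emplace_back(s.substr(start, x - start));
        if (x == std::string_view::npos) break;
        start = x + 1;
    }
    if (parts.size() < 2) return std::nullopt;  // need at least one dim plus dtype
    TensorType t;
    t.dtype = parts.back();  // token after the last 'x' is the element type
    parts.pop_back();
    t.dims = std::move(parts);
    return t;
}
```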
+
+A tensor may also be declared without shape or dtype constraints, for example:
+```
+deepxir addscalar(A:tensor, b:i8|i16|i32|i64) -> (c:tensor) { ... }
+```
+This means a tensor of any shape and any dtype can be passed as the argument.
+
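+### Dynamic Dimension Variables
+- `?`: an arbitrary size
+- `?1`: dynamic dimension variable 1
+- `?2`: dynamic dimension variable 2 (every occurrence of the same variable must have the same size)
+- Example: `tensor<?1x?2xf32>`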
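Named dimension variables behave like unification variables. The first occurrence of `?1` binds it to a concrete size, and every later occurrence must match. The minimal sketch below implements that check with a hypothetical `bind_dims`. It treats an anonymous `?` as matching anything and a literal token as an exact size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Checks actual shapes against formal dim tokens such as {"?1", "?2"}.
// "?" matches any size, "?N" binds on first use and must match later uses,
// and a literal like "10" must match exactly. Bindings are recorded in env.
bool bind_dims(const std::vector<std::vector<std::string>>& formals,
               const std::vector<std::vector<std::int64_t>>& actuals,
               std::map<std::string, std::int64_t>& env) {
    if (formals.size() != actuals.size()) return false;
    for (std::size_t p = 0; p < formals.size(); ++p) {
        const auto& f = formals[p];
        const auto& a = actuals[p];
        if (f.size() != a.size()) return false;  // rank mismatch
        for (std::size_t i = 0; i < f.size(); ++i) {
            const std::string& tok = f[i];
            if (tok == "?") continue;  // anonymous dynamic dimension
            if (tok[0] == '?') {       // named variable ?N
                auto [it, inserted] = env.emplace(tok, a[i]);
                if (!inserted && it->second != a[i]) return false;
            } else if (std::stoll(tok) != a[i]) {  // literal size
                return false;
            }
        }
    }
    return true;
}
```

Consider the `fused_linear_norm` signature: A is `?1x?2`, W is `?2x?3`, and b is `?3`. Passing shapes 2x4, 4x3, and 3 binds `?1=2`, `?2=4`, and `?3=3`. A W of shape 5x3 would be rejected because `?2` is already bound to 4.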
+
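+## 2. IR Definition Format
+
+Syntax example:
+```
+deepxir ir_name(ro_p1:type1, ro_p2:type2, ...) -> (w_p1:type3, w_p2:type4, ...)
+{
+    // body: a sequence of IR operations
+    operation_name(ro_p1, ro_p2) -> w_p1
+    operation_name(ro_p2, ro_p2) -> w_p2
+}
+```
+- `deepxir` is the keyword; `function`, `func`, and similar keywords may also be used.
+- Parameters follow the "read-left, write-right" rule: there are no return values, and outputs are produced by writing to the parameters on the right.
+- Supported parameter types: `tensor`, `list<tensor>`, the basic types, and lists of basic types.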
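Under the read-left, write-right rule, each statement splits into a name, a read list, and a write list. The parser below is a hypothetical sketch; `OpCall` and `parse_call` are illustrative names. It behaves as follows:

- It accepts the write list with or without parentheses.
- It allows several writes.
- It does not handle commas nested inside brackets such as `[2,4]`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// One IR statement: name(reads...) -> writes...
struct OpCall {
    std::string name;
    std::vector<std::string> reads;   // left: read-only arguments
    std::vector<std::string> writes;  // right: written arguments
};

static std::string_view trim(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t')) s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t')) s.remove_suffix(1);
    return s;
}

// Splits "a, b, c" on commas (no nested brackets assumed) and trims each item.
static std::vector<std::string> split_args(std::string_view s) {
    std::vector<std::string> out;
    s = trim(s);
    if (s.empty()) return out;
    std::size_t start = 0;
    while (true) {
        std::size_t comma = s.find(',', start);
        out.emplace_back(trim(s.substr(start, comma - start)));
        if (comma == std::string_view::npos) break;
        start = comma + 1;
    }
    return out;
}

// Parses e.g. "matmul(A, W)-> (mm)" or "sum(x, axis, keepdims) -> s".
std::optional<OpCall> parse_call(std::string_view line) {
    std::size_t open = line.find('(');
    std::size_t arrow = line.find("->");
    if (open == std::string_view::npos || arrow == std::string_view::npos || arrow < open)
        return std::nullopt;
    std::size_t close = line.rfind(')', arrow);  // last ')' before the arrow
    if (close == std::string_view::npos || close < open) return std::nullopt;
    OpCall c;
    c.name = std::string(trim(line.substr(0, open)));
    c.reads = split_args(line.substr(open + 1, close - open - 1));
    std::string_view rhs = trim(line.substr(arrow + 2));
    if (!rhs.empty() && rhs.front() == '(' && rhs.back() == ')')
        rhs = rhs.substr(1, rhs.size() - 2);  // optional parentheses around writes
    c.writes = split_args(rhs);
    return c;
}
```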
+
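+## 3. Design Rationale
+DeepX IR uses a concise text format for tensor type constraints, operation definitions, and operation bodies, which keeps it easy to read and to parse.
+DeepX IR is not SSA. Calls still follow the read-left, write-right parameter-list convention, and the written (right-hand) parameter list may contain several entries.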
+
+## 4. Examples
+
+### Example 1: Fused Linear + Normalization
+```
+deepxir fused_linear_norm(
+    A: tensor<?1x?2xf32>,
+    W: tensor<?2x?3xf32>,
+    b: tensor<?3xf32>,
+    axis: i32,
+    keepdims: bool
+) -> (out: tensor<?1x?3xf32>) {
+    newtensor(?1x?3, f32)-> mm
+    matmul(A, W)-> mm
+    newtensor(?1x?3, f32)-> bias
+    add(mm, b)-> bias
+    deltensor(mm)-> mm
+    newtensor(?1, f32)-> mu
+    mean(bias, axis, keepdims)-> mu
+    newtensor(?1x?3, f32)-> centered
+    sub(bias, mu)-> centered
+    deltensor(bias)-> bias
+    deltensor(mu)-> mu
+    newtensor(?1x?3, f32)-> sq
+    mul(centered, centered)-> sq
+    newtensor(?1, f32)-> var
+    mean(sq, axis, keepdims)-> var
+    deltensor(sq)-> sq
+    constant(1e-5)-> eps
+    newtensor(?1, f32)-> var_eps
+    add(var, eps)-> var_eps
+    deltensor(var)-> var
+    deltensor(eps)-> eps
+    newtensor(?1, f32)-> std
+    sqrt(var_eps)-> std
+    deltensor(var_eps)-> var_eps
+    div(centered, std)-> out
+    deltensor(centered)-> centered
+    deltensor(std)-> std
+}
+```
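Every intermediate tensor is created with `newtensor` and released with `deltensor`, so a body can be checked for use-after-delete mechanically. The sketch below assumes statements are already parsed into a hypothetical `Stmt` struct. It applies these assumed rules:

- `newtensor` and `constant` create their outputs.
- `deltensor` ends a lifetime.
- Any other op may only read and write live names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A pre-parsed statement: op(reads...) -> writes...
struct Stmt {
    std::string op;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Returns the index of the first statement that touches a tensor that was
// never created or was already deleted, or -1 if the body is clean.
// Arguments of newtensor/constant are shapes or literals and are not checked.
int first_lifetime_error(const std::vector<std::string>& params,
                         const std::vector<Stmt>& body) {
    std::set<std::string> live(params.begin(), params.end());
    for (std::size_t i = 0; i < body.size(); ++i) {
        const Stmt& s = body[i];
        if (s.op == "newtensor" || s.op == "constant") {
            for (const auto& w : s.writes) live.insert(w);
            continue;
        }
        for (const auto& r : s.reads)
            if (!live.count(r)) return static_cast<int>(i);
        if (s.op == "deltensor") {
            for (const auto& r : s.reads) live.erase(r);
            continue;
        }
        for (const auto& w : s.writes)
            if (!live.count(w)) return static_cast<int>(i);
    }
    return -1;
}
```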
+
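+Below is a complete `deepxir` call example. It first constructs the input tensors and auxiliary arguments inside an IR, then calls `fused_linear_norm`, which writes its result to `out`.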
+
+```
+deepxir example_use_fused_linear_norm() -> (out: tensor<2x3xf32>) {
+    newtensor([2,4], f32)-> A
+    newtensor([4,3], f32)-> W
+    newtensor([3], f32)-> b
+    fused_linear_norm(A, W, b, 1, false) -> out
+}
+```
+
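+This example shows how to construct the required tensors and arguments in IR and call `fused_linear_norm`. `out` has type `tensor<2x3xf32>`: 2 is the number of rows of `A` and 3 is the number of columns of `W`.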
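An unfused reference of the same math helps when testing a fused kernel. This C++ sketch makes these assumptions:

- storage is row-major;
- normalization runs over the last axis (`axis = 1`);
- variance is biased;
- `eps = 1e-5`.

`linear_norm_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference (unfused) computation for row-major A[m x k], W[k x n], b[n]:
// x = A*W + b, then each row becomes (x - mean) / sqrt(var + eps), with the
// mean and biased variance taken over the n columns.
std::vector<float> linear_norm_ref(const std::vector<float>& A,
                                   const std::vector<float>& W,
                                   const std::vector<float>& b,
                                   std::size_t m, std::size_t k, std::size_t n,
                                   float eps = 1e-5f) {
    std::vector<float> out(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        float* row = &out[i * n];
        for (std::size_t j = 0; j < n; ++j) {  // matmul + bias
            float acc = b[j];
            for (std::size_t p = 0; p < k; ++p) acc += A[i * k + p] * W[p * n + j];
            row[j] = acc;
        }
        float mean = 0.f;
        for (std::size_t j = 0; j < n; ++j) mean += row[j];
        mean /= static_cast<float>(n);
        float var = 0.f;
        for (std::size_t j = 0; j < n; ++j) var += (row[j] - mean) * (row[j] - mean);
        var /= static_cast<float>(n);
        const float inv_std = 1.f / std::sqrt(var + eps);
        for (std::size_t j = 0; j < n; ++j) row[j] = (row[j] - mean) * inv_std;
    }
    return out;
}
```

The call example uses A of shape 2x4, W of shape 4x3, and b of length 3. With those inputs the function returns 6 values, which form a 2x3 result.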
+
+### Example 2: Fused Attention Score + Softmax
+```
+deepxir fused_attention_scores(
+    Q: tensor<?x?xf32>,
+    K: tensor<?x?xf32>,
+    axis: list<i32>,
+    keepdims: bool,
+    shape_scores: list<i32>,
+    shape_sum: list<i32>
+) -> (out: tensor<?x?xf32>) {
+    newtensor(shape_scores, f32)-> scores_tmp
+    matmul(Q, K)-> scores_tmp
+    newtensor(shape_scores, f32)-> exp_tmp
+    exp(scores_tmp)-> exp_tmp
+    deltensor(scores_tmp)-> scores_tmp
+    newtensor(shape_sum, f32)-> sum_tmp
+    sum(exp_tmp, axis, keepdims)-> sum_tmp
+    div(exp_tmp, sum_tmp)-> out
+    deltensor(exp_tmp)-> exp_tmp
+    deltensor(sum_tmp)-> sum_tmp
+}
+```
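Below is a plain reference for `fused_attention_scores`, under three assumptions:

- Data is row-major.
- `K` is passed already laid out as `d x n`, so `matmul(Q, K)` yields the score matrix. Standard attention uses `Q·Kᵀ` and often a `1/sqrt(d)` scale; neither appears in the IR above.
- Unlike the IR, the sketch subtracts the row maximum before `exp`. This is a common numerical-stability refinement that leaves the softmax result mathematically unchanged.

`attention_scores_ref` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// scores = Q[m x d] * K[d x n] (K assumed already transposed), then a
// row-wise softmax: exp(s - rowmax) / sum(exp(s - rowmax)). Assumes n > 0.
std::vector<float> attention_scores_ref(const std::vector<float>& Q,
                                        const std::vector<float>& K,
                                        std::size_t m, std::size_t d, std::size_t n) {
    std::vector<float> out(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        float* row = &out[i * n];
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.f;
            for (std::size_t p = 0; p < d; ++p) acc += Q[i * d + p] * K[p * n + j];
            row[j] = acc;
        }
        const float mx = *std::max_element(row, row + n);
        float sum = 0.f;
        for (std::size_t j = 0; j < n; ++j) {
            row[j] = std::exp(row[j] - mx);
            sum += row[j];
        }
        for (std::size_t j = 0; j < n; ++j) row[j] /= sum;
    }
    return out;
}
```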
diff --git a/docs/deepxIR/ir.md b/docs/deepxIR/ir.md
diff --git a/executor/deepxcore/README.md b/executor/deepxcore/README.md
@@ -0,0 +1,72 @@
+# deepxcore
+
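+deepxcore is the shared C++ core library used by the deepx executor layer and by the unified storage-compute plane protocol.
+
+Its goal is to provide stable data models and protocol objects that can be reused across executors. This keeps CUDA/Metal/CPU implementation details from leaking into upper layers and other components, so processes and code components stay isolated from each other.
+
+## Positioning
+- Intended for: executor processes (heapmem-*, op-*), the unified storage-compute plane SDK, and C++ components on the scheduling/compilation side
+- Provides: basic data structures such as dtype/shape/tensor, structured representations of protocol objects, and configuration and serialization infrastructure
+- Does not provide: concrete hardware operator implementations, GPU memory/IPC lifecycle implementations, or scheduling/compilation logic
+
+## Responsibilities
+
+### 1) Basic data model
+- `DType`: data type descriptions, plus basic capabilities such as size and alignment
+- `Shape`: dimensions, element count, byte-size computation, and shape validity checks
+- `Tensor`: tensor metadata and handle representation (not bound to any specific device implementation)
+
+These types should serve as the common language of all executors, so their semantics stay consistent when passed between components.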
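Below is a minimal sketch of what the `Shape` responsibilities might look like. It is not deepxcore's actual class. It assumes int64 extents and allows zero-sized dimensions. It reports negative extents or element-count overflow as invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical Shape: dims are non-negative extents (0 allowed for empty tensors).
class Shape {
public:
    explicit Shape(std::vector<std::int64_t> dims) : dims_(std::move(dims)) {}

    std::size_t rank() const { return dims_.size(); }

    // Element count, or nullopt on a negative extent or int64 overflow.
    std::optional<std::int64_t> numel() const {
        std::int64_t n = 1;
        for (std::int64_t d : dims_) {
            if (d < 0) return std::nullopt;
            if (d != 0 && n > std::numeric_limits<std::int64_t>::max() / d) return std::nullopt;
            n *= d;
        }
        return n;
    }

    // A shape is valid if its element count is well defined.
    bool valid() const { return numel().has_value(); }

    // Bytes for a given element size (e.g. 4 for f32), or nullopt on error.
    std::optional<std::int64_t> bytes(std::int64_t elem_size) const {
        auto n = numel();
        if (!n || elem_size <= 0) return std::nullopt;
        if (*n > std::numeric_limits<std::int64_t>::max() / elem_size) return std::nullopt;
        return *n * elem_size;
    }

private:
    std::vector<std::int64_t> dims_;
};
```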
+
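+### 2) Unified storage-compute plane protocol objects
+Data structures passed between the unified address space (e.g. Redis KV) and the executors, for example:
+- tensor metadata records (name/key, dtype, shape, device, bytes, ctime, etc.)
+- lifecycle commands (create/get/delete, etc.)
+
+deepxcore is responsible only for structured representation and encoding/decoding, not for actual allocation, deallocation, or IPC mapping.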
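The sketch below illustrates the split between structured representation and encoding. It defines a hypothetical metadata record and lifecycle verb and encodes them as compact JSON by hand. The field names follow the list above. Everything else is an assumption:

- the JSON layout;
- the `cuda:0` device string;
- the `LifecycleOp`, `TensorMeta`, and `encode_command` names.

Strings are not escaped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical lifecycle verbs carried alongside tensor metadata.
enum class LifecycleOp { create, get, del };

// Hypothetical tensor metadata record stored in the unified address space.
struct TensorMeta {
    std::string key;     // name/key in the KV store
    std::string dtype;   // e.g. "f32"
    std::vector<std::int64_t> shape;
    std::string device;  // e.g. "cuda:0" (format is an assumption)
    std::int64_t bytes = 0;
    std::int64_t ctime = 0;  // creation timestamp; unit not specified here
};

const char* op_name(LifecycleOp op) {
    switch (op) {
        case LifecycleOp::create: return "create";
        case LifecycleOp::get: return "get";
        case LifecycleOp::del: return "delete";
    }
    return "unknown";
}

// Encodes a command as compact JSON. Strings are written without escaping,
// so keys and device names are assumed to contain no quotes or backslashes.
std::string encode_command(LifecycleOp op, const TensorMeta& m) {
    std::ostringstream os;
    os << "{\"op\":\"" << op_name(op) << "\",\"key\":\"" << m.key
       << "\",\"dtype\":\"" << m.dtype << "\",\"shape\":[";
    for (std::size_t i = 0; i < m.shape.size(); ++i)
        os << (i ? "," : "") << m.shape[i];
    os << "],\"device\":\"" << m.device << "\",\"bytes\":" << m.bytes
       << ",\"ctime\":" << m.ctime << "}";
    return os.str();
}
```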
+
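+### 3) Serialization/deserialization and configuration
+- Encode and decode protocol objects and metadata between JSON, YAML, and binary formats
+- Read executor/client configuration (e.g. addresses, device policy, protocol version)
+
+The goal is to keep other components from each implementing their own parsing and validation logic.
+
+### 4) General infrastructure
+- Lightweight error and return-value types (Status/Result)
+- Thin wrappers over string, filesystem, and similar utilities
+
+Requirements: keep dependencies minimal, keep interfaces stable, and stay decoupled from specific hardware and runtimes.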
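A lightweight Status/Result pair can be built on `std::variant` without exceptions or extra dependencies. This is a sketch under assumptions: the `Status` fields, the `Result<T>` interface, and the `parse_dim` example are hypothetical, not deepxcore's real API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload; code 0 means OK.
struct Status {
    int code = 0;
    std::string message;
    bool ok() const { return code == 0; }
};

// Hypothetical Result<T>: holds either a value or an error Status.
template <typename T>
class Result {
public:
    Result(T value) : v_(std::move(value)) {}
    Result(Status err) : v_(std::move(err)) {}
    bool ok() const { return std::holds_alternative<T>(v_); }
    // Throws std::bad_variant_access if called on the wrong alternative.
    const T& value() const { return std::get<T>(v_); }
    const Status& status() const { return std::get<Status>(v_); }

private:
    std::variant<T, Status> v_;
};

// Example: parse a non-negative dimension (overflow is not checked here).
Result<long long> parse_dim(const std::string& s) {
    if (s.empty()) return Status{1, "empty dimension"};
    long long v = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return Status{2, "not a number: " + s};
        v = v * 10 + (c - '0');
    }
    return v;
}
```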
+
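+## Non-responsibilities (boundaries)
+
+### No hardware binding
+- Does not depend directly on CUDA/Metal/ROCm/NCCL, etc.
+- Does not implement any concrete operator kernels
+
+These are handled by executors such as `op-cuda`, `op-ompsimd`, and `op-mem-mps`.
+
+### No heap tensor lifecycle or IPC
+- Does not manage allocation/deallocation of persistent heap tensors
+- Is not responsible for creating, opening, or closing CUDA IPC handles
+
+These are handled by components such as `heapmem-cuda`, i.e. concrete tensor implementations for the unified address space.
+
+### No compilation or scheduling
+- Not responsible for deepxIR compilation rewrites, fusion, or distributed scheduling
+
+These belong to the middle-end compiler and the scheduler.
+
+## Relationship to other components
+
+- heapmem-*: the owner side handles heap tensor lifecycles and cross-process sharing; deepxcore provides dtype/shape/protocol objects
+- op-*: operator executors handle stack tensors (intermediate values) and kernels; deepxcore provides the basic data model and a unified metadata representation
+- Frontend/SDK: writes the computation graph and tensor metadata into the unified address space through the unified protocol; deepxcore is the shared protocol layer on the C++ side
+
+## Directory layout
+- `src/`: core library implementation
+- `test/`: unit tests
+
+## Build
+This library is built with CMake and linked as a dependency of the other executor targets.
+
+When used from a higher-level executor, it is usually enough to link the `deepxcore` target and include the corresponding headers.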
