Commit 283e857

rft: design the invocation protocol for remote tensorfunc
1 parent f815237 commit 283e857

20 files changed

Lines changed: 920 additions & 626 deletions

File tree

doc/deepxIR/1_mem.tx

Whitespace-only changes.

doc/deepxIR/func.md

Lines changed: 0 additions & 9 deletions
This file was deleted.

doc/deepxIR/ir.md

Lines changed: 115 additions & 67 deletions
@@ -1,67 +1,115 @@
# deepx IR design notes

1. Except for newtensor, no IR creates a new tensor; every other IR references existing tensors.
2. IR inputs and outputs use tensor names rather than tensor pointers (later versions of the IR may support literal values directly).
3. Naming: names beginning with t are tensors, names beginning with a are arguments, names beginning with v are vectors.
4. backward must be triggered by an instruction; its arrow direction is <-, and all grad tensor names must be specified.
5. All IRs listed here are basic IRs, i.e. the most fundamental ones. Composite IRs such as relu (which can be implemented with max_scalar) do not appear here.

## IR list

## One-way IR (no backward support)

| IR | Description | Example | Effect |
| --- | --- | --- | --- |
| argset | set an argument | argset@int32 1->a1 | set a1 to int32 value 1 |
| argset | set an argument | argset@int32 2->a2 | set a2 to int32 value 2 |
| argset | set a vector argument | argset@int32 1 2 3->vec1 | set vec1 to int32 values 1 2 3 |
| argset | set a vector argument | argset@int32 0 1 2->vec2 | set vec2 to int32 values 0 1 2 |
| argdel | delete an argument | argdel a | delete argument a |
| newtensor | create a tensor | newtensor@int32 vec1->t1 | create an int32 tensor t1, copying its data from vec1 |
| deltensor | delete a tensor | deltensor t1 | delete tensor t1 |
| constant | tensor init: fill with a constant | constant@int32 a1->t1 | fill t1 with a constant whose value is taken from a1 |
| arange | tensor init: generate a sequence | arange@int32 a1 a2->t1 | fill t1 with a sequence starting at a1 with step a2 |
| uniform | tensor init: uniform distribution | uniform@int32 a1 a2->t1 | fill t1 with a uniform distribution, low=a1, high=a2 |

## Two-way IR (backward supported)

| IR | Description | Example | Effect |
| --- | --- | --- | --- |
| add | element-wise add | add@float32 t1 t2->t3 | t3=t1+t2 |
| add_scalar | scalar add | add_scalar@float32 t1 a1->t3 | t3=t1+a1, a1 is a constant |
| sub | element-wise subtract | sub@float32 t1 t2->t3 | t3=t1-t2 |
| mul | element-wise multiply | mul@float32 t1 t2->t3 | t3=t1*t2 |
| mul_scalar | scalar multiply | mul_scalar@float32 t1 a1->t3 | t3=t1*a1, a1 is a constant |
| div | element-wise divide | div@float32 t1 t2->t3 | t3=t1/t2 |
| div_scalar | scalar divide | div_scalar@float32 t1 a1->t3 | t3=t1/a1, a1 is a constant |
| mod (not yet implemented) | modulo | mod@float32 t1 t2->t3 | t3=t1%t2 |
| mod_scalar (not yet implemented) | scalar modulo | mod_scalar@float32 t1 a1->t3 | t3=t1%a1, a1 is a constant |
| exp | exponential | exp@float32 t1->t3 | t3=exp(t1) |
| sqrt | square root | sqrt@float32 t1->t3 | t3=sqrt(t1) |
| log | logarithm | log@float32 t1->t3 | t3=log(t1) |
| sum | reduction: sum over dims | sum@float32 t1 vec2->t3 | t3=sum(t1,dims=vec2), summed over the dims in vec2 |
| max | element-wise maximum | max@float32 t1 t2->t3 | t3=max(t1,t2) |
| max_scalar | element-wise maximum with a scalar | max_scalar@float32 t1 a1->t3 | t3=max(t1,a1), a1 is a constant |
| min | element-wise minimum | min@float32 t1 t2->t3 | t3=min(t1,t2) |
| min_scalar | element-wise minimum with a scalar | min_scalar@float32 t1 a1->t3 | t3=min(t1,a1), a1 is a constant |

For backward, the arrow direction flips to <-.

| IR | Description | Example | Effect |
| --- | --- | --- | --- |
| add | element-wise add | add@float32 t1(t1_grad) t2(t2_grad)<-t3(t3_grad) | t3=t1+t2, t3_grad=t1_grad+t2_grad |
| add_scalar | scalar add | add_scalar@float32 t1(t1_grad) a1<-t3(t3_grad) | t3=t1+a1, t3_grad=t1_grad |
| sub | element-wise subtract | sub@float32 t1(t1_grad) t2(t2_grad)<-t3(t3_grad) | t3=t1-t2, t3_grad=t1_grad-t2_grad |
| mul | element-wise multiply | mul@float32 t1(t1_grad) t2(t2_grad)<-t3(t3_grad) | t3=t1*t2, t3_grad=t1_grad*t2+t1*t2_grad |
| mul_scalar | scalar multiply | mul_scalar@float32 t1(t1_grad) a1<-t3(t3_grad) | t3=t1*a1, t3_grad=t1_grad*a1 |
| div | element-wise divide | div@float32 t1(t1_grad) t2(t2_grad)<-t3(t3_grad) | t3=t1/t2, t3_grad=t1_grad/t2-t1*t2_grad/t2^2 |
| div_scalar | scalar divide | div_scalar@float32 t1(t1_grad) a1<-t3(t3_grad) | t3=t1/a1, t3_grad=t1_grad/a1 |
| mod (not yet implemented) | modulo | mod@float32 t1(t1_grad) t2(t2_grad)<-t3(t3_grad) | t3=t1%t2, t3_grad=t1_grad%t2 |
| mod_scalar (not yet implemented) | scalar modulo | mod_scalar@float32 t1(t1_grad) a1<-t3(t3_grad) | t3=t1%a1, t3_grad=t1_grad%a1, a1 is a constant |
| exp | exponential | exp@float32 t1(t1_grad)<-t3(t3_grad) | t3=exp(t1), t3_grad=t1_grad*exp(t1) |
| sqrt | square root | sqrt@float32 t1(t1_grad)<-t3(t3_grad) | t3=sqrt(t1), t3_grad=t1_grad/(2*sqrt(t1)) |
| log | logarithm | log@float32 t1(t1_grad)<-t3(t3_grad) | t3=log(t1), t3_grad=t1_grad/t1 |
| sum | reduction: sum over dims | sum@float32 t1 vec2<-t3 | t3=sum(t1,dims=vec2), summed over the dims in vec2 |
| max | element-wise maximum | max@float32 t1 t2<-t3 | t3=max(t1,t2) |
| max_scalar | element-wise maximum with a scalar | max_scalar@float32 t1 a1<-t3 | t3=max(t1,a1), a1 is a constant |
| min | element-wise minimum | min@float32 t1 t2<-t3 | t3=min(t1,t2) |
| min_scalar | element-wise minimum with a scalar | min_scalar@float32 t1 a1<-t3 | t3=min(t1,a1), a1 is a constant |
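The one-line `op@dtype inputs -> outputs` format used throughout these tables is straightforward to tokenize. A minimal illustrative sketch, assuming plain string splitting is sufficient (the function name and return shape are mine, not part of DeepX):

```python
def parse_ir(instr: str):
    """Parse a basic IR line such as 'add@float32 t1 t2->t3'.

    Returns (op, dtype, inputs, outputs, backward); a '<-' arrow
    marks a backward instruction, per the rules above.
    """
    head, _, rest = instr.partition(" ")   # 'add@float32' / 't1 t2->t3'
    op, _, dtype = head.partition("@")     # op name and precision
    backward = "<-" in rest
    lhs, _, rhs = rest.partition("<-" if backward else "->")
    return op, dtype, lhs.split(), rhs.split(), backward
```

Names such as `t1(t1_grad)` in backward lines come through as single tokens, which matches rule 4's requirement that grad tensor names be attached to their tensors.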
# DeepX IR (Intermediate Representation) format specification

DeepX IR uses a compact text format to describe tensor operations. It has two modes: function definition (funcdef) and function call (funccall).

## Basic syntax rules

1. `->` separates the input arguments from the return values
2. Arguments are separated by commas (,)
3. The elements of a vector value are separated by spaces
4. Arguments and return values may optionally be wrapped in parentheses ()
5. Metadata may be appended after an instruction, separated by `//`

## Function call (funccall) mode

Function call mode is used to actually execute operations; its syntax is more compact.

Examples:

```
matmul A,B -> C
sum(A,[1 2 3]) -> B
newtensor 3 4 5 -> T1
```
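The three funccall forms above (bare arguments, parenthesized arguments, space-separated vector literals) can be handled by a small parser sketch. This is illustrative only, not the rtf parser itself; it assumes vector literals like `[1 2 3]` never contain commas:

```python
import re

def parse_funccall(line: str):
    """Split a funccall line into (name, args, returns).

    Covers the surface forms shown above; a vector literal such as
    [1 2 3] stays intact as a single space-separated argument.
    """
    head, _, ret = (p.strip() for p in line.partition("->"))
    m = re.match(r"(\w+)\s*\((.*)\)$", head)      # name(arg,...) form
    if m:
        name = m.group(1)
        args = [a.strip() for a in m.group(2).split(",") if a.strip()]
    else:                                         # bare 'name arg ...' form
        name, _, argstr = head.partition(" ")
        args = ([a.strip() for a in argstr.split(",")]
                if "," in argstr else argstr.split())
    rets = [r.strip() for r in ret.strip("()").split(",") if r.strip()]
    return name, args, rets
```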
## Function definition (funcdef)

Function definitions are registered by the excuter layer; they declare the argument and return-value types of an operation. An excuter declares the set of tensorfuncs it supports by registering funcdefs.

This requires detailed type constraints on arguments and return values.

Syntax examples:

```
matmul(Tensor<float32|float64> A, Tensor<float32|float64> B) -> Tensor<float32|float64> C
sum(Tensor<any> A, vector<int32> dim) -> Tensor<any> B
newtensor(vector<int32> shape) -> Tensor<float32> T1
```
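A funcdef signature decomposes into an operation name plus typed parameter and return declarations. A sketch of that decomposition, with names of my choosing (not the excuter's actual registration API):

```python
import re

def parse_funcdef(sig: str):
    """Parse 'name(Type a, ...) -> Type r' into (name, params, returns).

    Each declaration becomes a (type, name) pair; the type keeps the
    doc's syntax, e.g. Tensor<float32|float64>.
    """
    m = re.match(r"(\w+)\s*\((.*)\)\s*->\s*(.*)$", sig.strip())
    name, params, rets = m.groups()

    def decls(s: str):
        pairs = []
        for d in filter(None, (d.strip() for d in s.split(","))):
            dtype, _, pname = d.rpartition(" ")  # type is everything before the last space
            pairs.append((dtype, pname))
        return pairs

    return name, decls(params), decls(rets)
```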
## Metadata format

Metadata can be appended after an instruction:

```
matmul(A,B)->C //id=1 created_at=123456789 sent_at=123456790
```

Supported metadata fields:

- id: operation ID
- author: author; some tensorfuncs, such as matmul, have multiple implementations, and the author is used to select the optimal implementation for the environment
- created_at: creation timestamp
- sent_at: send timestamp
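Since metadata is a flat run of `key=value` pairs after `//`, separating it from the instruction is a one-liner. A hedged sketch (helper name is mine):

```python
def split_metadata(line: str):
    """Separate an instruction from its trailing //key=value metadata."""
    instr, sep, meta = line.partition("//")
    # each whitespace-separated token after '//' is a key=value pair
    fields = dict(kv.split("=", 1) for kv in meta.split()) if sep else {}
    return instr.strip(), fields
```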
## Type system

For tensorfuncs, the type system only concerns itself with tensor-related types.

See excuter/common/src/deepx/dtype.hpp

```
{
  kinds:
    var
    vector
    tensor
    listtensor
  precisions:
    float64
    float32
    float16
    bfloat16
    fp8
    fp4
    int64
    int32
    int16
    int8
    int4
    string // can be used to reference the name of another var or tensor
}
```

Multiple precisions are combined with |, e.g. float32|float64.

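A type spec like `Tensor<float32|float64>` thus carries a kind plus a `|`-separated precision set. A minimal sketch of splitting the two (illustrative function, not part of dtype.hpp):

```python
def parse_type(spec: str):
    """Parse 'Tensor<float32|float64>' into (kind, precision set)."""
    if "<" in spec and spec.endswith(">"):
        kind, prec = spec[:-1].split("<", 1)   # drop trailing '>' and split once
        return kind.lower(), set(prec.split("|"))
    return spec.lower(), {"any"}               # unparameterized spec: unconstrained
```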
## funcdef

An excuter defines the set of tensorfuncs it supports.

1. Matrix multiplication:

```
# funcdef
matmul(Tensor<float32|float64> A, Tensor<float32|float64> B) -> Tensor<float32|float64> C

# funccall
matmul A,B -> C
// the rtf (remote tensor func) parser automatically extracts the argument and return-value lists
// the excuter fetches the three tensors A, B, C from mem and executes matmul
```

2. Tensor sum:

```
# funcdef
sum(Tensor<any> input, vector<int32> dims, var<bool> keepdim) -> Tensor<any> output

# funccall
sum(T1,[0 1],true) -> T2
// the rtf (remote tensor func) parser automatically extracts the argument and return-value lists
// [0 1] is parsed as a vector<int32> for the excuter to use at execution time
// true is parsed as var<bool> keepdim
// the excuter fetches the two tensors T1 and T2 from mem and executes sum
```

3. Create a new tensor:

```
# funcdef
newtensor(vector<int32> shape) -> Tensor<float32> output

# funccall
newtensor 3 4 5 -> T1
```
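Pairing a funccall with its funcdef implies a type check: each actual dtype must fall within the declared precision set. A hedged sketch of that check, assuming `any` means unconstrained (the function name is mine, not the excuter's API):

```python
def precision_ok(declared: str, actual: str) -> bool:
    """Check an actual dtype against a funcdef precision spec.

    declared uses the funcdef syntax, e.g. 'float32|float64' or 'any'.
    """
    allowed = declared.split("|")
    return "any" in allowed or actual in allowed
```

Under this rule, calling `matmul` on float32 tensors satisfies `Tensor<float32|float64>`, while an int32 tensor would be rejected before dispatch.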

doc/excuter/op-mem-ompsimd/list.md

Lines changed: 5 additions & 9 deletions
@@ -4,12 +4,8 @@
Removed:

| Operation | Author | Func Def | Math Formula | IR Instruction |
|-----------|--------|------------|--------------|----------------|
| argset | none | (arg)->(double) | shape = [3 4 5] | argset(arg )->(double d1) |
| argset | none | (arg)->(float) | shape = [3 4 5] | argset(arg )->(float f1) |
| argset | none | (args)->(int32) | shape = [3 4 5] | argset(args )->(int32 shape) |
| newtensor | none | (shape)->(double) | T1 = zeros(shape) | newtensor(shape )->(double tensor) |
| newtensor | none | (shape)->(float) | T1 = zeros(shape) | newtensor(shape )->(float tensor) |
| newtensor | none | (shape)->(int64) | T1 = zeros(shape) | newtensor(shape )->(int64 tensor) |
| newtensor | none | (shape)->(int32) | T1 = zeros(shape) | newtensor(shape )->(int32 tensor) |
| newtensor | none | (shape)->(int16) | T1 = zeros(shape) | newtensor(shape )->(int16 tensor) |
| newtensor | none | (shape)->(int8) | T1 = zeros(shape) | newtensor(shape )->(int8 tensor) |

Added:

| Operation | Author | Func Def | Math Formula | IR Instruction |
|-----------|--------|------------|--------------|----------------|
| concat | none | (unknown<any>, var<int32>)->(tensor<any>) | Tresult = concat([T1, T2...], axis=3) | concat(unknown<any> tensors, var<int32> axis)->(tensor<any> Tresult) |
| newtensor | none | (var<unknown>)->(tensor<any>) | T1 = zeros(shape) | newtensor(var<unknown> shape)->(tensor<any> tensor1) |
| newtensor | none | (vector<int32>)->(tensor<any>) | T1 = zeros(shape) | newtensor(vector<int32> shape)->(tensor<any> tensor1) |
| vecset | none | (vector<any>)->() | shape = [3 4 5] | vecset(vector<any> shape)->() |
| argset | none | (var<any>)->() | var argname = argvalue | argset(var<any> argname)->() |
