【Hackathon 10th Spring No.2】Add schema parser and related functionality for Torch compatibility #77938
Conversation
Your PR was submitted successfully — thank you for contributing to the open-source project!
Pull request overview
This PR adds comprehensive schema parser functionality to the TORCH_LIBRARY registration mechanism, enabling PyTorch-compatible operator definitions with full type information, default values, keyword arguments, and alias annotations. This is part of the Hackathon 10th Spring No.2 task and significantly enhances the compatibility layer between PaddlePaddle and PyTorch.
Changes:
- Implemented schema parsing infrastructure including FunctionSchemaParser and SchemaTypeParser to parse PyTorch-style operator schemas
- Extended FunctionArgs to support named/keyword arguments and automatic argument normalization based on parsed schemas
- Added comprehensive type system (FunctionSchema, Argument, AliasInfo, Type hierarchy) to represent parsed schema information
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/cpp/compat/torch_library_test.cc | Added 400+ lines of comprehensive tests for schema parsing, including default values, keyword arguments, optional types, tuples, and variadic functions |
| test/cpp/compat/schema_parser_type_test.cc | New test file (206 lines) specifically for testing schema type parser edge cases and torchcodec-like schemas |
| test/cpp/compat/CMakeLists.txt | Moved torch_library_test from GPU-only (nv_test) to general (cc_test) and added schema_parser_type_test |
| paddle/phi/api/include/compat/torch/library.h | Added schema binding to CppFunction, keyword argument support in FunctionArgs, and normalize_args_by_schema logic |
| paddle/phi/api/include/compat/torch/library.cpp | Updated OperatorRegistry to parse and bind schemas to implementations |
| paddle/phi/api/include/compat/torch/csrc/jit/function_schema_parser.h/cpp | New 574-line schema parser implementation handling full PyTorch schema grammar |
| paddle/phi/api/include/compat/torch/csrc/jit/schema_type_parser.h/cpp | New 240-line type parser for parsing schema types including tuples, optionals, and alias annotations |
| paddle/phi/api/include/compat/torch/csrc/jit/schema_parser_defs.h | Constants and macros for schema parsing (type names, keywords, characters) |
| paddle/phi/api/include/compat/ATen/core/function_schema.h/cpp | New FunctionSchema and Argument classes to represent parsed schemas |
| paddle/phi/api/include/compat/ATen/core/jit_type_base.h | Base Type class hierarchy with SingletonOrSharedTypePtr for type system |
| paddle/phi/api/include/compat/ATen/core/jit_type.h | Concrete type implementations (TensorType, IntType, FloatType, etc.) and schema-specific types |
| paddle/phi/api/include/compat/ATen/core/alias_info.h | AliasInfo class for representing aliasing metadata in schemas |
| paddle/phi/api/include/compat/ATen/core/type_ptr.h | SingletonTypePtr wrapper for singleton type instances |
| paddle/phi/api/include/compat/ATen/core/functional.h | Utility functions (fmap, filter) for working with collections |
| paddle/phi/api/include/compat/CMakeLists.txt | Updated build to include new schema parser source files |
```cpp
const size_t idx = it->second;
if (assigned[idx]) {
  throw std::runtime_error("Argument `" + name + "` is already provided");
```

The error message "Argument `name` is already provided" could be more helpful by specifying whether the argument was provided both positionally and by keyword. Consider changing it to:

```cpp
throw std::runtime_error("Argument `" + name +
                         "` provided both positionally and as keyword argument");
```
```cpp
          std::move(*alias_info))
          : nullptr),
      kwarg_only_(kwarg_only) {
  // this is an softly-enforced invariant for out arguments.
```

The comment contains a grammatical error. It should be "This is a softly-enforced invariant" instead of "this is an softly-enforced invariant"; the article "an" should be "a" before "softly".

```cpp
// This is a softly-enforced invariant for out arguments.
```
@gouzil please help test — this PR right now 🐁
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff            @@
##           develop   #77938   +/-   ##
==========================================
  Coverage         ?   97.53%
==========================================
  Files            ?       11
  Lines            ?     1015
  Branches         ?        0
==========================================
  Hits             ?      990
  Misses           ?       25
  Partials         ?        0
==========================================
```

☔ View full report in Codecov by Sentry.
```cpp
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
```

Following the torch-adapted code elsewhere, remember to add the attribution notice:

```cpp
// #The file has been adapted from pytorch project
// #Licensed under BSD-style license -
// https://github.com/pytorch/pytorch/blob/main/LICENSE
```

See ivalue.h for reference.
Please update the notice following #78590.

Also, what is paddlecodec currently blocked on?

I compared against the style in #78590; this PR currently still has at least these 3 newly added files using the old notice style:

- paddle/phi/api/include/compat/ATen/core/alias_info.h
- paddle/phi/api/include/compat/ATen/core/function_schema.h
- paddle/phi/api/include/compat/ATen/core/function_schema.cpp

They currently still read:

```cpp
// #The file has been adapted from pytorch project
// #Licensed under BSD-style license -
```

That is, they have not yet adopted the #78590 style that drops the extra `#` and normalizes the spacing of the BSD-style line. There are plenty of other legacy compat files in the repo with the same issue, but looking only at what this PR itself adds, I would fix these 3 together first.

On the paddlecodec side, the blocker is still PFCCLab/paddlecodec#2:

- the current changes there mainly handle the link-time `libpaddle` path/library name
- they do not yet cover the macOS runtime `@rpath/libav*.dylib` lookup problem that I originally blocked it on
- they also lack even a basic `import torchcodec`-level verification on macOS
- in addition, @SigureMo previously noted that "the goal is to reduce the diff", so that side now has both a runtime search-path problem and a diff-scope problem that still needs to converge
There's no need to restore everything — that is not this PR's concern; only handle the parts this PR itself needs to handle.

If PFCCLab/paddlecodec#3 can be merged once this PR lands, then I think this PR already meets the bar for merging.

Does PFCCLab/paddlecodec#3 depend on #78521?
> Does PFCCLab/paddlecodec#3 depend on #78521?

I re-checked the latest head 5cad6631bbbd7ef6c9a65c74fd94b79fcbd8b79a; this new push itself only merges develop in. From the compare view, the upstream commits it brings in already include:

- 3bb286ab / #78521: [common] Add support for std::optional in error messages
- e27e6c2e / #78590: [Cpp API Compatibility] Normalize adapted-from-PyTorch declarations

So if PFCCLab/paddlecodec#3 was previously blocked on #78521, that dependency has now been absorbed into this PR's current branch baseline and is no longer an extra blocker.

While at it, I also re-checked the current state:

- the notice style of the 3 new core files in this PR (alias_info.h / function_schema.h / function_schema.cpp) now follows #78590;
- `Check approval` still fails, but the log now prints on proper separate lines; the current failure cause is missing approvals (cc_test / Chinese / PADDLE_API), not the earlier line-wrapping display problem;
- the remaining CI jobs are still running.
```cpp
  return std::make_tuple(get<Types>(I)...);
}
std::vector<torch::IValue> args_;
std::unordered_map<std::string, torch::IValue> named_args_;
```
If `named_args_` is an `unordered_map`, can arguments in the `named_args` portion still be passed positionally? (Does torch support this behavior? Do we?)

This actually follows `Stack` and `Kwargs` in https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/function.h, except that torch merges all `Kwargs` into the `Stack` there.

The current implementation supports mixing, e.g. schema `f(int a, int b)` called as `f(1, b=2)`.

What is the relationship between `args_` and `named_args_.values()` — containment, or disjoint?

In that combination, `int a` is stored in `args_` and `int b` in `named_args_`. If they overlap, it raises `Duplicate keyword argument` or `Argument xxxx is already provided`.

Got it — so `FunctionArgs` represents how the call was made, not the function schema or its runtime binding.
```
@@ -0,0 +1,790 @@
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.
```

@SigureMo this file is all schema-related tests copied over from torch. The ones not yet supported each carry a brief note.
ShigureNyako
left a comment
Overall, this PR is already fairly complete along the schema parser / type parser / kwargs normalization main line, and the test investment is substantial; as a "minimal usable Torch schema compatibility implementation" the quality is good.

In this review I focused on two points:

- On the C++ call path for instance methods, kwargs / named args are not yet kept consistent with the Python path; if class methods later start relying on kw-only parameters, behavior may diverge.
- Inside `TORCH_LIBRARY_IMPL`, `def(schema)` and `def(schema, fn)` currently behave inconsistently; please confirm whether the latter is really meant to be supported semantics.

One more suggestion: the current default-value parsing and supported type range look closer to a "controlled subset" — for example, identifier-form default values cover real Torch schemas rather narrowly. Even if that is this PR's intended trade-off, it would help to state the current support boundary more explicitly in the PR description or code comments, so downstream users can judge the compatibility range.
```cpp
{"example_library_with_mdef_cases::overload.name",
 {torch::IValue(40), torch::IValue(2)},
 42},
// def(schema, fn) inside IMPL block still registers through template
```

This test group treats `def(schema, fn)` still registering successfully inside `TORCH_LIBRARY_IMPL` as expected behavior; yet in the same PR, `def(schema)` in an IMPL block is an explicit no-op. That gives the same-named API inconsistent semantics inside IMPL blocks, and also conflicts with the PyTorch intuition that "IMPL only registers implementations". Please confirm whether this behavior is intentional; if the goal is to be closer to PyTorch, the template `def` should also get the same guard, rather than locking this misuse path in as a positive test.
Let me add a more explicit semantic comparison and a fix direction.

Within the same IMPL block, the two entry points currently behave inconsistently:

```cpp
TORCH_LIBRARY_IMPL(ns, CPU, m) {
  m.def("foo(int x) -> int");           // currently a no-op
  m.def("bar(int x) -> int", &bar_fn);  // currently registers successfully
}
```

Judging from the call paths, it is roughly:

```cpp
// non-template def(schema)
Library::def(const std::string& schema) {
  if (kind_ == IMPL) return *this;
  register_schema(...);
}

// template def(schema, fn)
Library::def(const std::string& name_or_schema, Func&& f) {
  if (name_or_schema contains "(") {
    register_schema(...);  // no IMPL guard here
  }
  register_implementation(...);
}
```

So the current state is not "IMPL blocks don't support def" but "one overload is unsupported while the other still goes through". That makes it hard for users to judge the boundary of `TORCH_LIBRARY_IMPL`, and it turns what looks like a misuse path into test-locked behavior.

If the goal is to be closer to PyTorch / less ambiguous, I would lean toward making the template `def(schema, fn)` follow the same policy as `def(schema)` (reject or ignore its schema-registration semantics) — at minimum add the same `kind_ == IMPL` guard first, then decide between a plain no-op and an explicit error. The corresponding tests should also be changed to verify "not allowed / has no effect", rather than locking in the current divergence.
```
@@ -474,13 +539,96 @@ class CppFunction {
  if (!func_) {
```

Centralizing schema-based kwargs normalization in `CppFunction::call_with_args()` is reasonable; but following the call chain, the `ClassRegistry::call_method_with_args(qualified_name, method_name, instance, args)` overload currently only splices `instance` and the positional args into `full_args`, without forwarding `args.named_args()`. So if instance methods later also rely on kw-only / keyword args, they will behave inconsistently with the Python path (torch_compat.h preserves kwargs there). I suggest adding a C++ unit test for "instance method + kwargs/kw-only" here, to avoid schema normalization only taking effect on the operator/static-method paths.
Here is a more concrete call-flow example of the divergence I am worried about.

Currently it looks roughly like this:

```cpp
// Python path: kwargs are preserved
converted = convert_args_kwargs_to_function_args(args, kwargs);
function_args = [instance] + converted.positional + converted.named_args;
ClassRegistry::call_method_with_args(qualified_name, method_name, function_args);
```

But this C++ overload is currently:

```cpp
FunctionResult call_method_with_args(..., const IValue& instance, const FunctionArgs& args) {
  FunctionArgs full_args;
  full_args.add_arg(instance);
  for (size_t i = 0; i < args.size(); ++i) {
    full_args.add_arg(args.get_value(i));
  }
  return call_method_with_args(qualified_name, method_name, full_args);
}
```

That is, only positionals are forwarded, not `args.named_args()`.

If a schema / call like the following appears later:

```cpp
// schema: method(Tensor self, *, int? idx=None, str mode="nearest") -> ()
FunctionArgs a;
a.add_arg(instance);
a.add_arg(torch::arg("idx") = 1);
a.add_arg(torch::arg("mode") = "nearest");
```

the expectation is that `idx`/`mode` still reach `CppFunction::call_with_args()` for schema normalization; but on the current C++ path they are lost when `full_args` is assembled.

The fix direction can be quite direct, roughly of the form:

```cpp
FunctionArgs full_args;
full_args.add_arg(instance);
for (size_t i = 0; i < args.size(); ++i) {
  full_args.add_arg(args.get_value(i));
}
for (const auto& [name, value] : args.named_args()) {
  torch::arg keyword(name);
  keyword = value;
  full_args.add_arg(std::move(keyword));
}
return call_method_with_args(qualified_name, method_name, full_args);
```

I also suggest adding a C++ unit test covering "instance method + kw-only kwargs", which would constrain the Python and C++ paths to behave consistently.
SigureMo
left a comment
Is there a PR that reduces the paddlecodec diff?
ShigureNyako
left a comment
Overall I agree with the direction: the schema parser / type parser / kwargs normalization main line is fairly complete, and the test volume has clearly caught up; but I still have two points that need fixing first, so for now I'm giving REQUEST_CHANGES.

- In paddle/phi/api/include/compat/ATen/core/alias_info.h, `std::hash<c10::AliasInfo>` does not write the return value of `hash_combine` back to `hash`, so the contributions of `beforeSets` / `afterSets` / `containedTypes` are all lost; the hash effectively only distinguishes `isWrite()`. This also introduces an extra global `hash_combine` symbol; upstream PyTorch uses `c10::hash_combine`.
- In paddle/phi/api/include/compat/torch/library.h, `def(..., fn)` still chooses to "silently ignore" inside a `TORCH_LIBRARY_IMPL` block. That is more uniform than the previous inconsistency, but semantically still diverges from PyTorch: upstream errors out directly on `def()` in an IMPL block rather than no-op. The current implementation swallows real registration errors, which hurts both the debugging experience and compatibility semantics.

I have left inline comments at both places. I'll take another look after they are fixed.
```cpp
// Define an operator implementation
template <typename Func>
Library& def(const std::string& name_or_schema, Func&& f) & {
  if (kind_ == IMPL) {
```
If the goal is to stay close to PyTorch, this should not silently no-op. Upstream `Library::_def()` raises via `TORCH_CHECK` in an IMPL block, because silent ignoring swallows real registration errors. The current implementation quietly eats the misuse, and the caller only sees the consequence of "nothing got registered"; that diverges from PyTorch's debugging experience and semantics. I suggest throwing an error instead, and changing `TestMDefInImplBlockIsNoop` below to assert failure accordingly.
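A minimal sketch of the guard being suggested (stand-in `Library`/`Kind` types, not the actual library.h code): both `def` overloads throw inside an IMPL block instead of silently ignoring the call, mirroring upstream PyTorch's `TORCH_CHECK` behavior:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Kind { DEF, IMPL };

struct Library {
  Kind kind_;
  explicit Library(Kind k) : kind_(k) {}

  Library& def(const std::string& schema) {
    check_not_impl("def(schema)");
    // ... register_schema(schema) ...
    return *this;
  }

  template <typename Func>
  Library& def(const std::string& name_or_schema, Func&& /*f*/) {
    // Same guard as the non-template overload, so the two entry
    // points cannot diverge inside an IMPL block.
    check_not_impl("def(schema, fn)");
    // ... register_schema + register_implementation ...
    return *this;
  }

 private:
  void check_not_impl(const std::string& api) const {
    if (kind_ == Kind::IMPL)
      throw std::runtime_error(
          api + " is not allowed in a TORCH_LIBRARY_IMPL block");
  }
};
```

With this shape, misuse fails loudly at registration time rather than surfacing later as "operator not found".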
```cpp
  after_set_hash_seed = after_set_hash_seed ^ symbol_hash;
}

hash_combine(hash, before_set_hash_seed);
```
The return value of `hash_combine` here is not written back to `hash`, so the contributions of `beforeSets` / `afterSets` / `containedTypes` are all dropped; the current hash effectively only distinguishes `isWrite()`. Upstream PyTorch writes `hash = c10::hash_combine(hash, ...)` here. Also, this helper currently lives in the global namespace, leaking an unqualified `hash_combine` symbol. I suggest including c10/util/hash.h directly and writing back per the upstream pattern.
Indeed, I forgot the write-back — the Boost version used previously writes back (it takes the seed by reference). Fixed, with related unit tests added.
> The return value of `hash_combine` here is not written back to `hash`, so the contributions of `beforeSets` / `afterSets` / `containedTypes` are all dropped; the current hash effectively only distinguishes `isWrite()`. Upstream PyTorch writes `hash = c10::hash_combine(hash, ...)` here. Also, this helper currently lives in the global namespace, leaking an unqualified `hash_combine` symbol. I suggest including c10/util/hash.h directly and writing back per the upstream pattern.
Re-checked: this commit now correctly writes the `hash_combine` return value back to `hash`, and adds `unordered_set` / `unordered_map` regression tests (the new `AliasInfoHashWorksInUnorderedContainers` in test/cpp/compat/schema_parser_type_test.cc). The core hashing bug is fixed; resolving this thread.
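For illustration, a minimal sketch of the bug and the fix discussed in this thread. The mixing constant follows the boost-style pattern that `c10::hash_combine` also uses; the `alias_info_hash_*` function names are hypothetical stand-ins for the `std::hash<c10::AliasInfo>` body:

```cpp
#include <cassert>
#include <cstddef>

// Boost-style mixer that RETURNS the combined seed (same shape as
// c10::hash_combine); it does not mutate its arguments.
inline std::size_t hash_combine(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Buggy pattern from the review: the return value is discarded, so
// `hash` never absorbs the per-field contribution.
std::size_t alias_info_hash_buggy(bool is_write, std::size_t sets_hash) {
  std::size_t hash = is_write ? 1 : 0;
  hash_combine(hash, sets_hash);  // result dropped: hash is unchanged
  return hash;
}

// Fixed pattern: write the result back, as in
// `hash = c10::hash_combine(hash, ...)`.
std::size_t alias_info_hash_fixed(bool is_write, std::size_t sets_hash) {
  std::size_t hash = is_write ? 1 : 0;
  hash = hash_combine(hash, sets_hash);
  return hash;
}
```

The buggy version collapses all values with the same `isWrite()` flag onto one hash bucket, which is exactly why the unordered-container regression test catches it.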
ShigureNyako
left a comment
I re-reviewed the latest head ff036b24e814576e51e0309b2d45a64eb00f6d6a.

One point is fixed in this round:

- The `AliasInfo` hash implementation now correctly writes the `hash_combine` return value back, with `unordered_set` / `unordered_map` regression tests added; I confirm this core issue is fixed.

But the misuse semantics of `def(..., fn)` inside `TORCH_LIBRARY_IMPL` are still unfixed:

- paddle/phi/api/include/compat/torch/library.h still does a silent no-op via `VLOG + return *this`
- `TestMDefInImplBlockIsNoop` in test/cpp/compat/torch_library_test.cc still treats the no-op as expected

If the goal is to stay close to PyTorch, this should still raise an error rather than silently swallow the misuse. So my conclusion remains REQUEST_CHANGES, with this as the only remaining blocker.
You may stand down now, kitty.

Don't forget this one.
ShigureNyako
left a comment
I re-reviewed the current head f4936a97a0fcc9b99708830f4beb9d9f9cb52aa4.

The points I previously blocked on are now closed out:

- The `AliasInfo` hash write-back issue is fixed, with `unordered_set` / `unordered_map` regression tests added;
- `def(schema, fn)` inside `TORCH_LIBRARY_IMPL` now has the `kind_ == IMPL` guard, and the corresponding test was changed to no longer register;
- The instance-method overload now forwards `named_args()`, with `TestClassMethodKwonlyArgsForwardedThroughInstanceOverload` added; the Python / C++ path divergence is closed;
- The adapted-from-PyTorch notices in the newly added compat core files now follow the #78590 style.

The remaining blockers are mostly CI / specific approvals:

- `Check approval` still lacks the corresponding approvals;
- A few platform/environment CI failures remain (e.g. IXUCA patch apply, coverage external download, Windows-GPU `import paddle` failure) that need follow-up.

This round I did not find sufficient evidence of any remaining unresolved deterministic compatibility-semantics issue, so from a code-review perspective I'm giving APPROVE.
PR Category

Execute Infrastructure

PR Types

Improvements

Description

Implementation of PaddlePaddle/community#1210

- Added schema support to the `TORCH_LIBRARY` registration mechanism; modified the schema data structure of `OperatorRegistry`
- Unit test adjustments
  - Moved `torch_library_test` out of the GPU-required set (not fully working yet for now; see the run log at https://github.com/PaddlePaddle/Paddle/actions/runs/22081133427/job/63808517292?pr=77938)
  - Added `schema_parser_type_test` for testing the schema parser
- TODO
  - paddle/pir/src/core/utils.cc also uses the same hash_combine function; consider hoisting it to a shared location
- Preliminarily verified https://github.com/PFCCLab/paddlecodec

Does this cause precision changes?

No