【Hackathon 10th Spring No.13】GDI-NN model reproduction RFC #1254
Open. megemini wants to merge 6 commits into PaddlePaddle:master from megemini:gdinn
+2,291
−0
Commits
- 520aac0 【Hackathon 9th No.109】Design document for adapting the custom operator mechanism to Setuptools 80+ (megemini)
- b6e4f2b Merge branch 'master' of https://github.com/PaddlePaddle/community (megemini)
- 2c91906 docs: add GDI-NN design document (megemini)
- 07b7acf add: gdinn test (megemini)
- a139648 refactor: gdinn (megemini)
- 2365b1b update: gdinn test (megemini)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| # GDI-NN Design Document | ||
|
|
||
| | API name | Name of the new API | | ||
| | ------------ | ----------------- | | ||
| | Author | 柳顺 (megemini) | | ||
| | Submission date | 2026-03-22 | | ||
| | Version | V1.0.0 | | ||
| | Required PaddlePaddle version | develop | | ||
| | File name | 20260322_gdinn.md | | ||
|
|
||
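| # I. Overview | ||
|
|
||
| ## 1. Background | ||
|
|
||
| GDI-NN (Gibbs-Duhem Informed Neural Network) is a physics-constrained graph neural network model for predicting the activity coefficients of the components in binary mixtures. Proposed by Rittig et al., it adds the Gibbs-Duhem equation to the loss function as a regularization term so that the model's predictions satisfy thermodynamic consistency. | ||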
|
|
||
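| ## 2. Goals | ||
|
|
||
| This project integrates the GDI-NN model into the PaddleMaterials framework, provides a complete pipeline for training, evaluation, and inference, and aligns its accuracy with the original PyTorch version. | ||
|
|
||
| ## 3. Significance | ||
|
|
||
| Fills a gap in the PaddlePaddle ecosystem: thermodynamically consistent prediction of molecular properties. | ||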
|
|
||
| # II. Current State of PaddlePaddle | ||
|
|
||
| PaddleMaterials does not currently support the GDI-NN model. | ||
|
|
||
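| # III. Survey of Existing Solutions | ||
|
|
||
| The [GDI-NN](https://git.rwth-aachen.de/avt-svt/public/GDI-NN) repository provides a PyTorch implementation, including the following model files: | ||
|
|
||
| - `model/model_GNN.py`, which implements models such as solvgnn_binary, solvgnn_xMLP_binary, and gegnn_binary | ||
| - `model/model_MCM.py`, which implements the MCM_multiMLP model | ||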
|
|
||
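| # IV. Comparative Analysis | ||
|
|
||
| **Original PyTorch version**: | ||
|
|
||
| Implemented in PyTorch, using DGL for graph operations. | ||
|
|
||
| **PaddlePaddle implementation**: | ||
|
|
||
| Implemented in PaddlePaddle, using PGL for graph operations. | ||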
|
|
||
| # V. Design and Implementation | ||
|
|
||
| ```shell | ||
| ppmat/ | ||
| ├── datasets/ | ||
| │ ├── __init__.py | ||
| │ ├── binary_activity_dataset.py # dataset class for binary activity coefficients | ||
| │ └── collate_fn.py # batch collation functions for the dataset | ||
| ├── losses/ | ||
| │ ├── __init__.py | ||
| │ └── gibbs_duhem_loss.py # Gibbs-Duhem consistency loss | ||
| ├── models/ | ||
| │ ├─── gdinn/ | ||
| │ │ ├── utils/ | ||
| │ │ │ ├── atom_feat_encoding.py # feature encoding | ||
| │ │ │ ├── graph_utils.py # graph utilities | ||
| │ │ │ ├── layers.py # graph convolution layers | ||
| │ │ │ └── molecular_graph.py # molecular graph construction utilities | ||
| │ │ ├── __init__.py | ||
| │ │ ├── gnn.py # SolvGNN and related models; corresponds to GDI-NN's model_GNN.py | ||
| │ │ └── mcm.py # MCM_multiMLP model; corresponds to GDI-NN's model_MCM.py | ||
| │ └─── __init__.py | ||
| ├── property_prediction/ | ||
| │ └── configs/ | ||
| │ └── gdinn/ | ||
| │ └── solvgnn_binary_gamma.yaml # model training configuration | ||
| └── .pre-commit-config.yaml # line-length limit raised to 120; the default of 88 is too restrictive for formulas in docstrings | ||
| ``` | ||
|
|
||
| Notes on each part: | ||
|
|
||
| - `datasets/binary_activity_dataset.py` and `datasets/collate_fn.py` | ||
|
|
||
| In GDI-NN, dataset processing lives in `util/generate_dataset_for_training.py`; the adaptation here is based mainly on that file. | ||
|
|
||
| GDI-NN has a second dataset processing module, `util/generate_dataset.py`, but nothing in the repo uses it, so only `generate_dataset_for_training.py` is adapted here. | ||
|
|
||
| Because PaddleMaterials handles data differently, the generation of `empty_solvsys` has been moved into `collate_fn.py`, where it is done once per batch. | ||
|
|
||
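To illustrate the per-batch construction described above, here is a minimal sketch in plain Python. It assumes that `empty_solvsys` represents one small, fully connected two-node graph per mixture (one node per component), batched into a single graph. That structure, the inclusion of self-loops, the sample keys, and the names `build_empty_solvsys` and `collate_binary` are all illustrative assumptions, not the PR's code, which would build a PGL graph instead of a dict.

```python
from typing import Dict, List, Tuple


def build_empty_solvsys(batch_size: int, n_comp: int = 2) -> Dict[str, object]:
    """Edge list of `batch_size` disjoint, fully connected n_comp-node graphs.

    Assumption: this mirrors what `empty_solvsys` represents; real code would
    wrap the edges in a PGL graph instead of a plain dict.
    """
    edges: List[Tuple[int, int]] = []
    for b in range(batch_size):
        offset = b * n_comp  # node ids of sample b start here
        for src in range(n_comp):
            for dst in range(n_comp):  # self-loops included (assumption)
                edges.append((offset + src, offset + dst))
    return {"num_nodes": batch_size * n_comp, "edges": edges}


def collate_binary(samples: List[dict]) -> dict:
    """Collate samples and attach a mixture graph sized to this batch."""
    return {
        "x1": [s["x1"] for s in samples],
        "gamma": [s["gamma"] for s in samples],
        "empty_solvsys": build_empty_solvsys(len(samples)),
    }


# Hypothetical samples: mole fraction of component 1 and the two target values.
batch = collate_binary([
    {"x1": 0.2, "gamma": (1.9, 1.1)},
    {"x1": 0.7, "gamma": (1.2, 1.6)},
])
print(batch["empty_solvsys"]["num_nodes"], len(batch["empty_solvsys"]["edges"]))
```

Building the graph inside the collate function means its size always follows the actual batch, including a smaller final batch.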
| - `losses/gibbs_duhem_loss.py` | ||
|
|
||
| GDI-NN computes the loss inside `train.py`; this project extracts it into `losses/gibbs_duhem_loss.py`, where it is used as a class. | ||
|
|
||
| The models in `gnn.py` and `mcm.py` both use the `GibbsDuhemLoss` class from `losses/gibbs_duhem_loss.py`. | ||
|
|
||
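For a binary mixture at constant temperature and pressure, the Gibbs-Duhem equation requires x1 · d(ln γ1)/dx1 + x2 · d(ln γ2)/dx1 = 0. The sketch below shows how a residual of that equation can be turned into a penalty term. It is an illustration only: it uses central finite differences on toy model functions, whereas a training implementation would normally take the derivatives by automatic differentiation, and the names `gibbs_duhem_residual` and `gibbs_duhem_penalty` are hypothetical and not taken from `GibbsDuhemLoss`.

```python
from typing import Callable, Sequence, Tuple

# A model maps the mole fraction x1 to (ln gamma_1, ln gamma_2).
LnGammaModel = Callable[[float], Tuple[float, float]]


def gibbs_duhem_residual(model: LnGammaModel, x1: float, h: float = 1e-5) -> float:
    """x1 * d(ln g1)/dx1 + x2 * d(ln g2)/dx1, using central differences."""
    lo1, lo2 = model(x1 - h)
    hi1, hi2 = model(x1 + h)
    d1 = (hi1 - lo1) / (2 * h)
    d2 = (hi2 - lo2) / (2 * h)
    return x1 * d1 + (1.0 - x1) * d2


def gibbs_duhem_penalty(model: LnGammaModel, xs: Sequence[float]) -> float:
    """Mean squared residual; can be added to a data loss with some weight."""
    return sum(gibbs_duhem_residual(model, x) ** 2 for x in xs) / len(xs)


def margules(x1: float, a: float = 1.5) -> Tuple[float, float]:
    """Two-suffix Margules model, which satisfies Gibbs-Duhem exactly."""
    x2 = 1.0 - x1
    return a * x2 ** 2, a * x1 ** 2


def inconsistent(x1: float) -> Tuple[float, float]:
    """Toy model that violates Gibbs-Duhem."""
    return 1.0 - x1, 2.0 * x1


xs = [0.1, 0.3, 0.5, 0.7, 0.9]
print(gibbs_duhem_penalty(margules, xs))      # close to zero
print(gibbs_duhem_penalty(inconsistent, xs))  # clearly positive
```

The consistent model gives a penalty near zero while the inconsistent one does not, which is the signal the regularization term feeds back into training.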
| - `models/gdinn/utils/atom_feat_encoding.py` | ||
|
|
||
| Partially adapted from GDI-NN's `util/atom_feat_encoding.py`. The original file contains many classes and methods; this migration keeps only the ones that are actually used. | ||
|
|
||
| - `models/gdinn/utils/molecular_graph.py` | ||
|
|
||
| Partially adapted from GDI-NN's `util/molecular_graph.py`, using PGL in place of DGL. | ||
|
|
||
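| - `models/gdinn/utils/layers.py` and `models/gdinn/utils/graph_utils.py` | ||
|
|
||
| `models/gdinn/utils/layers.py` contains the layers that were extracted and migrated; `models/gdinn/utils/graph_utils.py` collects some shared helper methods. | ||
|
|
||
| The `NNConv` class deserves a separate note. `paddle_geometric` already has such a class (`jointContribution/mattergen/paddle_geometric/nn/conv/nn_conv.py`), but testing showed that the `paddle_geometric` package currently seems to have some problems. For example, in my local environment with paddle `3.1.0`, `paddle_geometric` installed from source fails to run; after some modifications and debugging, I suspect a paddle version compatibility issue. This project therefore implements a standalone `NNConv` class with no special dependencies instead of using the one in `paddle_geometric`. | ||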
|
|
||
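As background for what a dependency-free `NNConv` has to compute, the following plain-Python sketch shows the core of an edge-conditioned convolution: an edge network maps each edge feature to a weight matrix, the source node's features are multiplied by it, and the messages are summed at the destination node. This is not the PR's implementation. The names `nnconv` and `toy_edge_net` are hypothetical, and the sketch leaves out the root/residual term, bias, and choice of aggregator that library versions typically offer.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def nnconv(
    x: Sequence[Sequence[float]],          # node features, shape [N, in_dim]
    edges: Sequence[Tuple[int, int]],      # (src, dst) pairs
    edge_attr: Sequence[Sequence[float]],  # one feature row per edge
    edge_net: Callable[[Sequence[float]], Matrix],  # edge feature -> [in_dim, out_dim]
    out_dim: int,
) -> Matrix:
    """Sum-aggregated edge-conditioned convolution (no root weight or bias)."""
    out = [[0.0] * out_dim for _ in x]
    for (src, dst), e in zip(edges, edge_attr):
        w = edge_net(e)  # weight matrix specific to this edge
        for k in range(out_dim):
            out[dst][k] += sum(x[src][i] * w[i][k] for i in range(len(x[src])))
    return out


def toy_edge_net(e: Sequence[float]) -> Matrix:
    """Hypothetical edge network: a diagonal 2x2 matrix scaled by the edge feature."""
    return [[e[0], 0.0], [0.0, 2.0 * e[0]]]


x = [[1.0, 2.0], [3.0, 4.0]]
edges = [(0, 1), (1, 0)]
edge_attr = [[1.0], [0.5]]
out = nnconv(x, edges, edge_attr, toy_edge_net, out_dim=2)
print(out)
```

In a framework implementation the per-edge loop becomes a gather over source nodes, a batched matrix product, and a scatter-sum over destination nodes, which needs only basic tensor operations.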
| - `models/gdinn/gnn.py` and `models/gdinn/mcm.py` | ||
|
|
||
| Adapted from GDI-NN's `model/model_GNN.py` and `model/model_MCM.py`. Not every model in `model/model_GNN.py` was migrated, however. The following were left out: | ||
|
|
||
| - solvgnn_onexMLP_binary | ||
| - solvgnn_onexMLP_share1layer_binary | ||
| - solvgnn_onexMLP_share2layer_binary | ||
|
|
||
| These models appear only in `model/model_GNN.py` and are not used in `train.py`. `train.py` supports only the following models: | ||
|
|
||
| - SolvGNN | ||
| - SolvGNNxMLP | ||
| - GEGNN | ||
| - MCM_multiMLP | ||
|
|
||
| This project therefore migrates only these models. | ||
|
|
||
| - `property_prediction/configs/gdinn/solvgnn_binary_gamma.yaml` | ||
|
|
||
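| Written with reference to how GDI-NN's `train.py` is used. | ||
|
|
||
| # VI. Testing and Acceptance Considerations | ||
|
|
||
| - **Unit tests**: run local unit tests for the models, classes, and methods as needed | ||
| - **Accuracy alignment**: align accuracy with the original PyTorch version on the same dataset | ||
| - **Integration tests**: verify compatibility with the PaddleMaterials training pipeline | ||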
|
|
||
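| **Acceptance criteria**: | ||
|
|
||
| - Training accuracy matches that of the original PyTorch version | ||
| - Inference accuracy matches that of the original PyTorch version | ||
| - The model's training pipeline is compatible with the PaddleMaterials training pipeline | ||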
|
|
||
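An accuracy-alignment check of the kind listed above usually comes down to comparing two sets of outputs under a tolerance. The sketch below shows one way to express that in plain Python. The helper names and the tolerance values are assumptions chosen for illustration; the RFC does not state which thresholds its test scripts use.

```python
import math
from typing import Sequence


def max_abs_diff(ref: Sequence[float], new: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two output vectors."""
    if len(ref) != len(new):
        raise ValueError("outputs must have the same length")
    return max((abs(p - q) for p, q in zip(ref, new)), default=0.0)


def is_aligned(
    ref: Sequence[float],
    new: Sequence[float],
    rtol: float = 1e-5,  # assumed relative tolerance
    atol: float = 1e-6,  # assumed absolute tolerance
) -> bool:
    """True if every element agrees within the relative or absolute tolerance."""
    if len(ref) != len(new):
        return False
    return all(math.isclose(p, q, rel_tol=rtol, abs_tol=atol) for p, q in zip(ref, new))


# Hypothetical outputs from the reference and the migrated model.
reference = [0.1234567, 1.5]
migrated = [0.1234569, 1.5000001]
print(max_abs_diff(reference, migrated), is_aligned(reference, migrated))
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to see how close a failing comparison was.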
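| # VII. Feasibility Analysis and Schedule | ||
|
|
||
| ## Feasibility Analysis | ||
|
|
||
| - **Technical feasibility**: | ||
|
|
||
| Developing with PaddlePaddle in place of PyTorch, and using PGL in place of DGL for graph operations, has been verified to work. | ||
|
|
||
| - **Data feasibility**: | ||
|
|
||
| Testing uses the data shipped with GDI-NN, `data/output_binary_with_inf_all.csv` and `data/solvent_list.csv`; the accuracy matches that of the original PyTorch version. | ||
|
|
||
| ## Schedule | ||
|
|
||
| - Code migration: 1 week | ||
| - Integration testing: 1 week | ||
| - Accuracy testing: 1 week | ||
| - Other items: 1 week | ||
|
|
||
| # VIII. Impact | ||
|
|
||
| This project integrates GDI-NN into PaddleMaterials, using PGL in place of DGL, and introduces no other dependencies. | ||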
|
|
||
| # Attachments and References | ||
|
|
||
| - https://github.com/PaddlePaddle/community/blob/master/hackathon/hackathon_10th/%E3%80%90Hackathon_10th%E3%80%91%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%E4%B8%AA%E4%BA%BA%E6%8C%91%E6%88%98%E8%B5%9B%E6%98%A5%E8%8A%82%E7%89%B9%E5%88%AB%E5%AD%A3%E2%80%94%E4%BB%BB%E5%8A%A1%E5%90%88%E9%9B%86.md#no6---no19-paddlemateirals%E6%A8%A1%E5%9E%8B%E5%A4%8D%E7%8E%B0 | ||
| - https://github.com/PaddlePaddle/PaddleMaterials/issues/194 | ||
| - https://git.rwth-aachen.de/avt-svt/public/GDI-NN | ||
Could you please confirm whether this works?
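The PR has already been submitted: PaddlePaddle/PaddleMaterials#252
I added two new test scripts to the RFC directory:
I verified locally that it works. Test steps:
- [...] the `test_gdinn` directory
- [...] in the `test_gdinn/dataset` directory
- [...] in `test_gdinn`
- Run `python test_gdinn/test_alignment.py; python test_gdinn/quick_test.py` to test

The test results are as follows:
Please take a look, thanks! :)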