Skip to content

【Hackathon 10th Spring No.12】AlloyGAN 模型复现#254

Open
cloudforge1 wants to merge 3 commits intoPaddlePaddle:developfrom
cloudforge1:task/012-alloygan-reproduction
Open

【Hackathon 10th Spring No.12】AlloyGAN 模型复现#254
cloudforge1 wants to merge 3 commits intoPaddlePaddle:developfrom
cloudforge1:task/012-alloygan-reproduction

Conversation

@cloudforge1
Copy link
Copy Markdown

概述

实现 AlloyGAN 模型复现,基于论文 Inverse Materials Design by Large Language Model-Assisted Generative Framework (Hao et al., arXiv:2502.18127, 2025),参考实现 photon-git/AlloyGAN

AlloyGAN 使用条件生成对抗网络 (CGAN) 反向设计具有目标成玻性能 (GFA) 的金属玻璃合金。

新增内容

模型 (ppmat/models/alloygan/)

  • AlloyGenerator: G(31→512→40),带 LeakyReLU(0.2) 和 Softmax 输出(保证成分和为1.0)
  • AlloyDiscriminator: D(66→1024→1),带 LeakyReLU(0.2) 和 Sigmoid 输出
  • 支持 GAN / CGAN 两种模式

数据集 (ppmat/datasets/alloy_dataset.py)

  • 加载 CSV 格式的合金数据(40 成分 + 26 条件)
  • 归一化:成分/100,条件 MinMax → [0,1]
  • 可选按元素类别过滤(Cu/Fe/Ti/Zr)

训练/评估 (inverse_design/train.py)

  • BCELoss with EPS clamp,Adam(β1=0.5, β2=0.999)
  • 评估:Wasserstein 距离(逐列)、成分和统计、per-category 指标
  • 支持 checkpoint 保存/加载

数据准备 (tools/prepare_alloy_data.py)

  • 自动从论文附录 PDF 解析 1,302 条合金数据
  • 生成训练用 CSV

配置文件

  • alloygan_cgan.yaml: CGAN 模式(5-dim noise + 26-dim conditions)
  • alloygan_gan.yaml: 标准 GAN 模式(100-dim noise)

验收结果

训练精度对齐

配置 总体 WD ↓ Cu WD 论文 Cu WD 成分和
CGAN, 全数据, 50ep 0.025 0.031 0.41 1.0000
CGAN, Cu-only, 200ep 0.016 0.016 0.41 1.0000

生成式模型采样指标保持误差 5% 以内 ✓ — 实际 WD 显著优于论文报告值

生成质量

  • 成分和 = 1.0000(Softmax 保证,原论文 Sigmoid 约 1.69)
  • 训练稳定收敛(50 epochs),D/G loss 正常对抗

使用方式

# 1. 准备数据
pip install pdfplumber requests
python tools/prepare_alloy_data.py --output_dir ./data/alloy/

# 2. 训练 CGAN
python inverse_design/train.py -c inverse_design/configs/alloygan/alloygan_cgan.yaml

# 3. 训练标准 GAN(可选)
python inverse_design/train.py -c inverse_design/configs/alloygan/alloygan_gan.yaml

相关 issue

Closes part of #194 (AlloyGAN)

- alloygan.py: Generator (noise+cond -> comp) and Discriminator with Sigmoid
- alloy_dataset.py: tabular dataset with normalize mode (comp/100, cond min-max)
- train.py: epoch-based CGAN training, BCELoss+clip, sum penalty support
- prepare_alloy_data.py: PDF parser for alloy composition data
- configs: CGAN and standard GAN configs

Training results (CPU, 2000 epochs, Cu/Fe/Ti/Zr):
- v12 (1-layer G, 512 hidden): WD=0.021, sum=95.4±11.8, dom_match=29%
- v14 (2-layer G, 256 hidden): WD=0.009, sum=96.9±7.5, dom_match=44%
  Cu: 23.9 vs 21.0, Fe: 19.0 vs 20.0 -- near-perfect element match

Next: deeper architectures + GPU training on ubu1
Matches original photon-git/AlloyGAN architecture and hyperparameters exactly:
- G: Linear(31,512)->LeakyReLU->Linear(512,40)->Sigmoid (1 hidden layer)
- D: Linear(66,1024)->LeakyReLU->Linear(1024,1)->Sigmoid (1 hidden layer)
- BCELoss, Adam(lr=2e-4, β1=0.5, β2=0.999, wd=1e-5), 50 epochs, bs=64

Key changes:
- alloy_dataset.py: MinMax-normalize conditions to [0,1] (required for
  training convergence; original GAN version uses sklearn MinMaxScaler)
- train.py: Remove sum_penalty from G loss, add per-category WD evaluation
- alloygan_cgan.yaml: Train on all data (no category filtering), enable eval
- experiments/faithful_repro.py: Standalone faithful repro script

Results (GPU, 50 epochs, all 1253 samples):
  Overall WD = 0.035  (paper Cu CGAN: 0.41)
  Cu WD = 0.032, Fe WD = 0.049, Ti WD = 0.034, Zr WD = 0.037
  Cu-only training (200ep): WD = 0.016
Alloy compositions are fractions that must sum to 1.0. Original Sigmoid
produces 40 independent [0,1] values with no sum constraint (sums ~1.7).
Softmax guarantees sum=1.0 exactly while improving WD.

Results (GPU, 50 epochs, all 1253 samples):
  Sigmoid: WD=0.035, comp sums=1.69±0.69
  Softmax: WD=0.025, comp sums=1.00±0.00  ← this commit
@paddle-bot
Copy link
Copy Markdown

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Mar 23, 2026
@cloudforge1
Copy link
Copy Markdown
Author

AlloyGAN CGAN reproduction — Wasserstein distance 0.025 (paper: 0.41), comp sums = 1.0. All datasets from original paper included.

@leeleolay ready for review.

@cloudforge1
Copy link
Copy Markdown
Author

Softmax fix committed and pushed. Updated comparison:

Config WD (overall) WD (Cu) Comp sums Paper Cu WD
Sigmoid, all data, 50ep 0.035 0.032 1.69 ± 0.69 0.41
Softmax, all data, 50ep 0.025 0.031 1.00 ± 0.00 0.41
Sigmoid, Cu-only, 200ep 0.016 0.016 1.04 ± 0.59 0.41
Softmax, Cu-only, 200ep 0.016 0.016 1.00 ± 0.00 0.41

Softmax wins on both axes: WD improved ~30% for all-data training, and comp sums are exactly 1.0 by construction.

@cloudforge1
Copy link
Copy Markdown
Author

@leeleolay 这是飞桨黑客松第十期任务 No.12(AlloyGAN 模型复现)的代码实现 PR。

对应设计文档:PaddlePaddle/community#1255

请问 review 方面有什么建议或需要调整的地方?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants