Deep Learning Accelerated Multi-Criteria Screening of Chromium-Based A2+B2 Superalloys Across 11 Elements

This repository contains datasets and scripts to reproduce selected datasets and results of the article 'Accelerated discovery of Cr-based A2+B2 superalloys across 11 elements with a deep-learning CALPHAD surrogate'.

Prerequisites

A requirements.txt file is included for installing the required Python packages (excluding tc-python, which must be installed after Thermo-Calc).

System Requirements

Python 3.9.17
Thermo-Calc version 2025b

Python Packages

Install the required packages using:

pip install -r requirements.txt

Thermo-Calc Setup

Install Thermo-Calc (tested with version 2025b).
Install tc-python (tc-python==2025.2.30) using the Python wheel provided with Thermo-Calc 2025b.

Note: As Thermo-Calc licenses do not permit the open release of large calculated datasets, only a 100-composition 'dummy' version of the full set of feasible compositions (roughly 15,000 compositions) is provided. Therefore, to reproduce some of the figures in the manuscript, a license to the commercial TCHEA7 database is needed.

Repository Structure

data_generation/: Scripts for generating and processing CALPHAD datasets
surrogate_modeling/: Code for training surrogate models
screening/: Screening pipeline for candidate compositions
analyze_feasible_candidates/: Analysis and visualization of screening results
analyze_experimental_compositions/: Analysis of experimental data
helper_functions/: Utility functions for calculations (yield strength, VEC, misfit volumes)
shared/: Shared utilities and common functions

Workflow

Follow these steps to reproduce the datasets and results:

1. Generate Composition Pool for Generating Training Data

Run the data_generation/training_pool_generator.py to create composition pool compositions_b.h5.

2. Generate Dataset B

Run the data_generation/calphad_data_generator.py to reproduce Dataset B dataset_b.h5 using compositions_b.h5 as input.

Note: Due to CALPHAD convergence issues we iteratively checked which compositions were missing from the dataset, and repeated calculations for these compositions until dataset_b.h5 included 99.98% (3,570,288) of the CALPHAD data points.

3. Postprocess and Split Dataset B

Run the data_generation/postprocess_and_split_dataset.py to process and split dataset_b.h5 into development and test sets. This creates dataset_b_dev.h5 and dataset_b_test.h5.

4. Train Surrogate Model

Run surrogate_modeling/main.py to create the surrogate model with optimized hyperparameters. Move the generated model/ folder to the screening/ folder.

5. Generate Composition Pool

Create the composition candidate pool for screening with data_generation/screening_pool_generator.py. Move the generated composition_pool_screening.h5 to the screening/ folder.

6. Perform Screening

Execute the screening step with screening/main.py to identify feasible candidates. This produces feasible_candidates.xlsx.

7. Analyze Results

-analyze_feasible_candidates/: analyze the feasible compositions (alloy families, VEC-strength scatter plots). -analyze_experimental_compositions: contains the scripts to visualize DFT-derived properties (strength, D-parameters).

Usage Tips

Ensure Thermo-Calc is properly configured before running data generation scripts.
When working with CALPHAD data, be aware of potential convergence issues and missing data points.
Check file paths and dependencies when moving generated files between folders

License

This project is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later). See the LICENSE file for details.

Citation

TBA

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Learning Accelerated Multi-Criteria Screening of Chromium-Based A2+B2 Superalloys Across 11 Elements

Prerequisites

System Requirements

Python Packages

Thermo-Calc Setup

Repository Structure

Workflow

1. Generate Composition Pool for Generating Training Data

2. Generate Dataset B

3. Postprocess and Split Dataset B

4. Train Surrogate Model

5. Generate Composition Pool

6. Perform Screening

7. Analyze Results

Usage Tips

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
analyze_experimental_compositions		analyze_experimental_compositions
analyze_feasible_candidates		analyze_feasible_candidates
data_generation		data_generation
helper_functions		helper_functions
screening		screening
shared		shared
surrogate_modeling		surrogate_modeling
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Deep Learning Accelerated Multi-Criteria Screening of Chromium-Based A2+B2 Superalloys Across 11 Elements

Prerequisites

System Requirements

Python Packages

Thermo-Calc Setup

Repository Structure

Workflow

1. Generate Composition Pool for Generating Training Data

2. Generate Dataset B

3. Postprocess and Split Dataset B

4. Train Surrogate Model

5. Generate Composition Pool

6. Perform Screening

7. Analyze Results

Usage Tips

License

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages