Machine-learning-guided synthesis for organic flow batteries and the associated processes. This repository contains the code used for the experiments conducted at PNNL and UW on synthesizing organic redox batteries.
All code was run with Python 3.9.1.
The following figure shows how the files are organized.

The Bayesian optimization experiment follows the workflow below. Access the files in order:
Experiment Round 1 > Experiment Round 2 > Experiment Round 3
- Experiment Round Redo is a redo of selected samples.
To see the final results, open the Experiment_Results folder.
Open Model A Results Notebook
Plots for Model A can be found under Experiment_Results/ModelA_results.ipynb
The uncertainty plots for Model A were constructed in the same file. (SI)
Open Model B Results Notebook
Plots for Model B can be found under Experiment_Results/ModelB_results.ipynb
The uncertainty plots for Model B were constructed in the same file. (SI)
Open Model C Results Notebook
Plots for Model C can be found under Experiment_Results/ModelC_results.ipynb
The uncertainty plots for Model C were constructed in the same file. (SI)
Open Summary Results Notebook
Model evaluations and comparison plots can be found in Experiment_Results/All_results.ipynb
In total, four rounds of data collection were conducted and are named: Round1, Round2, Round3, and Round_Redo.
- Round 1: Contains data generated from Latin Hypercube sampling and their respective HPLC samples.
- Round 2: Contains the code to extract yield from HPLC and the Bayesian optimization in Round 2 for Model A, Model B, and Model C, as well as their respective HPLC samples.
- Round 3: Contains the code to extract yield from HPLC and the Bayesian optimization in Round 3 for Model A, Model B, and Model C, as well as their respective HPLC samples.
- Round Redo: Contains selected samples that were resynthesized due to inconsistencies.
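Round 1 starts from Latin Hypercube sampling. As a rough illustration of what that sampling does, here is a dependency-free sketch; it is not the code used in the repository. The function name `latin_hypercube`, the two design variables, and their bounds are hypothetical. A library routine such as `scipy.stats.qmc.LatinHypercube` could be used instead.

```python
import random

def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each stratum of the unit interval.
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append([low + p * (high - low) for p in points])
    # Transpose the per-dimension columns into one tuple per sample.
    return list(zip(*columns))

# Hypothetical design space: (temperature in C, reaction time in h).
bounds = [(20.0, 100.0), (0.5, 24.0)]
samples = latin_hypercube(15, bounds, seed=0)
```

Each row of `samples` is one candidate reaction condition; no two rows share a stratum in any dimension, which spreads the initial experiments over the design space.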
The product yield is extracted from the HPLC data using the Python package hplc-py.
The peaks are first fit to the chromatogram:
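from hplc.quant import Chromatogram

# i indexes the replicate whose columns in data_df are wave{i} and intensity{i}
chrom = Chromatogram(data_df, cols={'time':f'wave{i}', 'signal':f'intensity{i}'})
chrom.correct_baseline()
peak_list.append(chrom.fit_peaks(prominence=0.01))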
The fit produces a dataframe consisting of the peak locations and the area under the curve at each peak. We locate the peaks of interest and compute the yield.
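The peak lookup can be sketched as follows. This is an illustration under stated assumptions, not the repository's calculation: plain dictionaries stand in for rows of the peak table, the keys `retention_time` and `area` are assumed column names, the retention times and tolerance are invented, and the area-fraction formula for the yield is only one possible definition.

```python
def peak_area(peaks, expected_time, tolerance=0.2):
    """Return the area of the fitted peak closest to expected_time,
    or 0.0 if no peak lies within +/- tolerance of it."""
    in_window = [p for p in peaks
                 if abs(p["retention_time"] - expected_time) <= tolerance]
    if not in_window:
        return 0.0
    closest = min(in_window,
                  key=lambda p: abs(p["retention_time"] - expected_time))
    return closest["area"]

# Hypothetical peak table for one replicate (times in minutes).
peaks = [
    {"retention_time": 2.10, "area": 150.0},  # unrelated peak
    {"retention_time": 4.35, "area": 600.0},  # assumed product peak
    {"retention_time": 6.80, "area": 200.0},  # assumed reactant peak
]

PRODUCT_TIME = 4.3   # hypothetical expected retention time of the product
REACTANT_TIME = 6.8  # hypothetical expected retention time of the reactant

product = peak_area(peaks, PRODUCT_TIME)
reactant = peak_area(peaks, REACTANT_TIME)

# Assumed yield definition: product area as a fraction of product + reactant.
total = product + reactant
yield_fraction = product / total if total > 0 else 0.0
```

Matching on a retention-time window rather than on a fixed row index keeps the lookup stable when the fit returns a different number of peaks from one replicate to the next.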
For our experiment we have three replicates per condition. The HPLC traces of two selected conditions are shown in the figure below. The yield mean and variance are computed after extracting each peak individually.
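A minimal sketch of the per-condition statistics, assuming the yield of each replicate has already been extracted. The condition names and yield values are made up, and the use of the sample variance (dividing by n - 1) is an assumption; the notebooks may use a different estimator.

```python
from statistics import mean, variance

# Hypothetical extracted yields: three replicates per condition.
replicate_yields = {
    "condition_1": [0.70, 0.75, 0.80],
    "condition_2": [0.40, 0.42, 0.44],
}

# statistics.variance is the sample variance (n - 1 in the denominator).
summary = {
    name: {"mean": mean(values), "variance": variance(values)}
    for name, values in replicate_yields.items()
}
```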

Pool-Based:
To compare our model results and test their reproducibility, we conducted a pool-based analysis.
Acquisition Function Selection: Contains the analysis for the selection of the acquisition function.
- pool_aqu_comparison.ipynb is the analysis of the first 15 samples collected from Latin Hypercube sampling, used to determine which acquisition function to choose.
- pool_aqu_comparison_fulldata.ipynb is the analysis that shows the best acquisition function for the redox flow battery data set.
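To make concrete what choosing between acquisition functions means, the sketch below scores a small pool with two common ones, expected improvement (EI) and upper confidence bound (UCB). Which functions the notebooks actually compare is not stated here, so both are examples only. The candidate names, the predicted means and standard deviations, and the parameters `xi` and `kappa` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and std 1

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization, given the surrogate's posterior mean and std."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB for maximization: mean plus kappa standard deviations."""
    return mu + kappa * sigma

# Hypothetical surrogate predictions (mean, std) for unmeasured candidates.
pool_predictions = {
    "cand_a": (0.60, 0.02),
    "cand_b": (0.50, 0.10),
    "cand_c": (0.30, 0.05),
}
best_observed = 0.58  # hypothetical best yield measured so far

ei_pick = max(
    pool_predictions,
    key=lambda c: expected_improvement(*pool_predictions[c], best_observed),
)
ucb_pick = max(
    pool_predictions,
    key=lambda c: upper_confidence_bound(*pool_predictions[c]),
)
```

With these made-up numbers the two functions disagree: EI favors the candidate predicted to be slightly better than the current best, while UCB with `kappa=2.0` favors the more uncertain one. Differences of this kind are what a pool-based comparison measures.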
Pool-based Comparisons: Contains the analysis of the pool-based comparisons.
- surrogate_comparison.ipynb contains the comparison of different surrogate models.
- BOvsBBOvsRandom.ipynb contains the comparison of Bayesian optimization with different batch sizes.
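In a pool-based comparison every candidate has already been measured, so a selection strategy can be replayed offline and scored by the best yield found after each round. The harness below is a sketch of that idea with a random baseline only; it is not the notebooks' implementation. The names `replay_pool` and `random_strategy`, the synthetic pool, and the budget of 10 measurements are all hypothetical. A model-based strategy would plug in through the same `select_batch` signature.

```python
import random

def replay_pool(pool, select_batch, batch_size, n_rounds, seed=0):
    """Replay a campaign on already-measured candidates and return the
    best yield observed after each round."""
    rng = random.Random(seed)
    remaining = dict(pool)  # candidates not yet "measured"
    observed = {}           # candidates revealed so far
    best_so_far = []
    for _ in range(n_rounds):
        if not remaining:
            break
        k = min(batch_size, len(remaining))
        batch = select_batch(sorted(remaining), observed, k, rng)
        for candidate in batch:
            # Reveal the stored measurement for each selected candidate.
            observed[candidate] = remaining.pop(candidate)
        best_so_far.append(max(observed.values()))
    return best_so_far

def random_strategy(candidates, observed, k, rng):
    """Baseline: pick k unmeasured candidates uniformly at random."""
    return rng.sample(candidates, k)

# Synthetic pool of 20 conditions with made-up yields.
pool = {f"cond_{i:02d}": ((i * 7) % 20) / 50 for i in range(20)}

# Same total budget of 10 measurements, spent in batches of 1 or 5.
traces = {
    batch_size: replay_pool(pool, random_strategy, batch_size,
                            n_rounds=10 // batch_size)
    for batch_size in (1, 5)
}
```

Holding the total number of measurements fixed while varying `batch_size` keeps the comparison fair: larger batches use fewer rounds, so a model-based strategy gets fewer chances to update between selections.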
Scalability testing: Contains the analysis of applying our models to a different, larger data set.

