The Carbon Cost of AI for Science

A benchmarking framework that jointly evaluates the predictive accuracy and carbon footprint of generative AI models across seven scientific discovery tasks.

Key Finding: Simpler, specialized models frequently match or approach state-of-the-art accuracy while consuming 10–100× less compute.

CO₂ Reference Points

Category	Activity	CO₂ Emission
Everyday	Smartphone charge (iPhone 16 Pro Max)	~9.7 g CO₂ eq/charge
	Driving a car (EU average)	~170 g CO₂ eq/km
	Commercial aviation (Boeing 737)	~15.8 kg CO₂ eq/km
LLM inference	Text generation (Claude 3.7 Sonnet)	~2.12 g CO₂ eq/call
	Image generation (Stable Diffusion)	~1.38 g CO₂ eq/image
Chemical simulation	Classical MD (force field)	10 g CO₂ eq/1M steps
	Ab initio MD (PBE, 50 atoms)	140.96 kg CO₂ eq/1M steps
Chemical synthesis	Organic synthesis (Letermovir)	369 kg CO₂ eq/kg
	Material synthesis (UiO-66-NH₂)	43 kg CO₂ eq/kg
	Battery synthesis (vanadium flow battery)	37 kg CO₂ eq/MWh
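
To put a benchmark figure in everyday terms, an emission total can be divided by the reference values above. The following is a minimal sketch; the function name `as_equivalents` and the dictionary layout are illustrative assumptions, not part of the repository.

```python
# Reference points copied from the table above, in grams CO2-eq per unit.
REFERENCE_G = {
    "smartphone charges": 9.7,            # per charge (iPhone 16 Pro Max)
    "km driven (EU average car)": 170.0,  # per km
    "Claude 3.7 Sonnet calls": 2.12,      # per text-generation call
}


def as_equivalents(co2_g: float) -> dict[str, float]:
    """Express a CO2 total (grams) as multiples of each reference activity."""
    return {name: co2_g / per_unit for name, per_unit in REFERENCE_G.items()}


# Example: RSGPT's full retrosynthesis run (2,512 g, from the Results tables).
for name, count in as_equivalents(2512).items():
    print(f"{count:8.1f} {name}")
```

The comparison is only as good as the reference values, which are themselves approximate.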

Results

All tasks were benchmarked with full carbon tracking on the hardware below, except Protein Folding, which ran on the GPUs named in its per-task note.

Hardware: NVIDIA RTX 5000 Ada Generation (32 GB) · Intel Xeon Platinum 8558 (192 cores) · 503 GB RAM

Column definitions:

CO₂/exp: total CO₂ emitted over the full benchmark run (the actual experiment)
CO₂/job: CO₂ normalized to a fixed workload (see the per-task note)
Time/exp: total wall-clock time for the full benchmark run
Time/job: wall-clock time normalized to the same fixed workload as CO₂/job
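
The per-job normalization can be sketched as a simple rescaling of the full-run total to the fixed workload. This is an assumption inferred from the table values, not the repository's actual code, and the function name `per_job` is hypothetical.

```python
def per_job(total: float, n_items: int, job_size: int) -> float:
    """Rescale a full-run total (CO2 in g, or time in s) to a fixed workload."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return total * job_size / n_items


# Retrosynthesis, neuralsym: 35.0 g over 5,007 reactions, job = 500 reactions.
print(round(per_job(35.0, 5007, 500), 2))   # 3.5 (table: 3.50)

# Structure Optimization, CHGNet: 1.50 g over 100 structures, job = 1,000.
print(round(per_job(1.50, 100, 1000), 2))   # 15.0 (table: 15.0)
```

Tasks whose per-task note defines the columns differently (for example Protein Folding) are not covered by this sketch.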

1. Retrosynthesis

Dataset: USPTO-50K · N = 5,007 reactions · Metric: Top-50 exact match · CO₂/job: per 500 reactions

Year	Venue	Model	Architecture	Params	Top-10	Top-50	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2017	Chem. Eur. J.	neuralsym	MLP	32.5M	72.8%	74.8%	35.0	3.50	1,282	128
2021	JCIM	MEGAN	GNN	9.8M	87.0%	90.1%	51.7	5.15	2,951	295
2021	JACS Au	LocalRetro	GNN	8.6M	91.5%	97.3%	62.1	6.20	2,313	231
2022	Chem. Sci.	RSMILES	LM	44.6M	89.6%	93.0%	1,083	108.25	44,142	4,401
2022	ML:ST	Chemformer	LM	44.7M	62.8%	64.0%	2,570	256.65	85,055	8,482
2024	COLM	LlaSMol	LLM	~7.2B	5.0%	5.0%	1,385	138.35	39,119	3,905
2024	ICLR	RetroBridge	Flow Matching	4.6M	44.9%	52.8%	4,040	403.45	157,820	15,740
2025	Nat. Commun.	RSGPT	LLM	~1.6B	96.6%	97.8%	2,512	250.80	79,090	7,887

2. Forward Reaction Prediction

Dataset: USPTO-MIT · N = 40,029 reactions · Metric: Top-3 exact match · CO₂/job: per 500 reactions

Year	Venue	Model	Architecture	Params	Top-1	Top-3	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2017	Chem. Eur. J.	neuralsym	MLP	98.1M	49.5%	50.6%	43.9	0.55	2,732	34
2019	ACS Cent. Sci.	MolecularTransformer	LM	11.7M	86.8%	91.7%	360.0	4.50	12,317	154
2021	JCIM	MEGAN	GNN	9.9M	80.1%	86.4%	85.3	1.07	6,657	83
2021	JCIM	Graph2SMILES	LM	18M	88.5%	89.9%	287.8	3.60	7,940	99
2022	Nat. Mach. Intell.	LocalTransform	GNN	9.1M	90.4%	94.1%	141.4	1.77	8,799	110
2022	ML:ST	Chemformer	LM	44.7M	89.0%	89.8%	580.0	7.25	45,288	566
2022	Chem. Sci.	RSMILES	LM	44.6M	89.4%	94.7%	614.7	7.68	46,209	578
2024	COLM	LlaSMol	LLM	~7.2B	3.8%	5.9%	1,413.8	17.67	104,960	1,312

3. Molecule Generation

Dataset: ChEMBL 28 · N = 10,000 molecules · Metric: VUN% · CO₂/job: per 10K molecules (= full exp)

Year	Venue	Model	Architecture	Params	VUN (%)	VUNS (%)	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2017	J. Cheminf.	REINVENT	LM	4.2M	87.90	80.88	0.18	0.18	14	14
2018	ICML	JT-VAE	VAE	5.3M	91.39	89.41	10.58	10.58	662	662
2020	ICML	HierVAE	VAE	8.0M	92.10	88.89	11.97	11.97	757	757
2021	J. Chem. Inf. Model.	MolGPT	LM	9.5M	77.15	76.65	1.07	1.07	37	37
2023	ICML	DiGress	Diffusion	16.2M	82.45	81.18	175.35	175.35	5,201	5,201
2024	J. Cheminf.	REINVENT4	LM	5.8M	94.16	85.44	0.07	0.07	8	8
2025	ICML	DeFoG	Flow Matching	16.3M	82.27	81.73	355.24	355.24	9,874	9,874
2026	Nat. Comput. Sci.	SmileyLlama	LLM	8.0B	94.30	85.16	21.79	21.79	638	638

4. Material Generation

Dataset: MP-20 · N = 1,000 structures · Metric: mSUN% · CO₂/job: per 1K structures (= full exp)

Year	Venue	Model	Architecture	Params	mSUN (%)	SUN (%)	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2022	ICLR	CDVAE	Diffusion	4.9M	22.6	3.2	270.4	270.40	25,764	25,764
2023	NeurIPS	DiffCSP	Diffusion	12.4M	29.0	4.3	12.7	12.60	381	381
2024	Nat. Commun.	CrystaLLM	LM	25.9M	16.4	3.5	19.3	19.20	942	942
2024	ICML	FlowMM	Flow Matching	28.3M	23.9	4.3	12.8	12.80	547	547
2025	arXiv	ChargeDIFF	Diffusion	59.5M	33.5	4.4	133.5	133.50	2,994	2,994
2025	Nature	MatterGen	Diffusion	44.6M	33.4	5.2	248.1	248.10	8,079	8,079
2025	ICML	ADiT	Diffusion	231.9M	29.6	5.5	112.5	112.50	10,512	10,512
2025	ICML	CrystalFlow	Flow Matching	20.9M	21.7	3.0	1.5	1.50	43	43

5. Structure Optimization

System: WBM · N = 100 structures · Metric: CPS · CO₂/job: per 1,000 structures

Year	Venue	Model	Architecture	Params	CPS	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2023	Nat. Mach. Intell.	CHGNet	GNN	413K	0.343	1.50	15.0	88	884
2023	arXiv	MACE	GNN	4.69M	0.637	3.57	35.7	208	2,083
2024	J. Chem. Theory Comput.	SevenNet	GNN	1.17M	0.714	2.87	28.7	160	1,600
2024	arXiv	ORB	GNN	25.2M	0.470	0.58	5.8	37	374
2025	arXiv	NequIP	GNN	9.6M	0.733	1.07	10.7	52	519
2025	arXiv	DPA3	GNN	4.81M	0.718	6.71	67.1	380	3,798
2025	arXiv	Nequix	GNN	708K	0.729	2.21	22.1	126	1,258
2025	arXiv	eSEN	GNN	30.1M	0.797	13.86	138.6	763	7,629

6. MD Simulation

System: LGPS · N = 75K steps × 3 seeds · Metric: MSD score · CO₂/job: per 1M steps

Year	Venue	Model	Architecture	Params	MSD	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2023	Nat. Mach. Intell.	CHGNet	GNN	413K	0.047	9.47	379	602	8,033
2023	arXiv	MACE	GNN	4.69M	0.095	23.29	932	1,068	14,241
2024	J. Chem. Theory Comput.	SevenNet	GNN	1.17M	0.531	16.21	648	790	10,529
2024	arXiv	ORB	GNN	25.2M	0.385	3.87	155	210	2,795
2025	arXiv	NequIP	GNN	9.6M	0.361	11.34	454	316	4,219
2025	arXiv	DPA3	GNN	4.81M	0.508	38.45	1,538	2,087	27,829
2025	arXiv	Nequix	GNN	708K	0.203	17.13	685	736	9,809
2025	arXiv	eSEN	GNN	30.1M	0.720	87.14	3,486	2,780	37,071

7. Protein Folding

System: CASP15/CASP16 unique <1000-residue monomers · N = 45 targets · Metric: GDT-TS (%) · CO₂/job: per 45 targets

Year	Venue	Model	Architecture	Params	GDT-TS (%)	lDDT-Cα	CO₂/exp (g)	CO₂/job (g)	Time/exp (s)	Time/job (s)
2021	Nature	AF2	Evoformer + MSA	93.2M	59.15	0.868	46.73	2,103.0	1,729.6	77,832
2022	Nat. Methods	ColabFold	Evoformer + MMseqs2	93.2M	60.96	0.876	11.60	522.2	669.5	30,126
2022	bioRxiv	OmegaFold	PLM + Geoformer	795M	47.18	0.770	4.00	180.1	123.0	5,535
2023	Science	ESMFold	ESM-2 LM + folding	693M	52.36	0.811	5.27	237.0	363.2	16,345
2024	bioRxiv	Chai-1	Diffusion (AF3-like)	316M	48.79	0.798	19.83	892.4	1,416.4	63,738
2024	Nat. Methods	OpenFold	Evoformer + MSA	93.2M	60.84	0.875	10.61	477.6	596.8	26,854
2025	bioRxiv	Boltz-2	Diffusion (AF3-like)	521M	51.82	0.765	17.70	796.5	1,180.8	53,137
2025	bioRxiv	Protenix	Diffusion (AF3-like)	368M	57.50	0.871	9.83	442.3	612.3	27,555

GDT-TS (%) is the primary ranking metric and lDDT-Cα the secondary one. For this task, per-exp values are per single target and per-job values are totals over all 45 targets. Carbon is computed with CodeCarbon's offline world-average accounting (475 gCO₂/kWh) on 3 × NVIDIA RTX A5000 (24 GB). Boltz-2's training cutoff (2023-06-01) postdates the 33 CASP15 targets, so its scores on those targets may be inflated by training-data leakage and should be read with that caveat.
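
The accounting described in the note reduces to two multiplications: measured energy times a fixed grid intensity, then the per-target figure times the number of targets. A minimal sketch follows; the helper names are hypothetical, and the energy value is back-calculated from the table for illustration, not a measured number.

```python
WORLD_AVG_G_PER_KWH = 475.0  # grid intensity stated in the note above
N_TARGETS = 45               # CASP15/CASP16 targets in this benchmark


def energy_to_co2_g(energy_kwh: float,
                    intensity: float = WORLD_AVG_G_PER_KWH) -> float:
    """Convert energy (kWh) into grams of CO2-eq at a fixed grid intensity."""
    return energy_kwh * intensity


def per_job_total(per_target_value: float, n_targets: int = N_TARGETS) -> float:
    """Scale a per-target figure (CO2/exp or Time/exp) to the full job."""
    return per_target_value * n_targets


# AF2 row: 46.73 g per target, which scales to roughly 2,103 g per job.
print(round(per_job_total(46.73), 1))

# Energy implied by 46.73 g at 475 g/kWh (illustrative back-calculation).
implied_kwh = 46.73 / WORLD_AVG_G_PER_KWH
print(f"{implied_kwh:.4f} kWh per target")
```

Because the intensity is a fixed world average, the figures are comparable across models but do not reflect the grid mix at any particular site.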

Key Insights

Simpler models dominate the efficiency frontier: In most tasks, at least one GNN or small LM achieves near-SOTA accuracy at roughly 10–100× lower CO₂ than the best-performing model (for example, LocalRetro reaches 97.3% Top-50 on Retrosynthesis at 62.1 g vs. 97.8% at 2,512 g for RSGPT)
Architecture drives cost more than parameter count: Diffusion models often cost 10–100× more per sample than LM or GNN models because of iterative sampling, largely regardless of size
LLMs underperform on narrow scientific tasks: LlaSMol (~7.2B) scores 3.8% Top-1 on Forward Reaction Prediction vs. 89.4% for RSMILES (44.6M), at about 2.3× the carbon cost (1,413.8 g vs. 614.7 g)
MLIP tasks are carbon-intensive by nature: A single 75K-step MD run with a machine-learned interatomic potential (MLIP) costs 4–87 g CO₂ and relaxing 100 WBM structures costs 0.6–14 g, both orders of magnitude more than per-molecule chemistry tasks
50–75% of models per task are Pareto-dominated: For each of these models, a cheaper and more accurate alternative exists in the same table, so the extra carbon bought no extra accuracy
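
Pareto dominance, as used in the last insight, can be checked directly from any results table: a model is dominated if some other model is at least as accurate and emits no more CO₂, with a strict improvement in at least one of the two. The sketch below applies that definition to the Retrosynthesis table (Top-50 accuracy and CO₂/exp); the function name and record layout are illustrative assumptions, not the repository's analysis code.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    accuracy: float  # higher is better (Top-50, %)
    co2_g: float     # lower is better (CO2/exp, g)


# Rows copied from the Retrosynthesis table.
RETRO = [
    Result("neuralsym", 74.8, 35.0),
    Result("MEGAN", 90.1, 51.7),
    Result("LocalRetro", 97.3, 62.1),
    Result("RSMILES", 93.0, 1083.0),
    Result("Chemformer", 64.0, 2570.0),
    Result("LlaSMol", 5.0, 1385.0),
    Result("RetroBridge", 52.8, 4040.0),
    Result("RSGPT", 97.8, 2512.0),
]


def dominated(results: list[Result]) -> set[str]:
    """Models for which another model is no worse on both axes and better on one."""
    out: set[str] = set()
    for r in results:
        for other in results:
            if other is r:
                continue
            no_worse = other.accuracy >= r.accuracy and other.co2_g <= r.co2_g
            better = other.accuracy > r.accuracy or other.co2_g < r.co2_g
            if no_worse and better:
                out.add(r.model)
                break
    return out


losers = dominated(RETRO)
print(sorted(losers))
print(f"{len(losers) / len(RETRO):.0%} dominated")
```

On these rows the dominated set is Chemformer, LlaSMol, RSMILES, and RetroBridge, four of eight models (50%), each beaten on both axes by LocalRetro. The quadratic scan is adequate for tables of this size.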

Contributing

See CONTRIBUTING.md for the full guide on adding new models and tasks.

Quick start:

git clone https://github.com/shuan4638/Carbon4Science.git
cd Carbon4Science
git checkout main && git pull
git checkout -b <your-name>/<task>-<model>
cp -r Example/ <YourTask>/   # copy the template
claude                        # Claude Code guides you through the rest

Your PR to main only needs: results/<Task>/<model>_<N>.json + a new row in this README.

Citation

@article{carbon2026,
  title={The Carbon Cost of AI for Science},
  author={...},
  journal={...},
  year={2026}
}

License

MIT — see LICENSE for details.
