This repository contains the official implementation of zERExtractor, an automated and extensible platform for multimodal extraction of enzyme-catalyzed reaction data from scientific literature.
The system integrates tables, molecular diagrams, enzyme sequences, and experimental conditions into structured, machine-readable datasets for downstream AI-driven modeling.
- Unified framework combining deep learning and large language models
- Supports extraction from tables, figures, and text
- Benchmarked on 1,000+ annotated tables and 5,000 biological fields
- Achieves 89.9% accuracy on table recognition and 98%+ accuracy on molecular recognition
You can explore zERExtractor directly through our online platform:
zERExtractor Platform, built on the zCloud platform by Shanghai Zelixir Biotech Co., Ltd.
The source code will be released once our paper has been accepted and published.
- Preprint on arXiv
- Project Website
- Dataset & Results on GitHub Releases
Contact: Ryan (CAS), ryan5zh5@gmail.com, or contact@zelixir.com
If you find this work useful, please cite:
@article{zhou2025zerextractor,
title={zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature},
author={Zhou, Rui and Ma, Haohui and Xin, Tianle and Zou, Lixin and Hu, Qiuyue and Cheng, Hongxi and Lin, Mingzhi and Guo, Jingjing and Wang, Sheng and Zhang, Guoqing and others},
journal={arXiv preprint arXiv:2508.09995},
year={2025}
}

