zERExtractor: Automated Multimodal Extraction of Enzyme-Catalyzed Reaction Data

zERExtractor Overview

📌 Introduction

This repository contains the official implementation of zERExtractor, an automated and extensible platform for multimodal extraction of enzyme-catalyzed reaction data from scientific literature.
The system integrates tables, molecular diagrams, enzyme sequences, and experimental conditions into structured, machine-readable datasets for downstream AI-driven modeling.
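To make "structured, machine-readable" concrete, here is a minimal sketch of what one extracted record might look like once serialized to JSON. The class name `ReactionRecord`, every field name, and all values are hypothetical placeholders chosen for illustration; the source code has not been released, so this is not the actual zERExtractor schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ReactionRecord:
    """Hypothetical record layout; NOT the actual zERExtractor schema."""
    source_document: str                                # identifier of the paper the data came from
    enzyme_sequence: str                                # amino-acid sequence of the enzyme
    substrate_smiles: str                               # molecule recognized from a diagram, as SMILES
    conditions: dict = field(default_factory=dict)      # experimental conditions
    table_values: dict = field(default_factory=dict)    # values read from a table


# All values below are placeholders, not extracted data.
record = ReactionRecord(
    source_document="example-paper",
    enzyme_sequence="PLACEHOLDER",
    substrate_smiles="CCO",
    conditions={"temperature_c": 25.0, "ph": 7.0},
    table_values={"row_label": "placeholder"},
)

# Serialize to JSON so downstream tools can consume the record.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

A flat, JSON-serializable record like this is one simple way to combine the four modalities named above (tables, molecular diagrams, sequences, conditions) in a single row for modeling.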

🚀 Features

  • ✅ Unified framework combining deep learning and large language models
  • ✅ Supports tables, figures, and text extraction
  • ✅ Benchmarked on 1,000+ annotated tables and 5,000 biological fields
  • ✅ Achieves 89.9% accuracy on table recognition and 98%+ accuracy on molecular recognition

📊 Results

| Method      | Acc (%) | Gain |
|-------------|---------|------|
| TableMaster | 77.90*  | -    |
| LGPMA       | 65.74*  | -    |
| SLANet      | 86.0    | -    |
| Ours        | 89.9    | 3.9% |
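The gain reported for "Ours" in the results above appears to be the absolute difference from the strongest baseline (SLANet), in percentage points. That reading is an assumption based on the arithmetic 89.9 - 86.0 = 3.9; the README does not state it explicitly. A minimal Python sketch that reproduces the figure from the listed accuracies:

```python
# Accuracies (%) copied from the results listed above.
accuracy = {
    "TableMaster": 77.90,
    "LGPMA": 65.74,
    "SLANet": 86.0,
    "Ours": 89.9,
}

# Strongest baseline = highest accuracy among all methods except "Ours".
baselines = {name: acc for name, acc in accuracy.items() if name != "Ours"}
best_name = max(baselines, key=baselines.get)

# Absolute gain in percentage points, rounded to avoid floating-point noise.
gain = round(accuracy["Ours"] - baselines[best_name], 1)
print(f"Gain over {best_name}: {gain} percentage points")
```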

⚡ Quick Start

You can explore zERExtractor directly through our online platform:
🔗 zERExtractor Platform, built on the zCloud platform by Shanghai Zelixir Biotech Co., Ltd.

πŸ› οΈ The source code will be released upon the acceptance and publication of our paper.

🌐 Links

📬 Contact

Ryan (CAS) 📧 ryan5zh5@gmail.com 📧 contact@zelixir.com

📖 Citation

If you find this work useful, please cite:

@article{zhou2025zerextractor,
  title={zERExtractor: An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature},
  author={Zhou, Rui and Ma, Haohui and Xin, Tianle and Zou, Lixin and Hu, Qiuyue and Cheng, Hongxi and Lin, Mingzhi and Guo, Jingjing and Wang, Sheng and Zhang, Guoqing and others},
  journal={arXiv preprint arXiv:2508.09995},
  year={2025}
}