# BatteryChemistryClassification

BatteryChemistryClassification is an AI-powered project that classifies battery research papers by chemistry using Large Language Models (LLMs). By automating this step, researchers can efficiently extract insights from the vast literature on battery technology.
## Table of Contents

- Introduction
- Features
- Models Used
- Installation
- Usage
- Dataset
- Results
- Future Work
- Contributing
- License
- Acknowledgments
## Introduction

BatteryChemistryClassification employs state-of-the-art transformer models to categorize research papers by battery chemistry. It enhances literature analysis by providing an automated, scalable approach to text classification.
## Features

- LLM-based Text Classification: Categorizes battery research papers into different chemistries.
- Pretrained Transformer Models: Fine-tuned BERT, DeBERTa, RoBERTa, LongFormer models.
- Scalable & Customizable: Can be further fine-tuned for related downstream tasks.
- Automated Literature Processing: Assists in quickly identifying relevant studies.
## Models Used

The following transformer models have been trained and evaluated for text classification:
- BERT (Bidirectional Encoder Representations from Transformers)
- DeBERTa (Decoding-enhanced BERT with Disentangled Attention)
- RoBERTa (Robustly Optimized BERT Pretraining Approach)
- LongFormer (Efficient Transformer for Long Documents)
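As a sketch of how one of these encoders might be wired up for classification — the checkpoint names below are common public Hugging Face checkpoints and are assumptions, not necessarily the ones this repository fine-tunes:

```python
# Hypothetical mapping from the model names above to public Hugging Face
# checkpoints; the repository's actual checkpoint choices may differ.
MODEL_CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "deberta": "microsoft/deberta-v3-base",
    "roberta": "roberta-base",
    "longformer": "allenai/longformer-base-4096",
}

def load_classifier(name: str, num_labels: int):
    """Load a pretrained encoder with a fresh sequence-classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    checkpoint = MODEL_CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model
```

LongFormer is the natural choice when classifying full paper bodies rather than abstracts, since its attention pattern handles sequences well beyond the 512-token limit of the other three.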
## Installation

- Create a virtual environment (optional but recommended):

```shell
python3 -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`
```

- Clone the repository and install:

```shell
git clone https://github.com/vigsam-coder/BatteryChemistryClassification.git
cd BatteryChemistryClassification
pip install .
```
## Usage

- Download and scrape the scientific data:

```shell
python BatteryChemistryClassification/battery_data.py Dataset/battery.csv Dataset/df_processed.csv --debug
```
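The processed CSV is then turned into text/label pairs for fine-tuning. A minimal sketch, assuming hypothetical column names (`title`, `abstract`, `chemistry`) — the actual schema is defined by `battery_data.py`:

```python
import csv
import io

# Illustrative only: these column names and rows are assumptions,
# standing in for the real Dataset/df_processed.csv.
raw = io.StringIO(
    "title,abstract,chemistry\n"
    "Paper A,Study of LiFePO4 cathodes,lithium-ion\n"
    "Paper B,Sodium layered oxides for Na-ion cells,sodium-ion\n"
)
rows = list(csv.DictReader(raw))

# Concatenate title and abstract into one input text per paper.
texts = [r["title"] + ". " + r["abstract"] for r in rows]
labels = [r["chemistry"] for r in rows]
```

In practice these pairs would be tokenized and split into train/validation sets before being passed to the training script.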
- Change the model configuration in `BatteryChemistryClassification/config/config.yaml`
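The exact keys in `config.yaml` are defined by the repository; the snippet below is only an illustrative sketch of what a fine-tuning configuration of this kind might contain (all field names and values are assumptions):

```yaml
# Hypothetical configuration sketch -- field names are assumptions,
# not the repository's actual schema.
model:
  name: roberta          # e.g. bert, deberta, roberta, longformer
  num_labels: 4
training:
  epochs: 3
  batch_size: 16
  learning_rate: 2.0e-5
  max_seq_length: 512
```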
- Train the model:

```shell
python BatteryChemistryClassification/training.py
```
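After training, the model is typically scored on a held-out split. A minimal, dependency-free sketch of an accuracy computation with mock predictions — the chemistry labels here are illustrative, not the repository's reported results:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Mock gold labels and predictions for illustration only.
y_true = ["lithium-ion", "sodium-ion", "lithium-ion", "solid-state"]
y_pred = ["lithium-ion", "lithium-ion", "lithium-ion", "solid-state"]

print(accuracy(y_true, y_pred))  # 0.75
```

For imbalanced chemistry classes, a per-class metric such as macro-F1 is usually reported alongside accuracy.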