Blue-radish/SimpleNER

SimpleNER / EvaCun2025

This repository contains the code implementation for the paper "Simple Named Entity Recognition (NER) System with RoBERTa for Ancient Chinese" (EvaCun2025).
Paper: https://aclanthology.org/2025.alp-1.27.pdf


📌 Project Overview

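SimpleNER targets named entity recognition on Ancient Chinese texts such as the Shiji (Records of the Grand Historian), the Twenty-Four Histories, and traditional Chinese medicine classics. It uses a GujiRoBERTa_jian_fan + LSTM + CRF architecture. During training, the pretrained parameters are frozen in the early phase and all parameters are fine-tuned in the later phase, which mitigates overfitting on small datasets. The code and experimental results are intended for reproducing the EvaHan/EvaCun2025 evaluation reported in the paper.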
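The "freeze first, fine-tune later" strategy described above can be sketched as follows. This is a minimal illustration, not the code in src/EvaNer.py: the names `set_trainable`, `apply_freeze_schedule`, and `FREEZE_EPOCHS` are hypothetical, and the number of frozen epochs is an assumed value rather than one taken from the paper. The stand-in classes let the sketch run without PyTorch; the same functions should work on a real `torch.nn.Module`, whose `parameters()` yields tensors with a `requires_grad` attribute.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter; only the requires_grad flag matters here."""
    name: str
    requires_grad: bool = True


class FakeModule:
    """Stand-in for torch.nn.Module exposing a parameters() iterator."""

    def __init__(self, names: List[str]):
        self._params = [FakeParam(n) for n in names]

    def parameters(self) -> Iterable[FakeParam]:
        return iter(self._params)


def set_trainable(module, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


def apply_freeze_schedule(encoder, epoch: int, freeze_epochs: int) -> bool:
    """Keep the encoder frozen for the first `freeze_epochs` epochs, then unfreeze.

    Returns True when the encoder is trainable in this epoch.
    """
    trainable = epoch >= freeze_epochs
    set_trainable(encoder, trainable)
    return trainable


# Hypothetical encoder with two parameters; in practice this would be the
# pretrained RoBERTa encoder, while the LSTM and CRF layers stay trainable.
encoder = FakeModule(["embeddings.weight", "layer.0.attention.weight"])
FREEZE_EPOCHS = 2  # assumed value, not taken from the paper
schedule = [apply_freeze_schedule(encoder, e, FREEZE_EPOCHS) for e in range(4)]
print(schedule)  # [False, False, True, True]
```

In a real training loop, `apply_freeze_schedule` would be called at the start of each epoch, before the forward and backward passes.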


📁 Repository Structure

SimpleNER
│
│  README.md
│  requirements.txt
│
├─data
│
├─figure
│      Poster.png
│
├─model
│      README.md
│
├─notebook
│      data.ipynb
│      EvaNer.ipynb
│      EvaNer_crf.ipynb
│      EvaNer_crf_attention.ipynb
│      EvaNer_crf_lstm.ipynb
│      predicted.ipynb
│
└─src
        EvaNer.py

🚀 Quick Start

Install Dependencies

git clone https://github.com/Blue-radish/SimpleNER.git
cd SimpleNER
pip install -r requirements.txt

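Python 3.10 and a GPU environment (CUDA) are recommended to speed up training.

Data Preparation

Run the data extraction and format conversion cells in notebook/data.ipynb to generate the data needed for training, validation, and testing.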
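As a rough illustration of what a format conversion step might look like, the sketch below groups character-level annotations into sentences. It assumes a layout of one character and one tag per line, separated by a tab, with a blank line between sentences. That layout, the tag names in the sample, and the function name `parse_tagged_lines` are assumptions made for illustration; the actual format produced by notebook/data.ipynb may differ.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def parse_tagged_lines(text: str) -> List[Sentence]:
    """Group 'char<TAB>tag' lines into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        char, tag = line.split("\t")
        current.append((char, tag))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the tag names are illustrative only.
sample = "司\tB-NR\n馬\tM-NR\n遷\tE-NR\n曰\tO\n\n漢\tS-NS\n"
sentences = parse_tagged_lines(sample)
print(len(sentences))  # 2
print(sentences[0])
```

Each sentence is a list of (character, tag) pairs, which can then be split into training, validation, and test sets.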

Train the Model

  • train.ipynb (multi-modal, can handle multiple images)
  • train_all.ipynb (multi-modal, if there are multiple images, only one will be used)
  • train_text.ipynb (uni-modal, using only text information)

Poster (EvaCun2025)

See figure/Poster.png.

Citation

If you use this code or data in your research, please cite our paper:

@inproceedings{zhang-etal-2025-simple,
    title = "Simple Named Entity Recognition ({NER}) System with {R}o{BERT}a for {A}ncient {C}hinese",
    author = "Zhang, Yunmeng  and
      Liu, Meiling  and
      Tang, Hanqi  and
      Lu, Shige  and
      Xue, Lang",
    editor = "Anderson, Adam  and
      Gordin, Shai  and
      Li, Bin  and
      Liu, Yudong  and
      Passarotti, Marco C.  and
      Sprugnoli, Rachele",
    booktitle = "Proceedings of the Second Workshop on Ancient Language Processing",
    month = may,
    year = "2025",
    address = "The Albuquerque Convention Center, Laguna",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.alp-1.27/",
    doi = "10.18653/v1/2025.alp-1.27",
    pages = "206--212",
    ISBN = "979-8-89176-235-0",
    abstract = "Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP), particularly in the analysis of Chinese historical texts. In this work, we propose an innovative NER model based on GujiRoBERTa, incorporating Conditional Random Fields (CRF) and Long Short Term Memory Network (LSTM) to enhance sequence labeling performance. Our model is evaluated on three datasets from the EvaHan2025 competition, demonstrating superior performance over the baseline model, SikuRoBERTa-BiLSTM-CRF. The proposed approach effectively captures contextual dependencies and improves entity boundary recognition. Experimental results show that our method achieves consistent improvements across almost all evaluation metrics, highlighting its robustness and effectiveness in handling ancient Chinese texts."
}
