A novel family of medical large language models (13B and 70B parameters) that achieves state-of-the-art performance on various medical tasks
Updated Jan 15, 2025 - Python
Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
Bioformer: an efficient BERT model for biomedical text mining
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
This repository contains the code used for distillation and fine-tuning of compact biomedical transformers that have been introduced in the paper "On The Effectiveness of Compact Biomedical Transformers"
Systematic evaluation of hallucination risks in Large Language Models (GPT-4, Claude 3, Gemini Pro) for clinical proteomics and mass spectrometry interpretation. Production-ready detection framework with comprehensive benchmarks.
Graph-based RAG system for biomedical nutrigenetic knowledge discovery. Enables natural language queries on gene-nutrient interactions, supports personalized nutrition counseling, and runs 100% locally with Ollama LLMs and SBERT embeddings.
BERT-for-BioNLP-OST2019-AGAC-Task2
RAG pipeline for medical question answering. Fuses lexical and dense retrieval (MedCPT, Contriever, Specter + FAISS) with OpenAI, Gemini, and HuggingFace LLMs. Supports iterative multi-round reasoning, strict typing, structured observability, and a clean layered architecture.
AGAC-BioNLP-OST2019-Task1 BERT+CRF
Implements relation extraction for biomedical texts using Hard Negative Mining to improve accuracy in identifying complex entity relationships. Includes code for data processing, training, and evaluation with BioC-format datasets.
MedQA-NLI is a comprehensive medical reasoning dataset comprising 42,889 instances designed for training and evaluating models on natural language inference (NLI) tasks in biomedical domains.
BioGemma — Google Gemma 3 1B fine-tuned on medical/biomedical corpus for clinical NLP tasks
Cancer-Alterome is a comprehensive and curated dataset that focuses on the investigation of regulatory events caused by gene alteration in the context of cancer.
SOEA-Plus (PDEMC): a 3-task biomedical metacognition benchmark that evaluates the metacognitive control of 2 frontier LLMs on 300 real PubMed examples. Reveals the Control Collapse Gap.
Core LLM for M.A.R.S. (Model Assisted Review System). Uses a fine-tuned Llama 3.2 3B model to automate biomedical systematic literature review (SLR) screening with 92.2% accuracy.
Clinical trial document intelligence pipelines using a medallion architecture. Classification (87 categories) + NER (8 entity types) on Databricks.
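The medical RAG entry above fuses lexical and dense retrieval but does not say how the two ranked result lists are combined. One common option is Reciprocal Rank Fusion (RRF). The sketch below assumes RRF with the conventional constant k = 60; the function name and the PubMed-style document IDs are hypothetical, and this is not that repository's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k = 60 is an assumed, commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical outputs of a lexical (e.g. BM25) retriever and a dense retriever.
    lexical = ["pmid_3", "pmid_1", "pmid_7"]
    dense = ["pmid_1", "pmid_9", "pmid_3"]
    for doc_id, score in reciprocal_rank_fusion([lexical, dense]):
        print(f"{doc_id}\t{score:.4f}")
```

RRF uses only ranks, not raw scores, so it needs no calibration between BM25 scores and embedding similarities. Documents found by both retrievers (here `pmid_1` and `pmid_3`) rise to the top.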