mbparacha88/Bang_Mohammad_w266_Project_Code_Repo

DATASCI 266: Natural Language Processing Final Paper

Welcome to the DATASCI w266 project repository for John Bang and Mohammad Paracha!

In this paper, we classify medical exam questions from the MedMCQA dataset, which contains over 193,000 expert-authored questions from AIIMS and NEET PG exams across 21 medical subjects. We evaluate five models: Naive Bayes, logistic regression, BERT, BioBERT, and Gemini 1.5 Flash. Unlike previous work that focuses on answer prediction or open-domain medical reasoning, our study reframes the problem as subject-level classification, an underexplored yet critical task for downstream applications. Our results show that BioBERT achieves high precision scores, highlighting the effectiveness of domain-specific pretraining for understanding medical language.
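To make the subject-level classification task concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written with only the Python standard library. It is not the code used in the paper: the toy questions, the two subject labels, and the helper names `tokenize`, `train_nb`, and `predict_nb` are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: these are NOT MedMCQA questions, only stand-ins
# shaped like (question text, subject label) pairs.
TRAIN = [
    ("Which nerve supplies the deltoid muscle?", "Anatomy"),
    ("The brachial plexus is formed by which nerve roots?", "Anatomy"),
    ("Which drug is a beta blocker used in hypertension?", "Pharmacology"),
    ("What is the mechanism of action of the drug metformin?", "Pharmacology"),
]


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [t.strip("?.,").lower() for t in text.split() if t.strip("?.,")]


def train_nb(examples):
    """Count documents per subject and word occurrences per subject."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the subject with the highest log posterior for the question."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
prediction = predict_nb(model, "Which drug lowers blood pressure?")
```

The same `(question, subject)` framing carries over to the stronger models listed above; only the way text is turned into features and scored changes.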

