This repository contains the code, data, and analysis for our comparative benchmark study evaluating Mixture-of-Experts (MoE) architectures against traditional Dense Transformer models. The study provides a systematic evaluation of model capabilities across factual knowledge, reasoning, truthfulness, and bias dimensions.
- Comprehensive evaluation of 6 state-of-the-art LLMs across 4 diverse datasets
- Rigorous comparison between sparse (MoE) and dense architectural paradigms
- 70+ hours of compute on A100 GPUs
- Novel insights into architectural trade-offs for real-world deployment
| Model | Type | Parameters | Active Parameters |
|---|---|---|---|
| Mixtral 8x7B 4-bit Quantized | MoE | 46.7B | 12.9B |
| RWKV-4-Raven-14B | Recurrent | 14B | 14B |
| DeepSeek v2 Base 7B | MoE | 15.7B | 2.4B |
| LLaMA 2 13B | Dense | 13B | 13B |
| Gemma 7B | Dense | 7B | 7B |
| Phi-3 Mini 4K Instruct | Dense | 3.8B | 3.8B |
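The key contrast in the table above is the gap between total and active parameters: a dense model activates every parameter for every token, while an MoE model routes each token through only a subset of experts. A minimal sketch of that sparsity ratio, using the figures from the table (the dictionary layout is illustrative, not part of the study's code):

```python
# Active-parameter fraction per model: active / total, in billions,
# taken from the model table above.
models = {
    "Mixtral 8x7B": (12.9, 46.7),   # MoE: only routed experts are active per token
    "DeepSeek v2":  (2.4, 15.7),    # MoE
    "LLaMA 2 13B":  (13.0, 13.0),   # Dense: every parameter is active
}

for name, (active, total) in models.items():
    print(f"{name}: {active / total:.0%} of parameters active per token")
```

This is why a 46.7B-parameter Mixtral can have inference cost closer to a ~13B dense model: per token it computes with roughly a quarter of its weights.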
- Counterfact Dataset (200 samples)
  - Tests factual recall and knowledge integrity
- Bias Benchmark for Question Answering (BBQ)
  - Age (200 samples)
  - Disability (200 samples)
  - Race (200 samples)
  - Gender (200 samples)
- TruthfulQA Dataset (200 samples)
  - Evaluates tendency to generate misinformation
- BigBench Logical Reasoning - 5-Object Logical Deduction (200 samples)
  - Tests pure reasoning capacity independent of knowledge
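Each dataset above is scored as accuracy over its 200 samples. A minimal scorer sketch (the function name and the toy prediction/gold lists are illustrative stand-ins, not the study's actual evaluation harness):

```python
# Accuracy over a fixed-size sample set: fraction of predictions
# that exactly match the gold answers.
def accuracy(predictions, gold):
    assert len(predictions) == len(gold), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy usage with 4 multiple-choice answers (3 of 4 correct):
print(accuracy(["A", "B", "C", "A"], ["A", "B", "D", "A"]))  # 0.75
```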
- Mixtral 8x7B outperformed all other models across all four datasets, demonstrating the effectiveness of the MoE architecture
- DeepSeek performed worst on bias metrics across all four demographic categories
- LLaMA 2 13B showed concerning weaknesses on the TruthfulQA benchmark
- RWKV-4-Raven model struggled most with factual recall and logical reasoning
- With the partial exception of Mixtral, no model achieved above 60% accuracy on the bias benchmarks
This benchmark provides critical insights into how architectural choices impact model performance across different dimensions. As LLMs become increasingly deployed in production environments, understanding these trade-offs becomes essential for responsible AI development.