
Sama-ndari/llm-semantic-drift-analysis

📉 LLM Semantic Drift Analysis ("The Telephone Game")

Python Ollama OpenAI

A scientific experiment to measure how meaning degrades ("drifts") when information is passed sequentially through a chain of different Large Language Models.

🧪 The Experiment

Just like the children's game of "Telephone," this project feeds the output of one AI model (e.g., GPT-4o) as the only input to the next model (e.g., Claude 3.5 Sonnet). We measure how far the core concept degrades across the six steps of the chain, including a local model served through Ollama.

The Chain: GPT-4o → Claude 3.5 Sonnet → Gemini 1.5 → DeepSeek → Mixtral → Llama 3 (Local)
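The relay described above can be sketched as a loop in which each model sees only the previous model's output. This is a minimal sketch under stated assumptions, not the notebook's actual code: `REPEATER_PROMPT`, `run_chain`, and the two stub backends are hypothetical names, and the stubs stand in for real API calls so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical system prompt; the notebook's actual wording is not shown in this README.
REPEATER_PROMPT = (
    "Restate the message you receive as faithfully as possible. Do not reply to it."
)

# A backend takes (model, system_prompt, text) and returns the model's output.
Backend = Callable[[str, str, str], str]


def run_chain(
    message: str,
    chain: List[Tuple[str, str]],
    backends: Dict[str, Backend],
) -> List[str]:
    """Pass `message` through each (provider, model) step in order.

    Returns the original message followed by every intermediate output,
    so drift can be scored against step 0 afterwards.
    """
    outputs = [message]
    for provider, model in chain:
        backend = backends[provider]  # raises KeyError for an unregistered provider
        # Each step sees only the previous step's output, never the original.
        outputs.append(backend(model, REPEATER_PROMPT, outputs[-1]))
    return outputs


# Stub backends standing in for real SDK calls (assumption: illustration only).
def echo_backend(model: str, system: str, text: str) -> str:
    return text


def lossy_backend(model: str, system: str, text: str) -> str:
    # Simulates information loss by dropping the last word.
    return " ".join(text.split()[:-1])


if __name__ == "__main__":
    chain = [("echo", "model-a"), ("lossy", "model-b"), ("lossy", "model-c")]
    backends = {"echo": echo_backend, "lossy": lossy_backend}
    for step, text in enumerate(run_chain("the quick brown fox jumps", chain, backends)):
        print(step, text)
```

Keeping every intermediate output, rather than only the final one, is what makes per-step scoring possible later. In a real run, each stub would be replaced by a function that calls the corresponding provider.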

📊 Key Features

  • 🔄 Universal Wrapper: A single Python function to handle API calls for OpenAI, Anthropic, Google, and Ollama.
  • 🛡️ Strict System Prompts: Ensures models act as "repeaters" rather than conversational assistants.
  • Hybrid Evaluation:
    • Quantitative: Cosine similarity scoring using SentenceTransformers embeddings.
    • Qualitative: GPT-4o acts as a "Judge" to score Concept Mutation and Hallucination.
  • 📈 Visualization: Matplotlib charts correlating embedding distance with idea survival.
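The quantitative half of the evaluation can be illustrated by scoring every step's output against the original message with cosine similarity. The sketch below is an assumption about how such scoring might look, not the notebook's implementation: `drift_scores` and `toy_embed` are hypothetical names, and `toy_embed` is a bag-of-words stand-in used only so the example runs without downloading an embedding model.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_scores(
    outputs: List[str],
    embed: Callable[[List[str]], Sequence[Vector]],
) -> List[float]:
    """Similarity of each later output to outputs[0] (the original message)."""
    vectors = embed(outputs)
    return [cosine_similarity(vectors[0], v) for v in vectors[1:]]


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Bag-of-words counts over a shared vocabulary (illustration only)."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return [
        [float(Counter(text.lower().split())[word]) for word in vocab]
        for text in texts
    ]


if __name__ == "__main__":
    steps = ["study hard for the exam", "study hard for the test", "work hard in life"]
    for step, score in enumerate(drift_scores(steps, toy_embed), start=1):
        print(f"step {step}: similarity to original = {score:.3f}")
```

With sentence-transformers installed, `embed` could instead be the `encode` method of a `SentenceTransformer` instance; which embedding model the notebook loads is not stated in this README.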

🚀 Quick Start

  1. Clone the repo:
     git clone https://github.com/Sama-ndari/llm-semantic-drift-analysis.git
  2. Install dependencies:
     pip install -r requirements.txt
  3. Set up the environment. Create a .env file with your keys:
     OPENAI_API_KEY=...
     ANTHROPIC_API_KEY=...
     GOOGLE_API_KEY=...
     DEEPSEEK_API_KEY=...
     GROQ_API_KEY=...
  4. Run the notebook. Launch telephone_game.ipynb and run all cells.
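Before running all cells, it can help to confirm that every key from the .env file is actually visible to Python. The check below is a small sketch, not part of the repository: `missing_keys` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`, which the README's library list does not mention).

```python
import os
from typing import List, Mapping, Sequence

# Key names taken from the .env example in the Quick Start steps.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROQ_API_KEY",
]


def missing_keys(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```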

📈 Sample Results

"Significant semantic drift was observed at Step 4 (DeepSeek), where the specific academic context was replaced by generalized advice."

πŸ› οΈ Tech Stack

  • Languages: Python
  • Models: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, DeepSeek, Mixtral (via Groq), Llama 3 8B (local, via Ollama)
  • Libraries: openai, anthropic, sentence-transformers, scikit-learn, pandas, matplotlib

Created by Sama-ndari


About

Quantifying information degradation in multi-agent AI systems. An experiment that passes prompts sequentially through GPT-4o, Claude 3.5, Gemini, DeepSeek, Groq, and Llama 3 (Ollama) to measure and visualize semantic drift.
