TokenGen: Visualize Token Prediction Evolution & Attention Dynamics

TokenGen is an interactive visualization tool for exploring how transformer-based language models predict tokens layer by layer. It provides insights into the evolution of token probabilities and attention head dynamics across transformer blocks. Built with Streamlit and Plotly.

Features

  • Token Probability Timeline: Track how token predictions evolve through each transformer layer (a sketch of the underlying computation follows this list).
  • Attention Heatmaps: Visualize aggregated attention patterns across layers and tokens.
  • Head Clustering: Discover patterns in attention heads using UMAP dimensionality reduction and K-means clustering.
  • Model Comparison: Compare predictions and attention patterns between two models.
  • Contrastive Analysis: Analyze model preferences between two tokens across layers.
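
The timeline view is, in effect, a per-layer readout of the model's next-token distribution. Below is a minimal sketch of that idea using the Hugging Face transformers API: project each layer's hidden state through the model's final layer norm and LM head (the so-called "logit lens"). Names and structure here are illustrative, not TokenGen's actual code.

```python
# Sketch: per-layer next-token probabilities via the "logit lens".
# Illustrative only; assumes GPT-2 via Hugging Face transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The world is full of amazing", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors, embeddings plus one per block
for layer, hidden in enumerate(outputs.hidden_states):
    # Apply the final layer norm and LM head to the last token's state
    normed = model.transformer.ln_f(hidden[:, -1, :])
    probs = torch.softmax(model.lm_head(normed), dim=-1)
    top_prob, top_id = probs.max(dim=-1)
    print(f"layer {layer:2d}: {tokenizer.decode(top_id)!r} ({top_prob.item():.3f})")
```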

Installation

  1. Clone the repository:
    git clone https://github.com/mitadake/tokengen.git
    cd tokengen
  2. Install dependencies:
    pip install -r requirements.txt
    

Usage

  1. Launch the Streamlit app:
    streamlit run token_prob_timeline.py
  2. In the browser:
    • Select a model (or compare two)
    • Enter your text prompt
    • Adjust visualization parameters
    • Explore different tabs and visualizations

Key Visualizations

  1. Token Prediction:
    Predicts the next token for the selected model.

  2. Probability Timeline:
    Shows how different token probabilities change through successive transformer layers.

  3. Attention Heatmap:
    Displays layer-wise attention patterns aggregated across all attention heads; a sketch covering this and head clustering follows the list.

  4. Head Clustering:
    Groups similar attention heads with UMAP and K-means to reveal functional patterns.
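
Both the heatmap and the clustering view can be derived from the raw attention tensors that transformers returns with output_attentions=True. The sketch below works under that assumption (illustrative names, not TokenGen's internals): average over heads for the heatmap, and flatten each head's attention pattern into a feature vector for UMAP + K-means.

```python
# Sketch: build an attention heatmap and cluster attention heads.
# Illustrative only; assumes umap-learn and scikit-learn are installed.
import torch
from sklearn.cluster import KMeans
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from umap import UMAP

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The world is full of amazing", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer
attn = torch.stack(out.attentions).squeeze(1)   # (layers, heads, seq, seq)

# Heatmap: average attention over all heads within each layer
heatmap = attn.mean(dim=1).numpy()              # (layers, seq, seq)

# Clustering: flatten each head's pattern into one feature vector,
# embed with UMAP, then group with K-means (cluster count is arbitrary here)
n_layers, n_heads, seq, _ = attn.shape
patterns = attn.reshape(n_layers * n_heads, seq * seq).numpy()
embedding = UMAP(n_components=2, random_state=0).fit_transform(patterns)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(labels.reshape(n_layers, n_heads))        # cluster id per (layer, head)
```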

Supported Models

  • GPT-2 (base, medium)
  • DistilGPT-2
  • OPT-1.3b

Note: Larger models can be visualized the same way, but they require more memory and GPU resources.

Example Analysis

Try the default prompt: "The world is full of amazing"

  1. Observe probability shifts:
  • See how "things" overtakes "people" in later layers of the GPT-2 medium model.
  • Notice how grammatical tokens remain strong throughout.
  2. Analyze attention patterns:
  • See how early layers focus on determiners ("The").
  • Notice later layers attending to descriptive words ("amazing").
  3. Compare models:
  • Try GPT-2 vs. OPT-1.3b.
  • Observe different attention allocation strategies.

Contrastive Mode

Compare how two candidate tokens fare across layers (a sketch of one way to compute this follows the steps):

  1. Enable "Contrastive Explanation Mode".
  2. Enter tokens (e.g., "people" vs "things").
  3. See which layers prefer each token.
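
A plausible way to compute this, reusing the logit-lens readout from the earlier sketch (illustrative, not necessarily how TokenGen implements it): take the difference of the two tokens' log-probabilities at every layer.

```python
# Sketch: layer-wise preference between two single-token candidates.
# Illustrative names; not TokenGen's actual code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The world is full of amazing", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Leading spaces matter: GPT-2 encodes " people" / " things" as single tokens
id_a = tokenizer.encode(" people")[0]
id_b = tokenizer.encode(" things")[0]

for layer, hidden in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
    logp = torch.log_softmax(logits, dim=-1)[0]
    diff = (logp[id_a] - logp[id_b]).item()
    print(f"layer {layer:2d}: log p(people) - log p(things) = {diff:+.3f}")
```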


Notes

  • The first run will download the selected models.
  • Loading a model may take some time.
  • Clear the cache after analyzing two or three models to free memory (see the caching sketch below).
  • Work to be done: optimize inference time and model loading using open-source tools such as Unsloth and ONNX.
  • Future work: visualization support for custom models uploaded by the user or pulled from Hugging Face.
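
A minimal sketch of how such caching might look in Streamlit, assuming models are loaded behind st.cache_resource. The load_model helper is hypothetical, though st.cache_resource and the .clear() method on cached functions are real Streamlit APIs.

```python
# Sketch: caching model loads in Streamlit so switching models stays fast.
# Illustrative only; load_model is a hypothetical helper.
import streamlit as st
from transformers import AutoModelForCausalLM, AutoTokenizer

@st.cache_resource
def load_model(name: str):
    # Cached per model name; the first call downloads the weights
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model

name = st.selectbox("Model", ["gpt2", "gpt2-medium", "distilgpt2", "facebook/opt-1.3b"])
tokenizer, model = load_model(name)

# Free memory after comparing several models
if st.button("Clear model cache"):
    load_model.clear()
```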

Contributing

Contributions welcome! Please open an issue first to discuss proposed changes.

License

MIT License

