Swahili–English Neural Machine Translation Model

Transformer Architecture (Attention Is All You Need)

This repository contains an end-to-end Swahili–English Neural Machine Translation (NMT) system implemented with the Transformer architecture introduced in the landmark paper "Attention Is All You Need". The project includes dataset preprocessing, custom tokenizer creation, model definition, a training pipeline, and inference utilities. The goal is to build a fully functional sequence-to-sequence translation model without relying on external pretrained weights, while demonstrating a clean and reproducible implementation of the Transformer architecture.

1. Project Overview

The Transformer architecture eliminates recurrence and convolution by relying entirely on multi-head self-attention, enabling efficient parallelism and improved long-range sequence modeling (see the attention sketch after this list). This project applies that architecture to translate Swahili sentences into English using a dataset collected from open parallel corpora.

Key objectives of the project include:

- Build a custom tokenizer for both languages.
- Implement the original Transformer components from scratch.
- Train an encoder–decoder model following the "Attention Is All You Need" specification.
- Evaluate translation quality using BLEU scores.
- Provide an inference script for real-time translation.
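For reference, the operation at the heart of multi-head attention is scaled dot-product attention. The following is a minimal PyTorch sketch of that operation only; the function name, tensor shapes, and masking convention are illustrative assumptions, not this repository's code.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); mask broadcasts to the score shape."""
    d_k = q.size(-1)
    # similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, heads, q_len, k_len)
    if mask is not None:
        # positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # attention distribution
    return weights @ v                                   # weighted sum of values
```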

3. Tokenizer Construction

A key objective of this project was to build the tokenizer manually rather than relying on prebuilt libraries.

Tokenizer design:

- Text normalization
  - Lowercasing
  - Removing non-language symbols
  - Basic punctuation handling
- Subword vocabulary construction
  - Built using Byte Pair Encoding (BPE)
  - Separate vocabularies for Swahili and English
  - Special tokens included for padding, unknown words, and sequence start/end
- Vocabulary size
  - Configurable; default is typically 8k–16k tokens per language
- Encoding and decoding utilities (see the sketch after this list)
  - Convert text to token IDs
  - Convert token IDs back to text
  - Handle unknown and padding tokens
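To make the encode/decode utilities concrete, here is a minimal sketch of what such a wrapper can look like. The special-token names, the `bpe_split` helper, and the `@@` continuation convention are assumptions for illustration, not this repository's actual interface.

```python
# Hypothetical sketch of encode/decode utilities around a learned BPE vocabulary.
PAD, UNK, SOS, EOS = "<pad>", "<unk>", "<sos>", "<eos>"

class BPETokenizer:
    def __init__(self, vocab):
        # vocab maps subword -> integer id; special tokens occupy reserved ids
        self.token_to_id = vocab
        self.id_to_token = {i: t for t, i in vocab.items()}

    def encode(self, text, bpe_split):
        # normalize, split into subwords with the learned merges, map to ids
        text = text.lower().strip()
        tokens = [SOS] + bpe_split(text) + [EOS]
        unk = self.token_to_id[UNK]
        return [self.token_to_id.get(t, unk) for t in tokens]

    def decode(self, ids):
        # drop special tokens and rejoin subwords (assuming '@@ ' marks continuations)
        tokens = [self.id_to_token[i] for i in ids
                  if self.id_to_token[i] not in (PAD, SOS, EOS)]
        return " ".join(tokens).replace("@@ ", "")
```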

4. Model Architecture

The model strictly follows the Attention Is All You Need architecture (a layer sketch follows this list):

Encoder

- Token embedding + positional encoding
- N identical layers
  - Multi-head self-attention
  - Position-wise feed-forward network
  - Layer normalization and residual connections

Decoder

- Masked multi-head self-attention
- Encoder–decoder cross-attention
- Feed-forward network
- Layer normalization and residual connections
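As an illustration of how one encoder layer composes these pieces, here is a minimal PyTorch sketch. Hyperparameter defaults and module names are assumptions, and it uses `nn.MultiheadAttention` for brevity where the repository implements attention from scratch.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        # position-wise feed-forward network applied to every position independently
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # sub-layer 1: multi-head self-attention with residual connection + LayerNorm
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(attn_out))
        # sub-layer 2: feed-forward network with residual connection + LayerNorm
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x
```

The decoder layer adds a masked self-attention sub-layer (to prevent attending to future tokens) and an encoder–decoder cross-attention sub-layer, each wrapped in the same residual-plus-normalization pattern.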

5. Training Pipeline

Training was performed using PyTorch, with full teacher forcing and label smoothing (a minimal sketch of the loss and learning-rate schedule follows this list).

Training steps:

- Dataset preparation
  - Tokenize Swahili and English sentences
  - Pad sequences to uniform length
  - Create a DataLoader with batching and masking
- Loss function
  - Cross-entropy with label smoothing
  - Padding tokens excluded from the loss
- Optimizer
  - Adam with the Transformer learning rate schedule
  - Warm-up steps implemented as in the original paper
- Checkpointing
  - Saves model state, optimizer state, and tokenizers
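The sketch below shows one common way to wire up these pieces in PyTorch: cross-entropy with label smoothing that ignores padding, and the warm-up schedule from the paper implemented as an `LambdaLR`. The padding id, smoothing value, `d_model`, warm-up steps, and the stand-in model are illustrative assumptions, not values read from this repository.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumed id of the padding token

# cross-entropy with label smoothing; padding positions do not contribute to the loss
criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=PAD_ID)

def noam_lr(step, d_model=512, warmup=4000):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

model = nn.Linear(512, 8000)  # stand-in for the Transformer model
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

# inside the training loop:
#   loss = criterion(logits.view(-1, vocab_size), targets.view(-1))
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

Setting the base learning rate to 1.0 lets the lambda return the actual learning rate at each step, which keeps the warm-up-then-decay curve identical to the formula in the paper.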
