Omar Samir OmarSamirz

Hi there, I'm Omar 👋

I'm a Computer Science graduate with a strong passion for machine learning, multimodal AI, and autonomous systems. I enjoy transforming complex research ideas into real-world systems and building intelligent models that reason, perceive, and act.

Throughout my academic journey, I have worked on several end-to-end AI projects, including Vision-Language-Action models for autonomous driving, OCR systems, and speech synthesis models. I'm driven by research, engineering, and pushing AI from theory to production.

🚀 Featured Projects

🚗 DriveFusion – Vision-Language-Action Model for Autonomous Driving (Graduation Project)

A research-driven Vision-Language-Action (VLA) framework designed to enable multimodal large language models to perceive driving scenes, reason about them, and generate vehicle control commands.

Designed a unified architecture combining vision, language reasoning, and ego-vehicle state
Fine-tuned Qwen2.5-VL for autonomous driving tasks
Generated structured outputs: trajectory, speed, and steering predictions
Enabled explainable driving decisions using language grounding
Trained and evaluated on the DriveGPT4, LingoQA, and DriveLM datasets

Tech: PyTorch, Qwen2.5-VL, Multimodal LLMs, Autonomous Driving, Computer Vision

📄 Fine-Tuning an Arabic OCR Model using Tesseract 5 (Research Paper)

A published research project focused on improving Arabic OCR accuracy through large-scale synthetic data generation and model fine-tuning.

Built a dataset of ~1 million Arabic sentences with and without tashkeel
Generated synthetic OCR images using TRDG with multiple Arabic fonts
Fine-tuned Tesseract 5 Arabic models (ara.traineddata)
Evaluated performance using CER and WER benchmarks
Achieved significant WER improvements on clean datasets
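
For readers unfamiliar with the metrics above, here is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and OCR output, divided by the reference length in characters or words. This is not the evaluation script from the paper; the function names `edit_distance`, `cer`, and `wer` are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Both functions operate on plain Python strings, so they work on Arabic text as well. Note that each tashkeel mark is a separate Unicode code point and therefore counts as its own character in CER unless the text is normalized first.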

📎 Paper: https://ieeexplore.ieee.org/document/10928060

Tech: Tesseract OCR, Arabic NLP, Dataset Engineering, OCR Evaluation

🔊 EGTTS v0.1 – Text-to-Speech Model

A beta Text-to-Speech model built on the XTTS v2 architecture.

Focused on voice quality and performance
Published and documented the model for public use

Tech: Speech Processing, Deep Learning, Model Training

🧩 IFTG – Image From Text Generator (Open Source)

A Python package for generating OCR training datasets from text.

Supports multilingual text rendering with custom fonts
Includes noise augmentation for OCR robustness
Published on PyPI with full documentation (MkDocs)

Tech: Python Packaging, Data Augmentation, OCR Pipelines

📚 Research & Publications

Fine-Tuning an Arabic OCR Model using Tesseract 5
IEEE Xplore
https://ieeexplore.ieee.org/document/10928060

💻 Skills

🧑‍💻 Programming

🤖 Machine Learning & AI

👁️ Vision, Language & Multimodal AI

Computer Vision & Image Processing
OCR Systems & Document AI
Vision-Language Models (VLMs, MLLMs)
Vision-Language-Action (VLA) Models
Multimodal Dataset Curation
Model Fine-Tuning & Evaluation

📊 Data Visualization

🛠️ Tools & Engineering

PyTorch
LLaMAFactory
MkDocs
Git & GitHub
Linux

🔍 Interests

Autonomous Driving • Multimodal AI • Large Language Models • Applied Machine Learning • AI Research

📫 Contact

GitHub: https://github.com/OmarSamirz
LinkedIn: https://www.linkedin.com/in/omar-samir-8415b2285/
Email: omarsamir1300@gmail.com
