I'm a Computer Science graduate with a strong passion for machine learning, multimodal AI, and autonomous systems. I enjoy transforming complex research ideas into real-world systems and building intelligent models that reason, perceive, and act.
Throughout my academic journey, I worked on multiple end-to-end AI projects, including Vision-Language-Action models for autonomous driving, OCR systems, and speech synthesis models. I'm driven by research, engineering, and pushing AI from theory to production.
A research-driven Vision-Language-Action (VLA) framework designed to enable multimodal large language models to perceive driving scenes, reason about them, and generate vehicle control commands.
- Designed a unified architecture combining vision, language reasoning, and ego-vehicle state
- Fine-tuned Qwen2.5-VL for autonomous driving tasks
- Generated structured outputs: trajectory, speed, and steering predictions
- Enabled explainable driving decisions using language grounding
- Trained and evaluated on the DriveGPT4, LingoQA, and DriveLM datasets
Tech: PyTorch, Qwen2.5-VL, Multimodal LLMs, Autonomous Driving, Computer Vision
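To illustrate what "structured outputs" can look like in practice, here is a minimal sketch of parsing a model's text response into typed fields. The JSON schema, the field names (`trajectory`, `speed`, `steering`, `explanation`), and the `parse_command` helper are all hypothetical and chosen for illustration; they are not the project's actual output format.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DrivingCommand:
    """Hypothetical container for one structured driving prediction."""
    trajectory: List[Tuple[float, float]]  # assumed (x, y) waypoints
    speed: float                           # assumed target speed
    steering: float                        # assumed steering value
    explanation: str = ""                  # optional language rationale


def parse_command(text: str) -> DrivingCommand:
    """Extract the outermost JSON object from free-form model text."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    # Convert each waypoint pair to a tuple of floats.
    trajectory = [(float(x), float(y)) for x, y in data["trajectory"]]
    return DrivingCommand(
        trajectory=trajectory,
        speed=float(data["speed"]),
        steering=float(data["steering"]),
        explanation=str(data.get("explanation", "")),
    )


raw = (
    'Scene analysed. {"trajectory": [[0.0, 0.0], [1.5, 0.2]], '
    '"speed": 8.0, "steering": -0.05, '
    '"explanation": "slowing for pedestrian"}'
)
command = parse_command(raw)
print(command.speed, command.steering, command.trajectory)
```

Keeping the rationale in the same object as the control values is one simple way to pair each decision with its language explanation.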
A published research project focused on improving Arabic OCR accuracy through large-scale synthetic data generation and model fine-tuning.
- Built a dataset of ~1 million Arabic sentences with and without tashkeel
- Generated synthetic OCR images using TRDG with multiple Arabic fonts
- Fine-tuned Tesseract 5 Arabic models (ara.traineddata)
- Evaluated performance using CER and WER benchmarks
- Achieved significant WER improvements on clean datasets
Paper: https://ieeexplore.ieee.org/document/10928060
Tech: Tesseract OCR, Arabic NLP, Dataset Engineering, OCR Evaluation
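For readers unfamiliar with the metrics named above, CER and WER are commonly defined as the Levenshtein edit distance between the reference and the OCR output, divided by the reference length in characters or words respectively. The following is a minimal pure-Python sketch of that standard definition; the function names are illustrative, and this is not necessarily the evaluation code used in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / reference words (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("abcd", "abxd"))
print(wer("the quick brown fox", "the quick brown box"))
```

Because Python strings are sequences of Unicode code points, each tashkeel mark counts as its own character here, so CER on diacritized text also penalizes missing or wrong diacritics.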
A beta Text-to-Speech model built on the XTTS v2 architecture.
- Focused on voice quality and performance
- Published and documented the model for public use
Tech: Speech Processing, Deep Learning, Model Training
A Python package for generating OCR training datasets from text.
- Supports multilingual text rendering with custom fonts
- Includes noise augmentation for OCR robustness
- Published on PyPI with full documentation (MkDocs)
Tech: Python Packaging, Data Augmentation, OCR Pipelines
- Fine-Tuning an Arabic OCR Model using Tesseract 5
IEEE Xplore
https://ieeexplore.ieee.org/document/10928060
- Computer Vision & Image Processing
- OCR Systems & Document AI
- Vision-Language Models (VLMs, MLLMs)
- Vision-Language-Action (VLA) Models
- Multimodal Dataset Curation
- Model Fine-Tuning & Evaluation
- PyTorch
- LLaMA-Factory
- MkDocs
- Git & GitHub
- Linux
Autonomous Driving • Multimodal AI • Large Language Models • Applied Machine Learning • AI Research
- GitHub: https://github.com/OmarSamirz
- LinkedIn: https://www.linkedin.com/in/omar-samir-8415b2285/
- Email: omarsamir1300@gmail.com

