OmarSamirz/README.md

Hi there, I'm Omar 👋

I'm a Computer Science graduate with a strong passion for machine learning, multimodal AI, and autonomous systems. I enjoy transforming complex research ideas into real-world systems and building intelligent models that reason, perceive, and act.

Throughout my academic journey, I have worked on multiple end-to-end AI projects, including Vision-Language-Action models for autonomous driving, OCR systems, and speech synthesis models. I’m driven by research, engineering, and pushing AI from theory to production.


🚀 Featured Projects

🚗 DriveFusion – Vision-Language-Action Model for Autonomous Driving (Graduation Project)

A research-driven Vision-Language-Action (VLA) framework designed to enable multimodal large language models to perceive driving scenes, reason about them, and generate vehicle control commands.

  • Designed a unified architecture combining vision, language reasoning, and ego-vehicle state
  • Fine-tuned Qwen2.5-VL for autonomous driving tasks
  • Generated structured outputs: trajectory, speed, and steering predictions
  • Enabled explainable driving decisions using language grounding
  • Trained and evaluated on the DriveGPT4, LingoQA, and DriveLM datasets
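
As a rough illustration of what "structured outputs" could look like downstream, here is a minimal sketch that parses a model response into typed fields. The JSON layout, the field names, and the `DrivingPrediction` and `parse_prediction` names are all assumptions made for this example; they are not taken from the DriveFusion codebase.

```python
import json
from dataclasses import dataclass


@dataclass
class DrivingPrediction:
    """Hypothetical container for one structured model output."""
    trajectory: list[tuple[float, float]]  # assumed (x, y) waypoints
    speed: float                           # assumed scalar speed prediction
    steering: float                        # assumed scalar steering prediction


def parse_prediction(text: str) -> DrivingPrediction:
    """Parse a JSON string (assumed format) into a DrivingPrediction."""
    data = json.loads(text)
    trajectory = [(float(x), float(y)) for x, y in data["trajectory"]]
    return DrivingPrediction(
        trajectory=trajectory,
        speed=float(data["speed"]),
        steering=float(data["steering"]),
    )


# Example response text, invented purely for illustration.
raw = '{"trajectory": [[0, 0], [1.5, 0.2]], "speed": 4.0, "steering": -0.1}'
prediction = parse_prediction(raw)
```

Parsing into a typed object like this makes malformed responses fail early (a missing key raises `KeyError`) instead of propagating into control code.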

Tech: PyTorch, Qwen2.5-VL, Multimodal LLMs, Autonomous Driving, Computer Vision


📄 Fine-Tuning an Arabic OCR Model using Tesseract 5 (Research Paper)

A published research project focused on improving Arabic OCR accuracy through large-scale synthetic data generation and model fine-tuning.

  • Built a dataset of ~1 million Arabic sentences with and without tashkeel
  • Generated synthetic OCR images using TRDG with multiple Arabic fonts
  • Fine-tuned Tesseract 5 Arabic models (ara.traineddata)
  • Evaluated performance using CER and WER benchmarks
  • Achieved significant WER improvements on clean datasets
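
For readers unfamiliar with the two metrics, CER and WER are usually defined as the edit distance between the OCR output and the reference text, divided by the reference length in characters or words. The sketch below shows that standard definition in plain Python; it is not the evaluation script used in the paper, and the function names are chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Python strings are indexed by code point, so under this definition each tashkeel mark counts as its own character in CER. Whether diacritics are kept or stripped before scoring is an evaluation choice that should be stated alongside any reported numbers.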

📎 Paper: https://ieeexplore.ieee.org/document/10928060

Tech: Tesseract OCR, Arabic NLP, Dataset Engineering, OCR Evaluation


🔊 EGTTS v0.1 – Text-to-Speech Model

A beta Text-to-Speech model built on the XTTS v2 architecture.

  • Focused on voice quality and performance
  • Published and documented the model for public use

Tech: Speech Processing, Deep Learning, Model Training


🧩 IFTG – Image From Text Generator (Open Source)

A Python package for generating OCR training datasets from text.

  • Supports multilingual text rendering with custom fonts
  • Includes noise augmentation for OCR robustness
  • Published on PyPI with full documentation (MkDocs)

Tech: Python Packaging, Data Augmentation, OCR Pipelines


📚 Research & Publications


💻 Skills

🧑‍💻 Programming

Python C C++ SQL


🤖 Machine Learning & AI

PyTorch scikit-learn NumPy Pandas SciPy


👁️ Vision, Language & Multimodal AI

  • Computer Vision & Image Processing
  • OCR Systems & Document AI
  • Vision-Language Models (VLMs, MLLMs)
  • Vision-Language-Action (VLA) Models
  • Multimodal Dataset Curation
  • Model Fine-Tuning & Evaluation

📊 Data Visualization

Matplotlib Plotly


🛠️ Tools & Engineering

  • PyTorch
  • LLaMAFactory
  • MkDocs
  • Git & GitHub
  • Linux

🔍 Interests

Autonomous Driving • Multimodal AI • Large Language Models • Applied Machine Learning • AI Research


📫 Contact

Pinned

  1. DriveFusion/drivefusion (Public)

    Python

  2. joejoe03/Egyptian-Text-To-Speech (Public)

    Jupyter Notebook 37 8

  3. ImageFromTextGenerator (Public)

    IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and l…

    Python 20 1

  4. DriveFusion/drivefusion-train (Public)

    Python

  5. Fine-Tuning-an-Arabic-OCR-Model-using-Tesseract-5.0 (Public)

    This research aims to fine-tune an Arabic OCR model using Tesseract 5.0, enhancing text recognition accuracy through extensive data collection, preprocessing, and image generation. By leveraging ad…

    Jupyter Notebook 6 1

  6. NextBI (Public)

    NextBI is an AI assistant for business intelligence that lets users ask natural-language questions about enterprise data and instantly get answers and visualizations. It connects to Teradata Vantag…

    Jupyter Notebook 2