Mykola Melnyk mykolamelnykml

Greetings! 👋

My name is Mykola Melnyk, and I'm an ML expert with two decades of experience in software development. I specialize in transforming complex business ideas into scalable, secure, and efficient AI-driven products. My expertise spans several areas, which lets me deliver cutting-edge AI solutions that drive business growth and improve efficiency.

Key Areas of My Specialization:

📄 Natural Language Processing (NLP), Computer Vision (CV), and Optical Character Recognition (OCR): 5+ years of experience in document processing, understanding, and anonymization. Led the development of Spark OCR (Visual NLP) using technologies such as Python/Scala, PySpark, PyTorch, LLMs, Llama 3, Mini Gemini, LangChain, and Hugging Face Transformers.

⚡ Big Data Processing with Apache Spark: 7+ years of experience designing and optimizing large-scale data pipelines for high-performance processing. In-depth knowledge of Spark internals and Spark Structured Streaming. Creator of and contributor to spark-pdf, an open-source Spark data source written in Scala that extends Spark's capabilities.

🔒 Data De-identification & Anonymization: Expert in anonymizing sensitive data in text, images, PDFs, and DICOM files. I use NLP, OCR, and computer vision to remove or mask personal information, ensuring privacy, security, and compliance with GDPR and HIPAA requirements while safeguarding data confidentiality.

🧬 Healthcare, Pharma, MedTech, BioTech Expertise: Over 5 years of experience in the healthcare and life sciences sectors, with a strong understanding of formats such as DICOM and a track record of delivering solutions tailored to the unique needs of these industries.
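To make the "remove or mask personal information" step from the de-identification work above concrete, here is a minimal rule-based sketch in Python. It is not the NLP/OCR pipeline described in this profile: the `PII_PATTERNS` dictionary and the `mask_pii` function are hypothetical, and the regexes only catch well-formed emails, ISO dates, and phone numbers. The person's name survives masking, which illustrates why model-based NER is needed for real de-identification.

```python
import re

# Hypothetical, deliberately simple patterns. Order matters: DATE runs before
# PHONE because an ISO date like 2024-01-15 would also match the phone regex.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_pii(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_pii("Contact John at john.doe@example.com or +1 555-123-4567, visit on 2024-01-15."))
# Contact John at [EMAIL] or [PHONE], visit on [DATE].
```

Regexes are cheap and predictable for structured identifiers, but names, addresses, and free-text identifiers require an NER model, and scanned documents or DICOM pixel data additionally require OCR before any masking can happen.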

TOP 5 Reasons to Work With Me

✅ End-to-End Expertise

✅ Complex Problem-Solving Ability

✅ Timely Delivery

✅ Transparent Communication

✅ Scalable Solutions

Professional Skills

🛠️ Programming Languages: Python, Scala

📊 Data Science & Machine Learning: NLP, Computer Vision, Large Language Models (LLMs), Optical Character Recognition (OCR), Model Productionalization, Deep Learning (PyTorch, TensorFlow, Hugging Face Transformers, ONNX, Pandas, CLIP)

💡 LLMs and Related Tools: OpenAI GPT, Gemini, Llama 3, FLUX, Together.ai, Ollama, Hugging Face, LangChain, LlamaIndex, LangServe, LangGraph, QLoRA, Streamlit, Gradio

⚡ Big Data & Distributed Systems: Big Data Processing, ETL, Stream Processing, Real-Time Aggregation, Apache Spark (PySpark, Spark ML, Spark Structured Streaming), Kinesis, Kafka, Databricks

🚀 Cloud Computing & Infrastructure: Amazon Web Services (AWS), Distributed Systems, CI/CD Pipelines, Docker, Jenkins, Graphite, Grafana, Elasticsearch, Kibana

⚙️ Databases: PostgreSQL, MongoDB, Redis, DynamoDB

💼 CRMs: HubSpot, Zoho CRM

Availability

Committed to long-term collaborations. Available full-time for your next project.

My Projects

Spark PDF DataSource

Source Code: https://github.com/StabRise/spark-pdf

Home page: https://stabrise.com/spark-pdf/

Quick Start Jupyter Notebook: PdfDataSource.ipynb

The project provides a custom data source for Apache Spark that allows you to read PDF files into a Spark DataFrame.

Key features:

Read PDF documents into a Spark DataFrame
Read PDF files lazily, page by page
Support large files of up to 10k pages
Support scanned PDF files (via OCR)
No need to install Tesseract OCR separately; it's included in the package
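Lazy per-page reading is what makes files of up to 10k pages practical: instead of loading a whole document at once, the work can be split into small page ranges that are processed independently, for example as separate Spark tasks. The plain-Python sketch below illustrates only that splitting idea. It is a hypothetical example, not spark-pdf's implementation; `page_partitions` and `pages_per_partition` are names invented here and are not options of the library.

```python
from typing import Iterator, Tuple


def page_partitions(num_pages: int, pages_per_partition: int) -> Iterator[Tuple[int, int]]:
    """Lazily yield (first_page, last_page) ranges, 1-based and inclusive.

    Being a generator, it never materializes the full list of ranges, so the
    caller can read and process one range at a time.
    """
    if pages_per_partition < 1:
        raise ValueError("pages_per_partition must be >= 1")
    for start in range(1, num_pages + 1, pages_per_partition):
        yield start, min(start + pages_per_partition - 1, num_pages)


print(list(page_partitions(10, 4)))
# [(1, 4), (5, 8), (9, 10)]
```

With 10,000 pages and 100 pages per range, this yields 100 independent units of work. Smaller ranges improve parallelism and bound per-task memory, at the cost of more scheduling overhead.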

ScaleDP

Source Code: https://github.com/StabRise/scaledp

Home page: https://stabrise.com/scaledp/

Quick Start Jupyter Notebook: https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb

ScaleDP is an open-source library for processing documents with Apache Spark.

Key features:

Load PDF documents and images
Extract text from PDF documents and images
Extract images from PDF documents
Run OCR on images and PDF documents
Run NER on text extracted from PDF documents and images
Visualize NER results
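As a rough illustration of the last two ScaleDP features listed above, the sketch below renders NER results inline in extracted text so they can be inspected at a glance. It does not use ScaleDP's API, whose output schema is not described here: the `(start, end, label)` span format, the `Entity` alias, and the `annotate` function are all assumptions made for this text-only example.

```python
from typing import List, Tuple

# Hypothetical span format: (start_char, end_char_exclusive, label).
Entity = Tuple[int, int, str]


def annotate(text: str, entities: List[Entity]) -> str:
    """Wrap each entity span as [LABEL: span text], left to right."""
    parts = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            continue  # this simple sketch skips spans that overlap an earlier one
        parts.append(text[cursor:start])
        parts.append(f"[{label}: {text[start:end]}]")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Invoice from Acme Corp to Jane Smith."
print(annotate(text, [(13, 22, "ORG"), (26, 36, "PERSON")]))
# Invoice from [ORG: Acme Corp] to [PERSON: Jane Smith].
```

For scanned documents, the same spans would typically be mapped back to OCR word coordinates and drawn as boxes on the page image rather than rendered as text.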
