PDF to markdown using vision LLMs — tables, layouts, and structure preserved
Convert scanned PDFs into searchable text locally using Vision LLMs (olmOCR). 100% private, offline, and free. Features a modern Web UI & CLI.
Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.
PyMidscene - A Python SDK implementation of Midscene.js | AI-driven natural-language UI automation: skip selectors and describe the operation in plain language instead. Fully compatible with the official cache format.
AI Video Editor Pipeline with Vision LLM Models
AI-powered OCR for Diablo II: Resurrected - batch-extract item tooltips from screenshots using Vision LLMs (OpenAI, Groq, OpenRouter, LM Studio/Ollama). No Tesseract or EasyOCR needed.
A feature-rich desktop GUI for Ollama with Vision, RAG, and JSON support.
A Python-based incident detection engine that analyzes video feeds for motion, detects objects, and uses large language models (LLMs) to generate semantic descriptions of incidents. Designed for extensibility with custom detectors and processors.
Free OCR powered by LLMs using OpenRouter — extract text from images with no API costs. Works with image URLs and Base64 inputs using free vision-capable models.
Multimodal AI-powered medical assistant with LLMs, speech, and image understanding.
Free, offline OCR using local LLMs with Ollama. Convert images to text with vision-enabled models running entirely on your machine — no cloud, no API costs, full privacy.
🖼️ Extract text from images locally using Ollama's LLMs—100% free, offline, and private. No API keys or cloud costs necessary.
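The Ollama-based offline OCR tools above share one basic pattern: base64-encode the image, send it with a text prompt to a local vision model, and read back the extracted text. A minimal sketch of the request body for Ollama's `/api/generate` endpoint — the model name and prompt below are placeholder assumptions, not taken from any of the listed projects:

```python
import base64

def build_ocr_payload(image_bytes: bytes,
                      model: str = "llama3.2-vision",  # placeholder model name
                      prompt: str = "Extract all text from this image as markdown.") -> dict:
    """Build the JSON body for a POST to Ollama's /api/generate endpoint.

    Ollama accepts vision input as a list of base64-encoded images sent
    alongside the text prompt; with stream=False the whole response
    arrives as a single JSON object instead of a token stream.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Typical usage (assumes an Ollama server running on the default port):
#   import requests
#   with open("scan.png", "rb") as f:
#       resp = requests.post("http://localhost:11434/api/generate",
#                            json=build_ocr_payload(f.read()))
#   print(resp.json()["response"])
```

Because everything runs against a local server, no image data ever leaves the machine — which is the privacy property these projects advertise.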
AI-powered tool that extracts structured data from bank statement images using LLaMA Vision and displays it in clean JSON and table formats. Built with Streamlit and pandas for fast, accurate financial document parsing.
A FastAPI-based backend service that extracts structured information from academic marksheets (images or PDFs) using OCR and an LLM, and returns a normalized JSON response with confidence scores.
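The marksheet-extraction service above illustrates a useful design for LLM-backed extractors: pair every extracted field with a confidence score so callers can flag low-confidence values for human review. A minimal sketch of such a normalized response shape — the field names and the 0.8 threshold are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ExtractedField:
    value: str          # text produced by the OCR/LLM stage
    confidence: float   # 0.0-1.0, as reported by the extraction pipeline

@dataclass
class MarksheetResponse:
    student_name: ExtractedField
    roll_number: ExtractedField
    subjects: list = field(default_factory=list)  # per-subject ExtractedFields

    def needs_review(self, threshold: float = 0.8) -> list:
        """Return the names of top-level fields below the confidence threshold."""
        flagged = []
        for name in ("student_name", "roll_number"):
            if getattr(self, name).confidence < threshold:
                flagged.append(name)
        return flagged

    def to_json_dict(self) -> dict:
        # asdict() recurses into nested dataclasses, giving a JSON-ready dict
        return asdict(self)
```

For example, `MarksheetResponse(ExtractedField("Jane Doe", 0.95), ExtractedField("12-3456", 0.61)).needs_review()` returns `["roll_number"]`, letting the API surface exactly which values a human should double-check.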
🤖 A Discord bot that scrapes daily tech comics (XKCD, MonkeyUser, Turnoff.us) and uses Vision LLMs (Llama-4 via Groq) to explain the jokes.
🤖 Automate UI interactions with ease using the PyMidscene Python SDK, leveraging Midscene.js for AI-driven, natural language commands.
Multi-engine image generation filter for Open WebUI. Features automated prompt enhancement, multi-language support, and real-time Vision QC scoring. Supports A1111, ComfyUI, and OpenAI backends with integrated performance telemetry.
This repository focuses on customizing the Qwen2.5-Vision model for specific tasks. It provides step-by-step guidance, scripts, and best practices for fine-tuning the model on custom datasets. Ideal for developers and researchers, it ensures optimal performance and accuracy tailored to unique use cases.
Automated data extraction from PDF receipts to Excel using Vision LLM (tested with Qwen3-VL and olmOCR 2).