Famous Vision Language Models and Their Architectures (updated Jan 11, 2026; Markdown)
ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and video analysis.
A continuously updated collection and survey of vision-language model papers and models, maintained as a GitHub repository.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
Mark web pages for use with vision-language models
Local Video RAG Engine. A FastAPI microservice for video understanding: Scene Detection + Whisper ASR + Qwen3-VL. Optimized for Apple Silicon (MLX) & Windows/Linux (Llama.cpp).
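A core step in a video RAG pipeline like the one above is attaching the ASR transcript to the detected scenes. A minimal sketch in plain Python, assuming illustrative data shapes (the function name and tuple layouts are not the repository's actual API):

```python
# Sketch: align Whisper ASR segments to detected scene boundaries so each
# scene carries its own transcript text. Data shapes are assumptions:
# scenes as (start_s, end_s) pairs, segments as (start_s, end_s, text).

def assign_transcript_to_scenes(scenes, segments):
    """Return one transcript string per scene, joined from all ASR
    segments whose time range overlaps that scene."""
    transcripts = []
    for s_start, s_end in scenes:
        parts = [text for t_start, t_end, text in segments
                 if t_start < s_end and t_end > s_start]  # any time overlap
        transcripts.append(" ".join(parts))
    return transcripts
```

A segment that straddles a scene cut (like `(9, 12)` below) is intentionally attached to both scenes, so neither loses context:

```python
assign_transcript_to_scenes(
    [(0, 10), (10, 20)],
    [(0, 4, "hello"), (9, 12, "world"), (15, 18, "end")],
)
# → ["hello world", "world end"]
```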
An AI agent that can control your screen to complete any task.
🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.
Qwen-VL base model for use with Autodistill.
A robotic sequential grasping system integrating YOLO detection and Qwen-VLM fine-tuning, enabling a full loop from manual teaching to LLM-based logical manipulation.
🤖 The Next-Gen AI Agent. Unlike normal agents, it goes beyond text and can control your Desktop & Android.
Creates text from video and audio using Qwen-VL and Whisper.
A computer vision system for automated analysis of index cards from a collection of coin forgeries using Qwen2.5-VL vision-language model. Developed for the imagines nummorum project.
Enable local integration of Qwen3.5 models with ComfyUI for text generation and multimodal visual tasks, featuring automatic model management and precision control.
EYUAI
A specialized ComfyUI toolkit for Qwen Image Edit workflows. It provides official training resolution calibration, real-time UI aspect ratio feedback, and intelligent image scaling (Crop/Pad/Stretch) to ensure optimal inference quality for Qwen-series image editing and generation.
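The Crop/Pad/Stretch scaling modes mentioned above come down to simple aspect-ratio arithmetic. A hedged sketch (the mode names mirror the description; the function itself is illustrative, not the node's real implementation):

```python
# Sketch of Crop/Pad/Stretch scaling math for fitting an image into a
# target resolution before inference.

def fit_dimensions(src_w, src_h, dst_w, dst_h, mode):
    """Return (scaled_w, scaled_h) prior to any final crop or pad.
    'stretch' ignores aspect ratio; 'crop' scales to cover the target
    (then center-crop); 'pad' scales to fit inside it (then letterbox)."""
    if mode == "stretch":
        return dst_w, dst_h
    scale_cover = max(dst_w / src_w, dst_h / src_h)  # crop: fill, overflow
    scale_fit = min(dst_w / src_w, dst_h / src_h)    # pad: fit, underflow
    scale = scale_cover if mode == "crop" else scale_fit
    return round(src_w * scale), round(src_h * scale)
```

For a 1920×1080 source and a 1024×1024 target, `"pad"` yields 1024×576 (letterboxed top and bottom) while `"crop"` yields 1820×1024 (sides trimmed).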
Generate vivid, human-like captions for portrait images using the Qwen2.5-VL-7B model. Outputs dense descriptions covering emotion, posture, clothing, and environment.
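If the captioner is prompted to emit labeled fields, the dense description can be split back into structured attributes with a small parser. The field names and `Field: value` output format here are assumptions for illustration, not the repository's documented schema:

```python
# Sketch: parse a field-tagged caption (an assumed "Field: value" format)
# into a dict of the attribute categories named in the description.

FIELDS = ("emotion", "posture", "clothing", "environment")

def parse_caption(text):
    """Parse lines like 'Emotion: calm' into a dict keyed by lowercase
    field name; lines without a recognized field are ignored."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in FIELDS:
            result[key.strip().lower()] = value.strip()
    return result
```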