
Large Vision Model Inference

[Made Public]

Multimodal AI Inference

Overview

This repository is dedicated to running inference tasks using various large vision models (LVMs) over secure SSH connections. It serves as a growing collection of scripts that implement and manage inference for cutting-edge multimodal AI models, focusing on both vision and language tasks.

Key Features

  • Qwen Inference: Runs multimodal inference with Qwen, a vision-language model family, through qwen_inference.py.
  • LLaVA-NeXT Integration: Adds advanced visual understanding with LLaVA-NeXT via llava_next_inference.py.
  • Continuous Expansion: As more large vision models are explored and integrated, the repository will expand with additional inference files.
  • SSH-based Inference: All inference processes are conducted remotely over SSH, providing scalable and secure access to compute resources.
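Since the scripts themselves are not shown above, here is a minimal sketch of what SSH-based remote inference can look like: a helper that assembles an `ssh` command to run one of the repository's scripts on a remote machine. The host name, user, and remote Python path are placeholders, not values from this repository.

```python
import shlex
import subprocess

def build_remote_command(host, script, args=None, user=None):
    """Build an `ssh` invocation that runs an inference script on a remote
    host. Host, user, and the remote interpreter are assumptions; adapt
    them to your environment."""
    target = f"{user}@{host}" if user else host
    # Quote each argument so paths with spaces survive the remote shell.
    remote = " ".join(shlex.quote(p) for p in ["python3", script, *(args or [])])
    return ["ssh", target, remote]

# Hypothetical usage: run the Qwen script on a GPU host.
cmd = build_remote_command("gpu-host", "qwen_inference.py", ["--image", "cat.png"])
# subprocess.run(cmd, check=True)  # uncomment to actually execute over SSH
```

The command is returned as a list (rather than a single shell string) so it can be passed to `subprocess.run` without invoking a local shell.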

Files (Growing Collection)

  • qwen_inference.py: Script for running inference tasks using the Qwen model.
  • llava_next_inference.py: Inference script for Llava Next, aimed at advanced visual understanding.
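The listed scripts are not reproduced here, but inference for models like Qwen-VL or LLaVA-NeXT typically starts by assembling a chat-style multimodal message. The sketch below shows only that input-construction step in the format accepted by Hugging Face processors' `apply_chat_template`; model and processor loading are deliberately omitted, and the file name and question are illustrative.

```python
def build_vl_messages(image_path, question):
    """Assemble a chat-style message list pairing one image with one text
    prompt, in the structure commonly expected by vision-language
    processors (e.g. for Qwen2-VL or LLaVA-NeXT)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

# Hypothetical usage:
messages = build_vl_messages("demo.jpg", "What is in this image?")
```

Keeping prompt construction separate from model loading makes the same message format reusable across the different model backends the repository targets.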

Use Case

This repository is designed for AI researchers and developers working on large vision models. It facilitates the remote deployment and inference of state-of-the-art vision and multimodal models, with secure SSH-based access.

Future Work

  • Integration of additional large vision models for comprehensive multimodal tasks.
  • Support for larger datasets and batch processing capabilities.
  • Performance benchmarking and optimization for inference tasks.
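As a starting point for the planned batch-processing support, splitting a list of inputs into fixed-size batches can be sketched as below; the image paths are placeholders.

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list of inputs
    (e.g. image paths); the final batch may be shorter."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Hypothetical usage:
batches = list(batched(["a.png", "b.png", "c.png", "d.png", "e.png"], 2))
# batches == [["a.png", "b.png"], ["c.png", "d.png"], ["e.png"]]
```

Batching inputs this way lets each SSH round trip carry several images, amortizing connection and model-invocation overhead.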