Add multimodal embeddings support #351

Open
virgildotcodes wants to merge 8 commits into laravel:0.x from virgildotcodes:gemini-multimodal-embeddings

@virgildotcodes virgildotcodes commented Apr 5, 2026

Closes #308

Summary

Laravel AI currently supports only text inputs for embeddings, although provider APIs can accept richer media inputs. This PR adds multimodal embeddings input support and handles the provider-specific request shapes needed to send those inputs correctly.

The newly supported media inputs are:

  • images
  • audio
  • documents
  • video

It also validates unsupported provider / model combinations early, preserves the original media source when mapping embeddings inputs, and avoids fetching remote media when generating embeddings cache keys.

Examples

Gemini multimodal embeddings

use Laravel\Ai\Embeddings;
use Laravel\Ai\Files\Image;

$response = Embeddings::for([
    Image::fromPath('/path/to/image.png'),
])->generate(provider: 'gemini', model: 'gemini-embedding-2');

Voyage AI image embeddings

use Laravel\Ai\Embeddings;
use Laravel\Ai\Files\Image;

$response = Embeddings::for([
    Image::fromUrl('https://example.com/image.png'),
])->dimensions(1024)->generate(
    provider: 'voyageai',
    model: 'voyage-multimodal-3',
);

Voyage AI video embeddings

use Laravel\Ai\Embeddings;
use Laravel\Ai\Files\Video;

$response = Embeddings::for([
    Video::fromPath('/path/to/video.mp4'),
])->dimensions(1024)->generate(
    provider: 'voyageai',
    model: 'voyage-multimodal-3.5',
);

Changes

  • widen embeddings inputs to accept text, images, audio, documents, and video
  • add explicit validation for unsupported provider / input combinations
  • route Gemini Embedding 2 and media embeddings requests through embedContent
  • resolve Gemini provider-backed files to file URIs before sending embedding requests
  • preserve remote, local, stored, and base64 media sources when converting embeddings inputs
  • avoid remote fetches when building cache keys for remote embeddings inputs
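
To illustrate the last point, here is a minimal sketch of a cache key that never dereferences a remote URL. It is not the code from this PR: the function name `embeddingsCacheKey`, the `source` / `value` array shape, and the `embeddings:` prefix are all hypothetical. The idea is simply that a remote input is fingerprinted by its URL string, so no HTTP request is needed to compute the key.

```php
<?php

/**
 * Hypothetical helper (not part of the package): build a cache key from
 * cheap, already-available descriptors of each input.
 *
 * Each input is assumed to be an array like:
 *   ['source' => 'text'|'url'|'base64', 'value' => string]
 */
function embeddingsCacheKey(string $provider, string $model, ?int $dimensions, array $inputs): string
{
    $fingerprints = array_map(
        // For 'url' inputs the URL string itself is hashed; nothing is fetched.
        static fn (array $input): string => $input['source'].':'.hash('sha256', $input['value']),
        $inputs
    );

    // Provider, model, and dimensions are part of the key because the same
    // input yields different vectors under different settings.
    return 'embeddings:'.hash('sha256', json_encode([$provider, $model, $dimensions, $fingerprints]));
}
```

One trade-off of this approach, assuming it resembles what the PR does: if the content behind a URL changes, the key stays the same, so a cached embedding may be stale until it expires.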

Notes

  • Gemini multimodal embeddings require gemini-embedding-2 or gemini-embedding-2-preview
  • Voyage AI supports text and image embeddings with voyage-multimodal-3 or voyage-multimodal-3.5
  • Voyage AI video embeddings require voyage-multimodal-3.5
  • Voyage AI multimodal requests must use either URL media or base64 media exclusively
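
The constraints in these notes can be sketched as an early validation step. The following is only an illustration under stated assumptions, not the PR's implementation: the `EMBEDDING_INPUT_SUPPORT` table, the function name `assertEmbeddingInputsSupported`, and the `type` / `source` array shape are hypothetical, and the sketch assumes that the Gemini models accept all four media types listed in the summary and that any unlisted provider / model pair is text-only.

```php
<?php

// Hypothetical capability table inferred from the Notes above. It is an
// illustration only, not the package's actual data structure.
const EMBEDDING_INPUT_SUPPORT = [
    'gemini' => [
        'gemini-embedding-2' => ['text', 'image', 'audio', 'document', 'video'],
        'gemini-embedding-2-preview' => ['text', 'image', 'audio', 'document', 'video'],
    ],
    'voyageai' => [
        'voyage-multimodal-3' => ['text', 'image'],
        'voyage-multimodal-3.5' => ['text', 'image', 'video'],
    ],
];

/**
 * Each input is assumed to be an array like:
 *   ['type' => 'text']  or  ['type' => 'image', 'source' => 'url'|'base64']
 */
function assertEmbeddingInputsSupported(string $provider, string $model, array $inputs): void
{
    // Assumption: any provider / model pair missing from the table is text-only.
    $supported = EMBEDDING_INPUT_SUPPORT[$provider][$model] ?? ['text'];
    $mediaSources = [];

    foreach ($inputs as $input) {
        if (! in_array($input['type'], $supported, true)) {
            throw new InvalidArgumentException(sprintf(
                'Model [%s] on provider [%s] does not support [%s] embeddings inputs.',
                $model,
                $provider,
                $input['type']
            ));
        }

        // Track which kinds of media source appear in this request.
        if ($input['type'] !== 'text') {
            $mediaSources[$input['source']] = true;
        }
    }

    // Voyage AI requests must use URL media or base64 media, never both.
    if ($provider === 'voyageai' && count($mediaSources) > 1) {
        throw new InvalidArgumentException(
            'Voyage AI multimodal requests must use either URL media or base64 media exclusively.'
        );
    }
}
```

Under these assumptions, a video input passes for `voyage-multimodal-3.5` but throws for `voyage-multimodal-3`, and the failure happens before any request is sent to the provider.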

@pushpak1300 pushpak1300 marked this pull request as draft April 24, 2026 17:08
@pushpak1300
Member

Can you fix the merge conflicts?

…ddings

# Conflicts:
#	composer.json
#	src/Files/Concerns/HasRemoteContent.php
#	src/Gateway/Prism/PrismGateway.php
#	tests/Feature/EmbeddingsFakeTest.php
#	tests/Feature/EmbeddingsIntegrationTest.php
@virgildotcodes
Author

@pushpak1300 Done. I also made some additional changes, including support for video on Voyage (a new capability) and for gemini-embedding-2 (the release version).

@virgildotcodes virgildotcodes marked this pull request as ready for review April 28, 2026 11:13
@pushpak1300
Member

Thanks @virgildotcodes. I'll take a look.

pushpak1300 and others added 2 commits May 4, 2026 20:43

Development

Successfully merging this pull request may close these issues.

Feature Request: Support multimodal embeddings (like gemini-embeddings-002)

2 participants