This project deploys the MeiGen-AI/InfiniteTalk model on Modal to provide a high-performance, scalable API for generating talking head videos from an image and audio files.
The deployment is optimized for efficient inference, leveraging:
- L40S GPUs (can be easily switched to other Modal GPU types)
- FusioniX LoRA optimization
- flash-attention
- teacache
Note that this deployment does not implement InfiniteTalk's multi-speaker capability, since that requires downloading separate model weights; it focuses on a single speaker for now. Open an issue if you would like multi-person support.
1. Clone this Repository:

   ```bash
   git clone https://github.com/Square-Zero-Labs/modal-infinitetalk
   cd modal-infinitetalk
   ```

2. Create a Modal Account: Sign up for a free account at modal.com.

3. Install Modal Client: Install the Modal client library and set up your authentication token.

   ```bash
   pip install modal
   modal setup
   ```
The application consists of a persistent web endpoint for production use and a local CLI for testing. It uses a Volume to cache the large model files, ensuring they are only downloaded once. A second Volume is used to efficiently handle the video outputs.
To deploy the web endpoint, run the following commands from your terminal:

```bash
pip install pydantic
modal deploy app.py
```

Modal will build a custom container image, download the model weights into a persistent Volume, and deploy the application. After a successful deployment, it will print a public URL for your API endpoint.
The initial deployment will take several minutes as it needs to download the large model files. Subsequent deployments will be much faster as the models are cached in the volume.
The weights are under the volume name infinitetalk-models.
Output videos are saved under the volume name infinitetalk-outputs.
For development and testing, you can use the built-in command-line interface to run inference on local files or URLs.
```bash
modal run app.py \
  --image-path "https://example.com/portrait.jpg" \
  --audio1-path "https://example.com/audio.mp3" \
  --prompt "A dog talking" \
  --output-path outputs/my_video.mp4
```

The deployed service can be called via a POST request with proxy authentication. The API accepts a JSON payload with the following fields:
- `image` (string, required): A URL to the source image or video. If a video is supplied, the output video will carry over some of the input video's motion in addition to the new lip sync to the audio.
- `audio1` (string, required): A URL to the source audio file (MP3 or WAV).
- `prompt` (string, optional): A text prompt.
The duration of the output video is determined automatically by the length of the input audio. The video frame count is calculated to match this duration while adhering to the model's 4n+1 frame constraint.
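For illustration, the 4n+1 constraint can be sketched as follows. Note that the 25 fps rate and the ceiling rounding are assumptions chosen for the example, not values read from app.py:

```python
import math

def frame_count(audio_seconds: float, fps: int = 25) -> int:
    """Smallest frame count of the form 4n + 1 that covers the audio."""
    raw = max(1, math.ceil(audio_seconds * fps))  # frames needed to cover the audio
    n = math.ceil((raw - 1) / 4)                  # round n up so that 4n + 1 >= raw
    return 4 * n + 1

print(frame_count(2.0))  # 2 s of audio at 25 fps -> 53 frames (4 * 13 + 1)
```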
The API requires proxy authentication tokens.
To create proxy auth tokens, go to your Modal workspace settings and generate a new token. Set the token ID and secret as environment variables:
```bash
export TOKEN_ID="your-token-id"
export TOKEN_SECRET="your-token-secret"
```

API Usage - Polling Pattern:
Following Modal's recommended polling pattern, we use two endpoints for long-running video generation:
- Submit Job - starts generation and returns a `call_id`
- Poll Results - check status and download the video when ready
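The polling half of this flow can also be driven from Python. The sketch below uses only the standard library; the header names and status codes mirror the curl examples that follow, while the retry interval, tries limit, and function names are assumptions for illustration:

```python
import time
import urllib.error
import urllib.request

def head_status(url: str, headers: dict) -> int:
    """Send a HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, headers=headers, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def still_processing(status: int) -> bool:
    # 202 Accepted means the job is still running; anything else ends the loop.
    return status == 202

def wait_for_video(result_head_url: str, call_id: str, token_id: str,
                   token_secret: str, interval: float = 10.0,
                   max_tries: int = 60) -> int:
    """Poll the result-head endpoint until the job leaves the 202 state."""
    headers = {"Modal-Key": token_id, "Modal-Secret": token_secret}
    url = f"{result_head_url}?call_id={call_id}"
    for _ in range(max_tries):
        status = head_status(url, headers)
        if not still_processing(status):
            return status  # 200 means the video is ready to download
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")
```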
Step 1: Submit Job
```bash
# Submit video generation job and capture call_id
CALL_ID=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  -d '{
    "image": "https://example.com/portrait.jpg",
    "audio1": "https://example.com/audio.mp3",
    "prompt": "A dog is talking"
  }' \
  "https://<username>--infinitetalk-api-model-submit.modal.run" | jq -r '.call_id')
echo "Job submitted with call_id: $CALL_ID"
```

Step 2: Poll for Results
```bash
# Check job status with a HEAD request (no body is downloaded)
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --head \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  "https://<username>--infinitetalk-api-api-result-head.modal.run?call_id=$CALL_ID")
echo "HTTP $HTTP_STATUS"
```

- `202 Accepted` - the job is still processing
- `200 OK` - the video is ready; download it in Step 3
Step 3: Retrieve Finished Video
```bash
curl -X GET \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  --output outputs/api-generated_video.mp4 \
  "https://<username>--infinitetalk-api-api-result.modal.run?call_id=$CALL_ID"
echo "Video saved to outputs/api-generated_video.mp4"
```

Replace:
- `<username>` with your actual Modal username
- `$TOKEN_ID` and `$TOKEN_SECRET` with your proxy auth token credentials
The URL format is `https://[username]--[app-name]-[class-name]-[method-name].modal.run`, where the class name is `model` or `api` and the method name is one of the methods defined in app.py.
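As a sanity check, the URL format above can be assembled like this (the username is a placeholder):

```python
def endpoint_url(username: str, app_name: str, class_name: str, method_name: str) -> str:
    """Build a Modal web endpoint URL from its components."""
    return f"https://{username}--{app_name}-{class_name}-{method_name}.modal.run"

print(endpoint_url("your-username", "infinitetalk-api", "model", "submit"))
# -> https://your-username--infinitetalk-api-model-submit.modal.run
```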
When originally added:
```bash
git subtree add --prefix infinitetalk https://github.com/MeiGen-AI/InfiniteTalk main --squash
```

If the original InfiniteTalk repository is updated and you want to incorporate those changes into this project, you can pull the updates using the following command:

```bash
git subtree pull --prefix infinitetalk https://github.com/MeiGen-AI/InfiniteTalk main --squash
```

Local modifications to be aware of when merging upstream changes:
- generate_infinitetalk.py (attention fix)