This project deploys the MeiGen-AI/InfiniteTalk model on Modal to provide a high-performance, scalable API for generating talking head videos from an image and audio files.
The deployment is optimized for efficient inference, leveraging:
- L40S GPUs (can be easily switched to other Modal GPU types)
- FusioniX LoRA optimization
- flash-attention
- teacache
Note that this deployment does not implement InfiniteTalk's multi-speaker capability, since that requires downloading separate model weights; it focuses on a single speaker for now. Open an issue if you would like multi-person support.
1. Clone this Repository:

   ```bash
   git clone https://github.com/Square-Zero-Labs/modal-infinitetalk
   cd modal-infinitetalk
   ```

2. Create a Modal Account: Sign up for a free account at modal.com.

3. Install Modal Client: Install the Modal client library and set up your authentication token.

   ```bash
   pip install modal
   modal setup
   ```
The application consists of a persistent web endpoint for production use and a local CLI for testing. It uses a Volume to cache the large model files, ensuring they are only downloaded once. A second Volume is used to efficiently handle the video outputs.
To deploy the web endpoint, run the following commands from your terminal:

```bash
pip install pydantic
modal deploy app.py
```

Modal will build a custom container image, download the model weights into a persistent Volume, and deploy the application. After a successful deployment, it will print a public URL for your API endpoint.
The initial deployment will take several minutes as it needs to download the large model files. Subsequent deployments will be much faster as the models are cached in the volume.
The weights are under the volume name infinitetalk-models.
Output videos are saved under the volume name infinitetalk-outputs.
For development and testing, you can use the built-in command-line interface to run inference on local files or URLs.
```bash
modal run app.py \
  --image-path "https://example.com/portrait.jpg" \
  --audio1-path "https://example.com/audio.mp3" \
  --prompt "A dog talking" \
  --output-path outputs/my_video.mp4
```

The deployed service can be called via a POST request with proxy authentication. The API accepts a JSON payload with the following fields:
- `image` (string, required): A URL to the source image or video. If a video is supplied, the output video will carry over some of the input video's motion in addition to the new lip sync to the audio.
- `audio1` (string, required): A URL to the source audio file (MP3 or WAV).
- `prompt` (string, optional): A text prompt.
The duration of the output video is determined automatically by the length of the input audio. The video frame count is calculated to match this duration while adhering to the model's 4n+1 frame constraint.
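For illustration, the 4n+1 constraint can be sketched as follows. Note that the 25 fps rate and the ceiling rounding are assumptions chosen for the example, not values read from app.py:

```python
import math

def frame_count(audio_seconds: float, fps: int = 25) -> int:
    """Smallest frame count of the form 4n + 1 that covers the audio."""
    raw = max(1, math.ceil(audio_seconds * fps))  # frames needed to cover the audio
    n = math.ceil((raw - 1) / 4)                  # round n up so that 4n + 1 >= raw
    return 4 * n + 1

print(frame_count(2.0))  # 2 s of audio at 25 fps -> 53 frames (4 * 13 + 1)
```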
The API requires proxy authentication tokens.
To create proxy auth tokens, go to your Modal workspace settings and generate a new token. Set the token ID and secret as environment variables:
```bash
export TOKEN_ID="your-token-id"
export TOKEN_SECRET="your-token-secret"
```

API Usage - Polling Pattern:
Following Modal's recommended polling pattern, we use two endpoints for long-running video generation:
- Submit Job - starts generation and returns a `call_id`
- Poll Results - check status and download the video when ready
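The polling half of this flow can also be driven from Python. The sketch below uses only the standard library; the header names and status codes mirror the curl examples that follow, while the retry interval, tries limit, and function names are assumptions for illustration:

```python
import time
import urllib.error
import urllib.request

def head_status(url: str, headers: dict) -> int:
    """Send a HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, headers=headers, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def still_processing(status: int) -> bool:
    # 202 Accepted means the job is still running; anything else ends the loop.
    return status == 202

def wait_for_video(result_head_url: str, call_id: str, token_id: str,
                   token_secret: str, interval: float = 10.0,
                   max_tries: int = 60) -> int:
    """Poll the result-head endpoint until the job leaves the 202 state."""
    headers = {"Modal-Key": token_id, "Modal-Secret": token_secret}
    url = f"{result_head_url}?call_id={call_id}"
    for _ in range(max_tries):
        status = head_status(url, headers)
        if not still_processing(status):
            return status  # 200 means the video is ready to download
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")
```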
Step 1: Submit Job
```bash
# Submit video generation job and capture call_id
CALL_ID=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  -d '{
    "image": "https://example.com/portrait.jpg",
    "audio1": "https://example.com/audio.mp3",
    "prompt": "A dog is talking"
  }' \
  "https://<username>--infinitetalk-api-model-submit.modal.run" | jq -r '.call_id')
echo "Job submitted with call_id: $CALL_ID"
```

Step 2: Poll for Results
```bash
# Check job status with a HEAD request (no body is downloaded)
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --head \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  "https://<username>--infinitetalk-api-api-result-head.modal.run?call_id=$CALL_ID")
echo "HTTP $HTTP_STATUS"
```

- `202 Accepted` - the job is still processing
- `200 OK` - the video is ready; download it in Step 3
Step 3: Retrieve Finished Video
```bash
curl -X GET \
  -H "Modal-Key: $TOKEN_ID" \
  -H "Modal-Secret: $TOKEN_SECRET" \
  --output outputs/api-generated_video.mp4 \
  "https://<username>--infinitetalk-api-api-result.modal.run?call_id=$CALL_ID"
echo "Video saved to outputs/api-generated_video.mp4"
```

Replace:
- `<username>` with your actual Modal username
- `$TOKEN_ID` and `$TOKEN_SECRET` with your proxy auth token credentials
The URL format is `https://[username]--[app-name]-[class-name]-[method-name].modal.run`, where the class name is `model` or `api` and the method name is one of the methods defined in app.py.
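As a sanity check, the URL format above can be assembled like this (the username is a placeholder):

```python
def endpoint_url(username: str, app_name: str, class_name: str, method_name: str) -> str:
    """Build a Modal web endpoint URL from its components."""
    return f"https://{username}--{app_name}-{class_name}-{method_name}.modal.run"

print(endpoint_url("your-username", "infinitetalk-api", "model", "submit"))
# -> https://your-username--infinitetalk-api-model-submit.modal.run
```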
When originally added:
```bash
git subtree add --prefix infinitetalk https://github.com/MeiGen-AI/InfiniteTalk main --squash
```

If the original InfiniteTalk repository is updated and you want to incorporate those changes into this project, you can pull the updates using the following command:

```bash
git subtree pull --prefix infinitetalk https://github.com/MeiGen-AI/InfiniteTalk main --squash
```

Local modifications to be aware of when merging upstream changes:
- generate_infinitetalk.py (attention fix)