
ACE-Step 1.5 API decode stall on RTX 3060 Laptop GPU 6GB during 10s text-to-music benchmark #415

@OGKMaster

Status: ready-to-paste upstream/config issue package
Prepared: 2026-05-25
Jazzy scope: evidence package only; no new benchmark or generation run was performed for this report.

Summary

ACE-Step 1.5 local REST API repeatedly stalls during VAE/audio decode on an NVIDIA GeForce RTX 3060 Laptop GPU with 6144 MiB VRAM while running a minimal 10-second text-to-music benchmark with the main acestep-v15-turbo model.

Three RTX 3060 attempts failed at the same phase:

| Attempt | Jazzy job ID | ACE task ID | Runtime profile | Result |
| --- | --- | --- | --- | --- |
| 1 | 1d02bd6af8774400bc29918000bb7783 | 5db18dc2-9d69-4d9a-b386-85ddce901f09 | Default safe runtime | Decode stall at progress 0.8; no artifact. |
| 2 | a76ae914578245db9cd4f2f4760ddb13 | 1b527f7f-797f-4467-8260-2f54563666ee | Default safe runtime / low-VRAM settings in Jazzy | Decode stall at progress 0.8; no artifact. |
| 3 | 407a96fcddf8434fb05c3d1a970591f8 | 3160e7d6-b044-4a4b-826b-77b09d299094 | LowMemoryTier3 runtime | failed_decode_stall; no artifact. |

Environment

  • OS: Windows.
  • GPU: NVIDIA GeForce RTX 3060 Laptop GPU.
  • GPU UUID: GPU-0a315a59-ccad-3b0e-c82a-912951db4d9b.
  • VRAM: 6144 MiB.
  • NVIDIA driver: 591.86.
  • CUDA version reported by nvidia-smi (the highest version the driver supports, not an installed runtime): 13.1.
  • Torch in ACE-Step venv: 2.7.1+cu128.
  • Torch CUDA runtime: 12.8.
  • Torch CUDA available: true.
  • Torch device count: 1.
  • Torch device 0: NVIDIA GeForce RTX 3060 Laptop GPU.
  • ACE-Step source path in project: tools/ACE-Step-1.5.
  • ACE-Step source remote/commit recorded by project docs: https://github.com/ACE-Step/ACE-Step-1.5.git, commit dce621408bee8c31b4fcf4811682eb9359e1bc94.
  • ACE-Step API service: local ACE-Step REST API, ACE-Step API version 1.0.
  • Model: acestep-v15-turbo.
  • LM inventory visible: acestep-5Hz-lm-1.7B, not initialized in Safe Mode.
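The Torch lines above can be re-collected from inside the ACE-Step venv with a short script. This is a minimal sketch that uses only standard torch calls (torch.cuda.mem_get_info returns free and total bytes for a device); the output keys are our own labels, not an ACE-Step or Jazzy format.

```python
import json


def collect_torch_env() -> dict:
    """Collect the Torch/CUDA facts listed in the Environment section."""
    try:
        import torch
    except ImportError:
        # Running outside the ACE-Step venv: report that instead of failing.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,        # e.g. 2.7.1+cu128
        "torch_cuda_runtime": torch.version.cuda,  # e.g. 12.8; None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        "devices": [],
    }
    for index in range(info["device_count"]):
        free_bytes, total_bytes = torch.cuda.mem_get_info(index)
        info["devices"].append({
            "index": index,
            "name": torch.cuda.get_device_name(index),
            "total_mib": total_bytes // (1024 * 1024),
            "free_mib": free_bytes // (1024 * 1024),
        })
    return info


if __name__ == "__main__":
    print(json.dumps(collect_torch_env(), indent=2))
```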

Reproduction

Observed model inventory:

  • /v1/model_inventory sees acestep-v15-turbo as the default model.
  • /v1/model_inventory sees acestep-5Hz-lm-1.7B as the LM model.
  • /health reports models_initialized=false and llm_initialized=false before generation.
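A pre-generation check of the flags quoted above might look like the sketch below. It assumes models_initialized and llm_initialized are top-level keys of the /health JSON; the real response may nest them differently, and the base URL of the local server is not given in this report, so it is left to the caller.

```python
import json
import urllib.request


def get_json(base_url: str, path: str, timeout: float = 10.0) -> dict:
    """GET a JSON document from the local ACE-Step API (base_url supplied by the caller)."""
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def preflight_notes(health: dict) -> list[str]:
    """Summarize the /health initialization flags quoted in this report.

    Assumption: both flags are top-level keys. Missing keys produce no note.
    """
    notes = []
    if health.get("models_initialized") is False:
        notes.append("models_initialized=false: models not loaded before generation")
    if health.get("llm_initialized") is False:
        notes.append("llm_initialized=false: LM not loaded (consistent with ACESTEP_INIT_LLM=false)")
    return notes
```

Usage would be preflight_notes(get_json(base_url, "/health")), with base_url pointing at wherever the local server listens.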

REST workflow used:

  1. POST /release_task
  2. Poll POST /query_result
  3. Download via GET /v1/audio only if /query_result returns an output path
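The three-step workflow can be sketched as a JSON POST helper plus a poll loop that gives up when progress stops changing. This is our own illustration, not Jazzy's or ACE-Step's code: the 300-second default mirrors the stall timeout reported for attempt 3, and because the request/response fields of /release_task and /query_result are not documented in this report, the loop takes a query callable that is assumed to return a dict with hypothetical "progress", "stage" and "output_path" keys.

```python
import json
import time
import urllib.request
from typing import Callable


class DecodeStall(RuntimeError):
    """Raised when task progress stops changing for longer than stall_timeout."""


def post_json(url: str, body: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON reply (e.g. /release_task, /query_result)."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_done(
    query: Callable[[], dict],
    stall_timeout: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll query() until an output path appears or progress stalls."""
    last_progress = None
    last_change = clock()
    while True:
        snapshot = query()
        if snapshot.get("output_path"):
            return snapshot  # only now is a GET /v1/audio download worthwhile
        now = clock()
        if snapshot.get("progress") != last_progress:
            # Progress moved: restart the stall window.
            last_progress = snapshot.get("progress")
            last_change = now
        elif now - last_change > stall_timeout:
            raise DecodeStall(
                f"progress stuck at {last_progress} ({snapshot.get('stage')}) "
                f"for {now - last_change:.1f} s"
            )
        sleep(interval)
```

The clock and sleep parameters exist so the stall logic can be exercised without a server. Note that raising DecodeStall only stops the client; per the source findings, the server-side task still needs a server restart to clean up.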

Payload summary:

{
  "prompt": "short jazzy lo-fi piano groove, warm rhodes, upright bass, soft drums",
  "lyrics": "",
  "task_type": "text2music",
  "audio_duration": 10,
  "batch_size": 1,
  "model": "acestep-v15-turbo",
  "audio_format": "wav",
  "seed": 424242,
  "use_random_seed": false,
  "thinking": false,
  "use_format": false,
  "vocal_language": "en",
  "inference_steps": 4
}

No XL model, LoRA, reference audio, cover, remix, repaint, batch generation, Demucs, Whisper, or additional model download was used.

Failure

All attempts reached ACE-Step audio decode and then stalled or timed out:

  • ACE status: 0.
  • Progress: 0.8.
  • Stage: "Decoding audio...".
  • Warning: [tiled_decode] Reduced overlap from 64 to 32 for chunk_size=128.
  • Output path: empty.
  • Output artifact: none.
  • /v1/audio was not useful because no output path was returned.
  • ffprobe/QC could not run because no audio artifact existed.
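For a run that does produce an artifact, the QC step could be as small as the sketch below, which reads the container duration with standard ffprobe options and returns None when QC cannot run (no artifact, as in every attempt here, or no ffprobe on PATH). The function name is hypothetical, not part of Jazzy.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def probe_duration_seconds(path: str) -> Optional[float]:
    """Return the duration reported by ffprobe, or None if QC cannot run."""
    if not path or not Path(path).is_file():
        return None  # no artifact: nothing to check
    ffprobe = shutil.which("ffprobe")
    if ffprobe is None:
        return None  # ffprobe not installed / not on PATH
    result = subprocess.run(
        [ffprobe, "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())
```

A caller would then compare the result against the requested audio_duration of 10 seconds, with whatever tolerance the benchmark considers acceptable.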

Attempt 3 result details:

  • Jazzy classification: failed_decode_stall.
  • Error category: ace_step_decode_stall_timeout.
  • Runtime: 317.737 seconds.
  • Stall window: 302.536 seconds against a 300-second stall timeout.
  • VRAM before: 720 MiB used / 5275 MiB free.
  • Peak monitored VRAM: 5496 MiB used / 499 MiB free.
  • VRAM after cleanup: 720 MiB used / 5275 MiB free.
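The attempt 3 VRAM figures imply about 0.5 GB of headroom at peak. The arithmetic is sketched below, assuming snapshots in the format produced by nvidia-smi --query-gpu=memory.used,memory.free --format=csv,noheader,nounits (one "used, free" line in MiB); the report keys are our own labels.

```python
def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one 'used, free' line (MiB) from an nvidia-smi CSV query."""
    used, free = (int(field.strip()) for field in line.split(","))
    return used, free


def headroom_report(before: str, peak: str) -> dict:
    used0, _free0 = parse_memory_line(before)
    used1, free1 = parse_memory_line(peak)
    visible = used1 + free1  # used + free, below the 6144 MiB nominal size
    return {
        "visible_mib": visible,
        "generation_delta_mib": used1 - used0,
        "peak_utilization_pct": round(100.0 * used1 / visible, 1),
    }


print(headroom_report("720, 5275", "5496, 499"))
# {'visible_mib': 5995, 'generation_delta_mib': 4776, 'peak_utilization_pct': 91.7}
```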

Runtime Profiles Tested

Default safe runtime

Used for attempts 1 and 2.

  • Safe startup before generation.
  • LM startup initialization disabled.
  • Jazzy Low VRAM Mode enabled.
  • Jazzy Offload Mode enabled.
  • No unsupported generation payload fields were sent.

LowMemoryTier3 runtime

Used for attempt 3 through the official Jazzy runtime profile.

Runtime settings:

ACESTEP_VAE_ON_CPU=1
ACESTEP_OFFLOAD_TO_CPU=true
ACESTEP_OFFLOAD_DIT_TO_CPU=true
ACESTEP_INIT_LLM=false
ACESTEP_VAE_DECODE_CHUNK_SIZE=auto/source-default
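The same settings can be expressed as an environment mapping to hand to the server process. This is a sketch under two assumptions: that "auto/source-default" means the chunk-size variable is simply left unset, and that the caller launches the server through the project's official local scripts (the launch command is not given here, so it is omitted). The optional decode_chunk_size argument is only a convenience for trying a value maintainers might recommend.

```python
import os
from typing import Mapping, Optional

# LowMemoryTier3 values as listed in this report.
LOW_MEMORY_TIER3 = {
    "ACESTEP_VAE_ON_CPU": "1",
    "ACESTEP_OFFLOAD_TO_CPU": "true",
    "ACESTEP_OFFLOAD_DIT_TO_CPU": "true",
    "ACESTEP_INIT_LLM": "false",
}


def server_env(decode_chunk_size: Optional[int] = None,
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the base environment with the LowMemoryTier3 settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(LOW_MEMORY_TIER3)
    if decode_chunk_size is None:
        # Assumed meaning of "auto/source-default": leave the variable unset.
        env.pop("ACESTEP_VAE_DECODE_CHUNK_SIZE", None)
    else:
        env["ACESTEP_VAE_DECODE_CHUNK_SIZE"] = str(decode_chunk_size)
    return env
```

The returned dict can be passed as env= to subprocess.Popen when starting the server script.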

Quantization status:

  • Unconfirmed in the inspected local API startup/init path.
  • Not claimed to be active.

Attempt 3 still failed at the same decode phase.

Source Findings

Local source inspection found:

  • Progress 0.8 and Decoding audio... are set immediately before VAE/audio decode.
  • The tiled warning comes from decode chunking when the effective overlap must be reduced for chunk_size=128.
  • use_tiled_decode appears to be parsed in the request path, but in the inspected API generation path it does not appear to be propagated through to decode.
  • Decode chunk size is environment/runtime configuration through ACESTEP_VAE_DECODE_CHUNK_SIZE, not a REST payload field.
  • CPU VAE is controlled by ACESTEP_VAE_ON_CPU=1.
  • CPU/model offload is controlled by ACESTEP_OFFLOAD_TO_CPU=true.
  • DiT offload is controlled by ACESTEP_OFFLOAD_DIT_TO_CPU=true.
  • LM-off is controlled by ACESTEP_INIT_LLM=false, and the generation payload also used thinking=false.
  • Quantization appears available in the Gradio/handler path, but support through the local REST API server startup/init path was not confirmed.
  • No documented per-generation cancellation/timeout endpoint was found in local OpenAPI; cleanup required stopping/restarting the ACE-Step server through official local scripts.
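To frame question 7, here is one overlap rule that is consistent with the logged warning (64 reduced to 32 for chunk_size=128). It is a hypothetical reconstruction, not confirmed against the ACE-Step source: the overlap is halved until the two overlapped edges no longer cover the whole chunk, so each chunk still advances by a positive stride of chunk_size - 2 * overlap latent frames.

```python
import math


def effective_overlap(chunk_size: int, overlap: int) -> int:
    """Hypothetical rule: halve overlap until 2 * overlap < chunk_size."""
    while overlap > 0 and 2 * overlap >= chunk_size:
        overlap //= 2
    return overlap


def decode_chunks(latent_len: int, chunk_size: int, overlap: int) -> int:
    """Chunk count if each chunk advances by chunk_size - 2 * overlap frames."""
    overlap = effective_overlap(chunk_size, overlap)
    stride = chunk_size - 2 * overlap  # always > 0 after the reduction above
    return math.ceil(latent_len / stride)


print(effective_overlap(128, 64))  # 32, matching the observed warning
```

Under this rule, an unreduced overlap of 64 would leave a zero stride at chunk_size=128, which is why some reduction is needed; whether the real decoder behaves this way is exactly what question 7 asks.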

Questions For ACE-Step Maintainers

  1. Is 6 GB CUDA VRAM expected to work for a 10-second text-to-music generation with acestep-v15-turbo through the API server?
  2. What startup command or environment configuration is recommended for the REST API server on 6 GB GPUs?
  3. Should CPU VAE plus CPU/DiT offload be enough for this 10-second workload?
  4. Is there a recommended value for ACESTEP_VAE_DECODE_CHUNK_SIZE on 6 GB GPUs?
  5. Is INT8 or another quantization mode supported through the REST API server path? If yes, how should it be enabled?
  6. Is there a recommended no-LM or LM-off mode for low VRAM API use beyond ACESTEP_INIT_LLM=false and thinking=false?
  7. Is the tiled decode warning expected under normal low-VRAM operation, or does it indicate a bug or unsupported decode configuration?
  8. Is there a known Windows or RTX 3060 Laptop GPU issue with VAE/audio decode?
  9. Is there a cancellation or timeout endpoint for stuck generation tasks?
  10. Are there API server flags that differ from the Gradio CLI flags for low-memory operation?

Attachments And Evidence List

Project-relative evidence paths:

  • Attempt 2 diagnosis folder: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/
  • Attempt 3 diagnosis folder: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/
  • Attempt 3 compact result: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/benchmark_result_summary_compact.json
  • Attempt 3 ACE query result: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/ace_query_result_before_cleanup.json
  • Attempt 3 VRAM summary: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/vram_monitor_summary.json
  • Attempt 3 runtime registry snapshots: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/runtime_registry_before.json, runtime_registry_before_cleanup.json, runtime_registry_after_cleanup.json
  • Attempt 3 backend logs: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/backend_out_before_cleanup.log
  • Attempt 3 ACE logs: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/acestep_server_before_cleanup.log
  • Attempt 3 nvidia-smi snapshots: nvidia_smi_before.csv, nvidia_smi_before_cleanup.csv, nvidia_smi_after_cleanup.csv
  • Attempt 2 result summary: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/benchmark_result_summary.json
  • Attempt 2 ACE query result: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/ace_query_result_before_restart.json
  • Attempt 2 VRAM summary: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/vram_monitor_summary.json
  • Source investigation: ACE_STEP_LOW_MEMORY_SOURCE_INVESTIGATION.md
  • Decode diagnosis: ACE_STEP_DECODE_STALL_DIAGNOSIS.md
  • Benchmark review: ACE_STEP_BENCHMARK_REVIEW.md
  • API capability map: ACE_STEP_API_CAPABILITY_MAP.md

No secrets are included in this package.
