ACE-Step 1.5 API decode stall on RTX 3060 Laptop GPU 6GB during 10s text-to-music benchmark
Status: ready-to-paste upstream/config issue package
Prepared: 2026-05-25
Jazzy scope: evidence package only, no new benchmark or generation run
Summary
ACE-Step 1.5 local REST API repeatedly stalls during VAE/audio decode on an NVIDIA GeForce RTX 3060 Laptop GPU with 6144 MiB VRAM while running a minimal 10-second text-to-music benchmark with the main acestep-v15-turbo model.
Three RTX 3060 attempts failed at the same phase:
| Attempt | Jazzy job ID | ACE task ID | Runtime profile | Result |
| --- | --- | --- | --- | --- |
| 1 | 1d02bd6af8774400bc29918000bb7783 | 5db18dc2-9d69-4d9a-b386-85ddce901f09 | default safe runtime | Decode stall at progress 0.8; no artifact. |
| 2 | a76ae914578245db9cd4f2f4760ddb13 | 1b527f7f-797f-4467-8260-2f54563666ee | default safe runtime / low-VRAM settings in Jazzy | Decode stall at progress 0.8; no artifact. |
| 3 | 407a96fcddf8434fb05c3d1a970591f8 | 3160e7d6-b044-4a4b-826b-77b09d299094 | LowMemoryTier3 runtime | failed_decode_stall; no artifact. |
Environment
- OS: Windows.
- GPU: NVIDIA GeForce RTX 3060 Laptop GPU.
- GPU UUID: GPU-0a315a59-ccad-3b0e-c82a-912951db4d9b.
- VRAM: 6144 MiB.
- NVIDIA driver: 591.86.
- CUDA version reported by nvidia-smi: 13.1.
- Torch in ACE-Step venv: 2.7.1+cu128.
- Torch CUDA runtime: 12.8.
- Torch CUDA available: true.
- Torch device count: 1.
- Torch device 0: NVIDIA GeForce RTX 3060 Laptop GPU.
- ACE-Step source path in project: tools/ACE-Step-1.5.
- ACE-Step source remote/commit recorded by project docs: https://github.com/ACE-Step/ACE-Step-1.5.git, commit dce621408bee8c31b4fcf4811682eb9359e1bc94.
- ACE-Step API service: local ACE-Step REST API, ACE-Step API version 1.0.
- Model: acestep-v15-turbo.
- LM inventory visible: acestep-5Hz-lm-1.7B, not initialized in Safe Mode.
Reproduction
Observed model inventory:
- /v1/model_inventory sees acestep-v15-turbo as the default model.
- /v1/model_inventory sees acestep-5Hz-lm-1.7B as the LM model.
- /health reports models_initialized=false and llm_initialized=false before generation.
REST workflow used:
- Submit the task with POST /release_task.
- Poll POST /query_result.
- Download from GET /v1/audio only if an output path is returned.
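The polling step of this workflow, together with the stall check applied to it, can be sketched as follows. This is a minimal illustration and not the Jazzy implementation: the function name poll_until_done, the injected query callable, and the response field names progress and output_path are assumptions made for the example. The HTTP call to POST /query_result is deliberately left to the caller, because its request and response shape is not reproduced in this package.

```python
import time
from typing import Callable


def poll_until_done(
    query: Callable[[], dict],
    stall_timeout_s: float = 300.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `query` until an output path appears or progress stops changing.

    `query` is a caller-supplied function wrapping POST /query_result.
    The keys "progress" and "output_path" are hypothetical field names.
    """
    last_progress = None
    last_change = clock()
    while True:
        result = query()
        if result.get("output_path"):
            # Only now is a download from GET /v1/audio worth attempting.
            return {"state": "done", **result}
        now = clock()
        progress = result.get("progress")
        if progress != last_progress:
            # Progress moved, so restart the stall window.
            last_progress, last_change = progress, now
        elif now - last_change >= stall_timeout_s:
            return {"state": "stalled", "stalled_for_s": now - last_change, **result}
        sleep(interval_s)
```

Injecting clock and sleep keeps the loop testable without waiting in real time. With a query that always reports progress 0.8 and an empty output path, the function returns the stalled state once the window reaches the configured timeout.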
Payload summary:
{
  "prompt": "short jazzy lo-fi piano groove, warm rhodes, upright bass, soft drums",
  "lyrics": "",
  "task_type": "text2music",
  "audio_duration": 10,
  "batch_size": 1,
  "model": "acestep-v15-turbo",
  "audio_format": "wav",
  "seed": 424242,
  "use_random_seed": false,
  "thinking": false,
  "use_format": false,
  "vocal_language": "en",
  "inference_steps": 4
}
No XL model, LoRA, reference audio, cover, remix, repaint, batch generation, Demucs, Whisper, or additional model download was used.
Failure
All attempts reached ACE-Step audio decode and then stalled or timed out:
- ACE status: 0.
- Progress: 0.8.
- Stage: Decoding audio...
- Warning: [tiled_decode] Reduced overlap from 64 to 32 for chunk_size=128.
- Output path: empty.
- Output artifact: none.
- /v1/audio was not useful because no output path was returned.
- ffprobe/QC could not run because no audio artifact existed.
Attempt 3 result details:
- Jazzy classification: failed_decode_stall.
- Error category: ace_step_decode_stall_timeout.
- Runtime: 317.737 seconds.
- Stall window: 302.536 seconds against a 300 second stall timeout.
- VRAM before: 720 MiB used / 5275 MiB free.
- Peak monitored VRAM: 5496 MiB used / 499 MiB free.
- VRAM after cleanup: 720 MiB used / 5275 MiB free.
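For readers who want the derived figures, the attempt 3 numbers above can be combined as in the short calculation below. Only the listed values are used; the variable names are illustrative, and nothing here was measured separately.

```python
# Values copied from the attempt 3 result details.
stall_window_s = 302.536
stall_timeout_s = 300.0
used_before_mib, free_before_mib = 720, 5275
used_peak_mib, free_peak_mib = 5496, 499

# How far the stall window ran past the configured timeout.
overrun_s = round(stall_window_s - stall_timeout_s, 3)

# Growth in used VRAM between the idle baseline and the monitored peak.
peak_growth_mib = used_peak_mib - used_before_mib

# Used plus free as seen by the monitor, at baseline and at peak.
monitored_total_before_mib = used_before_mib + free_before_mib
monitored_total_peak_mib = used_peak_mib + free_peak_mib

# Share of monitored VRAM still free at the peak.
free_fraction_at_peak = free_peak_mib / monitored_total_peak_mib

print(f"stall overrun: {overrun_s} s")
print(f"peak growth: {peak_growth_mib} MiB")
print(f"monitored total: {monitored_total_peak_mib} MiB")
print(f"free at peak: {free_fraction_at_peak:.1%}")
```

The monitor's used and free values sum to 5995 MiB both at baseline and at peak, used VRAM grew by 4776 MiB, and roughly 8% of monitored VRAM remained free at the peak. Peak usage was therefore close to the limit but did not reach it in the monitored samples.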
Runtime Profiles Tested
Default safe runtime
Used for attempts 1 and 2.
- Safe startup before generation.
- LM startup initialization disabled.
- Jazzy Low VRAM Mode enabled.
- Jazzy Offload Mode enabled.
- No unsupported generation payload fields were sent.
LowMemoryTier3 runtime
Used for attempt 3 through the official Jazzy runtime profile.
Runtime settings:
ACESTEP_VAE_ON_CPU=1
ACESTEP_OFFLOAD_TO_CPU=true
ACESTEP_OFFLOAD_DIT_TO_CPU=true
ACESTEP_INIT_LLM=false
ACESTEP_VAE_DECODE_CHUNK_SIZE=auto/source-default
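As an illustration of how these settings could be applied to a server process, the sketch below builds a process environment from them. The helper name build_low_memory_env is hypothetical, and treating "auto/source-default" as "leave the variable unset" is an assumption about how the profile was applied. The launch command itself is not part of this package and is not shown.

```python
import os
from typing import Mapping, Optional

# Settings copied from the LowMemoryTier3 runtime profile.
LOW_MEMORY_TIER3 = {
    "ACESTEP_VAE_ON_CPU": "1",
    "ACESTEP_OFFLOAD_TO_CPU": "true",
    "ACESTEP_OFFLOAD_DIT_TO_CPU": "true",
    "ACESTEP_INIT_LLM": "false",
}


def build_low_memory_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with the profile applied."""
    env = dict(os.environ if base is None else base)
    env.update(LOW_MEMORY_TIER3)
    # Assumption: "auto/source-default" means the variable is left unset,
    # so any inherited override is dropped.
    env.pop("ACESTEP_VAE_DECODE_CHUNK_SIZE", None)
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.Popen when starting the API server with whatever launch command the local scripts use.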
Quantization status:
- Unconfirmed in the inspected local API startup/init path.
- Not claimed active.
Attempt 3 still failed at the same decode phase.
Source Findings
Local source inspection found:
- Progress 0.8 and Decoding audio... are set immediately before VAE/audio decode.
- The tiled warning comes from decode chunking when the effective overlap must be reduced for chunk_size=128.
- use_tiled_decode appears parsed in the request path, but it was not effectively propagated in the inspected API generation path.
- Decode chunk size is environment/runtime configuration through ACESTEP_VAE_DECODE_CHUNK_SIZE, not a REST payload field.
- CPU VAE is controlled by ACESTEP_VAE_ON_CPU=1.
- CPU/model offload is controlled by ACESTEP_OFFLOAD_TO_CPU=true.
- DiT offload is controlled by ACESTEP_OFFLOAD_DIT_TO_CPU=true.
- LM-off is controlled by ACESTEP_INIT_LLM=false, and the generation payload also used thinking=false.
- Quantization appears available in the Gradio/handler path, but support through the local REST API server startup/init path was not confirmed.
- No documented per-generation cancellation/timeout endpoint was found in local OpenAPI; cleanup required stopping/restarting the ACE-Step server through official local scripts.
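One rule that would reproduce the logged overlap reduction (64 to 32 for chunk_size=128) is sketched below. It is an assumption made for illustration, not the confirmed ACE-Step logic: it supposes that each tile trims the overlap from both sides, so the usable stride is chunk_size minus twice the overlap, and that the overlap is halved until that stride is positive. The function name effective_overlap is hypothetical.

```python
def effective_overlap(chunk_size: int, overlap: int) -> int:
    """Halve `overlap` until the tile stride (chunk_size - 2 * overlap) is positive.

    Hypothetical reconstruction of a tiled-decode guard; the real rule in
    the ACE-Step source may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    reduced = overlap
    while reduced > 0 and chunk_size - 2 * reduced <= 0:
        reduced //= 2
    return reduced
```

Under this assumed rule, chunk_size=128 with overlap 64 gives a stride of 0, so the overlap drops to 32 and the stride becomes 64. If the real logic is similar, the warning would be a routine adjustment rather than an error, which is what the tiled decode question below asks the maintainers to confirm.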
Questions For ACE-Step Maintainers
- Is 6 GB CUDA VRAM expected to work for a 10-second text-to-music generation with acestep-v15-turbo through the API server?
- What startup command or environment configuration is recommended for the REST API server on 6 GB GPUs?
- Should CPU VAE plus CPU/DiT offload be enough for this 10-second workload?
- Is there a recommended value for ACESTEP_VAE_DECODE_CHUNK_SIZE on 6 GB GPUs?
- Is INT8 or another quantization mode supported through the REST API server path? If yes, how should it be enabled?
- Is there a recommended no-LM or LM-off mode for low VRAM API use beyond ACESTEP_INIT_LLM=false and thinking=false?
- Is the tiled decode warning expected under normal low-VRAM operation, or does it indicate a bug or unsupported decode configuration?
- Is there a known Windows or RTX 3060 Laptop GPU issue with VAE/audio decode?
- Is there a cancellation or timeout endpoint for stuck generation tasks?
- Are there API server flags that differ from the Gradio CLI flags for low-memory operation?
Attachments And Evidence List
Project-relative evidence paths:
- Attempt 2 diagnosis folder: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/
- Attempt 3 diagnosis folder: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/
- Attempt 3 compact result: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/benchmark_result_summary_compact.json
- Attempt 3 ACE query result: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/ace_query_result_before_cleanup.json
- Attempt 3 VRAM summary: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/vram_monitor_summary.json
- Attempt 3 runtime registry snapshots: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/runtime_registry_before.json, runtime_registry_before_cleanup.json, runtime_registry_after_cleanup.json
- Attempt 3 backend logs: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/backend_out_before_cleanup.log
- Attempt 3 ACE logs: backend/logs/diagnosis/tier3_rtx3060_lowmem_retry_20260525_061424/acestep_server_before_cleanup.log
- Attempt 3 nvidia-smi snapshots: nvidia_smi_before.csv, nvidia_smi_before_cleanup.csv, nvidia_smi_after_cleanup.csv
- Attempt 2 result summary: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/benchmark_result_summary.json
- Attempt 2 ACE query result: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/ace_query_result_before_restart.json
- Attempt 2 VRAM summary: backend/logs/diagnosis/tier3_rtx3060_retry_20260524_141345/vram_monitor_summary.json
- Source investigation: ACE_STEP_LOW_MEMORY_SOURCE_INVESTIGATION.md
- Decode diagnosis: ACE_STEP_DECODE_STALL_DIAGNOSIS.md
- Benchmark review: ACE_STEP_BENCHMARK_REVIEW.md
- API capability map: ACE_STEP_API_CAPABILITY_MAP.md
No secrets are included in this package.