Hi, thank you for your great work! I am very interested in it.
From inference/style_transfer.py, I see that I need to construct the following inputs for the "Zero-Shot Style Transfer" task:
- prompt audio path
- prompt ph, note, note_dur, note_type
- target ph, note, note_dur, note_type
I think the "Speech-to-Singing Style Transfer" task only requires the following inputs:
- prompt audio path
- target ph, note, note_dur, note_type

Is my understanding correct? If the prompt is a speech WAV file, I assume there are no prompt ph, note, note_dur, or note_type annotations. How should I modify inference/style_transfer.py for the "Speech-to-Singing Style Transfer" task?
Both the "Zero-Shot Style Transfer" and "Speech-to-Singing Style Transfer" tasks are shown at https://tcsinger.github.io/#parallel-style-control.
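To make the difference concrete, here is a small sketch of the two input sets as I understand them. The task keys and field names (`prompt_ph`, `target_note_dur`, etc.) are my own hypothetical labels mirroring the lists above, not confirmed names from inference/style_transfer.py.

```python
# Hypothetical sketch: task keys and field names mirror the lists in this
# question; they are NOT confirmed keys of inference/style_transfer.py.

TARGET_FIELDS = ("target_ph", "target_note", "target_note_dur", "target_note_type")
PROMPT_SCORE_FIELDS = ("prompt_ph", "prompt_note", "prompt_note_dur", "prompt_note_type")

REQUIRED_INPUTS = {
    # Singing prompt: the prompt audio comes with its own ph/note annotations.
    "zero_shot_style_transfer": ("prompt_audio_path",) + PROMPT_SCORE_FIELDS + TARGET_FIELDS,
    # Speech prompt (e.g. a speech WAV): assumed to have no ph/note annotations.
    "speech_to_singing": ("prompt_audio_path",) + TARGET_FIELDS,
}


def missing_inputs(task: str, inputs: dict) -> list:
    """Return the required fields for `task` that are absent from `inputs`."""
    if task not in REQUIRED_INPUTS:
        raise ValueError(f"unknown task: {task!r}")
    return [name for name in REQUIRED_INPUTS[task] if name not in inputs]


if __name__ == "__main__":
    # Placeholder item: a speech prompt plus target score fields (values left as None).
    item = {"prompt_audio_path": "speech.wav", **dict.fromkeys(TARGET_FIELDS)}
    print(missing_inputs("speech_to_singing", item))
    print(missing_inputs("zero_shot_style_transfer", item))
```

With a speech prompt, the only gap between the two tasks would be the four prompt score fields, which is exactly the part I am unsure how to handle in the script.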

