Implementation of content embedding injection: Concatenation vs. Gated Addition #7

@South-Twilight

Context:
I am reviewing the forward pass of the Flow-based Custom Transformer for the Singing Voice Synthesis (SVS) task. Comparing the current codebase with the provided paper description, I noticed a possible discrepancy in how the content embedding $z_c$ is injected into the model.

Discrepancy Details:

# Combine midi and phoneme embeddings
content = midi + ph
content = self.final_proj(content.transpose(1, 2)).transpose(1, 2)

# ... (x_combined is defined as concat of prompt and x)

# Current injection method: gated addition (lines 85-87)
gate = torch.sigmoid(self.gate_content(content))
x_combined += content * gate
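
For reference, here is a minimal sketch of the two injection styles named in the title, to make the difference concrete. This is not the repository's code: the `(batch, frames, dim)` shapes, the use of `nn.Linear` for `gate_content`, and the `ConcatInjection` module with its `proj` layer are all assumptions for illustration, and the concatenation variant is only one plausible reading of the paper description.

```python
import torch
import torch.nn as nn


class GatedAddition(nn.Module):
    """Mirrors the snippet above: x + content * sigmoid(gate_content(content))."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumption: the gate is a per-channel linear layer.
        self.gate_content = nn.Linear(dim, dim)

    def forward(self, x_combined: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_content(content))
        # Out-of-place add, so the caller's tensor is not modified.
        return x_combined + content * gate


class ConcatInjection(nn.Module):
    """Hypothetical alternative: concatenate on the channel axis, project back to dim."""

    def __init__(self, dim: int):
        super().__init__()
        # Assumption: a single linear projection from 2 * dim back to dim.
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x_combined: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([x_combined, content], dim=-1))


torch.manual_seed(0)
batch, frames, dim = 2, 50, 16  # illustrative sizes only
x_combined = torch.randn(batch, frames, dim)
content = torch.randn(batch, frames, dim)  # assumed to align with x_combined in time

gated = GatedAddition(dim)
concat = ConcatInjection(dim)
gated_out = gated(x_combined, content)
concat_out = concat(x_combined, content)
print(gated_out.shape, concat_out.shape)
```

Both variants return a tensor with the same shape as `x_combined`, so either can feed the same downstream blocks. The difference is that gated addition keeps the content as a scaled residual, while concatenation lets the projection mix the input and content channels freely at the cost of a `2 * dim` input layer.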
