CUDA doesn't actually work due to a bug in the upstream PyPI package, resulting in…
```
AttributeError: 'FileLikeQueueWriter' object has no attribute 'tell'
[wav @ 000002052f916200] invalid start code [0][0][0][0] in RIFF header
[in#0 @ 000002052f915f80] Error opening input: Invalid data found when processing input
Error opening input file pipe:0.
Error opening input files: Invalid data found when processing input
```
The cache is created on `self.in_proj.weight.device` (likely CPU at init), but `forward` uses `q.device` (CUDA at runtime). The mask is created on `q.device`, but `k` and `v` come from `_complete_kv`, which uses the cache's device. A complete device mismatch.
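The failure mode is easy to reproduce in isolation: any op that mixes tensors on different devices raises a `RuntimeError`, and `.to(q.device)` is the cheap remedy, since it's a no-op when the devices already match. A minimal sketch, independent of pocket_tts:

```python
import torch

def attend(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Align the (possibly cache-device) k/v with q's device before mixing them.
    # .to() returns the same tensor unchanged when devices already match.
    k = k.to(q.device)
    v = v.to(q.device)
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

q = torch.randn(1, 4, 8)  # "runtime" tensor (CUDA in the real bug)
k = torch.randn(1, 4, 8)  # cache tensors created at init (CPU in the real bug)
v = torch.randn(1, 4, 8)
out = attend(q, k, v)
```

Without the two `.to()` calls, running `q` on CUDA against a CPU-resident cache fails at the first matmul.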
pocket_tts/modules/transformer.py

```diff
 def forward(self, query: torch.Tensor, model_state: dict | None):
     state = self.check_model_state(model_state)
     projected = self.in_proj(query)
     # Reshape from (b, t, p*h*d) to (b, t, p, h, d) where p=3, h=num_heads
     b, t, _ = projected.shape
     d = self.embed_dim // self.num_heads
     packed = projected.view(b, t, 3, self.num_heads, d)
     q, k, v = torch.unbind(packed, dim=2)
     q, k = self._apply_rope(q, k, state)
     k, v = self._complete_kv(k, v, state)
+    k = k.to(q.device)
+    v = v.to(q.device)
```
This fixes the bug and lets you switch to CUDA. But since you pull upstream in dynamically as a module, you'd have to hack the patch in somehow. Have fun.
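One way to hack it in without forking is to monkey-patch the offending method at import time, before the model is built. A minimal sketch of the wrapping pattern, using a toy stand-in class; `MHA` and this `_complete_kv` body are hypothetical, not the real pocket_tts internals:

```python
import torch

class MHA:
    """Toy stand-in for the upstream attention module in
    pocket_tts/modules/transformer.py (hypothetical)."""
    def _complete_kv(self, k, v, state):
        # Upstream may return tensors on the cache's device here.
        return k, v

# Wrap the original so its outputs always land back on the input's
# device (the incoming k/v are derived from query, i.e. q.device).
_orig_complete_kv = MHA._complete_kv

def _patched_complete_kv(self, k, v, state):
    out_k, out_v = _orig_complete_kv(self, k, v, state)
    return out_k.to(k.device), out_v.to(k.device)

MHA._complete_kv = _patched_complete_kv
```

The same pattern applies to the real class: import the upstream module, rebind the method on the class, then construct the model. It survives package upgrades better than editing site-packages by hand, though it's just as fragile against upstream refactors.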
The upstream GitHub repo is ahead of the PyPI package — they refactored it and I'm not checking if it happens to break things even worse.