Abstract: Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results