Abstract: This paper introduces V2Coder, a non-autoregressive vocoder based on hierarchical variational autoencoders (VAEs). The hierarchical VAE with hierarchically extended prior and approximate ...
GLM-TTS is a high-quality text-to-speech (TTS) synthesis system based on large language models, supporting zero-shot voice cloning and streaming inference. This system adopts a two-stage architecture: ...
Abstract: The quality of raw audio waveform generated by a vocoder could affect various audio generative tasks. In recent years, the dominance of source-filter vocoders was greatly challenged by ...
Despite significant advances in neural vocoders using diffusion models and their variants, these methods, unfortunately, inherently suffer from a performance-inference dilemma, which stems from the ...