---
license: apache-2.0
datasets:
- pnnbao-ump/VieNeu-TTS-1000h
- pnnbao-ump/VieNeuCodec-dataset
- pnnbao-ump/VieNeu-TTS-140h
language:
- vi
base_model:
- neuphonic/neutts-air
pipeline_tag: text-to-speech
---

## Overview

**VieNeu-TTS** is an on-device Vietnamese Text-to-Speech (TTS) model with **instant voice cloning**.

Trained on ~1000 hours of high-quality Vietnamese speech, this model is a significant upgrade over VieNeu-TTS-140h, with the following improvements:

- **Enhanced pronunciation**: More accurate and stable Vietnamese pronunciation
- **Code-switching support**: Seamless transitions between Vietnamese and English
- **Better voice cloning**: Higher fidelity and speaker consistency
- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU

VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.

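Whether synthesis counts as real-time on a given machine can be checked by comparing wall-clock synthesis time with the duration of the generated audio. The sketch below computes that real-time factor (RTF) from a sample count and the 24 kHz output rate. The function name and the timing figures are illustrative assumptions, not part of VieNeu-TTS.

```python
SAMPLE_RATE_HZ = 24_000  # output rate stated in this model card


def real_time_factor(num_samples: int, synthesis_seconds: float,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return synthesis time divided by audio duration.

    A value below 1.0 means the audio was produced faster than real time.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


# Hypothetical measurement: 5 s of audio (120,000 samples) produced in 2 s.
rtf = real_time_factor(num_samples=120_000, synthesis_seconds=2.0)
print(f"RTF = {rtf:.2f}")  # RTF = 0.40
```

In practice, `synthesis_seconds` would come from timing the actual inference call, for example with `time.perf_counter()`.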

**Author:** Phạm Nguyễn Ngọc Bảo

## Support This Project

Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting its development:

[Buy Me a Coffee](https://buymeacoffee.com/pnnbao)

Your support helps maintain and improve VieNeu-TTS! 🙏

---

## Reference Voices

| File | Gender | Accent | Description |
|------|--------|--------|-------------|
| Bình (nam miền Bắc) | Male | North | Male voice, Northern accent |
| Tuyên (nam miền Bắc) | Male | North | Male voice, Northern accent |
| Nguyên (nam miền Nam) | Male | South | Male voice, Southern accent |
| Sơn (nam miền Nam) | Male | South | Male voice, Southern accent |
| Vĩnh (nam miền Nam) | Male | South | Male voice, Southern accent |
| Hương (nữ miền Bắc) | Female | North | Female voice, Northern accent |
| Ly (nữ miền Bắc) | Female | North | Female voice, Northern accent |
| Ngọc (nữ miền Bắc) | Female | North | Female voice, Northern accent |
| Đoan (nữ miền Nam) | Female | South | Female voice, Southern accent |
| Dung (nữ miền Nam) | Female | South | Female voice, Southern accent |

---

## Model Architecture

| Component | Description |
|----------|-------------|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec | NeuCodec (supports ONNX + quantization) |
| Output | 24 kHz waveform synthesis |
| Context Window | 2048 tokens, shared between text and speech |
| Watermark | Enabled |
| Training Data | VieNeuCodec-dataset + Emilia dataset pretraining |

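Because text and speech share the 2048-token window, the longest utterance that fits depends on how many tokens the prompt and the reference audio already occupy. The sketch below works through that arithmetic. The rate of 50 speech tokens per second is an assumption about the codec that this card does not state, and the token counts in the example are made up; both should be measured with the actual tokenizer and codec.

```python
CONTEXT_WINDOW = 2048  # tokens, shared by text and speech (from the table above)
ASSUMED_CODEC_TOKENS_PER_SECOND = 50  # ASSUMPTION: not stated in this card


def max_output_seconds(text_tokens: int, reference_speech_tokens: int,
                       tokens_per_second: int = ASSUMED_CODEC_TOKENS_PER_SECOND,
                       context_window: int = CONTEXT_WINDOW) -> float:
    """Estimate how many seconds of new speech still fit in the context window."""
    used = text_tokens + reference_speech_tokens
    remaining = max(context_window - used, 0)  # never report a negative budget
    return remaining / tokens_per_second


# Hypothetical prompt: 120 text tokens plus a 4 s reference clip (4 * 50 = 200 tokens).
print(max_output_seconds(text_tokens=120, reference_speech_tokens=200))
```

An estimate like this suggests splitting long input text into shorter chunks before synthesis rather than relying on the window to hold everything at once.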
## Features

- High-quality Vietnamese speech
- Instant **voice cloning** (3–5 second reference audio)
- Fully **offline**
- Runs in real time or faster
- Multi-voice reference support
- Python API, CLI, and Gradio interface

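Since voice cloning expects a 3–5 second reference clip, it can help to check a clip's length before using it. The following is a minimal sketch using only Python's standard `wave` module; it assumes the reference is an uncompressed PCM WAV file, and the helper names are hypothetical rather than part of the VieNeu-TTS API.

```python
import os
import tempfile
import wave


def reference_clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """Check that the clip length falls in the 3-5 second range listed above."""
    return min_s <= reference_clip_seconds(path) <= max_s


# Demo on a synthetic 4-second silent clip (16-bit mono, 24 kHz).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "reference.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24_000)
        wav.writeframes(b"\x00\x00" * 24_000 * 4)
    print(is_suitable_reference(demo_path))  # True
```

Duration is only one factor: a clip of the right length can still clone poorly if it is noisy or contains more than one speaker.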
## Troubleshooting

| Issue | Cause | Solution |
|------|-------|----------|
| Missing `libespeak` | System dependency not installed | Install eSpeak NG |
| GPU OOM | VRAM too small | Run on CPU or use a quantized model |
| Poor voice match | Bad reference sample | Try a clearer reference clip |

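The first row of the table can be diagnosed before running the model. The sketch below looks for an eSpeak executable and shared library using only the Python standard library. The names `espeak-ng` and `espeak` are the usual ones for these packages, but that they are what VieNeu-TTS looks up is an assumption, and `find_espeak` is a hypothetical helper.

```python
import ctypes.util
import shutil


def find_espeak() -> dict:
    """Look for the eSpeak NG executable and shared library on this system."""
    return {
        "executable": shutil.which("espeak-ng") or shutil.which("espeak"),
        "library": (ctypes.util.find_library("espeak-ng")
                    or ctypes.util.find_library("espeak")),
    }


found = find_espeak()
if not any(found.values()):
    print("eSpeak NG not found; install it with your system package manager.")
else:
    print(f"eSpeak found: {found}")
```

If either entry comes back as `None`, installing eSpeak NG and rerunning the check is a quick way to confirm the fix.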
## License

Apache 2.0

## Citation

```bibtex
@misc{vieneutts2025,
  title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author = {Pham Nguyen Ngoc Bao},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```

Please also cite the base model:

```bibtex
@misc{neuttsair2025,
  title = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author = {Neuphonic},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
```