	---
	license: apache-2.0
	datasets:
	- pnnbao-ump/VieNeu-TTS-1000h
	- pnnbao-ump/VieNeuCodec-dataset
	- pnnbao-ump/VieNeu-TTS-140h
	language:
	- vi
	base_model:
	- neuphonic/neutts-air
	pipeline_tag: text-to-speech
	---

	## Overview

	VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

	Trained on ~1000 hours of high-quality Vietnamese speech, VieNeu-TTS-1000h is a significant upgrade over VieNeu-TTS-140h, with the following improvements:

	- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
	- Code-switching support: Seamless transitions between Vietnamese and English
	- Better voice cloning: Higher fidelity and speaker consistency
	- Real-time synthesis: 24 kHz waveform generation on CPU or GPU

	VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.

	Author: Phạm Nguyễn Ngọc Bảo

	## Support This Project

	Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting the development:

	[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Support-orange?logo=buy-me-a-coffee)](https://buymeacoffee.com/pnnbao)

	Your support helps maintain and improve VieNeu-TTS! 🙏

	---

	## Reference Voices

	| File | Gender | Accent | Description |
	|------|--------|--------|-------------|
	| Bình (nam miền Bắc) | Male | North | Male voice, North accent |
	| Tuyên (nam miền Bắc) | Male | North | Male voice, North accent |
	| Nguyên (nam miền Nam) | Male | South | Male voice, South accent |
	| Sơn (nam miền Nam) | Male | South | Male voice, South accent |
	| Vĩnh (nam miền Nam) | Male | South | Male voice, South accent |
	| Hương (nữ miền Bắc) | Female | North | Female voice, North accent |
	| Ly (nữ miền Bắc) | Female | North | Female voice, North accent |
	| Ngọc (nữ miền Bắc) | Female | North | Female voice, North accent |
	| Đoan (nữ miền Nam) | Female | South | Female voice, South accent |
	| Dung (nữ miền Nam) | Female | South | Female voice, South accent |

	---

	## Model Architecture

	| Component | Description |
	|-----------|-------------|
	| Backbone | Qwen 0.5B (chat-format LM) |
	| Codec | NeuCodec (supports ONNX + quantization) |
	| Output | 24 kHz waveform synthesis |
	| Context Window | 2048 tokens shared text + speech |
	| Watermark | Enabled |
	| Training Data | VieNeuCodec-dataset + Emilia dataset pretraining |
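
Because the 2048-token context window is shared between text and speech, a longer reference clip or a longer prompt leaves less room for generated audio. The sketch below estimates that remaining room. It assumes a codec rate of 50 speech tokens per second, which this model card does not state, and it ignores chat-template and special tokens, so treat the result as a rough upper bound. The function name and parameters are hypothetical and are not part of the VieNeu-TTS API.

```python
# Assumption: the codec emits about 50 speech tokens per second of audio.
# This rate is NOT stated in this model card; change it if your codec differs.
ASSUMED_CODEC_TOKENS_PER_SECOND = 50

# Taken from the architecture table: text and speech share one window.
CONTEXT_WINDOW_TOKENS = 2048


def remaining_speech_seconds(
    text_tokens: int,
    reference_seconds: float,
    tokens_per_second: int = ASSUMED_CODEC_TOKENS_PER_SECOND,
) -> float:
    """Estimate how many seconds of new speech still fit in the context window.

    text_tokens should count every text token in the prompt (reference
    transcript plus the text to synthesize). Template tokens are ignored.
    """
    reference_tokens = round(reference_seconds * tokens_per_second)
    free_tokens = CONTEXT_WINDOW_TOKENS - text_tokens - reference_tokens
    # Never report a negative budget when the prompt alone overflows the window.
    return max(free_tokens, 0) / tokens_per_second
```

Under these assumptions, a 5-second reference clip and 100 text tokens leave roughly 34 seconds of audio budget; splitting long input text into shorter chunks keeps each request inside the window.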

	## Features

	- High-quality Vietnamese speech
	- Instant voice cloning (3–5 second reference audio)
	- Fully offline
	- Runs real-time or faster
	- Multi-voice reference support
	- Python API + CLI + Gradio
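
Since voice cloning expects a 3–5 second reference clip, it can help to check a clip's length before using it. The following is a minimal sketch using only the Python standard library; it handles uncompressed PCM WAV files only, and the helper names are hypothetical rather than part of the VieNeu-TTS API.

```python
import wave


def reference_clip_seconds(path: str) -> float:
    """Return the duration in seconds of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_suitable_reference(
    path: str, min_seconds: float = 3.0, max_seconds: float = 5.0
) -> bool:
    """Check that a clip falls inside the 3-5 second range listed above."""
    return min_seconds <= reference_clip_seconds(path) <= max_seconds
```

A clip outside this range is not necessarily unusable; the check only flags clips that fall outside the range this model card lists.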

	## Troubleshooting

	| Issue | Cause | Solution |
	|-------|-------|----------|
	| Missing `libespeak` | eSpeak NG system library not installed | Install eSpeak NG |
	| GPU OOM | VRAM too small | Use CPU or quantized model |
	| Poor voice match | Bad reference sample | Try a clearer reference clip |
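
For the missing `libespeak` case, a quick way to see whether eSpeak NG is visible to your Python environment is to look for both the executable and the shared library. This is a diagnostic sketch using only the standard library; the function name is hypothetical, and the names searched for (`espeak-ng`, `espeak`) are the common ones but may differ on your platform.

```python
import ctypes.util
import shutil


def find_espeak() -> dict:
    """Report where, if anywhere, eSpeak is visible to this Python process.

    Each value is a path or library name when found, otherwise None.
    """
    return {
        # Command-line binary on PATH.
        "executable": shutil.which("espeak-ng") or shutil.which("espeak"),
        # Shared library as resolved by the system's library lookup.
        "library": ctypes.util.find_library("espeak-ng")
        or ctypes.util.find_library("espeak"),
    }
```

If both values are `None`, install eSpeak NG with your system package manager (on Debian or Ubuntu, `sudo apt install espeak-ng`) and run the check again.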

	## License

	Apache 2.0

	## Citation

	```bibtex
	@misc{vieneutts2025,
	title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
	author = {Pham Nguyen Ngoc Bao},
	year = {2025},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
	}
	```

	Please also cite the base model:

	```bibtex
	@misc{neuttsair2025,
	title = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
	author = {Neuphonic},
	year = {2025},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
	}
	```