About The Project
This is a powerful, fully OFFLINE web-based application that combines five major AI technologies into one beautiful, easy-to-use interface. Whether you are a content creator, developer, or AI enthusiast, this tool helps you generate voiceovers, clone voices, transcribe videos, clean audio, and convert formats, all without an internet connection once the AI models have been downloaded.
Key Features
1. Advanced Text-to-Speech (TTS) & Voice Cloning
- Multilingual Support: Generate speech in 24+ languages (English, Urdu, Hindi, Arabic, Chinese, French, Spanish, German, Japanese, Korean, and more).
- Voice Cloning: Clone any voice using a short reference audio file (.wav/.mp3) with advanced AI models.
- Long-Form Generation: Automatic text splitting and chunking for audiobooks and long content.
- Customization: Full control over Speed, Temperature, Pitch, and Exaggeration parameters.
- Background Mode: Optimized for mobile devices; keeps generating audio even when the screen is off.
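The README does not say how long text is split for Long-Form Generation. As a minimal sketch (not the project's own code), sentence-based chunking can work like this. `split_into_chunks` and the 250-character default are hypothetical choices, not settings taken from the app.

```python
import re


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Hypothetical chunker: pack whole sentences into chunks of at most max_chars."""
    # Split after ., !, ? and the Urdu/Hindi full stops (۔ and ।) when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?۔।])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-wrapped, preferring word boundaries.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space found: cut mid-word
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not current:
            current = sentence
        elif len(current) + 1 + len(sentence) <= max_chars:
            current = f"{current} {sentence}"  # still fits: append to the open chunk
        else:
            chunks.append(current)  # full: close it and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio concatenated; splitting on sentence boundaries keeps the pauses in natural places.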
2. Audio/Video Transcriber (Whisper AI)
- Video to SRT: Convert video files directly to subtitles (.srt) and plain-text transcripts.
- High Accuracy: Powered by OpenAI Whisper models (Tiny to Large-v3) for maximum precision.
- Translation: Automatically translate foreign audio into English subtitles.
- Hardware Support: Works with NVIDIA GPU (CUDA) for blazing speed or CPU for compatibility.
- Multiple Formats: Supports MP4, WebM, MKV, AVI, MOV, MP3, WAV, and more.
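An SRT file is a numbered list of cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by its text. The sketch below is not taken from this project; it shows how Whisper-style segments (dicts with `start`, `end`, and `text`, the shape the `openai-whisper` package returns in `result["segments"]`) can be written as SRT. The function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn [{'start': s, 'end': e, 'text': t}, ...] into the contents of an .srt file."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

For example, `Path("video.srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8")` would save the subtitles next to the video.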
3. Voice-to-Voice Converter
- Timbre Transfer: Change the input voice to match a target speaker while keeping emotion and intonation intact.
- Unlimited Languages: Works with any language through advanced voice conversion.
- Real-time Logic: Optimized setup for AI voice changing workflows.
- Professional Quality: Studio-grade voice conversion results.
4. Audio Cleaner Pro (Offline)
- Noise Reduction: Remove background hiss, rumble, static, and unwanted sounds.
- Silence Removal: Automatically trim silent parts from recordings.
- Enhancement: Adjust pitch (Deep/Alien/Kid voices) and playback speed.
- Professional Results: Studio-quality audio cleaning without expensive software.
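The README does not describe the cleaner's algorithm. One simple way to remove silence, shown here only as an illustrative sketch, is to cut the audio into fixed-size frames and drop the frames whose RMS level falls below a threshold. It assumes mono float samples in the range -1.0 to 1.0; `remove_silence`, the 0.02 threshold, and the 1024-sample frame size are hypothetical.

```python
import math


def remove_silence(samples: list[float], threshold: float = 0.02, frame_size: int = 1024) -> list[float]:
    """Hypothetical sketch: keep only frames whose RMS level reaches `threshold`."""
    kept: list[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]  # last frame may be shorter
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```

Dropping every quiet frame also removes the natural pauses between words; production tools (for example FFmpeg's `silenceremove` filter) usually keep short gaps, so treat this as a starting point rather than a finished cleaner.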
5. Audio Master Studio
- Video to Audio: Extract high-quality audio (MP3/WAV) from any video format.
- Format Converter: Convert between MP3, WAV, AAC, OGG, FLAC, and more.
- Audio Recorder: Built-in microphone recorder with live waveform visualization.
- Professional Tools: Complete audio editing suite for content creators.
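Because FFmpeg is a listed prerequisite, video-to-audio extraction and format conversion can be expressed as FFmpeg calls. The sketch below is an assumption about how such a step could look, not the app's implementation; the helper names and the 192k default bitrate are hypothetical, while `-y`, `-i`, `-vn`, and `-b:a` are standard FFmpeg options.

```python
import shutil
import subprocess
from pathlib import Path

# Lossy output formats for which an explicit audio bitrate makes sense.
LOSSY_EXTENSIONS = {".mp3", ".aac", ".ogg", ".m4a"}


def build_ffmpeg_command(src: str, dst: str, bitrate: str = "192k") -> list[str]:
    """Build an FFmpeg command that writes only the audio of `src` to `dst`.

    FFmpeg chooses the output format and encoder from the extension of `dst`.
    """
    cmd = ["ffmpeg", "-y", "-i", src, "-vn"]  # -y: overwrite output, -vn: drop video
    if Path(dst).suffix.lower() in LOSSY_EXTENSIONS:
        cmd += ["-b:a", bitrate]  # target bitrate for lossy codecs only
    cmd.append(dst)
    return cmd


def convert_audio(src: str, dst: str) -> None:
    """Run the conversion, failing early if FFmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg is not installed or not on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

For example, `convert_audio("interview.mp4", "interview.mp3")` extracts an MP3, and `convert_audio("song.mp3", "song.flac")` converts to lossless FLAC without a bitrate flag.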
Supported Languages
Text-to-Speech & Voice Cloning (24+ Languages)
| Language | Code | Status | Language | Code | Status |
|---|---|---|---|---|---|
| Arabic | ar | Supported | English | en | Supported |
| Chinese | zh | Supported | Finnish | fi | Supported |
| Danish | da | Supported | French | fr | Supported |
| Dutch | nl | Supported | German | de | Supported |
| Greek | el | Supported | Hebrew | he | Supported |
| Hindi | hi | Supported | Italian | it | Supported |
| Japanese | ja | Supported | Korean | ko | Supported |
| Malay | ms | Supported | Norwegian | no | Supported |
| Polish | pl | Supported | Portuguese | pt | Supported |
| Russian | ru | Supported | Spanish | es | Supported |
| Swedish | sv | Supported | Swahili | sw | Supported |
| Turkish | tr | Supported | Urdu | ur | Supported |
Transcription & Translation (100+ Languages)
Includes: English, Urdu, Hindi, Spanish, French, German, Japanese, Chinese, Russian, Arabic, Portuguese, Italian, Korean, Turkish, Polish, Dutch, Swedish, Indonesian, Filipino, Vietnamese, Thai, and 80+ more languages!
System Requirements
Minimum Requirements
- OS: Windows 10/11 (64-bit), Linux, or macOS
- RAM: 4GB minimum (8GB recommended)
- Storage: 10GB free space for AI models
- Python: 3.10 or higher
- Internet: Only required for initial model downloads
Recommended (For Best Performance)
- GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
- CPU: Multi-core processor (Intel i5/AMD Ryzen 5 or better)
- RAM: 16GB or more
- Storage: SSD with 20GB+ free space
- Display: Works on Desktop, Laptop, Tablet, and Mobile
Installation Guide
Prerequisites
Before you start, make sure you have these installed:
- Git: Download Here
- Python (3.10+): Download Here
  - Important: When installing Python, check the box "Add Python to PATH".
- FFmpeg: Download Here
  - Required for audio/video processing.
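Before running the installer, a quick check like the following can confirm the prerequisites are in place. It is a hypothetical helper, not part of the repository; it only tests the Python version and whether `git` and `ffmpeg` can be found on PATH.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)


def check_prerequisites(tools=("git", "ffmpeg")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < REQUIRED_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python 3.10+ is required, but this is Python {found}")
    for tool in tools:
        if shutil.which(tool) is None:  # searches PATH like the shell does
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```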
Quick Installation (Windows)
Method 1: One-Click Installer (Recommended)
```shell
# Step 1: Clone the Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC
# Step 2: Run One-Click Installer
setup.bat
# Step 3: Launch the Application
"TTS - Video to SRT -VC (RUN).bat"
```
That's it! The server will start automatically!
Method 2: Manual Installation (Advanced)
```shell
# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC
# Step 2: Create Virtual Environment
python -m venv venv
# Step 3: Activate Virtual Environment
# For Windows:
venv\Scripts\activate
# For Linux/Mac:
source venv/bin/activate
# Step 4: Install Dependencies
pip install -r requirements.txt
# Step 5: Install PyTorch with CUDA 11.8 support (NVIDIA GPU only)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Step 6: Run the Server
python server_vc.py
```
Installation for Linux/Mac
```shell
# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC
# Step 2: Create Virtual Environment
python3 -m venv venv
source venv/bin/activate
# Step 3: Install Dependencies
pip install -r requirements.txt
# Step 4: Install FFmpeg
# Ubuntu/Debian:
sudo apt install ffmpeg
# macOS (with Homebrew):
brew install ffmpeg
# Step 5: Run Server
python server_vc.py
```
Usage
Step 1: Start the Server
Run the application using one of these methods:
```shell
# Method 1: Using Batch File (Windows)
"TTS - Video to SRT -VC (RUN).bat"
# Method 2: Using Python (All Platforms)
python server_vc.py
# Method 3: Custom Port
python server_vc.py --port 8004 --host 0.0.0.0
```
Step 2: Open Web Interface
Once the server starts, open your web browser and navigate to:
http://127.0.0.1:8004 or http://localhost:8004
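If you start the server from a script, you may want to wait until it accepts connections before opening the browser. This sketch polls the port with a plain TCP connection; `wait_for_server` is a hypothetical helper, and 8004 is the default port used in the commands above.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 8004, timeout: float = 30.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_server(port=8004, timeout=60)` returns `True` once the server is listening. A successful TCP connection only shows that something is listening on the port, not that the AI models have finished loading.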
Step 3: Choose Your Module
Select from 5 powerful modules in the interface:
| Module | Description |
|---|---|
| TTS & Voice Cloning | Generate speech and clone voices |
| Transcriber | Convert audio/video to SRT subtitles |
| Voice Converter | Transform voice timbre |
| Audio Cleaner | Remove noise and enhance audio |
| Audio Master | Record, convert, and edit audio |
API Documentation
1. TTS Generation
```http
POST /tts
Content-Type: application/json

{
  "text": "Your text here",
  "voice_mode": "predefined",
  "predefined_voice_id": "voice1",
  "temperature": 0.8,
  "speed_factor": 1.0,
  "language": "en",
  "output_format": "mp3"
}
```
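The request above can be sent from Python with only the standard library. This is a sketch under assumptions: the field values mirror the example body, `build_tts_request` and `synthesize` are hypothetical names, and the README does not specify the response format, so saving the response body directly as an audio file is an assumption to verify against your server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8004"  # default address from the Usage section


def build_tts_request(text: str, language: str = "en", output_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST /tts request with the documented JSON fields."""
    payload = {
        "text": text,
        "voice_mode": "predefined",
        "predefined_voice_id": "voice1",
        "temperature": 0.8,
        "speed_factor": 1.0,
        "language": language,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **options) -> None:
    """Send the request and write the response body to disk (assumed to be audio bytes)."""
    with urllib.request.urlopen(build_tts_request(text, **options), timeout=300) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

For example, `synthesize("Assalam o Alaikum", "greeting.mp3", language="ur")` would request Urdu speech, assuming the server is running.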
2. Voice Cloning
```text
POST /process_vc
Content-Type: multipart/form-data

source_audio: [file]
target_audio: [file]
device: "cuda"
cfg_rate: 0.5
sigma_min: 0.000001
```
3. Transcription
```text
POST /transcribe
Content-Type: multipart/form-data

file: [audio/video file]
language: "en"
model: "small"
task: "transcribe"
```
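The voice conversion and transcription endpoints take `multipart/form-data`. The sketch below builds both requests with the standard library; the helper names are hypothetical, the field names and default values come from the two examples above, and the README does not document the response bodies, so inspect them before relying on a particular format.

```python
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:8004"  # default address from the Usage section


def encode_multipart(fields: dict[str, str], files: dict[str, Path]) -> tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, path in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{path.name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def _build_request(endpoint: str, fields: dict[str, str], files: dict[str, Path]) -> urllib.request.Request:
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )


def build_vc_request(source: Path, target: Path, device: str = "cuda") -> urllib.request.Request:
    """POST /process_vc: convert `source` speech toward the voice in `target`."""
    fields = {"device": device, "cfg_rate": "0.5", "sigma_min": "0.000001"}
    return _build_request("/process_vc", fields, {"source_audio": source, "target_audio": target})


def build_transcribe_request(media: Path, language: str = "en", model: str = "small",
                             task: str = "transcribe") -> urllib.request.Request:
    """POST /transcribe: subtitle an audio or video file."""
    fields = {"language": language, "model": model, "task": task}
    return _build_request("/transcribe", fields, {"file": media})
```

To send one, pass it to `urllib.request.urlopen(request, timeout=600)`. Whisper itself names its English-translation mode `"translate"`; whether this server accepts that value for `task` is an assumption to check.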
Project Structure
```text
TTS-Video-to-SRT-VC/
├── easy-installation/                  # Installation scripts
├── Models/                             # AI models storage
│   ├── nzgnzg73/
│   └── vc/                             # Voice cloning models
├── reference_audio/                    # Reference audio files
├── static/                             # Static assets
├── ui/                                 # Web interface (TTS)
│   ├── index.html                      # Main TTS interface
│   ├── script.js                       # Frontend logic
│   ├── styles.css                      # Modern styling
│   └── vendor/                         # Third-party libraries
├── ui_transcriber/                     # Transcriber interface
├── ui_vc/                              # Voice converter interface
├── Engine/                             # Updated engine components
├── server.py                           # Main FastAPI server
├── engine.py                           # TTS engine core
├── server_vc.py                        # TTS-Video-to-SRT server
├── config.yaml                         # Configuration file
├── requirements.txt                    # Python dependencies
├── setup.bat                           # Windows setup script
├── Nomi/                               # PC and laptop details
│   ├── Nomi.py
│   ├── NOMI RUN.bat                    # Run this
│   └── templates/
│       └── index.html
├── TTS Server (RUN).bat                # Runs Text-to-Speech and Voice Cloning
├── TTS - Video to SRT -VC (RUN).bat    # Runs all features (see below)
└── README.md                           # This file
```
`TTS - Video to SRT -VC (RUN).bat` launches the full suite: Text-to-Speech, Audio/Video to SRT, and Voice-to-Voice conversion are all available from it.
Configuration & Tips
Mobile Usage
- This tool is fully optimized for mobile devices!
- Android Tip: Go to Settings > Apps > Chrome > Battery and set it to Unrestricted so audio generation continues in the background when the screen is off.
- iOS Tip: Use Safari and enable "Request Desktop Website" for the best experience.
First Run Setup
- On first run, the app will download AI models (Whisper/Coqui/Voice Cloning models).
- This requires an internet connection and may take 5-15 minutes depending on your connection.
- Models are stored locally and only downloaded once.
Interface Features
- Zoom Lock: Use the Zoom Lock button in the bottom-left corner to lock screen zoom for a more app-like experience on mobile.
- Dark Mode: Modern dark theme for comfortable usage.
- Real-time Preview: See waveforms and progress in real-time.
- Background Mode: Continue working while audio generates.
Performance Tips
- For faster processing, use GPU mode (NVIDIA CUDA).
- For CPU-only systems, use smaller models (Tiny/Small) for faster results.
- Close other heavy applications for better performance.
- Use SSD storage for faster model loading.
Screenshots & Demo
Main Interface
Audio Processing
Transcription Module
Acknowledgments
- Coqui TTS - Amazing TTS engine
- OpenAI Whisper - Transcription capabilities
- Hugging Face - Model hosting and AI tools
- FastAPI - Robust backend framework
- Tailwind CSS - Beautiful interface styling
- PyTorch - Deep learning framework
- FFmpeg - Audio/video processing
Support & Contact
Created by NZG73
| Platform | Link |
|---|---|
| YouTube | @NZG73 |
| Website | nzg73.blogspot.com |
| Email | [email protected] |
| Report Bugs | GitHub Issues |
| GitHub | Star this Repo |
License
This project is licensed under the MIT License - see the LICENSE file for details.
© 2024 NZG73. All Rights Reserved.