🚀 About The Project

This is a powerful, fully offline web-based application that brings five AI-powered audio and video tools together in one easy-to-use interface. Whether you are a content creator, developer, or AI enthusiast, it helps you generate voiceovers, clone voices, transcribe videos, clean audio, and convert formats, all without an internet connection once the AI models have been downloaded.

✨ Key Features

🗣️ 1. Advanced Text-to-Speech (TTS) & Voice Cloning

Multilingual Support: Generate speech in 24+ languages (English, Urdu, Hindi, Arabic, Chinese, French, Spanish, German, Japanese, Korean, and more).
Voice Cloning: Clone any voice using a short reference audio file (.wav/.mp3) with advanced AI models.
Long-Form Generation: Automatic text splitting and chunking for audiobooks and long content.
Customization: Full control over Speed, Temperature, Pitch, and Exaggeration parameters.
Background Mode: Optimized for mobile devices; keeps generating audio even when the screen is off.

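Long-form generation relies on splitting the text before synthesis. The app's own splitter is not shown in this README, so the following is only a minimal Python sketch of the general idea, assuming sentence-level splitting with a character limit. `split_text_into_chunks` and `max_chars` are hypothetical names, not the project's API.

```python
import re


def split_text_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Break after ., !, ? and also the Urdu full stop (۔) and Hindi danda (।).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?۔।])\s+", text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut at the last space that fits.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        if not current:
            current = sentence
        elif len(current) + 1 + len(sentence) <= max_chars:
            current += " " + sentence  # still fits: append to the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio clips joined in order.
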
🔤 2. Audio/Video Transcriber (Whisper AI)

Video to SRT: Convert video files directly to subtitles (.srt) and plain text transcripts.
High Accuracy: Powered by OpenAI Whisper models (Tiny → Large-v3) for maximum precision.
Translation: Translate non-English speech directly into English subtitles.
Hardware Support: Runs on an NVIDIA GPU (CUDA) for fast processing, or on the CPU when no compatible GPU is available.
Multiple Formats: Supports MP4, WebM, MKV, AVI, MOV, MP3, WAV, and more.
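
For reference, an SRT file is a list of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text. Below is a minimal sketch of that conversion, assuming you already have `(start, end, text)` segments in seconds (openai-whisper's `transcribe()` result includes a `segments` list whose items carry `start`, `end`, and `text`). The function names are illustrative, not the app's own code.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn an iterable of (start, end, text) tuples into SRT file content."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

With openai-whisper you could pass `((s["start"], s["end"], s["text"]) for s in result["segments"])` and write the returned string to a `.srt` file.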

🔄 3. Voice-to-Voice Converter

Timbre Transfer: Change the input voice to match a target speaker while keeping emotion and intonation intact.
Language-Independent: Converts the voice itself rather than the words, so it works with speech in any language.
Real-time Logic: Optimized setup for AI voice changing workflows.
Professional Quality: Studio-grade voice conversion results.

🎛️ 4. Audio Cleaner Pro (Offline)

Noise Reduction: Remove background hiss, rumble, static, and unwanted sounds.
Silence Removal: Automatically trim silent parts from recordings.
Enhancement: Adjust pitch (Deep/Alien/Kid voices) and playback speed.
Professional Results: Studio-quality audio cleaning without expensive software.
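
Silence removal is commonly done by measuring the loudness (RMS) of short frames and dropping frames below a threshold. The sketch below illustrates that approach on a plain list of float samples. The threshold and frame size are arbitrary assumptions, and this is not necessarily the algorithm Audio Cleaner Pro uses.

```python
import math


def trim_silence(samples, threshold=0.02, frame_size=1024):
    """Keep only frames whose RMS level reaches threshold.

    samples: floats in [-1.0, 1.0]; threshold and frame_size are illustrative defaults.
    """
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)  # audible frame: keep it
    return kept
```

This version drops every quiet frame, including pauses between words; a practical tool would usually keep short pauses and trim only long gaps or leading/trailing silence.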

🛠️ 5. Audio Master Studio

Video to Audio: Extract high-quality audio (MP3/WAV) from any video format.
Format Converter: Convert between MP3, WAV, AAC, OGG, FLAC, and more.
Audio Recorder: Built-in microphone recorder with live waveform visualization.
Professional Tools: Complete audio editing suite for content creators.
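
Extraction and conversion of this kind are typically done with FFmpeg (listed under Prerequisites). Here is a hedged sketch of how a Python backend might call it. `build_ffmpeg_command` and `convert_audio` are hypothetical helpers, and the app itself may use different options.

```python
import subprocess
from typing import List, Optional


def build_ffmpeg_command(src: str, dst: str, sample_rate: Optional[int] = None) -> List[str]:
    """Build an ffmpeg command that writes src's audio to dst (format chosen from dst's extension)."""
    cmd = ["ffmpeg", "-y", "-i", src, "-vn"]  # -y: overwrite dst, -vn: drop video
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # resample, e.g. 16000 or 44100
    cmd.append(dst)
    return cmd


def convert_audio(src: str, dst: str, sample_rate: Optional[int] = None) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

For example, `convert_audio("lecture.mp4", "lecture.mp3")` extracts the soundtrack as MP3; FFmpeg picks the output container and its default encoder from the file extension.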

🌐 Supported Languages

🗣️ Text-to-Speech & Voice Cloning (24+ Languages)

🌍 Language	Code	Status	🌍 Language	Code	Status
Arabic	`ar`	✅	English	`en`	✅
Chinese	`zh`	✅	Finnish	`fi`	✅
Danish	`da`	✅	French	`fr`	✅
Dutch	`nl`	✅	German	`de`	✅
Greek	`el`	✅	Hebrew	`he`	✅
Hindi	`hi`	✅	Italian	`it`	✅
Japanese	`ja`	✅	Korean	`ko`	✅
Malay	`ms`	✅	Norwegian	`no`	✅
Polish	`pl`	✅	Portuguese	`pt`	✅
Russian	`ru`	✅	Spanish	`es`	✅
Swedish	`sv`	✅	Swahili	`sw`	✅
Turkish	`tr`	✅	Urdu	`ur`	✅

📝 Transcription & Translation (100+ Languages)

Includes: English, Urdu, Hindi, Spanish, French, German, Japanese, Chinese, Russian, Arabic, Portuguese, Italian, Korean, Turkish, Polish, Dutch, Swedish, Indonesian, Filipino, Vietnamese, Thai, and 80+ more languages!

🖥️ System Requirements

⚡ Minimum Requirements

OS: Windows 10/11 (64-bit), Linux, or macOS
RAM: 4GB minimum (8GB recommended)
Storage: 10GB free space for AI models
Python: 3.10 or higher
Internet: Only required for initial model downloads

🚀 Recommended (For Best Performance)

GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
CPU: Multi-core processor (Intel i5/AMD Ryzen 5 or better)
RAM: 16GB or more
Storage: SSD with 20GB+ free space
Display: Works on Desktop, Laptop, Tablet, and Mobile

📥 Installation Guide

Prerequisites

Before you start, make sure you have these installed:

Git: https://git-scm.com/downloads
Python (3.10+): https://www.python.org/downloads/
- ⚠️ Important: When running the Python installer on Windows, check the box "Add Python to PATH"
FFmpeg: https://ffmpeg.org/download.html
- Required for audio/video processing
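
Before installing, you can check these prerequisites with a short script (save it, for example, as `check_setup.py`, a hypothetical name). This is a sketch, not part of the repository: it assumes Git and FFmpeg should be callable from PATH and reuses the Python 3.10 and 10GB storage figures from the requirements above.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from "Python: 3.10 or higher"
MIN_FREE_GB = 10      # from "10GB free space for AI models"


def preflight(path: str = ".") -> dict:
    """Return {check description: passed?} for the installation prerequisites."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return {
        "Python 3.10+": sys.version_info >= MIN_PYTHON,
        "git on PATH": shutil.which("git") is not None,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        f"{MIN_FREE_GB} GB free disk space": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"[{'OK' if ok else 'MISSING'}] {check}")
```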

🚀 Quick Installation (Windows)

Method 1: One-Click Installer (Recommended)

```shell
# Run these in Command Prompt (cmd.exe); you can also double-click the .bat files in File Explorer.
# Step 1: Clone the Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Run One-Click Installer
setup.bat

# Step 3: Launch the Application
"TTS - Video to SRT -VC (RUN).bat"
```

That's it! The server will start automatically! 🎉

Method 2: Manual Installation (Advanced)

```shell
# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Create Virtual Environment
python -m venv venv

# Step 3: Activate Virtual Environment
# For Windows (Command Prompt):
venv\Scripts\activate
# For Linux/Mac:
source venv/bin/activate

# Step 4: Install PyTorch (GPU version, CUDA 11.8)
# Install it before requirements.txt so that a CPU-only torch build (if requirements.txt lists torch)
# is not installed first. CPU-only users can skip this step.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Step 5: Install the Remaining Dependencies
pip install -r requirements.txt

# Step 6: Run the Server
python server_vc.py
```

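After installing the GPU version of PyTorch, you can confirm that the CUDA build is active and sees your GPU. This minimal sketch only uses `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, both standard PyTorch calls; it is not a script shipped with the project.

```python
def describe_torch_device() -> str:
    """Say whether PyTorch is installed and whether it can use a CUDA GPU."""
    try:
        import torch
    except (ImportError, OSError) as exc:  # OSError covers broken DLL loads on Windows
        return f"PyTorch could not be imported: {exc}"
    if torch.cuda.is_available():
        return f"CUDA OK: {torch.cuda.get_device_name(0)} (torch {torch.__version__})"
    # A CPU-only wheel, a missing NVIDIA driver, or no NVIDIA GPU all end up here.
    return f"CPU only (torch {torch.__version__})"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports "CPU only" on a machine with an NVIDIA GPU, the CPU wheel was most likely installed; reinstall PyTorch from the CUDA index URL.
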
🐧 Installation for Linux/Mac

```shell
# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Create Virtual Environment
python3 -m venv venv
source venv/bin/activate

# Step 3: Install Dependencies
pip install -r requirements.txt

# Step 4: Install FFmpeg
# Ubuntu/Debian:
sudo apt install ffmpeg
# macOS (with Homebrew):
brew install ffmpeg

# Step 5: Run Server
python server_vc.py
```

📖 Usage

Step 1: Start the Server

Run the application using one of these methods:

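```shell
# Method 1: Using Batch File (Windows, Command Prompt)
"TTS - Video to SRT -VC (RUN).bat"

# Method 2: Using Python (All Platforms)
python server_vc.py

# Method 3: Custom Port and Host
# --host 0.0.0.0 also makes the server reachable from other devices on your network
python server_vc.py --port 8004 --host 0.0.0.0
```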

Step 2: Open Web Interface

Once the server starts, open your web browser and navigate to:

🌐 http://127.0.0.1:8004 or http://localhost:8004
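
If you script the startup (for example, to open the browser automatically), you may want to wait until the server actually answers. A small standard-library sketch, assuming the default address above; `wait_for_server` is a hypothetical helper:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:8004", timeout: float = 60.0) -> bool:
    """Poll url until the server answers; return False if timeout seconds pass first."""
    # Talk to the local server directly, ignoring any HTTP proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server is up
        except OSError:
            time.sleep(0.5)  # connection refused or timed out: not ready yet
    return False
```

For example, call `wait_for_server()` and then `webbrowser.open("http://127.0.0.1:8004")`.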

Step 3: Choose Your Module

Select from 5 powerful modules in the interface:

Module	Description
🗣️ TTS & Voice Cloning	Generate speech and clone voices
📝 Transcriber	Convert audio/video to SRT subtitles
🔄 Voice Converter	Transform voice timbre
🎛️ Audio Cleaner	Remove noise and enhance audio
🛠️ Audio Master	Record, convert, and edit audio

🌐 API Documentation

1. TTS Generation

```http
POST /tts
Content-Type: application/json

{
  "text": "Your text here",
  "voice_mode": "predefined",
  "predefined_voice_id": "voice1",
  "temperature": 0.8,
  "speed_factor": 1.0,
  "language": "en",
  "output_format": "mp3"
}
```

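A Python client for this endpoint can be sketched with the standard library alone. It sends the JSON fields from the example above; `voice1` is the same placeholder voice ID, and saving the raw response body assumes the endpoint returns the audio file directly, which you should confirm against the server.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://127.0.0.1:8004", **overrides) -> urllib.request.Request:
    """Create a POST /tts request; keyword arguments override the example's default fields."""
    payload = {
        "text": text,
        "voice_mode": "predefined",
        "predefined_voice_id": "voice1",  # placeholder ID from the example above
        "temperature": 0.8,
        "speed_factor": 1.0,
        "language": "en",
        "output_format": "mp3",
    }
    payload.update(overrides)
    return urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **overrides) -> None:
    """Send the request and save the response body (assumed to be the audio file)."""
    request = build_tts_request(text, **overrides)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize("Hello from the API", "hello.mp3", language="en")` with the server running.
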
2. Voice Conversion (Voice-to-Voice)

```text
POST /process_vc
Content-Type: multipart/form-data

source_audio: [file]
target_audio: [file]
device: "cuda"
cfg_rate: 0.5
sigma_min: 0.000001
```

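Because `/process_vc` takes `multipart/form-data`, a standard-library client has to encode the form itself (libraries such as `requests` do this for you). A minimal sketch of such an encoder, using the field names listed above; the file paths in the usage note are placeholders:

```python
import uuid
from pathlib import Path


def encode_multipart(fields: dict, files: dict) -> tuple:
    """Encode text fields and files as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to appear in the data
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, path in files.items():
        path = Path(path)
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{path.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

For example: `body, ctype = encode_multipart({"device": "cuda", "cfg_rate": 0.5, "sigma_min": 0.000001}, {"source_audio": "my_voice.wav", "target_audio": "target_speaker.wav"})`, then POST `body` to `http://127.0.0.1:8004/process_vc` with the header `Content-Type: ctype` via `urllib.request.Request(..., method="POST")`.
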
3. Transcription

```text
POST /transcribe
Content-Type: multipart/form-data

file: [audio/video file]
language: "en"
model: "small"
task: "transcribe"        # or "translate" for English subtitles
```

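Whisper's `task` option accepts `transcribe` (text in the spoken language) or `translate` (English output), and `model` names a Whisper size. A small sketch that validates these fields before sending them; which model names this server accepts is an assumption here, so adjust `WHISPER_MODELS` to match your installation.

```python
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large", "large-v2", "large-v3"}  # assumed server options
WHISPER_TASKS = {"transcribe", "translate"}  # "translate" produces English text


def transcription_fields(language: str = "en", model: str = "small", task: str = "transcribe") -> dict:
    """Validate and return the text fields for POST /transcribe (the audio goes in the "file" field)."""
    if model not in WHISPER_MODELS:
        raise ValueError(f"unknown Whisper model {model!r}; expected one of {sorted(WHISPER_MODELS)}")
    if task not in WHISPER_TASKS:
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")
    return {"language": language, "model": model, "task": task}
```

Smaller models (`tiny`, `base`) are faster and need less memory; larger ones are more accurate.
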
📁 Project Structure

```text
TTS-Video-to-SRT-VC/
├── 📁 easy-installation/         # Installation scripts
├── 📁 Models/                    # AI models storage
│   └── 📁 nzgnzg73/
│       └── 📁 vc/                # Voice cloning models
├── 📁 reference_audio/           # Reference audio files
├── 📁 static/                    # Static assets
├── 📁 ui/                        # Web interface (TTS)
│   ├── index.html               # Main TTS interface
│   ├── script.js                # Frontend logic
│   ├── styles.css               # Modern styling
│   └── 📁 vendor/               # Third-party libraries
├── 📁 ui_transcriber/            # Transcriber interface
├── 📁 ui_vc/                     # Voice converter interface
├── 📁 Engine/                    # Updated engine components
├── 🐍 server.py                  # Main FastAPI server
├── 🐍 engine.py                  # TTS engine core
├── 🐍 server_vc.py               # TTS-Video-to-SRT server
├── ⚙️ config.yaml                # Configuration file
├── 📋 requirements.txt           # Python dependencies
├── ⚡ setup.bat                  # Windows setup script
├── 📁 Nomi/                      # PC and laptop details
│   ├── Nomi.py
│   ├── NOMI RUN.bat              # ← Run this
│   └── 📁 templates/
│       └── index.html
├── 🚀 TTS Server (RUN).bat       # Runs Text-to-Voice and Voice Cloning only
├── 🚀 TTS - Video to SRT -VC (RUN).bat  # Runs all features: Text-to-Voice, Audio/Video to SRT, and Voice-to-Voice conversion
└── 📖 README.md                  # This file
```

⚙️ Configuration & Tips

📱 Mobile Usage

This tool is fully optimized for mobile devices!
Android Tip: Go to Settings > Apps > Chrome > Battery and set to Unrestricted to allow background audio generation when screen is off.
iOS Tip: Use Safari and enable "Request Desktop Website" for best experience.
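
To use the app from a phone, the server must listen on your network (for example `python server_vc.py --port 8004 --host 0.0.0.0`, as in the Usage section; do this only on a network you trust), and the phone must use your computer's LAN address instead of `127.0.0.1`. A best-effort sketch for finding that address; it is not part of the project and falls back to `127.0.0.1` on machines without a network route.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess of this computer's LAN IPv4 address (falls back to 127.0.0.1)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS choose
        # the outgoing interface, whose address getsockname() then reports.
        # 192.0.2.1 is a reserved documentation address (RFC 5737).
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, e.g. offline
    finally:
        sock.close()


if __name__ == "__main__":
    print(f"On your phone, open http://{lan_ip()}:8004")
```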

🔧 First Run Setup

On first run, the app will download AI models (Whisper/Coqui/Voice Cloning models).
This requires an internet connection and may take 5-15 minutes depending on your connection.
Models are stored locally and only downloaded once.

🎨 Interface Features

Zoom Lock: Use the 🔓 button in the bottom-left corner to lock screen zoom for a more app-like experience on mobile.
Dark Mode: Modern dark theme for comfortable usage.
Real-time Preview: See waveforms and progress in real-time.
Background Mode: Continue working while audio generates.