
🚀 About The Project

This is a web-based application that runs fully offline once its AI models have been downloaded, and it combines five AI-powered tools in one easy-to-use interface. Whether you are a content creator, developer, or AI enthusiast, this tool helps you generate voiceovers, clone voices, transcribe videos, clean audio, and convert formats, all without an internet connection after the initial setup.


✨ Key Features

🗣️ 1. Advanced Text-to-Speech (TTS) & Voice Cloning

  • Multilingual Support: Generate speech in 24+ languages (English, Urdu, Hindi, Arabic, Chinese, French, Spanish, German, Japanese, Korean, and more).
  • Voice Cloning: Clone any voice using a short reference audio file (.wav/.mp3) with advanced AI models.
  • Long-Form Generation: Automatic text splitting and chunking for audiobooks and long content.
  • Customization: Full control over Speed, Temperature, Pitch, and Exaggeration parameters.
  • Background Mode: Optimized for mobile devices; audio generation continues even when the screen is off.
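
The long-form chunking idea can be illustrated with a minimal sketch. It is not the project's actual splitter: the function name `split_into_chunks`, the `max_chars` limit of 300, and the sentence-boundary regex are all assumptions made for illustration. It groups whole sentences into chunks so that each chunk can be synthesized separately and the audio joined afterwards.

```python
import re


def split_into_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace (hypothetical rule;
    # scripts with other sentence terminators would need extra characters).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # The sentence fits, or it starts a new chunk on its own.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count avoids cutting words or phrases in half, which tends to matter for natural-sounding speech.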

🔤 2. Audio/Video Transcriber (Whisper AI)

  • Video to SRT: Convert video files directly to subtitles (.srt) and plain text transcripts.
  • High Accuracy: Powered by OpenAI Whisper models, from Tiny up to Large-v3, so you can trade speed for accuracy.
  • Translation: Automatically translate foreign audio into English subtitles.
  • Hardware Support: Works with NVIDIA GPU (CUDA) for blazing speed or CPU for compatibility.
  • Multiple Formats: Supports MP4, WebM, MKV, AVI, MOV, MP3, WAV, and more.
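
To show what the video-to-SRT step produces, here is a minimal sketch that turns timed segments into SRT text. It assumes each segment is a dict with `start`, `end` (in seconds), and `text` keys, which is the shape of the segments returned by the open-source `whisper` package; whether this project uses that exact shape is an assumption, and the helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

For example, `segments_to_srt([{"start": 0.0, "end": 1.5, "text": "Hello"}])` yields a single cue running from `00:00:00,000` to `00:00:01,500`.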

🔄 3. Voice-to-Voice Converter

  • Timbre Transfer: Change the input voice to match a target speaker while keeping emotion and intonation intact.
  • Language-Independent: Conversion works on the voice itself, so the source audio can be in any language.
  • Real-time Logic: Optimized setup for AI voice changing workflows.
  • Professional Quality: Studio-grade voice conversion results.

πŸŽ›οΈ 4. Audio Cleaner Pro (Offline)

Audio Cleaner Features
  • Noise Reduction: Remove background hiss, rumble, static, and unwanted sounds.
  • Silence Removal: Automatically trim silent parts from recordings.
  • Enhancement: Adjust pitch (Deep/Alien/Kid voices) and playback speed.
  • Professional Results: Studio-quality audio cleaning without expensive software.
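
As a rough illustration of silence removal, the sketch below trims leading and trailing silence from a list of samples using a simple amplitude threshold. This is an assumed, simplified technique, not the algorithm the Audio Cleaner actually uses; the function name and the default threshold are hypothetical, and it does not touch silence in the middle of a recording.

```python
def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples whose absolute value is below threshold.

    Samples are assumed to be floats in the range -1.0 to 1.0.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # The whole recording is below the threshold.
        return []
    return samples[loud[0] : loud[-1] + 1]
```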

πŸ› οΈ 5. Audio Master Studio

Audio Studio Features
  • Video to Audio: Extract high-quality audio (MP3/WAV) from any video format.
  • Format Converter: Convert between MP3, WAV, AAC, OGG, FLAC, and more.
  • Audio Recorder: Built-in microphone recorder with live waveform visualization.
  • Professional Tools: Complete audio editing suite for content creators.
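
Since FFmpeg is a listed prerequisite, video-to-audio extraction can be sketched as a thin wrapper around the `ffmpeg` command line. How the project itself invokes FFmpeg is not documented here, so the helper names below are hypothetical. The flags are standard FFmpeg options: `-i` names the input, `-vn` drops the video stream, and `-y` overwrites an existing output file.

```python
import shutil
import subprocess


def build_extract_command(video: str, output: str) -> list[str]:
    """Build an FFmpeg command that writes only the audio track to output.

    FFmpeg chooses the audio encoder from the output file extension.
    """
    return ["ffmpeg", "-y", "-i", video, "-vn", output]


def extract_audio(video: str, output: str) -> None:
    """Run the extraction, failing early if FFmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_extract_command(video, output), check=True)
```

Calling `extract_audio("clip.mp4", "clip.mp3")` would produce an MP3; changing the output name to `clip.wav` would produce a WAV instead.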

🌐 Supported Languages

🗣️ Text-to-Speech & Voice Cloning (24+ Languages)

| 🌍 Language | Code | Status | 🌍 Language | Code | Status |
|---|---|---|---|---|---|
| Arabic | ar | ✅ | English | en | ✅ |
| Chinese | zh | ✅ | Finnish | fi | ✅ |
| Danish | da | ✅ | French | fr | ✅ |
| Dutch | nl | ✅ | German | de | ✅ |
| Greek | el | ✅ | Hebrew | he | ✅ |
| Hindi | hi | ✅ | Italian | it | ✅ |
| Japanese | ja | ✅ | Korean | ko | ✅ |
| Malay | ms | ✅ | Norwegian | no | ✅ |
| Polish | pl | ✅ | Portuguese | pt | ✅ |
| Russian | ru | ✅ | Spanish | es | ✅ |
| Swedish | sv | ✅ | Swahili | sw | ✅ |
| Turkish | tr | ✅ | Urdu | ur | ✅ |

πŸ“ Transcription & Translation (100+ Languages)

Transcription Languages

Includes: English, Urdu, Hindi, Spanish, French, German, Japanese, Chinese, Russian, Arabic, Portuguese, Italian, Korean, Turkish, Polish, Dutch, Swedish, Indonesian, Filipino, Vietnamese, Thai, and 80+ more languages!


🖥️ System Requirements

⚡ Minimum Requirements

  • OS: Windows 10/11 (64-bit), Linux, or macOS
  • RAM: 4GB minimum (8GB recommended)
  • Storage: 10GB free space for AI models
  • Python: 3.10 or higher
  • Internet: Only required for initial model downloads

🚀 Recommended (For Best Performance)

  • GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
  • CPU: Multi-core processor (Intel i5/AMD Ryzen 5 or better)
  • RAM: 16GB or more
  • Storage: SSD with 20GB+ free space
  • Display: Works on Desktop, Laptop, Tablet, and Mobile

📥 Installation Guide

Prerequisites

Before you start, make sure you have these installed:

  1. Git: Download Here
  2. Python (3.10+): Download Here
    • ⚠️ Important: When installing Python, check the box "Add Python to PATH"
  3. FFmpeg: Download Here
    • Required for audio/video processing

🚀 Quick Installation (Windows)

Method 1: One-Click Installer (Recommended)

# Step 1: Clone the Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Run One-Click Installer
setup.bat

# Step 3: Launch the Application
"TTS - Video to SRT -VC (RUN).bat"

That's it! The server will start automatically! 🎉


Method 2: Manual Installation (Advanced)

# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Create Virtual Environment
python -m venv venv

# Step 3: Activate Virtual Environment
# For Windows:
venv\Scripts\activate
# For Linux/Mac:
source venv/bin/activate

# Step 4: Install Dependencies
pip install -r requirements.txt

# Step 5: Install PyTorch (GPU version, built for CUDA 11.8; for NVIDIA GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Step 6: Run the Server
python server_vc.py

🐧 Installation for Linux/Mac

# Step 1: Clone Repository
git clone https://github.com/nzgnzg73/TTS-Video-to-SRT-VC.git
cd TTS-Video-to-SRT-VC

# Step 2: Create Virtual Environment
python3 -m venv venv
source venv/bin/activate

# Step 3: Install Dependencies
pip install -r requirements.txt

# Step 4: Install FFmpeg
# Ubuntu/Debian:
sudo apt install ffmpeg
# macOS (with Homebrew):
brew install ffmpeg

# Step 5: Run Server
python server_vc.py

📖 Usage

Step 1: Start the Server

Run the application using one of these methods:

# Method 1: Using Batch File (Windows)
"TTS - Video to SRT -VC (RUN).bat"

# Method 2: Using Python (All Platforms)
python server_vc.py

# Method 3: Custom host and port (0.0.0.0 makes the server reachable from other devices on your network)
python server_vc.py --port 8004 --host 0.0.0.0

Step 2: Open Web Interface

Once the server starts, open your web browser and navigate to:

🌐 http://127.0.0.1:8004 or http://localhost:8004
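
If the page does not load right away, the server may still be starting or loading models. The sketch below polls the port until it accepts connections. It assumes the default address `127.0.0.1:8004` shown above; the helper names are hypothetical.

```python
import socket
import time


def is_port_open(host: str = "127.0.0.1", port: int = 8004, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host: str = "127.0.0.1", port: int = 8004,
                    attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

A successful connection only shows that something is listening on the port; it does not confirm that every model has finished loading.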

Step 3: Choose Your Module

Select from 5 powerful modules in the interface:

| Module | Description |
|---|---|
| 🗣️ TTS & Voice Cloning | Generate speech and clone voices |
| 📝 Transcriber | Convert audio/video to SRT subtitles |
| 🔄 Voice Converter | Transform voice timbre |
| 🎛️ Audio Cleaner | Remove noise and enhance audio |
| 🛠️ Audio Master | Record, convert, and edit audio |

🌐 API Documentation


1. TTS Generation

POST /tts
Content-Type: application/json

{
  "text": "Your text here",
  "voice_mode": "predefined",
  "predefined_voice_id": "voice1",
  "temperature": 0.8,
  "speed_factor": 1.0,
  "language": "en",
  "output_format": "mp3"
}
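
A minimal client for the request above might look like the following sketch, which uses only the Python standard library. The base URL assumes the default local server address, and the response is assumed to be the raw audio file, which this README does not confirm. The function names are hypothetical.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://127.0.0.1:8004") -> urllib.request.Request:
    """Build the POST /tts request with the JSON fields documented above."""
    payload = {
        "text": text,
        "voice_mode": "predefined",
        "predefined_voice_id": "voice1",
        "temperature": 0.8,
        "speed_factor": 1.0,
        "language": "en",
        "output_format": "mp3",
    }
    return urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "output.mp3") -> None:
    """Send the request and save the body (assumed to be audio bytes)."""
    with urllib.request.urlopen(build_tts_request(text)) as response:
        with open(out_path, "wb") as f:
            f.write(response.read())
```

With the server running, `synthesize("Hello from the TTS server")` would write `output.mp3` to the current directory if the assumption about the response body holds.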

2. Voice Cloning

POST /process_vc
Content-Type: multipart/form-data

source_audio: [file]
target_audio: [file]
device: "cuda"
cfg_rate: 0.5
sigma_min: 0.000001

3. Transcription

POST /transcribe
Content-Type: multipart/form-data

file: [audio/video file]
language: "en"
model: "small"
task: "transcribe"
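
The multipart endpoints can be called from Python without third-party packages. The sketch below hand-encodes a `multipart/form-data` body and builds a `/transcribe` request with the fields listed above. The helper names and the default base URL are assumptions, and the response format is not documented here, so the sketch stops at building the request.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # Closing boundary, followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(path: str,
                             base_url: str = "http://127.0.0.1:8004") -> urllib.request.Request:
    """Build POST /transcribe for a local audio or video file."""
    with open(path, "rb") as f:
        content = f.read()
    body, content_type = encode_multipart(
        {"language": "en", "model": "small", "task": "transcribe"},
        {"file": (os.path.basename(path), content)},
    )
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The same `encode_multipart` helper would serve `/process_vc` by passing `source_audio` and `target_audio` as files and `device`, `cfg_rate`, and `sigma_min` as string fields.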

πŸ“ Project Structure

TTS-Video-to-SRT-VC/
β”œβ”€β”€ πŸ“ easy-installation/         # Installation scripts
β”œβ”€β”€ πŸ“ Models/                    # AI models storage
β”‚   └── πŸ“ nzgnzg73/
β”‚       └── πŸ“ vc/                # Voice cloning models
β”œβ”€β”€ πŸ“ reference_audio/           # Reference audio files
β”œβ”€β”€ πŸ“ static/                    # Static assets
β”œβ”€β”€ πŸ“ ui/                        # Web interface (TTS)
β”‚   β”œβ”€β”€ index.html               # Main TTS interface
β”‚   β”œβ”€β”€ script.js                # Frontend logic
β”‚   β”œβ”€β”€ styles.css               # Modern styling
β”‚   └── πŸ“ vendor/               # Third-party libraries
β”œβ”€β”€ πŸ“ ui_transcriber/            # Transcriber interface
β”œβ”€β”€ πŸ“ ui_vc/                     # Voice converter interface
β”œβ”€β”€ πŸ“ Engine/                    # Updated engine components
β”œβ”€β”€ 🐍 server.py                  # Main FastAPI server
β”œβ”€β”€ 🐍 engine.py                  # TTS engine core
β”œβ”€β”€ 🐍 server_vc.py               # TTS-Video-to-SRT  server
β”œβ”€β”€ βš™οΈ config.yaml                # Configuration file
β”œβ”€β”€ πŸ“‹ requirements.txt           # Python dependencies
β”œβ”€β”€ ⚑ setup.bat                  # Windows setup script
β”œβ”€β”€πŸ“ Nomi \                        #Here are the details of PC and laptop 
│└── Nomi.py              
│└── NOMI RUN.bat         ← Run It
│└── templates\
β”‚    └── index.html          
β”œβ”€β”€ πŸš€ TTS Server (RUN).bat     # If You Run This, Text To Voice Or Voice Clone  Will Run.
β”œβ”€β”€ πŸš€ TTS - Video to SRT -VC (RUN).bat     # The future within it which we can take advantage of. πŸ‘‡πŸΌ
TTS - Video to SRT -VC (RUN).bat In this file, I am giving you the details, inside this you will find Text to votes Or Audio  Video to SRT, condition Voice to Voice  all available. 
└── πŸ“– README.md                  # This file

βš™οΈ Configuration & Tips

Configuration Animation

πŸ“± Mobile Usage

  • This tool is fully optimized for mobile devices!
  • Android Tip: Go to Settings > Apps > Chrome > Battery and set to Unrestricted to allow background audio generation when screen is off.
  • iOS Tip: Use Safari and enable "Request Desktop Website" for best experience.

🔧 First Run Setup

  • On first run, the app will download AI models (Whisper/Coqui/Voice Cloning models).
  • This requires an internet connection and may take 5-15 minutes depending on your connection.
  • Models are stored locally and only downloaded once.

🎨 Interface Features

  • Zoom Lock: Use the 🔓 button in the bottom left to lock screen zoom for a more app-like experience on mobile.
  • Dark Mode: Modern dark theme for comfortable usage.
  • Real-time Preview: See waveforms and progress in real-time.
  • Background Mode: Continue working while audio generates.

🚀 Performance Tips

  • For faster processing, use GPU mode (NVIDIA CUDA).
  • For CPU-only systems, use smaller models (Tiny/Small) for faster results.
  • Close other heavy applications for better performance.
  • Use SSD storage for faster model loading.
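
The first two tips can be turned into a small helper that picks a device and a Whisper model size. This is only an illustrative sketch: the function names are hypothetical, the model choices are example defaults rather than project settings, and larger models such as `large-v3` need considerably more VRAM than the 4GB listed under the recommended hardware.

```python
def detect_device() -> str:
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def suggest_whisper_model(device: str) -> str:
    """Suggest a model size: a small model on CPU, a larger one on GPU.

    Illustrative defaults only; adjust to the VRAM actually available.
    """
    return "large-v3" if device == "cuda" else "small"
```

The returned values match the `device` and `model` fields used in the API requests, for example `suggest_whisper_model(detect_device())`.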

🎯 Screenshots & Demo

[Screenshot: Main Interface]

[Screenshot: Audio Processing]

[Screenshot: Transcription Module]

πŸ™ Acknowledgments

Acknowledgments Animation


📞 Support & Contact

👤 Created by NZG73

| Platform | Link |
|---|---|
| 📺 YouTube | @NZG73 |
| 🌐 Website | nzg73.blogspot.com |
| 📧 Email | [email protected] |
| 🐛 Report Bugs | GitHub Issues |
| ⭐ GitHub | Star this Repo |


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


💖 Made with Love by NZG73

🌟 If you like this project, please give it a Star! 🌟

© 2024 NZG73. All Rights Reserved.
