Speech Self-Supervised Learning

This directory contains example scripts for training self-supervised speech models.

Two main types of self-supervised learning methods are supported:

- Wav2vec-BERT: speech_pre_training.py
- NEST: masked_token_pred_pretrain.py
  - For downstream tasks that use NEST as a multi-layer feature extractor, please refer to ./downstream/speech_classification_mfa_train.py
  - For extracting multi-layer features from NEST, please refer to <NEMO ROOT>/scripts/ssl/extract_features.py
  - For using NEST to initialize the weights of downstream models, please refer to the usage of maybe_init_from_pretrained_checkpoint.
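As a rough illustration of the weight-initialization route, the helper below builds a Hydra command-line override that a downstream NeMo training script could pick up through maybe_init_from_pretrained_checkpoint. It assumes that this method reads the top-level config keys init_from_nemo_model (for .nemo archives) and init_from_ptl_ckpt (for PyTorch Lightning .ckpt files), as in the NeMo versions we are aware of; the helper name init_override and the checkpoint path are hypothetical, so check the keys against your NeMo version.

```python
from pathlib import Path


def init_override(checkpoint_path: str) -> str:
    """Return a Hydra override pointing a downstream model at pretrained NEST weights.

    Hypothetical helper: assumes the downstream script passes its top-level config to
    maybe_init_from_pretrained_checkpoint, which reads `init_from_nemo_model` for .nemo
    archives and `init_from_ptl_ckpt` for Lightning checkpoints. The leading '+' asks
    Hydra to add the key even when the YAML config does not already define it.
    """
    suffix = Path(checkpoint_path).suffix
    if suffix == ".nemo":
        key = "init_from_nemo_model"
    elif suffix == ".ckpt":
        key = "init_from_ptl_ckpt"
    else:
        raise ValueError(f"unsupported checkpoint type: {checkpoint_path!r}")
    return f"+{key}={checkpoint_path}"


print(init_override("checkpoints/nest_pretrained.nemo"))
# prints: +init_from_nemo_model=checkpoints/nest_pretrained.nemo
```

The resulting string would be appended to the downstream script's command line alongside any other overrides.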

For the corresponding configurations, please refer to the example YAML configs:

- Wav2vec-BERT: examples/asr/conf/ssl/fastconformer/fast-conformer.yaml
- NEST: examples/asr/conf/ssl/nest/nest_fast-conformer.yaml
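The example scripts are configured through Hydra, so a run is typically launched by pointing --config-path and --config-name at one of these YAML files and overriding fields on the command line. The sketch below assembles such a command. It assumes the scripts live in examples/asr/speech_pretraining/ (so a relative --config-path, which Hydra resolves against the script's location, is ../conf/ssl/...) and that the YAML exposes model.train_ds.manifest_filepath; the helper build_command and the example paths are hypothetical.

```python
import shlex
from typing import List, Sequence

# Method -> (script, config directory relative to the script, config name).
# Script and config names come from this README; the relative config paths
# assume the scripts live in examples/asr/speech_pretraining/.
METHODS = {
    "wav2vec-bert": ("speech_pre_training.py", "../conf/ssl/fastconformer", "fast-conformer"),
    "nest": ("masked_token_pred_pretrain.py", "../conf/ssl/nest", "nest_fast-conformer"),
}


def build_command(method: str, train_manifest: str, extra_overrides: Sequence[str] = ()) -> List[str]:
    """Assemble a Hydra-style launch command as an argv list (hypothetical helper)."""
    script, config_path, config_name = METHODS[method]
    return [
        "python",
        script,
        f"--config-path={config_path}",
        f"--config-name={config_name}",
        # Assumed override key; check it against the YAML config you actually use.
        f"model.train_ds.manifest_filepath={train_manifest}",
        *extra_overrides,
    ]


cmd = build_command("nest", "data/train_manifest.json", ["trainer.devices=1"])
print(shlex.join(cmd))
# prints: python masked_token_pred_pretrain.py --config-path=../conf/ssl/nest --config-name=nest_fast-conformer model.train_ds.manifest_filepath=data/train_manifest.json trainer.devices=1
```

Returning an argv list rather than a single string keeps the command safe to hand to subprocess.run without shell quoting issues; shlex.join is only used for display.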

The dataset format follows that of ASR models, except that no ground-truth transcriptions are needed. For example, the JSON-lines manifest file specified in manifest_filepath should look like this:

{"audio_filepath": "path/to/audio1.wav", "duration": 10.0, "text": ""}
{"audio_filepath": "path/to/audio2.wav", "duration": 5.0, "text": ""}
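A manifest in this format can be generated with the Python standard library alone. The sketch below assumes uncompressed PCM WAV input (the wave module cannot read compressed formats), and the helper names wav_duration and write_ssl_manifest are hypothetical. It writes one JSON object per line with an empty text field, since pretraining needs no transcriptions.

```python
import json
import wave
from pathlib import Path
from typing import Iterable


def wav_duration(path: Path) -> float:
    """Duration in seconds of a PCM WAV file, computed from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_ssl_manifest(audio_files: Iterable[Path], manifest_path: Path) -> int:
    """Write a JSON-lines manifest with empty transcripts; return the number of entries."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio in audio_files:
            entry = {
                "audio_filepath": str(audio),
                "duration": round(wav_duration(audio), 3),
                "text": "",  # no ground-truth transcription is needed for pretraining
            }
            out.write(json.dumps(entry) + "\n")
            count += 1
    return count


# Example usage (hypothetical directory): index every .wav file under data/audio.
# write_ssl_manifest(sorted(Path("data/audio").rglob("*.wav")), Path("train_manifest.json"))
```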

Please refer to the ASR dataset documentation for more details.
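Before launching a long pretraining run, it can help to sanity-check the manifest. The sketch below is a hypothetical pre-flight check, not part of NeMo: it requires each non-empty line to be a JSON object with audio_filepath and a positive duration (the text field is not inspected), and reports the number of entries and the total hours of audio.

```python
import json
from typing import List, Tuple

REQUIRED_KEYS = ("audio_filepath", "duration")


def check_manifest(lines: List[str]) -> Tuple[int, float]:
    """Validate JSON-lines manifest entries; return (num_entries, total_hours).

    Hypothetical helper: raises ValueError on a missing key, a non-positive duration,
    or a line that is not valid JSON (json.JSONDecodeError subclasses ValueError).
    """
    count, seconds = 0, 0.0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        entry = json.loads(line)
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        duration = float(entry["duration"])
        if duration <= 0:
            raise ValueError(f"line {lineno}: non-positive duration {duration}")
        count += 1
        seconds += duration
    return count, seconds / 3600.0


example = [
    '{"audio_filepath": "path/to/audio1.wav", "duration": 10.0, "text": ""}',
    '{"audio_filepath": "path/to/audio2.wav", "duration": 5.0, "text": ""}',
]
num_entries, total_hours = check_manifest(example)  # 2 entries, 15 seconds of audio
```

For a manifest on disk, the lines can be read with Path("train_manifest.json").read_text(encoding="utf-8").splitlines().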

For the most efficient data loading, please refer to