Speech Self-Supervised Learning
This directory contains example scripts to train self-supervised speech models.
There are two main types of supported self-supervised learning methods:
- Wav2vec-BERT: speech_pre_training.py
- NEST: masked_token_pred_pretrain.py
  - For downstream tasks that use NEST as a multi-layer feature extractor, please refer to ./downstream/speech_classification_mfa_train.py
  - For extracting multi-layer features from NEST, please refer to <NEMO ROOT>/scripts/ssl/extract_features.py
  - For using NEST as weight initialization for downstream tasks, please refer to the usage of maybe_init_from_pretrained_checkpoint.
For their corresponding usage, please refer to the example YAML configs:
- Wav2vec-BERT: examples/asr/conf/ssl/fastconformer/fast-conformer.yaml
- NEST: examples/asr/conf/ssl/nest/nest_fast-conformer.yaml
The dataset format follows that of ASR models, but no ground-truth transcriptions are needed. For example, the JSONL file specified in manifest_filepath should contain one JSON object per line, like:
{"audio_filepath": "path/to/audio1.wav", "duration": 10.0, "text": ""}
{"audio_filepath": "path/to/audio2.wav", "duration": 5.0, "text": ""}
Please refer to the ASR dataset documentation for more details.
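A manifest in this format can be generated with a few lines of standard-library Python. The sketch below is illustrative only and is not part of NeMo: the helper names wav_duration_seconds and write_ssl_manifest are hypothetical, and it assumes the audio files are uncompressed PCM WAV files readable by Python's wave module.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def write_ssl_manifest(audio_paths, manifest_path):
    """Write one JSON object per line; "text" stays empty because
    self-supervised pre-training needs no transcriptions."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path in audio_paths:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": "",
            }
            out.write(json.dumps(entry) + "\n")
```

Calling write_ssl_manifest(["path/to/audio1.wav", "path/to/audio2.wav"], "train_manifest.json") would produce a file whose path can then be set as manifest_filepath in the config. Other audio formats would need a different way of measuring duration.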
For most efficient data loading, please refer to