Speech Self-Supervised Learning

This directory contains example scripts for training self-supervised speech models.

There are two main types of supported self-supervised learning methods:

  • Wav2vec-BERT: speech_pre_training.py
  • NEST: masked_token_pred_pretrain.py
    • For downstream tasks that use NEST as a multi-layer feature extractor, please refer to ./downstream/speech_classification_mfa_train.py
    • For extracting multi-layer features from NEST, please refer to <NEMO ROOT>/scripts/ssl/extract_features.py
    • For using NEST as weight initialization for downstream tasks, please refer to the usage of maybe_init_from_pretrained_checkpoint.

For usage of each method, please refer to the corresponding example YAML config:

  • Wav2vec-BERT: examples/asr/conf/ssl/fastconformer/fast-conformer.yaml
  • NEST: examples/asr/conf/ssl/nest/nest_fast-conformer.yaml

The dataset format follows that of ASR models, but no ground-truth transcriptions are needed, so the "text" field can be left empty. For example, the JSONL file specified in manifest_filepath should look like:

{"audio_filepath": "path/to/audio1.wav", "duration": 10.0, "text": ""}
{"audio_filepath": "path/to/audio2.wav", "duration": 5.0, "text": ""}

Please refer to the ASR dataset documentation for more details.
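
As an illustration of the manifest format above, the following is a minimal sketch of how such a file could be generated from a list of WAV files. It is not part of NeMo: the helper names (wav_duration_seconds, build_manifest_entries, write_manifest) are hypothetical, and it assumes uncompressed PCM WAV input readable by Python's standard wave module.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def build_manifest_entries(wav_paths):
    """Build one manifest entry per audio file.

    The "text" field is left empty because self-supervised
    pre-training does not need transcriptions.
    """
    return [
        {
            "audio_filepath": str(path),
            "duration": round(wav_duration_seconds(path), 3),
            "text": "",
        }
        for path in wav_paths
    ]


def write_manifest(entries, manifest_path):
    """Write entries as JSON Lines: one JSON object per line."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```

For instance, write_manifest(build_manifest_entries(["path/to/audio1.wav", "path/to/audio2.wav"]), "train_manifest.json") would produce a file in the format shown above, which can then be passed as manifest_filepath. The output file name here is only a placeholder.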

For most efficient data loading, please refer to