---
tags:
- music-generation
- bert
- masked-language-modeling
- remi
- midi
- symbolic-music
- gigamidi
library_name: transformers
pipeline_tag: fill-mask
license: mit
datasets:
- Metacreation/GigaMIDI
---

# musicbert

## Model Description

MusicBERT large is a 24-layer BERT-style masked language model trained on REMI+BPE symbolic music sequences extracted from the [GigaMIDI](https://huggingface.co/datasets/Metacreation/GigaMIDI) corpus. It is tailored for symbolic music understanding, fill-mask-style infilling, and use as a backbone for downstream generative tasks.

- **Checkpoint**: 130,000 steps
- **Hidden size**: 768
- **Parameters**: ~150M
- **Validation loss**: 1.5093

## Training Configuration

- **Objective**: Masked language modeling with span-aware masking
- **Dataset**: GigaMIDI (REMI tokens → BPE, vocabulary size 50,000)
- **Sequence length**: 1024
- **Max events per MIDI**: 2048

## Inference Example

### Using with MIDI files

```python
import random

import torch
from transformers import BertForMaskedLM
from miditok import MusicTokenizer

# Load the model and the REMI+BPE tokenizer
model = BertForMaskedLM.from_pretrained("manoskary/musicbert")
tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")
model.eval()

# Convert a MIDI file to BPE token IDs (MIDI → REMI → BPE pipeline)
midi_path = "path/to/your/file.mid"
tok_seq = tokenizer(midi_path)
bpe_ids = tok_seq.ids

# Mask a few random positions for fill-mask prediction
mask_token_id = 3  # ID of the MASK_None token in this vocabulary
input_ids = bpe_ids.copy()
mask_positions = random.sample(range(1, len(input_ids) - 1), k=5)
for pos in mask_positions:
    input_ids[pos] = mask_token_id

# Run inference and read out the most likely token at each masked position
input_tensor = torch.tensor([input_ids])
with torch.no_grad():
    outputs = model(input_tensor)

predictions = outputs.logits[0, mask_positions, :].argmax(dim=-1)
print("Predicted token IDs:", predictions.tolist())
```

## Limitations and Risks

- The model is trained purely on symbolic data; it does not produce audio directly.
- The GigaMIDI dataset is biased towards Western tonal music.
- Long-form structure beyond 1024 tokens requires chunking or iterative decoding.
- Generated continuations may need post-processing to ensure musical coherence.

## Citation

If you use this checkpoint, please cite the original MusicBERT introduction and the GigaMIDI dataset.
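
The chunking mentioned under Limitations can be sketched as a sliding window over the token-ID sequence. This is a minimal, pure-Python illustration, not part of the released code: the helper name `chunk_ids` and the overlap of 256 IDs are assumptions chosen for the example, with the window size matching the model's 1024-token sequence length.

```python
def chunk_ids(ids, window=1024, overlap=256):
    """Split a token-ID list into overlapping windows of at most `window` IDs.

    The overlap (a hypothetical choice of 256 IDs) gives predictions near a
    chunk boundary some context from the neighbouring chunk.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    chunks = []
    for start in range(0, len(ids), step):
        chunks.append((start, ids[start:start + window]))
        if start + window >= len(ids):
            break  # the last window already reaches the end of the sequence
    return chunks

# A 2500-token sequence yields windows starting at 0, 768, and 1536
chunks = chunk_ids(list(range(2500)))
print([start for start, _ in chunks])  # → [0, 768, 1536]
```

Each window can then be masked and fed to the model exactly as in the inference example above; predictions in the overlapping region can be taken from whichever chunk places them further from a boundary.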