# Itty Bitty Piano
A small hybrid Mamba + FFN + Sparse Attention piano continuation model trained on the MAESTRO dataset. Give it a seed MIDI clip and it will continue it into a longer piano piece.
## Model Description

Itty Bitty Piano is a 25.6M-parameter generative model for piano MIDI continuation. It takes a short piano seed clip (ideally 10 seconds) as input and generates a musical continuation in the same style and character.
## Architecture

This model uses a novel hybrid architecture combining three components:

- Mamba (State Space Model): efficient sequence processing at O(n) complexity; handles long-range musical context
- Feed-Forward Network (FFN): adds depth and non-linearity between Mamba layers
- Sparse Music Attention: transformer-style attention with a learned ALiBi-style relative position bias, inserted every few layers for precise in-context retrieval and motif reference

The attention uses relative position bias rather than absolute positional encoding, which matters for music: the model learns that rhythmic relationships between notes hold regardless of where in the piece they occur.
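A minimal sketch of the idea, assuming standard scaled dot-product attention; the class name, learned per-head slopes, and other details here are illustrative, not the exact code in `model/hybrid.py`:

```python
import torch
import torch.nn as nn

class RelativeBiasAttention(nn.Module):
    """Causal attention with an ALiBi-style learned distance penalty."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned slope per head (original ALiBi fixes these geometrically)
        self.slopes = nn.Parameter(torch.rand(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5  # (B, H, T, T)
        # The bias depends only on the distance |i - j|, so a rhythmic
        # relationship is scored the same wherever it occurs in the piece
        pos = torch.arange(T, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs()               # (T, T)
        scores = scores - self.slopes.view(-1, 1, 1) * dist      # per-head bias
        # Causal mask: attend only to current and past positions
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float('-inf'))
        return self.out((scores.softmax(-1) @ v).transpose(1, 2).reshape(B, T, -1))
```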
| Property | Value |
|---|---|
| Parameters | 25.6M (with real Mamba on CUDA) |
| Layers | 6 hybrid layer groups |
| d_model | 512 |
| Attention heads | 8 |
| Vocabulary | 2,000 REMI+BPE tokens |
| Context window | 1,024 tokens |
| Training data | MAESTRO v3.0.0 |
| Training epochs | ~235 |
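For orientation, a hedged sketch of how one layer group could be wired, reusing `RelativeBiasAttention` from the sketch above. Normalization, dropout, and the exact interleaving pattern in `model/hybrid.py` will differ, and `attn_every=3` is an assumption:

```python
import torch.nn as nn
from mamba_ssm import Mamba  # CUDA fast path; CPU inference uses the repo's fallback

class HybridLayerGroup(nn.Module):
    """One group: Mamba -> FFN, with sparse attention every few groups."""

    def __init__(self, d_model: int, index: int, attn_every: int = 3):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Attention only every `attn_every` groups keeps it sparse
        self.attn = RelativeBiasAttention(d_model) if index % attn_every == 0 else None

    def forward(self, x):
        x = x + self.mamba(x)       # O(n) long-range context
        x = x + self.ffn(x)         # depth and non-linearity
        if self.attn is not None:
            x = x + self.attn(x)    # precise in-context retrieval
        return x

# Six groups at d_model=512 / 8 heads, matching the table above
layers = nn.ModuleList(HybridLayerGroup(512, i) for i in range(6))
```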
## Training

Trained on the full MAESTRO v3.0.0 dataset (1,273 classical piano pieces, ~200 hours of music) using:

- REMI tokenization with BPE compression to a 2,000-token vocabulary
- Label smoothing (ε=0.1) to prevent overconfident token distributions
- Cosine LR schedule with a 200-step warmup (see the sketch after this list)
- AdamW optimizer
- DataParallel across 2x NVIDIA T4 GPUs on the Kaggle free tier
- Batch size 64, ~235 epochs
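The loss, optimizer, and schedule map directly onto standard PyTorch. A minimal sketch, where the learning rate and total step count are assumptions (this card does not state them) and a plain `nn.Linear` stands in for the real model:

```python
import math
import torch
import torch.nn as nn

WARMUP_STEPS = 200
TOTAL_STEPS = 100_000          # illustrative; depends on epochs x batches per epoch

model = nn.Linear(512, 2000)   # stand-in for PianoHybridModel

# Label smoothing (eps=0.1) via PyTorch's built-in cross-entropy option
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is an assumption

# Linear warmup for 200 steps, then cosine decay to zero
def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# DataParallel as on Kaggle's dual T4s; this is also why the checkpoint
# keys carry a 'module.' prefix (see Basic Usage below)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())
```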
## Usage

### Requirements

```
pip install torch miditok ncps safetensors pretty_midi
```

For GPU inference, install a pre-built Mamba (`mamba-ssm`) wheel matching your CUDA/PyTorch version. CPU inference works via a Mamba-compatible reference fallback (slower but functional).
### Basic Usage

```python
from pathlib import Path

import torch
from safetensors.torch import load_file

from data.tokenizer import PianoTokenizer
from generation.generate import generate_continuation, GenerationConfig
from model.hybrid import PianoHybridModel
from scale_config import SCALE_PRESETS

# Load tokenizer
tokenizer = PianoTokenizer()
tokenizer.load('tokenizer.json')

# Load model
preset = SCALE_PRESETS['small']
model = PianoHybridModel(preset['model'])
state = load_file('model.safetensors')
# The checkpoint was saved from a DataParallel wrapper, so strip the
# 'module.' prefix before loading into a bare model
state = {k.replace('module.', ''): v for k, v in state.items()}
model.load_state_dict(state, strict=False)
model.eval()

# Generate continuation
gen_config = GenerationConfig(
    max_new_tokens=512,
    temperature=0.9,
    top_p=0.95,
    top_k=50,
)
generate_continuation(
    model, tokenizer,
    seed_midi_path=Path('your_seed.mid'),
    output_path=Path('continuation.mid'),
    config=preset['data'],
    generation_config=gen_config,
)
```
### Generation Tips

- Seed length: 10 seconds of piano works best. The model was trained with 128-token seeds.
- Temperature: 0.8–1.0 for coherent output. Higher values produce more variety but less structure (see the sampling sketch after this list).
- Seed choice: classical piano seeds from composers in MAESTRO (Beethoven, Chopin, Schubert, Mozart, Bach) produce the most coherent continuations, since these styles dominate the training data.
- CPU inference: works, but is slow (~3-5 minutes for 512 tokens). GPU inference is significantly faster.
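To make the interplay of these knobs concrete, here is a minimal sketch of temperature plus top-k plus top-p (nucleus) filtering; it approximates what `GenerationConfig` controls but is not the exact code in `generation/generate.py`:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.9,
                      top_k: int = 50, top_p: float = 0.95) -> int:
    """Pick the next token id from a 1-D logits tensor over the vocabulary."""
    logits = logits / temperature                # <1.0 sharpens, >1.0 flattens
    if top_k > 0:                                # keep only the k best logits
        kth_best = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_best, float('-inf'))
    probs = logits.softmax(dim=-1)
    sorted_probs, order = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Zero out everything past the top-p nucleus (the best token always stays)
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    return order[torch.multinomial(sorted_probs, 1)].item()
```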
## Limitations

- Trained exclusively on classical piano; not suitable for jazz, pop, or other styles
- At 25.6M parameters, long-range compositional structure (returning themes, sonata form) is limited
- Generated continuations stay broadly in key and maintain rhythmic patterns, but may not develop the seed's specific motifs the way a human composer would
- CPU inference uses a Mamba reference fallback, which may produce slightly different output than GPU inference with real Mamba kernels
## Training Data

Trained on MAESTRO v3.0.0 (MIDI and Audio Edited for Synchronous TRacks and Organization).

Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C.-Z. A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. International Conference on Learning Representations (ICLR 2019).

Dataset license: CC BY 4.0 (https://magenta.tensorflow.org/datasets/maestro)
## License

MIT License: free to use, modify, and distribute, including for commercial purposes.
## Citation

If you use this model, please cite:

```bibtex
@misc{itty-bitty-piano-2026,
  title={Itty Bitty Piano: A Hybrid Mamba-Attention Piano Continuation Model},
  author={chickaboomcmurtrie},
  year={2026},
  url={https://huggingface.co/chickaboomcmurtrie/itty-bitty-piano}
}
```
Trained on the Kaggle free tier using dual NVIDIA T4 GPUs. The hybrid architecture was designed and built from scratch.
## Roadmap

A future version of this model is planned using CfC (Closed-form Continuous-time) layers in place of the FFN blocks. CfC is a liquid neural network component with adaptive time constants, theoretically better suited to music's continuous-time dynamics than standard feed-forward networks. The current release uses FFN blocks for stability. Watch this repo for updates.
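For reference, the `ncps` package (already listed in the requirements) exposes a CfC layer whose output width can match d_model. A shape-level sketch only, not the planned design:

```python
import torch
from ncps.torch import CfC  # ncps ships CfC as a batch-first recurrent layer

d_model = 512
cfc = CfC(input_size=d_model, units=d_model)  # adaptive-time-constant recurrence

x = torch.randn(2, 1024, d_model)             # (batch, seq_len, d_model)
out, hidden = cfc(x)                          # out: (2, 1024, d_model)
```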
