# Itty Bitty Piano
A small hybrid Mamba + FFN + Sparse Attention piano continuation model trained on the MAESTRO dataset. Give it a seed MIDI clip and it will continue it into a longer piano piece.
## Model Description

Itty Bitty Piano is a 25.6M-parameter generative model for piano MIDI continuation. It takes a short piano seed clip (ideally 10 seconds) as input and generates a musical continuation in the same style and character.
## Architecture

This model uses a novel hybrid architecture combining three components:

- Mamba (State Space Model): efficient sequence processing at O(n) complexity; handles long-range musical context
- Feed-Forward Network (FFN): adds depth and non-linearity between Mamba layers
- Sparse Music Attention: transformer-style attention with a learned ALiBi-style relative position bias, inserted every few layers for precise in-context retrieval and motif reference

The attention uses relative position bias rather than absolute positional encoding, which matters for music: the model learns that rhythmic relationships between notes hold regardless of where in the piece they occur.
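A minimal sketch of the idea, assuming standard scaled dot-product attention; the class name, learned per-head slopes, and other details here are illustrative, not the exact code in `model/hybrid.py`:

```python
import torch
import torch.nn as nn

class RelativeBiasAttention(nn.Module):
    """Causal attention with an ALiBi-style learned distance penalty."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned slope per head (original ALiBi fixes these geometrically)
        self.slopes = nn.Parameter(torch.rand(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5  # (B, H, T, T)
        # The bias depends only on the distance |i - j|, so a rhythmic
        # relationship is scored the same wherever it occurs in the piece
        pos = torch.arange(T, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs()               # (T, T)
        scores = scores - self.slopes.view(-1, 1, 1) * dist      # per-head bias
        # Causal mask: attend only to current and past positions
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float('-inf'))
        return self.out((scores.softmax(-1) @ v).transpose(1, 2).reshape(B, T, -1))
```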
| Property | Value |
|---|---|
| Parameters | 25.6M (with real Mamba on CUDA) |
| Layers | 6 hybrid layer groups |
| d_model | 512 |
| Attention heads | 8 |
| Vocabulary | 2,000 REMI+BPE tokens |
| Context window | 1,024 tokens |
| Training data | MAESTRO v3.0.0 |
| Training epochs | ~235 |
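For orientation, a hedged sketch of how one layer group could be wired, reusing `RelativeBiasAttention` from the sketch above. Normalization, dropout, and the exact interleaving pattern in `model/hybrid.py` will differ, and `attn_every=3` is an assumption:

```python
import torch.nn as nn
from mamba_ssm import Mamba  # CUDA fast path; CPU inference uses the repo's fallback

class HybridLayerGroup(nn.Module):
    """One group: Mamba -> FFN, with sparse attention every few groups."""

    def __init__(self, d_model: int, index: int, attn_every: int = 3):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Attention only every `attn_every` groups keeps it sparse
        self.attn = RelativeBiasAttention(d_model) if index % attn_every == 0 else None

    def forward(self, x):
        x = x + self.mamba(x)       # O(n) long-range context
        x = x + self.ffn(x)         # depth and non-linearity
        if self.attn is not None:
            x = x + self.attn(x)    # precise in-context retrieval
        return x

# Six groups at d_model=512 / 8 heads, matching the table above
layers = nn.ModuleList(HybridLayerGroup(512, i) for i in range(6))
```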
## Training

Trained on the full MAESTRO v3.0.0 dataset (1,273 classical piano pieces, ~200 hours of music) using:

- REMI tokenization with BPE compression to a 2,000-token vocabulary
- Label smoothing (ε=0.1) to prevent overconfident token distributions
- Cosine LR schedule with a 200-step warmup (see the sketch after this list)
- AdamW optimizer
- DataParallel across 2x NVIDIA T4 GPUs on the Kaggle free tier
- Batch size 64, ~235 epochs
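The loss, optimizer, and schedule map directly onto standard PyTorch. A minimal sketch, where the learning rate and total step count are assumptions (this card does not state them) and a plain `nn.Linear` stands in for the real model:

```python
import math
import torch
import torch.nn as nn

WARMUP_STEPS = 200
TOTAL_STEPS = 100_000          # illustrative; depends on epochs x batches per epoch

model = nn.Linear(512, 2000)   # stand-in for PianoHybridModel

# Label smoothing (eps=0.1) via PyTorch's built-in cross-entropy option
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is an assumption

# Linear warmup for 200 steps, then cosine decay to zero
def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# DataParallel as on Kaggle's dual T4s; this is also why the checkpoint
# keys carry a 'module.' prefix (see Basic Usage below)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())
```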
## Usage

### Requirements

```
pip install torch miditok ncps safetensors pretty_midi
```

For GPU inference, install a pre-built Mamba (`mamba-ssm`) wheel matching your CUDA/PyTorch version. CPU inference works via a Mamba-compatible reference fallback (slower but functional).
### Basic Usage

```python
from pathlib import Path

import torch
from safetensors.torch import load_file

from data.tokenizer import PianoTokenizer
from generation.generate import generate_continuation, GenerationConfig
from model.hybrid import PianoHybridModel
from scale_config import SCALE_PRESETS

# Load tokenizer
tokenizer = PianoTokenizer()
tokenizer.load('tokenizer.json')

# Load model
preset = SCALE_PRESETS['small']
model = PianoHybridModel(preset['model'])
state = load_file('model.safetensors')
# The checkpoint was saved from a DataParallel wrapper, so strip the
# 'module.' prefix before loading into a bare model
state = {k.replace('module.', ''): v for k, v in state.items()}
model.load_state_dict(state, strict=False)
model.eval()

# Generate continuation
gen_config = GenerationConfig(
    max_new_tokens=512,
    temperature=0.9,
    top_p=0.95,
    top_k=50,
)
generate_continuation(
    model, tokenizer,
    seed_midi_path=Path('your_seed.mid'),
    output_path=Path('continuation.mid'),
    config=preset['data'],
    generation_config=gen_config,
)
```
### Generation Tips

- Seed length: 10 seconds of piano works best. The model was trained with 128-token seeds.
- Temperature: 0.8–1.0 for coherent output. Higher values produce more variety but less structure (see the sampling sketch after this list).
- Seed choice: classical piano seeds from composers in MAESTRO (Beethoven, Chopin, Schubert, Mozart, Bach) produce the most coherent continuations, since these styles dominate the training data.
- CPU inference: works, but is slow (~3-5 minutes for 512 tokens). GPU inference is significantly faster.
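To make the interplay of these knobs concrete, here is a minimal sketch of temperature plus top-k plus top-p (nucleus) filtering; it approximates what `GenerationConfig` controls but is not the exact code in `generation/generate.py`:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.9,
                      top_k: int = 50, top_p: float = 0.95) -> int:
    """Pick the next token id from a 1-D logits tensor over the vocabulary."""
    logits = logits / temperature                # <1.0 sharpens, >1.0 flattens
    if top_k > 0:                                # keep only the k best logits
        kth_best = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_best, float('-inf'))
    probs = logits.softmax(dim=-1)
    sorted_probs, order = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Zero out everything past the top-p nucleus (the best token always stays)
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    return order[torch.multinomial(sorted_probs, 1)].item()
```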
## Limitations

- Trained exclusively on classical piano; not suitable for jazz, pop, or other styles
- At 25.6M parameters, long-range compositional structure (returning themes, sonata form) is limited
- Generated continuations stay broadly in key and maintain rhythmic patterns, but may not develop the seed's specific motifs the way a human composer would
- CPU inference uses a Mamba reference fallback, which may produce slightly different output than GPU inference with real Mamba kernels
## Training Data

Trained on MAESTRO v3.0.0 (MIDI and Audio Edited for Synchronous TRacks and Organization).

Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C.-Z. A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. International Conference on Learning Representations (ICLR 2019).

Dataset license: CC BY 4.0 (https://magenta.tensorflow.org/datasets/maestro)
## License

MIT License: free to use, modify, and distribute, including for commercial purposes.
## Citation

If you use this model, please cite:

```bibtex
@misc{itty-bitty-piano-2026,
  title={Itty Bitty Piano: A Hybrid Mamba-Attention Piano Continuation Model},
  author={chickaboomcmurtrie},
  year={2026},
  url={https://huggingface.co/chickaboomcmurtrie/itty-bitty-piano}
}
```
Trained on the Kaggle free tier using dual NVIDIA T4 GPUs. The hybrid architecture was designed and built from scratch.
## Roadmap

A future version of this model is planned using CfC (Closed-form Continuous-time) layers in place of the FFN blocks. CfC is a liquid neural network component with adaptive time constants, theoretically better suited to music's continuous-time dynamics than standard feed-forward networks. The current release uses FFN blocks for stability. Watch this repo for updates.
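For reference, the `ncps` package (already listed in the requirements) exposes a CfC layer whose output width can match d_model. A shape-level sketch only, not the planned design:

```python
import torch
from ncps.torch import CfC  # ncps ships CfC as a batch-first recurrent layer

d_model = 512
cfc = CfC(input_size=d_model, units=d_model)  # adaptive-time-constant recurrence

x = torch.randn(2, 1024, d_model)             # (batch, seq_len, d_model)
out, hidden = cfc(x)                          # out: (2, 1024, d_model)
```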
