---
license: apache-2.0
datasets:
- loubb/aria-midi
language:
- en
tags:
- music
- MIDI
- piano
---

# Model

`Aria` is a pretrained autoregressive generative model for symbolic music, based on the LLaMA 3.2 (1B) architecture. It was trained on ~60k hours of MIDI transcriptions of expressive solo-piano recordings, and has been finetuned to produce realistic continuations of solo-piano compositions as well as general-purpose contrastive MIDI embeddings.

This HuggingFace page contains weights and usage instructions for the embedding model. For the pretrained base model, see [aria-medium-base](https://huggingface.co/loubb/aria-medium-base), and for the generative model, see [aria-medium-gen](https://huggingface.co/loubb/aria-medium-gen).

📖 Read our [release blog post](https://example.com/) and [paper](https://example.com/)

🚀 Check out the real-time demo in the official [GitHub repository](https://github.com/EleutherAI/aria)

📊 Get access to our training dataset [Aria-MIDI](https://huggingface.co/datasets/loubb/aria-midi) to train your own models

## Usage Guidelines

Our embedding model was trained to capture composition- and performance-level attributes by learning to embed different random slices of a transcription of a solo-piano performance into similar regions of latent space. Since the model was trained to produce global embeddings with data augmentation (e.g., pitch and tempo shifts), it may not be appropriate for every use case. For more information, see our [paper](https://example.com/).

## Quickstart

All of our models were trained using the MIDI tooling and tokenizer available in the [aria-utils](https://github.com/EleutherAI/aria-utils) repository. Install the aria-utils package with pip:

```bash
pip install git+https://github.com/EleutherAI/aria-utils.git
```

You can then generate an embedding for a (piano) MIDI file using the transformers library:

```bash
pip install transformers
pip install torch
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT_MIDI_LOAD_PATH = "mydir/prompt.midi"
MAX_SEQ_LEN = 2048

model = AutoModelForCausalLM.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "loubb/aria-medium-embedding",
    trust_remote_code=True,
)

prompt = tokenizer.encode_from_file(PROMPT_MIDI_LOAD_PATH, return_tensors="pt")
eos_id = tokenizer._convert_token_to_id(tokenizer.eos_token)

# Only sequences of up to MAX_SEQ_LEN (2048) tokens are supported; the
# embedding is extracted at the end-of-sequence token.
if prompt.input_ids.shape[1] <= MAX_SEQ_LEN:
    assert prompt.input_ids[0, -1] == eos_id
else:
    # Truncate over-long sequences and re-insert the end-of-sequence token
    prompt.input_ids = prompt.input_ids[:, :MAX_SEQ_LEN]
    prompt.input_ids[:, -1] = eos_id

# Run the model and extract the embedding
with torch.no_grad():
    outputs = model(input_ids=prompt.input_ids)
embedding = outputs[0].squeeze(0)
```
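Since the embeddings are global and contrastively trained, a natural application is comparing pieces by embedding similarity. Below is a minimal sketch of this, reusing the `model` and `tokenizer` loaded above. The `embed_file` helper and the file paths are our own illustration (not part of the aria-utils API), and the sketch assumes the model returns a single embedding vector per file, as in the example above.

```python
import torch
import torch.nn.functional as F

def embed_file(path: str, max_seq_len: int = 2048) -> torch.Tensor:
    """Embed a single (piano) MIDI file, truncating over-long sequences."""
    ids = tokenizer.encode_from_file(path, return_tensors="pt").input_ids
    eos_id = tokenizer._convert_token_to_id(tokenizer.eos_token)
    if ids.shape[1] > max_seq_len:
        ids = ids[:, :max_seq_len]
        ids[:, -1] = eos_id
    with torch.no_grad():
        return model(input_ids=ids)[0].squeeze(0)

# Hypothetical file paths; substitute your own MIDI files
emb_a = embed_file("mydir/a.midi")
emb_b = embed_file("mydir/b.midi")

# Cosine similarity between the two global embeddings; values closer to 1
# indicate more similar composition/performance characteristics
similarity = F.cosine_similarity(emb_a, emb_b, dim=-1)
print(f"cosine similarity: {similarity.item():.3f}")
```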
## License and Attribution

The Aria project has been kindly supported by EleutherAI and Stability AI, as well as by a compute grant from the Ministry of Science and ICT of Korea. Our models and MIDI tooling are released under the Apache-2.0 license.

If you use the models or tooling for follow-up work, please cite the paper in which they were introduced:

```bibtex
@inproceedings{bradshawscaling,
  title={Scaling Self-Supervised Representation Learning for Symbolic Piano Performance},
  author={Bradshaw, Louis and Fan, Honglu and Spangher, Alex and Biderman, Stella and Colton, Simon},
  booktitle={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2504.15071}
}
```