Libraries

How to use yuyi1005/cmrextr-1b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="yuyi1005/cmrextr-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
result = pipe(messages)
print(result[0]["generated_text"])  # the conversation, including the model's reply

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yuyi1005/cmrextr-1b")
model = AutoModelForCausalLM.from_pretrained("yuyi1005/cmrextr-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=40)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Local Apps

vLLM

How to use yuyi1005/cmrextr-1b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "yuyi1005/cmrextr-1b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yuyi1005/cmrextr-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
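
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str,
         model: str = "yuyi1005/cmrextr-1b",
         base_url: str = "http://localhost:8000") -> str:
    """POST the prompt to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # OpenAI-compatible servers put the reply under choices[0].message.content
    return payload["choices"][0]["message"]["content"]
```

Calling `chat("What is the capital of France?")` only works while the server is running. Passing `base_url="http://localhost:30000"` points the same helper at an SGLang server on its default port from this page.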

Use Docker

docker model run hf.co/yuyi1005/cmrextr-1b

SGLang

How to use yuyi1005/cmrextr-1b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "yuyi1005/cmrextr-1b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yuyi1005/cmrextr-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "yuyi1005/cmrextr-1b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-based SGLang setup above (OpenAI-compatible API, port 30000).

CMR-EXTR: Structured Extraction from Cardiac MRI Reports

CMR-EXTR is a lightweight framework for converting free-text cardiac magnetic resonance (CMR) reports into structured, auditable data with per-field confidence estimation. It was introduced in the paper "Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs".

Overview

The model is designed to support cohort assembly, longitudinal data curation, and clinical decision support in real-world workflows. It extracts structured information from reports and assigns a confidence score to each extracted field, so that human review and quality control can concentrate on the fields most likely to be wrong.

Key Features

Structured Extraction: Converts free-text CMR reports into predefined structured fields
Per-field Confidence: Provides uncertainty estimates for each extracted variable
Offline Inference: Fully deployable without external API dependencies
Efficient Design: Lightweight student model distilled from a larger teacher model
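
As an illustration of how per-field confidence can drive human review, the sketch below splits an extraction result into auto-accepted fields and fields queued for review. The field names, the `{"value", "confidence"}` layout, the example values, and the 0.8 threshold are all assumptions made for this example. The model's actual output schema is not described here.

```python
from typing import Any


def split_for_review(
    extraction: dict[str, dict[str, Any]],
    threshold: float = 0.8,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate fields at or above the confidence threshold from the rest."""
    accepted: dict[str, Any] = {}
    needs_review: dict[str, Any] = {}
    for field, item in extraction.items():
        target = accepted if item["confidence"] >= threshold else needs_review
        target[field] = item["value"]
    return accepted, needs_review


# Hypothetical extraction result; names and numbers are made up.
example = {
    "lvef_percent": {"value": 55.0, "confidence": 0.97},
    "lv_edv_ml": {"value": 150.0, "confidence": 0.91},
    "lge_present": {"value": True, "confidence": 0.42},
}

accepted, needs_review = split_for_review(example)
print("accepted:", accepted)
print("needs review:", needs_review)
```

A single global threshold keeps the example short. A real deployment would more plausibly tune the threshold per field against a reviewed sample.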

Code

The official implementation is available on GitHub:
CMR-EXTR

Method Summary

CMR-EXTR is built on a teacher–student distillation framework:

A large teacher model generates high-quality structured outputs
A compact student model (based on Llama-3.2-1B-Instruct) is trained to replicate these outputs efficiently
The student model supports fast and fully offline inference
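
One common way to realize this kind of distillation is to fine-tune the student on the teacher's outputs as if they were labels. The sketch below formats a single (report, teacher output) pair as a chat-style training example. The system prompt, the field names, the sample report text, and the `make_sft_example` helper are hypothetical, and the paper's actual training recipe may differ.

```python
import json

# Hypothetical instruction; the prompt used by the authors is not given here.
SYSTEM_PROMPT = "Extract the requested fields from the CMR report as JSON."


def make_sft_example(report_text: str, teacher_output: dict) -> dict:
    """Pair a report with the teacher's structured output as a chat transcript."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": report_text},
            # The teacher's answer becomes the target the student learns to emit.
            {"role": "assistant",
             "content": json.dumps(teacher_output, sort_keys=True)},
        ]
    }


# Made-up report snippet and teacher output, for illustration only.
record = make_sft_example(
    "LV ejection fraction 55%. No late gadolinium enhancement.",
    {"lvef_percent": 55, "lge_present": False},
)
print(json.dumps(record, indent=2))
```

Serializing the target with `sort_keys=True` gives the student one canonical field order to imitate, which is a design choice of this sketch and not something the source states.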

Uncertainty estimation integrates three complementary principles:

Distribution Plausibility: evaluates whether predictions follow expected value ranges
Sampling Stability: measures consistency under stochastic decoding
Cross-field Consistency: enforces logical relationships across extracted variables
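
To make the three principles concrete, here is a minimal sketch of one simple scoring function per principle. None of these formulas are taken from the paper: the range bounds, the majority-agreement measure, the 5-point tolerance, and the `min` combination are assumptions chosen for illustration. The cross-field check relies on the standard relation EF = (EDV - ESV) / EDV.

```python
from collections import Counter


def plausibility(value: float, low: float, high: float) -> float:
    """Return 1.0 if the value lies in the expected range, else 0.0."""
    return 1.0 if low <= value <= high else 0.0


def sampling_stability(samples: list) -> float:
    """Fraction of sampled decodes that agree with the most common answer."""
    if not samples:
        return 0.0
    _, count = Counter(samples).most_common(1)[0]
    return count / len(samples)


def ef_consistency(edv_ml: float, esv_ml: float, ef_percent: float,
                   tol: float = 5.0) -> float:
    """Return 1.0 if the reported EF matches (EDV - ESV) / EDV within tol points."""
    if edv_ml <= 0:
        return 0.0
    derived = 100.0 * (edv_ml - esv_ml) / edv_ml
    return 1.0 if abs(derived - ef_percent) <= tol else 0.0


def combined_confidence(scores: list[float]) -> float:
    """Use the weakest signal, so any failed check lowers the confidence."""
    return min(scores)


# Made-up values: five stochastic decodes of the same LVEF field.
lvef_samples = [55, 55, 54, 55, 55]
scores = [
    plausibility(55, low=5, high=90),        # assumed plausible LVEF range
    sampling_stability(lvef_samples),        # 4 of 5 samples agree
    ef_consistency(edv_ml=150, esv_ml=68, ef_percent=55),
]
confidence = combined_confidence(scores)
print(confidence)
```

Here the range check and the cross-field check both pass, so the combined score is limited by sampling stability alone.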

Citation

If you use this work, please cite:

@inproceedings{yu2026uncertainty,
  title={Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs},
  author={Yu, Yi and Martin, Parker and Bu, Zhenyu and Liu, Yixuan and Zheng, Yi-Yu and Simonetti, Orlando and Han, Yuchi and Xue, Yuan},
  booktitle={IEEE 23rd International Symposium on Biomedical Imaging (ISBI)},
  year={2026},
}

Safetensors

Model size: 1B params
Tensor type: F32, F16

Model tree for yuyi1005/cmrextr-1b

Base model: meta-llama/Llama-3.2-1B-Instruct
Quantized (373): this model

Paper for yuyi1005/cmrextr-1b

Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
Paper • 2605.08045 • Published 25 days ago