---
license: apache-2.0
tags:
- object-detection
- medical-imaging
- heart-anatomy
- computer-vision
metrics:
- mean-average-precision
model-index:
- name: Heartformer
  results:
  - task:
      type: object-detection
    dataset:
      name: Heart Anatomy Types v2
      type: custom
    metrics:
    - type: mean-average-precision
      value: 0.977
      name: mAP@50
---

# Heartformer: Heart Anatomy Type Detection

**Heartformer** is a specialized object detection model for identifying and localizing different types of heart anatomy visualizations in medical images. Built on the RF-DETR architecture, this model can detect and classify seven distinct categories of cardiac imaging and illustration modalities.

## 📋 Model Description

Heartformer addresses the challenge of automatically categorizing diverse representations of cardiac anatomy, from real cadaver specimens to textbook illustrations. This capability is valuable for:

- **Medical Education**: Automatically organizing and categorizing cardiac anatomy learning materials
- **Content Curation**: Indexing large medical image databases by visualization type
- **Research Support**: Filtering cardiac datasets by imaging modality for meta-analyses
- **Educational Technology**: Building intelligent tutoring systems that adapt to different anatomy representation types

### Key Features

- **Multi-modal Detection**: Simultaneously detects 7 different heart anatomy visualization types
- **High Accuracy**: Achieves 97.7% mAP@50 on the held-out test set
- **Real-time Inference**: Optimized RF-DETR Nano architecture for fast detection
- **Robust Generalization**: Tested on diverse image sources and artistic styles

### Architecture Overview

```
Input Image (any size)
        ↓
Backbone: DINOv2 ViT (Vision Transformer)
├── Patch Embedding (16×16 patches)
├── Transformer Encoder (12 layers)
└── Feature Extraction
        ↓
Neck: Feature Pyramid Network (FPN)
└── Multi-scale feature fusion
        ↓
Head: Transformer Decoder
├── Object Queries (300 learnable embeddings)
├── Cross-attention with image features
└── Self-attention between queries
        ↓
Detection Heads
├── Classification Head → Class probabilities (7 + 1 background)
└── Regression Head → Bounding box coordinates (x, y, w, h)
```

### Key Components

1. **Backbone**: DINOv2-based Vision Transformer
   - Self-supervised pre-trained on large-scale image data
   - Patch size: 16×16 pixels
   - Produces rich semantic features

2. **Transformer Encoder-Decoder**
   - Encoder: Processes image features with self-attention
   - Decoder: Uses cross-attention to localize objects
   - Set-based prediction (no NMS required)

3. **Detection Head**
   - Bipartite matching loss for optimal assignment
   - Joint classification and localization

### Model Specifications

- **Parameters**: 30.5M total
- **Input**: RGB images (resized to 640×640 during training)
- **Output**: Up to 300 detection proposals per image
- **Inference Speed**: <2 seconds per image on Apple M3 (MPS backend)

## 📊 Dataset

### Heart Anatomy Types v2 (Roboflow)

The model was trained on a curated dataset of 621 annotated images from [Roboflow Universe](https://universe.roboflow.com/), specifically designed to capture the diversity of cardiac anatomy representations.

#### Dataset Statistics

| Split | Images | Annotations | Distribution |
|-------|--------|-------------|--------------|
| Train | 497 | 767 | 80.0% |
| Valid | 62 | 78 | 10.0% |
| Test | 62 | 95 | 10.0% |
| **Total** | **621** | **940** | **100%** |

#### Class Distribution (Test Set)

| Class | Count | Description |
|-------|-------|-------------|
| `heart_cadaver` | 16 | Real anatomical specimens from dissection |
| `heart_cell` | 17 | Microscopic/cellular views of cardiac tissue |
| `heart_ct_scan` | 7 | CT imaging of the heart |
| `heart_drawing` | 10 | Hand-drawn or digital medical illustrations |
| `heart_textbook` | 38 | Educational anatomy images from textbooks |
| `heart_wall` | 6 | Cross-sectional views showing heart wall layers |
| `heart_xray` | 1 | Radiographic chest/heart images |

#### Data Sources

- Medical textbooks (openly licensed)
- Roboflow Universe community contributions
- Educational anatomy databases
- All images verified for appropriate licensing

### Annotation Format

Annotations follow the COCO format:

```json
{
  "categories": [
    {"id": 0, "name": "heart-anatomy-images", "supercategory": "none"},
    {"id": 1, "name": "heart_cadaver", "supercategory": "heart-anatomy-images"},
    ...
  ],
  "images": [...],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [x, y, width, height],
      "area": 12345,
      "iscrowd": 0
    }
  ]
}
```
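To make the format concrete, here is a minimal sketch (not part of the original card) for inspecting these annotations with the standard library. It assumes Roboflow's usual COCO export layout, where each split directory contains a `_annotations.coco.json` file; adjust the path to your download.

```python
import json
from collections import Counter

# Assumption: Roboflow-style COCO export with one _annotations.coco.json per split
with open("train/_annotations.coco.json") as f:
    coco = json.load(f)

# Map category ids to readable names
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

# Count annotations per class, mirroring the distribution tables above
counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
for name, n in counts.most_common():
    print(f"{name}: {n}")

# Each bbox is [x, y, width, height] in absolute pixel coordinates
first = coco["annotations"][0]
print(first["bbox"], "→", id_to_name[first["category_id"]])
```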
## 🔬 Training Procedure

### Training Configuration

```python
{
    "model": "RF-DETR Nano",
    "epochs": 30,
    "early_stopping_patience": 8,
    "batch_size": 4,
    "gradient_accumulation_steps": 4,  # Effective batch size: 16
    "learning_rate": 1e-4,
    "optimizer": "AdamW",
    "weight_decay": 1e-4,
    "lr_scheduler": "cosine annealing with warmup",
    "warmup_epochs": 5,
    "image_size": 640,
    "augmentations": [
        "random_horizontal_flip",
        "random_brightness_contrast",
        "color_jitter",
        "gaussian_noise"
    ]
}
```
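For orientation, a hedged sketch of how a comparable run could be launched with the `rfdetr` training API. The keyword names follow the rf-detr README and should be verified against your installed version; the dataset and output paths are placeholders.

```python
from rfdetr import RFDETRNano

# Starts from the COCO-pretrained RF-DETR Nano checkpoint
model = RFDETRNano()

# Sketch only: kwargs mirror the configuration above; verify names
# against your installed rfdetr version before relying on them
model.train(
    dataset_dir="heart-anatomy-types-v2",  # placeholder: COCO-format dataset root
    epochs=30,
    batch_size=4,
    grad_accum_steps=4,  # effective batch size 16
    lr=1e-4,
    output_dir="runs/heartformer",  # placeholder
)
```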
### Training Details

- **Hardware**: Apple M3 MacBook Pro (MPS backend)
- **Training Time**: ~1 hour 50 minutes
- **Best Epoch**: Epoch 4 (with EMA weights)
- **Early Stopping**: Triggered at epoch 11 (no improvement for 8 epochs)
- **Pretrained Weights**: RF-DETR Nano checkpoint (COCO pretrained)

### Training Dynamics

| Epoch | Train Loss | Val Loss | mAP@50 | mAP@50-95 |
|-------|------------|----------|--------|-----------|
| 0 | 2.145 | 1.832 | 65.3% | 42.1% |
| 1 | 1.234 | 1.102 | 76.5% | 51.8% |
| 2 | 0.876 | 0.654 | 91.9% | 68.2% |
| 3 | 0.543 | 0.432 | 95.5% | 74.3% |
| **4** | **0.321** | **0.298** | **97.7%** | **79.1%** |
| 5-11 | 0.298-0.285 | 0.301-0.312 | 97.3-97.6% | 78.5-79.0% |

**Key Observations**:

- Rapid convergence in the first 4 epochs
- No significant overfitting (train/val loss gap remained small)
- Validation performance plateaued after epoch 4
- EMA weights provided a slight performance boost (+0.2% mAP)

## 📈 Evaluation Results

### Overall Performance (Test Set)

| Metric | Value |
|--------|-------|
| **mAP@50** | **97.7%** |
| mAP@50-95 | 79.1% |
| Precision@0.5 | 96.8% |
| Recall@0.5 | 98.9% |
| F1-Score@0.5 | 97.8% |

### Per-Class Performance (Confidence Threshold = 0.3)

| Class | Ground Truth | Predictions | Precision | Recall | Average Confidence |
|-------|--------------|-------------|-----------|--------|--------------------|
| heart_cadaver | 16 | 17 | 94.1% | 100% | 90.5% |
| heart_cell | 17 | 17 | 100% | 100% | 86.8% |
| heart_ct_scan | 7 | 7 | 100% | 100% | 88.7% |
| heart_drawing | 10 | 10 | 100% | 100% | 86.0% |
| heart_textbook | 38 | 37 | 97.3% | 94.7% | 88.8% |
| heart_wall | 6 | 6 | 100% | 100% | 94.3% |
| heart_xray | 1 | 1 | 100% | 100% | 79.1% |

### Performance at Different Confidence Thresholds

| Threshold | Total Detections | Coverage | Average Precision |
|-----------|------------------|----------|-------------------|
| 0.3 | 95 / 95 GT | 100% | 97.7% |
| 0.5 | 93 / 95 GT | 97.9% | 98.1% |
| 0.7 | 90 / 95 GT | 94.7% | 98.5% |

### Confusion Matrix Analysis

The model shows excellent class separation with minimal confusion:

- **Zero false positives** for most classes at threshold 0.5+
- **One false positive** for `heart_cadaver` (likely a borderline case)
- **High confidence scores** across all classes (79-94% average)

### Error Analysis

- **Missed detections**: 2 instances at threshold 0.5 (both `heart_cell` with complex backgrounds)
- **False positives**: 1 instance (cadaver misclassified, confidence 0.42)
- **Localization**: IoU > 0.75 for 94% of correct detections

## 🚀 Usage

### Installation

```bash
pip install torch torchvision
pip install git+https://github.com/roboflow/rf-detr.git
pip install safetensors  # For loading .safetensors format
```

### Download Model

**Recommended: SafeTensors format (safer, smaller, faster)**

```bash
wget https://huggingface.co/giannisan/heartformer/resolve/main/heartformer-v0.1.safetensors
```

**Alternative: PyTorch format**

```bash
wget https://huggingface.co/giannisan/heartformer/resolve/main/checkpoint_best_ema.pth
```

### Inference

**Using SafeTensors (Recommended)**

```python
from rfdetr import RFDETRNano
from safetensors.torch import load_file

# Load model
model = RFDETRNano(num_classes=8)
state_dict = load_file("heartformer-v0.1.safetensors")
model.load_state_dict(state_dict)

# Run inference
detections = model.predict("heart_image.jpg", threshold=0.3)

# Access results (CLASS_NAMES is defined in the "Class Names" section below)
for bbox, confidence, class_id in zip(
    detections.xyxy, detections.confidence, detections.class_id
):
    print(f"Class: {CLASS_NAMES[class_id]}")
    print(f"Confidence: {confidence:.2f}")
    print(f"BBox: {bbox}")
```

**Using PyTorch Checkpoint**
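The original card truncates at this point. A minimal sketch under stated assumptions: the `.pth` file is a training checkpoint whose EMA weights sit under a `"model"` key (inspect `checkpoint.keys()` if your file differs), and the model exposes the same `load_state_dict`/`predict` interface as the SafeTensors example above.

```python
import torch
from rfdetr import RFDETRNano

model = RFDETRNano(num_classes=8)

# Load on CPU first; move to GPU/MPS afterwards if available
checkpoint = torch.load("checkpoint_best_ema.pth", map_location="cpu")

# Assumption: the training checkpoint nests the weights under a "model" key;
# fall back to the raw dict if the file stores a bare state dict
state_dict = checkpoint.get("model", checkpoint)
model.load_state_dict(state_dict)

detections = model.predict("heart_image.jpg", threshold=0.3)
```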
### Class Names

```python
CLASS_NAMES = [
    "heart-anatomy-images",  # Parent category (index 0)
    "heart_cadaver",
    "heart_cell",
    "heart_ct_scan",
    "heart_drawing",
    "heart_textbook",
    "heart_wall",
    "heart_xray"
]
```

### Example with Visualization

```python
import cv2
import numpy as np
from PIL import Image

# Load and run inference
image = Image.open("heart_image.jpg")
detections = model.predict("heart_image.jpg", threshold=0.5)

# Draw bounding boxes
img_array = np.array(image)
for bbox, conf, cls_id in zip(detections.xyxy, detections.confidence, detections.class_id):
    x1, y1, x2, y2 = map(int, bbox)
    cv2.rectangle(img_array, (x1, y1), (x2, y2), (0, 255, 0), 2)
    label = f"{CLASS_NAMES[cls_id]}: {conf:.2f}"
    cv2.putText(img_array, label, (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# PIL images are RGB; convert to BGR before writing with OpenCV
cv2.imwrite("output.jpg", cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR))
```

## ⚠️ Limitations and Biases

### Known Limitations

1. **Dataset Size**: Trained on 621 images; performance may degrade on edge cases
2. **Class Imbalance**: `heart_xray` has only 1 test instance (limited generalization testing)
3. **Single Organ Focus**: Specialized for heart anatomy only
4. **Image Quality**: Optimized for clear, well-lit images; may struggle with low-quality scans
5. **Artistic Variation**: Limited exposure to highly stylized or abstract representations

### Potential Biases

- **Textbook Bias**: Heavy representation of textbook-style images (40% of test set)
- **Western Medical Tradition**: Dataset predominantly from Western anatomical illustration conventions
- **Modern Imaging**: Limited historical or non-standard imaging modalities

### Ethical Considerations

- **Not for Clinical Use**: This model is for educational/research purposes only
- **Human Oversight Required**: Should not replace expert medical judgment
- **Privacy**: Ensure appropriate consent/licensing when using with medical images

## 📚 Citation

If you use Heartformer in your research or application, please cite:

```bibtex
@misc{heartformer2024,
  title={Heartformer: Heart Anatomy Type Detection with RF-DETR},
  author={Giannisan},
  year={2024},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/giannisan/heartformer}},
  note={Apache License 2.0}
}
```

### Acknowledgments

- **RF-DETR**: Based on the RF-DETR architecture

```bibtex
@misc{rfdetr2024,
  title={RF-DETR: Real-time Detection Transformer},
  author={Roboflow},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/roboflow/rf-detr}}
}
```

- **Dataset**: Heart Anatomy Types v2 from Roboflow Universe
- **DINOv2 Backbone**: Meta AI's self-supervised vision transformer

## 📄 License

This model is released under the **Apache License 2.0**, the same license as RF-DETR.

```
Copyright 2024 Giannisan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

### Third-Party Licenses

- **RF-DETR**: Apache 2.0
- **PyTorch**: BSD-3-Clause
- **DINOv2**: Apache 2.0

## 🔗 Resources

- **Model Weights**: [Download checkpoint](https://huggingface.co/giannisan/heartformer)
- **RF-DETR Repository**: https://github.com/roboflow/rf-detr
- **Dataset**: https://universe.roboflow.com/heart-anatomy-types
- **Demo Application**: [Coming Soon]

## 📞 Contact

For questions, issues, or collaboration opportunities:

- **HuggingFace**: [@giannisan](https://huggingface.co/giannisan)
- **Issues**: Open an issue on the HuggingFace model page

---

**Note**: This model is continuously being improved. Check back for updates and new versions!