---
license: apache-2.0
tags:
- object-detection
- medical-imaging
- heart-anatomy
- computer-vision
metrics:
- mean-average-precision
model-index:
- name: Heartformer
  results:
  - task:
      type: object-detection
    dataset:
      name: Heart Anatomy Types v2
      type: custom
    metrics:
    - type: mean-average-precision
      value: 0.977
      name: mAP@50
---

# Heartformer: Heart Anatomy Type Detection

**Heartformer** is a specialized object detection model for identifying and localizing different types of heart anatomy visualizations in medical images. Built on the RF-DETR architecture, this model can detect and classify seven distinct categories of cardiac imaging and illustration modalities.

## 📋 Model Description

Heartformer addresses the challenge of automatically categorizing diverse representations of cardiac anatomy, from real cadaver specimens to textbook illustrations. This capability is valuable for:

- **Medical Education**: Automatically organizing and categorizing cardiac anatomy learning materials
- **Content Curation**: Indexing large medical image databases by visualization type
- **Research Support**: Filtering cardiac datasets by imaging modality for meta-analyses
- **Educational Technology**: Building intelligent tutoring systems that adapt to different anatomy representation types

### Key Features

- **Multi-modal Detection**: Simultaneously detects 7 different heart anatomy visualization types
- **High Accuracy**: Achieves 97.7% mAP@50 on the held-out test set
- **Real-time Inference**: Optimized RF-DETR Nano architecture for fast detection
- **Robust Generalization**: Tested on diverse image sources and artistic styles

### Architecture Overview

```
Input Image (any size)
        ↓
Backbone: DINOv2 ViT (Vision Transformer)
├── Patch Embedding (16×16 patches)
├── Transformer Encoder (12 layers)
└── Feature Extraction
        ↓
Neck: Feature Pyramid Network (FPN)
└── Multi-scale feature fusion
        ↓
Head: Transformer Decoder
├── Object Queries (300 learnable embeddings)
├── Cross-attention with image features
└── Self-attention between queries
        ↓
Detection Heads
├── Classification Head → Class probabilities (7 + 1 background)
└── Regression Head → Bounding box coordinates (x, y, w, h)
```

### Key Components

1. **Backbone**: DINOv2-based Vision Transformer
   - Self-supervised pre-trained on large-scale image data
   - Patch size: 16×16 pixels
   - Produces rich semantic features

2. **Transformer Encoder-Decoder**
   - Encoder: Processes image features with self-attention
   - Decoder: Uses cross-attention to localize objects
   - Set-based prediction (no NMS required)

3. **Detection Head**
   - Bipartite matching loss for optimal assignment
   - Joint classification and localization

### Model Specifications

- **Parameters**: 30.5M total
- **Input**: RGB images (resized to 640×640 during training)
- **Output**: Up to 300 detection proposals per image
- **Inference Speed**: <2 seconds per image on Apple M3 (MPS backend)

## 📊 Dataset

### Heart Anatomy Types v2 (Roboflow)

The model was trained on a curated dataset of 621 annotated images from [Roboflow Universe](https://universe.roboflow.com/), specifically designed to capture the diversity of cardiac anatomy representations.

#### Dataset Statistics

| Split | Images | Annotations | Distribution |
|-------|--------|-------------|--------------|
| Train | 497 | 767 | 80.0% |
| Valid | 62 | 78 | 10.0% |
| Test | 62 | 95 | 10.0% |
| **Total** | **621** | **940** | **100%** |

#### Class Distribution (Test Set)

| Class | Count | Description |
|-------|-------|-------------|
| `heart_cadaver` | 16 | Real anatomical specimens from dissection |
| `heart_cell` | 17 | Microscopic/cellular views of cardiac tissue |
| `heart_ct_scan` | 7 | CT imaging of the heart |
| `heart_drawing` | 10 | Hand-drawn or digital medical illustrations |
| `heart_textbook` | 38 | Educational anatomy images from textbooks |
| `heart_wall` | 6 | Cross-sectional views showing heart wall layers |
| `heart_xray` | 1 | Radiographic chest/heart images |

#### Data Sources

- Medical textbooks (openly licensed)
- Roboflow Universe community contributions
- Educational anatomy databases
- All images verified for appropriate licensing

### Annotation Format

Annotations follow the COCO format:

```json
{
  "categories": [
    {"id": 0, "name": "heart-anatomy-images", "supercategory": "none"},
    {"id": 1, "name": "heart_cadaver", "supercategory": "heart-anatomy-images"},
    ...
  ],
  "images": [...],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [x, y, width, height],
      "area": 12345,
      "iscrowd": 0
    }
  ]
}
```
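To make the format concrete, here is a minimal sketch (not part of the original card) for inspecting these annotations with the standard library. It assumes Roboflow's usual COCO export layout, where each split directory contains a `_annotations.coco.json` file; adjust the path to your download.

```python
import json
from collections import Counter

# Assumption: Roboflow-style COCO export with one _annotations.coco.json per split
with open("train/_annotations.coco.json") as f:
    coco = json.load(f)

# Map category ids to readable names
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

# Count annotations per class, mirroring the distribution tables above
counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
for name, n in counts.most_common():
    print(f"{name}: {n}")

# Each bbox is [x, y, width, height] in absolute pixel coordinates
first = coco["annotations"][0]
print(first["bbox"], "→", id_to_name[first["category_id"]])
```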
## 🔬 Training Procedure

### Training Configuration

```python
{
    "model": "RF-DETR Nano",
    "epochs": 30,
    "early_stopping_patience": 8,
    "batch_size": 4,
    "gradient_accumulation_steps": 4,  # Effective batch size: 16
    "learning_rate": 1e-4,
    "optimizer": "AdamW",
    "weight_decay": 1e-4,
    "lr_scheduler": "cosine annealing with warmup",
    "warmup_epochs": 5,
    "image_size": 640,
    "augmentations": [
        "random_horizontal_flip",
        "random_brightness_contrast",
        "color_jitter",
        "gaussian_noise"
    ]
}
```
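For orientation, a hedged sketch of how a comparable run could be launched with the `rfdetr` training API. The keyword names follow the rf-detr README and should be verified against your installed version; the dataset and output paths are placeholders.

```python
from rfdetr import RFDETRNano

# Starts from the COCO-pretrained RF-DETR Nano checkpoint
model = RFDETRNano()

# Sketch only: kwargs mirror the configuration above; verify names
# against your installed rfdetr version before relying on them
model.train(
    dataset_dir="heart-anatomy-types-v2",  # placeholder: COCO-format dataset root
    epochs=30,
    batch_size=4,
    grad_accum_steps=4,  # effective batch size 16
    lr=1e-4,
    output_dir="runs/heartformer",  # placeholder
)
```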
### Training Details

- **Hardware**: Apple M3 MacBook Pro (MPS backend)
- **Training Time**: ~1 hour 50 minutes
- **Best Epoch**: Epoch 4 (with EMA weights)
- **Early Stopping**: Triggered at epoch 11 (no improvement for 8 epochs)
- **Pretrained Weights**: RF-DETR Nano checkpoint (COCO pretrained)

### Training Dynamics

| Epoch | Train Loss | Val Loss | mAP@50 | mAP@50-95 |
|-------|------------|----------|--------|-----------|
| 0 | 2.145 | 1.832 | 65.3% | 42.1% |
| 1 | 1.234 | 1.102 | 76.5% | 51.8% |
| 2 | 0.876 | 0.654 | 91.9% | 68.2% |
| 3 | 0.543 | 0.432 | 95.5% | 74.3% |
| **4** | **0.321** | **0.298** | **97.7%** | **79.1%** |
| 5-11 | 0.298-0.285 | 0.301-0.312 | 97.3-97.6% | 78.5-79.0% |

**Key Observations**:

- Rapid convergence in the first 4 epochs
- No significant overfitting (train/val loss gap remained small)
- Validation performance plateaued after epoch 4
- EMA weights provided a slight performance boost (+0.2% mAP)

## 📈 Evaluation Results

### Overall Performance (Test Set)

| Metric | Value |
|--------|-------|
| **mAP@50** | **97.7%** |
| mAP@50-95 | 79.1% |
| Precision@0.5 | 96.8% |
| Recall@0.5 | 98.9% |
| F1-Score@0.5 | 97.8% |

### Per-Class Performance (Confidence Threshold = 0.3)

| Class | Ground Truth | Predictions | Precision | Recall | Average Confidence |
|-------|--------------|-------------|-----------|--------|--------------------|
| heart_cadaver | 16 | 17 | 94.1% | 100% | 90.5% |
| heart_cell | 17 | 17 | 100% | 100% | 86.8% |
| heart_ct_scan | 7 | 7 | 100% | 100% | 88.7% |
| heart_drawing | 10 | 10 | 100% | 100% | 86.0% |
| heart_textbook | 38 | 37 | 97.3% | 94.7% | 88.8% |
| heart_wall | 6 | 6 | 100% | 100% | 94.3% |
| heart_xray | 1 | 1 | 100% | 100% | 79.1% |

### Performance at Different Confidence Thresholds

| Threshold | Total Detections | Coverage | Average Precision |
|-----------|------------------|----------|-------------------|
| 0.3 | 95 / 95 GT | 100% | 97.7% |
| 0.5 | 93 / 95 GT | 97.9% | 98.1% |
| 0.7 | 90 / 95 GT | 94.7% | 98.5% |

### Confusion Matrix Analysis

The model shows excellent class separation with minimal confusion:

- **Zero false positives** for most classes at threshold 0.5+
- **One false positive** for `heart_cadaver` (likely a borderline case)
- **High confidence scores** across all classes (79-94% average)

### Error Analysis

- **Missed detections**: 2 instances at threshold 0.5 (both `heart_cell` with complex backgrounds)
- **False positives**: 1 instance (cadaver misclassified, confidence 0.42)
- **Localization**: IoU > 0.75 for 94% of correct detections

## 🚀 Usage

### Installation

```bash
pip install torch torchvision
pip install git+https://github.com/roboflow/rf-detr.git
pip install safetensors  # For loading .safetensors format
```

### Download Model

**Recommended: SafeTensors format (safer, smaller, faster)**

```bash
wget https://huggingface.co/giannisan/heartformer/resolve/main/heartformer-v0.1.safetensors
```

**Alternative: PyTorch format**

```bash
wget https://huggingface.co/giannisan/heartformer/resolve/main/checkpoint_best_ema.pth
```

### Inference

**Using SafeTensors (Recommended)**

```python
from rfdetr import RFDETRNano
from safetensors.torch import load_file

# Load model
model = RFDETRNano(num_classes=8)
state_dict = load_file("heartformer-v0.1.safetensors")
model.load_state_dict(state_dict)

# Run inference
detections = model.predict("heart_image.jpg", threshold=0.3)

# Access results (CLASS_NAMES is defined in the "Class Names" section below)
for bbox, confidence, class_id in zip(
    detections.xyxy, detections.confidence, detections.class_id
):
    print(f"Class: {CLASS_NAMES[class_id]}")
    print(f"Confidence: {confidence:.2f}")
    print(f"BBox: {bbox}")
```

**Using PyTorch Checkpoint**
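The original card truncates at this point. A minimal sketch under stated assumptions: the `.pth` file is a training checkpoint whose EMA weights sit under a `"model"` key (inspect `checkpoint.keys()` if your file differs), and the model exposes the same `load_state_dict`/`predict` interface as the SafeTensors example above.

```python
import torch
from rfdetr import RFDETRNano

model = RFDETRNano(num_classes=8)

# Load on CPU first; move to GPU/MPS afterwards if available
checkpoint = torch.load("checkpoint_best_ema.pth", map_location="cpu")

# Assumption: the training checkpoint nests the weights under a "model" key;
# fall back to the raw dict if the file stores a bare state dict
state_dict = checkpoint.get("model", checkpoint)
model.load_state_dict(state_dict)

detections = model.predict("heart_image.jpg", threshold=0.3)
```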
### Class Names

```python
CLASS_NAMES = [
    "heart-anatomy-images",  # Parent category (index 0)
    "heart_cadaver",
    "heart_cell",
    "heart_ct_scan",
    "heart_drawing",
    "heart_textbook",
    "heart_wall",
    "heart_xray"
]
```

### Example with Visualization

```python
import cv2
import numpy as np
from PIL import Image

# Load and run inference
image = Image.open("heart_image.jpg")
detections = model.predict("heart_image.jpg", threshold=0.5)

# Draw bounding boxes
img_array = np.array(image)
for bbox, conf, cls_id in zip(detections.xyxy, detections.confidence, detections.class_id):
    x1, y1, x2, y2 = map(int, bbox)
    cv2.rectangle(img_array, (x1, y1), (x2, y2), (0, 255, 0), 2)
    label = f"{CLASS_NAMES[cls_id]}: {conf:.2f}"
    cv2.putText(img_array, label, (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# PIL images are RGB; convert to BGR before writing with OpenCV
cv2.imwrite("output.jpg", cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR))
```

## ⚠️ Limitations and Biases

### Known Limitations

1. **Dataset Size**: Trained on 621 images; performance may degrade on edge cases
2. **Class Imbalance**: `heart_xray` has only 1 test instance (limited generalization testing)
3. **Single Organ Focus**: Specialized for heart anatomy only
4. **Image Quality**: Optimized for clear, well-lit images; may struggle with low-quality scans
5. **Artistic Variation**: Limited exposure to highly stylized or abstract representations

### Potential Biases

- **Textbook Bias**: Heavy representation of textbook-style images (40% of test set)
- **Western Medical Tradition**: Dataset predominantly from Western anatomical illustration conventions
- **Modern Imaging**: Limited historical or non-standard imaging modalities

### Ethical Considerations

- **Not for Clinical Use**: This model is for educational/research purposes only
- **Human Oversight Required**: Should not replace expert medical judgment
- **Privacy**: Ensure appropriate consent/licensing when using with medical images

## 📚 Citation

If you use Heartformer in your research or application, please cite:

```bibtex
@misc{heartformer2024,
  title={Heartformer: Heart Anatomy Type Detection with RF-DETR},
  author={Giannisan},
  year={2024},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/giannisan/heartformer}},
  note={Apache License 2.0}
}
```

### Acknowledgments

- **RF-DETR**: Based on the RF-DETR architecture

```bibtex
@misc{rfdetr2024,
  title={RF-DETR: Real-time Detection Transformer},
  author={Roboflow},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/roboflow/rf-detr}}
}
```

- **Dataset**: Heart Anatomy Types v2 from Roboflow Universe
- **DINOv2 Backbone**: Meta AI's self-supervised vision transformer

## 📄 License

This model is released under the **Apache License 2.0**, the same license as RF-DETR.

```
Copyright 2024 Giannisan

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

### Third-Party Licenses

- **RF-DETR**: Apache 2.0
- **PyTorch**: BSD-3-Clause
- **DINOv2**: Apache 2.0

## 🔗 Resources

- **Model Weights**: [Download checkpoint](https://huggingface.co/giannisan/heartformer)
- **RF-DETR Repository**: https://github.com/roboflow/rf-detr
- **Dataset**: https://universe.roboflow.com/heart-anatomy-types
- **Demo Application**: [Coming Soon]

## 📞 Contact

For questions, issues, or collaboration opportunities:

- **HuggingFace**: [@giannisan](https://huggingface.co/giannisan)
- **Issues**: Open an issue on the HuggingFace model page

---

**Note**: This model is continuously being improved. Check back for updates and new versions!