# Pascal-TriheadNet: Joint Detection & Segmentation
Single-stage unified perception model for Pascal VOC: Detection, Semantic, and Instance Segmentation in one forward pass.
Pascal-TriheadNet is a multi-task learning model that jointly solves three computer vision tasks using a unified Vision Transformer backbone with three specialized task heads. Validated on Pascal VOC 2012, it achieves strong performance across all tasks while maintaining efficient inference.
View Full Code & Documentation on GitHub

## Key Highlights
- Detection mAP@50: 75.6%
- Semantic mIoU: 87.3%
- Instance Mask mAP@50: 65.7%
- Architecture: One Backbone, One Neck, Three Heads (ViT + FPN)
## Model Checkpoints
Two versions of the model are provided:
| File | Description | Size |
|---|---|---|
| `checkpoint_epoch_50.pth` | Best-performing FP32 model | 826 MB |
| `checkpoint_epoch_50_quantized.pth` | Optimized INT8 quantized model | 136 MB |
Training context: the model was fine-tuned on an NVIDIA L4 GPU in Google Colab.
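As a rough sketch of how a checkpoint like this is typically loaded and inspected in PyTorch; the stored key names and the model class name below are assumptions, not taken from the repository:

```python
import torch

# Load the FP32 checkpoint on CPU and inspect what it stores. The key name
# "model_state_dict" follows a common PyTorch convention and is an
# assumption here, not confirmed by the repository.
ckpt = torch.load("checkpoint_epoch_50.pth", map_location="cpu")
print(list(ckpt.keys()))

# Restoring the weights requires the model class from the repository:
# model = PascalTriheadNet()                      # hypothetical class name
# model.load_state_dict(ckpt["model_state_dict"])
# model.eval()
```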
## Performance Metrics
Evaluated on the Pascal VOC 2012 validation set:
| Task | Metric | Score |
|---|---|---|
| Detection | mAP (0.5:0.95) | 46.7% |
| Detection | mAP@50 | 75.6% |
| Semantic | mIoU | 87.3% |
| Instance | Mask mAP (0.5:0.95) | 35.8% |
| Instance | Mask mAP@50 | 65.7% |
For detailed per-class analysis and ablation studies, please refer to the GitHub repository.
## Model Overview
The architecture utilizes a Vision Transformer (ViT-Base) backbone pretrained on ImageNet.
- Backbone: `vit_base_patch16_224` with the last 6 blocks fine-tuned.
- Neck: A Simple Feature Pyramid (ViTDet-style) that creates multi-scale feature maps (P2-P5) from the single-scale ViT output (see the sketch after this list).
- Heads:
- Detection: FCOS-style anchor-free detector.
- Semantic: Panoptic FPN-style segmentation head.
- Instance: Mask R-CNN-style head using RoI Align.
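For illustration, here is a minimal PyTorch sketch of a ViTDet-style Simple Feature Pyramid, assuming a stride-16 ViT feature map and 256 output channels. The channel widths and layer choices are illustrative assumptions, not the repository's verified implementation:

```python
import torch
import torch.nn as nn

class SimpleFeaturePyramid(nn.Module):
    """Sketch of a ViTDet-style Simple Feature Pyramid.

    Builds P2-P5 from the single stride-16 feature map of a ViT backbone.
    Layer choices here are illustrative, not the repository's code.
    """

    def __init__(self, in_dim: int = 768, out_dim: int = 256):
        super().__init__()
        # P2 (stride 4): upsample 4x with two stride-2 transposed convolutions.
        self.p2 = nn.Sequential(
            nn.ConvTranspose2d(in_dim, in_dim // 2, kernel_size=2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(in_dim // 2, out_dim, kernel_size=2, stride=2),
        )
        # P3 (stride 8): upsample 2x.
        self.p3 = nn.ConvTranspose2d(in_dim, out_dim, kernel_size=2, stride=2)
        # P4 (stride 16): keep the ViT resolution, project channels only.
        self.p4 = nn.Conv2d(in_dim, out_dim, kernel_size=1)
        # P5 (stride 32): downsample 2x.
        self.p5 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(in_dim, out_dim, kernel_size=1))

    def forward(self, x: torch.Tensor) -> dict:
        # x: (B, in_dim, H/16, W/16), i.e. ViT patch tokens reshaped to a 2-D grid.
        return {"p2": self.p2(x), "p3": self.p3(x), "p4": self.p4(x), "p5": self.p5(x)}


# A 224x224 input with patch size 16 yields a 14x14 token grid.
feats = SimpleFeaturePyramid()(torch.randn(1, 768, 14, 14))
print({name: tuple(f.shape) for name, f in feats.items()})
```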
## Training Configuration
- Epochs: 50
- Batch Size: 32
- Optimizer: AdamW (Base LR: 2e-4)
- Loss: Weighted sum of Focal Loss (Det), Cross-Entropy/Dice (Sem/Inst), and GIoU (Box).
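A minimal sketch of how such a weighted multi-task objective can be combined, with placeholder weights since this card does not state the actual values:

```python
import torch

def total_loss(losses: dict, weights: dict = None) -> torch.Tensor:
    """Weighted sum of the per-head losses listed above.

    `losses` maps loss names to scalar tensors produced by the three heads;
    the default weights are placeholders, as the actual weighting is not
    stated in this card.
    """
    weights = weights or {"focal": 1.0, "ce_dice": 1.0, "giou": 1.0}
    return sum(weights[name] * losses[name] for name in weights)

# Optimizer as configured above: AdamW with a base learning rate of 2e-4.
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
```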
## Model Details
- Developed by: Sivasubiramaniam Subbiah
- Model type: Multi-task Vision Model
- Languages/Frameworks: Python, PyTorch
- License: MIT
- Finetuned from: `vit_base_patch16_224` (ViT-Base, pretrained on ImageNet)