ADE20K Segmentation Probe — DINOv3 ViT-S/16 @ 128px input

Linear semantic segmentation probe for ADE20K (150 classes), trained on the frozen spatial (patch) features of facebook/dinov3-vits16-pretrain-lvd1689m.

Paper: arXiv:2603.22570
Training code: github.com/m2b3/CanViT-specialize

Usage

```shell
uv add "canvit-pytorch @ git+https://github.com/m2b3/CanViT-PyTorch.git"
```

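```python
import torch
from canvit_pytorch.probes import SegmentationProbe

# eval() matters: BatchNorm then uses its running statistics and Dropout is disabled
probe = SegmentationProbe.from_pretrained("canvit/probe-ade20k-40k-dv3s-128px").eval()

# [B, H, W, D] DINOv3 ViT-S/16 spatial features at 128px input:
# 128 px / 16 px patches = 8x8 grid, D = 384 (ViT-S embedding dim)
features = torch.randn(1, 8, 8, 384)
with torch.inference_mode():
    logits = probe(features)    # [B, num_classes, H, W]
assert logits.shape == (1, 150, 8, 8)  # 150 ADE20K classes on the 8x8 patch grid
```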
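The probe predicts one label per 16 px patch, so its logits live on the 8 × 8 patch grid rather than at pixel resolution. A minimal sketch of turning them into a full-resolution label map, assuming bilinear upsampling of the logits to 128 × 128 followed by a per-pixel argmax. This is a common convention, not something this card specifies; a random tensor stands in for the probe output.

```python
import torch
import torch.nn.functional as F

# Stand-in for the probe output: [B, num_classes, H, W] on the 8x8 patch grid.
logits = torch.randn(1, 150, 8, 8)

# Assumption: bilinearly upsample the logits (not the labels) to the 128x128 input size.
logits_full = F.interpolate(logits, size=(128, 128), mode="bilinear", align_corners=False)

# Per-pixel class index in [0, 150), one of ADE20K's 150 semantic classes.
pred = logits_full.argmax(dim=1)  # [B, 128, 128], dtype torch.int64
print(pred.shape)
```

Upsampling the logits before the argmax, rather than upsampling an 8 × 8 label map with nearest-neighbour interpolation, gives smoother class boundaries between patches.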

Training

Architecture: Dropout → BatchNorm → Conv1×1.
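The architecture line can be read as the PyTorch module below. This is a hypothetical re-implementation for illustration, not the `canvit_pytorch` `SegmentationProbe` source: the class name `LinearSegHead`, the choice of `BatchNorm2d` over the channel dimension, and the permute from `[B, H, W, D]` to channels-first are assumptions. The dropout rate 0.1 is taken from the hyperparameter table, and the input/output shapes follow the Usage example.

```python
import torch
import torch.nn as nn


class LinearSegHead(nn.Module):
    """Hypothetical sketch: Dropout -> BatchNorm -> 1x1 conv.

    Not the canvit_pytorch SegmentationProbe; layer choices are assumptions.
    """

    def __init__(self, dim: int = 384, num_classes: int = 150, p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.norm = nn.BatchNorm2d(dim)
        self.classifier = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, H, W, D] patch features -> channels-first [B, D, H, W]
        x = x.permute(0, 3, 1, 2)
        x = self.norm(self.dropout(x))
        # A 1x1 conv is a per-patch linear map from D features to class logits.
        return self.classifier(x)  # [B, num_classes, H, W]


head = LinearSegHead().eval()
with torch.inference_mode():
    out = head(torch.randn(2, 8, 8, 384))
print(out.shape)  # torch.Size([2, 150, 8, 8])

# Trainable parameters only (BatchNorm running stats are buffers).
n_params = sum(t.numel() for t in head.parameters())
print(n_params)  # 58518
```

At inference, Dropout is the identity and BatchNorm applies a fixed per-channel affine transform from its running statistics, so the whole head collapses to a single per-patch affine map from 384 features to 150 logits. That is why it still counts as a linear probe. In this sketch the 58,518 trainable parameters split into 768 for BatchNorm and 57,750 for the 1×1 conv.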

| Hyperparameter | Value |
|---|---|
| Input size | 128 × 128 px |
| Optimizer | AdamW |
| Peak LR | $3 \times 10^{-4}$ |
| Weight decay | $10^{-3}$ |
| LR schedule | 1,500-step warmup → cosine decay |
| Batch size | 16 |
| Max steps | 40,000 |
| Dropout | 0.1 |
| Augmentation | RandomResizedCrop scale [0.5, 2] + HFlip |
| Precision | bf16 (AMP) |
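The table gives the warmup length, peak LR and step budget but not the warmup shape or the final LR. A minimal sketch of the schedule, assuming linear warmup from 0 and cosine decay to 0 at step 40,000; the function name `lr_at` and both of those choices are assumptions.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 1_500
MAX_STEPS = 40_000


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed).

    Assumptions: linear warmup from 0, then cosine decay to 0 at MAX_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup budget consumed, clamped at 1.0.
    progress = min((step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS), 1.0)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 750, 1_500, 20_750, 40_000):
    print(s, lr_at(s))
```

In PyTorch the same curve could drive `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base LR by the returned factor: for example `LambdaLR(opt, lambda s: lr_at(s) / PEAK_LR)` with `opt = torch.optim.AdamW(probe_params, lr=PEAK_LR, weight_decay=1e-3)`, where `probe_params` stands for the probe's parameters and `scheduler.step()` is called once per optimizer step.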