---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---


# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates **NAF (Neighborhood Attention Filtering)**, a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.

## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while maintaining aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains

## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for upsampled features (64-512)
4. **Click "Upsample Features"**: See the comparison between low and high-resolution features
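
For reference, here is a minimal sketch of how such an interface could be wired up with Gradio. It is not the Space's actual `app.py`; the handler body is a placeholder for the real feature-extraction and NAF steps.

```python
import gradio as gr

def upsample_features(image, model_name, resolution):
    # Placeholder: the real app extracts backbone features here,
    # runs NAF, and returns the three-panel comparison figure.
    return image

demo = gr.Interface(
    fn=upsample_features,
    inputs=[
        gr.Image(type="pil", label="Upload Your Image"),
        gr.Dropdown(
            ["DINOv3", "RADIO v2.5", "DINOv2", "DINO", "SigLIP"],
            value="DINOv3",
            label="Model",
        ),
        gr.Slider(64, 512, value=256, step=16, label="Target Resolution"),
    ],
    outputs=gr.Image(label="Input / Low-res features / High-res features"),
)

if __name__ == "__main__":
    demo.launch()
```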

## 📊 Visualization

The output shows three panels:
- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF

Features are visualized by projecting them onto their first three principal components (PCA) and mapping those components to RGB channels.
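
For illustration, here is a minimal sketch of that projection, assuming `features` is a `(C, H, W)` NumPy array of backbone features (the names here are illustrative, not NAF's API):

```python
import numpy as np
from sklearn.decomposition import PCA

def features_to_rgb(features: np.ndarray) -> np.ndarray:
    """Project (C, H, W) features onto 3 principal components as an RGB image."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w).T            # (H*W, C): one sample per pixel
    rgb = PCA(n_components=3).fit_transform(flat)  # (H*W, 3)
    # Scale each component to [0, 1] so it can be shown as a color channel.
    rgb = (rgb - rgb.min(axis=0)) / (rgb.max(axis=0) - rgb.min(axis=0) + 1e-8)
    return rgb.reshape(h, w, 3)
```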

## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models

## 📖 Learn More

- **Paper**: [NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering](https://arxiv.org/abs/2501.01535)
- **Code**: [GitHub Repository](https://github.com/valeoai/NAF)
- **Organization**: [Valeo.ai](https://www.valeo.com/en/valeo-ai/)

## 💡 Use Cases

NAF enables better feature representations for:
- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment

## ⚙️ Technical Details

- **Input**: Images are resized to at most 512 px (aspect ratio preserved)
- **Processing**: Backbone feature extraction → NAF upsampling
- **Output**: High-resolution features at target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)
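
As a rough sketch of that flow, using DINOv2 from `torch.hub` as an example backbone; the final `naf_upsample` call is a hypothetical stand-in for the repo's actual entry point:

```python
import torch
from PIL import Image
from torchvision.transforms import functional as TF

device = "cuda" if torch.cuda.is_available() else "cpu"

# Real backbone via torch.hub; DINOv2 ViT-S/14 needs sides divisible by 14.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

# Square resize for simplicity; the demo preserves aspect ratio.
img = Image.open("example.jpg").convert("RGB").resize((448, 448))
x = TF.to_tensor(img).unsqueeze(0).to(device)  # (1, 3, 448, 448)

with torch.no_grad():
    tokens = backbone.forward_features(x)["x_norm_patchtokens"]  # (1, 1024, 384)
    feats_lr = tokens.transpose(1, 2).reshape(1, 384, 32, 32)    # (1, C, h, w)

# Hypothetical stand-in for NAF's actual API: low-res features plus the
# guidance image in, features at the target resolution out.
# feats_hr = naf_upsample(feats_lr, guidance=x, size=(256, 256))  # (1, C, 256, 256)
```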

## 🤝 Citation

If you use NAF in your research, please cite:

```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```

## 📜 License

This demo is released under the Apache 2.0 license.