---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering
This Space demonstrates **NAF (Neighborhood Attention Filtering)**, a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.
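At a high level, the demo extracts patch features from a frozen backbone and filters them up to the target resolution, using the input image as guidance. The sketch below illustrates that flow with DINOv2; the `naf.upsample` call at the end is a hypothetical placeholder, not the actual API (see the GitHub repository for that).

```python
import torch

# Load a frozen backbone (DINOv2 ViT-S/14 via torch.hub).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

image = torch.randn(1, 3, 518, 518)  # placeholder for a preprocessed image
with torch.no_grad():
    # DINOv2 returns (B, N, C) patch tokens; reshape to a (B, C, H, W) map.
    tokens = backbone.forward_features(image)["x_norm_patchtokens"]
    b, n, c = tokens.shape
    side = int(n**0.5)  # 518 / 14 = 37 patches per side
    lr_feats = tokens.transpose(1, 2).reshape(b, c, side, side)

# Hypothetical NAF call: upsample the low-resolution features to the target
# resolution, guided by the high-resolution input image.
# hr_feats = naf.upsample(lr_feats, guidance=image, size=(512, 512))
```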
## 🚀 Features
- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while maintaining aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains
## 🎨 How to Use
1. **Upload an Image**: Click "Upload Your Image" or select from sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for upsampled features (64-512)
4. **Click "Upsample Features"**: See the comparison between low- and high-resolution features
## 📊 Visualization
The output shows three panels:
- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF
Features are visualized by projecting them with PCA and mapping the first three principal components to RGB channels.
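That mapping can be reproduced in a few lines. This is a generic PCA-to-RGB sketch using scikit-learn, not the demo's exact code:

```python
import numpy as np
from sklearn.decomposition import PCA

def features_to_rgb(feats: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) feature map onto its first 3 principal
    components and normalize each to [0, 1] for display as RGB."""
    h, w, c = feats.shape
    rgb = PCA(n_components=3).fit_transform(feats.reshape(-1, c))
    rgb -= rgb.min(axis=0)
    rgb /= rgb.max(axis=0) + 1e-8
    return rgb.reshape(h, w, 3)
```

To keep the two feature panels directly comparable, one would typically fit the PCA basis on the low- and high-resolution features jointly so both share the same color mapping.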
## 🔬 Supported Models
- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models
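For reference, these backbones are typically loaded through their public entry points. The examples below are illustrative; the exact checkpoints and loaders used by `app.py` may differ:

```python
import torch
from transformers import SiglipVisionModel

# Self-supervised ViTs via torch.hub.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

# SigLIP vision tower via Hugging Face transformers.
siglip = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-224").eval()
```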
## 📚 Learn More
- **Paper**: [NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering](https://arxiv.org/abs/2501.01535)
- **Code**: [GitHub Repository](https://github.com/valeoai/NAF)
- **Organization**: [Valeo.ai](https://www.valeo.com/en/valeo-ai/)
## 💡 Use Cases
NAF enables better feature representations for:
- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment
## ⚙️ Technical Details
- **Input**: Images up to 512 px (aspect ratio preserved)
- **Processing**: Backbone feature extraction → NAF upsampling
- **Output**: High-resolution features at target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)
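The exact preprocessing lives in `app.py`; below is a plausible sketch of the resize step, assuming a ViT with a fixed patch size (both constants are illustrative):

```python
from PIL import Image

PATCH = 14      # ViT patch size; model-dependent (illustrative value)
MAX_SIDE = 512  # demo's input size cap

def preprocess(img: Image.Image) -> Image.Image:
    """Downscale so the longer side is at most MAX_SIDE, keeping the
    aspect ratio, then snap both sides to a multiple of the patch size."""
    scale = min(MAX_SIDE / max(img.size), 1.0)
    w, h = (round(s * scale) for s in img.size)
    w = max(PATCH, w // PATCH * PATCH)
    h = max(PATCH, h // PATCH * PATCH)
    return img.resize((w, h), Image.Resampling.BICUBIC)
```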
## 🤗 Citation
If you use NAF in your research, please cite:
```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```
## 📄 License
This demo is released under the Apache 2.0 license.