---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates **NAF (Neighborhood Attention Filtering)**, a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.
## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while maintaining aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains
## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for the upsampled features (64-512)
4. **Click "Upsample Features"**: See the comparison between low- and high-resolution features
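
You can also drive the demo programmatically with `gradio_client`. The sketch below is a minimal, unverified example: the Space id, endpoint name, and argument order are assumptions, so check the "Use via API" link on the Space page for the actual signature.

```python
# Minimal sketch of calling the demo via gradio_client (names are assumptions).
from gradio_client import Client, handle_file

client = Client("valeoai/NAF")        # assumed Space id
result = client.predict(
    handle_file("my_image.jpg"),      # input image
    "DINOv2",                         # assumed model-dropdown value
    256,                              # assumed target resolution
    api_name="/upsample_features",    # assumed endpoint name
)
print(result)                         # e.g. path to the rendered comparison
```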
## 📊 Visualization

The output shows three panels:

- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF

Features are visualized with PCA, mapping the first three principal components to the RGB channels.
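
The PCA-to-RGB mapping works roughly as follows; this is a minimal sketch of the standard technique, not the demo's exact code.

```python
# Sketch: project a (H, W, C) feature map onto its first 3 principal
# components and normalize them into an RGB image.
import numpy as np
from sklearn.decomposition import PCA

def features_to_rgb(feats: np.ndarray) -> np.ndarray:
    h, w, c = feats.shape
    flat = feats.reshape(-1, c)                      # (H*W, C)
    comps = PCA(n_components=3).fit_transform(flat)  # (H*W, 3)
    comps -= comps.min(axis=0)                       # shift each channel to >= 0
    comps /= comps.max(axis=0) + 1e-8                # scale each channel to [0, 1]
    return (comps.reshape(h, w, 3) * 255).astype(np.uint8)
```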
## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models
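
For reference, here is one way to extract the kind of low-resolution patch features NAF upsamples, using DINOv2 from `torch.hub`; the demo may load its backbones differently.

```python
# Sketch: extract low-resolution patch features from DINOv2 via torch.hub.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
img = torch.randn(1, 3, 224, 224)       # stand-in for a preprocessed image
with torch.no_grad():
    tokens = model.forward_features(img)["x_norm_patchtokens"]  # (1, 256, 384)
feats = tokens.reshape(1, 16, 16, 384)  # 224 / 14 = 16 patches per side
print(feats.shape)                      # torch.Size([1, 16, 16, 384])
```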
## 📚 Learn More

- **Paper**: [NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering](https://arxiv.org/abs/2501.01535)
- **Code**: [GitHub Repository](https://github.com/valeoai/NAF)
- **Organization**: [Valeo.ai](https://www.valeo.com/en/valeo-ai/)
## 💡 Use Cases

NAF enables better feature representations for:

- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment
## ⚙️ Technical Details

- **Input**: Images up to 512px (aspect ratio is preserved)
- **Processing**: Backbone feature extraction → NAF upsampling
- **Output**: High-resolution features at the target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)
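
Putting the steps together, the pipeline looks roughly like this. `naf.upsample` is a hypothetical name used for illustration; see the GitHub repository for the actual API.

```python
# Hypothetical sketch of the processing pipeline; `naf.upsample` is an
# illustrative name, not the repository's actual API.
import torch

def run_pipeline(image: torch.Tensor, backbone, naf, target_hw=(256, 256)):
    """image: (1, 3, H, W) tensor, longest side <= 512."""
    with torch.no_grad():
        lr_feats = backbone(image)                                # (1, C, h, w) low-res features
        hr_feats = naf.upsample(lr_feats, image, size=target_hw)  # guided upsampling (hypothetical)
    return hr_feats                                               # (1, C, *target_hw)
```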
## 🤗 Citation

If you use NAF in your research, please cite:
```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```
## 📄 License

This demo is released under the Apache 2.0 license.