Commit 0702ffc by RockeyCoss
Parent(s): 3ed28cb

reconstruct implementation
Changed files:
- .gitattributes +0 -34
- README.md +237 -13
- app.py +157 -19
- assets/example1.jpg +0 -0
- assets/example2.jpg +0 -0
- assets/example3.jpg +0 -0
- assets/example4.jpg +0 -0
- assets/example5.jpg +0 -0
- assets/img1.jpg +0 -0
- assets/img2.jpg +0 -0
- assets/img3.jpg +0 -0
- assets/img4.jpg +0 -0
- flagged/Input/tmpaytsmk0e.jpg +0 -0
- flagged/Output/tmpgs59m7u_.png +0 -0
- flagged/log.csv +0 -2
- mmdet/apis/inference.py +3 -4
- projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py +82 -0
- projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi_cascade.py +83 -0
- projects/configs/hdetr/r50-hdetr_sam-vit-b_cascade.py +83 -0
- projects/instance_segment_anything/__init__.py +2 -1
- projects/instance_segment_anything/models/det_wrapper_instance_sam.py +25 -7
- projects/instance_segment_anything/models/det_wrapper_instance_sam_cascade.py +127 -0
- projects/instance_segment_anything/ops/functions/ms_deform_attn_func.py +0 -1
- projects/instance_segment_anything/ops/modules/ms_deform_attn.py +1 -0
- requirements.txt +1 -2
- setup.cfg +21 -0
- setup.py +220 -0
- tools/dist_test.sh +20 -0
- tools/test.py +308 -0

.gitattributes
DELETED
@@ -1,34 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text

README.md
CHANGED
@@ -1,13 +1,237 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+# Prompt-Segment-Anything
+This is an implementation of zero-shot instance segmentation using [Segment Anything](https://github.com/facebookresearch/segment-anything). Thanks to the authors of Segment Anything for their wonderful work!
+
+This repository is based on [MMDetection](https://github.com/open-mmlab/mmdetection) and includes some code from [H-Deformable-DETR](https://github.com/HDETR/H-Deformable-DETR) and [FocalNet-DINO](https://github.com/FocalNet/FocalNet-DINO).
+
+
+
+## News
+
+**2023.04.12** Multimask output mode and cascade prompt mode are available now.
+
+**2023.04.11** Our [demo](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo) is available now. Please feel free to check it out.
+
+**2023.04.11** [Swin-L+H-Deformable-DETR + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py)/[FocalNet-L+DINO + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) achieve strong COCO instance segmentation results: mask AP=46.8/49.1 by simply prompting SAM with boxes predicted by Swin-L+H-Deformable-DETR/FocalNet-L+DINO. (mask AP=46.5 based on ViTDet)🍺
+
+## Catalog
+
+- [x] Support Swin-L+H-Deformable-DETR+SAM
+- [x] Support FocalNet-L+DINO+SAM
+- [x] Support R50+H-Deformable-DETR+SAM/Swin-T+H-Deformable-DETR
+- [x] Support HuggingFace gradio demo
+- [x] Support cascade prompts (box prompt + mask prompt)
+
+## Box-as-Prompt Results
+
+| Detector | SAM | multimask output | Detector's Box AP | Mask AP | Config |
+| :---: | :---: | :---: | :---: | :---: | :--- |
+| R50+H-Deformable-DETR | sam-vit-b | :x: | 50.0 | 38.2 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b.py) |
+| R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: | 50.0 | 39.9 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py) |
+| R50+H-Deformable-DETR | sam-vit-l | :x: | 50.0 | 41.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-l.py) |
+| Swin-T+H-Deformable-DETR | sam-vit-b | :x: | 53.2 | 40.0 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-b.py) |
+| Swin-T+H-Deformable-DETR | sam-vit-l | :x: | 53.2 | 43.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-l.py) |
+| Swin-L+H-Deformable-DETR | sam-vit-b | :x: | 58.0 | 42.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |
+| Swin-L+H-Deformable-DETR | sam-vit-l | :x: | 58.0 | 46.3 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |
+| Swin-L+H-Deformable-DETR | sam-vit-h | :x: | 58.0 | 46.8 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |
+| FocalNet-L+DINO | sam-vit-b | :x: | 63.2 | 44.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |
+| FocalNet-L+DINO | sam-vit-l | :x: | 63.2 | 48.6 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |
+| FocalNet-L+DINO | sam-vit-h | :x: | 63.2 | 49.1 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |
+
+## Cascade-Prompt Results
+
+| Detector | SAM | multimask output | Detector's Box AP | Mask AP | Config |
+| :---: | :---: | :---: | :---: | :---: | :--- |
+| R50+H-Deformable-DETR | sam-vit-b | :x: | 50.0 | 38.8 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_cascade.py) |
+| R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: | 50.0 | 40.5 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi_cascade.py) |
+
+***Note***
+
+**multimask output**: If multimask output is :heavy_check_mark:, SAM predicts three masks for each prompt, and the segmentation result is the one with the highest predicted IoU. If multimask output is :x:, SAM returns a single mask for each prompt, which is used as the segmentation result.
+
+**cascade-prompt**: In the cascade-prompt setting, segmentation involves two stages. The first stage predicts a coarse mask from a bounding-box prompt. The second stage then uses both the bounding box and the coarse mask as prompts to predict the final segmentation result. Note that if multimask output is :heavy_check_mark:, the first stage predicts three coarse masks, and the second stage uses the one with the highest predicted IoU as the prompt.
+
+## Installation
+
+🍺🍺🍺 Added a Docker Hub environment:
+
+```
+docker pull kxqt/prompt-sam-torch1.12-cuda11.6:20230410
+nvidia-docker run -it --shm-size=4096m -v {your_path}:{path_in_docker} kxqt/prompt-sam-torch1.12-cuda11.6:20230410
+```
+
+We tested the models under `python=3.7.10, pytorch=1.10.2, cuda=10.2`. Other versions may work as well.
+
+1. Clone this repository
+
+```
+git clone https://github.com/RockeyCoss/Instance-Segment-Anything
+cd Instance-Segment-Anything
+```
+
+2. Install PyTorch
+
+```bash
+# an example
+pip install torch torchvision
+```
+
+3. Install MMCV
+
+```
+pip install -U openmim
+mim install "mmcv>=2.0.0"
+```
+
+4. Install MMDetection's requirements
+
+```
+pip install -r requirements.txt
+```
+
+5. Compile CUDA operators
+
+```bash
+cd projects/instance_segment_anything/ops
+python setup.py build install
+cd ../../..
+```
+
+## Prepare COCO Dataset
+
+Please refer to [data preparation](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).
+
+## Prepare Checkpoints
+
+1. Install wget
+
+```
+pip install wget
+```
+
+2. SAM checkpoints
+
+```bash
+mkdir ckpt
+cd ckpt
+python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
+python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
+python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
+cd ..
+```
+
+3. Here are the checkpoints for the detection models. You can download only the checkpoints you need.
+
+```bash
+# R50+H-Deformable-DETR
+cd ckpt
+python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/r50_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o r50_hdetr.pth
+cd ..
+python tools/convert_ckpt.py ckpt/r50_hdetr.pth ckpt/r50_hdetr.pth
+
+# Swin-T+H-Deformable-DETR
+cd ckpt
+python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/swin_tiny_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_t_hdetr.pth
+cd ..
+python tools/convert_ckpt.py ckpt/swin_t_hdetr.pth ckpt/swin_t_hdetr.pth
+
+# Swin-L+H-Deformable-DETR
+cd ckpt
+python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/decay0.05_drop_path0.5_swin_large_hybrid_branch_lambda1_group6_t1500_n900_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_l_hdetr.pth
+cd ..
+python tools/convert_ckpt.py ckpt/swin_l_hdetr.pth ckpt/swin_l_hdetr.pth
+
+# FocalNet-L+DINO
+cd ckpt
+python -m wget https://projects4jw.blob.core.windows.net/focalnet/release/detection/focalnet_large_fl4_o365_finetuned_on_coco.pth -o focalnet_l_dino.pth
+cd ..
+python tools/convert_ckpt.py ckpt/focalnet_l_dino.pth ckpt/focalnet_l_dino.pth
+```
+
+## Run Evaluation
+
+1. Evaluate Metrics
+
+```bash
+# single GPU
+python tools/test.py path/to/the/config/file --eval segm
+# multiple GPUs
+bash tools/dist_test.sh path/to/the/config/file num_gpus --eval segm
+```
+
+2. Visualize Segmentation Results
+
+```bash
+python tools/test.py path/to/the/config/file --show-dir path/to/the/visualization/results
+```
+## Gradio Demo
+
+We also provide a gradio-based UI for displaying the segmentation results. To launch the demo, run the following commands in a terminal:
+
+```bash
+pip install gradio
+python app.py
+```
+
+This demo is also hosted on HuggingFace [here](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo).
+
+## More Segmentation Examples
+
+
+
+
+
+
+## Citation
+
+**Segment Anything**
+
+```latex
+@article{kirillov2023segany,
+  title={Segment Anything},
+  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
+  journal={arXiv:2304.02643},
+  year={2023}
+}
+```
+**H-Deformable-DETR**
+
+```latex
+@article{jia2022detrs,
+  title={DETRs with Hybrid Matching},
+  author={Jia, Ding and Yuan, Yuhui and He, Haodi and Wu, Xiaopei and Yu, Haojun and Lin, Weihong and Sun, Lei and Zhang, Chao and Hu, Han},
+  journal={arXiv preprint arXiv:2207.13080},
+  year={2022}
+}
+```
+**Swin Transformer**
+
+```latex
+@inproceedings{liu2021Swin,
+  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
+  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+  year={2021}
+}
+```
+**DINO**
+
+```latex
+@misc{zhang2022dino,
+  title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
+  author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
+  year={2022},
+  eprint={2203.03605},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
+```
+**FocalNet**
+
+```latex
+@misc{yang2022focalnet,
+  author={Yang, Jianwei and Li, Chunyuan and Dai, Xiyang and Yuan, Lu and Gao, Jianfeng},
+  title={Focal Modulation Networks},
+  publisher={arXiv},
+  year={2022},
+}
+```

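The box-as-prompt and multimask behavior described in the Note above maps directly onto SAM's predictor API. Below is a minimal sketch, assuming the `segment_anything` package is installed and the ViT-B checkpoint was downloaded as in the Prepare Checkpoints section; the image and box values are placeholders, not repository code.

```python
# Minimal sketch of box-as-prompt inference with SAM (segment_anything).
import numpy as np
import torch
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry['vit_b'](checkpoint='ckpt/sam_vit_b_01ec64.pth')
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder RGB image
boxes = torch.tensor([[100., 100., 300., 360.]])  # detector boxes, xyxy, image scale

predictor.set_image(image)
tboxes = predictor.transform.apply_boxes_torch(boxes, image.shape[:2])

# multimask_output=True is the README's "multimask output" mode:
# SAM proposes three masks per box along with a predicted IoU for each.
masks, iou_preds, _ = predictor.predict_torch(
    point_coords=None, point_labels=None,
    boxes=tboxes, multimask_output=True)

# Keep the proposal with the highest predicted IoU per box.
best = iou_preds.argmax(dim=1)
masks = masks[torch.arange(masks.size(0)), best]  # (n, H, W)
```
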
app.py
CHANGED
@@ -1,28 +1,54 @@
-# Copyright (c) OpenMMLab. All rights reserved.
 import os
+
+SPACE_ID = os.getenv('SPACE_ID')
+if SPACE_ID is not None:
+    # running on huggingface space
+    os.system(r'mkdir ckpt')
+    os.system(
+        r'python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth -o ckpt/sam_vit_b_01ec64.pth')
+    os.system(
+        r'python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth -o ckpt/sam_vit_l_0b3195.pth')
+    os.system(
+        r'python -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -o ckpt/sam_vit_h_4b8939.pth')
+
+    os.system(
+        r'python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1'
+        r'/r50_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o ckpt/r50_hdetr.pth')
+    os.system(
+        r'python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1'
+        r'/swin_tiny_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o ckpt/swin_t_hdetr.pth')
+    os.system(
+        r'python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/decay0.05_drop_path0'
+        r'.5_swin_large_hybrid_branch_lambda1_group6_t1500_n900_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o ckpt/swin_l_hdetr.pth')
+    os.system(r'python -m wget https://projects4jw.blob.core.windows.net/focalnet/release/detection'
+              r'/focalnet_large_fl4_o365_finetuned_on_coco.pth -o ckpt/focalnet_l_dino.pth')
+
+    os.system(r'python tools/convert_ckpt.py ckpt/r50_hdetr.pth ckpt/r50_hdetr.pth')
+    os.system(r'python tools/convert_ckpt.py ckpt/swin_t_hdetr.pth ckpt/swin_t_hdetr.pth')
+    os.system(r'python tools/convert_ckpt.py ckpt/swin_l_hdetr.pth ckpt/swin_l_hdetr.pth')
+    os.system(r'python tools/convert_ckpt.py ckpt/focalnet_l_dino.pth ckpt/focalnet_l_dino.pth')
+import warnings
 from collections import OrderedDict
+from pathlib import Path
 
+import gradio as gr
+import numpy as np
 import torch
 
-
-# torch_ver, cuda_ver = torch.__version__.split('+')
-# os.system('pip list')
-# os.system(f'pip install pycocotools==2.0.0 mmdet mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/{cuda_ver}/torch1.10.0/index.html --no-cache-dir')
-os.system(r'python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/r50_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o ckpt/r50_hdetr.pth')
-os.system(r'python -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/swin_tiny_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o ckpt/swin_t_hdetr.pth')
-os.system(r'python tools/convert_ckpt.py ckpt/r50_hdetr.pth ckpt/r50_hdetr.pth')
-os.system(r'python tools/convert_ckpt.py ckpt/swin_t_hdetr.pth ckpt/swin_t_hdetr.pth')
-
+import mmcv
 from mmcv import Config
+from mmcv.ops import RoIPool
+from mmcv.parallel import collate, scatter
+from mmcv.runner import load_checkpoint
 from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE
 
-from mmdet.
-from mmdet.datasets import (CocoDataset)
+from mmdet.core import get_classes
+from mmdet.datasets import (CocoDataset, replace_ImageToTensor)
+from mmdet.datasets.pipelines import Compose
+from mmdet.models import build_detector
 from mmdet.utils import (compat_cfg, replace_cfg_vals, setup_multi_processes,
                          update_data_root)
 
-import gradio as gr
-
 config_dict = OrderedDict([('r50-hdetr_sam-vit-b', 'projects/configs/hdetr/r50-hdetr_sam-vit-b.py'),
                            ('r50-hdetr_sam-vit-l', 'projects/configs/hdetr/r50-hdetr_sam-vit-l.py'),
                            ('swin-t-hdetr_sam-vit-b', 'projects/configs/hdetr/swin-t-hdetr_sam-vit-b.py'),
@@ -33,7 +59,118 @@ config_dict = OrderedDict([('r50-hdetr_sam-vit-b', 'projects/configs/hdetr/r50-h
                            ('focalnet-l-dino_sam-vit-b', 'projects/configs/focalnet_dino/focalnet-l-dino_sam-vit-b.py'),
                            # ('focalnet-l-dino_sam-vit-l', 'projects/configs/focalnet_dino/focalnet-l-dino_sam-vit-l.py'),
                            # ('focalnet-l-dino_sam-vit-h', 'projects/configs/focalnet_dino/focalnet-l-dino_sam-vit-h.py')
-])
+                           ])
+
+
+def init_demo_detector(config, checkpoint=None, device='cuda:0', cfg_options=None):
+    """Initialize a detector from config file.
+    Args:
+        config (str, :obj:`Path`, or :obj:`mmcv.Config`): Config file path,
+            :obj:`Path`, or the config object.
+        checkpoint (str, optional): Checkpoint path. If left as None, the model
+            will not load any weights.
+        cfg_options (dict): Options to override some settings in the used
+            config.
+    Returns:
+        nn.Module: The constructed detector.
+    """
+    if isinstance(config, (str, Path)):
+        config = mmcv.Config.fromfile(config)
+    elif not isinstance(config, mmcv.Config):
+        raise TypeError('config must be a filename or Config object, '
+                        f'but got {type(config)}')
+    if cfg_options is not None:
+        config.merge_from_dict(cfg_options)
+    if 'pretrained' in config.model:
+        config.model.pretrained = None
+    elif (config.model.get('backbone', None) is not None
+          and 'init_cfg' in config.model.backbone):
+        config.model.backbone.init_cfg = None
+    config.model.train_cfg = None
+    model = build_detector(config.model, test_cfg=config.get('test_cfg'))
+    if checkpoint is not None:
+        checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
+        if 'CLASSES' in checkpoint.get('meta', {}):
+            model.CLASSES = checkpoint['meta']['CLASSES']
+        else:
+            warnings.simplefilter('once')
+            warnings.warn('Class names are not saved in the checkpoint\'s '
+                          'meta data, use COCO classes by default.')
+            model.CLASSES = get_classes('coco')
+    model.cfg = config  # save the config in the model for convenience
+    model.to(device)
+    model.eval()
+
+    if device == 'npu':
+        from mmcv.device.npu import NPUDataParallel
+        model = NPUDataParallel(model)
+        model.cfg = config
+
+    return model
+
+
+def inference_demo_detector(model, imgs):
+    """Inference image(s) with the detector.
+    Args:
+        model (nn.Module): The loaded detector.
+        imgs (str/ndarray or list[str/ndarray] or tuple[str/ndarray]):
+            Either image files or loaded images.
+    Returns:
+        If imgs is a list or tuple, the same length list type results
+        will be returned, otherwise return the detection results directly.
+    """
+    ori_img = imgs
+    if isinstance(imgs, (list, tuple)):
+        is_batch = True
+    else:
+        imgs = [imgs]
+        is_batch = False
+
+    cfg = model.cfg
+    device = next(model.parameters()).device  # model device
+
+    if isinstance(imgs[0], np.ndarray):
+        cfg = cfg.copy()
+        # set loading pipeline type
+        cfg.data.test.pipeline[0].type = 'LoadImageFromWebcam'
+
+    cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
+    test_pipeline = Compose(cfg.data.test.pipeline)
+
+    datas = []
+    for img in imgs:
+        # prepare data
+        if isinstance(img, np.ndarray):
+            # directly add img
+            data = dict(img=img)
+        else:
+            # add information into dict
+            data = dict(img_info=dict(filename=img), img_prefix=None)
+        # build the data pipeline
+        data = test_pipeline(data)
+        datas.append(data)
+
+    data = collate(datas, samples_per_gpu=len(imgs))
+    # just get the actual data from DataContainer
+    data['img_metas'] = [img_metas.data[0] for img_metas in data['img_metas']]
+    data['img'] = [img.data[0] for img in data['img']]
+    if next(model.parameters()).is_cuda:
+        # scatter to specified GPU
+        data = scatter(data, [device])[0]
+    else:
+        for m in model.modules():
+            assert not isinstance(
+                m, RoIPool
+            ), 'CPU inference with RoIPool is not supported currently.'
+
+    # forward the model
+    with torch.no_grad():
+        results = model(return_loss=False, rescale=True, **data, ori_img=ori_img)
+
+    if not is_batch:
+        return results[0]
+    else:
+        return results
 
 
 def inference(img, config):
@@ -85,10 +222,10 @@ def inference(img, config):
         device = "cuda"
     else:
         device = "cpu"
-    model =
+    model = init_demo_detector(cfg, None, device=device)
     model.CLASSES = CocoDataset.CLASSES
 
-    results =
+    results = inference_demo_detector(model, img)
     visualize = model.show_result(
         img,
         results,
@@ -108,9 +245,10 @@ description = """
 Github link: [Link](https://github.com/RockeyCoss/Prompt-Segment-Anything)
 You can select the model you want to use from the "Model" dropdown menu and click "Submit" to segment the image you uploaded to the "Input Image" box.
 """
-if
+if SPACE_ID is not None:
    description += f'\n<p>For faster inference without waiting in queue, you may duplicate the space and upgrade to GPU in settings. <a href="https://huggingface.co/spaces/{SPACE_ID}?duplicate=true"><img style="display: inline; margin-top: 0em; margin-bottom: 0em" src="https://bit.ly/3gLdBN6" alt="Duplicate Space" /></a></p>'
+
+
 def main():
     with gr.Blocks() as demo:
         gr.Markdown(description)

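A usage sketch for the two helpers added above. This assumes a repository checkout with the configs in place and the checkpoints referenced by the config present under `ckpt/`; the image path is illustrative, and with `checkpoint=None` the detector head keeps random weights, so this only demonstrates the call shape.

```python
# Illustrative use of init_demo_detector / inference_demo_detector.
import mmcv
from mmcv import Config

cfg = Config.fromfile('projects/configs/hdetr/r50-hdetr_sam-vit-b.py')
model = init_demo_detector(cfg, checkpoint=None, device='cpu')
model.CLASSES = CocoDataset.CLASSES

# The gradio demo passes an RGB ndarray, so we load one the same way.
img = mmcv.imread('assets/example1.jpg', channel_order='rgb')
results = inference_demo_detector(model, img)  # (bbox_results, mask_results)
```
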
assets/example1.jpg
ADDED
assets/example2.jpg
ADDED
assets/example3.jpg
ADDED
assets/example4.jpg
ADDED
assets/example5.jpg
ADDED
assets/img1.jpg
ADDED
assets/img2.jpg
ADDED
assets/img3.jpg
ADDED
assets/img4.jpg
ADDED

flagged/Input/tmpaytsmk0e.jpg
DELETED
Binary file (111 kB)

flagged/Output/tmpgs59m7u_.png
DELETED
Binary file (498 kB)

flagged/log.csv
DELETED
@@ -1,2 +0,0 @@
-Input,Output,flag,username,timestamp
-C:\Users\13502\Documents\msra\prompt_segment_anything_demo\flagged\Input\tmpaytsmk0e.jpg,C:\Users\13502\Documents\msra\prompt_segment_anything_demo\flagged\Output\tmpgs59m7u_.png,,,2023-04-10 20:52:40.908980

mmdet/apis/inference.py
CHANGED
@@ -38,8 +38,7 @@ def init_detector(config, checkpoint=None, device='cuda:0', cfg_options=None):
     config.merge_from_dict(cfg_options)
     if 'pretrained' in config.model:
         config.model.pretrained = None
-    elif (config.model.get('backbone', None) is not None
-          and 'init_cfg' in config.model.backbone):
+    elif 'init_cfg' in config.model.backbone:
         config.model.backbone.init_cfg = None
     config.model.train_cfg = None
     model = build_detector(config.model, test_cfg=config.get('test_cfg'))
@@ -109,7 +108,7 @@ def inference_detector(model, imgs):
         If imgs is a list or tuple, the same length list type results
         will be returned, otherwise return the detection results directly.
     """
-
+
     if isinstance(imgs, (list, tuple)):
         is_batch = True
     else:
@@ -155,7 +154,7 @@ def inference_detector(model, imgs):
 
     # forward the model
     with torch.no_grad():
-        results = model(return_loss=False, rescale=True, **data
+        results = model(return_loss=False, rescale=True, **data)
 
     if not is_batch:
         return results[0]

projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py
ADDED
@@ -0,0 +1,82 @@
+_base_ = [
+    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
+]
+
+plugin = True
+plugin_dir = 'projects/instance_segment_anything/'
+
+model = dict(
+    type='DetWrapperInstanceSAM',
+    det_wrapper_type='hdetr',
+    det_wrapper_cfg=dict(aux_loss=True,
+                         backbone='resnet50',
+                         num_classes=91,
+                         cache_mode=False,
+                         dec_layers=6,
+                         dec_n_points=4,
+                         dilation=False,
+                         dim_feedforward=2048,
+                         drop_path_rate=0.2,
+                         dropout=0.0,
+                         enc_layers=6,
+                         enc_n_points=4,
+                         focal_alpha=0.25,
+                         frozen_weights=None,
+                         hidden_dim=256,
+                         k_one2many=6,
+                         lambda_one2many=1.0,
+                         look_forward_twice=True,
+                         masks=False,
+                         mixed_selection=True,
+                         nheads=8,
+                         num_feature_levels=4,
+                         num_queries_one2many=1500,
+                         num_queries_one2one=300,
+                         position_embedding='sine',
+                         position_embedding_scale=6.283185307179586,
+                         remove_difficult=False,
+                         topk=100,
+                         two_stage=True,
+                         use_checkpoint=False,
+                         use_fp16=False,
+                         with_box_refine=True),
+    det_model_ckpt='ckpt/r50_hdetr.pth',
+    num_classes=80,
+    model_type='vit_b',
+    sam_checkpoint='ckpt/sam_vit_b_01ec64.pth',
+    use_sam_iou=True,
+    best_in_multi_mask=True,
+)
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+# test_pipeline, NOTE the Pad's size_divisor is different from the default
+# setting (size_divisor=32). While there is little effect on the performance
+# whether we use the default setting or use size_divisor=1.
+
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1333, 800),
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='Pad', size_divisor=1),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img'])
+        ])
+]
+
+dataset_type = 'CocoDataset'
+data_root = 'data/coco/'
+
+data = dict(
+    samples_per_gpu=1,
+    workers_per_gpu=1,
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/instances_val2017.json',
+        img_prefix=data_root + 'val2017/',
+        pipeline=test_pipeline))

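The two cascade configs that follow differ from this file only in the `model` block (`type='DetWrapperInstanceSAMCascade'` plus the `best_in_multi_mask`/`stage_1_multi_mask` flags). Configs in this style are plain Python files read through `mmcv.Config`; as a quick, illustrative way to inspect the resolved options, run something like the following from the repository root so the `_base_` paths resolve:

```python
from mmcv import Config

cfg = Config.fromfile(
    'projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py')
print(cfg.model.type)                # DetWrapperInstanceSAM
print(cfg.model.best_in_multi_mask)  # True
print(cfg.data.test.ann_file)        # data/coco/annotations/instances_val2017.json
```
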
projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi_cascade.py
ADDED
@@ -0,0 +1,83 @@
+_base_ = [
+    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
+]
+
+plugin = True
+plugin_dir = 'projects/instance_segment_anything/'
+
+model = dict(
+    type='DetWrapperInstanceSAMCascade',
+    det_wrapper_type='hdetr',
+    det_wrapper_cfg=dict(aux_loss=True,
+                         backbone='resnet50',
+                         num_classes=91,
+                         cache_mode=False,
+                         dec_layers=6,
+                         dec_n_points=4,
+                         dilation=False,
+                         dim_feedforward=2048,
+                         drop_path_rate=0.2,
+                         dropout=0.0,
+                         enc_layers=6,
+                         enc_n_points=4,
+                         focal_alpha=0.25,
+                         frozen_weights=None,
+                         hidden_dim=256,
+                         k_one2many=6,
+                         lambda_one2many=1.0,
+                         look_forward_twice=True,
+                         masks=False,
+                         mixed_selection=True,
+                         nheads=8,
+                         num_feature_levels=4,
+                         num_queries_one2many=1500,
+                         num_queries_one2one=300,
+                         position_embedding='sine',
+                         position_embedding_scale=6.283185307179586,
+                         remove_difficult=False,
+                         topk=100,
+                         two_stage=True,
+                         use_checkpoint=False,
+                         use_fp16=False,
+                         with_box_refine=True),
+    det_model_ckpt='ckpt/r50_hdetr.pth',
+    num_classes=80,
+    model_type='vit_b',
+    sam_checkpoint='ckpt/sam_vit_b_01ec64.pth',
+    use_sam_iou=True,
+    best_in_multi_mask=True,
+    stage_1_multi_mask=True,
+)
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+# test_pipeline, NOTE the Pad's size_divisor is different from the default
+# setting (size_divisor=32). While there is little effect on the performance
+# whether we use the default setting or use size_divisor=1.
+
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1333, 800),
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='Pad', size_divisor=1),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img'])
+        ])
+]
+
+dataset_type = 'CocoDataset'
+data_root = 'data/coco/'
+
+data = dict(
+    samples_per_gpu=1,
+    workers_per_gpu=1,
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/instances_val2017.json',
+        img_prefix=data_root + 'val2017/',
+        pipeline=test_pipeline))

projects/configs/hdetr/r50-hdetr_sam-vit-b_cascade.py
ADDED
@@ -0,0 +1,83 @@
+_base_ = [
+    '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py'
+]
+
+plugin = True
+plugin_dir = 'projects/instance_segment_anything/'
+
+model = dict(
+    type='DetWrapperInstanceSAMCascade',
+    det_wrapper_type='hdetr',
+    det_wrapper_cfg=dict(aux_loss=True,
+                         backbone='resnet50',
+                         num_classes=91,
+                         cache_mode=False,
+                         dec_layers=6,
+                         dec_n_points=4,
+                         dilation=False,
+                         dim_feedforward=2048,
+                         drop_path_rate=0.2,
+                         dropout=0.0,
+                         enc_layers=6,
+                         enc_n_points=4,
+                         focal_alpha=0.25,
+                         frozen_weights=None,
+                         hidden_dim=256,
+                         k_one2many=6,
+                         lambda_one2many=1.0,
+                         look_forward_twice=True,
+                         masks=False,
+                         mixed_selection=True,
+                         nheads=8,
+                         num_feature_levels=4,
+                         num_queries_one2many=1500,
+                         num_queries_one2one=300,
+                         position_embedding='sine',
+                         position_embedding_scale=6.283185307179586,
+                         remove_difficult=False,
+                         topk=100,
+                         two_stage=True,
+                         use_checkpoint=False,
+                         use_fp16=False,
+                         with_box_refine=True),
+    det_model_ckpt='ckpt/r50_hdetr.pth',
+    num_classes=80,
+    model_type='vit_b',
+    sam_checkpoint='ckpt/sam_vit_b_01ec64.pth',
+    use_sam_iou=True,
+    best_in_multi_mask=False,
+    stage_1_multi_mask=False,
+)
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+# test_pipeline, NOTE the Pad's size_divisor is different from the default
+# setting (size_divisor=32). While there is little effect on the performance
+# whether we use the default setting or use size_divisor=1.
+
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1333, 800),
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='Pad', size_divisor=1),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img'])
+        ])
+]
+
+dataset_type = 'CocoDataset'
+data_root = 'data/coco/'
+
+data = dict(
+    samples_per_gpu=1,
+    workers_per_gpu=1,
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/instances_val2017.json',
+        img_prefix=data_root + 'val2017/',
+        pipeline=test_pipeline))

projects/instance_segment_anything/__init__.py
CHANGED
@@ -1 +1,2 @@
 from .models.det_wrapper_instance_sam import DetWrapperInstanceSAM
+from .models.det_wrapper_instance_sam_cascade import DetWrapperInstanceSAMCascade

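The new import matters because it executes the `@DETECTORS.register_module()` decorator on the cascade class, which is what lets the configs above refer to it by the string `'DetWrapperInstanceSAMCascade'`. A small sketch of the registry lookup, assuming mmdet's standard `Registry` API:

```python
from mmdet.models import DETECTORS
import projects.instance_segment_anything  # noqa: triggers registration

# Once the plugin package is imported, the name used in the configs
# resolves to the class through the registry.
cls = DETECTORS.get('DetWrapperInstanceSAMCascade')
print(cls.__name__)  # DetWrapperInstanceSAMCascade
```
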
projects/instance_segment_anything/models/det_wrapper_instance_sam.py
CHANGED
@@ -25,6 +25,7 @@ class DetWrapperInstanceSAM(BaseDetector):
                  model_type='vit_b',
                  sam_checkpoint=None,
                  use_sam_iou=True,
+                 best_in_multi_mask=False,
 
                  init_cfg=None,
                  train_cfg=None,
@@ -45,12 +46,16 @@ class DetWrapperInstanceSAM(BaseDetector):
         sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
         _ = sam.to(device=self.learnable_placeholder.weight.device)
         self.predictor = SamPredictor(sam)
+        # Whether use SAM's predicted IoU to calibrate the confidence score.
         self.use_sam_iou = use_sam_iou
+        # If True, set multimask_output=True and return the mask with highest predicted IoU.
+        # if False, set multimask_output=False and return the unique output mask.
+        self.best_in_multi_mask = best_in_multi_mask
 
     def init_weights(self):
         pass
 
-    def simple_test(self, img, img_metas,
+    def simple_test(self, img, img_metas, rescale=True, ori_img=None):
         """Test without augmentation.
         Args:
             imgs (Tensor): A batch of images.
@@ -66,22 +71,35 @@ class DetWrapperInstanceSAM(BaseDetector):
         # Tensor(n,4), xyxy, ori image scale
         output_boxes = results[0]['boxes']
 
+        if ori_img is None:
+            image_path = img_metas[0]['filename']
+            ori_img = cv2.imread(image_path)
+            ori_img = cv2.cvtColor(ori_img, cv2.COLOR_BGR2RGB)
         self.predictor.set_image(ori_img)
 
         transformed_boxes = self.predictor.transform.apply_boxes_torch(output_boxes, ori_img.shape[:2])
 
-        # mask_pred: n,1,h,w
-        # sam_score: n, 1
+        # mask_pred: n,1/3,h,w
+        # sam_score: n, 1/3
         mask_pred, sam_score, _ = self.predictor.predict_torch(
             point_coords=None,
             point_labels=None,
             boxes=transformed_boxes,
-            multimask_output=
+            multimask_output=self.best_in_multi_mask,
             return_logits=True,
         )
-
-
-
+        if self.best_in_multi_mask:
+            # sam_score: n
+            sam_score, max_iou_idx = torch.max(sam_score, dim=1)
+            # mask_pred: n,h,w
+            mask_pred = mask_pred[torch.arange(mask_pred.size(0)),
+                                  max_iou_idx]
+        else:
+            # Tensor(n,h,w), raw mask pred
+            # n,1,h,w->n,h,w
+            mask_pred = mask_pred.squeeze(1)
+            # n,1->n
+            sam_score = sam_score.squeeze(-1)
 
         # Tensor(n,)
         label_pred = results[0]['labels']

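The best-in-multi selection added above picks one of SAM's three candidate masks per instance via integer-array indexing; a self-contained illustration of the pattern on dummy tensors:

```python
import torch

n, h, w = 4, 8, 8
mask_pred = torch.randn(n, 3, h, w)  # three candidate masks per instance
sam_score = torch.rand(n, 3)         # predicted IoU for each candidate

# torch.max over dim=1 returns both the best score and its index per row.
sam_score, max_iou_idx = torch.max(sam_score, dim=1)  # both shape (n,)
# Pairing arange(n) with the index tensor selects one mask per instance.
mask_pred = mask_pred[torch.arange(n), max_iou_idx]   # shape (n, h, w)
assert mask_pred.shape == (n, h, w) and sam_score.shape == (n,)
```
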
projects/instance_segment_anything/models/det_wrapper_instance_sam_cascade.py
ADDED
@@ -0,0 +1,127 @@
+import cv2
+import torch
+
+from mmdet.core import bbox2result
+from mmdet.models import DETECTORS
+from .det_wrapper_instance_sam import DetWrapperInstanceSAM
+
+
+@DETECTORS.register_module()
+class DetWrapperInstanceSAMCascade(DetWrapperInstanceSAM):
+    def __init__(self,
+                 stage_1_multi_mask=False,
+
+                 det_wrapper_type='hdetr',
+                 det_wrapper_cfg=None,
+                 det_model_ckpt=None,
+                 num_classes=80,
+                 model_type='vit_b',
+                 sam_checkpoint=None,
+                 use_sam_iou=True,
+                 best_in_multi_mask=False,
+                 init_cfg=None,
+                 train_cfg=None,
+                 test_cfg=None):
+        super(DetWrapperInstanceSAMCascade, self).__init__(det_wrapper_type=det_wrapper_type,
+                                                           det_wrapper_cfg=det_wrapper_cfg,
+                                                           det_model_ckpt=det_model_ckpt,
+                                                           num_classes=num_classes,
+                                                           model_type=model_type,
+                                                           sam_checkpoint=sam_checkpoint,
+                                                           use_sam_iou=use_sam_iou,
+                                                           best_in_multi_mask=best_in_multi_mask,
+                                                           init_cfg=init_cfg,
+                                                           train_cfg=train_cfg,
+                                                           test_cfg=test_cfg)
+        # If True, then the coarse mask output by stage 1 will be the
+        # one with the highest predicted IoU among the three masks.
+        # If False, then stage 1 will only output one coarse mask.
+        self.stage_1_multi_mask = stage_1_multi_mask
+
+    def simple_test(self, img, img_metas, rescale=True, ori_img=None):
+        """Test without augmentation.
+        Args:
+            imgs (Tensor): A batch of images.
+            img_metas (list[dict]): List of image information.
+        """
+        assert rescale
+        assert len(img_metas) == 1
+        # results: List[dict(scores, labels, boxes)]
+        results = self.det_model.simple_test(img,
+                                             img_metas,
+                                             rescale)
+
+        # Tensor(n,4), xyxy, ori image scale
+        output_boxes = results[0]['boxes']
+
+        if ori_img is None:
+            image_path = img_metas[0]['filename']
+            ori_img = cv2.imread(image_path)
+            ori_img = cv2.cvtColor(ori_img, cv2.COLOR_BGR2RGB)
+        self.predictor.set_image(ori_img)
+
+        transformed_boxes = self.predictor.transform.apply_boxes_torch(output_boxes, ori_img.shape[:2])
+
+        # mask_pred: n,1/3,h,w
+        # sam_score: n, 1/3
+        # coarse_mask: n,1/3,256,256
+        _1, coarse_mask_score, coarse_mask = self.predictor.predict_torch(
+            point_coords=None,
+            point_labels=None,
+            boxes=transformed_boxes,
+            multimask_output=self.stage_1_multi_mask,
+            return_logits=True,
+        )
+        if self.stage_1_multi_mask:
+            max_iou_idx = torch.max(coarse_mask_score, dim=1)[1]
+            coarse_mask = (coarse_mask[torch.arange(coarse_mask.size(0)),
+                                       max_iou_idx]).unsqueeze(1)
+        mask_pred, sam_score, _ = self.predictor.predict_torch(
+            point_coords=None,
+            point_labels=None,
+            boxes=transformed_boxes,
+            mask_input=coarse_mask,
+            multimask_output=self.best_in_multi_mask,
+            return_logits=True,
+        )
+        if self.best_in_multi_mask:
+            # sam_score: n
+            sam_score, max_iou_idx = torch.max(sam_score, dim=1)
+            # mask_pred: n,h,w
+            mask_pred = mask_pred[torch.arange(mask_pred.size(0)),
+                                  max_iou_idx]
+        else:
+            # Tensor(n,h,w), raw mask pred
+            # n,1,h,w->n,h,w
+            mask_pred = mask_pred.squeeze(1)
+            # n,1->n
+            sam_score = sam_score.squeeze(-1)
+
+        # Tensor(n,)
+        label_pred = results[0]['labels']
+
+        score_pred = results[0]['scores']
+
+        # mask_pred: Tensor(n,h,w)
+        # label_pred: Tensor(n,)
+        # score_pred: Tensor(n,)
+        # sam_score: Tensor(n,)
+        mask_pred_binary = (mask_pred > self.predictor.model.mask_threshold).float()
+        if self.use_sam_iou:
+            det_scores = score_pred * sam_score
+        else:
+            # n
+            mask_scores_per_image = (mask_pred * mask_pred_binary).flatten(1).sum(1) / (
+                    mask_pred_binary.flatten(1).sum(1) + 1e-6)
+            det_scores = score_pred * mask_scores_per_image
+        # det_scores = score_pred
+        mask_pred_binary = mask_pred_binary.bool()
+        bboxes = torch.cat([output_boxes, det_scores[:, None]], dim=-1)
+        bbox_results = bbox2result(bboxes, label_pred, self.num_classes)
+        mask_results = [[] for _ in range(self.num_classes)]
+        for j, label in enumerate(label_pred):
+            mask = mask_pred_binary[j].detach().cpu().numpy()
+            mask_results[label].append(mask)
+        output_results = [(bbox_results, mask_results)]
+
+        return output_results

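The cascade above amounts to two `predict_torch` calls, with the low-resolution mask logits from the box-only call fed back through `mask_input`. A minimal sketch reusing `predictor` and `tboxes` from the example after the README section; shapes follow the comments in the file above:

```python
# Stage 1: box prompt only. The third return value is the low-res mask
# logits (n, 1, 256, 256), which serve as the coarse mask.
_, coarse_scores, coarse_masks = predictor.predict_torch(
    point_coords=None, point_labels=None,
    boxes=tboxes, multimask_output=False, return_logits=True)

# Stage 2: box + coarse mask prompt produces the final masks.
masks, scores, _ = predictor.predict_torch(
    point_coords=None, point_labels=None,
    boxes=tboxes, mask_input=coarse_masks,
    multimask_output=False, return_logits=True)
```
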
projects/instance_segment_anything/ops/functions/ms_deform_attn_func.py
CHANGED
@@ -24,7 +24,6 @@ try:
 except:
     pass
 
-
 class MSDeformAttnFunction(Function):
     @staticmethod
     @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)

projects/instance_segment_anything/ops/modules/ms_deform_attn.py
CHANGED
@@ -21,6 +21,7 @@ import torch
 from torch import nn
 import torch.nn.functional as F
 from torch.nn.init import xavier_uniform_, constant_
+
 from mmcv.utils import IS_CUDA_AVAILABLE, IS_MLU_AVAILABLE
 
 from ..functions import MSDeformAttnFunction, ms_deform_attn_core_pytorch

requirements.txt
CHANGED
@@ -12,5 +12,4 @@ timm
 wget
 gradio
 --find-links https://download.openmmlab.com/mmcv/dist/cpu/torch1.12.0/index.html
-mmcv-full==1.6.0
-
+mmcv-full==1.6.0

setup.cfg
ADDED
@@ -0,0 +1,21 @@
+[isort]
+line_length = 79
+multi_line_output = 0
+extra_standard_library = setuptools
+known_first_party = mmdet
+known_third_party = PIL,asynctest,cityscapesscripts,cv2,gather_models,matplotlib,mmcv,numpy,onnx,onnxruntime,pycocotools,pytest,pytorch_sphinx_theme,requests,scipy,seaborn,six,terminaltables,torch,ts,yaml
+no_lines_before = STDLIB,LOCALFOLDER
+default_section = THIRDPARTY
+
+[yapf]
+BASED_ON_STYLE = pep8
+BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = true
+SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN = true
+
+# ignore-words-list needs to be lowercase format. For example, if we want to
+# ignore word "BA", then we need to append "ba" to ignore-words-list rather
+# than "BA"
+[codespell]
+skip = *.ipynb
+quiet-level = 3
+ignore-words-list = patten,nd,ty,mot,hist,formating,winn,gool,datas,wan,confids,TOOD,tood,ba,warmup,nam,dota,DOTA

setup.py
ADDED
@@ -0,0 +1,220 @@
+#!/usr/bin/env python
+# Copyright (c) OpenMMLab. All rights reserved.
+import os
+import os.path as osp
+import platform
+import shutil
+import sys
+import warnings
+from setuptools import find_packages, setup
+
+import torch
+from torch.utils.cpp_extension import (BuildExtension, CppExtension,
+                                       CUDAExtension)
+
+
+def readme():
+    with open('README.md', encoding='utf-8') as f:
+        content = f.read()
+    return content
+
+
+version_file = 'mmdet/version.py'
+
+
+def get_version():
+    with open(version_file, 'r') as f:
+        exec(compile(f.read(), version_file, 'exec'))
+    return locals()['__version__']
+
+
+def make_cuda_ext(name, module, sources, sources_cuda=[]):
+
+    define_macros = []
+    extra_compile_args = {'cxx': []}
+
+    if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1':
+        define_macros += [('WITH_CUDA', None)]
+        extension = CUDAExtension
+        extra_compile_args['nvcc'] = [
+            '-D__CUDA_NO_HALF_OPERATORS__',
+            '-D__CUDA_NO_HALF_CONVERSIONS__',
+            '-D__CUDA_NO_HALF2_OPERATORS__',
+        ]
+        sources += sources_cuda
+    else:
+        print(f'Compiling {name} without CUDA')
+        extension = CppExtension
+
+    return extension(
+        name=f'{module}.{name}',
+        sources=[os.path.join(*module.split('.'), p) for p in sources],
+        define_macros=define_macros,
+        extra_compile_args=extra_compile_args)
+
+
+def parse_requirements(fname='requirements.txt', with_version=True):
+    """Parse the package dependencies listed in a requirements file but strips
+    specific versioning information.
+
+    Args:
+        fname (str): path to requirements file
+        with_version (bool, default=True): if True include version specs
+
+    Returns:
+        List[str]: list of requirements items
+
+    CommandLine:
+        python -c "import setup; print(setup.parse_requirements())"
+    """
+    import re
+    import sys
+    from os.path import exists
+    require_fpath = fname
+
+    def parse_line(line):
+        """Parse information from a line in a requirements text file."""
+        if line.startswith('-r '):
+            # Allow specifying requirements in other files
+            target = line.split(' ')[1]
+            for info in parse_require_file(target):
+                yield info
+        else:
+            info = {'line': line}
+            if line.startswith('-e '):
+                info['package'] = line.split('#egg=')[1]
+            elif '@git+' in line:
+                info['package'] = line
+            else:
+                # Remove versioning from the package
+                pat = '(' + '|'.join(['>=', '==', '>']) + ')'
+                parts = re.split(pat, line, maxsplit=1)
+                parts = [p.strip() for p in parts]
+
+                info['package'] = parts[0]
+                if len(parts) > 1:
+                    op, rest = parts[1:]
+                    if ';' in rest:
+                        # Handle platform specific dependencies
+                        # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies
+                        version, platform_deps = map(str.strip,
+                                                     rest.split(';'))
+                        info['platform_deps'] = platform_deps
+                    else:
+                        version = rest  # NOQA
+                    info['version'] = (op, version)
+            yield info
+
+    def parse_require_file(fpath):
+        with open(fpath, 'r') as f:
+            for line in f.readlines():
+                line = line.strip()
+                if line and not line.startswith('#'):
+                    for info in parse_line(line):
+                        yield info
+
+    def gen_packages_items():
+        if exists(require_fpath):
+            for info in parse_require_file(require_fpath):
+                parts = [info['package']]
+                if with_version and 'version' in info:
+                    parts.extend(info['version'])
+                if not sys.version.startswith('3.4'):
+                    # apparently package_deps are broken in 3.4
+                    platform_deps = info.get('platform_deps')
+                    if platform_deps is not None:
+                        parts.append(';' + platform_deps)
+                item = ''.join(parts)
+                yield item
+
+    packages = list(gen_packages_items())
+    return packages
+
+
+def add_mim_extension():
+    """Add extra files that are required to support MIM into the package.
+
+    These files will be added by creating a symlink to the originals if the
+    package is installed in `editable` mode (e.g. pip install -e .), or by
+    copying from the originals otherwise.
+    """
+
+    # parse install mode
+    if 'develop' in sys.argv:
+        # installed by `pip install -e .`
+        if platform.system() == 'Windows':
+            # set `copy` mode here since symlink fails on Windows.
+            mode = 'copy'
+        else:
+            mode = 'symlink'
+    elif 'sdist' in sys.argv or 'bdist_wheel' in sys.argv:
+        # installed by `pip install .`
+        # or create source distribution by `python setup.py sdist`
+        mode = 'copy'
+    else:
+        return
+
+    filenames = ['tools', 'configs', 'demo', 'model-index.yml']
+    repo_path = osp.dirname(__file__)
+    mim_path = osp.join(repo_path, 'mmdet', '.mim')
+    os.makedirs(mim_path, exist_ok=True)
+
+    for filename in filenames:
+        if osp.exists(filename):
+            src_path = osp.join(repo_path, filename)
+            tar_path = osp.join(mim_path, filename)
+
+            if osp.isfile(tar_path) or osp.islink(tar_path):
+                os.remove(tar_path)
+            elif osp.isdir(tar_path):
+                shutil.rmtree(tar_path)
+
+            if mode == 'symlink':
+                src_relpath = osp.relpath(src_path, osp.dirname(tar_path))
+                os.symlink(src_relpath, tar_path)
+            elif mode == 'copy':
+                if osp.isfile(src_path):
+                    shutil.copyfile(src_path, tar_path)
+                elif osp.isdir(src_path):
+                    shutil.copytree(src_path, tar_path)
+                else:
+                    warnings.warn(f'Cannot copy file {src_path}.')
+            else:
+                raise ValueError(f'Invalid mode {mode}')
+
+
+if __name__ == '__main__':
+    add_mim_extension()
+    setup(
+        name='mmdet',
+        version=get_version(),
+        description='OpenMMLab Detection Toolbox and Benchmark',
+        long_description=readme(),
+        long_description_content_type='text/markdown',
+        author='MMDetection Contributors',
+        author_email='[email protected]',
+        keywords='computer vision, object detection',
+        url='https://github.com/open-mmlab/mmdetection',
+        packages=find_packages(exclude=('configs', 'tools', 'demo')),
+        include_package_data=True,
+        classifiers=[
+            'Development Status :: 5 - Production/Stable',
+            'License :: OSI Approved :: Apache Software License',
+            'Operating System :: OS Independent',
+            'Programming Language :: Python :: 3',
+            'Programming Language :: Python :: 3.7',
+            'Programming Language :: Python :: 3.8',
+            'Programming Language :: Python :: 3.9',
+        ],
+        license='Apache License 2.0',
+        install_requires=parse_requirements('requirements/runtime.txt'),
+        extras_require={
+            'all': parse_requirements('requirements.txt'),
+            'tests': parse_requirements('requirements/tests.txt'),
+            'build': parse_requirements('requirements/build.txt'),
+            'optional': parse_requirements('requirements/optional.txt'),
+            'mim': parse_requirements('requirements/mminstall.txt'),
+        },
+        ext_modules=[],
+        cmdclass={'build_ext': BuildExtension},
+        zip_safe=False)
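
To illustrate `parse_requirements` above: it expands `-r` includes, keeps only the `#egg=` name for editable installs, and otherwise splits each line on the first `>=`/`==`/`>` while preserving any `;` platform marker. A minimal sketch against a throwaway file (package names and the temp path are made up; assumes it is run from the repo root so `setup` is importable):

from setup import parse_requirements

# write a tiny hypothetical requirements file
with open('/tmp/reqs_demo.txt', 'w') as f:
    f.write('numpy>=1.19\n'
            'pycocotools==2.0.4;platform_system != "Windows"\n')

print(parse_requirements('/tmp/reqs_demo.txt'))
# ['numpy>=1.19', 'pycocotools==2.0.4;platform_system != "Windows"']
print(parse_requirements('/tmp/reqs_demo.txt', with_version=False))
# ['numpy', 'pycocotools;platform_system != "Windows"']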
tools/dist_test.sh
ADDED
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+
+CONFIG=$1
+GPUS=$2
+NNODES=${NNODES:-1}
+NODE_RANK=${NODE_RANK:-0}
+PORT=${PORT:-29500}
+MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
+
+PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
+python -m torch.distributed.launch \
+    --nnodes=$NNODES \
+    --node_rank=$NODE_RANK \
+    --master_addr=$MASTER_ADDR \
+    --nproc_per_node=$GPUS \
+    --master_port=$PORT \
+    $(dirname "$0")/test.py \
+    $CONFIG \
+    --launcher pytorch \
+    ${@:3}
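
Usage note: the script takes a config path and a GPU count and forwards any remaining arguments to `tools/test.py`; for example, something like `bash tools/dist_test.sh path/to/config.py 8 --eval segm` (the path shown is a placeholder). `NNODES`, `NODE_RANK`, `PORT`, and `MASTER_ADDR` can be overridden via environment variables for multi-node runs.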
tools/test.py
ADDED
@@ -0,0 +1,308 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import argparse
+import os
+import os.path as osp
+import time
+import warnings
+
+import mmcv
+import torch
+from mmcv import Config, DictAction
+from mmcv.cnn import fuse_conv_bn
+from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
+                         wrap_fp16_model)
+
+from mmdet.apis import multi_gpu_test, single_gpu_test
+from mmdet.datasets import (build_dataloader, build_dataset,
+                            replace_ImageToTensor)
+from mmdet.models import build_detector
+from mmdet.utils import (build_ddp, build_dp, compat_cfg, get_device,
+                         replace_cfg_vals, setup_multi_processes,
+                         update_data_root)
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='MMDet test (and eval) a model')
+    parser.add_argument('config', help='test config file path')
+    parser.add_argument(
+        '--work-dir',
+        help='the directory to save the file containing evaluation metrics')
+    parser.add_argument('--out', help='output result file in pickle format')
+    parser.add_argument(
+        '--fuse-conv-bn',
+        action='store_true',
+        help='Whether to fuse conv and bn, this will slightly increase '
+        'the inference speed')
+    parser.add_argument(
+        '--gpu-ids',
+        type=int,
+        nargs='+',
+        help='(Deprecated, please use --gpu-id) ids of gpus to use '
+        '(only applicable to non-distributed training)')
+    parser.add_argument(
+        '--gpu-id',
+        type=int,
+        default=0,
+        help='id of gpu to use '
+        '(only applicable to non-distributed testing)')
+    parser.add_argument(
+        '--format-only',
+        action='store_true',
+        help='Format the output results without performing evaluation. It is '
+        'useful when you want to format the result to a specific format and '
+        'submit it to the test server')
+    parser.add_argument(
+        '--eval',
+        type=str,
+        nargs='+',
+        help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
+        ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
+    parser.add_argument('--show', action='store_true', help='show results')
+    parser.add_argument(
+        '--show-dir', help='directory where painted images will be saved')
+    parser.add_argument(
+        '--show-score-thr',
+        type=float,
+        default=0.3,
+        help='score threshold (default: 0.3)')
+    parser.add_argument(
+        '--gpu-collect',
+        action='store_true',
+        help='whether to use gpu to collect results.')
+    parser.add_argument(
+        '--tmpdir',
+        help='tmp directory used for collecting results from multiple '
+        'workers, available when gpu-collect is not specified')
+    parser.add_argument(
+        '--cfg-options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file. If the value to '
+        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
+        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
+        'Note that the quotation marks are necessary and that no white space '
+        'is allowed.')
+    parser.add_argument(
+        '--options',
+        nargs='+',
+        action=DictAction,
+        help='custom options for evaluation, the key-value pair in xxx=yyy '
+        'format will be kwargs for dataset.evaluate() function (deprecated), '
+        'change to --eval-options instead.')
+    parser.add_argument(
+        '--eval-options',
+        nargs='+',
+        action=DictAction,
+        help='custom options for evaluation, the key-value pair in xxx=yyy '
+        'format will be kwargs for dataset.evaluate() function')
+    parser.add_argument(
+        '--launcher',
+        choices=['none', 'pytorch', 'slurm', 'mpi'],
+        default='none',
+        help='job launcher')
+    parser.add_argument('--local_rank', type=int, default=0)
+    args = parser.parse_args()
+    if 'LOCAL_RANK' not in os.environ:
+        os.environ['LOCAL_RANK'] = str(args.local_rank)
+
+    if args.options and args.eval_options:
+        raise ValueError(
+            '--options and --eval-options cannot be both '
+            'specified, --options is deprecated in favor of --eval-options')
+    if args.options:
+        warnings.warn('--options is deprecated in favor of --eval-options')
+        args.eval_options = args.options
+    return args
+
+
+def main():
+    args = parse_args()
+
+    assert args.out or args.eval or args.format_only or args.show \
+        or args.show_dir, \
+        ('Please specify at least one operation (save/eval/format/show the '
+         'results / save the results) with the argument "--out", "--eval"'
+         ', "--format-only", "--show" or "--show-dir"')
+
+    if args.eval and args.format_only:
+        raise ValueError('--eval and --format_only cannot be both specified')
+
+    if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
+        raise ValueError('The output file must be a pkl file.')
+
+    cfg = Config.fromfile(args.config)
+
+    # replace the ${key} with the value of cfg.key
+    cfg = replace_cfg_vals(cfg)
+
+    # update data root according to MMDET_DATASETS
+    update_data_root(cfg)
+
+    if args.cfg_options is not None:
+        cfg.merge_from_dict(args.cfg_options)
+
+    cfg = compat_cfg(cfg)
+
+    # set multi-process settings
+    setup_multi_processes(cfg)
+
+    # import modules from plugin/xx; the registry will be updated
+    if hasattr(cfg, 'plugin'):
+        if cfg.plugin:
+            import importlib
+            if hasattr(cfg, 'plugin_dir'):
+                plugin_dir = cfg.plugin_dir
+                _module_dir = os.path.dirname(plugin_dir)
+                _module_dir = _module_dir.split('/')
+                _module_path = _module_dir[0]
+
+                for m in _module_dir[1:]:
+                    _module_path = _module_path + '.' + m
+                print(_module_path)
+                plg_lib = importlib.import_module(_module_path)
+            else:
+                # import dir is the dirpath of the config file
+                _module_dir = os.path.dirname(args.config)
+                _module_dir = _module_dir.split('/')
+                _module_path = _module_dir[0]
+                for m in _module_dir[1:]:
+                    _module_path = _module_path + '.' + m
+                # print(_module_path)
+                plg_lib = importlib.import_module(_module_path)
+
+
+    # set cudnn_benchmark
+    if cfg.get('cudnn_benchmark', False):
+        torch.backends.cudnn.benchmark = True
+
+    if 'pretrained' in cfg.model:
+        cfg.model.pretrained = None
+    elif (cfg.model.get('backbone', None) is not None
+          and 'init_cfg' in cfg.model.backbone):
+        cfg.model.backbone.init_cfg = None
+
+    if cfg.model.get('neck'):
+        if isinstance(cfg.model.neck, list):
+            for neck_cfg in cfg.model.neck:
+                if neck_cfg.get('rfp_backbone'):
+                    if neck_cfg.rfp_backbone.get('pretrained'):
+                        neck_cfg.rfp_backbone.pretrained = None
+        elif cfg.model.neck.get('rfp_backbone'):
+            if cfg.model.neck.rfp_backbone.get('pretrained'):
+                cfg.model.neck.rfp_backbone.pretrained = None
+
+    if args.gpu_ids is not None:
+        cfg.gpu_ids = args.gpu_ids[0:1]
+        warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
+                      'Because we only support single GPU mode in '
+                      'non-distributed testing. Use the first GPU '
+                      'in `gpu_ids` now.')
+    else:
+        cfg.gpu_ids = [args.gpu_id]
+    cfg.device = get_device()
+    # init distributed env first, since logger depends on the dist info.
+    if args.launcher == 'none':
+        distributed = False
+    else:
+        distributed = True
+        init_dist(args.launcher, **cfg.dist_params)
+
+    test_dataloader_default_args = dict(
+        samples_per_gpu=1, workers_per_gpu=2, dist=distributed, shuffle=False)
+
+    # in case the test dataset is concatenated
+    if isinstance(cfg.data.test, dict):
+        cfg.data.test.test_mode = True
+        if cfg.data.test_dataloader.get('samples_per_gpu', 1) > 1:
+            # Replace 'ImageToTensor' with 'DefaultFormatBundle'
+            cfg.data.test.pipeline = replace_ImageToTensor(
+                cfg.data.test.pipeline)
+    elif isinstance(cfg.data.test, list):
+        for ds_cfg in cfg.data.test:
+            ds_cfg.test_mode = True
+        if cfg.data.test_dataloader.get('samples_per_gpu', 1) > 1:
+            for ds_cfg in cfg.data.test:
+                ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)
+
+    test_loader_cfg = {
+        **test_dataloader_default_args,
+        **cfg.data.get('test_dataloader', {})
+    }
+
+    rank, _ = get_dist_info()
+    # allow the work_dir to be skipped when it is not needed
+    if args.work_dir is not None and rank == 0:
+        mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
+        timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
+        json_file = osp.join(args.work_dir, f'eval_{timestamp}.json')
+
+    # build the dataloader
+    dataset = build_dataset(cfg.data.test)
+    data_loader = build_dataloader(dataset, **test_loader_cfg)
+
+    # build the model and load checkpoint
+    cfg.model.train_cfg = None
+    model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
+    fp16_cfg = cfg.get('fp16', None)
+    if fp16_cfg is not None:
+        wrap_fp16_model(model)
+    # checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
+    checkpoint = {}
+    if args.fuse_conv_bn:
+        model = fuse_conv_bn(model)
+    # old versions did not save class info in checkpoints; this workaround is
+    # for backward compatibility
+    if 'CLASSES' in checkpoint.get('meta', {}):
+        model.CLASSES = checkpoint['meta']['CLASSES']
+    else:
+        model.CLASSES = dataset.CLASSES
+
+    if not distributed:
+        model = build_dp(model, cfg.device, device_ids=cfg.gpu_ids)
+        outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
+                                  args.show_score_thr)
+    else:
+        model = build_ddp(
+            model,
+            cfg.device,
+            device_ids=[int(os.environ['LOCAL_RANK'])],
+            broadcast_buffers=False)
+
+        # In multi_gpu_test, if tmpdir is None, some tensors
+        # will init on cuda by default, and no device choice is supported.
+        # Init a tmpdir to avoid errors on npu here.
+        if cfg.device == 'npu' and args.tmpdir is None:
+            args.tmpdir = './npu_tmpdir'
+
+        outputs = multi_gpu_test(
+            model, data_loader, args.tmpdir, args.gpu_collect
+            or cfg.evaluation.get('gpu_collect', False))
+
+    rank, _ = get_dist_info()
+    if rank == 0:
+        if args.out:
+            print(f'\nwriting results to {args.out}')
+            mmcv.dump(outputs, args.out)
+        kwargs = {} if args.eval_options is None else args.eval_options
+        if args.format_only:
+            dataset.format_results(outputs, **kwargs)
+        if args.eval:
+            eval_kwargs = cfg.get('evaluation', {}).copy()
+            # hard-coded way to remove EvalHook args
+            for key in [
+                    'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
+                    'rule', 'dynamic_intervals'
+            ]:
+                eval_kwargs.pop(key, None)
+            eval_kwargs.update(dict(metric=args.eval, **kwargs))
+            metric = dataset.evaluate(outputs, **eval_kwargs)
+            print(metric)
+            metric_dict = dict(config=args.config, metric=metric)
+            if args.work_dir is not None and rank == 0:
+                mmcv.dump(metric_dict, json_file)
+
+
+if __name__ == '__main__':
+    main()
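
A note on the plugin block near the top of `main()`: it imports the project package (here, the `projects/instance_segment_anything` directory touched by this commit) so that its `__init__.py` registers the custom models with mmdet's registries before `build_detector` runs. A minimal sketch of the path-to-module conversion it performs, assuming a config that sets `plugin = True` and the `plugin_dir` shown:

import importlib
import os

plugin_dir = 'projects/instance_segment_anything/'  # i.e. cfg.plugin_dir

# 'projects/instance_segment_anything/' -> 'projects.instance_segment_anything'
module_path = '.'.join(os.path.dirname(plugin_dir).split('/'))
plg_lib = importlib.import_module(module_path)  # triggers the registrations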