---
title: MedSigLIP Smart Filter
emoji: "🧠"
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
---

# MedSigLIP Smart Medical Classifier

v2 Update:
- Added CT, Ultrasound, and Musculoskeletal label banks
- Introduced Smart Modality Router v2 with hybrid detection (filename + color + MedMNIST)
- Enabled caching and batch inference to reduce CPU load by 70%
- Improved response time for large label sets

Zero-shot image classification for medical imagery powered by **google/medsiglip-448** with automatic label filtering by modality. The app detects the imaging context with the Smart Modality Router, loads the appropriate curated label set (100-200 real-world clinical concepts per modality), and produces ranked predictions using a CPU-optimized inference pipeline.
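
Under the hood, the scoring step is standard SigLIP-style zero-shot classification. Below is a minimal sketch assuming the usual `transformers` interface; the label prompts and image path are placeholders, and the real app layers routing, caching, and batching on top:

```python
# Minimal zero-shot sketch; label prompts and image path are placeholders.
# Requires HF_TOKEN to be set so the gated checkpoint can be downloaded.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/medsiglip-448")
processor = AutoProcessor.from_pretrained("google/medsiglip-448")
model.eval()

labels = ["a chest x-ray showing pneumonia", "a normal chest x-ray"]
image = Image.open("example.png").convert("RGB")

inputs = processor(text=labels, images=image,
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image has shape (num_images, num_labels); softmax ranks labels.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in sorted(zip(labels, probs.tolist()), key=lambda t: -t[1]):
    print(f"{label}: {p:.3f}")
```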


## Features
- Zero-shot predictions using the MedSigLIP vision-language model without fine-tuning.
- Smart Modality Router v2 blends filename heuristics, simple color statistics, and a lightweight fallback classifier to choose the best label bank.
- CT, ultrasound, musculoskeletal, chest X-ray, brain MRI, fundus, histopathology, skin, cardiovascular, and general label libraries curated from MedSigLIP prompts and clinical references.
- CPU-optimized inference with single model load, float32 execution on CPU, capped torch threads via `psutil`, cached results, and batched label scoring.
- Automatic image downscaling to 448×448 before scoring to keep memory usage predictable.
- Gradio interface ready for local execution or deployment to Hugging Face Spaces (verified on Gradio 4.44.1+, API disabled by default to avoid schema bugs).


## Project Structure
```
medsiglip-smart-filter/
|-- app.py
|-- requirements.txt
|-- README.md
|-- labels/
|   |-- chest_labels.json
|   |-- brain_labels.json
|   |-- skin_labels.json
|   |-- pathology_labels.json
|   |-- cardio_labels.json
|   |-- eye_labels.json
|   |-- general_labels.json
|   |-- ct_labels.json
|   |-- ultrasound_labels.json
|   `-- musculoskeletal_labels.json
`-- utils/
    |-- modality_router.py
    `-- cache_manager.py
```


## Prerequisites
- Python 3.9 or newer is recommended.
- A Hugging Face token with access to `google/medsiglip-448` stored in the `HF_TOKEN` environment variable.
- Around 18 GB of RAM for comfortable CPU inference with large label sets.


## Local Quickstart
1. **Clone or copy** the project folder.
2. **Create and activate** a Python virtual environment (optional but recommended).
3. **Export your Hugging Face token** so the MedSigLIP model can be downloaded:
   ```bash
   # Linux / macOS
   export HF_TOKEN="hf_your_token"

   # Windows PowerShell
   $Env:HF_TOKEN = "hf_your_token"
   ```
4. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
5. **Launch the Gradio app**:
   ```bash
   python app.py
   ```
6. Open the provided URL (default `http://127.0.0.1:7860`) and upload a medical image. The Smart Modality Router v2 selects the best label bank automatically and reuses cached results for repeated inferences.


## Smart Modality Routing (v2.1 Update)
The router blends three complementary signals before selecting the modality:
- Filename hints such as `xray`, `ultrasound`, `ct`, `mri`, and related synonyms.
- Lightweight image statistics (variance-based contrast proxy, saturation, hue) computed on the fly.
- A compact fallback classifier, `Matthijs/mobilevit-small`, an ImageNet-pretrained model repurposed for approximate modality recognition when the first two signals are inconclusive.

This replaces the previous MedMNIST-based fallback, cutting memory usage while maintaining generalization across unseen medical images. The resulting modality key is mapped to the appropriate label file:

| Detected modality | Label file |
| --- | --- |
| `xray` | `labels/chest_labels.json` |
| `mri` | `labels/brain_labels.json` |
| `ct` | `labels/ct_labels.json` |
| `ultrasound` | `labels/ultrasound_labels.json` |
| `musculoskeletal` | `labels/musculoskeletal_labels.json` |
| `pathology` | `labels/pathology_labels.json` |
| `skin` | `labels/skin_labels.json` |
| `eye` | `labels/eye_labels.json` |
| `cardio` | `labels/cardio_labels.json` |
| *(fallback)* | `labels/general_labels.json` |

Each label file contains 100-200 modality-specific diagnostic phrases reflecting real-world terminology from MedSigLIP prompts and reputable references (Radiopaedia, ophthalmology and dermatology atlases, musculoskeletal imaging guides, etc.).
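
As a rough illustration, the routing decision could be structured like the sketch below. The hint table, saturation threshold, and `fallback_classifier` wiring are illustrative assumptions, not the actual contents of `utils/modality_router.py`:

```python
# Illustrative router sketch; the hints, threshold, and helper wiring are
# assumptions, not the actual utils/modality_router.py API.
import json
import re
from pathlib import Path

FILENAME_HINTS = {"xray": "xray", "cxr": "xray", "ct": "ct",
                  "mri": "mri", "us": "ultrasound", "ultrasound": "ultrasound"}

LABEL_FILES = {
    "xray": "labels/chest_labels.json",
    "mri": "labels/brain_labels.json",
    "ct": "labels/ct_labels.json",
    "ultrasound": "labels/ultrasound_labels.json",
    # ...remaining rows of the table above.
}

def route_modality(filename, image, fallback_classifier):
    # Signal 1: filename hints, matched against whole tokens.
    tokens = set(re.split(r"[^a-z0-9]+", Path(filename).stem.lower()))
    for hint, modality in FILENAME_HINTS.items():
        if hint in tokens:
            return modality
    # Signal 2: cheap color statistics. Near-zero mean saturation suggests
    # a radiological image; saturated images lean skin/fundus/pathology.
    hsv = image.convert("HSV")
    mean_sat = sum(hsv.getdata(band=1)) / (image.width * image.height)
    if mean_sat < 10:  # illustrative threshold
        return "xray"
    # Signal 3: compact fallback classifier when still undecided.
    return fallback_classifier(image)

def load_labels(modality):
    path = Path(LABEL_FILES.get(modality, "labels/general_labels.json"))
    return json.loads(path.read_text(encoding="utf-8"))
```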


## Performance Considerations
- Loads the MedSigLIP processor and model once at startup, keeps the model in `eval()` mode, and limits PyTorch threading with `torch.set_num_threads(min(psutil.cpu_count(logical=False), 4))`.
- Leverages the `cached_inference` utility (LRU cache of five items) to reuse results for repeated requests without re-running the full forward pass.
- Downscales incoming images to 448×448 prior to tokenization and splits label scoring into batches of 50, applying softmax over the concatenated logits before returning the top five predictions (see the sketch after this list).
- Executes the transformer in float32 for deterministic CPU inference while still supporting GPU acceleration when available.
- Avoids `transformers.pipeline()` to retain full control over preprocessing, batching, and device placement.
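
Put together, the batching and caching described above might look roughly like this; `score_labels` and the digest-keyed cache are a sketch of the idea, not the app's exact `cached_inference` code:

```python
# Sketch of the CPU pipeline; score_labels and the cache wiring are
# illustrative, not the app's exact cached_inference utility.
import hashlib
from collections import OrderedDict

import psutil
import torch

# Cap PyTorch at the physical core count, at most four threads.
torch.set_num_threads(min(psutil.cpu_count(logical=False), 4))

BATCH_SIZE = 50
_CACHE: "OrderedDict[str, list]" = OrderedDict()  # LRU of five results

def score_labels(model, processor, image, labels):
    image = image.resize((448, 448))  # keep memory usage predictable
    logits = []
    for start in range(0, len(labels), BATCH_SIZE):
        batch = labels[start:start + BATCH_SIZE]
        inputs = processor(text=batch, images=image,
                           padding="max_length", return_tensors="pt")
        with torch.no_grad():
            logits.append(model(**inputs).logits_per_image[0])
    # Softmax over the concatenated logits, then keep the top five.
    probs = torch.cat(logits).softmax(dim=-1)
    top = probs.topk(min(5, len(labels)))
    return [(labels[i], p) for i, p in
            zip(top.indices.tolist(), top.values.tolist())]

def cached_inference(model, processor, image, labels):
    key = hashlib.sha256(image.tobytes()).hexdigest()
    if key in _CACHE:
        _CACHE.move_to_end(key)         # mark as most recently used
    else:
        if len(_CACHE) >= 5:
            _CACHE.popitem(last=False)  # evict least recently used
        _CACHE[key] = score_labels(model, processor, image, labels)
    return _CACHE[key]
```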


## Deploy to Hugging Face Spaces
1. Create a new Space (Gradio template) named `medsiglip-smart-filter`.
2. Push the project files to the Space repository (via `git` or the web UI).
3. In **Settings -> Repository Secrets**, add `HF_TOKEN` with your Hugging Face access token so the model and auxiliary router weights can be downloaded during build.
4. The default `python app.py` launch honors `SERVER_PORT`, `SERVER_NAME`, `GRADIO_SHARE`, and `GRADIO_QUEUE` when the Space runner sets them (see the sketch below).
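
For reference, the environment handling in `app.py` could look like this sketch; the defaults and the `demo` variable are illustrative, not the file's verbatim contents:

```python
# Sketch of an env-aware launch; defaults are illustrative.
import os

# `demo` is the Gradio Blocks/Interface built earlier in app.py.
if os.getenv("GRADIO_QUEUE", "true").lower() == "true":
    demo = demo.queue()

demo.launch(
    server_name=os.getenv("SERVER_NAME", "0.0.0.0"),
    server_port=int(os.getenv("SERVER_PORT", "7860")),
    share=os.getenv("GRADIO_SHARE", "false").lower() == "true",
    show_api=False,  # API disabled to sidestep the schema bugs noted above
)
```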


## Model Reference Update
- Removed: `poloclub/medmnist-v2` (model no longer available on Hugging Face).
- Added: `Matthijs/mobilevit-small`, a ~20 MB transformer that fits comfortably under 100 MB of VRAM.
- Purpose: Acts as a lightweight fallback that assists the filename and color heuristics without impacting CPU throughput.
- Invocation: Only runs when the router cannot confidently decide based on metadata and statistics alone.
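
A minimal sketch of the fallback call, assuming the standard `transformers` image-classification classes; translating the coarse ImageNet-style prediction into a modality key is left to the router:

```python
# Fallback classifier sketch; the modality mapping lives in the router.
import torch
from transformers import (MobileViTForImageClassification,
                          MobileViTImageProcessor)

processor = MobileViTImageProcessor.from_pretrained("Matthijs/mobilevit-small")
model = MobileViTForImageClassification.from_pretrained("Matthijs/mobilevit-small")
model.eval()

def coarse_prediction(image):
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Coarse ImageNet-style class; the router maps it to a modality key.
    return model.config.id2label[logits.argmax(-1).item()]
```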


## Notes
- The label libraries are stored as UTF-8 JSON arrays for straightforward editing and community contributions (an illustrative example appears at the end of this section).
- When adding new modalities, drop a new `<modality>_labels.json` file into `labels/` and extend the router alias logic in `app.py` if the modality name and file name differ.
- `scikit-image` and `timm` are included in `requirements.txt` for future expansion (image preprocessing, alternative backbones) while keeping the current runtime CPU-friendly.
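
As a concrete example, a hypothetical `labels/spine_labels.json` would simply contain a flat array of prompt phrases (the entries below are illustrative):

```json
[
  "lumbar spine x-ray showing degenerative disc disease",
  "normal lumbar spine radiograph",
  "compression fracture of a vertebral body"
]
```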