---
title: MedSigLIP Smart Filter
emoji: "🧠"
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
---
|
|
|
|
|
# MedSigLIP Smart Medical Classifier
|
|
|
|
|
v2 Update:

- Added CT, Ultrasound, and Musculoskeletal label banks
- Introduced Smart Modality Router v2 with hybrid detection (filename + color + MedMNIST)
- Enabled caching and batch inference to reduce CPU load by 70%
- Improved response time for large label sets
|
|
|
|
|
Zero-shot image classification for medical imagery powered by **google/medsiglip-448** with automatic label filtering by modality. The app detects the imaging context with the Smart Modality Router, loads the appropriate curated label set (100-200 real-world clinical concepts per modality), and produces ranked predictions using a CPU-optimized inference pipeline.
|
|
|
|
|
|
|
|
## Features

- Zero-shot predictions using the MedSigLIP vision-language model without fine-tuning.
- Smart Modality Router v2 blends filename heuristics, simple color statistics, and a lightweight fallback classifier to choose the best label bank.
- CT, Ultrasound, Musculoskeletal, chest X-ray, brain MRI, fundus, histopathology, skin, cardiovascular, and general label libraries curated from MedSigLIP prompts and clinical references.
- CPU-optimized inference with a single model load, float32 execution on CPU, torch threads capped via `psutil`, cached results (see the caching sketch below), and batched label scoring.
- Automatic image downscaling to 448×448 before scoring to keep memory usage predictable.
- Gradio interface ready for local execution or deployment to Hugging Face Spaces (verified on Gradio 4.44.1+; the API is disabled by default to avoid schema bugs).
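A minimal sketch of how result caching keyed on image content could look. `InferenceCache` and its method names are illustrative assumptions, not the actual API of `utils/cache_manager.py`, which may differ in naming and eviction details:

```python
# Illustrative LRU cache keyed on image bytes and modality; the shipped
# utils/cache_manager.py may differ in naming and eviction details.
import hashlib
from collections import OrderedDict


class InferenceCache:
    """Keep the five most recent prediction lists, evicting the oldest."""

    def __init__(self, maxsize: int = 5):
        self.maxsize = maxsize
        self._store: OrderedDict[str, list] = OrderedDict()

    @staticmethod
    def make_key(image_bytes: bytes, modality: str) -> str:
        return f"{modality}:{hashlib.sha256(image_bytes).hexdigest()}"

    def get(self, key: str):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, predictions: list) -> None:
        self._store[key] = predictions
        self._store.move_to_end(key)
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # drop the least recently used
```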
|
|
|
|
|
|
|
|
## Project Structure

```
medsiglip-smart-filter/
|-- app.py
|-- requirements.txt
|-- README.md
|-- labels/
|   |-- chest_labels.json
|   |-- brain_labels.json
|   |-- skin_labels.json
|   |-- pathology_labels.json
|   |-- cardio_labels.json
|   |-- eye_labels.json
|   |-- general_labels.json
|   |-- ct_labels.json
|   |-- ultrasound_labels.json
|   `-- musculoskeletal_labels.json
`-- utils/
    |-- modality_router.py
    `-- cache_manager.py
```
|
|
|
|
|
|
|
|
## Prerequisites

- Python 3.9 or newer (recommended).
- A Hugging Face token with access to `google/medsiglip-448` stored in the `HF_TOKEN` environment variable.
- Around 18 GB of RAM for comfortable CPU inference with large label sets.
|
|
|
|
|
|
|
|
## Local Quickstart

1. **Clone or copy** the project folder.
2. **Create and activate** a Python virtual environment (optional but recommended).
3. **Export your Hugging Face token** so the MedSigLIP model can be downloaded (an optional access check is sketched after this list):

   ```bash
   # Linux / macOS
   export HF_TOKEN="hf_your_token"

   # Windows PowerShell
   $Env:HF_TOKEN = "hf_your_token"
   ```

4. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

5. **Launch the Gradio app**:

   ```bash
   python app.py
   ```

6. Open the provided URL (default `http://127.0.0.1:7860`) and upload a medical image. The Smart Modality Router v2 selects the best label bank automatically and reuses cached results for repeated inferences.
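Optionally, you can confirm that the token grants access to the gated model before launching. This is a convenience check assumed here, not part of `app.py`:

```python
# Optional access check (not part of app.py): model_info raises an error
# if HF_TOKEN is missing or lacks access to the gated repository.
import os

from huggingface_hub import model_info

model_info("google/medsiglip-448", token=os.environ["HF_TOKEN"])
print("Token can access google/medsiglip-448")
```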
|
|
|
|
|
|
|
|
## Smart Modality Routing (v2.1 Update)

The router blends three complementary signals before selecting the modality (a condensed sketch follows this list):

- Filename hints such as `xray`, `ultrasound`, `ct`, `mri`, and related synonyms.
- Lightweight image statistics (a variance-based contrast proxy, saturation, and hue) computed on the fly.
- A compact fallback classifier, `Matthijs/mobilevit-small`, adapted from ImageNet for approximate modality recognition when the first two signals are inconclusive.
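The sketch below condenses the three-signal flow. All names (`route_modality`, `FILENAME_HINTS`, `fallback_classifier`) and thresholds are illustrative assumptions; the real logic lives in `utils/modality_router.py` and may differ:

```python
# Condensed routing sketch; names and thresholds are illustrative and the
# real implementation in utils/modality_router.py may differ.
from pathlib import Path

import numpy as np
from PIL import Image

FILENAME_HINTS = {
    "xray": "xray", "cxr": "xray", "ct": "ct", "mri": "mri",
    "ultrasound": "ultrasound", "us_": "ultrasound",
}


def fallback_classifier(path: str) -> str:
    """Placeholder for the MobileViT fallback (see Model Reference Update)."""
    return "general"


def route_modality(path: str) -> str:
    stem = Path(path).stem.lower()
    for hint, modality in FILENAME_HINTS.items():  # signal 1: filename
        if hint in stem:
            return modality
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    saturation = (rgb.max(axis=-1) - rgb.min(axis=-1)).mean()
    contrast = rgb.mean(axis=-1).var()             # variance-based contrast proxy
    if saturation < 5.0 and contrast > 1000.0:     # signal 2: gray, high contrast
        return "xray"                              # crude grayscale default
    return fallback_classifier(path)               # signal 3: MobileViT fallback
```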
|
|
|
|
|
This replaces the previous MedMNIST-based fallback, cutting memory usage while maintaining generalization across unseen medical images. The resulting modality key is mapped to the appropriate label file:
|
|
|
|
|
| Detected modality | Label file |
| --- | --- |
| `xray` | `labels/chest_labels.json` |
| `mri` | `labels/brain_labels.json` |
| `ct` | `labels/ct_labels.json` |
| `ultrasound` | `labels/ultrasound_labels.json` |
| `musculoskeletal` | `labels/musculoskeletal_labels.json` |
| `pathology` | `labels/pathology_labels.json` |
| `skin` | `labels/skin_labels.json` |
| `eye` | `labels/eye_labels.json` |
| `cardio` | `labels/cardio_labels.json` |
| *(fallback)* | `labels/general_labels.json` |
|
|
|
|
|
Each label file contains 100-200 modality-specific diagnostic phrases reflecting real-world terminology from MedSigLIP prompts and reputable references (Radiopaedia, ophthalmology and dermatology atlases, musculoskeletal imaging guides, etc.).
|
|
|
|
|
|
|
|
## Performance Considerations

- Loads the MedSigLIP processor and model once at startup, keeps the model in `eval()` mode, and limits PyTorch threading with `torch.set_num_threads(min(psutil.cpu_count(logical=False), 4))`.
- Leverages the `cached_inference` utility (an LRU cache of five items) to reuse results for repeated requests without re-running the full forward pass.
- Downscales incoming images to 448×448 before preprocessing and splits label scoring into batches of 50, applying softmax over the concatenated logits before returning the top five predictions.
- Executes the transformer in float32 for deterministic CPU inference while still supporting GPU acceleration when available.
- Avoids `transformers.pipeline()` to retain full control over preprocessing, batching, and device placement. The sketch below ties these pieces together.
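A minimal sketch of the batched scoring path described above, assuming the standard `transformers` Auto classes; `score_labels` and `BATCH_SIZE` are illustrative names, not the actual `app.py` wiring:

```python
# Minimal sketch of batched zero-shot scoring; the model id, 448x448 target,
# batch size of 50, and thread cap come from this README. Helper names are
# illustrative, not the app's actual API.
import os

import psutil
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

MODEL_ID = "google/medsiglip-448"
BATCH_SIZE = 50  # label batch size used by the app

torch.set_num_threads(min(psutil.cpu_count(logical=False), 4))

processor = AutoProcessor.from_pretrained(MODEL_ID, token=os.getenv("HF_TOKEN"))
model = AutoModel.from_pretrained(MODEL_ID, token=os.getenv("HF_TOKEN"))
model.eval()  # single load at startup, inference mode only


def score_labels(image: Image.Image, labels: list[str], top_k: int = 5):
    image = image.convert("RGB").resize((448, 448))  # keep memory predictable
    logits = []
    with torch.no_grad():
        for start in range(0, len(labels), BATCH_SIZE):
            batch = labels[start:start + BATCH_SIZE]
            inputs = processor(text=batch, images=image,
                               padding="max_length", return_tensors="pt")
            out = model(**inputs)
            logits.append(out.logits_per_image[0])  # shape: (len(batch),)
    probs = torch.cat(logits).softmax(dim=0)  # softmax over all labels at once
    top = probs.topk(min(top_k, len(labels)))
    return [(labels[i], float(p)) for p, i in zip(top.values, top.indices)]
```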
|
|
|
|
|
|
|
|
## Deploy to Hugging Face Spaces

1. Create a new Space (Gradio template) named `medsiglip-smart-filter`.
2. Push the project files to the Space repository (via `git` or the web UI).
3. In **Settings -> Repository Secrets**, add `HF_TOKEN` with your Hugging Face access token so the model and auxiliary router weights can be downloaded during build.
4. The default `python app.py` launch honors `SERVER_PORT`, `SERVER_NAME`, `GRADIO_SHARE`, and `GRADIO_QUEUE` if the Space runner sets them (see the sketch after this list).
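A hypothetical tail of `app.py` showing how the launch could honor these variables; `demo` here is a stand-in interface, and the `GRADIO_QUEUE` handling is an assumption about how that flag is interpreted:

```python
# Hypothetical launch snippet; `demo` is a stand-in interface and the
# GRADIO_QUEUE interpretation is an assumption, not the app's exact code.
import os

import gradio as gr

demo = gr.Interface(fn=lambda image: "placeholder",
                    inputs=gr.Image(), outputs="text")

if os.getenv("GRADIO_QUEUE", "true").lower() == "true":
    demo.queue()  # enable request queuing when the runner asks for it

demo.launch(
    server_name=os.getenv("SERVER_NAME", "0.0.0.0"),
    server_port=int(os.getenv("SERVER_PORT", "7860")),
    share=os.getenv("GRADIO_SHARE", "false").lower() == "true",
    show_api=False,  # API disabled by default to avoid schema bugs
)
```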
|
|
|
|
|
|
|
|
## Model Reference Update

- Removed: `poloclub/medmnist-v2` (model no longer available on Hugging Face).
- Added: `Matthijs/mobilevit-small`, a ~20 MB transformer that fits comfortably under 100 MB of VRAM.
- Purpose: acts as a lightweight fallback that assists the filename and color heuristics without impacting CPU throughput.
- Invocation: only runs when the router cannot confidently decide based on metadata and statistics alone (a hedged call sketch follows this list).
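A hedged sketch of invoking the fallback with the standard `transformers` Auto classes; `coarse_hint` and the wrapper shape are assumptions, and the returned ImageNet label still has to be mapped to a modality by the router's heuristics:

```python
# Hedged sketch of the fallback invocation; wrapper names are illustrative.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

FALLBACK_ID = "Matthijs/mobilevit-small"
fallback_proc = AutoImageProcessor.from_pretrained(FALLBACK_ID)
fallback_net = AutoModelForImageClassification.from_pretrained(FALLBACK_ID)
fallback_net.eval()


def coarse_hint(image: Image.Image) -> str:
    """Return the top ImageNet label as a coarse cue for modality mapping."""
    inputs = fallback_proc(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = fallback_net(**inputs).logits
    return fallback_net.config.id2label[int(logits.argmax(-1))]
```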
|
|
|
|
|
|
|
|
## Notes

- The label libraries are stored as UTF-8 JSON arrays for straightforward editing and community contributions (see the example after this list).
- When adding new modalities, drop a new `<modality>_labels.json` file into `labels/` and extend the router alias logic in `app.py` if the modality name and file name differ.
- `scikit-image` and `timm` are included in `requirements.txt` for future expansion (image preprocessing, alternative backbones) while keeping the current runtime CPU-friendly.
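As a concrete illustration of the format, a new bank is just a JSON array of diagnostic phrases. The file name and entries below are invented for the example, not copied from the shipped banks:

```python
# Invented example entries; real banks hold 100-200 curated phrases each.
import json

spine_labels = [
    "normal lumbar spine MRI",
    "lumbar disc herniation",
    "degenerative spinal stenosis",
]
with open("labels/spine_labels.json", "w", encoding="utf-8") as f:
    json.dump(spine_labels, f, ensure_ascii=False, indent=2)
```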
|
|
|