---
title: MobileCLIP Image Classifier
emoji: 📸
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# 📸 MobileCLIP-B Image Classifier

Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.

## 🎯 Key Features

### Core Capabilities
- **🖼️ Zero-Shot Classification**: Upload any image for instant classification without model retraining
- **🏷️ Dynamic Label Management**: Add, remove, and update classification labels on-the-fly
- **📊 Interactive Results**: Visual confidence scores with sortable data tables
- **⚡ Optimized Performance**: Sub-30ms inference on GPU with re-parameterized MobileOne blocks
- **🔒 Secure Admin Panel**: Token-protected label management interface
- **☁️ Hub Persistence**: Optional versioned label storage on Hugging Face Hub

### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests

## 🏗️ Architecture

### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata

### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization

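The interplay between the embedding cache and zero-shot scoring can be sketched as follows. This is a minimal, dependency-free illustration rather than the code in `handler.py`: the function names, the toy 3-dimensional vectors, and the logit scale of 100 are assumptions chosen for the example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify(image_embedding, label_embeddings, top_k=3, logit_scale=100.0):
    """Rank labels by softmax over scaled cosine similarity.

    label_embeddings maps label name -> cached, already-normalized text embedding.
    """
    image_embedding = normalize(image_embedding)
    logits = {
        name: logit_scale * sum(a * b for a, b in zip(image_embedding, emb))
        for name, emb in label_embeddings.items()
    }
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {name: math.exp(v - max_logit) for name, v in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:top_k]]


# Hypothetical cache: computed once per label set, reused for every image.
text_cache = {
    "cat": normalize([0.9, 0.1, 0.0]),
    "dog": normalize([0.1, 0.9, 0.0]),
    "car": normalize([0.0, 0.1, 0.9]),
}

predictions = classify([0.8, 0.2, 0.1], text_cache, top_k=2)
```

Because the text side is cached, only the image embedding has to be computed per request; the text embeddings need recomputing only when the label set changes.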
## 🚀 Quick Start

### Environment Variables

Configure in your Space Settings → Variables and secrets:

| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
| `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |

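One way the variables in the table might be read at startup is sketched below. The `SpaceConfig` class, the `load_config` function, and the `hub_enabled` rule are hypothetical names and assumptions for illustration; only the variable names and the read-token fallback come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

    @property
    def hub_enabled(self) -> bool:
        # Assumption: Hub persistence needs a repo plus a token that can read it.
        return bool(self.label_repo and self.read_token)


def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    """Read the variables from the table; empty strings count as unset."""

    def get(name):
        return env.get(name) or None

    write_token = get("HF_WRITE_TOKEN")
    return SpaceConfig(
        admin_token=get("ADMIN_TOKEN"),
        label_repo=get("HF_LABEL_REPO"),
        write_token=write_token,
        # HF_READ_TOKEN falls back to the write token when not set.
        read_token=get("HF_READ_TOKEN") or write_token,
    )


config = load_config({"HF_LABEL_REPO": "user/labels", "HF_WRITE_TOKEN": "w-token"})
```

Passing the environment as a mapping keeps the loader easy to exercise without touching real secrets.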
### Usage Examples

#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Adjust top-k results (default: 10)
4. View ranked predictions with confidence scores

#### API Usage

**Standard Classification:**
```python
import requests

# Replace YOUR_SPACE_URL with the base URL of your Space.
with open("photo.jpg", "rb") as image_file:
    response = requests.post(
        "YOUR_SPACE_URL/api/classify_image",
        files={"image": image_file},
        data={"top_k": 5},
    )
response.raise_for_status()  # Fail loudly on HTTP errors
results = response.json()
```

**Base64 Input:**
```python
import base64
import requests

# Read the image and encode it as a base64 string.
with open("photo.jpg", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "YOUR_SPACE_URL/api/classify_base64",
    json={
        "image": img_base64,
        "top_k": 10,
    },
)
response.raise_for_status()  # Fail loudly on HTTP errors
results = response.json()
```

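For backend integrations it can help to see the other half of the base64 round trip. The sketch below shows how a receiving side might turn the `image` field back into raw bytes; `decode_base64_image` and its handling of a data-URL prefix are assumptions for illustration, not the Space's confirmed behavior.

```python
import base64
import binascii


def decode_base64_image(payload: str) -> bytes:
    """Return raw image bytes from a base64 string, with or without a data-URL prefix."""
    # Strip an optional "data:image/jpeg;base64," style prefix.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is not valid base64") from exc


encoded = base64.b64encode(b"\xff\xd8\xff fake jpeg bytes").decode()
raw = decode_base64_image("data:image/jpeg;base64," + encoded)
```

Rejecting malformed input early gives the caller a clear error instead of a failure deep inside image decoding.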
## 🔧 Admin Operations

### Label Management

Authenticated admins can perform the following operations:

#### Add Labels
```json
{
  "op": "upsert_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "items": [
    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
  ]
}
```

#### Reload Specific Version
```json
{
  "op": "reload_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "version": 5
}
```

#### Remove Labels
```json
{
  "op": "remove_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "ids": [100, 101]
}
```

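The payloads above suggest a simple dispatch on `op` behind a token check. The following is a minimal sketch under stated assumptions: `handle_admin_request`, the in-memory `labels` dict, and the response shape are hypothetical, and `reload_labels` is left out because it depends on Hub access.

```python
import hmac


def handle_admin_request(payload, labels, admin_token):
    """Apply one admin operation to `labels`, a dict mapping id -> item."""
    supplied = str(payload.get("token", ""))
    # compare_digest avoids leaking token contents through timing differences.
    if not admin_token or not hmac.compare_digest(
        supplied.encode(), admin_token.encode()
    ):
        return {"ok": False, "error": "invalid token"}

    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload.get("items", []):
            labels[item["id"]] = item  # insert a new id or overwrite an existing one
    elif op == "remove_labels":
        for label_id in payload.get("ids", []):
            labels.pop(label_id, None)  # ignore ids that are not present
    else:
        return {"ok": False, "error": f"unsupported op: {op}"}
    return {"ok": True, "count": len(labels)}


labels = {}
result = handle_admin_request(
    {
        "op": "upsert_labels",
        "token": "secret",
        "items": [{"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"}],
    },
    labels,
    admin_token="secret",
)
```

Checking the token before inspecting `op` means an unauthenticated caller learns nothing about which operations exist.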
### Label Deduplication
- Automatic case-insensitive name deduplication
- Prevents duplicate entries (e.g., "cat", "Cat", and "CAT" are treated as the same label)
- ID-based deduplication for consistent label management

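The deduplication rules above can be sketched as a single pass over the incoming items. This is an illustration, not the project's implementation: `dedupe_labels` is a hypothetical name, and the "first occurrence wins" policy and whitespace trimming are assumptions.

```python
def dedupe_labels(items):
    """Keep the first item for each id and each case-insensitive name."""
    seen_ids = set()
    seen_names = set()
    unique = []
    for item in items:
        key = item["name"].strip().casefold()
        if item["id"] in seen_ids or key in seen_names:
            continue  # duplicate by id or by name, skip it
        seen_ids.add(item["id"])
        seen_names.add(key)
        unique.append(item)
    return unique


deduped = dedupe_labels([
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "Cat"},
    {"id": 3, "name": "CAT"},
    {"id": 1, "name": "kitten"},
    {"id": 4, "name": "dog"},
])
```

Here only the `cat` entry with id 1 and the `dog` entry survive: the two other spellings collide on name, and `kitten` collides on id.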
## 📦 Hub Integration

When configured with `HF_LABEL_REPO` and tokens, the system automatically:

1. **Saves Snapshots**: Each label update creates a versioned snapshot
   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
   - `snapshots/v{N}/meta.json`: Label metadata and model info
   - `snapshots/latest.json`: Points to the current version

2. **Loads on Startup**: Fetches the latest snapshot or a specified version
3. **Fallback**: Uses the local `items.json` if the Hub is unavailable

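The snapshot layout and startup fallback can be sketched as below. The helper names, the `fetch_text` callable, and the `{"version": N}` shape of `latest.json` are all assumptions made for illustration; only the file paths come from the list above.

```python
import json


def snapshot_paths(version):
    """Repo-relative paths for one snapshot version, following the layout above."""
    base = f"snapshots/v{version}"
    return {
        "embeddings": f"{base}/embeddings.safetensors",
        "meta": f"{base}/meta.json",
    }


def resolve_version(fetch_text, requested=None):
    """Pick the version to load, or None to fall back to the local items.json.

    fetch_text(path) is a hypothetical callable that returns a file's text
    from the Hub repo and raises OSError when the Hub is unreachable.
    """
    if requested is not None:
        return requested
    try:
        # Assumed pointer format: {"version": N}
        return json.loads(fetch_text("snapshots/latest.json"))["version"]
    except (OSError, KeyError, ValueError):
        return None


def fake_fetch(path):
    return '{"version": 7}'


def offline_fetch(path):
    raise OSError("Hub unavailable")


latest = resolve_version(fake_fetch)
fallback = resolve_version(offline_fetch)
```

Returning `None` instead of raising keeps the startup path simple: the caller loads `items.json` whenever no version could be resolved.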
## 🎨 Default Label Catalog

The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings

Categories include animals, toys, food, vehicles, nature, and everyday objects.

## ⚡ Performance Optimization

- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring

## 🔐 Security Considerations

- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference

## 📄 License

- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License

## 🤝 Contributing

Contributions welcome! Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support

## 📚 Resources

- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)