---
title: MobileCLIP Image Classifier
emoji: πŸ“Έ
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# πŸ“Έ MobileCLIP-B Image Classifier
Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.
## 🎯 Key Features
### Core Capabilities
- **πŸ–ΌοΈ Zero-Shot Classification**: Upload any image for instant classification without model retraining
- **🏷️ Dynamic Label Management**: Add, remove, and update classification labels on-the-fly
- **πŸ“Š Interactive Results**: Visual confidence scores with sortable data tables
- **⚑ Optimized Performance**: Sub-30ms inference on GPU with re-parameterized MobileOne blocks
- **πŸ”’ Secure Admin Panel**: Token-protected label management interface
- **☁️ Hub Persistence**: Optional versioned label storage on Hugging Face Hub
### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests
## πŸ—οΈ Architecture
### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata
### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization
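The embedding cache above is a standard CLIP pattern: encode every label prompt once, normalize, and reuse the resulting matrix for every image. A minimal sketch using `open_clip`; the model and checkpoint names are assumptions rather than values read from `handler.py`:

```python
import torch
import open_clip

# Model/checkpoint identifiers are assumptions; the Space may load its own weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "MobileCLIP-B", pretrained="datacompdr_lt"
)
tokenizer = open_clip.get_tokenizer("MobileCLIP-B")
model.eval()

prompts = ["a photo of a cat", "a photo of a bicycle"]

with torch.no_grad():
    text_embeddings = model.encode_text(tokenizer(prompts))
    # Normalize once so classification later reduces to a single matrix multiply.
    text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
```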
## πŸš€ Quick Start
### Environment Variables
Configure in your Space Settings β†’ Variables and secrets:
| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
| `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |
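For reference, a minimal sketch of how these variables might be read at startup; the read-token fallback follows the table above, everything else is illustrative:

```python
import os

ADMIN_TOKEN = os.environ.get("ADMIN_TOKEN")        # required for admin operations
HF_LABEL_REPO = os.environ.get("HF_LABEL_REPO")    # e.g. "user/labels"; optional
HF_WRITE_TOKEN = os.environ.get("HF_WRITE_TOKEN")  # optional, enables snapshot uploads
# Read token defaults to the write token, as noted in the table.
HF_READ_TOKEN = os.environ.get("HF_READ_TOKEN") or HF_WRITE_TOKEN
```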
### Usage Examples
#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Adjust top-k results (default: 10)
4. View ranked predictions with confidence scores
#### API Usage
**Standard Classification:**
```python
import requests
response = requests.post(
    "YOUR_SPACE_URL/api/classify_image",
    files={"image": open("photo.jpg", "rb")},
    data={"top_k": 5},
)
results = response.json()
```
**Base64 Input:**
```python
import base64
import requests
with open("photo.jpg", "rb") as f:
img_base64 = base64.b64encode(f.read()).decode()
response = requests.post(
"YOUR_SPACE_URL/api/classify_base64",
json={
"image": img_base64,
"top_k": 10
}
)
results = response.json()
```
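Raw REST routes can vary across Gradio releases; the `gradio_client` package resolves them automatically. A hedged alternative, where the `api_name` and argument order are assumptions to verify on the Space's "Use via API" page:

```python
from gradio_client import Client, handle_file

client = Client("YOUR_SPACE_URL")  # also accepts "username/space-name"

# api_name and argument order are assumptions; check the Space's "Use via API" page.
results = client.predict(
    handle_file("photo.jpg"),
    5,  # top_k
    api_name="/classify_image",
)
print(results)
```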
## πŸ”§ Admin Operations
### Label Management
Authenticated admins can perform the following operations:
#### Add Labels
```json
{
"op": "upsert_labels",
"token": "YOUR_ADMIN_TOKEN",
"items": [
{"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
{"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
]
}
```
#### Reload Specific Version
```json
{
"op": "reload_labels",
"token": "YOUR_ADMIN_TOKEN",
"version": 5
}
```
#### Remove Labels
```json
{
"op": "remove_labels",
"token": "YOUR_ADMIN_TOKEN",
"ids": [100, 101]
}
```
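Any of the payloads above can also be sent programmatically. A minimal sketch; the `/api/admin` route is a placeholder, so confirm the actual endpoint exposed by `app.py`:

```python
import os
import requests

payload = {
    "op": "upsert_labels",
    "token": os.environ["ADMIN_TOKEN"],
    "items": [
        {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    ],
}

# "/api/admin" is a placeholder route, not confirmed from app.py.
response = requests.post("YOUR_SPACE_URL/api/admin", json=payload)
print(response.json())
```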
### Label Deduplication
- Label names are deduplicated case-insensitively, so "cat", "Cat", and "CAT" are treated as the same entry
- Label IDs are deduplicated as well, keeping the catalog consistent across updates
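In effect, each item is keyed by its ID and by its lowercased name. A minimal sketch of that rule; the exact tie-breaking in `handler.py` (which duplicate wins) may differ:

```python
def dedupe_labels(items):
    """Keep the last occurrence per ID and per case-insensitive name."""
    by_id, by_name = {}, {}
    for item in items:
        by_id[item["id"]] = item
        by_name[item["name"].lower()] = item
    # An item survives only if it is still the canonical entry for both keys.
    return [
        item for item in by_id.values()
        if by_name[item["name"].lower()]["id"] == item["id"]
    ]
```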
## πŸ“¦ Hub Integration
When configured with `HF_LABEL_REPO` and tokens, the system automatically:
1. **Saves Snapshots**: Each label update creates versioned snapshots
- `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
- `snapshots/v{N}/meta.json`: Label metadata and model info
- `snapshots/latest.json`: Points to current version
2. **Loads on Startup**: Fetches latest snapshot or specified version
3. **Fallback**: Uses the local `items.json` if the Hub is unavailable
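A minimal sketch of the startup load path, assuming the layout above; the `"version"` key in `latest.json` and the tensor layout inside the safetensors file are assumptions:

```python
import json
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO = "user/labels"  # value of HF_LABEL_REPO

# Resolve the current version, then pull that snapshot's files.
latest_path = hf_hub_download(REPO, "snapshots/latest.json", repo_type="dataset")
version = json.load(open(latest_path))["version"]  # key name is an assumption

emb_path = hf_hub_download(
    REPO, f"snapshots/v{version}/embeddings.safetensors", repo_type="dataset"
)
meta_path = hf_hub_download(REPO, f"snapshots/v{version}/meta.json", repo_type="dataset")

text_embeddings = load_file(emb_path)  # dict of tensors; key names defined by the Space
meta = json.load(open(meta_path))
```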
## 🎨 Default Label Catalog
The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings
Categories include animals, toys, food, vehicles, nature, and everyday objects.
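For illustration, a plausible entry shape inferred from the fields listed above; the actual key names in the bundled `items.json` may differ:

```json
{
  "id": 7,
  "name": "bicycle",
  "prompt": "a photo of a bicycle",
  "category": "vehicles",
  "fun_fact": "The word bicycle literally means two wheels.",
  "rarity": "common"
}
```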
## ⚑ Performance Optimization
- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring
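A minimal sketch of the device/dtype selection and batched scoring described above, in plain PyTorch; the names and the fixed logit scale are illustrative, not taken from `handler.py`:

```python
import torch

# Float16 on GPU, float32 on CPU, as described above.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

def score(image_embedding: torch.Tensor, text_embeddings: torch.Tensor, top_k: int = 10):
    """Rank all cached label embeddings against a single image embedding."""
    image_embedding = image_embedding.to(device, dtype)
    text_embeddings = text_embeddings.to(device, dtype)

    # Normalize so one matrix multiply gives cosine similarity to every label.
    image_embedding = image_embedding / image_embedding.norm(dim=-1, keepdim=True)
    text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)

    # 100.0 approximates CLIP's learned logit scale.
    probs = (100.0 * image_embedding @ text_embeddings.T).softmax(dim=-1)
    return probs.topk(min(top_k, probs.shape[-1]), dim=-1)
```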
## πŸ” Security Considerations
- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference
## πŸ“„ License
- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License
## 🀝 Contributing
Contributions welcome! Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support
## πŸ“š Resources
- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)