borso271

Enhance README with comprehensive documentation

c190603 3 months ago

	---
	title: MobileCLIP Image Classifier
	emoji: 📸
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# 📸 MobileCLIP-B Image Classifier

	Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.

	## 🎯 Key Features

	### Core Capabilities
	- 🖼️ Zero-Shot Classification: Upload any image for instant classification without model retraining
	- 🏷️ Dynamic Label Management: Add, remove, and update classification labels on the fly
	- 📊 Interactive Results: Visual confidence scores with sortable data tables
	- ⚡ Optimized Performance: Sub-30ms inference on GPU with re-parameterized MobileOne blocks
	- 🔒 Secure Admin Panel: Token-protected label management interface
	- ☁️ Hub Persistence: Optional versioned label storage on Hugging Face Hub

	### API Access
	- REST API: Fully accessible via Gradio's automatic API endpoints
	- Base64 Support: Direct base64 image input for backend integration
	- Batch Processing: Efficient handling of multiple classification requests

	## 🏗️ Architecture

	### Components
	- `app.py`: Main Gradio interface with public/admin tabs and API endpoints
	- `handler.py`: Core model management, inference logic, and label operations
	- `reparam.py`: MobileOne re-parameterization for optimized inference
	- `items.json`: Default label catalog with metadata

	### Model Details
	- Architecture: MobileCLIP-B with re-parameterized MobileOne image encoder
	- Text Encoder: Optimized CLIP text transformer
	- Embedding Cache: Pre-computed text embeddings for fast inference
	- Device Support: Automatic GPU/CPU detection with float16 optimization
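
The scoring step behind zero-shot classification can be sketched in a few lines. This is a minimal illustration, not the code in `handler.py`: the function names, the toy 3-dimensional embeddings, and the `logit_scale` value of 100 (typical for CLIP-style models) are assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def classify(image_emb, text_embs, labels, top_k=10, logit_scale=100.0):
    """Rank labels by softmax over scaled cosine similarity to the image embedding."""
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_embs
    ]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Toy embeddings for illustration; real ones come from the image and text encoders.
labels = ["cat", "dog", "car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]
top = classify(image_emb, text_embs, labels, top_k=2)
all_scores = classify(image_emb, text_embs, labels, top_k=3)
```

Because the text embeddings depend only on the label prompts, they can be computed once and reused for every image, which is what makes the embedding cache effective.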

	## 🚀 Quick Start

	### Environment Variables

	Configure these in your Space under Settings → Variables and secrets:

	| Variable | Description | Required |
	|----------|-------------|----------|
	| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
	| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
	| `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
	| `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |
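
As a rough sketch of how these variables might be read at startup, the snippet below loads them into a small settings object and applies the documented fallback of `HF_READ_TOKEN` to `HF_WRITE_TOKEN`. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the actual application may structure this differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables in the table above."""
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, treating empty strings as unset."""
    write_token = env.get("HF_WRITE_TOKEN") or None
    return Settings(
        admin_token=env.get("ADMIN_TOKEN") or None,
        label_repo=env.get("HF_LABEL_REPO") or None,
        write_token=write_token,
        # The read token falls back to the write token when not set explicitly.
        read_token=env.get("HF_READ_TOKEN") or write_token,
    )

# Passing a plain dict keeps the example independent of the real environment.
example = load_settings({"HF_LABEL_REPO": "user/labels", "HF_WRITE_TOKEN": "w-token"})
```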

	### Usage Examples

	#### Web Interface
	1. Navigate to the Space URL
	2. Upload an image in the Classification tab
	3. Adjust top-k results (default: 10)
	4. View ranked predictions with confidence scores

	#### API Usage

	Standard Classification:
	```python
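	import requests

	# Open the image in a context manager so the file handle is closed after upload.
	with open("photo.jpg", "rb") as f:
	    response = requests.post(
	        "YOUR_SPACE_URL/api/classify_image",
	        files={"image": f},
	        data={"top_k": 5},
	        timeout=30,
	    )
	response.raise_for_status()  # surface HTTP errors before parsing the body
	results = response.json()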
	```

	Base64 Input:
	```python
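	import base64
	import requests

	# Read the image and encode it as a base64 ASCII string.
	with open("photo.jpg", "rb") as f:
	    img_base64 = base64.b64encode(f.read()).decode("ascii")

	response = requests.post(
	    "YOUR_SPACE_URL/api/classify_base64",
	    json={
	        "image": img_base64,
	        "top_k": 10,
	    },
	    timeout=30,
	)
	response.raise_for_status()  # surface HTTP errors before parsing the body
	results = response.json()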
	```
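
On the receiving side, a base64 payload has to be decoded back into raw image bytes before it can be opened. The helper below is one plausible way to do that with the standard library. `decode_base64_image` is a hypothetical name, and tolerating a `data:` URI prefix is an assumption about what clients might send, not documented behavior of this Space.

```python
import base64
import binascii

def decode_base64_image(payload: str) -> bytes:
    """Decode a base64 string, tolerating an optional 'data:...;base64,' prefix."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is not valid base64") from exc

# Round trip on a few sample bytes (the start of a PNG header).
raw = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(raw).decode("ascii")
decoded_plain = decode_base64_image(encoded)
decoded_uri = decode_base64_image("data:image/png;base64," + encoded)
```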

	## 🔧 Admin Operations

	### Label Management

	Authenticated admins can perform the following operations:

	#### Add Labels
	```json
	{
	  "op": "upsert_labels",
	  "token": "YOUR_ADMIN_TOKEN",
	  "items": [
	    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
	    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
	  ]
	}
	```

	#### Reload Specific Version
	```json
	{
	  "op": "reload_labels",
	  "token": "YOUR_ADMIN_TOKEN",
	  "version": 5
	}
	```

	#### Remove Labels
	```json
	{
	  "op": "remove_labels",
	  "token": "YOUR_ADMIN_TOKEN",
	  "ids": [100, 101]
	}
	```
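
To make the payloads above concrete, here is a minimal sketch of how a handler could check the token and dispatch on `op`. It is an illustration under assumptions, not the logic in `handler.py`: the function names, the in-memory `{id: item}` store, and the response shape are all hypothetical. `reload_labels` is left out because it depends on Hub access.

```python
import hmac

def is_authorized(supplied: str, expected: str) -> bool:
    """Compare tokens in constant time; an unset token on either side is rejected."""
    if not expected or not supplied:
        return False
    return hmac.compare_digest(supplied.encode(), expected.encode())

def handle_admin(payload: dict, labels: dict, admin_token: str) -> dict:
    """Apply one admin operation to an in-memory {id: item} label store."""
    if not is_authorized(payload.get("token", ""), admin_token):
        return {"ok": False, "error": "unauthorized"}
    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload.get("items", []):
            labels[item["id"]] = item  # insert new IDs, overwrite existing ones
    elif op == "remove_labels":
        for label_id in payload.get("ids", []):
            labels.pop(label_id, None)  # ignore IDs that are not present
    else:
        return {"ok": False, "error": f"unsupported op: {op}"}
    return {"ok": True, "count": len(labels)}

store = {}
added = handle_admin(
    {"op": "upsert_labels", "token": "s3cret",
     "items": [{"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"}]},
    store, "s3cret",
)
denied = handle_admin({"op": "remove_labels", "token": "wrong", "ids": [100]}, store, "s3cret")
removed = handle_admin({"op": "remove_labels", "token": "s3cret", "ids": [100]}, store, "s3cret")
```

Using `hmac.compare_digest` instead of `==` avoids leaking how many leading characters of the token matched through response timing.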

	### Label Deduplication
	- Automatic case-insensitive name deduplication
	- Prevents duplicate entries (e.g., "cat", "Cat", and "CAT" are treated as the same label)
	- ID-based deduplication for consistent label management
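
The deduplication rules above could look roughly like the following. This sketch assumes that the first occurrence wins and that surrounding whitespace is ignored. Both are guesses, and the real implementation may resolve conflicts differently (for example, letting the latest upsert win).

```python
def dedupe_labels(items):
    """Drop items whose ID or case-folded name was already seen; first one wins."""
    seen_ids = set()
    seen_names = set()
    unique = []
    for item in items:
        name_key = item["name"].strip().casefold()
        if item["id"] in seen_ids or name_key in seen_names:
            continue
        seen_ids.add(item["id"])
        seen_names.add(name_key)
        unique.append(item)
    return unique

candidates = [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "Cat"},   # duplicate name, different case
    {"id": 1, "name": "dog"},   # duplicate ID
    {"id": 3, "name": "CAT "},  # duplicate name after trimming
    {"id": 4, "name": "dog"},
]
kept = dedupe_labels(candidates)
```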

	## 📦 Hub Integration

	When configured with `HF_LABEL_REPO` and tokens, the system automatically:

	1. Saves Snapshots: Each label update creates a versioned snapshot:
	   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
	   - `snapshots/v{N}/meta.json`: Label metadata and model info
	   - `snapshots/latest.json`: Points to the current version
	2. Loads on Startup: Fetches the latest snapshot or a specified version
	3. Fallback: Uses the local `items.json` if the Hub is unavailable
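
The snapshot layout and startup behavior can be sketched without any Hub calls. The helpers below only build the documented paths and decide which version to load. The function names are hypothetical, and the assumption that `latest.json` holds an object like `{"version": 7}` is a guess, since the file's schema is not described here.

```python
import json
from typing import Optional

def snapshot_paths(version: int) -> dict:
    """Build the repo-relative paths of one versioned snapshot."""
    base = f"snapshots/v{version}"
    return {
        "embeddings": f"{base}/embeddings.safetensors",
        "meta": f"{base}/meta.json",
    }

def resolve_version(latest_json: Optional[str], requested: Optional[int] = None) -> Optional[int]:
    """Pick the requested version, else the one in latest.json, else None.

    None signals the caller to fall back to the local items.json.
    """
    if requested is not None:
        return requested
    if latest_json is None:  # e.g. the Hub could not be reached
        return None
    try:
        version = json.loads(latest_json)["version"]  # assumed schema
    except (ValueError, KeyError, TypeError):
        return None
    return version if isinstance(version, int) else None

paths = snapshot_paths(5)
```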

	## 🎨 Default Label Catalog

	The bundled `items.json` includes 50+ kid-friendly objects with:
	- Unique IDs and display names
	- CLIP-optimized prompts
	- Category metadata
	- Fun facts and rarity ratings

	Categories include animals, toys, food, vehicles, nature, and everyday objects.

	## ⚡ Performance Optimization

	- GPU Acceleration: Automatic CUDA detection with float16 inference
	- CPU Fallback: Graceful degradation with float32 precision
	- Embedding Cache: Pre-computed text embeddings updated on label changes
	- Re-parameterization: MobileOne blocks optimized for inference speed
	- Batch Processing: Efficient matrix operations for multi-label scoring
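
One way to keep the embedding cache cheap to update is to encode only the prompts that are new since the last change. The class below is an illustrative sketch: `EmbeddingCache` and its `encode` callback are hypothetical, and the fake encoder in the example stands in for the real text encoder.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

class EmbeddingCache:
    """Cache text embeddings by prompt so label edits re-encode only what changed."""

    def __init__(self, encode: Callable[[Sequence[str]], List[Vector]]):
        self._encode = encode
        self._by_prompt: Dict[str, Vector] = {}

    def sync(self, prompts: Sequence[str]) -> List[Vector]:
        """Encode unseen prompts, evict stale ones, return embeddings in prompt order."""
        missing = [p for p in dict.fromkeys(prompts) if p not in self._by_prompt]
        if missing:
            for prompt, emb in zip(missing, self._encode(missing)):
                self._by_prompt[prompt] = emb
        wanted = set(prompts)
        for stale in [p for p in self._by_prompt if p not in wanted]:
            del self._by_prompt[stale]
        return [self._by_prompt[p] for p in prompts]

calls = []

def fake_encode(batch):
    """Stand-in encoder: records each batch and returns a 1-D 'embedding'."""
    calls.append(list(batch))
    return [[float(len(p))] for p in batch]

cache = EmbeddingCache(fake_encode)
first = cache.sync(["a photo of a cat", "a photo of a dog"])
second = cache.sync(["a photo of a dog", "a photo of a bicycle"])
```

In the second call only the bicycle prompt reaches the encoder; the dog embedding is reused and the cat embedding is evicted.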

	## 🔐 Security Considerations

	- Token Protection: Admin operations require `ADMIN_TOKEN`
	- Private Datasets: Keep label repos private for sensitive applications
	- Input Validation: Automatic sanitization of uploaded images
	- Memory Management: Images processed and discarded after inference
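
As an illustration of what input validation can involve, the sketch below rejects empty or oversized uploads and checks the leading bytes against well-known image signatures. Everything here is an assumption for the sake of example: the 10 MB limit, the accepted formats, and the function name are hypothetical and do not describe the checks this Space actually performs.

```python
from typing import Optional

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical upload limit

# Leading "magic" bytes of common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if the bytes look like a supported image, else None."""
    if not data or len(data) > MAX_IMAGE_BYTES:
        return None
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None

png_kind = sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
jpeg_kind = sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
text_kind = sniff_image_format(b"<html>not an image</html>")
```

A signature check is only a first filter; fully decoding the image with an imaging library remains the more reliable test.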

	## 📄 License

	- Model Weights: Apple Sample Code License (ASCL)
	- Interface Code: MIT License

	## 🤝 Contributing

	Contributions welcome! Areas for improvement:
	- Additional label management features
	- Performance optimizations
	- Extended API capabilities
	- Multi-language support

	## 📚 Resources

	- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
	- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
	- [Gradio Documentation](https://gradio.app/docs)
	- [Hugging Face Spaces](https://huggingface.co/spaces)