---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- mcp
- computer-vision
- image-analysis
- ai-captioning
- hackathon
- gradio
python_version: 3.11.8
---

# 🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint, so AI assistants and other applications can call it programmatically.

## ✨ Features

- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **Object Detection**: Identify and locate objects, people, and scenes in images
- **Content Analysis**: Extract metadata, colors, composition, and visual elements
- **MCP Server**: Exposes the analysis as an MCP tool for AI assistant integration
- **🎨 Interactive UI**: Gradio interface with image upload and preview
- **⚡ Fast Processing**: Typically 5-30 seconds per image, depending on the analysis type

## Quick Start

### Using the Web Interface

1. **Open this Space** to use the web interface directly
2. **Upload an image** by drag-and-drop or with the file browser
3. **Select an analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to run the analysis
5. **View the results**, including descriptions, detected objects, and metadata

### Supported Image Formats

- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Including images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- **BMP**: Bitmap images
- **GIF**: Static analysis only; animated GIFs are analyzed from their first frame

## MCP Server Integration

This tool implements the Model Context Protocol (MCP), so AI assistants can run image analysis programmatically.

### MCP Endpoint Details

- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **Transport**: MCP over Server-Sent Events (SSE). The client opens the event stream with `GET` on this URL, and the server's first event (`endpoint`) announces a message URL.
- **Requests**: JSON-RPC 2.0 messages, sent with `POST` and `Content-Type: application/json` to that message URL; responses arrive on the event stream.

### Request Format

The tool takes two inputs, shown here in the order of a Gradio API `data` array. MCP clients pass the same two values as named tool arguments.

```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```

**Parameters:**

- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type (`"caption"`, `"objects"`, `"detailed"`, or `"accessibility"`)

### Response Format

Successful responses return:

```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```

Error responses return:

```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
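
A failed analysis comes back as a plain error string, not as an object, so callers have to distinguish the two shapes. The following is a minimal Python sketch that assumes the response shapes documented above. The name `parse_analysis_result`, the `AnalysisError` class, and the `min_confidence` filter are illustrative and not part of the tool.

```python
from typing import Any


class AnalysisError(Exception):
    """Raised when the tool returns an error string instead of a result object."""


def parse_analysis_result(response: dict[str, Any], min_confidence: float = 0.0) -> dict[str, Any]:
    """Return the result object from a response, dropping low-confidence objects.

    Assumes the documented shapes: {"data": [result_dict]} on success and
    {"data": ["... Error: ..."]} on failure.
    """
    item = response["data"][0]
    if isinstance(item, str):
        # Error responses are a single human-readable string
        raise AnalysisError(item)
    if item.get("status") != "success":
        raise AnalysisError(f"unexpected status: {item.get('status')!r}")
    # Shallow-copy so the caller's response is not modified
    results = dict(item.get("results", {}))
    # Keep only detections at or above the requested confidence
    results["objects"] = [
        obj for obj in results.get("objects", [])
        if obj.get("confidence", 0.0) >= min_confidence
    ]
    return {**item, "results": results}
```

For example, calling it with `min_confidence=0.9` on the sample success response keeps the laptop (0.95) and drops the coffee cup (0.87).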
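
### Example MCP Request

An MCP client does not send the `data` array directly. After the `initialize` handshake, it invokes the tool with a JSON-RPC `tools/call` message posted to the announced message URL. The tool name and argument names below are placeholders; the real ones are listed in the server's `tools/list` response.

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "[tool-name]",
    "arguments": {
      "[image-argument]": "https://example.com/image.jpg",
      "[analysis-type-argument]": "detailed"
    }
  }
}
```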
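
An MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` message. The tool name and argument names in that message come from the server's `tools/list` result. The following is a minimal Python sketch that builds such a message from one `tools/list` entry. It assumes the image is the first declared parameter and the analysis type is the second. `build_tools_call` is a hypothetical helper name.

```python
import json
from typing import Any

ANALYSIS_TYPES = {"caption", "objects", "detailed", "accessibility"}


def build_tools_call(request_id: int, tool: dict[str, Any], image: str, analysis_type: str) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the image analysis tool.

    `tool` is one entry of a tools/list result: {"name": ..., "inputSchema": {...}}.
    Assumption: its first two declared properties are the image and the analysis type.
    """
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type!r}")
    params = list(tool["inputSchema"].get("properties", {}))
    if len(params) < 2:
        raise ValueError(f"tool {tool['name']!r} declares fewer than two inputs")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool["name"],
            "arguments": {params[0]: image, params[1]: analysis_type},
        },
    }
    return json.dumps(message)
```

Mapping the inputs by declared order avoids hard-coding argument names that this README does not document. If the server's schema uses a different order, this mapping is wrong and the arguments must be named explicitly.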
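
### Integration Examples

#### Python Integration

The sketch below uses the official MCP Python SDK (`pip install mcp`). It makes three assumptions:

- The analyzer is the Space's first MCP tool.
- Its two parameters are declared in the order listed under Request Format.
- It returns the result object shown under Response Format as JSON text.

```python
import asyncio
import base64
import json
import mimetypes

from mcp import ClientSession
from mcp.client.sse import sse_client

MCP_URL = "https://[space-url]/gradio_api/mcp/sse"


async def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    # Encode the image as a data URI with its real MIME type (JPEG if unknown)
    mime = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode("ascii")
    image_data = f"data:{mime};base64,{img_base64}"

    async with sse_client(MCP_URL) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Assumption: the first tool is the analyzer; fill its parameters in declared order
            tool = (await session.list_tools()).tools[0]
            params = list(tool.inputSchema.get("properties", {}))
            result = await session.call_tool(
                tool.name, dict(zip(params, [image_data, analysis_type]))
            )

    text = result.content[0].text if result.content else ""
    if result.isError:
        raise RuntimeError(f"Tool call failed: {text}")
    # Assumption: the tool returns the result object as JSON text
    return json.loads(text)


# Example usage
analysis = asyncio.run(call_mcp_image_analyzer("my_photo.jpg", "detailed"))
print(f"Caption: {analysis['results']['caption']}")
print(f"Objects: {analysis['results']['objects']}")
```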

#### Claude/AI Assistant Integration

When integrating with Claude or other AI assistants that support MCP:

1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Ask for the tool in conversation, e.g. "Analyze this image and describe what you see..."
3. The assistant formats the image data for the tool call and interprets the returned description
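
Claude Desktop registers MCP servers under the `mcpServers` key of `claude_desktop_config.json`. Clients that only launch local (stdio) servers are commonly bridged to a remote SSE endpoint with the `mcp-remote` npm package. The following is a minimal Python sketch that adds such an entry to a config dictionary. The server key `image-analysis` is an arbitrary choice, `add_remote_mcp_server` is a hypothetical helper, and the config file's location depends on your OS.

```python
import json
from typing import Any


def add_remote_mcp_server(config: dict[str, Any], name: str, sse_url: str) -> dict[str, Any]:
    """Return a copy of a Claude Desktop config with an mcp-remote entry for `sse_url`."""
    servers = dict(config.get("mcpServers", {}))
    # npx runs mcp-remote, which relays the client's stdio to the remote SSE endpoint
    servers[name] = {"command": "npx", "args": ["mcp-remote", sse_url]}
    return {**config, "mcpServers": servers}


config = add_remote_mcp_server({}, "image-analysis", "https://[this-space-url]/gradio_api/mcp/sse")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into an existing `claude_desktop_config.json`. Other entries under `mcpServers` are preserved.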

## 🛠️ Technical Details

### Analysis Capabilities

- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions

- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning

- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties

### AI Models

- **Vision Model**: Computer vision models from Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multimodal models for overall scene understanding
- **Confidence Scoring**: Each detected object is reported with a confidence score (e.g. `0.95`)

### API Configuration

- **Image Size Limit**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds, depending on the analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images are processed in memory only and are not stored
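
Checking the size and format limits above before uploading avoids a round trip that would only end in an error. The following is a minimal client-side sketch that identifies formats by their leading "magic" bytes. The tool's own validation logic is not documented here, so treat this as an approximation. `preflight_image` is a hypothetical helper name.

```python
MAX_BYTES = 10 * 1024 * 1024  # 10MB limit (assumption: binary megabytes)

# Leading bytes that identify each supported format.
# BMP's two-byte signature is weak, so this is a heuristic, not a full check.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]


def preflight_image(data: bytes) -> str:
    """Return the MIME type of `data`, or raise ValueError if the tool would reject it."""
    if len(data) > MAX_BYTES:
        raise ValueError(f"image is {len(data) / 1024 / 1024:.1f} MB; the limit is 10 MB")
    # WebP is a RIFF container: "RIFF" + 4-byte length + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    raise ValueError("unsupported format; expected JPEG, PNG, WebP, BMP, or GIF")
```

The returned MIME type can also be used as the prefix of a `data:` URI when sending base64-encoded images.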

## Hackathon Submission

This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:

- ✅ **MCP Protocol Compliance**: Full implementation of the MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows

## 🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

1. **Content Creation**: "Describe this image for a social media caption"
2. **Accessibility Support**: "Generate alt text for this website image"
3. **Document Analysis**: "Extract the text from this screenshot and analyze it"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"

## 🔧 Local Development

### Prerequisites

- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)

### Installation

```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

### Testing the MCP Endpoint Locally

```bash
# Open the MCP event stream (-N disables buffering). The server should
# immediately send an `endpoint` event naming its message URL; Ctrl+C to stop.
curl -N http://localhost:7860/gradio_api/mcp/sse

# To list and call the tool interactively, start the MCP Inspector
# and connect it to the same URL
npx @modelcontextprotocol/inspector
```
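
To check the endpoint from a script instead of by hand, the following is a minimal Python sketch that uses only the standard library. It opens the SSE stream, reads until the `endpoint` event defined by MCP's SSE transport, and returns the message URL the server announces. `parse_sse_events` and `discover_message_url` are hypothetical helper names, and the exact message path is whatever the server sends.

```python
import urllib.request
from typing import Iterable, Iterator
from urllib.parse import urljoin


def parse_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            pass  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def discover_message_url(sse_url: str, timeout: float = 10.0) -> str:
    """Open the MCP SSE stream and return the absolute URL from its `endpoint` event."""
    with urllib.request.urlopen(sse_url, timeout=timeout) as resp:
        lines = (raw.decode("utf-8") for raw in resp)
        for event, data in parse_sse_events(lines):
            if event == "endpoint":
                # The announced URL may be relative to the SSE endpoint
                return urljoin(sse_url, data)
    raise RuntimeError("stream closed before an endpoint event arrived")
```

With the app running, `discover_message_url("http://localhost:7860/gradio_api/mcp/sse")` should return the message URL. An exception here usually means the MCP server is not enabled or not reachable.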

## Performance & Limitations

### Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing

### Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images

## Privacy & Security

- **No Image Storage**: Images processed in memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering

## License

MIT License. Feel free to use and modify it in your projects.

## 🤝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples

## 🏷️ Tags

`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`

---

**Built with ❤️ for the MCP Hackathon**