---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- mcp
- computer-vision
- image-analysis
- ai-captioning
- hackathon
- gradio
python_version: 3.11.8
---
# 🖼️ MCP Image Analysis Tool
An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.
## ✨ Features
- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **🔍 Object Detection**: Identify and locate objects, people, and scenes in images
- **📊 Content Analysis**: Extract metadata, colors, composition, and visual elements
- **🔌 MCP Server**: Compliant with Model Context Protocol for AI assistant integration
- **🎨 Interactive UI**: Modern Gradio interface with image upload and preview
- **⚡ Fast Processing**: Efficient AI-powered visual analysis and description
## 🚀 Quick Start
### Using the Web Interface
1. **Visit this Space** and interact with the web interface directly
2. **Upload an image** using the drag-and-drop interface or file browser
3. **Select analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to get comprehensive image insights
5. **View results** including descriptions, detected objects, and metadata
### Supported Image Formats
- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- 📊 **BMP**: Bitmap images with detailed pixel analysis
- 🎭 **GIF**: Static GIF analysis (first frame)
## 🔌 MCP Server Integration
This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.
### MCP Endpoint Details
- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **HTTP Method**: `POST`
- **Content-Type**: `application/json`
### Request Format
Send a POST request with the following JSON payload:
```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```
**Parameters:**
- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type ("caption", "objects", "detailed", "accessibility")
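The payload can be assembled and validated in a few lines. This is a minimal sketch that assumes the four analysis types listed above are the only accepted values; `ANALYSIS_TYPES` and `build_payload` are hypothetical names, not part of this Space's API.

```python
import json

# Analysis types listed in the parameter description above.
ANALYSIS_TYPES = ("caption", "objects", "detailed", "accessibility")


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the request body: data[0] is the image, data[1] the analysis type."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"analysis_type must be one of {ANALYSIS_TYPES}")
    return {"data": [image, analysis_type]}


print(json.dumps(build_payload("https://example.com/image.jpg", "detailed"), indent=2))
```

Rejecting an unknown analysis type on the client avoids a round trip that would only return an error.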
### Response Format
Successful responses return:
```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```
Error responses return:
```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
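Because a success puts an object in `data[0]` and an error puts a plain string there, a client has to check the type before reading fields. The sketch below assumes exactly the two response shapes shown above; `parse_response` and `confident_objects` are hypothetical helper names.

```python
def parse_response(body: dict) -> dict:
    """Return the result object, or raise if the response carries an error."""
    first = body["data"][0]
    if isinstance(first, str):
        # Error responses put a plain message in data[0]
        raise RuntimeError(first)
    if first.get("status") != "success":
        raise RuntimeError(f"Unexpected status: {first.get('status')}")
    return first


def confident_objects(result: dict, threshold: float = 0.9) -> list:
    """Labels of detected objects at or above the confidence threshold."""
    return [
        obj["label"]
        for obj in result["results"].get("objects", [])
        if obj["confidence"] >= threshold
    ]


# Trimmed copy of the success example above
sample = {
    "data": [{
        "status": "success",
        "analysis_type": "detailed",
        "results": {
            "objects": [
                {"label": "laptop", "confidence": 0.95, "location": "center-left"},
                {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
            ]
        },
    }]
}
print(confident_objects(parse_response(sample)))  # ['laptop']
```

The 0.9 threshold is arbitrary; lower it to keep less certain detections such as the coffee cup.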
### Example MCP Request
```bash
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
```
### Integration Examples
#### Python Integration
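```python
import base64
import mimetypes

import requests


def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image for analysis and return the first item of `data`."""
    # Convert image to base64
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode()
    # Guess the MIME type from the file name; fall back to JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to about 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    if response.status_code != 200:
        return f"Error: {response.status_code}"
    return response.json()["data"][0]


# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Errors come back as a plain string instead of a result object
    print(analysis)
```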
#### Claude/AI Assistant Integration
When integrating with Claude or other AI assistants supporting MCP:
1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Use the tool in conversations: "Analyze this image and describe what you see..."
3. The AI assistant will automatically format image data and parse visual descriptions
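Many MCP clients read their server list from a JSON file with an `mcpServers` map. The sketch below generates such an entry for step 1. The server name `image-analysis` and the helper `mcp_client_config` are hypothetical, and whether a given client accepts a plain `url` entry for an SSE endpoint is an assumption to verify against that client's documentation.

```python
import json


def mcp_client_config(space_url: str, name: str = "image-analysis") -> str:
    """Return a JSON snippet registering the Space's SSE endpoint under `name`."""
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    return json.dumps({"mcpServers": {name: {"url": endpoint}}}, indent=2)


# Replace the placeholder with the Space's real URL
print(mcp_client_config("https://[this-space-url]"))
```

Some clients only support local (stdio) servers and need a bridge process to reach a remote endpoint like this one.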
## 🛠️ Technical Details
### Analysis Capabilities
- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions
- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning
- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties
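The example response reports object positions as labels such as `center-left` and `top-right`. One way a bounding box could be turned into such a label is a 3x3 grid over the image. This is an illustrative guess, not this Space's actual method; `location_label` and the `(x_min, y_min, x_max, y_max)` box format are hypothetical.

```python
def location_label(box, width, height):
    """Map a pixel bounding box (x_min, y_min, x_max, y_max) to a 3x3 grid label."""
    x_center = (box[0] + box[2]) / 2
    y_center = (box[1] + box[3]) / 2
    # Index 0, 1, or 2 along each axis; min() keeps boxes touching the edge in range
    col = min(int(3 * x_center / width), 2)
    row = min(int(3 * y_center / height), 2)
    vertical = ("top", "center", "bottom")[row]
    horizontal = ("left", "center", "right")[col]
    if vertical == horizontal:  # only possible for the middle cell
        return "center"
    return f"{vertical}-{horizontal}"


print(location_label((50, 300, 300, 500), 1024, 768))   # center-left
print(location_label((800, 20, 1000, 200), 1024, 768))  # top-right
```

Using the box center keeps the label stable for large objects that span several grid cells.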
### AI Models
- **Vision Model**: Advanced computer vision models via Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multi-modal AI for comprehensive understanding
- **Accuracy**: Each detected object is returned with a confidence score (for example, 0.95)
### API Configuration
- **Image Size Limits**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds depending on analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images processed in-memory only, not stored
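The size limit can be enforced on the client before any upload. This is a minimal sketch assuming 10MB means 10 × 1024 × 1024 bytes; `MAX_IMAGE_BYTES` and `check_image_size` are hypothetical names, and the server's exact cutoff may differ.

```python
import os

# 10MB limit from the list above, assuming 1MB = 1024 * 1024 bytes
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image_size(path: str, max_bytes: int = MAX_IMAGE_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(
            f"{path} is {size / (1024 * 1024):.1f} MB; "
            f"the limit is {max_bytes / (1024 * 1024):.1f} MB"
        )
    return size
```

Calling `check_image_size("my_photo.jpg")` before encoding the image avoids sending a request that would be rejected for size.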
## 🏆 Hackathon Submission
This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:
- ✅ **MCP Protocol Compliance**: Full implementation of MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows
## 🎯 Use Cases for AI Assistants
When integrated with AI assistants via MCP, this tool enables:
1. **Content Creation**: "Describe this image for a social media caption"
2. **Accessibility Support**: "Generate alt-text for this website image"
3. **Document Analysis**: "Extract text and analyze this screenshot"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"
## 🔧 Local Development
### Prerequisites
- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)
### Installation
```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
```
### Testing MCP Endpoint Locally
```bash
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```
## 📊 Performance & Limitations
### Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing
### Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet access for some AI models
- Best performance with clear, well-lit images
## 🔒 Privacy & Security
- **No Image Storage**: Images processed in-memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering
## 📝 License
MIT License - Feel free to use and modify for your projects.
## 🤝 Contributing
This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:
- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples
## 🏷️ Tags
`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`
---
**Built with ❤️ for the MCP Hackathon**