title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- mcp
- computer-vision
- image-analysis
- ai-captioning
- hackathon
- gradio
python_version: 3.11.8
🖼️ MCP Image Analysis Tool
An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.
✨ Features
- 🎯 Smart Image Captioning: Generate detailed, contextual descriptions of images
- Object Detection: Identify and locate objects, people, and scenes in images
- Content Analysis: Extract metadata, colors, composition, and visual elements
- MCP Server: Compliant with Model Context Protocol for AI assistant integration
- 🎨 Interactive UI: Modern Gradio interface with image upload and preview
- ⚡ Fast Processing: Efficient AI-powered visual analysis and description
Quick Start
Using the Web Interface
- Visit this Space and interact with the web interface directly
- Upload an image using the drag-and-drop interface or file browser
- Select analysis type (caption, objects, detailed analysis)
- Click Analyze to get comprehensive image insights
- View results including descriptions, detected objects, and metadata
Supported Image Formats
- 📸 JPEG/JPG: Standard photo format with full analysis support
- 🖼️ PNG: Images with transparency and high-quality graphics
- 🎨 WebP: Modern web format with efficient compression
- BMP: Bitmap images with detailed pixel analysis
- GIF: Static GIF analysis (first frame)
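As a rough illustration of how a client might confirm that a file is one of the formats listed above before uploading it, the sketch below inspects the leading bytes of the file. The function name `sniff_image_format` is hypothetical and is not part of this tool; the byte signatures themselves are the standard ones for each format.

```python
from typing import Optional

# Leading byte signatures for the formats listed above.
_SIGNATURES = {
    "JPEG": b"\xff\xd8\xff",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "GIF": b"GIF8",  # prefix shared by GIF87a and GIF89a
    "BMP": b"BM",
}


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format name for the given leading bytes, or None if unrecognized."""
    # WebP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    for name, magic in _SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

Reading the first 12 bytes of a file is enough for this check, for example `sniff_image_format(open(path, "rb").read(12))`.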
MCP Server Integration
This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.
MCP Endpoint Details
- Endpoint URL: https://[this-space-url]/gradio_api/mcp/sse
- HTTP Method: POST
- Content-Type: application/json
Request Format
Send a POST request with the following JSON payload:
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
Parameters:
- data[0] (string): Image file path, URL, or base64-encoded image data
- data[1] (string): Analysis type ("caption", "objects", "detailed", "accessibility")
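The two parameters above can be assembled and checked on the client side before any request is sent. The following is a minimal sketch, not the tool's own code: `build_payload` and `ANALYSIS_TYPES` are illustrative names, and the list of accepted types is taken from the parameter description above.

```python
import json

# Analysis types named in the parameter list above.
ANALYSIS_TYPES = ("caption", "objects", "detailed", "accessibility")


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the request body, rejecting values the endpoint is not documented to accept."""
    if not image:
        raise ValueError("image must be a non-empty path, URL, or base64 string")
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type!r}")
    return {"data": [image, analysis_type]}


print(json.dumps(build_payload("https://example.com/image.jpg", "detailed")))
```

Validating the analysis type locally avoids a round trip for a request that would presumably be rejected anyway.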
Response Format
Successful responses return:
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
Error responses return:
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
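Because a successful response carries an object in `data[0]` while an error response carries a plain string, a client has to branch on the type of that first entry. The sketch below shows one way to do so, assuming the two response shapes documented above are the only ones returned; `ImageAnalysisError` and `parse_response` are hypothetical names.

```python
class ImageAnalysisError(Exception):
    """Raised when the response does not contain a successful analysis."""


def parse_response(body: dict) -> dict:
    """Return the analysis object from a decoded response body, or raise on error."""
    data = body.get("data")
    if not isinstance(data, list) or not data:
        raise ImageAnalysisError("response has no 'data' entries")
    first = data[0]
    # Error responses carry a plain message string instead of a result object.
    if isinstance(first, str):
        raise ImageAnalysisError(first)
    if not isinstance(first, dict) or first.get("status") != "success":
        raise ImageAnalysisError(f"unexpected payload: {first!r}")
    return first
```

Raising an exception keeps error strings from flowing into code that expects keys such as `results` or `metadata`.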
Example MCP Request
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
Integration Examples
Python Integration
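import base64
import mimetypes

import requests

def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the first result entry."""
    # Guess the MIME type from the file name instead of assuming JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    # Convert image to base64
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode()
    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}
    response = requests.post(url, json=payload, timeout=60)
    # Raise on HTTP errors rather than returning a string the caller would index into
    response.raise_for_status()
    result = response.json()["data"][0]
    # Error responses carry a plain string rather than a result object
    if isinstance(result, str):
        raise RuntimeError(result)
    return result

# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
print(f"Caption: {analysis['results']['caption']}")
print(f"Objects: {analysis['results']['objects']}")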
Claude/AI Assistant Integration
When integrating with Claude or other AI assistants supporting MCP:
- Configure the MCP client to point to this Space's /gradio_api/mcp/sse endpoint
- Use the tool in conversations: "Analyze this image and describe what you see..."
- The AI assistant will automatically format image data and parse visual descriptions
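The first step above usually means adding an entry to the MCP client's configuration file. The sketch below generates such an entry; it assumes the `mcpServers` / `url` layout that several MCP clients use for SSE servers, so check your client's documentation for the exact keys. The server name `image-analysis` and the example Space URL are placeholders, not values from this project.

```python
import json


def mcp_client_config(space_url: str, server_name: str = "image-analysis") -> str:
    """Return a JSON config snippet pointing an MCP client at the Space's SSE endpoint."""
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    # Assumed layout: {"mcpServers": {<name>: {"url": <endpoint>}}}
    config = {"mcpServers": {server_name: {"url": endpoint}}}
    return json.dumps(config, indent=2)


# Placeholder URL; substitute the address of the actual Space.
print(mcp_client_config("https://example-space.hf.space"))
```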
🛠️ Technical Details
Analysis Capabilities
🖼️ Image Captioning
- Detailed scene descriptions
- Context-aware narratives
- Multiple caption styles (descriptive, creative, technical)
- Accessibility-focused descriptions
🎯 Object Detection
- Common objects and items
- People and faces (privacy-conscious)
- Animals and nature elements
- Text and document detection
- Spatial relationships and positioning
🎨 Visual Analysis
- Color palette extraction
- Composition analysis
- Mood and atmosphere detection
- Style and aesthetic classification
- Technical image properties
AI Models
- Vision Model: Advanced computer vision models via Hugging Face
- Captioning: Specialized image-to-text models
- Object Detection: YOLO-based and transformer models
- Scene Analysis: Multi-modal AI for comprehensive understanding
- Accuracy: High-quality results with confidence scoring
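Since each detected object is reported with a confidence score, a caller can discard low-confidence detections before using the results. The following sketch assumes objects shaped like the entries in the sample response (`label`, `confidence`, `location`); the `filter_objects` helper and the 0.9 threshold are illustrative choices, not part of the tool.

```python
def filter_objects(objects, min_confidence=0.9):
    """Keep detections at or above the threshold, highest confidence first."""
    kept = [o for o in objects if o.get("confidence", 0.0) >= min_confidence]
    return sorted(kept, key=lambda o: o["confidence"], reverse=True)


# Objects copied from the sample response in this document.
sample = [
    {"label": "laptop", "confidence": 0.95, "location": "center-left"},
    {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
]
print([o["label"] for o in filter_objects(sample)])  # ['laptop']
```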
API Configuration
- Image Size Limits: 10MB maximum file size
- Supported Formats: JPEG, PNG, WebP, BMP, GIF
- Processing Time: 5-30 seconds depending on analysis type
- Rate Limiting: Standard Gradio Space limits
- Privacy: Images processed in-memory only, not stored
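The size and format limits above can be checked locally before an upload is attempted. This is a minimal sketch under two assumptions: that "10MB" means 10 × 1024 × 1024 bytes, and that the file extension reflects the actual format. `check_upload` is a hypothetical helper name.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would exceed the documented limits."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
```

For a file on disk, the size argument can come from `os.path.getsize(path)`.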
Hackathon Submission
This tool is submitted for the MCP Server Track of the hackathon, demonstrating:
- ✅ MCP Protocol Compliance: Full implementation of MCP server specification
- ✅ Production Ready: Enterprise-grade computer vision capabilities
- ✅ User Experience: Intuitive image upload with real-time preview
- ✅ Documentation: Comprehensive API documentation and examples
- ✅ Integration Ready: Easy to integrate with AI assistants and workflows
🎯 Use Cases for AI Assistants
When integrated with AI assistants via MCP, this tool enables:
- Content Creation: "Describe this image for a social media caption"
- Accessibility Support: "Generate alt-text for this website image"
- Document Analysis: "Extract text and analyze this screenshot"
- Quality Assessment: "Analyze the composition and quality of this photo"
- Educational Support: "Explain what's happening in this historical image"
🔧 Local Development
Prerequisites
- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)
Installation
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
Testing MCP Endpoint Locally
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
Performance & Limitations
Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing
Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images
Privacy & Security
- No Image Storage: Images processed in-memory only
- Privacy First: No logging of uploaded images
- Secure Processing: Sandboxed analysis environment
- Data Protection: GDPR-compliant image handling
- Content Safety: Appropriate content filtering
License
MIT License - Feel free to use and modify for your projects.
🤝 Contributing
This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:
- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples
🏷️ Tags
#mcp-server-track #computer-vision #image-analysis #ai-captioning #gradio #ai-assistant #model-context-protocol
Built with ❤️ for the MCP Hackathon