---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-server-track
  - mcp
  - computer-vision
  - image-analysis
  - ai-captioning
  - hackathon
  - gradio
python_version: 3.11.8
---

# 🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

## ✨ Features

- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **🔍 Object Detection**: Identify and locate objects, people, and scenes in images
- **📊 Content Analysis**: Extract metadata, colors, composition, and visual elements
- **🔌 MCP Server**: Compliant with Model Context Protocol for AI assistant integration
- **🎨 Interactive UI**: Modern Gradio interface with image upload and preview
- **⚡ Fast Processing**: Efficient AI-powered visual analysis and description

## 🚀 Quick Start

### Using the Web Interface

1. **Visit this Space** and interact with the web interface directly
2. **Upload an image** using the drag-and-drop interface or file browser
3. **Select analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to get comprehensive image insights
5. **View results** including descriptions, detected objects, and metadata

### Supported Image Formats

- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- 📊 **BMP**: Bitmap images with detailed pixel analysis
- 🎭 **GIF**: Static GIF analysis (first frame)

## 🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.
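Before calling the endpoint programmatically, a client may want to check its input against the limits this README states. The following is a minimal sketch, not part of the tool itself: `build_payload`, `SUPPORTED_FORMATS`, `ANALYSIS_TYPES`, and `MAX_BYTES` are hypothetical names, and it assumes the 10MB limit means 10 × 1024 × 1024 bytes and that local files are sent as base64 data URIs in the `{"data": [...]}` shape described under Request Format.

```python
import base64
from pathlib import Path

# Values copied from this README; all names here are hypothetical.
ANALYSIS_TYPES = {"caption", "objects", "detailed", "accessibility"}
SUPPORTED_FORMATS = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".gif": "image/gif",
}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 * 1024 * 1024 bytes


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Return a {"data": [image, analysis_type]} payload after basic checks."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"Unknown analysis type: {analysis_type!r}")

    # URLs and ready-made data URIs are passed through unchanged.
    if image.startswith(("http://", "https://", "data:")):
        return {"data": [image, analysis_type]}

    # Otherwise treat the input as a local file path.
    path = Path(image)
    mime_type = SUPPORTED_FORMATS.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported image format: {path.suffix or '(none)'}")

    raw = path.read_bytes()
    if len(raw) > MAX_BYTES:
        raise ValueError(f"Image is {len(raw)} bytes; the limit is {MAX_BYTES}")

    encoded = base64.b64encode(raw).decode("ascii")
    return {"data": [f"data:{mime_type};base64,{encoded}", analysis_type]}
```

Rejecting bad input on the client side avoids a round trip that would only return an error string. The returned dictionary can be passed as the JSON body of the request.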

### MCP Endpoint Details

- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **HTTP Method**: `POST`
- **Content-Type**: `application/json`

### Request Format

Send a POST request with the following JSON payload:

```json
{
  "data": [
    "",
    ""
  ]
}
```

**Parameters:**

- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type ("caption", "objects", "detailed", "accessibility")

### Response Format

Successful responses return:

```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```

Error responses return:

```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```

### Example MCP Request

```bash
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
```

### Integration Examples

#### Python Integration

```python
import base64
import mimetypes

import requests


def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the first result item."""
    # Guess the MIME type from the file extension, falling back to JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    # Convert image to base64
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode()

    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    response = requests.post(url, json=payload, timeout=60)
    if response.status_code == 200:
        return response.json()["data"][0]
    return f"Error: {response.status_code}"


# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Error responses are returned as a plain string
    print(analysis)
```

#### Claude/AI Assistant Integration

When integrating with Claude or other AI assistants supporting MCP:

1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Use the tool in conversations: "Analyze this image and describe what you see..."
3. The AI assistant will automatically format image data and parse visual descriptions

## 🛠️ Technical Details

### Analysis Capabilities

- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions
- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning
- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties

### AI Models

- **Vision Model**: Advanced computer vision models via Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multi-modal AI for comprehensive understanding
- **Accuracy**: High-quality results with confidence scoring

### API Configuration

- **Image Size Limits**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds depending on analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images processed in-memory only, not stored

## 🏆 Hackathon Submission

This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:

- ✅ **MCP Protocol Compliance**: Full implementation of MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows

## 🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

1. **Content Creation**: "Describe this image for social media caption"
2. **Accessibility Support**: "Generate alt-text for this website image"
3. **Document Analysis**: "Extract text and analyze this screenshot"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"

## 🔧 Local Development

### Prerequisites

- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)

### Installation

```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

### Testing MCP Endpoint Locally

```bash
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```

## 📊 Performance & Limitations

### Strengths

- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing

### Limitations

- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images

## 🔒 Privacy & Security

- **No Image Storage**: Images processed in-memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering

## 📝 License

MIT License - Feel free to use and modify for your projects.

## 🤝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples

## 🏷️ Tags

`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`

---

**Built with ❤️ for the MCP Hackathon**