metadata
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-server-track
  - mcp
  - computer-vision
  - image-analysis
  - ai-captioning
  - hackathon
  - gradio
python_version: 3.11.8

🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

✨ Features

  • 🎯 Smart Image Captioning: Generate detailed, contextual descriptions of images
  • 🔍 Object Detection: Identify and locate objects, people, and scenes in images
  • 📊 Content Analysis: Extract metadata, colors, composition, and visual elements
  • 🔌 MCP Server: Compliant with Model Context Protocol for AI assistant integration
  • 🎨 Interactive UI: Modern Gradio interface with image upload and preview
  • ⚡ Fast Processing: Efficient AI-powered visual analysis and description

🚀 Quick Start

Using the Web Interface

  1. Visit this Space and interact with the web interface directly
  2. Upload an image using the drag-and-drop interface or file browser
  3. Select analysis type (caption, objects, detailed analysis)
  4. Click Analyze to get comprehensive image insights
  5. View results including descriptions, detected objects, and metadata

Supported Image Formats

  • 📸 JPEG/JPG: Standard photo format with full analysis support
  • 🖼️ PNG: Images with transparency and high-quality graphics
  • 🎨 WebP: Modern web format with efficient compression
  • 📊 BMP: Bitmap images with detailed pixel analysis
  • 🎭 GIF: Static GIF analysis (first frame)

🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.

MCP Endpoint Details

  • Endpoint URL: https://[this-space-url]/gradio_api/mcp/sse
  • HTTP Method: POST
  • Content-Type: application/json

Request Format

Send a POST request with the following JSON payload:

{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}

Parameters:

  • data[0] (string): Image file path, URL, or base64-encoded image data
  • data[1] (string): Analysis type ("caption", "objects", "detailed", "accessibility")

Response Format

Successful responses return:

{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
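
Once a success response like the one above has been decoded from JSON, the detected objects can be filtered by their confidence score. The sketch below assumes the field names shown in the sample (`results`, `objects`, `label`, `confidence`); `confident_labels` and the default threshold are hypothetical choices for illustration.

```python
def confident_labels(response, threshold=0.9):
    """Return labels of detected objects whose confidence meets the threshold."""
    result = response["data"][0]
    objects = result.get("results", {}).get("objects", [])
    return [obj["label"] for obj in objects if obj.get("confidence", 0.0) >= threshold]


# Trimmed copy of the sample response shown above
sample = {
    "data": [
        {
            "status": "success",
            "analysis_type": "detailed",
            "results": {
                "objects": [
                    {"label": "laptop", "confidence": 0.95, "location": "center-left"},
                    {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
                ]
            },
        }
    ]
}

print(confident_labels(sample))       # ['laptop']
print(confident_labels(sample, 0.8))  # ['laptop', 'coffee cup']
```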

Error responses return:

{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
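
Because a success response carries an object in `data[0]` while an error response carries a plain message string, client code needs to tell the two shapes apart before reading any fields. The following is a minimal sketch under that assumption; `AnalysisError` and `extract_result` are hypothetical names, not part of this Space's code.

```python
class AnalysisError(Exception):
    """Raised when the response carries an error message instead of a result."""


def extract_result(response):
    """Return the result dict from a decoded response, or raise AnalysisError."""
    data = response.get("data") or []
    if not data:
        raise AnalysisError("response contains no data")
    first = data[0]
    # Error responses put a plain message string in data[0]
    if isinstance(first, str):
        raise AnalysisError(first)
    if first.get("status") != "success":
        raise AnalysisError(f"unexpected status: {first.get('status')!r}")
    return first
```

A caller can then wrap `extract_result(response.json())` in a `try`/`except AnalysisError` block and read `result["results"]` only on the success path.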

Example MCP Request

curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'

Integration Examples

Python Integration

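import base64
import mimetypes

import requests


def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the analyzer and return the first result item."""
    # Encode the image as a base64 data URI, guessing the MIME type from the
    # file extension instead of assuming JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode("ascii")

    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to about 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of returning a string
    return response.json()["data"][0]


# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Error responses carry a plain message string in data[0]
    print(analysis)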

Claude/AI Assistant Integration

When integrating with Claude or other AI assistants supporting MCP:

  1. Configure the MCP client to point to this Space's /gradio_api/mcp/sse endpoint
  2. Use the tool in conversations: "Analyze this image and describe what you see..."
  3. The AI assistant will automatically format image data and parse visual descriptions
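
Step 1 usually amounts to adding the endpoint URL to the client's configuration file. Many MCP clients read a JSON file with an `mcpServers` map, but the exact schema varies by client, so treat the key names below as an assumption and check your client's documentation. The helper `mcp_client_config` and the server name `image-analysis` are hypothetical.

```python
import json


def mcp_client_config(space_url, server_name="image-analysis"):
    """Build a client config entry pointing at the Space's SSE endpoint."""
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    # "mcpServers" and "url" follow a common client convention (an assumption here)
    return {"mcpServers": {server_name: {"url": endpoint}}}


if __name__ == "__main__":
    # Replace the placeholder with the actual Space URL
    print(json.dumps(mcp_client_config("https://[this-space-url]"), indent=2))
```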

🛠️ Technical Details

Analysis Capabilities

  • 🖼️ Image Captioning

    • Detailed scene descriptions
    • Context-aware narratives
    • Multiple caption styles (descriptive, creative, technical)
    • Accessibility-focused descriptions
  • 🎯 Object Detection

    • Common objects and items
    • People and faces (privacy-conscious)
    • Animals and nature elements
    • Text and document detection
    • Spatial relationships and positioning
  • 🎨 Visual Analysis

    • Color palette extraction
    • Composition analysis
    • Mood and atmosphere detection
    • Style and aesthetic classification
    • Technical image properties

AI Models

  • Vision Model: Advanced computer vision models via Hugging Face
  • Captioning: Specialized image-to-text models
  • Object Detection: YOLO-based and transformer models
  • Scene Analysis: Multi-modal AI for comprehensive understanding
  • Accuracy: High-quality results with confidence scoring

API Configuration

  • Image Size Limits: 10MB maximum file size
  • Supported Formats: JPEG, PNG, WebP, BMP, GIF
  • Processing Time: 5-30 seconds depending on analysis type
  • Rate Limiting: Standard Gradio Space limits
  • Privacy: Images processed in-memory only, not stored
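
The size and format limits above can be checked on the client before uploading. This is a minimal sketch, assuming the 10MB limit means 10 × 1024 × 1024 bytes and that the format can be judged from the file extension; `check_image`, `MAX_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names.

```python
import os

# 10 MB limit stated above (assuming 1 MB = 1024 * 1024 bytes)
MAX_BYTES = 10 * 1024 * 1024
# Extensions for the supported formats listed above
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def check_image(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call it as `check_image(path, os.path.getsize(path))` and upload only when the returned list is empty.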

πŸ† Hackathon Submission

This tool is submitted for the MCP Server Track of the hackathon, demonstrating:

  • βœ… MCP Protocol Compliance: Full implementation of MCP server specification
  • βœ… Production Ready: Enterprise-grade computer vision capabilities
  • βœ… User Experience: Intuitive image upload with real-time preview
  • βœ… Documentation: Comprehensive API documentation and examples
  • βœ… Integration Ready: Easy to integrate with AI assistants and workflows

🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

  1. Content Creation: "Describe this image for social media caption"
  2. Accessibility Support: "Generate alt-text for this website image"
  3. Document Analysis: "Extract text and analyze this screenshot"
  4. Quality Assessment: "Analyze the composition and quality of this photo"
  5. Educational Support: "Explain what's happening in this historical image"

🔧 Local Development

Prerequisites

  • Python 3.11+
  • Computer vision libraries (Pillow, OpenCV)
  • AI model dependencies (transformers, torch)

Installation

# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

Testing MCP Endpoint Locally

# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'

📊 Performance & Limitations

Strengths

  • High-quality image descriptions
  • Multi-format image support
  • Fast processing with GPU acceleration
  • MCP protocol compliance
  • Privacy-focused processing

Limitations

  • 10MB file size limit
  • Processing time varies with image complexity
  • Limited to static images (no video)
  • Requires internet for some AI models
  • Best performance with clear, well-lit images

🔒 Privacy & Security

  • No Image Storage: Images processed in-memory only
  • Privacy First: No logging of uploaded images
  • Secure Processing: Sandboxed analysis environment
  • Data Protection: GDPR-compliant image handling
  • Content Safety: Appropriate content filtering

📝 License

MIT License - Feel free to use and modify for your projects.

🀝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

  • Test image analysis with different photo types
  • Report accuracy issues or missed objects
  • Suggest additional analysis features
  • Contribute new use cases and examples

🏷️ Tags

#mcp-server-track #computer-vision #image-analysis #ai-captioning #gradio #ai-assistant #model-context-protocol


Built with ❤️ for the MCP Hackathon