
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-server-track
  - mcp
  - computer-vision
  - image-analysis
  - ai-captioning
  - hackathon
  - gradio
python_version: 3.11.8

🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

✨ Features

🎯 Smart Image Captioning: Generate detailed, contextual descriptions of images
🔍 Object Detection: Identify and locate objects, people, and scenes in images
📊 Content Analysis: Extract metadata, colors, composition, and visual elements
🔌 MCP Server: Compliant with Model Context Protocol for AI assistant integration
🎨 Interactive UI: Modern Gradio interface with image upload and preview
⚡ Fast Processing: Results typically return in 5-30 seconds, depending on the analysis type

🚀 Quick Start

Using the Web Interface

Visit this Space and interact with the web interface directly
Upload an image using the drag-and-drop interface or file browser
Select an analysis type (caption, objects, detailed, or accessibility)
Click Analyze to get comprehensive image insights
View results including descriptions, detected objects, and metadata

Supported Image Formats

📸 JPEG/JPG: Standard photo format with full analysis support
🖼️ PNG: Images with transparency and high-quality graphics
🎨 WebP: Modern web format with efficient compression
📊 BMP: Bitmap images with detailed pixel analysis
🎭 GIF: Analyzed as a static image (first frame only)
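
A client can check the format before uploading to avoid a wasted round trip. The sketch below maps file extensions to the formats listed above. The mapping and the `detect_format` helper are illustrative assumptions, not part of this tool, and the server may inspect file contents rather than extensions.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-format mapping for the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".png": "PNG",
    ".webp": "WebP",
    ".bmp": "BMP",
    ".gif": "GIF",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a supported extension, or None."""
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


print(detect_format("holiday.JPG"))  # JPEG
print(detect_format("scan.tiff"))    # None
```

The comparison is case-insensitive, so `photo.PNG` and `photo.png` are treated the same.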

🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP) so that AI assistants and other clients can request image analysis programmatically.

MCP Endpoint Details

Endpoint URL: https://[this-space-url]/gradio_api/mcp/sse
HTTP Method: POST
Content-Type: application/json

Request Format

Send a POST request with the following JSON payload:

{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}

Parameters:

data[0] (string): Image file path, URL, or base64-encoded image data (for example, a data: URI)
data[1] (string): Analysis type ("caption", "objects", "detailed", "accessibility")
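
The request body described above can be assembled and validated on the client before sending. This is a minimal sketch: `build_payload` and `ANALYSIS_TYPES` are hypothetical names, and the accepted values are taken from the parameter list above.

```python
import json

# Analysis types as listed in the parameter description above.
ANALYSIS_TYPES = ("caption", "objects", "detailed", "accessibility")


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the {"data": [image, analysis_type]} request body."""
    if not image:
        raise ValueError("image must be a non-empty path, URL, or base64 string")
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(
            f"analysis_type must be one of {ANALYSIS_TYPES}, got {analysis_type!r}"
        )
    return {"data": [image, analysis_type]}


print(json.dumps(build_payload("https://example.com/image.jpg", "detailed")))
# {"data": ["https://example.com/image.jpg", "detailed"]}
```

Rejecting an unknown analysis type locally gives a clearer error than waiting for the server to respond.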

Response Format

Successful responses return:

{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}

Error responses return:

{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
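
Because a success carries an object in `data[0]` while an error carries a plain string, client code needs to tell the two apart. The following sketch assumes exactly the two response shapes shown above. `ImageAnalysisError` and `parse_response` are hypothetical names introduced for illustration.

```python
class ImageAnalysisError(Exception):
    """Raised when the response carries an error instead of a result."""


def parse_response(body: dict) -> dict:
    """Return the analysis object from a decoded response, or raise."""
    data = body.get("data")
    if not isinstance(data, list) or not data:
        raise ImageAnalysisError("response has no 'data' entries")
    first = data[0]
    # Error responses put a plain message string in data[0].
    if isinstance(first, str):
        raise ImageAnalysisError(first)
    if not isinstance(first, dict) or first.get("status") != "success":
        raise ImageAnalysisError(f"unexpected response entry: {first!r}")
    return first


ok = {"data": [{"status": "success", "analysis_type": "caption",
                "results": {"caption": "A laptop on a desk"}}]}
print(parse_response(ok)["results"]["caption"])  # A laptop on a desk
```

Raising an exception keeps callers from indexing into an error string as if it were a result.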

Example MCP Request

curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'

Integration Examples

Python Integration

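import base64
import mimetypes

import requests

def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the analysis object."""
    # Guess the MIME type from the file name instead of assuming JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    # Encode the image as a base64 data URI
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode("ascii")

    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to about 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()

    result = response.json()["data"][0]
    # Errors come back as a plain string rather than a result object
    if isinstance(result, str):
        raise RuntimeError(result)
    return result

# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
print(f"Caption: {analysis['results']['caption']}")
print(f"Objects: {analysis['results']['objects']}")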

Claude/AI Assistant Integration

When integrating with Claude or other AI assistants supporting MCP:

Configure the MCP client to point to this Space's /gradio_api/mcp/sse endpoint
Use the tool in conversations: "Analyze this image and describe what you see..."
The AI assistant will automatically format image data and parse visual descriptions
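
Many MCP clients read their server list from a JSON configuration file. The sketch below generates one such entry for the endpoint mentioned in the steps above. The `mcpServers` / `url` layout and the `image-analysis` server name are assumptions; the exact schema depends on the client, so check its documentation before relying on this.

```python
import json


def make_mcp_client_config(space_url: str, server_name: str = "image-analysis") -> str:
    """Return a JSON config entry pointing at the Space's MCP endpoint.

    The "mcpServers" layout is an assumed, client-specific convention.
    """
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    config = {"mcpServers": {server_name: {"url": endpoint}}}
    return json.dumps(config, indent=2)


# The host below is a placeholder, not a real Space URL.
print(make_mcp_client_config("https://example.invalid"))
```

Paste the printed JSON into the client's MCP settings after replacing the placeholder host with the Space URL.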

🛠️ Technical Details

Analysis Capabilities

🖼️ Image Captioning
- Detailed scene descriptions
- Context-aware narratives
- Multiple caption styles (descriptive, creative, technical)
- Accessibility-focused descriptions

🎯 Object Detection
- Common objects and items
- People and faces (privacy-conscious)
- Animals and nature elements
- Text and document detection
- Spatial relationships and positioning

🎨 Visual Analysis
- Color palette extraction
- Composition analysis
- Mood and atmosphere detection
- Style and aesthetic classification
- Technical image properties

AI Models

Vision Model: Advanced computer vision models via Hugging Face
Captioning: Specialized image-to-text models
Object Detection: YOLO-based and transformer models
Scene Analysis: Multi-modal AI for comprehensive understanding
Accuracy: High-quality results with confidence scoring
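
Confidence scores let a client discard uncertain detections. This sketch filters the `objects` list from the response format documented earlier. The `confident_objects` helper and the 0.9 threshold are illustrative assumptions, not values used by the tool.

```python
def confident_objects(objects, threshold=0.9):
    """Keep detections at or above the threshold, highest confidence first."""
    kept = [obj for obj in objects if obj.get("confidence", 0.0) >= threshold]
    return sorted(kept, key=lambda obj: obj["confidence"], reverse=True)


# Sample detections copied from the response example in this README.
detections = [
    {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
    {"label": "laptop", "confidence": 0.95, "location": "center-left"},
]
print([obj["label"] for obj in confident_objects(detections)])
# ['laptop']
```

A lower threshold returns more objects at the cost of more false positives, so the right value depends on the use case.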

API Configuration

Image Size Limits: 10 MB maximum file size
Supported Formats: JPEG, PNG, WebP, BMP, GIF
Processing Time: 5-30 seconds depending on analysis type
Rate Limiting: Standard Gradio Space limits
Privacy: Images processed in-memory only, not stored
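
Checking the file size locally avoids uploading an image the server will reject. This is a minimal sketch: `check_upload_size` is a hypothetical helper, and it assumes the 10 MB limit means 10 × 1024 × 1024 bytes, which the configuration above does not specify.

```python
import os

# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_size(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it is too large."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(
            f"{path} is {size} bytes, above the {max_bytes}-byte upload limit"
        )
    return size
```

Call `check_upload_size("my_photo.jpg")` before encoding the image. A file that is too large raises `ValueError` without any network traffic.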

🏆 Hackathon Submission

This tool is submitted for the MCP Server Track of the hackathon, demonstrating:

✅ MCP Protocol Compliance: Full implementation of MCP server specification
✅ Production Ready: Enterprise-grade computer vision capabilities
✅ User Experience: Intuitive image upload with real-time preview
✅ Documentation: Comprehensive API documentation and examples
✅ Integration Ready: Easy to integrate with AI assistants and workflows

🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

Content Creation: "Describe this image for a social media caption"
Accessibility Support: "Generate alt-text for this website image"
Document Analysis: "Extract text and analyze this screenshot"
Quality Assessment: "Analyze the composition and quality of this photo"
Educational Support: "Explain what's happening in this historical image"
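
For the alt-text use case, a client can pick the most suitable field from the documented response. The sketch below prefers `accessibility_description` and falls back to `caption`. The `alt_text` helper and the 125-character cap are assumptions chosen for illustration, not behavior of this tool.

```python
def alt_text(analysis: dict, max_length: int = 125) -> str:
    """Pick alt text from an analysis object, trimmed to max_length characters."""
    results = analysis.get("results", {})
    text = results.get("accessibility_description") or results.get("caption") or ""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_length:
        return text
    # Reserve one character for the ellipsis so the cap is respected.
    return text[: max_length - 1].rstrip() + "…"


analysis = {
    "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface",
    }
}
print(alt_text(analysis))
# Workspace image showing a laptop and coffee cup on a wooden surface
```

If neither field is present the function returns an empty string, which the caller should treat as "no alt text available".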

🔧 Local Development

Prerequisites

Python 3.11+
Computer vision libraries (Pillow, OpenCV)
AI model dependencies (transformers, torch)

Installation

# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

Testing MCP Endpoint Locally

# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'

📊 Performance & Limitations

Strengths

High-quality image descriptions
Multi-format image support
Fast processing with GPU acceleration
MCP protocol compliance
Privacy-focused processing

Limitations

10 MB file size limit
Processing time varies with image complexity
Limited to static images (no video)
Requires an internet connection for some AI models
Best performance with clear, well-lit images
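
Since processing time varies and the Space is subject to rate limits, a client may want to retry failed calls with increasing delays. The following is a generic sketch of retry with exponential backoff. `call_with_retries`, the attempt count, and the delays are illustrative assumptions rather than recommended settings for this tool.

```python
import time


def call_with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A caller would wrap the request in a zero-argument function, for example `call_with_retries(lambda: send_request(payload))`, where `send_request` stands for whatever function performs the HTTP call. In real use, retry only on errors that are likely to be transient.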

🔒 Privacy & Security

No Image Storage: Images processed in-memory only
Privacy First: No logging of uploaded images
Secure Processing: Sandboxed analysis environment
Data Protection: GDPR-compliant image handling
Content Safety: Appropriate content filtering

📝 License

MIT License - Feel free to use and modify for your projects.

🤝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

Test image analysis with different photo types
Report accuracy issues or missed objects
Suggest additional analysis features
Contribute new use cases and examples

🏷️ Tags

#mcp-server-track #computer-vision #image-analysis #ai-captioning #gradio #ai-assistant #model-context-protocol

Built with ❤️ for the MCP Hackathon