---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-server-track
  - mcp
  - computer-vision
  - image-analysis
  - ai-captioning
  - hackathon
  - gradio
python_version: 3.11.8
---

# 🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

## ✨ Features

- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **🔍 Object Detection**: Identify and locate objects, people, and scenes in images
- **📊 Content Analysis**: Extract metadata, colors, composition, and visual elements
- **🔌 MCP Server**: Compliant with the Model Context Protocol for AI assistant integration
- **🎨 Interactive UI**: Modern Gradio interface with image upload and preview
- **⚡ Fast Processing**: Efficient AI-powered visual analysis and description

## 🚀 Quick Start

### Using the Web Interface

1. **Visit this Space** and interact with the web interface directly
2. **Upload an image** using the drag-and-drop interface or file browser
3. **Select analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to get comprehensive image insights
5. **View results** including descriptions, detected objects, and metadata

### Supported Image Formats

- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- 📊 **BMP**: Bitmap images with detailed pixel analysis
- 🎭 **GIF**: Static GIF analysis (first frame)
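
A client can check that a file is one of the formats above before uploading it. The sketch below identifies a format from the leading bytes of the file; the function name `sniff_image_format` is hypothetical and is not part of this tool, and the check only inspects the signature, not whether the image decodes correctly.

```python
from typing import Optional

# Leading byte signatures for the formats listed above.
_SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format name for the first bytes of a file, or None if unrecognized."""
    for signature, name in _SIGNATURES:
        if header.startswith(signature):
            return name
    # WebP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    return None
```

Reading the first 12 bytes of a file (`open(path, "rb").read(12)`) is enough for every signature checked here.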

## 🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.
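
For context, MCP messages are JSON-RPC 2.0 objects, and a tool invocation uses the `tools/call` method. MCP client libraries normally build these messages for you, so the sketch below is only meant to show their shape. The tool name `analyze_image` and the argument names `image` and `analysis_type` are assumptions; the names this Space actually exposes may differ.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def tools_call_message(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and argument names, for illustration only.
message = tools_call_message(
    "analyze_image",
    {"image": "https://example.com/image.jpg", "analysis_type": "detailed"},
)
print(json.dumps(message, indent=2))
```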

### MCP Endpoint Details

- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **HTTP Method**: `POST`
- **Content-Type**: `application/json`

### Request Format

Send a POST request with the following JSON payload:

```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```

**Parameters:**
- `data[0]` (string): Image file path, URL, or base64-encoded image data (for example, a `data:image/jpeg;base64,...` data URI)
- `data[1]` (string): Analysis type ("caption", "objects", "detailed", "accessibility")
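
A small helper can assemble this payload and catch a mistyped analysis type before the request is sent. This is a sketch: `build_payload` is a hypothetical name, and rejecting unknown types on the client side is an assumption, since the server's own handling of unknown values is not documented here.

```python
import json

# Analysis types named in the parameter list above.
ANALYSIS_TYPES = ("caption", "objects", "detailed", "accessibility")


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the {"data": [image, analysis_type]} request body."""
    if not image:
        raise ValueError("image must be a non-empty path, URL, or base64 string")
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type!r}")
    return {"data": [image, analysis_type]}


print(json.dumps(build_payload("https://example.com/image.jpg", "detailed"), indent=2))
```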

### Response Format

Successful responses return a single result object in `data[0]`:

```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```

Error responses return a plain string in `data[0]` instead of a result object:

```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
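
Because `data[0]` is an object on success and a string on failure, client code should check its type before reading fields. The sketch below shows one way to do that; `parse_response` and `ImageAnalysisError` are hypothetical names, and the check assumes responses follow the two shapes shown above.

```python
class ImageAnalysisError(Exception):
    """Raised when the response carries an error string instead of a result object."""


def parse_response(body: dict) -> dict:
    """Return the result object from a response body, or raise ImageAnalysisError."""
    data = body.get("data")
    if not isinstance(data, list) or not data:
        raise ImageAnalysisError("response has no 'data' entries")
    first = data[0]
    if isinstance(first, str):
        # Error responses put a human-readable message in data[0].
        raise ImageAnalysisError(first)
    if not isinstance(first, dict) or first.get("status") != "success":
        raise ImageAnalysisError(f"unexpected response entry: {first!r}")
    return first
```

With this in place, `parse_response(response.json())["results"]["caption"]` either yields a caption or fails with a readable error message.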

### Example MCP Request

```bash
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
```

### Integration Examples

#### Python Integration

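```python
import base64
import mimetypes

import requests

def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the first entry of "data"."""
    # Encode the image as a base64 data URI, guessing the MIME type from the file name
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode("ascii")

    url = "https://[space-url]/gradio_api/mcp/sse"
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to about 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors instead of returning a string
    return response.json()["data"][0]

# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Error responses carry a plain string in data[0]
    print(analysis)
```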

#### Claude/AI Assistant Integration

When integrating with Claude or other AI assistants supporting MCP:

1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Use the tool in conversations: "Analyze this image and describe what you see..."
3. The AI assistant will automatically format image data and parse visual descriptions
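
For step 1, many MCP clients read a JSON configuration with an `mcpServers` map. The sketch below generates such an entry; the exact schema depends on the client, so treat the layout as an assumption and check your client's documentation. The server name `image-analysis` and the URL `https://example-space.hf.space` are placeholders.

```python
import json


def mcp_client_config(space_url: str, server_name: str = "image-analysis") -> str:
    """Return a JSON config entry pointing an MCP client at the Space's SSE endpoint."""
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    return json.dumps({"mcpServers": {server_name: {"url": endpoint}}}, indent=2)


# Placeholder URL; substitute the address of this Space.
print(mcp_client_config("https://example-space.hf.space"))
```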

## 🛠️ Technical Details

### Analysis Capabilities

- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions

- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning

- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties

### AI Models

- **Vision Model**: Advanced computer vision models via Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multi-modal AI for comprehensive understanding
- **Accuracy**: Each detected object is returned with a confidence score (for example, `0.95`)

### API Configuration

- **Image Size Limits**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds depending on analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images processed in-memory only, not stored
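
These limits can be checked on the client before uploading, which avoids waiting on a request that will be rejected. The sketch below is illustrative: `check_upload` is a hypothetical helper, it assumes "10MB" means 10 × 1024 × 1024 bytes, and it judges the format by file extension only.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # Assumes the 10MB limit is measured in binary megabytes
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with an upload; an empty list means it looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call it as `check_upload(path, os.path.getsize(path))`.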

## 🏆 Hackathon Submission

This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:

- ✅ **MCP Protocol Compliance**: Full implementation of the MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows

## 🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

1. **Content Creation**: "Describe this image for a social media caption"
2. **Accessibility Support**: "Generate alt-text for this website image"
3. **Document Analysis**: "Extract text and analyze this screenshot"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"

## 🔧 Local Development

### Prerequisites

- Python 3.11+
- Computer vision libraries (Pillow, OpenCV)
- AI model dependencies (transformers, torch)

### Installation

```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

### Testing MCP Endpoint Locally

```bash
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```

## 📊 Performance & Limitations

### Strengths
- High-quality image descriptions
- Multi-format image support
- Faster processing when GPU acceleration is available
- MCP protocol compliance
- Privacy-focused processing

### Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet access for some AI models
- Best performance with clear, well-lit images

## 🔒 Privacy & Security

- **No Image Storage**: Images processed in-memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering

## 📝 License

MIT License - Feel free to use and modify for your projects.

## 🤝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples

## 🏷️ Tags

`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`

---

**Built with ❤️ for the MCP Hackathon**