---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-server-track
  - mcp
  - computer-vision
  - image-analysis
  - ai-captioning
  - hackathon
  - gradio
python_version: 3.11.8
---

# 🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

## ✨ Features

- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **πŸ” Object Detection**: Identify and locate objects, people, and scenes in images
- **πŸ“Š Content Analysis**: Extract metadata, colors, composition, and visual elements
- **πŸ”Œ MCP Server**: Compliant with Model Context Protocol for AI assistant integration
- **🎨 Interactive UI**: Modern Gradio interface with image upload and preview
- **⚑ Fast Processing**: Efficient AI-powered visual analysis and description

## 🚀 Quick Start

### Using the Web Interface

1. **Visit this Space** and interact with the web interface directly
2. **Upload an image** using the drag-and-drop interface or file browser
3. **Select analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to get comprehensive image insights
5. **View results** including descriptions, detected objects, and metadata

### Supported Image Formats

- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- 📊 **BMP**: Bitmap images with detailed pixel analysis
- 🎭 **GIF**: Static GIF analysis (first frame)

## 🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP) for integration with AI assistants, allowing programmatic image analysis capabilities.

### MCP Endpoint Details

- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **HTTP Method**: `POST`
- **Content-Type**: `application/json`

### Request Format

Send a POST request with the following JSON payload:

```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```

**Parameters:**
- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type ("caption", "objects", "detailed", "accessibility")
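
As a minimal sketch of how a client might assemble this payload, the helper below validates the analysis type and turns a local file into a base64 data URI. The function name `build_payload` and the choice to embed local files as data URIs are assumptions for illustration, not part of this tool's code.

```python
import base64
import mimetypes
from pathlib import Path

# Analysis types listed in the parameter description above.
ANALYSIS_TYPES = {"caption", "objects", "detailed", "accessibility"}


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the {"data": [image, analysis_type]} request body.

    `image` may be an http(s) URL, a data URI, or a local file path.
    Local files are embedded as a base64 data URI (an assumption of this sketch).
    """
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type!r}")

    if image.startswith(("http://", "https://", "data:")):
        image_field = image  # already a URL or data URI, pass it through
    else:
        path = Path(image)
        # Guess the MIME type from the file name, with a generic fallback
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        image_field = f"data:{mime};base64,{encoded}"

    return {"data": [image_field, analysis_type]}


payload = build_payload("https://example.com/image.jpg", "detailed")
print(payload)
```

Rejecting an unknown analysis type on the client side avoids a round trip that would only return an error string.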

### Response Format

Successful responses return:

```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```

Error responses return:

```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
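
Because a success entry is an object and an error entry is a plain string, a client has to check the type before reading fields. The sketch below shows one way to do that, using only the field names from the examples above; `parse_result` and its `min_confidence` parameter are hypothetical names introduced for illustration.

```python
def parse_result(body: dict, min_confidence: float = 0.5) -> dict:
    """Return the caption and the labels of confidently detected objects.

    Raises RuntimeError when the first data entry is an error message.
    """
    data = body.get("data") or []
    if not data:
        raise ValueError("response has no 'data' entries")

    first = data[0]
    if isinstance(first, str):
        # Error responses carry a plain message string instead of an object
        raise RuntimeError(first)
    if first.get("status") != "success":
        raise RuntimeError(f"unexpected status: {first.get('status')!r}")

    results = first.get("results", {})
    labels = [
        obj["label"]
        for obj in results.get("objects", [])
        if obj.get("confidence", 0.0) >= min_confidence
    ]
    return {"caption": results.get("caption"), "objects": labels}


sample = {
    "data": [
        {
            "status": "success",
            "analysis_type": "detailed",
            "results": {
                "caption": "A laptop on a desk",
                "objects": [
                    {"label": "laptop", "confidence": 0.95, "location": "center-left"},
                    {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
                ],
            },
        }
    ]
}
print(parse_result(sample, min_confidence=0.9))
# prints {'caption': 'A laptop on a desk', 'objects': ['laptop']}
```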

### Example MCP Request

```bash
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
```

### Integration Examples

#### Python Integration

```python
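import base64
import mimetypes

import requests


def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the first result entry."""
    # Encode the image as a base64 data URI, guessing the MIME type from the file name
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode("ascii")

    url = "https://[space-url]/gradio_api/mcp/sse"  # replace [space-url] with your Space's host
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of returning a string
    return response.json()["data"][0]


# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Error responses carry a plain message string
    print(analysis)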
```

#### Claude/AI Assistant Integration

When integrating with Claude or other AI assistants supporting MCP:

1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Use the tool in conversations: "Analyze this image and describe what you see..."
3. The AI assistant will automatically format image data and parse visual descriptions
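
Step 1 could look roughly like the sketch below, which prints a client configuration pointing at the endpoint. The `mcpServers`/`url` layout is a convention used by several MCP clients and is assumed here; the server name `image-analysis` and the example host are hypothetical, so check your client's documentation for the exact schema it expects.

```python
import json


def mcp_client_config(space_url: str, server_name: str = "image-analysis") -> str:
    """Return a JSON client config that points at the Space's SSE endpoint.

    The "mcpServers" layout is an assumed convention, not something this
    tool defines; adjust it to match your MCP client.
    """
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    config = {"mcpServers": {server_name: {"url": endpoint}}}
    return json.dumps(config, indent=2)


# Hypothetical Space URL, used only to show the output shape
print(mcp_client_config("https://example-space.hf.space"))
```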

## 🛠️ Technical Details

### Analysis Capabilities

- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions

- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning

- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties

### AI Models

- **Vision Model**: Advanced computer vision models via Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multi-modal AI for comprehensive understanding
- **Accuracy**: High-quality results with confidence scoring

### API Configuration

- **Image Size Limits**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds depending on analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images processed in-memory only, not stored
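
A client can check the size and format limits above before uploading, which saves a slow round trip for files that would be rejected. This is a minimal sketch: the names `check_image` and `check_image_file` are hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes and that formats can be judged by file extension.

```python
from pathlib import Path

# Assumption: "10MB" is interpreted as 10 * 2**20 bytes
MAX_BYTES = 10 * 1024 * 1024
# Extensions for the supported formats listed above
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def check_image(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


def check_image_file(path: str) -> list[str]:
    """Run the same checks against a file on disk."""
    p = Path(path)
    return check_image(p.name, p.stat().st_size)


print(check_image("photo.JPG", 245 * 1024))        # no problems
print(check_image("clip.mp4", 11 * 1024 * 1024))   # two problems
```

An extension check is only a quick filter; the server still decides whether the bytes are a valid image.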

## 🏆 Hackathon Submission

This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:

- ✅ **MCP Protocol Compliance**: Full implementation of MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows

## 🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

1. **Content Creation**: "Describe this image for a social media caption"
2. **Accessibility Support**: "Generate alt-text for this website image"
3. **Document Analysis**: "Extract text and analyze this screenshot"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"

## 🔧 Local Development

### Prerequisites

- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)

### Installation

```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

### Testing MCP Endpoint Locally

```bash
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```

## 📊 Performance & Limitations

### Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing

### Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images

## 🔒 Privacy & Security

- **No Image Storage**: Images processed in-memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering

## 📝 License

MIT License - Feel free to use and modify for your projects.

## 🀝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples

## 🏷️ Tags

`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`

---

**Built with ❤️ for the MCP Hackathon**