---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- mcp
- computer-vision
- image-analysis
- ai-captioning
- hackathon
- gradio
python_version: 3.11.8
---
# 🖼️ MCP Image Analysis Tool
An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.
## ✨ Features
- **🎯 Smart Image Captioning**: Generate detailed, contextual descriptions of images
- **Object Detection**: Identify and locate objects, people, and scenes in images
- **Content Analysis**: Extract metadata, colors, composition, and visual elements
- **MCP Server**: Compliant with Model Context Protocol for AI assistant integration
- **🎨 Interactive UI**: Modern Gradio interface with image upload and preview
- **⚡ Fast Processing**: Efficient AI-powered visual analysis and description
## Quick Start
### Using the Web Interface
1. **Visit this Space** and interact with the web interface directly
2. **Upload an image** using the drag-and-drop interface or file browser
3. **Select analysis type** (caption, objects, detailed analysis)
4. **Click Analyze** to get comprehensive image insights
5. **View results** including descriptions, detected objects, and metadata
### Supported Image Formats
- 📸 **JPEG/JPG**: Standard photo format with full analysis support
- 🖼️ **PNG**: Images with transparency and high-quality graphics
- 🎨 **WebP**: Modern web format with efficient compression
- **BMP**: Bitmap images with detailed pixel analysis
- **GIF**: Analyzed as a static image (first frame only)
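A client can check the file extension before uploading. The sketch below mirrors the format list above; the helper name `is_supported_image` and the extension set are assumptions made for illustration and are not taken from this Space's code.

```python
from pathlib import Path

# Extensions derived from the "Supported Image Formats" list (assumed mapping).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a supported format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name. The server may still reject a file whose contents do not match its extension.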
## MCP Server Integration
This tool implements the Model Context Protocol (MCP) so that AI assistants and other clients can request image analysis programmatically.
### MCP Endpoint Details
- **Endpoint URL**: `https://[this-space-url]/gradio_api/mcp/sse`
- **HTTP Method**: `POST`
- **Content-Type**: `application/json`
### Request Format
Send a POST request with the following JSON payload:
```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```
**Parameters:**
- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type ("caption", "objects", "detailed", "accessibility")
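The two parameters can be assembled with a small helper. This is a minimal sketch that assumes the payload shape documented above; `build_payload`, `to_data_uri`, and `ANALYSIS_TYPES` are hypothetical names, not part of the tool.

```python
import base64
import mimetypes

# Analysis types listed in the Parameters section above.
ANALYSIS_TYPES = ("caption", "objects", "detailed", "accessibility")


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Build the request body; `image` is a path, URL, or data URI passed through as-is."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"analysis_type must be one of {ANALYSIS_TYPES}")
    return {"data": [image, analysis_type]}


def to_data_uri(raw: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URI, guessing the MIME type from the name."""
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Validating the analysis type on the client side avoids a round trip for a request the server would reject anyway.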
### Response Format
Successful responses return:
```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```
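Because each detected object carries a `confidence` score, a client may want to keep only the more certain detections. The sketch below assumes the response shape shown above; `confident_labels` and the 0.9 default threshold are illustrative choices, not values defined by the tool.

```python
def confident_labels(result: dict, threshold: float = 0.9) -> list[str]:
    """Return labels of detected objects whose confidence meets the threshold."""
    objects = result.get("results", {}).get("objects", [])
    return [obj["label"] for obj in objects if obj.get("confidence", 0.0) >= threshold]


# Sample entry trimmed from the success response above.
sample = {
    "status": "success",
    "results": {
        "objects": [
            {"label": "laptop", "confidence": 0.95, "location": "center-left"},
            {"label": "coffee cup", "confidence": 0.87, "location": "top-right"},
        ]
    },
}
print(confident_labels(sample))  # ['laptop']
```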
Error responses return:
```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
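Success and error responses share the `data` array but differ in the type of its first entry: an object on success, a plain string on error. A client can branch on that. The following sketch assumes exactly those two shapes; `parse_response` is a hypothetical helper.

```python
from typing import Optional


def parse_response(body: dict) -> tuple[Optional[dict], Optional[str]]:
    """Split a response body into (analysis, error); exactly one of the two is None."""
    entries = body.get("data") or []
    if not entries:
        return None, "response contains no data"
    first = entries[0]
    # Errors arrive as a plain message string.
    if isinstance(first, str):
        return None, first
    # Anything that is not an explicit success is treated as an error.
    if not isinstance(first, dict) or first.get("status") != "success":
        return None, f"unexpected response entry: {first!r}"
    return first, None
```

Returning a pair keeps the calling code free of string sniffing: check the error slot first, then read `results` from the analysis.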
### Example MCP Request
```bash
curl -X POST https://[space-url]/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "https://example.com/image.jpg",
      "detailed"
    ]
  }'
```
### Integration Examples
#### Python Integration
```python
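import base64
import mimetypes

import requests


def call_mcp_image_analyzer(image_path, analysis_type="caption"):
    """Send a local image to the endpoint and return the first result entry."""
    # Encode the image as a base64 data URI, guessing the MIME type from the file name
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode()

    url = "https://[space-url]/gradio_api/mcp/sse"  # replace with your Space URL
    payload = {"data": [f"data:{mime_type};base64,{img_base64}", analysis_type]}

    # Analysis can take up to about 30 seconds, so allow a generous timeout
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of returning a string
    return response.json()["data"][0]


# Example usage
analysis = call_mcp_image_analyzer("my_photo.jpg", "detailed")
if isinstance(analysis, dict):
    print(f"Caption: {analysis['results']['caption']}")
    print(f"Objects: {analysis['results']['objects']}")
else:
    # Error responses carry a plain message string
    print(analysis)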
```
#### Claude/AI Assistant Integration
When integrating with Claude or other AI assistants supporting MCP:
1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint
2. Use the tool in conversations: "Analyze this image and describe what you see..."
3. The AI assistant will automatically format image data and parse visual descriptions
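For step 1, many MCP clients read a JSON configuration containing an `mcpServers` map. The sketch below generates such an entry. The key layout is an assumption based on common client conventions, and the server name and Space URL are placeholders; check your client's documentation for the exact file location and supported keys.

```python
import json


def mcp_client_config(space_url: str, server_name: str = "image-analysis") -> str:
    """Render a client config entry pointing at the Space's MCP SSE endpoint."""
    endpoint = space_url.rstrip("/") + "/gradio_api/mcp/sse"
    # Assumed layout: {"mcpServers": {<name>: {"url": <endpoint>}}}
    return json.dumps({"mcpServers": {server_name: {"url": endpoint}}}, indent=2)


# Placeholder URL; substitute the address of your own Space.
print(mcp_client_config("https://your-space.hf.space"))
```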
## 🛠️ Technical Details
### Analysis Capabilities
- **🖼️ Image Captioning**
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions
- **🎯 Object Detection**
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning
- **🎨 Visual Analysis**
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties
### AI Models
- **Vision Model**: Advanced computer vision models via Hugging Face
- **Captioning**: Specialized image-to-text models
- **Object Detection**: YOLO-based and transformer models
- **Scene Analysis**: Multi-modal AI for comprehensive understanding
- **Accuracy**: High-quality results with confidence scoring
### API Configuration
- **Image Size Limits**: 10MB maximum file size
- **Supported Formats**: JPEG, PNG, WebP, BMP, GIF
- **Processing Time**: 5-30 seconds depending on analysis type
- **Rate Limiting**: Standard Gradio Space limits
- **Privacy**: Images processed in-memory only, not stored
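The size limit can be enforced on the client before any bytes are sent. This is a minimal sketch; it assumes "10MB" means 10 MiB (10 × 1024 × 1024 bytes), which the documentation does not specify, and the function names are hypothetical.

```python
import os

# Assumption: the documented "10MB" limit is read as 10 MiB.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def size_ok(num_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True for a non-empty payload no larger than the limit."""
    return 0 < num_bytes <= limit


def file_size_ok(path: str) -> bool:
    """Check the on-disk size of a file against the upload limit."""
    return size_ok(os.path.getsize(path))
```

Base64 encoding inflates data by roughly a third, so if the limit applies to the encoded request rather than the original file, a stricter threshold would be needed.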
## Hackathon Submission
This tool is submitted for the **MCP Server Track** of the hackathon, demonstrating:
- ✅ **MCP Protocol Compliance**: Full implementation of MCP server specification
- ✅ **Production Ready**: Enterprise-grade computer vision capabilities
- ✅ **User Experience**: Intuitive image upload with real-time preview
- ✅ **Documentation**: Comprehensive API documentation and examples
- ✅ **Integration Ready**: Easy to integrate with AI assistants and workflows
## 🎯 Use Cases for AI Assistants
When integrated with AI assistants via MCP, this tool enables:
1. **Content Creation**: "Describe this image for social media caption"
2. **Accessibility Support**: "Generate alt-text for this website image"
3. **Document Analysis**: "Extract text and analyze this screenshot"
4. **Quality Assessment**: "Analyze the composition and quality of this photo"
5. **Educational Support**: "Explain what's happening in this historical image"
## 🔧 Local Development
### Prerequisites
- Python 3.11+
- Computer vision libraries (PIL, OpenCV)
- AI model dependencies (transformers, torch)
### Installation
```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
```
### Testing MCP Endpoint Locally
```bash
# Test with curl (using image URL)
curl -X POST http://localhost:7860/gradio_api/mcp/sse \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```
## Performance & Limitations
### Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing
### Limitations
- 10MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images
## Privacy & Security
- **No Image Storage**: Images processed in-memory only
- **Privacy First**: No logging of uploaded images
- **Secure Processing**: Sandboxed analysis environment
- **Data Protection**: GDPR-compliant image handling
- **Content Safety**: Appropriate content filtering
## License
MIT License - Feel free to use and modify for your projects.
## 🤝 Contributing
This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:
- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples
## 🏷️ Tags
`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`
---
**Built with ❤️ for the MCP Hackathon**