---
title: MCP Image Analysis Tool
emoji: 🖼️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- mcp
- computer-vision
- image-analysis
- ai-captioning
- hackathon
- gradio
python_version: 3.11.8
---

# 🖼️ MCP Image Analysis Tool

An AI-powered image analysis tool that provides both a user-friendly Gradio interface and an MCP (Model Context Protocol) server endpoint for integration with AI assistants and other applications.

## ✨ Features

- 🎯 Smart Image Captioning: Generate detailed, contextual descriptions of images
- 🔍 Object Detection: Identify and locate objects, people, and scenes in images
- 📊 Content Analysis: Extract metadata, colors, composition, and visual elements
- 🔌 MCP Server: Compliant with the Model Context Protocol for AI assistant integration
- 🎨 Interactive UI: Modern Gradio interface with image upload and preview
- ⚡ Fast Processing: Results in roughly 5-30 seconds, depending on the analysis type

## 🚀 Quick Start

### Using the Web Interface

1. Visit this Space to use the web interface directly.
2. Upload an image by drag-and-drop or with the file browser.
3. Select an analysis type: caption, objects, detailed, or accessibility.
4. Click Analyze.
5. Review the results: descriptions, detected objects, and image metadata.

### Supported Image Formats

- 📸 JPEG/JPG: Standard photo format with full analysis support
- 🖼️ PNG: Images with transparency and high-quality graphics
- 🎨 WebP: Modern web format with efficient compression
- 📊 BMP: Bitmap images with detailed pixel analysis
- 🎭 GIF: Static analysis of the first frame only

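## 🔌 MCP Server Integration

This tool implements the Model Context Protocol (MCP), so AI assistants can call its image analysis programmatically.

### MCP Endpoint Details

- Endpoint URL: `https://[this-space-url]/gradio_api/mcp/sse`
- Transport: MCP over Server-Sent Events (SSE). The client opens the stream with a `GET` request; the server then announces a session URL, to which the client sends its messages with `POST`.
- Message format: JSON-RPC 2.0 (`Content-Type: application/json`)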

### Request Format

The tool takes two inputs. Through Gradio's HTTP API they are sent, in this order, as a `data` array:

```json
{
  "data": [
    "<image_file_or_base64>",
    "<analysis_type>"
  ]
}
```

Parameters:
- `data[0]` (string): Image file path, URL, or base64-encoded image data
- `data[1]` (string): Analysis type: `"caption"`, `"objects"`, `"detailed"`, or `"accessibility"`
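For a local file, `data[0]` can carry the image as base64. A minimal sketch, assuming the base64 form is a standard `data:<mime>;base64,...` URI; `build_payload` and `to_data_uri` are hypothetical helpers, and the MIME type is guessed from the file extension with the standard `mimetypes` module instead of being hard-coded.

```python
import base64
import json
import mimetypes
from pathlib import Path

ANALYSIS_TYPES = {"caption", "objects", "detailed", "accessibility"}


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image: str, analysis_type: str = "caption") -> dict:
    """Hypothetical helper: order the two inputs as data[0] and data[1]."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"analysis_type must be one of {sorted(ANALYSIS_TYPES)}")
    # URLs and existing data URIs pass through; local files become data URIs
    if not image.startswith(("http://", "https://", "data:")) and Path(image).is_file():
        image = to_data_uri(image)
    return {"data": [image, analysis_type]}


print(json.dumps(build_payload("https://example.com/image.jpg", "detailed")))
# {"data": ["https://example.com/image.jpg", "detailed"]}
```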

### Response Format

Successful responses return:

```json
{
  "data": [
    {
      "status": "success",
      "analysis_type": "detailed",
      "results": {
        "caption": "A modern office workspace with a laptop computer on a wooden desk...",
        "objects": [
          {
            "label": "laptop",
            "confidence": 0.95,
            "location": "center-left"
          },
          {
            "label": "coffee cup",
            "confidence": 0.87,
            "location": "top-right"
          }
        ],
        "scene": "indoor office",
        "colors": ["brown", "silver", "white"],
        "mood": "professional, organized",
        "accessibility_description": "Workspace image showing a laptop and coffee cup on a wooden surface"
      },
      "metadata": {
        "width": 1024,
        "height": 768,
        "format": "JPEG",
        "size_kb": 245
      }
    }
  ]
}
```

Error responses return:

```json
{
  "data": ["❌ Error: Unable to process image or unsupported format"]
}
```
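Because success and failure differ in shape (an object versus a plain string), a client should check the type of `data[0]` before indexing into it. A minimal sketch with a hypothetical `parse_analysis` helper, using the documented response shapes:

```python
import json


def parse_analysis(body: str) -> dict:
    """Return the result object from a response body, or raise on an error string."""
    first = json.loads(body)["data"][0]
    # Errors arrive as a plain "❌ Error: ..." string; successes as an object
    if isinstance(first, str):
        raise RuntimeError(first)
    if first.get("status") != "success":
        raise RuntimeError(f"unexpected status: {first.get('status')!r}")
    return first


ok = (
    '{"data": [{"status": "success", "analysis_type": "caption", '
    '"results": {"caption": "A laptop on a desk"}, '
    '"metadata": {"width": 1024, "height": 768}}]}'
)
result = parse_analysis(ok)
print(result["results"]["caption"])  # A laptop on a desk
```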

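### Example HTTP Request

Gradio's HTTP API is asynchronous: the `POST` returns an event ID (`{"event_id": "..."}`), and a second request streams the result. `[api-name]` is the endpoint name listed on the Space's "Use via API" page.

```bash
# 1. Submit the job; the reply contains an event ID
curl -X POST https://[space-url]/gradio_api/call/[api-name] \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/image.jpg", "detailed"]}'

# 2. Stream the result for that event ID
curl -N https://[space-url]/gradio_api/call/[api-name]/[event-id]
```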
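Gradio's HTTP API (`/gradio_api/call/...`) answers a submitted job with an event ID and then delivers the result as a Server-Sent Events stream rather than a single JSON body. The sketch below pulls the result out of such a stream. It assumes Gradio's `complete` and `error` event names; `read_complete_event` is hypothetical, and the sample stream is illustrative rather than captured from this Space.

```python
import json


def read_complete_event(stream_text: str) -> list:
    """Return the `data` list of the first `complete` event in an SSE stream."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event in ("complete", "error"):
            payload = line[len("data:"):].strip()
            if event == "error":
                raise RuntimeError(f"job failed: {payload}")
            return json.loads(payload)
        elif not line:
            event = None  # a blank line ends the current event
    raise ValueError("stream ended without a complete event")


sample = (
    "event: heartbeat\ndata: null\n\n"
    'event: complete\ndata: [{"status": "success", "analysis_type": "caption"}]\n\n'
)
print(read_complete_event(sample)[0]["status"])  # success
```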

### Integration Examples

#### Python Integration

The `gradio_client` package (`pip install gradio_client`) takes care of file upload and the asynchronous call. This example assumes the image input is a Gradio image/file component:

```python
from gradio_client import Client, handle_file

# Replace with this Space's URL or its "owner/space-name" ID
client = Client("https://[space-url]/")


def call_image_analyzer(image_path, analysis_type="caption"):
    # handle_file() accepts a local path or a URL and sends it as a file input.
    # Assumes the Space exposes one API endpoint; otherwise pass
    # api_name="/<endpoint-name>" as listed on the Space's "Use via API" page.
    result = client.predict(handle_file(image_path), analysis_type)
    # Failures come back as an "❌ Error: ..." string instead of a dict
    if isinstance(result, str):
        raise RuntimeError(result)
    return result


# Example usage
analysis = call_image_analyzer("my_photo.jpg", "detailed")
print(f"Caption: {analysis['results']['caption']}")
print(f"Objects: {analysis['results']['objects']}")
```

#### Claude/AI Assistant Integration

When integrating with Claude or other MCP-capable AI assistants:

1. Configure the MCP client to point to this Space's `/gradio_api/mcp/sse` endpoint.
2. Invoke the tool in conversation, e.g. "Analyze this image and describe what you see..."
3. The assistant calls the tool with the image and an analysis type, then works with the returned description.
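Clients that read a JSON `mcpServers` configuration (the layout Claude Desktop uses) can be pointed at the Space as sketched below, generated here with Python's `json` module. Clients that only launch local command-based servers can go through the `mcp-remote` npm bridge, which requires Node.js for `npx`. The server key `image-analyzer` and the URL placeholder are assumptions; check your client's documentation for the exact schema it accepts.

```python
import json

SSE_URL = "https://[space-url]/gradio_api/mcp/sse"  # replace with this Space's URL

# Option 1: clients that accept an SSE URL directly
direct = {"mcpServers": {"image-analyzer": {"url": SSE_URL}}}

# Option 2: clients that only launch local commands, bridged via mcp-remote
bridged = {
    "mcpServers": {
        "image-analyzer": {"command": "npx", "args": ["mcp-remote", SSE_URL]}
    }
}

print(json.dumps(bridged, indent=2))
```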

## 🛠️ Technical Details

### Analysis Capabilities

- 🖼️ Image Captioning
  - Detailed scene descriptions
  - Context-aware narratives
  - Multiple caption styles (descriptive, creative, technical)
  - Accessibility-focused descriptions

- 🎯 Object Detection
  - Common objects and items
  - People and faces (privacy-conscious)
  - Animals and nature elements
  - Text and document detection
  - Spatial relationships and positioning

- 🎨 Visual Analysis
  - Color palette extraction
  - Composition analysis
  - Mood and atmosphere detection
  - Style and aesthetic classification
  - Technical image properties
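Color palette extraction can be as simple as snapping each pixel to the nearest entry in a small named palette and counting. This is a stand-in technique for illustration, not necessarily what this tool does; the reference palette and the `palette` helper are assumptions.

```python
from collections import Counter

# Hypothetical reference palette; a real tool might use a larger one or k-means
NAMED_COLORS = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "silver": (192, 192, 192),
    "brown": (139, 90, 43),
    "red": (220, 20, 60),
    "green": (34, 139, 34),
    "blue": (30, 144, 255),
    "yellow": (255, 215, 0),
}


def nearest_name(rgb):
    """Name of the reference color closest to `rgb` (squared Euclidean distance)."""
    return min(
        NAMED_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name])),
    )


def palette(pixels, k=3):
    """Return the k most frequent named colors in an iterable of (R, G, B) pixels."""
    counts = Counter(nearest_name(p) for p in pixels)
    return [name for name, _ in counts.most_common(k)]


pixels = [(140, 92, 40)] * 50 + [(200, 200, 205)] * 30 + [(250, 250, 250)] * 20
print(palette(pixels))  # ['brown', 'silver', 'white']
```

With Pillow, `Image.open(path).convert("RGB").getdata()` yields pixels in this `(R, G, B)` form; downscaling the image first keeps the count cheap.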

### AI Models

- Vision Model: Computer vision models served via Hugging Face
- Captioning: Image-to-text models
- Object Detection: YOLO-based and transformer-based detectors
- Scene Analysis: Multimodal models for whole-scene understanding
- Confidence Scoring: Each detected object carries a confidence score (the `confidence` field in the response)
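Detectors such as YOLO return pixel bounding boxes, while the response reports coarse positions such as `"center-left"` and `"top-right"`. One hypothetical way to derive such labels is to place each box's centre on a 3x3 grid; the grid and the `coarse_location` helper are assumptions, not this tool's documented method.

```python
ROWS = ["top", "center", "bottom"]
COLS = ["left", "center", "right"]


def coarse_location(box, width, height):
    """Map an (x_min, y_min, x_max, y_max) pixel box to a 3x3 grid label."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / width   # box centre as a fraction of the image width
    cy = (y_min + y_max) / 2 / height  # ... and of the image height
    row = ROWS[min(int(cy * 3), 2)]    # min() keeps a centre on the far edge in range
    col = COLS[min(int(cx * 3), 2)]
    if row == col == "center":
        return "center"
    return f"{row}-{col}"


print(coarse_location((100, 300, 400, 500), 1024, 768))  # center-left
print(coarse_location((800, 50, 950, 200), 1024, 768))   # top-right
```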

### API Configuration

- Image Size Limit: 10 MB maximum per file
- Supported Formats: JPEG, PNG, WebP, BMP, GIF
- Processing Time: 5-30 seconds, depending on the analysis type
- Rate Limiting: Standard Gradio Space limits
- Privacy: Images are processed in memory only and are not stored
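The same limits can be captured as client-side configuration so that oversized or unsupported uploads fail fast. A minimal sketch: `UploadLimits` and `check_upload` are hypothetical, "10 MB" is read as 10 × 1024 × 1024 bytes, and the 60-second request timeout is an assumed margin over the documented 5-30 seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadLimits:
    """Hypothetical client-side mirror of the documented API limits."""
    max_bytes: int = 10 * 1024 * 1024  # "10 MB", read as binary megabytes
    formats: frozenset = frozenset({"JPEG", "PNG", "WEBP", "BMP", "GIF"})
    request_timeout_s: float = 60.0    # assumed margin over the 5-30 s processing time


def check_upload(size_bytes: int, fmt: str, limits: UploadLimits = UploadLimits()) -> list[str]:
    """Return the problems found; an empty list means the upload is within limits."""
    problems = []
    if size_bytes > limits.max_bytes:
        mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {mb:.1f} MB; limit is {limits.max_bytes // (1024 * 1024)} MB")
    if fmt.upper() not in limits.formats:
        problems.append(f"unsupported format {fmt!r}")
    return problems


print(check_upload(245 * 1024, "jpeg"))  # []
print(check_upload(12 * 1024 * 1024, "TIFF"))
```

The timeout is meant to be handed to whatever HTTP client you use, e.g. `requests.post(..., timeout=UploadLimits().request_timeout_s)`.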

## 🏆 Hackathon Submission

This tool is submitted for the MCP Server Track of the hackathon, demonstrating:

- ✅ MCP Protocol Compliance: Full implementation of MCP server specification
- ✅ Production Ready: Enterprise-grade computer vision capabilities
- ✅ User Experience: Intuitive image upload with real-time preview
- ✅ Documentation: Comprehensive API documentation and examples
- ✅ Integration Ready: Easy to integrate with AI assistants and workflows

## 🎯 Use Cases for AI Assistants

When integrated with AI assistants via MCP, this tool enables:

1. Content Creation: "Describe this image for a social media caption"
2. Accessibility Support: "Generate alt text for this website image"
3. Document Analysis: "Extract the text from this screenshot and analyze it"
4. Quality Assessment: "Analyze the composition and quality of this photo"
5. Educational Support: "Explain what's happening in this historical image"

## 🔧 Local Development

### Prerequisites

- Python 3.11+
- Computer vision libraries (Pillow/PIL, OpenCV)
- AI model dependencies (transformers, torch)

### Installation

```bash
# Clone this repository
git clone [repository-url]
cd mcp_image_tool_gradio

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

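### Testing MCP Endpoint Locally

```bash
# Check that the MCP server is running: the SSE stream should open with an
# "endpoint" event (press Ctrl+C to stop)
curl -N http://localhost:7860/gradio_api/mcp/sse

# Call the tool through Gradio's HTTP API (the reply is an event ID for fetching the result)
curl -X POST http://localhost:7860/gradio_api/call/[api-name] \
  -H "Content-Type: application/json" \
  -d '{"data": ["https://example.com/test-image.jpg", "caption"]}'
```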
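After opening `/gradio_api/mcp/sse`, an MCP client's first job is to read the `endpoint` event, whose data is the URL it must POST messages to (often a relative path). A minimal sketch with a hypothetical `session_url` helper; the messages path and session ID in the sample are illustrative.

```python
from urllib.parse import urljoin


def session_url(sse_url: str, stream_text: str) -> str:
    """Find the MCP `endpoint` event and resolve its URI against the SSE URL."""
    event = None
    for line in stream_text.splitlines():  # also handles \r\n line endings
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "endpoint":
            return urljoin(sse_url, line[len("data:"):].strip())
        elif not line:
            event = None  # a blank line ends the current event
    raise ValueError("no endpoint event found; is the MCP server enabled?")


# Illustrative first event (the session ID is made up)
first_event = "event: endpoint\ndata: /gradio_api/mcp/messages/?session_id=abc123\n\n"
print(session_url("http://localhost:7860/gradio_api/mcp/sse", first_event))
# http://localhost:7860/gradio_api/mcp/messages/?session_id=abc123
```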

## 📊 Performance & Limitations

### Strengths
- High-quality image descriptions
- Multi-format image support
- Fast processing with GPU acceleration
- MCP protocol compliance
- Privacy-focused processing

### Limitations
- 10 MB file size limit
- Processing time varies with image complexity
- Limited to static images (no video)
- Requires internet for some AI models
- Best performance with clear, well-lit images

## 🔒 Privacy & Security

- No Image Storage: Images processed in-memory only
- Privacy First: No logging of uploaded images
- Secure Processing: Sandboxed analysis environment
- Data Protection: GDPR-compliant image handling
- Content Safety: Appropriate content filtering

## 📝 License

MIT License - Feel free to use and modify for your projects.

## 🤝 Contributing

This is a hackathon submission, but feedback and suggestions are welcome! Feel free to:

- Test image analysis with different photo types
- Report accuracy issues or missed objects
- Suggest additional analysis features
- Contribute new use cases and examples

## 🏷️ Tags

`#mcp-server-track` `#computer-vision` `#image-analysis` `#ai-captioning` `#gradio` `#ai-assistant` `#model-context-protocol`

---

Built with ❤️ for the MCP Hackathon