# Hugging Face Spaces Deployment Guide

## Overview

This CX AI Agent is designed to run as a single Gradio app on Hugging Face Spaces without requiring separate server processes.

## Architecture for HF Spaces

### In-Memory Mode

The app automatically uses in-memory services instead of HTTP MCP servers:

```
Gradio App
├── Web Search Service (DuckDuckGo)
├── In-Memory MCP Services
│   ├── Store Service (in-memory)
│   ├── Search Service (web search wrapper)
│   ├── Email Service (simulated)
│   └── Calendar Service (simulated)
└── AI Agents (Hunter, Enricher, etc.)
```

### Key Features for HF Spaces

1. **No Separate Processes** - Everything runs in a single Gradio app
2. **No Port Management** - All services are in-memory
3. **Free Web Search** - Uses DuckDuckGo (no API key)
4. **Rate Limiting Protection** - Built-in delays and retry logic
5. **Error Handling** - Graceful fallbacks when search fails

---

## Deployment Steps

### 1. Create HF Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Choose:
   - **SDK**: Gradio
   - **Python version**: 3.10
   - **Space hardware**: CPU Basic (free tier)

### 2. Upload Files

Upload these files to your Space:

**Required Files:**
- `app.py` - Main Gradio app
- `requirements_gradio.txt` - Dependencies
- `README.md` - Space description

**Application Code:**
- `app/` - Main application logic
- `agents/` - AI agents
- `mcp/` - MCP services (in-memory)
- `services/` - Web search and discovery
- `vector/` - Vector store and embeddings
- `data/` - Data files

### 3. Set Environment Variables

In your Space settings, add:

```bash
# Required
HF_API_TOKEN=your_hf_token_here

# Optional (with defaults)
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
USE_IN_MEMORY_MCP=true  # Always true for HF Spaces
```

### 4. Configure Space

**app.py** should be at the root level:

```
your-space/
├── app.py                    # Main Gradio app
├── requirements_gradio.txt
├── README.md
├── app/
├── agents/
├── mcp/
├── services/
├── vector/
└── data/
```

### 5. Dependencies

**requirements_gradio.txt** should include:

```txt
gradio==5.5.0
huggingface-hub>=0.19.3,<1.0
transformers>=4.36.0,<5.0
fastapi==0.109.0
pydantic==2.5.3
aiohttp==3.9.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4
duckduckgo-search==4.1.1
email-validator==2.1.0
python-dotenv==1.0.0
pandas==2.1.4
```

---

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_API_TOKEN` | Yes | - | HuggingFace API token for inference |
| `USE_IN_MEMORY_MCP` | No | `true` | Use in-memory services (always true for HF Spaces) |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-7B-Instruct` | LLM model for content generation |

### Automatic Mode Detection

The app automatically detects the HF Spaces environment and uses in-memory mode:

```python
# In mcp/registry.py
USE_IN_MEMORY_MODE = os.getenv("USE_IN_MEMORY_MCP", "true").lower() == "true"
```

---

## Rate Limiting

### DuckDuckGo Rate Limits

The web search service includes protection against rate limiting:

**Built-in Protection:**
- 2-second delay between requests
- Exponential backoff on rate limit errors (5s, 10s, 20s)
- Maximum 3 retry attempts per query
- Fresh DDGS instance for each request

**Configuration:**

```python
# In services/web_search.py
WebSearchService(
    max_results=10,
    rate_limit_delay=2.0  # Seconds between requests
)
```

### Handling Rate Limits

If you encounter rate limits:

1. **Reduce Company Count** - Process fewer companies at once
2. **Increase Delay** - Modify `rate_limit_delay` in `web_search.py`
3. **Use Fallbacks** - The system automatically uses fallback data

---

## Performance

### Expected Times (Per Company)

| Phase | Time | Notes |
|-------|------|-------|
| Discovery | 5-10s | Web search for company info |
| Enrichment | 5-10s | Web search for facts/news |
| Contact Finding | 3-5s | Web search for prospects |
| Content Generation | 10-20s | LLM generation with HF API |
| **Total** | **25-45s** | Per company |

### Optimization Tips

1. **Single Company** - Start with one company to test
2. **Batch Processing** - Process multiple companies sequentially
3. **Caching** - Results are cached in the in-memory store
4. **Error Handling** - Fallbacks keep the pipeline moving

---

## Troubleshooting

### Issue: Rate Limit Errors

**Symptoms:**
```
DuckDuckGoSearchException: Ratelimit
```

**Solutions:**
1. Wait 1-2 minutes and try again
2. Process fewer companies
3. The system will automatically retry with backoff

### Issue: Slow Performance

**Symptoms:** Pipeline takes >60s per company

**Solutions:**
1. This is normal for web search (30-60s expected)
2. Use the CPU Basic tier (free)
3. Consider upgrading to CPU Upgrade ($9/month) for faster processing

### Issue: Memory Errors

**Symptoms:**
```
MemoryError or Out of Memory
```

**Solutions:**
1. Process companies one at a time
2. Clear the store between runs: `store.clear_all()`
3. Upgrade to a higher-tier Space

### Issue: HF API Errors

**Symptoms:**
```
HuggingFaceAPIError or 503 errors
```

**Solutions:**
1. Check that `HF_API_TOKEN` is valid
2. Verify the model name is correct
3. Check HF API status
4. Wait and retry (HF API rate limits)

---

## Space Configuration

### Recommended Settings

**Hardware:**
- Free tier: CPU Basic (sufficient for demo)
- Production: CPU Upgrade (faster, $9/month)

**Visibility:**
- Public: Anyone can use
- Private: Only you can access

**Sleep Mode:**
- Disabled: Always on (requires paid plan)
- Enabled: Sleeps after inactivity (free tier)

### README.md

Include in your Space's README:

```markdown
---
title: CX AI Agent - Dynamic Discovery
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---

# CX AI Agent - Dynamic Discovery Edition

Autonomous multi-agent system for customer experience research and outreach.

## Features

- 🔍 Dynamic company discovery via web search
- 🌐 Live data from DuckDuckGo (no API key needed)
- 👥 Real prospect finding
- ✍️ AI-generated personalized outreach
- ✅ Compliance checking

## Usage

1. Enter a company name (e.g., "Shopify")
2. Click "Discover & Process"
3. Watch real-time discovery and content generation!

## Performance

- ~30-60 seconds per company
- Uses free DuckDuckGo search
- HuggingFace Inference API for LLM

## Limitations

- Free tier may have rate limits
- Web search can be slow
- Demo mode for email/calendar services
```

---

## Advanced Configuration

### Custom Model

Change the LLM model in Space secrets:

```bash
MODEL_NAME=meta-llama/Llama-2-7b-chat-hf
# or
MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
```

### Adjust Rate Limiting

Edit `services/web_search.py`:

```python
def __init__(self, max_results: int = 10, rate_limit_delay: float = 3.0):
    # Increase the delay between requests to 3 seconds
```

### Reduce Search Queries

Edit `services/company_discovery.py` to reduce queries:

```python
# Reduce from 4 queries to 2
queries = [
    f"{company_name} official website",
    f"{company_name} industry business",
]
```

---

## Cost Estimation

### Free Tier

- **Compute**: Free (CPU Basic)
- **Storage**: Free (up to 50GB)
- **DuckDuckGo**: Free (no limits)
- **HF Inference API**: Free tier (limited)

**Limitations:**
- May sleep after inactivity
- Rate limits on HF API
- Slower performance

### Paid Tier

**CPU Upgrade ($9/month):**
- Always on
- Faster processing
- Higher priority

**Resources:**
- 2 vCPU cores
- 16GB RAM
- 50GB storage

---

## Monitoring

### View Logs

Check Space logs for:
- Web search requests
- Rate limit warnings
- Error messages
- Performance metrics

### Health Check

Use the System tab in the UI:
- MCP services status
- Vector store status
- Model configuration

---

## Security

### API Tokens

**Never commit tokens to Git!** Use Space secrets:

1. Go to Space → Settings → Secrets
2. Add `HF_API_TOKEN`
3. Reference in code: `os.getenv("HF_API_TOKEN")`

### Data Privacy

- No data is stored permanently (in-memory only)
- Web searches are anonymous (DuckDuckGo)
- HF API calls are private to your account

---

## Support

For issues:
1. Check logs in the Space console
2. Review error messages
3. See `TROUBLESHOOTING.md`
4. Open a GitHub issue

---

## License

Same as the main project - see the LICENSE file.
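## Appendix: Example Sketches

The upload layout described in the Deployment Steps can be sanity-checked locally before pushing to the Space. The helper below is a hypothetical convenience script, not part of the project; `check_space_layout` and its constants are illustrative names based on the file list above.

```python
from pathlib import Path

# Files and directories the Space expects at the repository root
REQUIRED_FILES = ["app.py", "requirements_gradio.txt", "README.md"]
REQUIRED_DIRS = ["app", "agents", "mcp", "services", "vector", "data"]

def check_space_layout(root: str = ".") -> list[str]:
    """Return a list of missing files/directories for the Space layout."""
    base = Path(root)
    missing = [f for f in REQUIRED_FILES if not (base / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (base / d).is_dir()]
    return missing
```

Run it in your project root before uploading; an empty list means the layout matches.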
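The retry behavior described in the Rate Limiting section (exponential backoff of 5s → 10s → 20s, maximum 3 attempts) can be sketched as a generic wrapper. This is a minimal illustration under stated assumptions, not the project's actual `web_search.py` code; `search_with_backoff` and `RateLimitError` are assumed names standing in for the real search call and its rate-limit exception.

```python
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit exception raised by the search backend."""

def search_with_backoff(search_fn, query, retries=3, base_delay=5.0):
    """Call search_fn(query), retrying on rate limits with exponential backoff."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return search_fn(query)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error to the caller
            time.sleep(delay)   # wait 5s, then 10s, then 20s
            delay *= 2          # exponential backoff
```

The same pattern applies whether the underlying call is a DuckDuckGo query or any other rate-limited API.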
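As noted in the Security section, the token should come from Space secrets rather than the repository, and is read at runtime via `os.getenv("HF_API_TOKEN")`. A minimal sketch of reading it at startup and failing fast with a clear message when it is missing (`get_hf_token` is a hypothetical helper, not part of the project):

```python
import os

def get_hf_token() -> str:
    """Read the HF API token from the Space secret; fail fast if missing."""
    token = os.getenv("HF_API_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set. Add it under Space -> Settings -> Secrets."
        )
    return token
```

Failing at startup with an explicit message is easier to debug from the Space logs than a 401 from the Inference API midway through the pipeline.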