# Hugging Face Spaces Deployment Guide

## Overview

This CX AI Agent is designed to run as a single Gradio app on Hugging Face Spaces without requiring separate server processes.

## Architecture for HF Spaces

### In-Memory Mode

The app automatically uses in-memory services instead of HTTP MCP servers:

```
Gradio App
├── Web Search Service (DuckDuckGo)
├── In-Memory MCP Services
│   ├── Store Service (in-memory)
│   ├── Search Service (web search wrapper)
│   ├── Email Service (simulated)
│   └── Calendar Service (simulated)
└── AI Agents (Hunter, Enricher, etc.)
```

### Key Features for HF Spaces

1. **No Separate Processes** - Everything runs in a single Gradio app
2. **No Port Management** - All services are in-memory
3. **Free Web Search** - Uses DuckDuckGo (no API key)
4. **Rate Limiting Protection** - Built-in delays and retry logic
5. **Error Handling** - Graceful fallbacks when search fails

---

## Deployment Steps

### 1. Create HF Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Choose:
   - **SDK**: Gradio
   - **Python version**: 3.10
   - **Space hardware**: CPU Basic (free tier)

### 2. Upload Files

Upload these files to your Space:

**Required Files:**
- `app.py` - Main Gradio app
- `requirements_gradio.txt` - Dependencies
- `README.md` - Space description

**Application Code:**
- `app/` - Main application logic
- `agents/` - AI agents
- `mcp/` - MCP services (in-memory)
- `services/` - Web search and discovery
- `vector/` - Vector store and embeddings
- `data/` - Data files

### 3. Set Environment Variables

In your Space settings, add:

```bash
# Required
HF_API_TOKEN=your_hf_token_here

# Optional (with defaults)
MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
USE_IN_MEMORY_MCP=true  # Always true for HF Spaces
```

### 4. Configure Space

**app.py** should be at the root level:

```
your-space/
├── app.py                    # Main Gradio app
├── requirements_gradio.txt
├── README.md
├── app/
├── agents/
├── mcp/
├── services/
├── vector/
└── data/
```

### 5. Dependencies

**requirements_gradio.txt** should include:

```txt
gradio==5.5.0
huggingface-hub>=0.19.3,<1.0
transformers>=4.36.0,<5.0
fastapi==0.109.0
pydantic==2.5.3
aiohttp==3.9.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4
duckduckgo-search==4.1.1
email-validator==2.1.0
python-dotenv==1.0.0
pandas==2.1.4
```

---

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_API_TOKEN` | Yes | - | HuggingFace API token for inference |
| `USE_IN_MEMORY_MCP` | No | `true` | Use in-memory services (always true for HF Spaces) |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-7B-Instruct` | LLM model for content generation |

### Automatic Mode Detection

The app automatically detects the HF Spaces environment and uses in-memory mode:

```python
# In mcp/registry.py
USE_IN_MEMORY_MODE = os.getenv("USE_IN_MEMORY_MCP", "true").lower() == "true"
```

---

## Rate Limiting

### DuckDuckGo Rate Limits

The web search service includes protection against rate limiting:

**Built-in Protection:**
- 2-second delay between requests
- Exponential backoff on rate limit errors (5s, 10s, 20s)
- Maximum 3 retry attempts per query
- Fresh DDGS instance for each request

**Configuration:**

```python
# In services/web_search.py
WebSearchService(
    max_results=10,
    rate_limit_delay=2.0  # Seconds between requests
)
```

### Handling Rate Limits

If you encounter rate limits:

1. **Reduce Company Count** - Process fewer companies at once
2. **Increase Delay** - Modify `rate_limit_delay` in `web_search.py`
3. **Use Fallbacks** - The system automatically uses fallback data

---

## Performance

### Expected Times (Per Company)

| Phase | Time | Notes |
|-------|------|-------|
| Discovery | 5-10s | Web search for company info |
| Enrichment | 5-10s | Web search for facts/news |
| Contact Finding | 3-5s | Web search for prospects |
| Content Generation | 10-20s | LLM generation with HF API |
| **Total** | **25-45s** | Per company |

### Optimization Tips

1. **Single Company** - Start with one company to test
2. **Batch Processing** - Process multiple companies sequentially
3. **Caching** - Results are cached in the in-memory store
4. **Error Handling** - Fallbacks keep the pipeline moving

---

## Troubleshooting

### Issue: Rate Limit Errors

**Symptoms:**
```
DuckDuckGoSearchException: Ratelimit
```

**Solutions:**
1. Wait 1-2 minutes and try again
2. Process fewer companies
3. The system will automatically retry with backoff

### Issue: Slow Performance

**Symptoms:** Pipeline takes >60s per company

**Solutions:**
1. This is normal for web search (30-60s expected)
2. Use the CPU Basic tier (free)
3. Consider upgrading to CPU Upgrade ($9/month) for faster processing

### Issue: Memory Errors

**Symptoms:**
```
MemoryError or Out of Memory
```

**Solutions:**
1. Process companies one at a time
2. Clear the store between runs: `store.clear_all()`
3. Upgrade to a higher-tier Space

### Issue: HF API Errors

**Symptoms:**
```
HuggingFaceAPIError or 503 errors
```

**Solutions:**
1. Check that `HF_API_TOKEN` is valid
2. Verify the model name is correct
3. Check HF API status
4. Wait and retry (HF API rate limits)

---

## Space Configuration

### Recommended Settings

**Hardware:**
- Free tier: CPU Basic (sufficient for demo)
- Production: CPU Upgrade (faster, $9/month)

**Visibility:**
- Public: Anyone can use
- Private: Only you can access

**Sleep Mode:**
- Disabled: Always on (requires paid plan)
- Enabled: Sleeps after inactivity (free tier)

### README.md

Include in your Space's README:

```markdown
---
title: CX AI Agent - Dynamic Discovery
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
---

# CX AI Agent - Dynamic Discovery Edition

Autonomous multi-agent system for customer experience research and outreach.

## Features

- 🔍 Dynamic company discovery via web search
- 🌐 Live data from DuckDuckGo (no API key needed)
- 👥 Real prospect finding
- ✍️ AI-generated personalized outreach
- ✅ Compliance checking

## Usage

1. Enter a company name (e.g., "Shopify")
2. Click "Discover & Process"
3. Watch real-time discovery and content generation!

## Performance

- ~30-60 seconds per company
- Uses free DuckDuckGo search
- HuggingFace Inference API for LLM

## Limitations

- Free tier may have rate limits
- Web search can be slow
- Demo mode for email/calendar services
```

---

## Advanced Configuration

### Custom Model

Change the LLM model in Space secrets:

```bash
MODEL_NAME=meta-llama/Llama-2-7b-chat-hf
# or
MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
```

### Adjust Rate Limiting

Edit `services/web_search.py`:

```python
def __init__(self, max_results: int = 10, rate_limit_delay: float = 3.0):
    # Increase the delay between requests to 3 seconds
```

### Reduce Search Queries

Edit `services/company_discovery.py` to reduce queries:

```python
# Reduce from 4 queries to 2
queries = [
    f"{company_name} official website",
    f"{company_name} industry business",
]
```

---

## Cost Estimation

### Free Tier

- **Compute**: Free (CPU Basic)
- **Storage**: Free (up to 50GB)
- **DuckDuckGo**: Free (no limits)
- **HF Inference API**: Free tier (limited)

**Limitations:**
- May sleep after inactivity
- Rate limits on HF API
- Slower performance

### Paid Tier

**CPU Upgrade ($9/month):**
- Always on
- Faster processing
- Higher priority

**Resources:**
- 2 vCPU cores
- 16GB RAM
- 50GB storage

---

## Monitoring

### View Logs

Check Space logs for:
- Web search requests
- Rate limit warnings
- Error messages
- Performance metrics

### Health Check

Use the System tab in the UI:
- MCP services status
- Vector store status
- Model configuration

---

## Security

### API Tokens

**Never commit tokens to Git!** Use Space secrets:

1. Go to Space → Settings → Secrets
2. Add `HF_API_TOKEN`
3. Reference in code: `os.getenv("HF_API_TOKEN")`

### Data Privacy

- No data is stored permanently (in-memory only)
- Web searches are anonymous (DuckDuckGo)
- HF API calls are private to your account

---

## Support

For issues:
1. Check logs in the Space console
2. Review error messages
3. See `TROUBLESHOOTING.md`
4. Open a GitHub issue

---

## License

Same as the main project - see the LICENSE file.
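## Appendix: Example Sketches

The upload layout described in the Deployment Steps can be sanity-checked locally before pushing to the Space. The helper below is a hypothetical convenience script, not part of the project; `check_space_layout` and its constants are illustrative names based on the file list above.

```python
from pathlib import Path

# Files and directories the Space expects at the repository root
REQUIRED_FILES = ["app.py", "requirements_gradio.txt", "README.md"]
REQUIRED_DIRS = ["app", "agents", "mcp", "services", "vector", "data"]

def check_space_layout(root: str = ".") -> list[str]:
    """Return a list of missing files/directories for the Space layout."""
    base = Path(root)
    missing = [f for f in REQUIRED_FILES if not (base / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (base / d).is_dir()]
    return missing
```

Run it in your project root before uploading; an empty list means the layout matches.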
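The retry behavior described in the Rate Limiting section (exponential backoff of 5s → 10s → 20s, maximum 3 attempts) can be sketched as a generic wrapper. This is a minimal illustration under stated assumptions, not the project's actual `web_search.py` code; `search_with_backoff` and `RateLimitError` are assumed names standing in for the real search call and its rate-limit exception.

```python
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit exception raised by the search backend."""

def search_with_backoff(search_fn, query, retries=3, base_delay=5.0):
    """Call search_fn(query), retrying on rate limits with exponential backoff."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return search_fn(query)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error to the caller
            time.sleep(delay)   # wait 5s, then 10s, then 20s
            delay *= 2          # exponential backoff
```

The same pattern applies whether the underlying call is a DuckDuckGo query or any other rate-limited API.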
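As noted in the Security section, the token should come from Space secrets rather than the repository, and is read at runtime via `os.getenv("HF_API_TOKEN")`. A minimal sketch of reading it at startup and failing fast with a clear message when it is missing (`get_hf_token` is a hypothetical helper, not part of the project):

```python
import os

def get_hf_token() -> str:
    """Read the HF API token from the Space secret; fail fast if missing."""
    token = os.getenv("HF_API_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set. Add it under Space -> Settings -> Secrets."
        )
    return token
```

Failing at startup with an explicit message is easier to debug from the Space logs than a 401 from the Inference API midway through the pipeline.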