---
title: AI Digital Library Assistant
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.38.0"
app_file: app.py
pinned: false
license: mit
tags:
  - mcp-in-action-track-consumer
  - mcp-in-action-track-creative
  - building-mcp-track-consumer
  - building-mcp-track-creative
  - MCP-1st-Birthday
---

Demo link: https://youtu.be/09Lls0zJ-QE

Social media post: https://x.com/nihald2000/status/1995198714156286290?s=20

The **AI Digital Library Assistant** is a knowledge management tool built for the **MCP 1st Birthday Hackathon**. It turns a static document collection into an interactive library that you can search, question, and listen to. Unlike traditional RAG (Retrieval-Augmented Generation) apps, this project uses the **Model Context Protocol (MCP)** to expose a modular set of tools (Ingestion, Search, and Podcast Generation) that work together to help you consume information in the way that suits *you* best.

```mermaid
graph TD
    User((👤 User))

    subgraph "Frontend (Gradio)"
        UI[Web Interface]
        PodcastUI[Podcast Studio]
    end

    subgraph "MCP Server Layer"
        MCPServer[Content Organizer MCP Server]
        subgraph "MCP Tools"
            IngestTool[📥 Ingestion Tool]
            SearchTool[🔍 Search Tool]
            GenTool[✨ Generative Tool]
            PodTool[🎧 Podcast Tool]
        end
    end

    subgraph "Service Layer"
        VecStore[(Vector Store)]
        DocStore[(Document Store)]
        LLM["LLM Service (OpenAI / Nebius AI)"]
        ElevenLabs[ElevenLabs API]
        LlamaIndex[LlamaIndex Agent]
    end

    User <--> UI
    UI <--> MCPServer
    MCPServer --> IngestTool
    MCPServer --> SearchTool
    MCPServer --> GenTool
    MCPServer --> PodTool
    IngestTool --> VecStore
    IngestTool --> DocStore
    SearchTool --> VecStore
    GenTool --> LLM
    PodTool --> LlamaIndex
    PodTool --> ElevenLabs
    PodTool --> LLM
```

![AI LIB](https://cdn-uploads.huggingface.co/production/uploads/66f1712d906c08084995f808/TSJexR45eNpUjHhbHDOag.png)

## 🚀 Quick Start

Check out [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.

1. **Clone & Install**:
   ```bash
   git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant
   cd AiDigitalLibraryAssistant
   pip install -r requirements.txt
   ```
2. **Configure**: Add your `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` to `.env`.
3. **Run**: `python app.py`

## 💡 How It Works

### 1. The MCP Core

At the heart of the application is the `AiDigitalLibraryAssistant` MCP server. It exposes atomic capabilities (tools) that the frontend consumes. This means the same tools powering this UI can also be connected to Claude Desktop or any other MCP client:

```json
{
  "mcpServers": {
    "ai-library": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
      ]
    }
  }
}
```

### 2. 🎧 Podcast Studio (Star Feature)

Turn your reading list into a playlist! The **Podcast Studio** is the flagship feature: it transforms any selection of documents into an engaging, multi-speaker audio podcast.

- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
- **Multi-Speaker Synthesis**: Uses **ElevenLabs** to voice the script with a distinct, realistic voice for each host.
- **Customizable**: Choose your style (Educational, Casual, Teaching) and duration.
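The hand-off between scripting and synthesis described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Segment` type, the `Host 1:` / `Host 2:` line format, and the placeholder voice names are all assumptions made for the example. In the real app, the script comes from LlamaIndex and the LLM, and each segment would be sent to ElevenLabs with a real voice ID.

```python
from dataclasses import dataclass

# Hypothetical speaker-to-voice mapping; real voice IDs would come from ElevenLabs.
VOICES = {"HOST 1": "voice-a", "HOST 2": "voice-b"}


@dataclass
class Segment:
    speaker: str
    voice: str
    text: str


def parse_script(script: str, voices: dict[str, str] = VOICES) -> list[Segment]:
    """Split a 'Speaker: text' script into segments, each tagged with a voice."""
    segments: list[Segment] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip().upper()
        if sep and speaker in voices:
            # A known speaker label starts a new turn.
            segments.append(Segment(speaker, voices[speaker], text.strip()))
        elif segments:
            # Any other line continues the previous speaker's turn.
            segments[-1].text += " " + line
    return segments


script = """
Host 1: Welcome back! Today we look at semantic search.
Host 2: Right, finding documents by meaning,
not just keywords.
"""

for seg in parse_script(script):
    print(f"[{seg.voice}] {seg.speaker}: {seg.text}")
```

Keeping the script as a list of per-speaker segments means each turn can be synthesized independently and the resulting audio clips concatenated in order, while the same list doubles as the transcript.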
## ✨ Features

### 📚 Document Management

- **Multi-format Support**: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- **Intelligent OCR**: Automatic text extraction from images and scanned documents
- **Semantic Chunking**: Documents automatically split into meaningful segments for better retrieval
- **Metadata Tracking**: Comprehensive document metadata including file size, type, creation date, and custom tags
- **Vector Embeddings**: All documents indexed with dense vector embeddings for semantic search

### 🔍 Advanced Search

- **Semantic Search**: Find documents by meaning, not just keywords
- **Configurable Results**: Adjust the number of results (1-20) based on your needs
- **Relevance Scoring**: Each result includes a confidence score
- **Source Attribution**: Direct links to source documents with highlighted excerpts

### 🎨 Content Studio

Transform your documents with 8 powerful AI tools:

- **Summarize**: Generate concise, detailed, bullet-point, or executive summaries
- **Generate Outline**: Create structured outlines from topics or documents (3-10 sections)
- **Explain Concept**: Get explanations tailored to different audiences (general, technical, beginner, expert)
- **Paraphrase**: Rewrite text in various styles (formal, casual, academic, simple, technical)
- **Categorize**: Automatically classify content into user-defined categories
- **Key Insights**: Extract the most important points from any document
- **Generate Questions**: Create comprehension, analysis, application, creative, or factual questions
- **Extract Key Info**: Pull out structured information (entities, dates, facts) in JSON format

### 🏷️ Smart Tagging

- **AI-Generated Tags**: Automatically generate 3-15 relevant tags for any document
- **Persistent Storage**: Tags saved directly to document metadata
- **Batch Processing**: Tag multiple documents or custom text snippets

### ❓ RAG-Powered Q&A

- **Context-Aware Answers**: Ask questions and get answers grounded in your documents
- **Source Citations**: Every answer includes relevant source excerpts
- **Confidence Scoring**: Transparency about answer reliability
- **Multi-Document Synthesis**: Answers can draw from multiple documents simultaneously

### 🎙️ Podcast Studio

Convert documents into engaging audio conversations:

- **AI Voice Generation**: Ultra-realistic voices powered by ElevenLabs
- **Two-Host Format**: Dynamic dialogue between two AI personalities
- **Multiple Styles**: Conversational, educational, technical, or casual
- **Custom Duration**: 5-30 minute podcasts
- **Voice Selection**: Choose from 7+ professional AI voices
- **Full Transcripts**: Complete text transcripts for every generated podcast
- **Podcast Library**: Browse, play, and manage all generated podcasts

### 📊 Dashboard & Analytics

- **Real-time Stats**: Track total documents, vector chunks, and storage usage
- **Recent Activity**: View recently added documents at a glance
- **System Health**: Monitor vector store, LLM service, and voice service status

## Data Flow

### Document Ingestion

- Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store

### Semantic Search

- Query → Embedding → Vector Search → Relevance Ranking → Results

### RAG Q&A

- Question → Search → Context Retrieval → LLM Generation → Answer + Sources

### Podcast Generation

- Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File

## Basic Workflow

### 1. Upload Documents

Navigate to the "📄 Upload Documents" tab:

- Click "Select a document" or drag-and-drop files
- Supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG
- Click "🚀 Process & Add to Library"
- Wait for processing to complete (OCR runs automatically for images)
- Note the Document ID from the output

### 2. Search Your Library

Go to "🔍 Search Documents":

- Enter a natural language query (e.g., "What are the key findings about climate change?")
- Adjust "Number of Results" slider (1-20)
- Click "🔍 Search"
- Review results with relevance scores and source excerpts

### 3. Ask Questions

Navigate to "❓ Ask Questions":

- Type your question about uploaded documents
- Click "❓ Get Answer"
- Receive AI-generated answer with source citations
- Check confidence level and review source documents

### 4. Generate Content

Open "📝 Content Studio":

- Select a document from dropdown OR paste custom text
- Choose a task from the dropdown: Summarize, Outline, Explain, Paraphrase, etc.
- Configure task-specific options in "⚙️ Advanced Options"
- Click "🚀 Run Task"
- Copy or download the generated content

### 5. Create Podcasts

Visit "🎧 Podcast Studio":

- Select 1-5 documents using checkboxes
- Choose Style (conversational, educational, technical, casual)
- Set Duration (5-30 minutes)
- Select voices for Host 1 and Host 2
- Click "🎙️ Generate Podcast"
- Listen to the generated audio and read the transcript
- Browse past podcasts in the Podcast Library

### 6. Generate Tags

Go to "🏷️ Generate Tags":

- Select a document OR paste custom text
- Adjust "Number of Tags" slider (3-15)
- Click "🏷️ Generate Tags"

## 🏆 Hackathon Tracks

We are submitting to:

- **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
- **MCP in Action (Consumer/Creative)**: For the innovative Podcast interface that makes personal knowledge management accessible and fun.

## 📜 License

MIT License. Built with ❤️ for the AI community.

## 🙏 Acknowledgements & Sponsors

This project was built for the **MCP 1st Birthday Hackathon** and proudly leverages technology from:

- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
- **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our beautiful, responsive UI.
- **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.

## 🔌 Connect to Claude

Want to use these tools directly inside Claude Desktop? Check out our [Client Setup Guide](CLIENT_SETUP.md) to connect this MCP server to your local Claude instance!
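
If you already have other MCP servers configured, the client entry from the "MCP Core" section can be merged into an existing configuration instead of overwriting it. The snippet below is a rough sketch of that idea: the helper names `build_client_config` and `merge_into` are hypothetical, the `other-tool` entry is a made-up stand-in for whatever you already have, and the location of the Claude Desktop config file is deliberately not assumed here (see the Client Setup Guide for that).

```python
import json

# Endpoint taken from the client configuration shown earlier in this README.
MCP_SSE_URL = (
    "https://mcp-1st-birthday-ai-digital-library-assistant.hf.space"
    "/gradio_api/mcp/sse"
)


def build_client_config(name: str = "ai-library", url: str = MCP_SSE_URL) -> dict:
    """Return an `mcpServers` entry that launches mcp-remote through npx."""
    return {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": ["-y", "mcp-remote", url],
            }
        }
    }


def merge_into(existing: dict, new: dict) -> dict:
    """Add the new servers without dropping ones that are already configured."""
    merged = dict(existing)
    servers = dict(existing.get("mcpServers", {}))
    servers.update(new["mcpServers"])
    merged["mcpServers"] = servers
    return merged


# Hypothetical pre-existing configuration, for illustration only.
current = {"mcpServers": {"other-tool": {"command": "node", "args": ["server.js"]}}}

print(json.dumps(merge_into(current, build_client_config()), indent=2))
```

The printed JSON can then be pasted into your client configuration; nothing is written to disk by this sketch.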