Okay, let's break down the "KGraph-MCP: The Self-Orchestrating Tool Network" hackathon project into 5 iteratively more comprehensive MVP (Minimum Viable Product) plans. Each MVP builds upon the previous, allowing you to adjust scope based on progress during the hackathon week. The goal is to have a demonstrable Gradio Space at each stage, with increasing complexity and feature richness, targeting the **Track 3: Agentic Demo Showcase** primarily, with optional contributions to Track 1. --- **Hackathon MVP 1: "KG-Powered Tool Suggester"** * **Duration:** Day 1-2 * **Goal:** Demonstrate that a Knowledge Graph (KG) can understand a user's natural language request and suggest relevant, pre-defined MCP Tools. * **Core Primitives:** Tool, basic Prompt (as tool descriptions for search). * **Gradio UI:** 1. Input: Text box for user's natural language goal (e.g., "I want to summarize a news article"). 2. Button: "Find Tools". 3. Output: Display a list of suggested MCP Tools (name, description from KG). * **Backend Logic (Python in Gradio Space):** 1. **Curated Mini KG:** * Manually define metadata for 3-5 diverse MCP Tools (e.g., Summarizer, Sentiment Analyzer, Image Generator Stub). * Load this into a local, in-memory Neo4j-like structure (e.g., Python dictionaries or a tiny SQLite DB for simplicity if full Neo4j on Nebius isn't ready on Day 1) for structured properties. * Load tool descriptions into a local, in-memory Qdrant-like vector store (e.g., using `numpy` for vectors and basic cosine similarity, or a tiny local Qdrant Docker if easy). * **Credits:** Use Azure OpenAI (or your free OpenAI/Anthropic credits) *once* locally to pre-generate embeddings for these 3-5 tool descriptions. 2. **Planner Agent (Simplified):** * Takes user query. * Embeds the query (using same LLM API as above). * Performs semantic search against the in-memory vector store to find top 2-3 relevant tools. * Retrieves full metadata from the in-memory structured store. * Returns list to Gradio UI. * **What it Proves:** Core concept of KG-driven semantic tool discovery. * **Submission Viability:** A very basic but functional agent that "understands" and suggests tools. * **Hackathon Track:** Track 3 (basic agent). * **Sponsor Tech Used:** OpenAI/Anthropic/Azure OpenAI (for embedding). --- **Hackathon MVP 2: "KG Suggests Actionable Tool with Prompt Template"** * **Duration:** Day 2-3 (building on MVP 1) * **Goal:** Extend MVP 1 so the KG not only suggests a tool but also a corresponding Prompt template needed to invoke it. * **Core Primitives:** Tool, Prompt. * **Gradio UI (Enhancements):** 1. When a tool is suggested (from MVP 1 logic), also display: * Associated Prompt Name/Description. * The Prompt Template string (e.g., "Summarize this text: {{text_input}} with max length: {{max_length}}"). * Placeholders for the user to fill in for the prompt template. * **Backend Logic (Enhancements):** 1. **Enhanced Mini KG:** * For each of the 3-5 Tools, define 1-2 `Prompt`s (name, description, `target_tool_id`, `template_string`, `input_variables`). * Load these into your in-memory/local KG structures (Neo4j-like and Qdrant-like for prompt descriptions). 2. **Planner Agent (Enhanced):** * After selecting a Tool, the Planner now also queries the KG to find suitable `Prompt`s for that tool that match the user's intent (can be a simple lookup or another semantic search on prompt descriptions). * **What it Proves:** The KG can manage and link Prompts to Tools, providing a more complete "action plan." 
* **Submission Viability:** A more intelligent agent that knows *how* to instruct a tool. * **Hackathon Track:** Track 3 (more advanced agent). --- **Hackathon MVP 3: "Interactive Prompt Filling & Simulated Execution"** * **Duration:** Day 3-4 (building on MVP 2) * **Goal:** Allow the user to fill the selected Prompt template via the UI, and then simulate the execution of the Tool, showing a mocked output. * **Core Primitives:** Tool, Prompt, basic Resource (as user input). * **Gradio UI (Enhancements):** 1. After a Tool and Prompt template are displayed, dynamically generate input fields in the UI based on the `input_variables` of the Prompt. 2. User fills these input fields (this user-provided data acts as a simple "Resource"). 3. Button: "Execute Plan (Simulated)". 4. Output: Display a *mocked/hardcoded* success message and example output that the chosen Tool would produce (e.g., "SummarizerTool processed your text. Mock Summary: MCP is cool..."). * **Backend Logic (Enhancements):** 1. **Executor Agent (Stub):** * Receives the selected Tool, the filled Prompt, and user-provided inputs. * Instead of actually running a Docker container, it looks up a predefined mocked response for that Tool/Prompt combination. * **What it Proves:** The full loop from user goal -> KG-driven plan -> user input -> (simulated) execution. Demonstrates the agent's ability to prepare for action. * **Submission Viability:** A complete, interactive agent demo, even if the "execution" is faked. The intelligence is in the KG-driven planning. * **Hackathon Track:** Track 3. * **Sponsor Tech Used:** (Optionally) Use Modal to host this Gradio app if it becomes complex. --- **Hackathon MVP 4: "Real MCP Tool Execution via Gradio Server" (Track 1 Integration)** * **Duration:** Day 4-5 (building on MVP 3, ambitious) * **Goal:** Replace simulated execution with a call to an actual MCP Tool (hosted as a separate Gradio Space). * **Core Primitives:** Tool, Prompt, Resource. * **Gradio UI (No major changes from MVP 3, but output is now real):** * **Backend Logic (Enhancements):** 1. **Track 1 MCP Server Setup:** * Take 1-2 of your simple Tool ideas (e.g., Summarizer using Hugging Face API, Sentiment Analyzer). * Build them as separate Gradio apps and expose them as MCP servers using `gradio_mcp`. Deploy these to Hugging Face Spaces. * Update your Mini KG: The `invocation_command_stub` for these Tools now points to their live MCP server Space URL. * **Credits:** Use Hugging Face API credits within these MCP servers if applicable. 2. **Executor Agent (Real):** * Instead of mocked responses, the Executor agent now makes an HTTP POST request (MCP call) to the live Gradio MCP server URL specified in the KG for the selected Tool. * It sends the filled Prompt content as the payload. * Displays the actual response from the MCP server. * **What it Proves:** End-to-end execution involving your KG-driven agent *and* a live MCP server. Powerful demonstration. * **Submission Viability:** Strong contender for Track 3 and also generates submissions for Track 1. * **Hackathon Track:** Track 3 (main demo), Track 1 (the individual MCP server Gradio apps). * **Sponsor Tech Used:** Hugging Face API (in tools), potentially Modal (for main app or complex tools), Nebius (for KG hosting). 
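A note on MVP 3's "dynamically generate input fields" step above: Gradio does not straightforwardly create new components at runtime, so one common pattern is to pre-declare a small pool of hidden textboxes and toggle their visibility and labels from the planning handler. A minimal sketch, assuming a fixed maximum number of prompt variables (`MAX_PROMPT_FIELDS` and `build_prompt_field_updates` are illustrative names, not part of the existing codebase):

```python
import gradio as gr

MAX_PROMPT_FIELDS = 5  # assumed upper bound on a prompt's input_variables


def build_prompt_field_updates(input_variables: list[str]) -> list[dict]:
    """Return one gr.update() per pre-declared textbox: show/label the needed ones, hide the rest."""
    updates = []
    for i in range(MAX_PROMPT_FIELDS):
        if i < len(input_variables):
            updates.append(gr.update(visible=True, label=input_variables[i], value=""))
        else:
            updates.append(gr.update(visible=False, value=""))
    return updates


with gr.Blocks() as demo:
    prompt_fields = [gr.Textbox(visible=False) for _ in range(MAX_PROMPT_FIELDS)]
    # The "Find Tools" handler would include build_prompt_field_updates(plan.prompt.input_variables)
    # in its outputs so that the right input fields appear for the selected prompt.
```

This fixed pool is also what later lets `handle_execute_plan` receive the field values positionally as `*prompt_field_values`.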
--- **Hackathon MVP 5: "KG-Informed Model Preferences for Sampling (Conceptual)"** * **Duration:** Day 5-6 (building on MVP 4, very ambitious, focus on concept) * **Goal:** Show how the KG could inform an MCP `sampling/createMessage` request, particularly the `modelPreferences`, even if the full sampling loop isn't implemented. * **Core Primitives:** Tool, Prompt, Resource, basic conceptual Sampling. * **Gradio UI (Enhancements):** 1. After the Planner selects a Tool/Prompt/Resource, add a section: "Refine with AI Assistance (Sampling Concept)?" with a button "Suggest Refinements". 2. When clicked, display a *formatted JSON object* representing what an MCP `sampling/createMessage` request *would look like*. 3. The `modelPreferences` (cost, speed, intelligence hints) in this JSON should be dynamically populated based on properties stored in your KG for the selected Tool or Prompt (e.g., if a Tool is tagged `low-latency-critical`, `speedPriority` is high). 4. (Optional) Actually make an LLM call with a prompt like "Given these preferences and task, what would be a good model hint?" and show the *LLM's suggested model hint*. * **Backend Logic (Enhancements):** 1. **Enhanced KG:** Add properties to your Tool/Prompt schemas in the KG related to cost, speed, intelligence needs (e.g., `expected_latency_tier: 'low'`, `complexity_tier: 'high'`). 2. **Planner/Selector Agent (Enhanced):** * When preparing the conceptual "sampling request", it queries these new KG properties for the current Tool/Prompt. * It uses this information to construct the `modelPreferences` part of the JSON. 3. **LLM Call (Simulated Sampling Refinement):** * The agent makes an API call to Anthropic/OpenAI/Mistral with a prompt that includes the task, the base prompt template, and asks the LLM to "suggest a model family or refine the prompt for X (cost/speed/intelligence) based on these preferences..." * The LLM's response is shown in the UI. * **What it Proves:** The KG can store rich metadata to guide sophisticated MCP interactions like sampling, making the agent's use of LLMs more intelligent and optimized. * **Submission Viability:** Very innovative, showing deep understanding of MCP. Even without a full sampling client, demonstrating the *intent* and KG-driven construction of the request is powerful. * **Hackathon Track:** Track 3, strong contender for "Most Innovative Use of MCP." * **Sponsor Tech Used:** Anthropic/OpenAI/Mistral (for the sampling simulation), all others as before. --- **General Hackathon Tips for these MVPs:** * **Start with MVP 1 and ensure it's solid.** Only move to the next MVP if time allows and the previous one is demonstrable. * **Hardcode/Mock liberally in early MVPs.** The focus is on showing the *flow* and the *KG's role*, not necessarily building fully robust components initially. * **Version Control (Git) is your best friend.** Commit after every small, successful step. * **Focus on the Video Demo and README.** These are how you'll explain your (potentially complex) idea to the judges. * **Prioritize using sponsor credits** and highlighting this usage. * If you have a team, you can parallelize some of this (e.g., one person on Gradio UI, one on backend KG logic, one on setting up a Track 1 tool). This iterative MVP plan gives you flexibility and ensures you have something valuable to submit even if you don't complete all 5 stages. Good luck! ---- # mvp 3 sprint 3 Okay, let's plan **MVP 3 - Sprint 3: "Integrate Executor & Display Simulated Results"**. 
This sprint is where the "simulated execution" comes to life in the UI. We'll connect the `StubExecutorAgent` to the Gradio app and make its mocked responses more specific to the tool being "executed." **Sprint Goal (MVP 3 - Sprint 3):** The `StubExecutorAgent` will be integrated into `app.py`. When the user clicks "Execute Plan (Simulated)", the `handle_execute_plan` function will now call the agent's `simulate_execution` method. This method will be enhanced to return different mocked outputs based on the `tool_id` in the plan. The Gradio UI will then display this tool-specific mocked execution result. **Assumptions for Claude 4.0 in Cursor IDE:** * MVP 3 - Sprint 2 is complete: User inputs are collected, and `StubExecutorAgent` exists. * `PlannedStep` objects are correctly passed around. * `.cursor/rules/python_gradio_basic.mdc` is available. * Conventional Commits are being used. --- **Task List for MVP 3 - Sprint 3 - Cursor IDE / Claude Focus:** *(Each task implies: write code, test (manual/unit), lint, format, type-check, commit with Conventional Commits)* **Task 3.1: Enhance `StubExecutorAgent.simulate_execution` for Tool-Specific Mock Responses** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 3 * **Description:** Modify `agents/executor.py`'s `StubExecutorAgent.simulate_execution` method. 1. Instead of a generic mocked response, it should now return different mocked outputs based on `plan.tool.tool_id`. 2. Create a simple internal dictionary or if/elif structure within the method to map `tool_id`s (from your `data/initial_tools.json`) to specific example output strings or structures. 3. The returned dictionary should still include keys like `"status"`, `"message"`, and a more specific `"tool_output"` key containing the tool-specific mock. * Example: If `plan.tool.tool_id == "summarizer_v1"`, `tool_output` could be "Mocked Summary: This is a concise summary of your input text regarding [topic from inputs if possible, else generic]." * Example: If `plan.tool.tool_id == "sentiment_analyzer_v1"`, `tool_output` could be `{"sentiment": "positive", "confidence": 0.95}`. * **Acceptance Criteria:** 1. `simulate_execution` returns different, predictable mocked data based on the `tool_id` of the input `PlannedStep`. 2. Handles cases where a `tool_id` might not have a specific mock (e.g., falls back to a generic mock). 3. Unit tests for `simulate_execution` verify this new behavior. * **TDD Approach:** In `tests/agents/test_executor.py`: * Refactor `test_simulate_execution_returns_structured_mock_data`. * Add new tests like `test_simulate_execution_for_summarizer_tool`, `test_simulate_execution_for_sentiment_tool`, etc. Each test provides a `PlannedStep` with a specific `tool_id` and asserts that the `tool_output` in the response is the expected mock for that tool. * Add `test_simulate_execution_for_unknown_tool` to check fallback behavior. * **Guidance for Claude / Cursor:** ```cursor feat(agent): enhance executor for tool-specific mock responses **Objective:** Make the `StubExecutorAgent` provide more tailored (though still mocked) outputs based on the tool being "executed". **Action 1: Modify `agents/executor.py` (`StubExecutorAgent.simulate_execution`)** 1. Open `@agents/executor.py`. 2. In the `StubExecutorAgent.simulate_execution` method: * Remove the current generic `mock_response`. * Add a dictionary or if/elif structure to define tool-specific mock outputs. Use the `tool_id`s from your `data/initial_tools.json`. 
```python # Inside StubExecutorAgent.simulate_execution # ... (print statements for plan and inputs) ... tool_id = plan.tool.tool_id specific_mock_output = f"Generic simulated output for tool ID '{tool_id}'." # Default if tool_id == "summarizer_v1": input_text_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "text_input" # Assuming first var is main text received_text = inputs.get(input_text_key, "your provided text") specific_mock_output = f"Mocked Summary: The key points of '{received_text[:30]}...' have been identified and condensed." elif tool_id == "sentiment_analyzer_v1": specific_mock_output = { "sentiment_detected": "positive", # Could vary based on inputs for more advanced mock "confidence_score": 0.92, "message": "Sentiment analysis simulation complete." } elif tool_id == "image_caption_generator_stub_v1": input_image_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "image_url_or_data" image_ref = inputs.get(input_image_key, "the provided image") specific_mock_output = f"Mocked Caption: A beautiful scene depicted in '{image_ref}'. (Generated by image_caption_generator_stub_v1)" elif tool_id == "code_linter_tool_stub_v1": specific_mock_output = { "linting_status": "success", "issues_found": 0, "message": "Code linting simulation complete. No issues found." } # Add more elif for other tools in your initial_tools.json final_response = { "status": "simulated_success", "tool_id_used": plan.tool.tool_id, "tool_name_used": plan.tool.name, "prompt_id_used": plan.prompt.prompt_id, "prompt_name_used": plan.prompt.name, "message": f"Tool '{plan.tool.name}' execution simulated.", "inputs_received": inputs, "tool_specific_output": specific_mock_output # New key for specific output } return final_response ``` 3. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/agents/test_executor.py`** 1. Open `@tests/agents/test_executor.py`. 2. Refactor `test_simulate_execution_returns_structured_mock_data` or create new specific tests: * `test_simulate_execution_for_summarizer`: Create a `PlannedStep` with `tool_id="summarizer_v1"`. Assert `response["tool_specific_output"]` matches the expected mock summary string. * `test_simulate_execution_for_sentiment`: Create a `PlannedStep` with `tool_id="sentiment_analyzer_v1"`. Assert `response["tool_specific_output"]` is the expected dictionary. * Add similar tests for each mocked tool. * `test_simulate_execution_for_unmocked_tool`: Provide a `tool_id` not in the if/elif. Assert it returns the generic fallback output. Please generate the code modifications for `agents/executor.py` and the updated/new tests. ``` **Task 3.2: Integrate `StubExecutorAgent` into `app.py`** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 3 * **Description:** In `app.py`: 1. Instantiate `StubExecutorAgent` globally during the service initialization block. 2. Modify `handle_execute_plan`: * After collecting `collected_inputs_dict` and confirming a `current_plan` exists, it should now call `stub_executor_agent_instance.simulate_execution(current_plan, collected_inputs_dict)`. * The string returned by `handle_execute_plan` (for `execution_output_display`) should now be formatted based on the dictionary returned by `simulate_execution`. It should clearly present the status, message, and the `tool_specific_output`. * **Acceptance Criteria:** 1. `StubExecutorAgent` is instantiated in `app.py`. 2. `handle_execute_plan` calls `simulate_execution` and uses its return value. 
3. The `execution_output_display` in the UI shows the tool-specific mocked results. 4. Unit tests for `handle_execute_plan` are updated. * **TDD Approach:** Update tests in `tests/test_app_handlers.py` for `handle_execute_plan`: * Mock `app.planner_agent_instance` as before. * Now also mock `app.stub_executor_agent_instance.simulate_execution` to return different tool-specific mock dictionaries. * Assert that `handle_execute_plan` correctly formats these dictionaries into the display string. * **Guidance for Claude / Cursor:** ```cursor feat(app): integrate executor agent and display simulated results **Objective:** Wire the `StubExecutorAgent` into `app.py` and display its tool-specific mocked results in the Gradio UI. **Action 1: Modify `app.py` for Executor Integration** 1. Open `@app.py`. 2. Import `StubExecutorAgent` from `agents.executor`. 3. In the "Global Service Initialization" block, add: ```python # ... (after planner_agent_instance is initialized) stub_executor_agent_instance = None if planner_agent_instance: # Only init if planner is okay try: stub_executor_agent_instance = StubExecutorAgent() print("StubExecutorAgent initialized successfully.") except Exception as e: print(f"FATAL: Error initializing StubExecutorAgent: {e}") ``` 4. Modify the `handle_execute_plan(original_user_query: str, *prompt_field_values: str) -> str` function: * After successfully forming `collected_inputs_dict` and getting `current_plan`: ```python # Inside handle_execute_plan, after getting current_plan and collected_inputs_dict # ... if not stub_executor_agent_instance: return "Error: Executor service is not available." execution_result = stub_executor_agent_instance.simulate_execution(current_plan, collected_inputs_dict) # Format the execution_result dictionary for display result_md_parts = [f"**Simulation Result for Tool '{execution_result.get('tool_name_used', 'N/A')}':**"] result_md_parts.append(f"- Status: `{execution_result.get('status', 'unknown')}`") result_md_parts.append(f"- Message: {execution_result.get('message', 'No message.')}") tool_output = execution_result.get('tool_specific_output') if isinstance(tool_output, dict): result_md_parts.append("- Tool Output:") result_md_parts.append(f" ```json\n{json.dumps(tool_output, indent=2)}\n ```") elif tool_output: result_md_parts.append(f"- Tool Output: {tool_output}") return "\n".join(result_md_parts) ``` * Ensure the error handling for `planner_agent_instance` being `None` is still robust. 5. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/test_app_handlers.py`** 1. Open `@tests/test_app_handlers.py`. 2. Update tests for `handle_execute_plan`: * These tests will now need to mock `app.stub_executor_agent_instance.simulate_execution` in addition to `app.planner_agent_instance.generate_plan`. * Configure the mock `simulate_execution` to return various structured dictionaries (as defined in Task 3.1 for different tools). * Assert that `handle_execute_plan` calls `simulate_execution` with the correct `PlannedStep` and `inputs`. * Assert that the Markdown string returned by `handle_execute_plan` correctly formats the contents of the dictionary from `simulate_execution`. * Test the case where `stub_executor_agent_instance` is `None`. Please generate the code modifications for `app.py` and the updated tests. ``` **Task 3.3: Manual UI Testing for Simulated Execution Results** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 3 * **Description:** 1. Run `python app.py`. 2. 
Perform queries to select different tools. 3. Fill in their dynamic input fields. 4. Click "Execute Plan (Simulated)". 5. Verify that the `execution_output_display` Markdown component now shows the tool-specific mocked output from the `StubExecutorAgent`, formatted clearly. 6. Test with different tools to ensure their respective mock outputs are displayed. * **Acceptance Criteria:** 1. UI correctly displays tool-specific simulated execution results. 2. The flow from query -> plan -> input -> simulated result is complete and observable. * **Guidance for Claude / Cursor:** This is a manual testing step by the developer. **Task 3.4: Final Sprint Checks (Dependencies, Linters, Tests, CI)** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 3 * **Description:** 1. Review `requirements.txt` and `requirements-dev.txt`. No new dependencies expected. 2. Regenerate `requirements.lock` if anything changed: `uv pip compile ...`. 3. Run `just install`. 4. Run `just lint`, `just format`, `just type-check`, `just test`. Address any issues. 5. Commit changes using Conventional Commits (e.g., `feat(executor): integrate stub executor and display simulated tool outputs`). 6. Push to GitHub and verify CI pipeline passes. * **Acceptance Criteria:** All local checks pass, CI is green. * **Guidance for Claude / Cursor:** This is a manual developer checklist. --- **End of MVP 3 - Sprint 3 Review:** * **What's Done:** * The `StubExecutorAgent` is now integrated into the `app.py` workflow. * The agent's `simulate_execution` method returns tool-specific mocked responses. * The Gradio UI successfully displays these simulated results, completing the core loop for MVP 3. * **What's Next (MVP 3 - Sprint 4 & 5):** * Focus on refining the UI for execution results, adding more detailed mocked outputs for different tools/inputs, comprehensive testing, and final documentation updates for MVP 3. * **Blockers/Issues:** (Note any challenges in formatting the diverse mocked outputs clearly in the UI). This sprint brings a satisfying sense of completion to the core interaction loop of MVP 3. The user can now see a direct (though simulated) consequence of their inputs based on the agent's plan. --- Okay, let's plan **MVP 3 - Sprint 4: "Refine Simulated Outputs & UI for Execution Results"**. This sprint focuses on making the "simulated execution" feel more meaningful and the UI presentation of these results more polished. We're not adding new agent capabilities but improving the depth of the simulation and the user experience around it. **Sprint Goal (MVP 3 - Sprint 4):** Enhance the `StubExecutorAgent` to produce more varied and context-aware (based on inputs) mocked outputs. Refine the Gradio UI to present these simulated execution results in a clearer, more engaging way. Improve error messaging for the execution phase. **Assumptions for Claude 4.0 in Cursor IDE:** * MVP 3 - Sprint 3 is complete: The UI displays basic tool-specific mocked outputs from the `StubExecutorAgent`. * All necessary classes (`PlannedStep`, `MCPTool`, `MCPPrompt`, `StubExecutorAgent`) are defined. * `.cursor/rules/python_gradio_basic.mdc` is available. * Conventional Commits are being used. 
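The handler tests referenced throughout these sprints patch the module-level agent instances in `app.py`. A minimal sketch of that pattern, assuming the module-level names and the `generate_plan`/`simulate_execution` interfaces described in this plan (the real handler internals may differ, so adjust accordingly):

```python
# tests/test_app_handlers.py (sketch; app-module internals are assumptions based on this plan)
from unittest.mock import MagicMock, patch

import app  # the main Gradio app module


@patch.object(app, "stub_executor_agent_instance")
@patch.object(app, "planner_agent_instance")
def test_handle_execute_plan_shows_executor_message(mock_planner, mock_executor):
    # Assumed: the handler re-derives the plan from the original query via the planner.
    mock_plan = MagicMock()
    mock_plan.prompt.input_variables = ["document_content"]
    mock_planner.generate_plan.return_value = mock_plan

    mock_executor.simulate_execution.return_value = {
        "status": "simulated_success",
        "tool_name_used": "Summarizer",
        "message": "Tool 'Summarizer' execution simulated.",
        "tool_specific_output": "Mocked Summary: ...",
    }

    result_md = app.handle_execute_plan("summarize this article", "some document text")

    mock_executor.simulate_execution.assert_called_once()
    assert "Summarizer" in result_md
```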
--- **Task List for MVP 3 - Sprint 4 - Cursor IDE / Claude Focus:** *(Each task implies: write code, test (manual/unit), lint, format, type-check, commit with Conventional Commits)* **Task 4.1: Enhance `StubExecutorAgent` for Input-Aware Mock Responses** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 4 * **Description:** Modify `agents/executor.py`'s `StubExecutorAgent.simulate_execution` method. 1. For each `tool_id` case (summarizer, sentiment, etc.), make the `specific_mock_output` more sensitive to the `inputs: Dict[str, str]` it receives. 2. Example for "summarizer_v1": * If `inputs` contains `document_content`, the mock summary could include a snippet or word count: `"Mocked Summary: Processed your document of X words. The main theme appears to be about Y. Condensed content: '[First 10 words of input]... [Last 10 words of input]'."` 3. Example for "sentiment_analyzer_v1": * If `inputs` contains `feedback_text` with certain keywords (e.g., "excellent", "poor"), adjust the mocked `sentiment_detected` and `confidence_score` accordingly. (e.g., if "excellent" in input, mock positive sentiment). 4. Add a mock "error" state for one of the tools if a specific (mocked) bad input is given. For example, if the summarizer receives an empty string for `document_content`. The main `status` key in the returned dict could be `"simulated_error"` and `message` would explain it. * **Acceptance Criteria:** 1. `simulate_execution` now generates mocked `tool_specific_output` that reflects (even if crudely) the provided inputs for at least 2-3 tools. 2. The agent can simulate an error state for at least one tool/input combination. 3. Unit tests for `simulate_execution` cover these new input-aware and error scenarios. * **TDD Approach:** In `tests/agents/test_executor.py`: * Add new tests or enhance existing ones: * `test_simulate_execution_summarizer_with_input_content`: Provide inputs, assert the mock summary reflects some aspect of the input. * `test_simulate_execution_sentiment_with_positive_keywords`: Provide input with "excellent", assert mock sentiment is positive. * `test_simulate_execution_sentiment_with_negative_keywords`: Provide input with "terrible", assert mock sentiment is negative. * `test_simulate_execution_summarizer_empty_input_error`: Provide empty input, assert the response dictionary indicates a simulated error. * **Guidance for Claude / Cursor:** ```cursor feat(agent): implement input-aware and error-state mock responses in executor **Objective:** Make the `StubExecutorAgent`'s simulated outputs more dynamic based on received inputs and able to simulate error conditions. **Action 1: Modify `agents/executor.py` (`StubExecutorAgent.simulate_execution`)** 1. Open `@agents/executor.py`. 2. In `StubExecutorAgent.simulate_execution`, refine the logic for generating `specific_mock_output`: ```python # Inside StubExecutorAgent.simulate_execution # ... (tool_id = plan.tool.tool_id) ... # Default status and message status = "simulated_success" message = f"Tool '{plan.tool.name}' execution simulated successfully." specific_mock_output = f"Generic simulated output for tool ID '{tool_id}' with inputs: {inputs}." if tool_id == "summarizer_v1": input_text_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "document_content" received_text = inputs.get(input_text_key, "").strip() if not received_text: status = "simulated_error" message = "Error: Input text for summarizer cannot be empty." 
specific_mock_output = {"error_details": "Empty document_content received."} else: word_count = len(received_text.split()) specific_mock_output = (f"Mocked Summary: Processed your input of {word_count} words. " f"Content starts: '{received_text[:20]}...'. Summary simulation complete.") elif tool_id == "sentiment_analyzer_v1": input_text_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "feedback_text" received_text = inputs.get(input_text_key, "").lower() mock_sentiment = "neutral" mock_confidence = 0.65 if "excellent" in received_text or "great" in received_text or "love" in received_text: mock_sentiment = "positive" mock_confidence = 0.92 elif "poor" in received_text or "terrible" in received_text or "hate" in received_text: mock_sentiment = "negative" mock_confidence = 0.88 specific_mock_output = { "sentiment_detected": mock_sentiment, "confidence_score": mock_confidence, "analyzed_text_snippet": received_text[:50] + "..." } # ... (add more input-aware mocks for other tools) ... # Example for image caption generator elif tool_id == "image_caption_generator_stub_v1": input_image_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "image_url_or_data" image_ref = inputs.get(input_image_key, "undefined_image_reference") if image_ref == "undefined_image_reference" or not image_ref.strip(): status = "simulated_error" message = "Error: Image reference (URL or data) is missing for caption generation." specific_mock_output = {"error_details": "No image reference provided."} else: specific_mock_output = f"Mocked Caption: A vivid depiction related to '{image_ref}'. (Simulated by image_caption_generator_stub_v1)" final_response = { "status": status, # Use the dynamic status "tool_id_used": plan.tool.tool_id, # ... (rest of the keys are the same) ... "message": message, # Use the dynamic message "inputs_received": inputs, "tool_specific_output": specific_mock_output } return final_response ``` 3. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/agents/test_executor.py`** 1. Open `@tests/agents/test_executor.py`. 2. Add new tests to cover the input-aware logic and error simulation: * `test_simulate_execution_summarizer_input_aware`: Provide non-empty text, assert output reflects it. * `test_simulate_execution_summarizer_empty_input_error`: Provide empty text, assert `response["status"] == "simulated_error"` and message is appropriate. * `test_simulate_execution_sentiment_positive_keywords`: Provide text with "great", assert positive mock sentiment. * `test_simulate_execution_sentiment_negative_keywords`: Provide text with "poor", assert negative mock sentiment. * (Add similar tests for other tools you make input-aware). Please generate the code modifications for `agents/executor.py` and the new/updated tests. ``` **Task 4.2: Refine Gradio UI Display of Execution Results** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 4 * **Description:** In `app.py`, modify how the `execution_output_display` (`gr.Markdown`) component renders the result from `StubExecutorAgent.simulate_execution`. 1. The `handle_execute_plan` function's formatting logic needs to be more robust. 2. Clearly distinguish between success and simulated error states (e.g., different Markdown styling, emojis). 3. If `tool_specific_output` is a dictionary (like for sentiment), format it nicely (e.g., as a JSON block or bullet points within the Markdown). 4. 
Ensure the "Execute Plan (Simulated)" button is perhaps temporarily disabled or its text changes to "Executing..." while `handle_execute_plan` is running, then re-enabled. (This is a Gradio interactivity feature, `gr.Button.update(interactive=False)`). * **Acceptance Criteria:** 1. UI clearly indicates simulated success or error. 2. Tool-specific output is formatted for good readability. 3. (Stretch) Execute button provides feedback during processing. 4. Unit tests for `handle_execute_plan` output formatting are updated. * **TDD Approach:** Update tests in `tests/test_app_handlers.py` for `handle_execute_plan`: * Mock `stub_executor_agent_instance.simulate_execution` to return success and error dictionaries. * Assert that the Markdown string generated by `handle_execute_plan` correctly reflects these states and formats the `tool_specific_output` appropriately. * **Guidance for Claude / Cursor:** ```cursor feat(ui): refine display of simulated execution results and button interactivity **Objective:** Improve how `app.py` presents simulated execution outcomes and add feedback to the execute button. **Action 1: Modify `handle_execute_plan` in `app.py` for better formatting & button feedback** 1. Open `@app.py`. 2. Refactor the part of `handle_execute_plan` that formats `execution_result` for the `execution_output_display` Markdown component. ```python # Inside handle_execute_plan # ... (after getting execution_result from executor) ... # def handle_execute_plan(original_user_query: str, *prompt_field_values: str) -> \ # Tuple[str, gr. เคถเฅเคฐเคพ, gr.update]: # Update return for button state # ... (existing logic to get plan and inputs, call executor) ... # Initial button state update for "Executing..." # This requires handle_find_tools to also return an update for the execute_button # to make it interactive in the first place. # For now, let's focus on the output formatting. # We'll assume the button is already interactive. # The button update would be: yield gr.Button.update(value="๐Ÿ”„ Executing...", interactive=False) # For simplicity in this step, we'll return a single Markdown string. # True button interactivity is a more advanced Gradio feature involving generators or gr.Request. 
execution_result = stub_executor_agent_instance.simulate_execution(current_plan, collected_inputs_dict)

result_md_parts = []
status = execution_result.get('status', 'unknown')

if status == "simulated_success":
    result_md_parts.append(f"✅ **Execution Simulated Successfully for Tool '{execution_result.get('tool_name_used', 'N/A')}':**")
elif status == "simulated_error":
    result_md_parts.append(f"❌ **Simulated Execution Error for Tool '{execution_result.get('tool_name_used', 'N/A')}':**")
else:
    result_md_parts.append(f"ℹ️ **Simulation Status for Tool '{execution_result.get('tool_name_used', 'N/A')}':** `{status}`")

result_md_parts.append(f"- Message: {execution_result.get('message', 'No message.')}")

tool_output = execution_result.get('tool_specific_output')
if tool_output:
    result_md_parts.append("- Tool's Simulated Output:")
    if isinstance(tool_output, dict):
        # Pretty-print JSON for dictionary outputs
        result_md_parts.append(f"  ```json\n{json.dumps(tool_output, indent=2)}\n  ```")
    elif isinstance(tool_output, list):
        # Lists can also be rendered as JSON
        result_md_parts.append(f"  ```json\n{json.dumps(tool_output, indent=2)}\n  ```")
    else:
        # Assume string; use a text block
        result_md_parts.append(f"  ```text\n{tool_output}\n  ```")

# Generator version (stretch goal):
# final_markdown = "\n".join(result_md_parts)
# yield gr.update(value="🚀 Execute Plan (Simulated)", interactive=True)  # Re-enable button
# yield final_markdown

# Non-generator version (simpler):
return "\n".join(result_md_parts)
```
3. (Stretch goal - more complex Gradio usage): To make the button change to "Executing..." and then back, `handle_execute_plan` would need to become a generator function yielding updates, or you'd use `gr.Request` to access UI state. For the hackathon, simply returning the final Markdown string is fine, and the button remains clickable.
4. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`.

**Action 2: Update `gr.Blocks()` layout in `app.py` (minor)**
1. Ensure `execution_output_display = gr.Markdown("", elem_id="execution_output_display", label="Execution Result")` has a label.

**Action 3: Update `tests/test_app_handlers.py`**
1. Open `@tests/test_app_handlers.py`.
2. Update tests for `handle_execute_plan` to check the new Markdown formatting:
   * Test that success messages include "✅".
   * Test that error messages include "❌".
   * Test that dictionary `tool_specific_output` is formatted as a JSON block in Markdown.
   * Test that string `tool_specific_output` is formatted as a text block.

Please generate the code modifications for `app.py` (focusing on the non-generator version for `handle_execute_plan` return for simplicity) and the updated tests.
```
**Task 4.3: Manual UI Testing for Enhanced Execution Results**
* Status: Pending
* **Parent MVP:** MVP 3
* **Parent Sprint (MVP 3):** Sprint 4
* **Description:**
  1. Run `python app.py`.
  2. Test various plans (different tools, different inputs to trigger varied mock responses and error states).
  3. Verify the `execution_output_display` correctly formats:
     * Success vs. error states.
     * Input-aware mock outputs.
     * Dictionary outputs (e.g., from sentiment analyzer) as readable JSON.
     * String outputs clearly.
  4. Check overall UI flow and clarity.
* **Acceptance Criteria:**
  1. UI display of simulated results is clear, informative, and correctly reflects different output types and statuses.
* **Guidance for Claude / Cursor:** This is a manual testing step by the developer.
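For reference, a minimal pytest sketch of the input-aware executor tests called for in Task 4.1. The `MCPTool`/`MCPPrompt`/`PlannedStep` constructor signatures and the `kg_services.models` import path shown here are assumptions about the project's existing dataclasses and should be adjusted to match them:

```python
# tests/agents/test_executor.py (sketch; constructor signatures and module paths are assumed)
from agents.executor import StubExecutorAgent
from kg_services.models import MCPTool, MCPPrompt, PlannedStep  # assumed location of the dataclasses


def make_plan(tool_id: str, input_vars: list[str]) -> PlannedStep:
    tool = MCPTool(tool_id=tool_id, name=tool_id, description="test tool")
    prompt = MCPPrompt(
        prompt_id=f"{tool_id}_prompt",
        name="test prompt",
        description="",
        target_tool_id=tool_id,
        template_string="{{document_content}}",
        input_variables=input_vars,
    )
    return PlannedStep(tool=tool, prompt=prompt)


def test_simulate_execution_summarizer_empty_input_error():
    agent = StubExecutorAgent()
    plan = make_plan("summarizer_v1", ["document_content"])
    response = agent.simulate_execution(plan, {"document_content": ""})
    assert response["status"] == "simulated_error"


def test_simulate_execution_sentiment_positive_keywords():
    agent = StubExecutorAgent()
    plan = make_plan("sentiment_analyzer_v1", ["feedback_text"])
    response = agent.simulate_execution(plan, {"feedback_text": "This product is excellent!"})
    assert response["tool_specific_output"]["sentiment_detected"] == "positive"
```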
**Task 4.4: Final Sprint Checks (Dependencies, Linters, Tests, CI)** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 4 * **Description:** 1. Standard checks: `requirements.lock`, `just install`, `just lint`, `just format`, `just type-check`, `just test`. 2. Commit changes (e.g., `feat(executor): enhance simulated outputs with input awareness and error states`, `feat(ui): improve display of execution results`). 3. Push to GitHub and verify CI pipeline passes. * **Acceptance Criteria:** All local checks pass, CI is green. * **Guidance for Claude / Cursor:** Manual developer checklist. --- **End of MVP 3 - Sprint 4 Review:** * **What's Done:** * The `StubExecutorAgent` now produces more varied and context-sensitive (based on inputs) mocked outputs, including simulated error states. * The Gradio UI has been refined to present these simulated execution results more clearly, distinguishing success/error and formatting different output types (string, dict) nicely. * **What's Next (MVP 3 - Sprint 5):** * Final overall testing of all MVP 3 features. * Comprehensive updates to project `README.md` and Hugging Face Space `README.md` to describe the full interactive loop of MVP 3. * Prepare for potential deployment/submission of the completed MVP 3. * **Blockers/Issues:** (Note any remaining UI/UX quirks or complexities in mocking diverse tool behaviors). This sprint makes the "simulation" much more believable and informative, significantly enhancing the demo quality of MVP 3. --- Okay, here's the plan for the final sprint of MVP 3, **Sprint 5: "MVP 3 Finalization, Documentation & Deployment Prep"**. This sprint is all about ensuring MVP 3 ("Interactive Prompt Filling & Simulated Execution") is polished, robust, comprehensively documented, and ready for submission or to serve as a very strong foundation for the next major development phase. **Sprint Goal (MVP 3 - Sprint 5):** Conduct thorough final testing of all MVP 3 functionalities. Update all project documentation (`README.md` for GitHub and Hugging Face Space) to accurately reflect the complete interactive capabilities of MVP 3. Ensure the codebase is pristine, all checks pass, CI is green, and the application is fully prepared for deployment to a Hugging Face Space. **Assumptions for Claude 4.0 in Cursor IDE:** * MVP 3 - Sprint 4 is complete: The UI displays input-aware, tool-specific simulated execution results. * All necessary classes and functions are in place. * `.cursor/rules/python_gradio_basic.mdc` is available. * Conventional Commits are being used. --- **Task List for MVP 3 - Sprint 5 - Cursor IDE / Claude Focus:** *(Each task implies: review, write/refine, test manually/unit test, lint, format, type-check, commit with Conventional Commits)* **Task 5.1: Full End-to-End Regression Testing of MVP 1, MVP 2, and MVP 3 Features** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 5 * **Description:** 1. Systematically re-test all features implemented from MVP 1 through MVP 3. * **MVP 1 Checks:** Semantic tool suggestion based on query. * **MVP 2 Checks:** Correct (Tool, Prompt) pair selection; display of Tool name/desc, Prompt name/desc/template/input_var_names. * **MVP 3 Checks:** Dynamic input field generation based on selected prompt; collection of user inputs from these fields; triggering simulated execution; display of tool-specific, input-aware mocked results (including success and error states). 2. 
Test with a diverse set of queries and inputs to cover as many paths as possible with the existing 3-5 tools and their prompts. 3. Pay attention to UI state transitions (e.g., visibility of input fields, execute button, clearing of previous results). 4. Identify and fix any bugs or regressions found. * **Acceptance Criteria:** 1. All previously implemented features work as expected alongside new MVP 3 features. 2. The application is stable and robust across the full implemented user flow. 3. Any critical bugs identified are fixed. * **Guidance for Claude / Cursor:** ```cursor test(app): comprehensive regression and e2e testing for MVP3 **Objective:** Create a final test plan for all features up to MVP3 completion. **Action: Generate Final Test Plan Scenarios** Please outline a comprehensive test plan with 10-12 scenarios that cover the full functionality of MVP1, MVP2, and MVP3. For each scenario: 1. **Description:** What is being tested (e.g., "Query for summarizer, fill inputs, simulate execution"). 2. **User Query/Input Steps:** What the user types/does. 3. **Expected Intermediate UI State:** What the UI should show after planning (tool/prompt info, input fields). 4. **Expected Inputs to Provide:** What values to type into dynamic fields. 5. **Expected Final UI State:** What the simulated execution result should look like. Include scenarios for: - Each of the 3-5 tools defined in `data/initial_tools.json` being planned and "executed". - Prompts with no input variables (direct execution). - Prompts with one or more input variables. - Triggering a simulated error response from the `StubExecutorAgent`. - Empty user query at the start. - Query that results in "No actionable plans found". This plan will guide final manual testing. If specific code paths seem untested by this plan, suggest additional scenarios. ``` *(Developer then manually executes this comprehensive test plan, using Claude to help debug or refine code if issues arise.)* **Task 5.2: Final UI/UX Polish and Consistency Check** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 5 * **Description:** 1. Review the entire Gradio UI flow from query input to simulated execution result. 2. Ensure consistent terminology, clear labels, helpful placeholders, and logical flow. 3. Check for any visual clutter or confusing elements. Make small adjustments for better aesthetics and usability. 4. Ensure error messages (e.g., "No plan found", "Executor not available", simulated tool errors) are user-friendly. * **Acceptance Criteria:** 1. The UI is polished, intuitive, and provides a good user experience for the MVP 3 scope. 2. Error states are communicated clearly. * **Guidance for Claude / Cursor:** ```cursor style(ui): final UI/UX polish for MVP3 Gradio app **Objective:** Ensure the Gradio application in `app.py` is as clear and user-friendly as possible for MVP3. **Action: Request UI/UX Review and Suggestions** Please review the current Gradio interface defined in `@app.py` (considering all components: query input, plan display, dynamic input fields, execute button, execution result display). Suggest: 1. Any improvements to component labels or placeholders for clarity. 2. Ways to make the distinction between the "planning" phase output and the "execution" phase output more visually clear. 3. Improvements to how simulated error messages from the executor are presented (e.g., using specific Markdown formatting or emojis). 4. 
Any other minor tweaks that would enhance the overall user experience for this interactive demo. Focus on small, impactful changes achievable within the hackathon timeframe. ``` **Task 5.3: Update All Project Documentation (GitHub README, Space README) for MVP 3** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 5 * **Description:** 1. Thoroughly update the main GitHub `README.md` to reflect the complete functionality of MVP 3. * The "How KGraph-MCP (MVP3) Works" section must now detail the interactive prompt filling and simulated execution loop. * Update the conceptual diagram (Mermaid or other) to show user input for prompts and the Executor Agent's role. * Ensure the "Features" list is comprehensive for MVP 3. * Update "Technologies Used" and "Sponsor Credits Utilized" accurately. 2. Update the `HUGGINGFACE_SPACE_README.md` (or the actual README in the Space repo) with similar comprehensive information about MVP 3, making it easy for hackathon judges and users to understand the full demo. Re-check all Hugging Face metadata and hackathon tags. * **Acceptance Criteria:** 1. Both README files are fully updated, accurate for MVP3, and provide a clear, compelling overview of the project's current state and capabilities. * **Guidance for Claude / Cursor:** ```cursor docs(readme): comprehensively update all documentation for MVP3 completion **Objective:** Ensure both GitHub and Hugging Face Space READMEs fully describe MVP3. **Action 1: Update GitHub `README.md`** 1. Open `@README.md`. 2. Based on the Hackathon Report Draft template and current MVP3 functionality, significantly update the following sections: * **Project Vision & Introduction:** Emphasize that the demo now showcases an interactive loop. * **How KGraph-MCP Works (Current MVP3 Functionality):** This is a key section. Detail the user query, KG-driven planning (Tool+Prompt), dynamic UI for prompt inputs, user filling inputs, and the `StubExecutorAgent` providing input-aware simulated results. Update the Mermaid diagram to include the user providing prompt inputs and the Executor step. * **Features List:** Add "Interactive prompt input filling" and "Tool execution simulation with input-aware mock responses." * **Our Development Process:** Reiterate if needed. * **Technologies Used / Sponsor Credits:** Final check for accuracy. * **Future Vision:** Briefly mention how this MVP3 sets up for actual tool execution or more advanced agent features. **Action 2: Update Hugging Face Space `README.md`** 1. Open `@HUGGINGFACE_SPACE_README.md`. 2. Update its description and "How to Use" to guide users through the full interactive process of MVP3 (querying, seeing the plan, filling inputs, executing). 3. Verify all Hugging Face card data and hackathon tags are correct and present. Claude, please help draft the updated "How KGraph-MCP Works (Current MVP3 Functionality)" section for the main `README.md`, including a revised Mermaid diagram concept. ``` **Task 5.4: Final Code Review, Cleanup, and Sanity Checks** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 5 * **Description:** 1. Perform a final pass over the entire MVP3 codebase (`app.py`, `agents/`, `kg_services/`). 2. Remove any final debug statements, dead code, or unused imports. 3. Ensure all public functions/classes have adequate docstrings and comments are clear. 4. Manually check that API keys are definitely not hardcoded and are being loaded from environment variables / Space secrets correctly. 5. 
Verify `requirements.txt` is minimal and correct for Hugging Face Space deployment. * **Acceptance Criteria:** 1. MVP3 codebase is exceptionally clean, well-documented, and secure regarding secrets. * **Guidance for Claude / Cursor:** ```cursor chore(code): final codebase review and hardening for MVP3 **Objective:** Conduct a meticulous final review of the MVP3 codebase. **Action: Request Final Code Review Pointers** Please perform a final review of all Python files in the project (`app.py`, files in `agents/`, `kg_services/`). Specifically look for: 1. Any remaining `print()` statements that were used for debugging and are no longer needed. 2. Code comments that might be outdated or unclear after recent changes. 3. Functions or classes that lack sufficient docstrings explaining their purpose, arguments, and returns. 4. Any potential areas where error handling could be slightly improved (e.g., around file I/O, API calls if any were live). 5. Confirmation that no API keys or sensitive credentials are hardcoded. This will serve as a checklist for the developer's final manual sweep. ``` **Task 5.5: Final Local & CI Checks, Prepare Submission Artifacts, Tag Release** * Status: Pending * **Parent MVP:** MVP 3 * **Parent Sprint (MVP 3):** Sprint 5 * **Description:** 1. Run all `just` commands locally: `just install`, `just lint`, `just format`, `just type-check`, `just test`. Resolve any outstanding issues. 2. Commit all final changes for MVP 3 with a clear Conventional Commit message (e.g., `chore(release): finalize MVP3 - interactive prompt filling and simulated execution`). 3. Push to GitHub. 4. Verify the GitHub Actions CI pipeline passes for the final MVP 3 commit. 5. Prepare the video overview: Record a smooth demo of the full MVP 3 flow. 6. (Optional but good) Create a Git tag for the MVP 3 submission state (e.g., `git tag v0.3.0-mvp3-hackathon && git push --tags`). 7. Ensure all links (GitHub repo, Video, Space URL) are ready for the hackathon submission form. * **Acceptance Criteria:** 1. All local checks pass. 2. All MVP 3 code is committed and pushed. 3. CI pipeline is green. 4. MVP 3 is officially "complete" and ready for hackathon submission. 5. Video demo is recorded. * **Guidance for Claude / Cursor:** This is a final manual checklist for the developer. --- **End of MVP 3 - Sprint 5 & Overall MVP 3 Review:** * **What's Done:** * Hackathon MVP 3 ("Interactive Prompt Filling & Simulated Execution") is fully implemented, rigorously tested, documented, and polished. * The Gradio app provides a complete interactive loop: user query -> KG-driven plan (Tool+Prompt) -> dynamic UI for prompt inputs -> user provides inputs -> simulated tool execution with input-aware mock results displayed. * The project is in an excellent state for a compelling hackathon submission for Track 3. * **What's Next (Post-Hackathon or if time permits, MVP 4):** * Replacing simulated execution with actual MCP Tool execution (e.g., running local Docker containers or calling live Gradio MCP servers created for Track 1). * Introducing basic Root-based scoping or a simple HIL for sampling. * **Blockers/Issues:** (Note any final learnings or reflections from completing the full MVP 3). This sprint solidifies MVP 3 into a strong, demonstrable project. Congratulations on reaching this stage! ==== Okay, this is an ambitious but very impactful MVP for the hackathon! Let's break down **Hackathon MVP 4: "Real MCP Tool Execution via Gradio Server"** into 5 manageable sprints. 
This MVP involves creating separate projects (the Track 1 MCP server Gradio apps) and then integrating them into your main agentic demo. **Recap of Hackathon MVP 4 Goal:** * **Goal:** Replace simulated execution with calls to actual, live MCP Tools (hosted as separate Gradio Spaces). * **Core Primitives:** Tool, Prompt, Resource (user input). * **Builds Upon:** MVP 3 (interactive prompt filling, Executor agent structure). --- **Sprint Structure for MVP 4 (Building on MVP 3 Foundation):** **Sprint 1 (MVP 4): Develop & Deploy First Track 1 MCP Server (e.g., Summarizer)** * **Goal:** Create, test, and deploy one simple MCP Tool as a Gradio app on a Hugging Face Space. This will be your first Track 1 submission. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 4 - Sprint 1: Develop & Deploy First Track 1 MCP Server (Summarizer) **Task 1.1: Create New Project/Directory for MCP Summarizer Tool** - Status: Pending - Description: Create a new, separate project directory (e.g., `mcp_summarizer_tool_gradio`). Set up a minimal Python environment (`uv venv`, `requirements.txt` with `gradio`, `gradio_mcp`, and any model-specific libraries like `transformers` or `huggingface_hub`). - Acceptance Criteria: New project directory and basic Python environment set up. - Guidance for Claude / Cursor: ```cursor chore(project): setup new project for MCP Summarizer Tool **Objective:** Initialize a new project for the Summarizer MCP Tool. **Action:** 1. Create a directory named `mcp_summarizer_tool_gradio`. 2. Inside it, create a `.python-version` file (e.g., with `3.11.8`). 3. Create `requirements.txt` with initial dependencies: ``` gradio>=4.0.0 # Check latest Gradio 4/5 version gradio_mcp # Add transformers and sentencepiece if using a local/HF summarization model # transformers # sentencepiece # torch # if needed by transformers model ``` 4. Create a basic `app.py` file stub. 5. Create a `.gitignore`. 6. (Developer will manually run `uv python install`, `uv venv`, `uv pip install -r requirements.txt`) ``` **Task 1.2: Implement Summarizer Logic** - Status: Pending - Description: In `mcp_summarizer_tool_gradio/app.py`, write the Python function that performs text summarization. - Option A (Simple): Use a Hugging Face Inference API call for summarization (utilize HF API credits). - Option B (More Control/Offline): Load a small summarization model from Hugging Face Hub (e.g., `sshleifer/distilbart-cnn-6-6` or similar lightweight one) using `transformers` pipeline. - The function should take `text_input: str` and `max_length: int` (optional) as arguments and return `summary_text: str`. - Acceptance Criteria: Summarization function works correctly and is testable. - TDD: Write a unit test for this summarization function (mocking HF API if Option A, or testing with a small model if Option B). - Guidance for Claude / Cursor: ```cursor feat(tool): implement core text summarization logic **Objective:** Create the Python function for text summarization in the new MCP tool project. 
**Action: Implement in `mcp_summarizer_tool_gradio/app.py`**
*(Assuming Option A for speed - Hugging Face Inference API)*
```python
# In mcp_summarizer_tool_gradio/app.py
import os

import requests  # for the HF Inference API
from dotenv import load_dotenv  # for local HF_TOKEN testing

load_dotenv()  # Load .env for local testing if HF_TOKEN is there
HF_API_TOKEN = os.getenv("HF_TOKEN")  # For HF Inference API

# Recommended model for summarization via the Inference API.
# Or choose another from https://huggingface.co/models?pipeline_tag=summarization&sort=trending
SUMMARIZATION_API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"


def summarize_text(text_to_summarize: str, max_len: int = 150, min_len: int = 30) -> str:
    if not HF_API_TOKEN:
        return "Error: Hugging Face API Token (HF_TOKEN) not configured."
    if not text_to_summarize or not text_to_summarize.strip():
        return "Error: Input text cannot be empty."

    headers = {"Authorization": f"Bearer {HF_API_TOKEN}"}
    payload = {
        "inputs": text_to_summarize,
        "parameters": {"max_length": max_len, "min_length": min_len},
    }
    try:
        response = requests.post(SUMMARIZATION_API_URL, headers=headers, json=payload, timeout=30)
        response.raise_for_status()  # Raise an exception for HTTP errors
        result = response.json()
        if isinstance(result, list) and result:  # API often returns a list
            return result[0].get("summary_text", "Error: Could not extract summary.")
        return "Error: Unexpected response format from summarization API."
    except requests.exceptions.RequestException as e:
        return f"Error calling summarization API: {e}"
    except Exception as e:
        return f"An unexpected error occurred: {e}"


# Basic local test
# if __name__ == "__main__":
#     sample_text = "Replace with a long news article snippet for testing."
#     summary = summarize_text(sample_text)
#     print(f"Original: {sample_text[:100]}...")
#     print(f"Summary: {summary}")
```
Ensure `requests` and `python-dotenv` (for local testing) are in `requirements.txt`. Create `tests/test_summarizer.py` with a mocked `requests.post` to test this function.
```
**Task 1.3: Create Gradio UI and MCP Server Endpoint**
- Status: Pending
- Description: In `mcp_summarizer_tool_gradio/app.py`:
  - Create a simple Gradio interface (`gr.Interface`) for the summarizer function (inputs: text, max_length; output: text).
  - Use the `gradio_mcp` patching API (e.g., `gradio_mcp.patch(iface, mcp_path="/mcp")`; the exact function name depends on the `gradio_mcp` version - check its docs) to expose this Gradio app as an MCP server at the `/mcp` path.
  - The MCP endpoint should expect a JSON payload like `{"fn_index": 0, "data": ["text_to_summarize_value", max_length_value]}` and return `{"data": ["summary_output"]}`.
- Acceptance Criteria: Gradio app runs locally. MCP endpoint is accessible (can test with `curl` or Postman locally).
- Guidance for Claude / Cursor:
```cursor
feat(tool): create Gradio UI and MCP endpoint for summarizer

**Objective:** Wrap the summarizer function in a Gradio UI and expose it as an MCP server.

**Action: Update `mcp_summarizer_tool_gradio/app.py`**
```python
# In mcp_summarizer_tool_gradio/app.py
# ... (summarize_text function and imports) ...
import gradio as gr import gradio_mcp # Make sure this is installed: uv pip install gradio_mcp # (summarize_text function from Task 1.2) inputs = [ gr.Textbox(lines=10, label="Text to Summarize", placeholder="Enter long text here..."), gr.Number(label="Max Summary Length", value=150, minimum=20, maximum=500, step=10) ] outputs = gr.Textbox(label="Summary") iface = gr.Interface( fn=summarize_text, inputs=inputs, outputs=outputs, title="MCP Text Summarizer Tool", description="Enter text to get a summary. This app also serves as an MCP Tool.", allow_flagging="never" ) # Patch the interface to make it an MCP server # Check the latest gradio_mcp documentation for the exact patching method. # It might be `gradio_mcp.patch(iface, mcp_path="/mcp")` or similar. # For example, if using an older style or if patch doesn't exist: # iface = gradio_mcp. letzten(iface, mcp_path="/mcp") # Or equivalent new API # The most modern way, if `gradio_mcp` supports it, might be directly in launch or a wrapper. # Let's assume a common pattern: app, local_url, share_url = gradio_mcp.patch(iface, mcp_path="/mcp").launch(quiet=True, show_error=True) # Or if .launch() is separate: # patched_iface = gradio_mcp.patch(iface, mcp_path="/mcp") # if __name__ == "__main__": # patched_iface.launch() ``` *(Claude should be guided to use the correct `gradio_mcp` API based on its knowledge or by providing it a link to the `gradio_mcp` docs if it struggles).* Add `gradio_mcp` to `requirements.txt`. ``` **Task 1.4: Prepare README and Deploy to Hugging Face Space** - Status: Pending - Description: - Create a `README.md` for the `mcp_summarizer_tool_gradio` project. Include Hugging Face Space metadata, hackathon tags (`mcp-server-track`), instructions on how to use it as a Gradio app and as an MCP server (mentioning the `/mcp` path and expected payload). - Deploy this Gradio app to a new Hugging Face Space. - Set the `HF_TOKEN` as a secret in the Space if using HF Inference API. - Test the deployed Space UI and its MCP endpoint. - Acceptance Criteria: Summarizer MCP Tool is live on a Hugging Face Space and functional. README is complete. - Guidance for Claude / Cursor: ```cursor docs(tool): prepare README and deploy summarizer tool to HF Space **Objective:** Document and deploy the MCP Summarizer Tool. **Action 1: Create `mcp_summarizer_tool_gradio/README.md`** Generate a README.md for this specific tool's Space. Include: - YAML frontmatter for Hugging Face Space (title, emoji, sdk, app_file, python_version, tags including `mcp-server-track`). - Description of the tool. - How to use the Gradio UI. - **How to use as an MCP Server:** - Endpoint: `/mcp` - Method: `POST` - Expected Payload: `{ "data": ["", ] }` (fn_index usually 0 for simple interfaces) - Expected Response: `{ "data": [""] }` - Mention of sponsor credits used (Hugging Face API). **Action 2: (Developer Task) Deploy to Hugging Face Space** - Create a new Space on Hugging Face. - Push the `mcp_summarizer_tool_gradio` project files to it. - Set `HF_TOKEN` as a Space secret if using the Inference API. - Test the deployed Space. ``` --- *The subsequent sprints for MVP 4 will follow a similar pattern for the second tool (e.g., Sentiment Analyzer), then focus on integrating these live tools into the main KGraph-MCP agent application.* **Sprint 2 (MVP 4): Develop & Deploy Second Track 1 MCP Server (e.g., Sentiment Analyzer)** * **Goal:** Create, test, and deploy a second simple MCP Tool (e.g., Sentiment Analyzer) as another Gradio app on a separate Hugging Face Space. 
This will be your second Track 1 submission. * **Tasks:** * **Task 2.1:** Create New Project/Directory for MCP Sentiment Analyzer Tool (similar to Task 1.1). * **Task 2.2:** Implement Sentiment Analysis Logic (e.g., using HF Inference API for a sentiment model, or a `transformers` pipeline). (Similar to Task 1.2, with unit tests). * **Task 2.3:** Create Gradio UI and MCP Server Endpoint for Sentiment Analyzer. (Similar to Task 1.3). * **Task 2.4:** Prepare README and Deploy Sentiment Analyzer to a new Hugging Face Space. (Similar to Task 1.4). --- **Sprint 3 (MVP 4): Update Main KG & Executor Agent for Real MCP Calls** * **Goal:** Modify the main KGraph-MCP application: update the `InMemoryKG` to point to the live Space URLs for the two new MCP servers, and refactor the `ExecutorAgent` (renaming `StubExecutorAgent` or creating a new `RealExecutorAgent`) to make actual HTTP POST requests to these live MCP endpoints. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 4 - Sprint 3: Update Main KG & Executor Agent for Real MCP Calls **Task 3.1: Update `data/initial_tools.json` in Main Project** - Status: Pending - Description: In your main `KGraph-MCP-Hackathon` project, edit `data/initial_tools.json`. - For the "summarizer_v1" and "sentiment_analyzer_v1" tools (and any others you've made live): - Update their `invocation_command_stub` field to store the full live Hugging Face Space URL for their `/mcp` endpoint (e.g., `https://username-space-name.hf.space/mcp`). - Potentially add a new field like `mcp_server_type: "live_gradio"` to distinguish from future local Docker tools. - Acceptance Criteria: `initial_tools.json` updated with live MCP server URLs. - Guidance for Claude / Cursor: ```cursor chore(kg): update tool metadata with live MCP server URLs **Objective:** Point the main project's KG to the deployed Track 1 MCP servers. **Action: Modify `KGraph-MCP-Hackathon/data/initial_tools.json`** Update the entries for "summarizer_v1" and "sentiment_analyzer_v1": - Change `invocation_command_stub` to their live HF Space MCP endpoint URLs. - Example for summarizer_v1: `"invocation_command_stub": "https://yourHFusername-mcp-summarizer-tool-gradio.hf.space/mcp"` - Add a field `"execution_type": "remote_mcp_gradio"` to these tool entries. (Keep other stub tool entries as they are for now). ``` **Task 3.2: Create/Refactor `ExecutorAgent` for Live MCP Calls** - Status: Pending - Description: In `agents/executor.py` of the main project: - Rename `StubExecutorAgent` to `McpExecutorAgent` (or create new, inherit if useful). - Modify/Create the `execute_plan_step(self, plan: PlannedStep, inputs: Dict[str, str]) -> Dict[str, Any]` method (renaming from `simulate_execution`). - This method should now: 1. Check `plan.tool.execution_type` (or similar field). 2. If it's `"remote_mcp_gradio"`: * Get the `mcp_endpoint_url = plan.tool.invocation_command_stub`. * Construct the MCP JSON payload: * The `data` field needs to be a list of arguments for the Gradio function. The order matters. Use `plan.prompt.input_variables` to get the names, and then map them to the `inputs` dictionary to get the values in the correct order. * Example payload: `{"data": [inputs["document_content"], inputs.get("max_length", 150)]}` (order must match Gradio fn inputs). * Make an HTTP `POST` request using `requests` library to `mcp_endpoint_url` with the JSON payload. * Handle the response: Parse JSON, extract the actual tool output from `response_json["data"][0]`. 
* Return a dictionary similar to before, but with `status: "success"` (if API call worked) and `tool_specific_output` containing the *real* data from the MCP server. * Include error handling for network issues, non-200 responses, or malformed JSON responses from the MCP server. 3. If `execution_type` is not `remote_mcp_gradio` (i.e., for other stub tools), it can fall back to the old simulation logic from MVP3. - Acceptance Criteria: `McpExecutorAgent` can make successful MCP calls to live Gradio servers and process their responses. Handles errors. - TDD: In `tests/agents/test_executor.py`: * Mock `requests.post` to test the MCP call logic. * Test successful response parsing. * Test error handling for network errors and bad MCP server responses. * Test the fallback to simulation for non-live tools. - Guidance for Claude / Cursor: ```cursor feat(agent): implement real MCP calls in ExecutorAgent **Objective:** Enable the ExecutorAgent to call live Gradio MCP servers. **Action 1: Modify `agents/executor.py`** 1. Open `@agents/executor.py`. 2. Rename `StubExecutorAgent` to `McpExecutorAgent`. 3. Rename `simulate_execution` to `execute_plan_step`. 4. Implement the logic as described above, focusing on constructing the correct `data` list for the MCP payload by ordering `inputs` according to `plan.prompt.input_variables`. ```python # In agents/executor.py (McpExecutorAgent) import requests import json # For payload # ... # def execute_plan_step(self, plan: PlannedStep, inputs: Dict[str, str]) -> Dict[str, Any]: # tool_id = plan.tool.tool_id # # Assume plan.tool has an 'execution_type' and 'invocation_command_stub' # # (These would need to be added to MCPTool dataclass and initial_tools.json) # execution_type = getattr(plan.tool, 'execution_type', 'simulated') # Default to simulated # mcp_endpoint_url = getattr(plan.tool, 'invocation_command_stub', '') # if execution_type == "remote_mcp_gradio" and mcp_endpoint_url: # print(f"Executor: Making LIVE MCP call to {mcp_endpoint_url} for tool '{plan.tool.name}'") # # Order inputs according to plan.prompt.input_variables # mcp_data_payload_list = [] # for var_name in plan.prompt.input_variables: # mcp_data_payload_list.append(inputs.get(var_name)) # Add None if missing, Gradio might handle # mcp_payload = {"data": mcp_data_payload_list} # try: # response = requests.post(mcp_endpoint_url, json=mcp_payload, timeout=30) # response.raise_for_status() # response_json = response.json() # tool_output = response_json.get("data", [f"Error: No 'data' in MCP response from {plan.tool.name}."])[0] # return { # "status": "success_live_mcp", # "tool_id_used": tool_id, "tool_name_used": plan.tool.name, # "prompt_id_used": plan.prompt.prompt_id, "prompt_name_used": plan.prompt.name, # "message": f"Successfully called live MCP tool '{plan.tool.name}'.", # "inputs_sent": mcp_data_payload_list, # Show what was actually sent # "tool_specific_output": tool_output # } # except requests.exceptions.RequestException as e: # print(f"Error calling MCP server {mcp_endpoint_url}: {e}") # return {"status": "error_live_mcp_call", "message": str(e), "tool_specific_output": None} # except (json.JSONDecodeError, KeyError, IndexError) as e: # print(f"Error parsing MCP server response from {mcp_endpoint_url}: {e}") # return {"status": "error_mcp_response_parsing", "message": str(e), "tool_specific_output": None} # else: # Fallback to simulation for other tools # print(f"Executor: Simulating execution for tool '{plan.tool.name}' (type: {execution_type})") # # ... 
(paste the simulation logic from MVP3's StubExecutorAgent here) ... # # You might want to refactor the simulation logic into a helper method. # return self._run_simulation(plan, inputs) # Assuming you create this helper ``` 5. Ensure `requests` is in `requirements.txt` for the main project. **Action 2: Update `tests/agents/test_executor.py`** - Rename test class/methods. - Add tests for live MCP calls using `@patch('requests.post')`. - Test successful call and response parsing. - Test different error conditions (network error, bad status code, malformed JSON response). - Keep/adapt tests for the simulation fallback. Claude, please generate the refactored `McpExecutorAgent` and its test updates, including the helper `_run_simulation` if you choose to create it. Also, remember to update the `MCPTool` dataclass in `kg_services/ontology.py` to include `execution_type: str = "simulated"` and ensure `initial_tools.json` is updated accordingly for all tools. ``` **Task 3.3: Update `app.py` to Use `McpExecutorAgent`** - Status: Pending - Description: In `app.py` (main project), change the instantiation from `StubExecutorAgent` to `McpExecutorAgent` in the global initialization block. The `handle_execute_plan` function should already be calling the correct method name (`execute_plan_step` if renamed). - Acceptance Criteria: `app.py` uses the new `McpExecutorAgent`. - Guidance for Claude / Cursor: ```cursor refactor(app): use McpExecutorAgent in main application **Objective:** Update `app.py` to utilize the new executor capable of live calls. **Action: Modify `app.py`** 1. Open `@app.py`. 2. Change `from agents.executor import StubExecutorAgent` to `from agents.executor import McpExecutorAgent`. 3. In the global service initialization block, change: `stub_executor_agent_instance = StubExecutorAgent()` to: `executor_agent_instance = McpExecutorAgent()` (and update variable name throughout). 4. Ensure `handle_execute_plan` calls `executor_agent_instance.execute_plan_step(...)`. 5. The formatting of the result in `handle_execute_plan` should still largely work, but you might want to indicate if a result was "live" vs "simulated" in the UI. ``` --- **Sprint 4 (MVP 4): End-to-End Testing with Live MCP Tools & UI Polish** * **Goal:** Thoroughly test the full flow: user query -> KG -> Planner -> Selector -> Executor -> Live MCP Tool call -> result displayed in Gradio. Polish the UI to differentiate live vs. simulated results. * **Tasks:** * **Task 4.1:** Manual E2E Testing: Test with queries that trigger your live Summarizer and Sentiment Analyzer tools. Verify inputs are correctly passed and real outputs are shown. Test fallback to simulated tools. * **Task 4.2:** UI Polish for Live Results: In `app.py`'s `handle_execute_plan`, modify the Markdown output to clearly indicate if the result came from a "Live MCP Tool Call" or a "Simulated Execution." * **Task 4.3:** Error Handling Display: Ensure errors from live MCP calls (network, server errors) are presented gracefully in the UI. --- **Sprint 5 (MVP 4): Final Documentation, Video, and Submission Prep** * **Goal:** Finalize all documentation (READMEs for main project and Track 1 tools), record demo videos, and prepare all artifacts for hackathon submission. * **Tasks:** * **Task 5.1:** Update All READMEs: Reflect MVP 4 capabilities, including live MCP tool integration and separate Track 1 submissions. Detail how the main demo calls the Track 1 tools. 
* **Task 5.2:** Record Demo Videos:
    * One video for the main Track 3 KGraph-MCP demo, showcasing the E2E flow with live tool calls.
    * Separate short videos for each Track 1 MCP server tool, showing them in action (can be called from your main demo or a simple client).
* **Task 5.3:** Final Code Review, Cleanup, and CI Checks for all relevant repos.
* **Task 5.4:** Prepare Hugging Face Spaces: Ensure all Spaces (main demo, Track 1 tools) are correctly configured with secrets, READMEs, and tags.

This detailed breakdown for MVP 4 is ambitious for a hackathon, but hitting even parts of it (like one live tool integration) would be very impressive. The key is to modularize the Track 1 tools so they can be developed somewhat independently.
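Before drilling into the detailed Sprint 1 plan below, here is a minimal, illustrative sketch of the call shape the `McpExecutorAgent` from Sprint 3 (Task 3.2) is expected to make against a deployed Gradio MCP endpoint. The Space URL is a placeholder, the input ordering is an assumption that must match the deployed tool's Gradio inputs, and the exact `{"data": [...]}` contract should be verified against the `gradio_mcp` version actually deployed:

```python
# Illustrative only: calling a live Gradio MCP endpoint the way the
# McpExecutorAgent would. URL and input ordering are assumptions.
import requests

MCP_ENDPOINT_URL = "https://your-username-mcp-summarizer-tool.hf.space/mcp"  # placeholder

payload = {
    "data": [
        "A long article body to summarize ...",  # first Gradio input: text
        150,                                     # second Gradio input: max summary length
    ]
}

try:
    response = requests.post(MCP_ENDPOINT_URL, json=payload, timeout=30)
    response.raise_for_status()
    # Gradio-style responses wrap the outputs in a "data" list
    summary = response.json().get("data", ["<no data returned>"])[0]
    print(f"Summary from live MCP tool: {summary}")
except requests.exceptions.RequestException as exc:
    print(f"MCP call failed: {exc}")
```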
----

Okay, here's a full and comprehensive plan for **MVP 4 - Sprint 1: "Develop & Deploy First Track 1 MCP Server (Summarizer)"**, optimized for Claude 4.0 in Cursor IDE. This plan treats the new MCP server as a mini-project within the larger hackathon effort.

**Sprint Goal (MVP 4 - Sprint 1):** Successfully create, unit test, document, and deploy a fully functional Text Summarizer MCP Tool as a Gradio application on a dedicated Hugging Face Space. This will serve as a Track 1 submission and a live endpoint for later integration into the main KGraph-MCP agent.

**Assumptions for Claude 4.0 in Cursor IDE:**
* You have a Hugging Face account and are familiar with creating Spaces.
* You have an `HF_TOKEN` with permissions to use the Inference API (or will use one of your other LLM API credits if building summarization differently). * `uv`, Python, Git are set up. * `.cursor/rules/python_gradio_basic.mdc` can be copied or referenced for this new sub-project. * Conventional Commits will be used for this sub-project as well. --- **Task List for MVP 4 - Sprint 1 - Cursor IDE / Claude Focus:** *(Each task implies: setting up a new project context for Claude if needed, writing code, testing, linting, formatting, type-checking with `uv` and `just` (or direct commands), and committing frequently within this new tool's repository.)* **Task 1.1: Initialize New Summarizer Tool Project (`mcp_summarizer_tool_gradio`)** * **Status:** Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 1 * **Description:** Create a new, separate Git repository and project directory for the MCP Summarizer Tool. Set up the basic Python environment using `uv`, define initial dependencies, and create necessary config files. * **Acceptance Criteria:** 1. New Git repository created (e.g., on GitHub, named `mcp-summarizer-tool`). 2. Local project directory `mcp_summarizer_tool_gradio` cloned and structured. 3. `.python-version`, `requirements.txt`, `requirements-dev.txt`, `.gitignore`, `app.py` (stub), `README.md` (stub), `LICENSE` created. 4. `uv venv .venv` and `uv pip install -r requirements-dev.txt` (which includes `uv`) successfully sets up the environment. 5. Basic `pyproject.toml` for linters (Ruff, Black, Mypy) and `pytest.ini` are set up. * **Guidance for Claude / Cursor:** ```cursor chore(project): initialize MCP Summarizer Tool project **Objective:** Create a clean, new project structure and environment for the standalone MCP Summarizer Tool. **Developer Actions (Manual - Outside Claude's direct file creation for a *new* repo):** 1. Create a new repository on GitHub named `mcp-summarizer-tool`. 2. Clone it locally: `git clone mcp_summarizer_tool_gradio` 3. `cd mcp_summarizer_tool_gradio` **Claude, please generate the content for the following files within this new project:** **1. File: `.python-version`** ``` 3.11.8 ``` **2. File: `requirements.txt`** ``` gradio>=4.20.0 # Or check latest stable Gradio 4/5 gradio_mcp>=0.1.0 # Or check latest requests # For HF Inference API # Add other runtime deps as needed, e.g., transformers, torch if using local models ``` **3. File: `requirements-dev.txt`** ``` uv pytest pytest-mock # For mocking API calls ruff black mypy python-dotenv # For local .env file handling for HF_TOKEN ``` **4. File: `.gitignore`** (Standard Python .gitignore) ``` .venv/ __pycache__/ *.pyc *.pyo *.pyd .DS_Store *.egg-info/ dist/ build/ *.log .env .pytest_cache/ .mypy_cache/ ``` **5. File: `app.py` (Initial stub)** ```python # Placeholder for Gradio app and MCP tool logic print("MCP Summarizer Tool app.py loaded") ``` **6. File: `README.md` (Initial stub)** ```markdown # MCP Text Summarizer Tool (Hackathon) A Gradio app and MCP server for text summarization. (More details to be added) ``` **7. File: `LICENSE`** (e.g., MIT License content) ``` MIT License Copyright (c) [Year] [Your Name/Team Name] Permission is hereby granted, free of charge, to any person obtaining a copy # ... (rest of MIT license text) ... ``` **8. 
File: `pyproject.toml`** (Basic linting/testing config) ```toml [tool.ruff] line-length = 88 select = ["E", "F", "W", "I", "UP", "N", "B", "C4", "A", "RUF"] fixable = ["ALL"] [tool.ruff.format] quote-style = "double" [tool.black] line-length = 88 [tool.mypy] python_version = "3.11" warn_return_any = true warn_unused_configs = true ignore_missing_imports = true # Can be stricter later [tool.pytest.ini_options] pythonpath = ["."] testpaths = ["tests"] ``` **9. Create empty directory: `tests/`** **Developer Actions (Manual - After Claude generates files):** 1. `uv python install $(cat .python-version)` 2. `uv venv .venv --python $(cat .python-version)` 3. `source .venv/bin/activate` 4. `uv pip install -r requirements-dev.txt -r requirements.txt` 5. `uv pip compile requirements.txt requirements-dev.txt --all-extras -o requirements.lock` 6. `git add . && git commit -m "chore: initial project setup for mcp summarizer tool"` 7. (Optional) Create a `justfile` for this sub-project too. ``` **Task 1.2: Implement Core Summarization Logic & Unit Tests** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 1 * **Description & Acceptance Criteria:** As per your original Task 1.2 (choosing HF Inference API for speed). * **Detailed Guidance for Claude / Cursor:** ```cursor feat(tool): implement HF Inference API summarization and tests **Objective:** Create the `summarize_text` function using Hugging Face Inference API and write unit tests for it. **Action 1: Implement `summarize_text` in `mcp_summarizer_tool_gradio/app.py`** 1. Open `@mcp_summarizer_tool_gradio/app.py`. 2. Add the following Python code: ```python import os import requests from typing import Union # For type hint from dotenv import load_dotenv # Load .env for local testing, especially for HF_TOKEN # In a Hugging Face Space, HF_TOKEN should be set as a secret. load_dotenv() HF_API_TOKEN = os.getenv("HF_TOKEN") # Using a common, effective summarization model available via Inference API # Consider facebook/bart-large-cnn or sshleifer/distilbart-cnn-12-6 SUMMARIZATION_API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn" def summarize_text(text_to_summarize: str, max_length: int = 150, min_length: int = 30) -> str: """ Summarizes text using the Hugging Face Inference API. Returns the summary string or an error message string. """ if not HF_API_TOKEN: error_msg = "Error: Hugging Face API Token (HF_TOKEN) is not configured." print(error_msg) return error_msg if not text_to_summarize or not text_to_summarize.strip(): error_msg = "Error: Input text for summarization cannot be empty." print(error_msg) return error_msg headers = {"Authorization": f"Bearer {HF_API_TOKEN}"} # API expects "inputs", not "text_to_summarize" for this model payload = { "inputs": text_to_summarize, "parameters": {"max_length": max_length, "min_length": min_length, "do_sample": False} } print(f"Calling HF Summarization API for text: {text_to_summarize[:50]}...") try: response = requests.post(SUMMARIZATION_API_URL, headers=headers, json=payload, timeout=45) # Increased timeout response.raise_for_status() # Raise an HTTPError for bad responses (4XX or 5XX) result = response.json() # Typical successful response is a list with one dictionary: [{"summary_text": "..."}] if isinstance(result, list) and result and "summary_text" in result[0]: summary = result[0]["summary_text"] print(f"API success, summary: {summary[:50]}...") return summary else: error_msg = f"Error: Unexpected response format from summarization API. 
Response: {result}"
            print(error_msg)
            return error_msg
    except requests.exceptions.HTTPError as http_err:
        error_msg = f"Error calling summarization API (HTTP {http_err.response.status_code}): {http_err.response.text}"
        print(error_msg)
        return error_msg
    except requests.exceptions.RequestException as req_err:
        error_msg = f"Error calling summarization API (Request Exception): {req_err}"
        print(error_msg)
        return error_msg
    except Exception as e:
        error_msg = f"An unexpected error occurred during summarization: {e}"
        print(error_msg)
        return error_msg
```
3. Apply coding standards (Ruff/Black via `just format` if `justfile` is set up for this subproject).

**Action 2: Create Unit Tests in `mcp_summarizer_tool_gradio/tests/test_summarizer.py`**
1. Create the file `@mcp_summarizer_tool_gradio/tests/test_summarizer.py`.
2. Implement unit tests for `summarize_text`:
```python
import pytest
from unittest.mock import Mock

import requests  # Import for requests.exceptions

from app import summarize_text  # Assuming app.py is in the root of mcp_summarizer_tool_gradio


@pytest.fixture
def mock_hf_token(monkeypatch):
    # Patch the module-level constant: app.py reads HF_TOKEN at import time,
    # so setting the environment variable here would be too late.
    monkeypatch.setattr("app.HF_API_TOKEN", "test_hf_token")


def test_summarize_text_success(mock_hf_token, mocker):
    mock_response = Mock()
    mock_response.status_code = 200
    mock_response.json.return_value = [{"summary_text": "This is a test summary."}]
    mock_post = mocker.patch("requests.post", return_value=mock_response)

    summary = summarize_text("Some long input text.", max_length=50, min_length=10)

    assert summary == "This is a test summary."
    mock_post.assert_called_once()  # Check if it was called


def test_summarize_text_api_error_format(mock_hf_token, mocker):
    mock_response = Mock()
    mock_response.status_code = 200
    mock_response.json.return_value = {"error": "Some API error"}  # Unexpected format
    mocker.patch("requests.post", return_value=mock_response)

    summary = summarize_text("Text here.")
    assert "Error: Unexpected response format" in summary


def test_summarize_text_http_error(mock_hf_token, mocker):
    mock_response = Mock()
    mock_response.status_code = 500
    mock_response.text = "Internal Server Error"
    mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError(response=mock_response)
    mocker.patch("requests.post", return_value=mock_response)

    summary = summarize_text("Text causing server error.")
    assert "Error calling summarization API (HTTP 500)" in summary
    assert "Internal Server Error" in summary


def test_summarize_text_request_exception(mock_hf_token, mocker):
    mocker.patch("requests.post", side_effect=requests.exceptions.Timeout("Test timeout"))

    summary = summarize_text("Text leading to timeout.")
    assert "Error calling summarization API (Request Exception): Test timeout" in summary


def test_summarize_text_no_hf_token(monkeypatch):
    monkeypatch.setattr("app.HF_API_TOKEN", None)

    summary = summarize_text("Some text.")
    assert "Error: Hugging Face API Token (HF_TOKEN) is not configured." in summary


def test_summarize_text_empty_input_text(mock_hf_token):
    # Token is mocked so the empty-input check (not the token check) is exercised.
    summary = summarize_text("   ")  # Empty or whitespace
    assert "Error: Input text for summarization cannot be empty." in summary
```
Please generate these files and their contents. Ensure `requests` and `python-dotenv` are added to `mcp_summarizer_tool_gradio/requirements.txt`, and `pytest`, `pytest-mock` to `requirements-dev.txt`.
``` **Task 1.3: Create Gradio UI and MCP Server Endpoint for Summarizer** * Status: Pending * **Description & Acceptance Criteria:** As per your original Task 1.3. The key is to use the correct `gradio_mcp` API. * **Detailed Guidance for Claude / Cursor:** ```cursor feat(tool): implement Gradio UI and MCP endpoint for summarizer **Objective:** Wrap the `summarize_text` function in a Gradio UI and make it MCP-callable. **Action: Update `mcp_summarizer_tool_gradio/app.py`** 1. Open `@mcp_summarizer_tool_gradio/app.py`. 2. Add the Gradio interface code below the `summarize_text` function. ```python # ... (imports and summarize_text function) ... import gradio as gr import gradio_mcp # Ensure this is in requirements.txt TITLE = "MCP Text Summarizer Tool (via HF Inference API)" DESCRIPTION = """ Enter a piece of text and desired summary length parameters. This application uses the Hugging Face Inference API to generate a summary. It also serves as an MCP Tool at the `/mcp` endpoint. MCP Payload Example: `{"data": ["", , ]}` MCP Response Example: `{"data": [""]}` """ # Define Gradio inputs text_input_component = gr.Textbox( lines=15, label="Text to Summarize", placeholder="Paste or type your long text here..." ) max_len_component = gr.Number( label="Maximum Summary Length (tokens)", value=150, minimum=20, maximum=1024, # BART model limit step=10 ) min_len_component = gr.Number( label="Minimum Summary Length (tokens)", value=30, minimum=10, maximum=200, step=5 ) # Define Gradio output summary_output_component = gr.Textbox(label="Generated Summary", lines=10) # Create Gradio Interface iface = gr.Interface( fn=summarize_text, # Your summarization function inputs=[text_input_component, max_len_component, min_len_component], outputs=summary_output_component, title=TITLE, description=DESCRIPTION, allow_flagging="never", examples=[ ["Replace this with a very long example text, perhaps a short news article snippet or a paragraph from Wikipedia, to showcase the summarization. Keep it under a few hundred words for a quick demo.", 100, 25], ["Another example text, maybe focusing on a different topic. The Model Context Protocol (MCP) is an open standard that allows AI assistants like Claude to interact with external systems, such as Azure DevOps, through standardized interfaces. By implementing an MCP server, you can bridge the gap between Cursor IDE and Azure DevOps, enabling seamless AI-driven operations.", 50, 15] ] ) # Patch the interface to make it an MCP server # The exact API for gradio_mcp might vary. Common usage is: # patched_iface = gradio_mcp.patch(iface, mcp_path="/mcp") # if __name__ == "__main__": # patched_iface.launch() # Or, if .launch() returns the app for patching: # app, local_url, share_url = gradio_mcp.patch(iface, mcp_path="/mcp").launch() # Let's assume the following is a robust way for modern Gradio/gradio_mcp: # Wrap in a function to be called by if __name__ == "__main__": def run_app(): # Check latest gradio_mcp docs. `gradio_mcp.show_valid_mcp_requests(iface)` can be useful. # `gradio_mcp.MCPHelper(iface).run_mcp_server()` is another pattern, or using FastAPI. # For a simple launch that also serves MCP: # The `.launch()` method itself might be patched or `gradio_mcp` provides a main. 
    # A common approach for self-contained app.py:
    # iface.mcp = True  # some libraries might look for this
    # iface.mcp_path = "/mcp"
    # gradio_mcp.patch(iface, mcp_path="/mcp")  # ensure it's patched
    # iface.launch()  # More explicit control if using FastAPI with gradio_mcp

    # For a simple hackathon Space, patching and launching is common.
    # Let's use the one that typically works for Spaces.
    # gradio_mcp.patch() often returns the patched interface or app.
    # This needs verification with current gradio_mcp version.

    # A simpler way that might work for a Space, assuming gradio_mcp patches globally for a launch:
    print("Patching Gradio app for MCP...")
    gradio_mcp.patch(iface, mcp_path="/mcp")  # This might modify iface in place or return a new one
    print("MCP endpoint should be available at /mcp")
    iface.launch()

if __name__ == "__main__":
    print("Starting MCP Summarizer Tool Gradio App...")
    run_app()
```
3. Add `gradio_mcp` to `@mcp_summarizer_tool_gradio/requirements.txt`.
4. Apply coding standards.

*(Developer will manually test locally: `python app.py` and try UI. Then `curl -X POST -H "Content-Type: application/json" -d '{"data": ["Some very long text...", 100, 30]}' http://127.0.0.1:7860/mcp` (port may vary) to test MCP endpoint).*
```

**Task 1.4: Prepare Tool README, Deploy to Hugging Face Space, & Test Live**
* Status: Pending
* **Description & Acceptance Criteria:** As per your original Task 1.4.
* **Detailed Guidance for Claude / Cursor:**
```cursor
docs(tool): create README and prepare for HF Space deployment (Summarizer)

**Objective:** Document the MCP Summarizer Tool for its Hugging Face Space and prepare for deployment.

**Action 1: Create/Populate `mcp_summarizer_tool_gradio/README.md`**
1. Open or create `@mcp_summarizer_tool_gradio/README.md`.
2. Add YAML frontmatter for Hugging Face Space:
```yaml
---
title: MCP Text Summarizer Tool
emoji: 📝✂️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "4.25.0" # Replace with your actual Gradio version
python_version: "3.11" # Replace with your actual Python version
app_file: app.py
pinned: false
hf_oauth: false
hf_storage: false
hf_cookies: false
models:
  - facebook/bart-large-cnn # Model used via Inference API
tags:
  - "agents-mcp-hackathon"
  - "mcp-server-track"
  - "mcp"
  - "summarization"
  - "nlp"
  - "gradio"
---
```
3. Add sections to the README:
   * **MCP Text Summarizer Tool** (Brief overview).
   * **How to Use the Gradio UI** (Simple instructions for the web interface).
   * **How to Use as an MCP Server (for Agents)**
     * **Endpoint:** `/mcp` (relative to Space URL)
     * **Method:** `POST`
     * **Request Payload Example:**
       ```json
       { "data": ["<text_to_summarize>", <max_length>, <min_length>] }
       ```
       *(Note: `fn_index` is often implicitly 0 for simple Gradio interfaces when called via MCP. Clarify if your `gradio_mcp` version requires it explicitly).*
     * **Success Response Example:**
       ```json
       {
         "data": ["<generated_summary_text>"],
         "is_generating": false, // Standard Gradio response fields
         "duration": 0.852, // ...
         "average_duration": 0.910 // ...
       }
       ```
     * **Error Response Example (from tool logic):**
       ```json
       {
         "data": ["Error: Input text for summarization cannot be empty."],
         // ... (other Gradio fields) ...
       }
       ```
   * **Sponsor Technologies Used:** "This tool utilizes the Hugging Face Inference API (leveraging sponsor API credits) for its summarization capabilities."
   * **Repository:** Link to its GitHub repo.

**Action 2: (Developer Task) Final Local Checks & Deployment to Hugging Face Space**
1. Run `just lint`, `format`, `type-check`, `test` in `mcp_summarizer_tool_gradio`.
2.
Create a new Hugging Face Space (SDK: Gradio). 3. Set `HF_TOKEN` as a Space Secret. 4. Push the `mcp_summarizer_tool_gradio` project files (including `requirements.txt`, `app.py`, `README.md`) to the Space repository. 5. Monitor the Space build logs. 6. Once live, test the Gradio UI on the Space. 7. Test the MCP endpoint of the live Space using `curl` or Postman. Please generate the content for the `mcp_summarizer_tool_gradio/README.md` file. ``` **Task 1.5: Sprint Wrap-up & Review (Summarizer Tool)** * Status: Pending * **Description:** 1. Ensure all code for `mcp_summarizer_tool_gradio` is committed with Conventional Commit messages and pushed to its own repository. 2. Verify the deployed Hugging Face Space for the Summarizer Tool is working correctly (both UI and MCP endpoint). 3. Document the live Space URL. * **Acceptance Criteria:** First Track 1 MCP server is developed, tested, documented, and deployed. Ready for integration. --- **End of MVP 4 - Sprint 1 Review:** * **What's Done:** * A standalone MCP Summarizer Tool has been created as a Gradio app. * It uses the Hugging Face Inference API for summarization. * It's unit-tested (with mocks for the API). * It's documented with a Space-specific README. * It's deployed to a Hugging Face Space and its live MCP endpoint is functional. * This forms your first Track 1 submission. * **What's Next (MVP 4 - Sprint 2):** * Develop and deploy the second Track 1 MCP server (e.g., Sentiment Analyzer), following a similar set of tasks. * **Blockers/Issues:** (Note any issues with `gradio_mcp` API, HF Inference API rate limits/errors, or Space deployment). This sprint is a significant step as it produces a tangible, deployable artifact and fully exercises the "MCP Tool / Server" track requirements for one tool. --- Okay, this is a great strategic checkpoint. Before diving into more MVP sprints, establishing a comprehensive and optimized set of Cursor rules tailored for Claude 4.0 and your specific project (KGraph-MCP with its 5 MVPs) will significantly enhance development consistency, code quality, and AI-assistance effectiveness. I'll go through the essence of your 5 MVPs to identify recurring patterns, key technologies, and architectural principles that should be encoded into these rules. **Analysis of MVP Requirements for Rule Generation:** * **MVP 1 (KG-Powered Tool Suggester):** * Python dataclasses (`MCPTool`). * JSON data handling (`initial_tools.json`). * In-memory KG logic (dictionaries, lists). * Vector embeddings (LLM API calls via `EmbeddingService`). * NumPy for cosine similarity. * Basic Gradio UI (`app.py`). * Testing with `pytest` and `unittest.mock`. * **MVP 2 (KG Suggests Actionable Tool with Prompt Template):** * New dataclass (`MCPPrompt`, `PlannedStep`). * More JSON data (`initial_prompts.json`). * Enhanced `InMemoryKG` (prompt loading, indexing, search). * Enhanced `SimplePlannerAgent` (tool + prompt selection). * Gradio UI updates to display richer plan info. * **MVP 3 (Interactive Prompt Filling & Simulated Execution):** * Dynamic Gradio UI component generation/updates. * User input collection from Gradio. * `StubExecutorAgent` with mocked, tool-specific responses. * Formatting complex data for Gradio Markdown/JSON display. * **MVP 4 (Real MCP Tool Execution via Gradio Server):** * **New Sub-Projects (Track 1 Tools):** Each a self-contained Gradio app with `gradio_mcp`. Likely to use `requests` for external API calls (e.g., HF Inference). * Main Project: `McpExecutorAgent` making HTTP `POST` requests to live MCP servers. 
JSON payload construction and response parsing. Error handling for network calls.
    * Updating `MCPTool` ontology to include `execution_type` and `mcp_endpoint_url`.
* **MVP 5 (Dynamic Reasoning via Simulated Sampling - Conceptual):**
    * (If reached) Agent logic making direct LLM calls (simulating MCP Sampling) for dynamic reasoning/refinement.
    * Formatting complex MCP `sampling/createMessage` like JSON structures.

**Common Themes & Technologies Across MVPs:**

* **Core Language:** Python (heavily using dataclasses, type hints).
* **Primary UI:** Gradio.
* **KG/Data:** JSON for data definition, Python dicts/lists for in-memory representation. NumPy for vector math.
* **LLM Interaction:** `openai` or `anthropic` client libraries for embeddings and (later) chat completions. `requests` for general HTTP calls (to MCP servers or other APIs).
* **Testing:** `pytest`, `unittest.mock`.
* **Tooling:** `uv`, `just`, Conventional Commits.
* **Project Structure:** Modular (`kg_services/`, `agents/`, `schemas/`, `data/`, `tests/`).
* **Error Handling:** Essential for API calls and data processing.
* **Logging/Printing:** For observability.

---

**Comprehensive & Optimized Cursor Rules (`.cursor/rules/`) for KGraph-MCP Project**

This set of rules is designed to be modular and hierarchical, leveraging Cursor's `.mdc` format with YAML frontmatter.

**Directory Structure for Rules:**
```
.cursor/
└── rules/
    ├── general_python.mdc      # Base Python rules
    ├── data_handling.mdc       # JSON, dataclasses, KG structures
    ├── llm_interaction.mdc     # Embedding, API calls, (future) sampling
    ├── gradio_ui.mdc           # Gradio specific best practices
    ├── agent_logic.mdc         # Rules for Planner, Executor etc.
    ├── testing_pytest.mdc      # Pytest and mocking conventions
    ├── mcp_specific/           # Sub-directory for MCP server rules
    │   └── gradio_mcp_server.mdc
    └── project_specific.mdc    # Overarching KGraph-MCP conventions
```

---

**Content of `.mdc` Rule Files:**

**1. `general_python.mdc`**
```markdown
---
description: Core Python coding standards, formatting, and type hinting for the KGraph-MCP project.
globs: ["**/*.py"]
alwaysApply: true
---

### Formatting & Style
- Adhere strictly to PEP 8 guidelines for code layout.
- Use **Black** for automated code formatting. Max line length: 88 characters.
- Use double quotes for strings, unless single quotes improve readability (e.g., in f-strings containing double quotes).
- Employ f-strings for string formatting where possible.

### Type Hinting (PEP 484)
- **Mandatory Type Hints:** All function/method parameters and return types MUST be type hinted.
- Use types from the `typing` module (e.g., `List`, `Dict`, `Optional`, `Any`, `Tuple`, `Callable`).
- For complex types or dataclasses, ensure they are correctly imported and referenced.
- Variables should also be type hinted where it improves clarity, especially for complex data structures.
- Configure Mypy for strict checking (see `pyproject.toml`). Aim for no Mypy errors.

### Docstrings & Comments
- **PEP 257 for Docstrings:** All public modules, classes, functions, and methods MUST have docstrings.
- Use reStructuredText or Google-style docstrings for clarity.
  - First line: concise summary.
  - Subsequent paragraphs: more detail, Args, Returns, Raises sections if applicable.
- **Comments:** Use comments to explain *why* something is done, not *what* is being done (if the code is self-explanatory). Avoid obvious comments.
### Imports - Order imports: standard library, then third-party, then local application imports, each group alphabetized. - Use absolute imports for local modules where possible (e.g., `from kg_services.ontology import MCPTool`). - Avoid `from module import *`. ### General Best Practices - **Readability:** Prioritize clear, readable code. Write code for humans first. - **Simplicity (KISS):** Prefer simple solutions over complex ones unless absolutely necessary. - **Modularity:** Keep functions and classes small and focused on a single responsibility (SRP). - **Error Handling:** Use `try-except` blocks for operations that might fail (I/O, API calls). Be specific about exceptions caught. Log errors appropriately. - **Constants:** Define constants at the module level using `UPPER_SNAKE_CASE`. - **Environment Variables:** Access secrets and configurations (like API keys) ONLY through environment variables (e.g., `os.getenv()`). Use `.env.example` and `python-dotenv` for local development. Never commit `.env` files. - **Avoid Magic Numbers/Strings:** Define them as constants. - **Logging:** Prefer the `logging` module over `print()` for application diagnostics, especially in library/service code. `print()` is acceptable for CLI tools or immediate debug output during development. ``` **2. `data_handling.mdc`** ```markdown --- description: Rules for data structures, JSON handling, and in-memory Knowledge Graph components. globs: ["kg_services/**/*.py", "data/**/*.json", "schemas/**/*.json", "app.py"] # Apply to relevant files alwaysApply: false # Auto-attach when relevant files are in context --- ### Dataclasses (for Ontology Primitives like `MCPTool`, `MCPPrompt`, `PlannedStep`) - Utilize Python `dataclasses` for defining structured data objects (MCP primitives). - All fields MUST have type hints. - Use `field(default_factory=list/dict)` for mutable default values. - Ensure dataclasses are easily serializable/deserializable to/from JSON if needed. - Add a `to_dict()` method if frequently converting to dictionaries for APIs or storage. - Add a `from_dict(cls, data: Dict[str, Any])` class method for robust instantiation from dictionaries. ### JSON Data (`data/initial_*.json`) - Ensure all JSON files are well-formed and valid. - Data in JSON files MUST strictly adhere to the schemas defined by corresponding Python dataclasses or JSON Schemas. - Use consistent naming conventions for keys. - For descriptive fields (like `description`, `tags`), provide rich, detailed content to improve embedding quality for semantic search. ### In-Memory KG (`kg_services/knowledge_graph.py`) - **Structure:** Prefer Python dictionaries for key-based lookups (e.g., `self.tools: Dict[str, MCPTool]`) and lists for storing sequences like embeddings. - **Loading:** When loading from JSON, include robust error handling for file issues (`FileNotFoundError`) or parsing errors (`json.JSONDecodeError`). Log errors clearly. - **Vector Storage (InMemory):** - Store embeddings as `List[List[float]]` or `List[np.ndarray]`. - Maintain a parallel list of IDs (`tool_ids_for_vectors`, `prompt_ids_for_vectors`) to map embeddings back to their original objects. Ensure these lists are always synchronized. - **Similarity Calculation:** - Use `numpy` for efficient vector operations (cosine similarity). - Implement cosine similarity carefully, handling potential division by zero (e.g., if a vector norm is zero). - **Modularity:** Separate structured data storage (dictionaries) from vector index logic within the `InMemoryKG` class. 
### NumPy Usage - When performing vector math (e.g., cosine similarity), convert Python lists of floats to `np.ndarray` for performance and correctness. - Ensure `numpy` is listed as a core dependency in `requirements.txt`. ``` **3. `llm_interaction.mdc`** ```markdown --- description: Best practices for interacting with LLM APIs (Embeddings, future Chat/Sampling). globs: ["kg_services/embedder.py", "agents/**/*.py"] # Where LLM calls happen alwaysApply: false --- ### Embedding Service (`kg_services/embedder.py`) - **Client Initialization:** Initialize LLM API clients (OpenAI, Anthropic, AzureOpenAI) once, typically in `__init__`. - **API Key Management:** Strictly load API keys from environment variables (`os.getenv()`). NEVER hardcode keys. - **Error Handling:** - Wrap all API calls in `try-except` blocks. - Catch specific API errors (e.g., `openai.APIError`, `requests.exceptions.RequestException`). - Log errors with context (e.g., the text being embedded, the API called). - Return a clear failure indicator (e.g., `None` or raise a custom exception) so calling code can handle it. - **Input Preprocessing:** Preprocess text for embeddings if recommended by the API provider (e.g., replacing newlines with spaces: `text.replace("\n", " ")`). - **Model Selection:** Clearly define which embedding model is being used (e.g., `text-embedding-3-small`). Make this configurable if possible (e.g., via an environment variable or parameter). - **Batching (Future Consideration):** If embedding many texts, investigate if the API supports batching for efficiency. ### (Future) MCP Sampling / Direct LLM Calls by Agents - **Request Construction:** When an agent constructs a request for an LLM (simulating MCP Sampling or direct call): - Adhere strictly to the target LLM's API schema (e.g., message roles, content structure). - Sanitize any user-provided input included in prompts to prevent prompt injection if applicable. - **Response Parsing:** Robustly parse LLM responses. Anticipate variations or errors. Validate JSON structure if expecting JSON. - **Retry Logic:** Implement retry mechanisms (e.g., with exponential backoff) for transient API errors. - **Cost & Token Tracking (Advanced):** If making many calls, consider logging token usage to monitor costs. ``` **4. `gradio_ui.mdc`** ```markdown --- description: Rules and best practices for developing Gradio UIs for KGraph-MCP. globs: ["app.py", "mcp_summarizer_tool_gradio/app.py", "mcp_sentiment_tool_gradio/app.py"] # All Gradio apps alwaysApply: false --- ### Interface Design (`gr.Blocks` preferred) - Use `gr.Blocks()` for more control over layout compared to `gr.Interface` for complex apps. - Organize UI elements logically using `gr.Row()`, `gr.Column()`, `gr.Group()`, `gr.Accordion()`, `gr.Tabs()`. - Provide clear labels and placeholders for all input components. - Use `gr.Markdown()` for titles, descriptions, and instructional text. - Ensure a responsive layout if possible. ### Backend Handler Functions - Keep Gradio handler functions (those called by `button.click()`, etc.) focused. They should primarily: 1. Receive inputs from Gradio components. 2. Call underlying business logic (e.g., Planner agent, Executor agent). 3. Format the results from the business logic for display in Gradio output components. 4. Return updates for Gradio components using `gr.update()` or direct values. - Handle errors gracefully within these functions and return user-friendly error messages to the UI. 
- **State Management:** For simple state needed across interactions (like the current `PlannedStep`), consider passing necessary identifiers back from the UI or using `gr.State` if absolutely necessary (but try to keep handlers stateless if possible by re-deriving context).

### Dynamic UI Updates
- When dynamically showing/hiding components or updating their properties (label, value, visibility):
  - Use `gr.update()` (e.g., `gr.update(visible=True, label="New Label")`); the older component-specific form (`gr.Textbox.update(...)`) is not available in Gradio 4.x.
  - Ensure the `outputs` list of a `.click()` handler correctly maps to all components being updated.
  - If updating many components, return a tuple of `gr.update()` objects in the correct order.
- Test dynamic updates thoroughly across different scenarios.

### MCP Server Exposure (for Track 1 Tools)
- Use `gradio_mcp.patch(iface, mcp_path="/mcp")` (or the latest equivalent API from `gradio_mcp`) to expose Gradio interfaces as MCP servers.
- Clearly document the MCP endpoint path (`/mcp`), expected request payload structure (`{"data": [...]}`), and response structure in the tool's `README.md`.
- Ensure the Gradio function's input parameters match the order and types expected in the MCP `data` array.

### Error Handling & User Feedback
- Display clear error messages in the UI if backend operations fail.
- Provide loading indicators or button state changes (e.g., `gr.update(interactive=False, value="Processing...")`) for long-running operations (though less critical for fast, local MVP operations).
```

**5. `agent_logic.mdc`**
```markdown
---
description: Design principles and conventions for KGraph-MCP AI Agents (Planner, Executor, etc.).
globs: ["agents/**/*.py"]
alwaysApply: false
---

### Agent Responsibilities
- Each agent class (Planner, Executor, Supervisor) should have a clearly defined, single responsibility.
- Interactions between agents should be through well-defined method calls or (future) message passing.

### Planner Agent (`agents/planner.py`)
- **Input:** User query (string).
- **Core Logic:**
  1. Utilize `EmbeddingService` to embed the user query.
  2. Query `InMemoryKG` (both vector and structured parts) to find relevant `MCPTool`s.
  3. For selected tools, query `InMemoryKG` for relevant `MCPPrompt`s (considering `target_tool_id` and semantic relevance to the query).
  4. Construct `PlannedStep` objects.
- **Output:** `List[PlannedStep]`.
- **Error Handling:** Gracefully handle cases where no tools/prompts are found, or embedding fails.

### Executor Agent (`agents/executor.py`)
- **Input:** `PlannedStep` object, `Dict[str, str]` of user-provided inputs for the prompt.
- **Core Logic (MVP3 - Simulated):**
  - Return mocked, tool-specific, and input-aware responses.
  - Simulate success and error states.
- **Core Logic (MVP4 - Real MCP Call):**
  - Check `tool.execution_type`.
  - If "remote_mcp_gradio":
    - Construct valid MCP JSON payload (order `inputs` based on `prompt.input_variables`).
    - Make HTTP POST request using `requests` to `tool.invocation_command_stub` (MCP endpoint URL).
    - Parse MCP JSON response. Extract data from `response["data"][0]`.
    - Handle network errors, non-200 HTTP status codes, and MCP response parsing errors robustly.
  - Fallback to simulation for other `execution_type`s.
- **Output:** `Dict[str, Any]` containing status, message, and tool-specific output.

### State & Dependencies
- Agents should receive their dependencies (like `InMemoryKG`, `EmbeddingService`) via their constructor (Dependency Injection).
- Avoid global state where possible.
### Logging - Implement basic logging within agent methods to trace decision-making and execution flow, especially for debugging. ``` **6. `testing_pytest.mdc`** ```markdown --- description: Conventions for writing tests using Pytest and unittest.mock. globs: ["tests/**/*.py"] alwaysApply: true # Tests are crucial --- ### General Principles - **Test Coverage:** Aim for high test coverage for all non-UI, non-direct-LLM-output backend logic. - **AAA Pattern (Arrange, Act, Assert):** Structure tests clearly. - **Independence:** Tests should be independent and runnable in any order. Avoid tests that depend on the state left by previous tests. - **Speed:** Unit tests should be fast. Mock external dependencies (APIs, databases, file system) to achieve this. ### Pytest Usage - Use descriptive test function names (e.g., `test_planner_returns_empty_list_for_no_matching_tools`). - Utilize Pytest fixtures (`@pytest.fixture`) for setting up common test data or mock objects (e.g., an initialized `InMemoryKG` instance, a mocked `EmbeddingService`). - Use `tmp_path` fixture for tests involving file I/O. - Use `capsys` fixture for testing `print()` statements or logged output. - Use `monkeypatch` fixture for setting/deleting environment variables or patching module attributes. ### Mocking (`unittest.mock`) - **Target for Patching:** Patch objects where they are *looked up*, not where they are *defined*. (e.g., `patch('app.planner_agent_instance')` if testing a Gradio handler in `app.py` that uses it). - Use `mocker` fixture (from `pytest-mock`) for convenience. - **Specificity:** Mock return values or side effects precisely for what the test needs. - Test that mocked methods were called with expected arguments (`mock_object.assert_called_once_with(...)`). - Mock external API calls (e.g., `requests.post`, `openai.Embeddings.create`) thoroughly in tests for services that use them (like `EmbeddingService` or `McpExecutorAgent`). ### Test Structure - Organize tests in the `tests/` directory, mirroring the project structure (e.g., `tests/kg_services/test_knowledge_graph.py`, `tests/agents/test_planner.py`). - Ensure `__init__.py` files are present in test subdirectories if needed for discovery. ``` **7. `mcp_specific/gradio_mcp_server.mdc`** ```markdown --- description: Rules specific to creating MCP Servers using Gradio and gradio_mcp. globs: ["mcp_summarizer_tool_gradio/app.py", "mcp_sentiment_tool_gradio/app.py"] # Apply to Track 1 tools alwaysApply: false --- ### MCP Endpoint - Consistently use `/mcp` as the `mcp_path` when patching the Gradio interface with `gradio_mcp.patch()`. - Ensure the Gradio function (`fn`) that `gr.Interface` wraps is designed to accept inputs in the order they will appear in the MCP `{"data": [...]}` list. - The function should return a single value or a tuple, which `gradio_mcp` will wrap into `{"data": [...]}`. ### Payload Handling - The Gradio function should anticipate the types and number of arguments from the MCP `data` list. - Perform basic validation on inputs if necessary within the function. - Return clear error messages (as strings or structured dicts) if input is invalid, which will be part of the MCP response's `data` field. ### README Documentation (for each MCP Server Space) - **Crucial:** The `README.md` for each Track 1 MCP server Space MUST clearly document: - The MCP endpoint URL (Space URL + `/mcp`). - The expected HTTP method (`POST`). - The exact structure of the request JSON payload, especially the `data` array (order and types of arguments). 
- The exact structure of a successful response JSON payload. - Example error response structures. - This documentation is vital for your main KGraph-MCP agent (and others) to correctly call this MCP server. ### Dependency Management - Keep `requirements.txt` for each MCP server minimal, including only `gradio`, `gradio_mcp`, and libraries essential for that specific tool's function. ``` **8. `project_specific.mdc`** ```markdown --- description: Overarching conventions, terminology, and architectural principles for the KGraph-MCP project. globs: ["**/*.py", "**/*.md"] # Apply broadly alwaysApply: true --- ### Terminology - Consistently use the defined MCP primitive names: `MCPTool`, `MCPPrompt`, `PlannedStep`, `Resource` (when added), `Root` (when added), `Sampling` (when added). - Refer to the in-memory knowledge graph components as `InMemoryKG` and `EmbeddingService`. - Agent names: `SimplePlannerAgent`, `McpExecutorAgent` (or `StubExecutorAgent`). ### Modularity & Separation of Concerns - **`kg_services/`:** All direct data storage, retrieval, embedding, and KG structure logic. Should not contain agent decision-making or UI code. - **`agents/`:** Contains the logic for different AI agents. Agents interact with `kg_services` but not directly with Gradio. - **`app.py`:** Handles Gradio UI and acts as the top-level orchestrator, calling agent methods and formatting results. Should contain minimal business logic itself. - **`data/`:** Contains static JSON definitions for initial KG population. - **`schemas/`:** (Future) JSON Schema definitions for validating data, if used beyond dataclasses. ### Conventional Commits - All Git commit messages MUST follow the Conventional Commits specification (v1.0.0). - Use types like `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `ci`, `build`. - Use scopes for clarity (e.g., `feat(planner): ...`, `fix(ui): ...`). - This will be enforced by `commitlint` if set up. ### Error Handling Philosophy - Services (like `EmbeddingService`, `McpExecutorAgent` making API calls) should catch specific exceptions and return `None` or a structured error response. - Agents should check for these failure indicators from services and propagate errors or make decisions based on them. - The Gradio `app.py` handler functions should ultimately present user-friendly error messages. ### Hackathon MVP Focus - Prioritize implementing the core functionality defined for the current MVP sprint. - Favor simplicity and clarity over overly complex solutions, especially for UI and initial agent logic. - Ensure the demo for each MVP is compelling and clearly shows the value of the KG and agent interaction. - Document sponsor technology usage clearly in `README.md`. ``` **How to Use These Rules in Cursor IDE:** 1. **Create the Directory Structure:** Make the `.cursor/rules/` directory and the sub-directory `mcp_specific/` in your project root. 2. **Save Each Rule Set:** Save the content above into the corresponding `.mdc` files within this structure. 3. **Cursor Auto-Attachment:** * Rules with `alwaysApply: true` (like `general_python.mdc` and `project_specific.mdc`) will generally be active. * Rules with `alwaysApply: false` and specific `globs` (like `data_handling.mdc` or `gradio_ui.mdc`) will be automatically included by Cursor when you are editing files matching those globs or when you `@mention` files that match those globs. 4. **Explicitly Referencing Rules:** When prompting Claude, you can reinforce specific rules: * "Implement this function for `InMemoryKG`. 
Apply rules from `@.cursor/rules/data_handling.mdc` and `@.cursor/rules/general_python.mdc`." 5. **Iterate on Rules:** As the project progresses, you might find the need to refine these rules, add new ones, or adjust scopes. This is a living document. By setting up this comprehensive rule set, you provide Claude 4.0 with excellent context and guidance, leading to more consistent, higher-quality code generation and a more efficient AI-assisted development experience throughout your hackathon project. --- Okay, here's the full and comprehensive plan for **MVP 4 - Sprint 2: "Develop & Deploy Second Track 1 MCP Server (Sentiment Analyzer)"**. This sprint mirrors the structure of MVP 4 - Sprint 1, but focuses on creating a different tool. **Sprint Goal (MVP 4 - Sprint 2):** Successfully create, unit test, document, and deploy a fully functional Sentiment Analyzer MCP Tool as a Gradio application on a new, dedicated Hugging Face Space. This will serve as your second Track 1 submission and another live endpoint for integration. **Assumptions for Claude 4.0 in Cursor IDE:** * MVP 4 - Sprint 1 (Summarizer Tool) is complete and deployed. * You have an `HF_TOKEN` for the Inference API. * `uv`, Python, Git are set up. * You can reference/copy relevant parts of `.cursor/rules/` (especially `python_gradio_basic.mdc` and `mcp_specific/gradio_mcp_server.mdc`) for this new sub-project. * Conventional Commits will be used for this sub-project's repository. --- **Task List for MVP 4 - Sprint 2 - Cursor IDE / Claude Focus:** *(Each task implies: setting up a new project context, writing code, testing, linting, formatting, type-checking, and committing frequently within this new tool's repository.)* **Task 2.1: Initialize New Sentiment Analyzer Tool Project (`mcp_sentiment_analyzer_tool_gradio`)** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 2 * **Description:** Create a new, separate Git repository and project directory for the MCP Sentiment Analyzer Tool. Set up the basic Python environment using `uv`, define initial dependencies, and create necessary config files. This is very similar to Task 1.1 of MVP4-Sprint1. * **Acceptance Criteria:** 1. New Git repository created (e.g., `mcp-sentiment-analyzer-tool`). 2. Local project directory `mcp_sentiment_analyzer_tool_gradio` cloned and structured. 3. `.python-version`, `requirements.txt`, `requirements-dev.txt`, `.gitignore`, `app.py` (stub), `README.md` (stub), `LICENSE` created. 4. `uv venv .venv` and dependency installation work. 5. Basic `pyproject.toml` and `pytest.ini` (or equivalents in `pyproject.toml`) are set up. * **Guidance for Claude / Cursor:** ```cursor chore(project): initialize MCP Sentiment Analyzer Tool project **Objective:** Create a clean, new project structure and environment for the standalone MCP Sentiment Analyzer Tool. **Developer Actions (Manual - Outside Claude's direct file creation for a *new* repo):** 1. Create a new repository on GitHub named `mcp-sentiment-analyzer-tool`. 2. Clone it locally: `git clone mcp_sentiment_analyzer_tool_gradio` 3. `cd mcp_sentiment_analyzer_tool_gradio` **Claude, please generate the content for the following files within this new project (`mcp_sentiment_analyzer_tool_gradio`):** **1. File: `.python-version`** ``` 3.11.8 ``` **2. File: `requirements.txt`** ``` gradio>=4.20.0 # Or check latest stable Gradio gradio_mcp>=0.1.0 # Or check latest requests # For HF Inference API # transformers # Add if choosing a local model approach # torch ``` **3. 
File: `requirements-dev.txt`**
```
uv
pytest
pytest-mock
ruff
black
mypy
python-dotenv
```

**4. File: `.gitignore`**
(Standard Python .gitignore, same as Summarizer tool)

**5. File: `app.py` (Initial stub)**
```python
# Placeholder for Gradio app and MCP Sentiment Analyzer logic
print("MCP Sentiment Analyzer Tool app.py loaded")
```

**6. File: `README.md` (Initial stub)**
```markdown
# MCP Sentiment Analyzer Tool (Hackathon)
A Gradio app and MCP server for text sentiment analysis.
(More details to be added)
```

**7. File: `LICENSE`**
(e.g., MIT License content, same as Summarizer tool)

**8. File: `pyproject.toml`**
(Basic linting/testing config, same as Summarizer tool)

**9. Create empty directory: `tests/`**

**Developer Actions (Manual - After Claude generates files):**
1. `uv python install $(cat .python-version)`
2. `uv venv .venv --python $(cat .python-version)`
3. `source .venv/bin/activate` (or equivalent for your shell)
4. `uv pip install -r requirements-dev.txt -r requirements.txt`
5. `uv pip compile requirements.txt requirements-dev.txt --all-extras -o requirements.lock`
6. `git add . && git commit -m "chore: initial project setup for mcp sentiment analyzer tool"`
7. (Optional) Create a `justfile` for this sub-project.
```

**Task 2.2: Implement Core Sentiment Analysis Logic & Unit Tests**

* Status: Pending
* **Parent MVP:** MVP 4
* **Parent Sprint (MVP 4):** Sprint 2
* **Description:** In `mcp_sentiment_analyzer_tool_gradio/app.py`, write the Python function that performs sentiment analysis.
    * Option A (Simple for Hackathon): Use Hugging Face Inference API with a suitable sentiment analysis model (e.g., `distilbert-base-uncased-finetuned-sst-2-english`).
    * Option B (More Control): Load a model using `transformers.pipeline("sentiment-analysis", model="...")`.
    * The function should take `text_to_analyze: str` and return a dictionary like `{"label": "POSITIVE", "score": 0.99}` or an error message string.
* **Acceptance Criteria:** Sentiment analysis function works correctly, returns structured output, and is unit-tested (mocking the API if Option A).
* **TDD Approach:** Write unit tests in `tests/test_sentiment_analyzer.py` first.
* **Guidance for Claude / Cursor:**
```cursor
feat(tool): implement HF Inference API sentiment analysis and tests

**Objective:** Create the Python function for sentiment analysis using Hugging Face Inference API and write unit tests.

**Action 1: Implement `analyze_sentiment` in `mcp_sentiment_analyzer_tool_gradio/app.py`**
1. Open `@mcp_sentiment_analyzer_tool_gradio/app.py`.
2. Add the Python code for `analyze_sentiment`:
```python
import os
import requests
from typing import Any, Dict, Union  # For type hints
from dotenv import load_dotenv

load_dotenv()
HF_API_TOKEN = os.getenv("HF_TOKEN")
# Example model, many others available: https://huggingface.co/models?pipeline_tag=text-classification&sort=trending&filter=sentiment-analysis
SENTIMENT_API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"


def analyze_sentiment(text_to_analyze: str) -> Union[Dict[str, Any], str]:  # Return dict on success, str on error
    """
    Analyzes sentiment of text using the Hugging Face Inference API.

    Returns a dictionary with 'label' and 'score' or an error message string.
    """
    if not HF_API_TOKEN:
        error_msg = "Error: Hugging Face API Token (HF_TOKEN) not configured."
        print(error_msg)
        return error_msg
    if not text_to_analyze or not text_to_analyze.strip():
        error_msg = "Error: Input text for sentiment analysis cannot be empty."
print(error_msg) return error_msg headers = {"Authorization": f"Bearer {HF_API_TOKEN}"} # API for text-classification often expects "inputs" as a string directly, not nested payload = {"inputs": text_to_analyze} print(f"Calling HF Sentiment API for text: {text_to_analyze[:50]}...") try: response = requests.post(SENTIMENT_API_URL, headers=headers, json=payload, timeout=30) response.raise_for_status() result = response.json() # Response for this model is often a list of lists of dicts: [[{'label': 'POSITIVE', 'score': 0.99...}]] if isinstance(result, list) and result and \ isinstance(result[0], list) and result[0] and \ isinstance(result[0][0], dict) and "label" in result[0][0] and "score" in result[0][0]: sentiment_data = result[0][0] # Take the first result from the inner list print(f"API success, sentiment: {sentiment_data}") return {"label": sentiment_data["label"], "score": sentiment_data["score"]} else: error_msg = f"Error: Unexpected response format from sentiment API. Response: {result}" print(error_msg) return error_msg except requests.exceptions.HTTPError as http_err: error_msg = f"Error calling sentiment API (HTTP {http_err.response.status_code}): {http_err.response.text}" print(error_msg) return error_msg except requests.exceptions.RequestException as req_err: error_msg = f"Error calling sentiment API (Request Exception): {req_err}" print(error_msg) return error_msg except Exception as e: error_msg = f"An unexpected error occurred during sentiment analysis: {e}" print(error_msg) return error_msg ``` 3. Apply coding standards. **Action 2: Create Unit Tests in `mcp_sentiment_analyzer_tool_gradio/tests/test_sentiment_analyzer.py`** 1. Create the file `@mcp_sentiment_analyzer_tool_gradio/tests/test_sentiment_analyzer.py`. 2. Implement unit tests for `analyze_sentiment` (similar structure to `test_summarizer.py`, mocking `requests.post`): * Test successful analysis returning a dict like `{"label": "POSITIVE", "score": 0.999}`. * Test API error format, HTTP error, request exception, no HF token, empty input. ```python # In tests/test_sentiment_analyzer.py import pytest from unittest.mock import patch, Mock import requests from app import analyze_sentiment # Assuming app.py is in the root @pytest.fixture def mock_env_hf_token(monkeypatch): monkeypatch.setenv("HF_TOKEN", "test_hf_token") def test_analyze_sentiment_success(mock_env_hf_token, mocker): mock_response_data = [[{"label": "POSITIVE", "score": 0.998}]] mock_response = Mock() mock_response.status_code = 200 mock_response.json.return_value = mock_response_data mocker.patch("requests.post", return_value=mock_response) result = analyze_sentiment("This is a wonderful tool!") assert isinstance(result, dict) assert result["label"] == "POSITIVE" assert result["score"] == 0.998 requests.post.assert_called_once() # ... (add other error case tests similar to test_summarizer.py) ... def test_analyze_sentiment_no_hf_token(monkeypatch): monkeypatch.delenv("HF_TOKEN", raising=False) result = analyze_sentiment("Some text.") assert isinstance(result, str) and "HF_TOKEN) not configured" in result def test_analyze_sentiment_empty_input_text(): result = analyze_sentiment(" ") assert isinstance(result, str) and "Input text for sentiment analysis cannot be empty" in result ``` Ensure `requests` and `python-dotenv` are in `mcp_sentiment_analyzer_tool_gradio/requirements.txt`, and `pytest`, `pytest-mock` in `requirements-dev.txt`. 
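For reference, a hedged sketch of one of the error-path tests mentioned above (an HTTP error from the Inference API); it reuses the fixtures from the success test and may need adjusting to your exact setup:
```python
def test_analyze_sentiment_http_error(mock_env_hf_token, mocker):
    # Simulate the Inference API returning a 503 (e.g., model loading / unavailable).
    mock_response = Mock()
    mock_response.status_code = 503
    mock_response.text = "Model is currently loading"
    mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError(
        response=mock_response
    )
    mocker.patch("requests.post", return_value=mock_response)

    result = analyze_sentiment("Some text.")

    # analyze_sentiment returns an error string on HTTP failures.
    assert isinstance(result, str)
    assert "Error calling sentiment API (HTTP 503)" in result
```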
```

**Task 2.3: Create Gradio UI and MCP Server Endpoint for Sentiment Analyzer**

* Status: Pending
* **Description & Acceptance Criteria:** Similar to Task 1.3 of MVP4-Sprint1, but for the sentiment analyzer. The MCP payload will likely be `{"data": ["<text_to_analyze>"]}`, with a response of `{"data": [{"label": "...", "score": ...}]}` on success or `{"data": ["<error message string>"]}` on error.
* **Detailed Guidance for Claude / Cursor:**
```cursor
feat(tool): create Gradio UI and MCP endpoint for sentiment analyzer

**Objective:** Wrap the `analyze_sentiment` function in a Gradio UI and expose it as an MCP server.

**Action: Update `mcp_sentiment_analyzer_tool_gradio/app.py`**
1. Open `@mcp_sentiment_analyzer_tool_gradio/app.py`.
2. Add the Gradio interface code below the `analyze_sentiment` function:
```python
# ... (imports and analyze_sentiment function) ...
import gradio as gr
import gradio_mcp  # Ensure this is in requirements.txt

TITLE_SENTIMENT = "MCP Sentiment Analyzer Tool (via HF Inference API)"
DESCRIPTION_SENTIMENT = """
Enter text to analyze its sentiment (e.g., positive, negative). Uses Hugging Face Inference API.
Also an MCP Tool at `/mcp`.
MCP Payload Example: `{"data": ["<text_to_analyze>"]}`
MCP Response Example (Success): `{"data": [{"label": "POSITIVE", "score": 0.99}]}`
MCP Response Example (Error): `{"data": ["<error message string>"]}`
"""

sentiment_input_component = gr.Textbox(
    lines=5, label="Text to Analyze for Sentiment", placeholder="Enter text here..."
)
# Output will be a dictionary, so use gr.JSON or gr.Label (if mapping to value/conf)
sentiment_output_component = gr.JSON(label="Sentiment Analysis Result")
# Alternative if you want to parse and display nicely:
# sentiment_label_output = gr.Label(label="Detected Sentiment")
# sentiment_score_output = gr.Number(label="Confidence Score")

iface_sentiment = gr.Interface(
    fn=analyze_sentiment,
    inputs=sentiment_input_component,
    outputs=sentiment_output_component,  # If using separate Label/Number, outputs=[sentiment_label_output, sentiment_score_output]
    title=TITLE_SENTIMENT,
    description=DESCRIPTION_SENTIMENT,
    allow_flagging="never",
    examples=[
        ["I love this product, it's absolutely fantastic!"],
        ["This is the worst experience I have ever had."],
        ["The weather today is quite normal."]
    ]
)


# Patch and launch (similar to summarizer tool)
def run_sentiment_app():
    print("Patching Gradio app for MCP (Sentiment Analyzer)...")
    gradio_mcp.patch(iface_sentiment, mcp_path="/mcp")
    print("Sentiment MCP endpoint should be available at /mcp")
    iface_sentiment.launch()


if __name__ == "__main__":
    print("Starting MCP Sentiment Analyzer Tool Gradio App...")
    run_sentiment_app()
```
3. Add `gradio_mcp` to `@mcp_sentiment_analyzer_tool_gradio/requirements.txt`.
4. Apply coding standards.

*(Developer will manually test locally and test the MCP endpoint with `curl` or Postman).*
```

**Task 2.4: Prepare Tool README, Deploy to Hugging Face Space, & Test Live (Sentiment Analyzer)**

* Status: Pending
* **Description & Acceptance Criteria:** Similar to Task 1.4 of MVP4-Sprint1, but for the Sentiment Analyzer tool. This includes creating its own `README.md` with Space metadata and MCP usage instructions, deploying to a *new separate* Hugging Face Space, setting secrets, and testing.
* **Detailed Guidance for Claude / Cursor:**
```cursor
docs(tool): prepare README and deploy sentiment analyzer tool to HF Space

**Objective:** Document and deploy the MCP Sentiment Analyzer Tool.

**Action 1: Create `mcp_sentiment_analyzer_tool_gradio/README.md`**
Generate a README.md for this specific tool's Space.
Include:
- YAML frontmatter for Hugging Face Space (title, emoji, sdk, app_file, python_version, tags including `mcp-server-track`).
  - Title: MCP Sentiment Analyzer Tool
  - Emoji: 😊😕
- Description of the tool.
- How to use the Gradio UI.
- **How to use as an MCP Server:**
  - Endpoint: `/mcp`
  - Method: `POST`
  - Expected Payload: `{ "data": ["<text_to_analyze>"] }`
  - Success Response Example: `{ "data": [{"label": "POSITIVE", "score": 0.998}] }` (Note: data is a list containing one dict)
  - Error Response Example (from tool logic, if analyze_sentiment returns a string): `{ "data": ["Error: Input text cannot be empty."] }`
- Mention of sponsor credits used (Hugging Face API).
- Repository link.

**Action 2: (Developer Task) Final Local Checks & Deployment to Hugging Face Space**
- Run `just` commands (lint, format, test, etc.) in `mcp_sentiment_analyzer_tool_gradio`.
- Create a new Hugging Face Space.
- Set `HF_TOKEN` as a Space Secret.
- Push project files to the Space repository.
- Monitor build logs and test the deployed Space (UI and MCP endpoint).

Please generate the content for the `mcp_sentiment_analyzer_tool_gradio/README.md` file.
```

**Task 2.5: Sprint Wrap-up & Review (Sentiment Analyzer Tool)**

* Status: Pending
* **Description:**
    1. Ensure all code for `mcp_sentiment_analyzer_tool_gradio` is committed and pushed to its repository.
    2. Verify the deployed Hugging Face Space for the Sentiment Analyzer Tool is working correctly.
    3. Document its live Space URL.
* **Acceptance Criteria:** Second Track 1 MCP server is developed, tested, documented, and deployed.

---

**End of MVP 4 - Sprint 2 Review:**

* **What's Done:**
    * A standalone MCP Sentiment Analyzer Tool has been created as a Gradio app.
    * It uses the Hugging Face Inference API for sentiment analysis.
    * It's unit-tested.
    * It's documented with its own Space README.
    * It's deployed to a new Hugging Face Space and its live MCP endpoint is functional.
    * This forms your second Track 1 submission.
* **What's Next (MVP 4 - Sprint 3):**
    * Update the main KGraph-MCP project's `InMemoryKG` to include the live URL of this new sentiment analyzer tool.
    * Refactor/enhance the `McpExecutorAgent` in the main project to correctly call this new live tool, handling its specific payload and response structure, while maintaining the ability to call the summarizer and to fall back to simulations.
* **Blockers/Issues:** (Note any new issues, e.g., different API response formats from the HF Inference API for different models requiring more careful parsing).

This sprint solidifies your Track 1 contributions and provides another live service for your main agent to orchestrate. The next sprint will be about integrating this new tool into the KGraph-MCP agent's repertoire.

---

Okay, let's craft the full plan for **MVP 4 - Sprint 3: "Update Main KG & Executor Agent for Real MCP Calls"**. This sprint is pivotal as it bridges your intelligent planning frontend with actual, live MCP services.

**Sprint Goal (MVP 4 - Sprint 3):** Modify the main KGraph-MCP application's `InMemoryKG` to recognize and store live MCP server endpoint URLs. Refactor the `ExecutorAgent` to make HTTP POST requests to these live Gradio MCP servers, correctly forming payloads and parsing responses, while retaining fallback simulation logic for other tools. Ensure `app.py` uses this enhanced executor.
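Before starting the tasks below, it can help to smoke-test the live call contract by hand. A minimal sketch, assuming the Summarizer Space from Sprint 1 and the `{"data": [text, max_length, min_length]}` payload documented in its README (the URL is a placeholder for your own Space):

```python
# Minimal manual check of a deployed Gradio MCP endpoint before wiring it into the executor.
import requests

MCP_URL = "https://YOUR-HF-USERNAME-mcp-summarizer-tool-gradio.hf.space/mcp"  # placeholder
payload = {"data": ["Some very long text that needs summarizing...", 100, 30]}

try:
    resp = requests.post(MCP_URL, json=payload, timeout=45)
    resp.raise_for_status()
    # Gradio MCP responses carry the tool output(s) in the "data" list.
    print("Tool output:", resp.json()["data"][0])
except (requests.exceptions.RequestException, KeyError, IndexError, ValueError) as exc:
    print("MCP call failed:", exc)
```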
**Assumptions for Claude 4.0 in Cursor IDE:** * MVP 4 - Sprints 1 & 2 are complete: At least two live Gradio MCP servers (Summarizer, Sentiment Analyzer) are deployed on Hugging Face Spaces with known `/mcp` endpoint URLs. * The main project (`KGraph-MCP-Hackathon`) structure is in place from previous MVPs. * `PlannedStep`, `MCPTool`, `MCPPrompt` dataclasses are defined. * `SimplePlannerAgent` is functional. * `.cursor/rules/` are set up, especially `python_gradio_basic.mdc` and `agent_logic.mdc`. * Conventional Commits are being used. --- **Task List for MVP 4 - Sprint 3 - Cursor IDE / Claude Focus:** *(Each task implies: writing code, extensive testing (unit and integration for MCP calls), linting, formatting, type-checking, and committing with Conventional Commits.)* **Task 3.1: Enhance `MCPTool` Ontology and Update `data/initial_tools.json`** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 3 * **Description:** 1. Modify the `MCPTool` dataclass in `kg_services/ontology.py` to include an `execution_type: str` field (defaulting to `"simulated"`) and ensure `invocation_command_stub: str` can now store a full URL. 2. In the main project's `data/initial_tools.json`, update the entries for the live "summarizer_v1" and "sentiment_analyzer_v1" tools: * Set their `execution_type` to `"remote_mcp_gradio"`. * Update their `invocation_command_stub` to the full live Hugging Face Space URL for their `/mcp` endpoint. 3. Ensure other stub tools in `data/initial_tools.json` retain `execution_type: "simulated"` or have it added with this default. * **Acceptance Criteria:** 1. `MCPTool` dataclass in `kg_services/ontology.py` is updated with `execution_type`. 2. `data/initial_tools.json` is updated with correct `execution_type` and live URLs for the deployed MCP servers. 3. Existing tests for `MCPTool` instantiation and KG loading (if they check all fields) are updated and pass. * **TDD Approach:** Update tests in `tests/kg_services/test_ontology.py` for `MCPTool` to include `execution_type`. Update tests in `tests/kg_services/test_knowledge_graph.py` for `load_tools_from_json` to verify the new field is loaded correctly. * **Guidance for Claude / Cursor:** ```cursor feat(ontology): add execution_type to MCPTool and update KG data **Objective:** Enhance the `MCPTool` definition to support different execution types and update the KG data to point to live MCP servers. **Action 1: Modify `kg_services/ontology.py`** 1. Open `@kg_services/ontology.py`. 2. Modify the `MCPTool` dataclass: ```python # In kg_services/ontology.py # ... (other imports and dataclasses) ... @dataclass class MCPTool: tool_id: str name: str description: str tags: List[str] = field(default_factory=list) # invocation_command_stub can now be a command, a URL, or other identifier invocation_command_stub: str = "" execution_type: str = "simulated" # New field, defaults to "simulated" # Possible values for execution_type: "simulated", "remote_mcp_gradio", "local_docker" (future) ``` 3. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/kg_services/test_ontology.py`** 1. Open `@tests/kg_services/test_ontology.py`. 2. Update `test_mcp_tool_creation`: * Test instantiation with `execution_type` explicitly set. * Test that `execution_type` defaults to "simulated" if not provided. **Action 3: Modify `data/initial_tools.json` in the main KGraph-MCP project** 1. Open `@data/initial_tools.json`. 2. 
For the "summarizer_v1" tool: * Update `invocation_command_stub` to its live HF Space URL (e.g., `"https://YOUR-HF-USERNAME-mcp-summarizer-tool-gradio.hf.space/mcp"`). * Add/Set `"execution_type": "remote_mcp_gradio"`. 3. For the "sentiment_analyzer_v1" tool: * Update `invocation_command_stub` to its live HF Space URL. * Add/Set `"execution_type": "remote_mcp_gradio"`. 4. For other existing stub tools (e.g., "image_caption_generator_stub_v1", "code_linter_tool_stub_v1"): * Ensure they have `"execution_type": "simulated"` or add it if missing (relying on default is okay but explicit is clearer). * Their `invocation_command_stub` can remain empty or be a placeholder name. **Action 4: Update `tests/kg_services/test_knowledge_graph.py`** 1. Open `@tests/kg_services/test_knowledge_graph.py`. 2. In tests like `test_load_tools_valid_json`, ensure that when `MCPTool` objects are asserted, the `execution_type` field is also checked. You might need to update your sample JSON data used in these tests. Please generate the modified `MCPTool` dataclass, the updated tests for it, and provide an example of how "summarizer_v1" should look in `data/initial_tools.json`. ``` **Task 3.2: Refactor `ExecutorAgent` for Live MCP Calls & Simulation Fallback** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 3 * **Description & Acceptance Criteria:** As per your original Task 3.2. Key is robust error handling for HTTP calls and JSON parsing, plus correct MCP payload construction. * **Detailed Guidance for Claude / Cursor:** ```cursor feat(agent): implement real MCP calls and simulation fallback in McpExecutorAgent **Objective:** Refactor the ExecutorAgent to make live HTTP POST calls to Gradio MCP servers, handle responses and errors, and fall back to simulation for other tool types. **Action 1: Modify `agents/executor.py`** 1. Open `@agents/executor.py`. 2. If it's still `StubExecutorAgent`, rename the class to `McpExecutorAgent`. 3. Rename the method `simulate_execution` to `execute_plan_step(self, plan: PlannedStep, inputs: Dict[str, str]) -> Dict[str, Any]`. 4. Implement the core logic: ```python # In agents/executor.py import requests import json # For payload construction and parsing from typing import Dict, Any, List # Ensure List is imported from kg_services.ontology import PlannedStep # Adjust path if necessary class McpExecutorAgent: def __init__(self): print("McpExecutorAgent initialized.") def _run_simulation(self, plan: PlannedStep, inputs: Dict[str, str]) -> Dict[str, Any]: # This is the simulation logic from MVP3 Sprint 3 and Sprint 4 (Task 4.1) # It should be moved here or called if it's complex. tool_id = plan.tool.tool_id status = "simulated_success" message = f"Tool '{plan.tool.name}' execution SIMULATED successfully." specific_mock_output = f"Generic SIMULATED output for tool ID '{tool_id}' with inputs: {inputs}." # (Paste or call the refined input-aware simulation logic from MVP3 Sprint 4 here) # For example: if tool_id == "image_caption_generator_stub_v1": # Assuming this is still simulated input_image_key = plan.prompt.input_variables[0] if plan.prompt.input_variables else "image_url_or_data" image_ref = inputs.get(input_image_key, "undefined_image_reference") if image_ref == "undefined_image_reference" or not image_ref.strip(): status = "simulated_error" message = "Error (Simulated): Image reference is missing." 
specific_mock_output = {"error_details": "No image reference provided."} else: specific_mock_output = f"Mocked Caption (Simulated): A vivid depiction related to '{image_ref}'." # Add other SIMULATED tool logic here return { "status": status, "tool_id_used": tool_id, "tool_name_used": plan.tool.name, "prompt_id_used": plan.prompt.prompt_id, "prompt_name_used": plan.prompt.name, "message": message, "inputs_received": inputs, "tool_specific_output": specific_mock_output, "execution_mode": "simulated" } def execute_plan_step(self, plan: PlannedStep, inputs: Dict[str, str]) -> Dict[str, Any]: tool = plan.tool print(f"Executor: Preparing to execute tool '{tool.name}' (type: {tool.execution_type})") if tool.execution_type == "remote_mcp_gradio" and tool.invocation_command_stub: mcp_endpoint_url = tool.invocation_command_stub print(f"Executor: Making LIVE MCP call to {mcp_endpoint_url}") mcp_data_payload_list: List[Any] = [] if plan.prompt.input_variables: for var_name in plan.prompt.input_variables: # Gradio functions usually expect arguments in order. # Missing inputs might need to be None or raise an error earlier. mcp_data_payload_list.append(inputs.get(var_name)) else: # Prompt might have no defined input variables, meaning tool takes fixed/no args from prompt print(f"Executor: Prompt '{plan.prompt.name}' has no input_variables. Sending empty data list for tool '{tool.name}'.") # Some tools might still expect an empty list in data, others might error. # This depends on the MCP server's Gradio function signature. # If the Gradio fn takes no args, data should be []. If it takes optional args, it might be [None, None]. mcp_payload = {"data": mcp_data_payload_list} try: # Consider adding headers if needed, e.g. {"Content-Type": "application/json"} # but `requests` usually handles this for `json=` param. response = requests.post(mcp_endpoint_url, json=mcp_payload, timeout=45) # Increased timeout response.raise_for_status() # Raises HTTPError for 4xx/5xx status codes response_json = response.json() # Gradio MCP responses typically have results in response_json["data"][0] # For multi-output Gradio functions, response_json["data"] is a list of outputs. 
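                # Expected response shape (based on the tool READMEs from Sprints 1-2), e.g.:
                #   {"data": ["<tool output>"], "is_generating": false,
                #    "duration": 0.85, "average_duration": 0.91}
                # Only the first element of "data" is consumed below; the extra Gradio
                # bookkeeping fields are ignored.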
if "data" in response_json and isinstance(response_json["data"], list) and len(response_json["data"]) > 0: actual_tool_output = response_json["data"][0] else: # This case means the Gradio server returned something unexpected for MCP raise KeyError("Expected 'data' field with at least one item in MCP response list.") return { "status": "success_live_mcp", "tool_id_used": tool.tool_id, "tool_name_used": tool.name, "prompt_id_used": plan.prompt.prompt_id, "prompt_name_used": plan.prompt.name, "message": f"Successfully called live MCP tool '{tool.name}'.", "inputs_sent_to_tool": mcp_data_payload_list, # For debugging "tool_specific_output": actual_tool_output, "execution_mode": "live_mcp" } except requests.exceptions.HTTPError as http_err: error_message = f"HTTP Error {http_err.response.status_code} calling MCP server {mcp_endpoint_url}: {http_err.response.text[:200]}" print(error_message) return {"status": "error_live_mcp_http", "message": error_message, "tool_specific_output": None, "execution_mode": "live_mcp_failed"} except requests.exceptions.RequestException as req_err: # Other network errors error_message = f"Network Error calling MCP server {mcp_endpoint_url}: {req_err}" print(error_message) return {"status": "error_live_mcp_network", "message": error_message, "tool_specific_output": None, "execution_mode": "live_mcp_failed"} except (json.JSONDecodeError, KeyError, IndexError, TypeError) as parse_err: # MCP response parsing errors error_message = f"Error parsing response from MCP server {mcp_endpoint_url}: {parse_err}. Response: {response.text[:200]}" print(error_message) return {"status": "error_mcp_response_parsing", "message": error_message, "tool_specific_output": None, "execution_mode": "live_mcp_failed"} elif tool.execution_type == "simulated": print(f"Executor: Falling back to simulation for tool '{tool.name}'") return self._run_simulation(plan, inputs) else: unknown_type_msg = f"Executor: Unknown execution_type '{tool.execution_type}' for tool '{tool.name}'." print(unknown_type_msg) return {"status": "error_unknown_execution_type", "message": unknown_type_msg, "tool_specific_output": None, "execution_mode": "failed"} ``` 5. Ensure `requests` and `json` are imported. Add `requests` to `KGraph-MCP-Hackathon/requirements.txt`. 6. Apply coding standards from `@.cursor/rules/agent_logic.mdc` and `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/agents/test_executor.py`** 1. Open `@tests/agents/test_executor.py`. 2. Rename test class/methods if `StubExecutorAgent` was renamed to `McpExecutorAgent`. 3. Add new tests for the live MCP call path using `@patch('requests.post')` from `unittest.mock` (or `mocker.patch` if using `pytest-mock`): * `test_execute_plan_step_live_mcp_success`: * Mock `requests.post` to return a `Mock` response object with `status_code=200` and `json()` method returning `{"data": ["mocked live output"]}`. * Create a `PlannedStep` with a tool having `execution_type="remote_mcp_gradio"` and a valid `invocation_command_stub` (URL). * Call `agent.execute_plan_step()` and assert the returned dictionary has `status="success_live_mcp"` and `tool_specific_output="mocked live output"`. * Assert `requests.post` was called with the correct URL and JSON payload (check `mcp_data_payload_list` construction). * `test_execute_plan_step_live_mcp_http_error`: Mock `requests.post` to raise `requests.exceptions.HTTPError`. Assert error status and message. 
* `test_execute_plan_step_live_mcp_network_error`: Mock `requests.post` to raise `requests.exceptions.ConnectionError`. Assert error status. * `test_execute_plan_step_live_mcp_bad_response_json`: Mock `requests.post`'s `json()` method to raise `json.JSONDecodeError` or return malformed JSON. Assert error status. 4. Ensure tests for the `_run_simulation` fallback path are still present and cover various simulated tools correctly. Please generate the modified `McpExecutorAgent` class (including the `_run_simulation` helper method which should contain the input-aware simulation logic from MVP3 Sprint 4) and its comprehensive unit tests. Add `requests` to the main project's `requirements.txt`. ``` **Task 3.3: Update `app.py` to Use (Renamed) `McpExecutorAgent` & Finalize Initialization** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 3 * **Description:** In `app.py` (main KGraph-MCP project): 1. Update the import if `StubExecutorAgent` was renamed to `McpExecutorAgent`. 2. Update the instantiation in the global service initialization block. 3. Ensure `handle_execute_plan` calls `executor_agent_instance.execute_plan_step(...)`. 4. The UI formatting logic in `handle_execute_plan` should already be mostly compatible with the dictionary structure returned by `execute_plan_step`. Minor tweaks might be needed to display the `execution_mode` or more detailed error messages. * **Acceptance Criteria:** 1. `app.py` correctly instantiates and uses `McpExecutorAgent`. 2. The app runs, and the UI can (in theory) trigger live calls or simulations based on tool type. 3. Relevant unit tests in `tests/test_app_handlers.py` are updated for any changes to the executor's return structure if they impact `handle_execute_plan`'s formatting. * **Guidance for Claude / Cursor:** ```cursor refactor(app): integrate McpExecutorAgent for live and simulated execution **Objective:** Update `app.py` in the main KGraph-MCP project to use the refactored `McpExecutorAgent`. **Action 1: Modify `app.py`** 1. Open `@app.py`. 2. If `StubExecutorAgent` was renamed to `McpExecutorAgent` in `agents/executor.py`, update the import statement: `from agents.executor import McpExecutorAgent # Was StubExecutorAgent` 3. In the "Global Service Initialization" block, update the instantiation: ```python # executor_agent_instance = StubExecutorAgent() # Old executor_agent_instance = McpExecutorAgent() # New, assuming variable name is executor_agent_instance ``` (Ensure the variable name `executor_agent_instance` is used consistently where it was `stub_executor_agent_instance`). 4. Review the `handle_execute_plan` function. The structure of the dictionary returned by `McpExecutorAgent.execute_plan_step` should be similar enough that the existing Markdown formatting works. However, you might want to add the `execution_mode` to the display: ```python # Inside handle_execute_plan, when formatting result_md_parts # ... execution_mode = execution_result.get('execution_mode', 'unknown_mode') result_md_parts.append(f"- Execution Mode: `{execution_mode}`") # ... ``` 5. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/test_app_handlers.py`** 1. Open `@tests/test_app_handlers.py`. 2. Update tests for `handle_execute_plan`. The mocked `executor_agent_instance` should now be an instance of `McpExecutorAgent` (or a mock of it). 3. 
When mocking `executor_agent_instance.execute_plan_step`, ensure the mock return dictionaries include the new `execution_mode` key, and assert that the Markdown output from `handle_execute_plan` includes this information. Please generate the necessary modifications in `app.py` and suggest updates for its tests. ``` **Task 3.4: Final Sprint Checks & Local Integration Test** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 3 * **Description:** 1. Ensure `requests` is in the main project's `requirements.txt`. 2. Update `requirements.lock` (`uv pip compile ...`). 3. Run `just install`, `just lint`, `just format`, `just type-check`, `just test` for the main project. 4. **Crucially:** Run the main `app.py` locally. Test by providing a query that should select one of your *live* Gradio MCP servers (Summarizer or Sentiment Analyzer). Fill in inputs and click "Execute Plan". Observe the terminal output of both the main app and the live Gradio MCP server Space (if it shows logs) to confirm an actual HTTP call was made and a real response was received and displayed. Also test a tool that should still be simulated. 5. Commit changes. Push and verify CI. * **Acceptance Criteria:** 1. All local checks and unit tests pass. 2. Local E2E test confirms a successful live call to at least one deployed MCP server and a successful fallback to simulation for another. 3. CI pipeline is green. * **Guidance for Claude / Cursor:** This is primarily a manual developer testing and wrap-up phase. --- **End of MVP 4 - Sprint 3 Review:** * **What's Done:** * `MCPTool` ontology now supports distinguishing execution types (live vs. simulated). * `data/initial_tools.json` points to the live Gradio MCP server URLs. * The `McpExecutorAgent` is implemented, capable of making actual HTTP POST calls to "remote_mcp_gradio" tools and falling back to input-aware simulation for "simulated" tools. * Error handling for live calls is included. * `app.py` correctly uses the `McpExecutorAgent`. * **What's Next (MVP 4 - Sprint 4):** * Focus on thorough end-to-end testing of the live integrations. * Polish the UI to clearly differentiate between live and simulated results and improve error display. * **Blockers/Issues:** (Note any issues with live MCP call payloads, response parsing, CORS if running things very locally, or unexpected behavior from the live Gradio servers). This sprint is a major milestone, as it demonstrates the core capability of your agent orchestrating real, external services via MCP! --- Okay, let's detail **MVP 4 - Sprint 4: "End-to-End Testing with Live MCP Tools & UI Polish"**. This sprint is crucial for validating the core functionality of MVP 4 and ensuring the user experience is clear about what's happening (live call vs. simulation) and how errors are handled. **Sprint Goal (MVP 4 - Sprint 4):** Rigorously test the end-to-end flow involving live MCP Tool calls to the deployed Gradio servers (Summarizer and Sentiment Analyzer). Refine the Gradio UI in the main KGraph-MCP application to clearly differentiate between results from live calls and simulated executions, and to present errors from live calls gracefully. **Assumptions for Claude 4.0 in Cursor IDE:** * MVP 4 - Sprint 3 is complete: The `McpExecutorAgent` can make live calls and fall back to simulation. `app.py` uses this agent. The live Gradio MCP servers (Summarizer, Sentiment Analyzer) are deployed and accessible. * All necessary classes and configurations are in place. * `.cursor/rules/` are set up. 
* Conventional Commits are being used. --- **Task List for MVP 4 - Sprint 4 - Cursor IDE / Claude Focus:** *(Each task implies: extensive manual testing, UI refinement, code adjustments, linting, formatting, type-checking, and committing with Conventional Commits.)* **Task 4.1: Rigorous Manual End-to-End Testing of Live MCP Tool Integration** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 4 * **Description:** 1. Execute a comprehensive suite of manual end-to-end tests using the main KGraph-MCP Gradio application (`app.py`). 2. **Live Tool Scenarios (Summarizer):** * Query: "Summarize a long news article." * Input: Paste a substantial text block into the dynamic input field for `document_content`. Vary `max_length` / `min_length`. * Verify: The live Summarizer MCP server (on its HF Space) is called (check its logs if possible, or at least the network request from your local app if inspecting). The *actual summary* from the live tool is displayed in your main app's UI. * Test with empty input text to the summarizer (should ideally be caught by the live tool and return an error, which your main app should then display). 3. **Live Tool Scenarios (Sentiment Analyzer):** * Query: "Analyze the sentiment of this customer review." * Input: Provide various text snippets (clearly positive, clearly negative, neutral) into the dynamic input field for `feedback_text`. * Verify: The live Sentiment Analyzer MCP server is called. The *actual sentiment label and score* are displayed. * Test with empty input. 4. **Simulated Tool Fallback:** * Query: "Generate a caption for an image" (assuming your "image_caption_generator_stub_v1" is still `execution_type: "simulated"`). * Input: Provide a dummy image URL. * Verify: The `McpExecutorAgent` correctly falls back to the `_run_simulation` method, and the *simulated, input-aware caption* is displayed. 5. **Error Path Testing (Live Calls):** * Temporarily make one of your live MCP server Spaces private or stop it to simulate unavailability. Run a query that targets it. Verify your main app's `McpExecutorAgent` catches the network/HTTP error and displays a user-friendly error message in the UI. * (If possible to contrive) Send a malformed payload or trigger an internal error in one of the live MCP tools to see how your main app handles non-JSON or error JSON responses. 6. Document all test cases and their outcomes. Identify and log any bugs. * **Acceptance Criteria:** 1. Successful end-to-end execution for both live MCP tools is verified. 2. Input passing and result parsing for live tools are correct. 3. Fallback to simulation for non-live tools works as expected. 4. Basic error handling for live call failures is functional. * **Guidance for Claude / Cursor:** ```cursor test(app): create e2e test plan for live MCP tool integration (MVP4 Sprint 4) **Objective:** Define a comprehensive set of manual end-to-end test cases for MVP4's live tool integration. **Action: Generate E2E Test Cases** Please outline 8-10 detailed manual E2E test cases. For each: 1. **Test Case ID:** (e.g., `MVP4_S4_TC01`) 2. **Description:** (e.g., "Successful summarization via live MCP tool with long text.") 3. **Preconditions:** (e.g., "Summarizer MCP Server Space is live and accessible. Main KGraph-MCP app is running.") 4. **User Query in Main App:** (e.g., "Provide a concise summary of a technical document.") 5. **Steps & Inputs for Dynamic Fields:** (e.g., "Paste content of `sample_long_text.txt` into 'document_content' field. Set Max Length to 100.") 6. 
**Expected Result in Main App UI:** (e.g., "Execution Mode: 'live_mcp'. Tool Output: A plausible summary of the input text. Status: 'success_live_mcp'.") 7. **Verification Points:** (e.g., "Check main app terminal for 'Making LIVE MCP call...' log. Check live Summarizer Space logs for request if possible.") Include test cases for: - Successful summarization (various lengths). - Successful sentiment analysis (positive, negative, neutral inputs). - Fallback to a simulated tool. - Live tool error: Summarizer with empty input text. - Live tool error: Sentiment analyzer with empty input text. - Network error simulation: Test targeting a (temporarily) unavailable live MCP server. - (Optional) Test with prompt that has no input variables, calling a live tool that expects no specific data beyond being triggered. This plan will guide manual E2E testing. ``` *(Developer executes these tests, using Claude to help debug issues found in `McpExecutorAgent` or `app.py`'s handling of live responses.)* **Task 4.2: UI Polish - Clearly Differentiate Live vs. Simulated Execution Results** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 4 * **Description:** In `app.py`'s `handle_execute_plan` function, enhance the Markdown formatting of the `execution_result` to make it visually obvious to the user whether the output came from a live MCP tool or a simulation. * Use distinct emojis (e.g., 🌐 for live, 🧪 for simulated). * Use different sub-headings or emphasis in the Markdown. * Ensure the `execution_mode` field (added to the executor's response in Sprint 3) is used to drive this conditional formatting. * **Acceptance Criteria:** 1. The UI clearly and visually distinguishes between results from "Live MCP Tool Call" and "Simulated Execution". 2. This distinction is driven by the `execution_mode` in the executor's response. * **Guidance for Claude / Cursor:** ```cursor feat(ui): differentiate live vs. simulated execution results display **Objective:** Update `app.py` to make the source of execution results (live or simulated) visually distinct in the Gradio UI. **Action: Modify `handle_execute_plan` in `app.py`** 1. Open `@app.py`. 2. In the `handle_execute_plan` function, where `result_md_parts` are being assembled from `execution_result`: ```python # Inside handle_execute_plan, after getting execution_result # ...
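# NOTE: the status strings handled below are the ones McpExecutorAgent was given in
# Sprint 3: "success_live_mcp", "error_live_mcp_http", "error_live_mcp_network" and
# "error_mcp_response_parsing" for live calls, plus "simulated_success" /
# "simulated_error" for the simulation fallback. Adjust if your executor uses
# different status values.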
execution_mode = execution_result.get('execution_mode', 'unknown_mode') status = execution_result.get('status', 'unknown') tool_name = execution_result.get('tool_name_used', 'N/A') result_md_parts = [] if execution_mode == "live_mcp": if status == "success_live_mcp": result_md_parts.append(f"🌐 **Live MCP Tool Call Success ('{tool_name}'):**") else: # Assumes other live statuses are errors result_md_parts.append(f"❌🌐 **Live MCP Tool Call Error ('{tool_name}'):**") elif execution_mode == "simulated": if status == "simulated_success": result_md_parts.append(f"🧪 **Simulated Execution Success ('{tool_name}'):**") else: # Assumes other simulated statuses are errors result_md_parts.append(f"❌🧪 **Simulated Execution Error ('{tool_name}'):**") else: # Fallback for unknown or failed modes before simulation/live attempt result_md_parts.append(f"⚠️ **Execution Status ('{tool_name}'):** `{status}` (Mode: `{execution_mode}`)") result_md_parts.append(f"- Message: {execution_result.get('message', 'No message.')}") # Add inputs sent if it's a live call for transparency if execution_mode == "live_mcp" and "inputs_sent_to_tool" in execution_result: inputs_sent_str = json.dumps(execution_result["inputs_sent_to_tool"], indent=2) result_md_parts.append(f"- Inputs Sent to Live Tool:\n ```json\n{inputs_sent_str}\n ```") elif "inputs_received" in execution_result: # For simulated, show what executor received inputs_received_str = json.dumps(execution_result["inputs_received"], indent=2) result_md_parts.append(f"- Inputs for Simulation:\n ```json\n{inputs_received_str}\n ```") tool_output = execution_result.get('tool_specific_output') if tool_output is not None: # Check for None explicitly result_md_parts.append("- Tool's Output:") if isinstance(tool_output, (dict, list)): result_md_parts.append(f" ```json\n{json.dumps(tool_output, indent=2)}\n ```") else: result_md_parts.append(f" ```text\n{str(tool_output)}\n ```") elif status not in ["simulated_error", "error_live_mcp_http", "error_live_mcp_network", "error_mcp_response_parsing"]: # Don't add "No output" if it's an error state where output isn't expected result_md_parts.append("- Tool's Output: (No specific output returned or applicable)") return "\n\n".join(result_md_parts) # Use double newline for better spacing in Markdown ``` 3. Apply coding standards. **Action 2: (If needed) Update `tests/test_app_handlers.py`** - If the structure of the Markdown output significantly changes, tests for `handle_execute_plan` might need adjustment to assert the new formatting, including the emojis and execution mode indicators. Please generate the refined formatting logic within `handle_execute_plan`. ``` **Task 4.3: Enhance UI Display of Errors from Live MCP Calls** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 4 * **Description:** In `app.py`'s `handle_execute_plan`, improve how errors originating from live MCP calls (as structured by `McpExecutorAgent` like `status="error_live_mcp_http"`, `status="error_mcp_response_parsing"`) are presented in the `execution_output_display` Markdown. * Make error messages clear and user-friendly. * Distinguish between network errors, server-side errors (e.g., 500 from the Gradio Space), and issues parsing the MCP response. * The formatting changes from Task 4.2 should already help by using "❌🌐". This task is about ensuring the `message` content from the executor is well-presented. * **Acceptance Criteria:** 1.
Errors from live MCP tool calls are displayed in a clear, understandable way in the UI, distinguishing different error types if possible. * **Guidance for Claude / Cursor:** ```cursor refactor(ui): improve display of errors from live MCP calls **Objective:** Ensure that when live MCP calls fail, the error information is presented clearly and gracefully in the Gradio UI. **Action: Review and Refine Error Message Formatting in `handle_execute_plan` (`app.py`)** 1. Open `@app.py`. 2. Examine the `handle_execute_plan` function, specifically the part where it formats the `execution_result` dictionary from `McpExecutorAgent` into Markdown. 3. The previous task (4.2) already introduced formatting for `status == "error_live_mcp_http"` etc. 4. Confirm that the `message` field from `execution_result` (which contains details from `requests.exceptions` or parsing issues) is included in the Markdown output for these error states and is readable. 5. Consider if any additional text or formatting would make these error messages more helpful to the user (e.g., "The live tool might be temporarily unavailable or experiencing issues."). **Example (Conceptual - building on Task 4.2):** ```python # Inside handle_execute_plan, error formatting part # ... # elif status == "error_live_mcp_http": # result_md_parts.append(f"❌🌐 **Live MCP Tool Call Failed ('{tool_name}'): HTTP Error**") # result_md_parts.append(f"- Details: {execution_result.get('message', 'No specific details.')}") # result_md_parts.append(f"- Suggestion: The tool's server might be down or there was a network issue. Please try again later.") # elif status == "error_mcp_response_parsing": # result_md_parts.append(f"❌🌐 **Live MCP Tool Call Failed ('{tool_name}'): Invalid Response**") # result_md_parts.append(f"- Details: {execution_result.get('message', 'Could not understand the tool response.')}") # result_md_parts.append(f"- Suggestion: The tool might have returned an unexpected data format.") # ... ``` No major code changes might be needed if Task 4.2 already presented the `message` field well. This task is more about ensuring the content of those messages (coming from the executor) is user-friendly when displayed. Claude, please review the error formatting from Task 4.2 and suggest if any further specific phrasing for different error statuses (`error_live_mcp_http`, `error_live_mcp_network`, `error_mcp_response_parsing`) within `handle_execute_plan` would improve user understanding. ``` **Task 4.4: Final Sprint Checks & CI** * Status: Pending * **Parent MVP:** MVP 4 * **Parent Sprint (MVP 4):** Sprint 4 * **Description:** 1. Final review of all changes in `app.py` and `agents/executor.py`. 2. Run `just install`, `just lint`, `just format`, `just type-check`, `just test`. Fix any issues. 3. Commit changes using Conventional Commits (e.g., `fix(ui): improve error display for live mcp calls`, `style(ui): differentiate live/simulated results visually`). 4. Push to GitHub and verify CI pipeline passes. * **Acceptance Criteria:** All local checks pass, CI is green. Sprint 4 of MVP4 is complete. * **Guidance for Claude / Cursor:** Manual developer checklist. --- **End of MVP 4 - Sprint 4 Review:** * **What's Done:** * Thorough end-to-end testing of live MCP tool integration has been performed. * The Gradio UI now clearly differentiates between results from live tool calls and simulated executions. * The UI presents errors from live MCP calls more gracefully and informatively.
* **What's Next (MVP 4 - Sprint 5):** * Final documentation updates for the entire MVP 4 (main project README and Track 1 tool READMEs). * Recording demo videos for all submissions. * Final preparation of all Hugging Face Spaces for submission. * **Blockers/Issues:** (Note any flakiness with live Spaces, unexpected error messages from tools, or UI challenges in presenting all information clearly). This sprint solidifies the user-facing aspects of your live MCP tool integration, making the demo much more compelling and understandable. The final sprint of MVP4 will be about packaging everything for the hackathon judges. --- Okay, this MVP 5 is where you showcase a deep understanding of MCP's potential by conceptualizing how your Knowledge Graph can intelligently drive the `sampling/createMessage` primitive. This is ambitious for a hackathon, so the focus is on *demonstrating the concept* rather than a fully implemented sampling loop. Here's a 5-sprint plan for **Hackathon MVP 5: "KG-Informed Model Preferences for Sampling (Conceptual)"**: **Recap of Hackathon MVP 5 Goal:** * **Goal:** Demonstrate how the Knowledge Graph can inform the `modelPreferences` and other aspects of a conceptual MCP `sampling/createMessage` request. Optionally, simulate an LLM call based on these preferences to suggest model hints or prompt refinements. * **Core Primitives:** Tool, Prompt, Resource (as context), conceptual Sampling. * **Builds Upon:** MVP 4 (live/simulated tool execution, UI structure). --- **Sprint Structure for MVP 5 (Building on MVP 4 Foundation):** **Sprint 1 (MVP 5): Enhance KG Ontology & Data for Model Preferences** * **Goal:** Extend the `MCPTool` and/or `MCPPrompt` dataclasses and the `data/initial_*.json` files to include metadata related to model preferences (cost, speed, intelligence needs) and other sampling hints. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 1: Enhance KG Ontology & Data for Model Preferences **Task 1.1: Update `MCPTool` and/or `MCPPrompt` Ontology for Sampling Preferences** - Status: Pending - Description: In `kg_services/ontology.py`: - Add new fields to `MCPTool` and/or `MCPPrompt` dataclasses to store sampling-related preferences. Examples: - `preferred_model_hints: Optional[List[str]] = None` (e.g., ["claude-3-haiku", "gpt-4o-mini"]) - `cost_priority_score: Optional[float] = None` (0.0 to 1.0) - `speed_priority_score: Optional[float] = None` (0.0 to 1.0) - `intelligence_priority_score: Optional[float] = None` (0.0 to 1.0) - `default_sampling_temperature: Optional[float] = None` - `default_max_tokens_sampling: Optional[int] = None` - `default_system_prompt_hint: Optional[str] = None` - Decide if these preferences best fit at the Tool level, Prompt level, or both. *For simplicity, start by adding them to `MCPPrompt` as a prompt might have specific needs.* - Acceptance Criteria: Dataclass(es) updated with new optional fields and type hints. Tests for dataclass instantiation updated. - TDD: Update tests in `tests/kg_services/test_ontology.py` to verify instantiation with these new fields. - Guidance for Claude / Cursor: ```cursor feat(ontology): add sampling preference fields to MCPPrompt **Objective:** Extend the `MCPPrompt` dataclass to store metadata relevant for constructing MCP sampling requests. **Action 1: Modify `kg_services/ontology.py`** 1. Open `@kg_services/ontology.py`. 2. Modify the `MCPPrompt` dataclass: ```python # In kg_services/ontology.py (MCPPrompt dataclass) # ... (existing fields) ... 
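# `field` below comes from the dataclasses module, so make sure the file has
# `from dataclasses import dataclass, field` at the top.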
preferred_model_hints: Optional[List[str]] = field(default_factory=list) # e.g., ["claude-3-sonnet", "gpt-4o"] cost_priority_score: Optional[float] = None # Normalized 0.0 (low prio) to 1.0 (high prio) speed_priority_score: Optional[float] = None intelligence_priority_score: Optional[float] = None default_sampling_temperature: Optional[float] = None # e.g., 0.7 default_max_tokens_sampling: Optional[int] = None # e.g., 512 default_system_prompt_hint: Optional[str] = None # e.g., "You are an expert summarizer." # Context inclusion hint for this specific prompt when it initiates sampling sampling_context_inclusion_hint: Optional[str] = "thisServer" # "none", "thisServer", "allServers" ``` 3. Ensure `Optional` and `List` are imported from `typing`. 4. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/kg_services/test_ontology.py`** 1. Open `@tests/kg_services/test_ontology.py`. 2. Update `test_mcp_prompt_creation`: * Test instantiation with some of these new optional fields set. * Test that they default to `None` or empty list correctly if not provided. Please generate the modified `MCPPrompt` dataclass and its updated tests. ``` **Task 1.2: Update `data/initial_prompts.json` with Sampling Preference Data** - Status: Pending - Description: In the main project's `data/initial_prompts.json`, add values for these new sampling preference fields for at least 2-3 of your existing prompts. Choose diverse preferences (e.g., one prompt prioritizes speed, another intelligence). - Acceptance Criteria: `initial_prompts.json` is updated with sample sampling preference data for several prompts. - Guidance for Claude / Cursor: ```cursor chore(data): add sampling preference data to initial prompts **Objective:** Populate `initial_prompts.json` with examples of the new sampling preference fields. **Action: Modify `data/initial_prompts.json`** 1. Open `@data/initial_prompts.json` in the main KGraph-MCP project. 2. For 2-3 existing `MCPPrompt` objects, add some of the new fields: * Example for "summarizer_short_form_v1": ```json { // ... existing fields ... "target_tool_id": "summarizer_v1", "template_string": "Provide a very short summary (1-2 sentences) of the following text: {{text_input}}", "input_variables": ["text_input"], "preferred_model_hints": ["claude-3-haiku", "gpt-3.5-turbo-instruct"], "speed_priority_score": 0.9, "cost_priority_score": 0.8, "intelligence_priority_score": 0.3, "default_max_tokens_sampling": 100, "sampling_context_inclusion_hint": "none" // This prompt needs no extra context for its own sampling } ``` * Example for a more complex analysis prompt (if you have one, or adapt one): ```json { // ... (prompt for sentiment_analyzer_v1) ... "preferred_model_hints": ["claude-3-sonnet", "gpt-4o"], "speed_priority_score": 0.5, "cost_priority_score": 0.4, "intelligence_priority_score": 0.9, "default_sampling_temperature": 0.5, "default_max_tokens_sampling": 500, "default_system_prompt_hint": "You are a highly analytical assistant specialized in understanding nuanced text.", "sampling_context_inclusion_hint": "thisServer" // Might need context from its own server } ``` 3. Ensure the JSON remains valid. Please provide examples of how to update two existing prompt entries in `initial_prompts.json`. 
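Optionally, as a quick local sanity check (not required by this task), a small throwaway script along the following lines can confirm the edited file still parses and that the new keys match the `MCPPrompt` field names. This assumes `data/initial_prompts.json` is a flat JSON list of prompt objects and that the illustrative `//` comments shown above are removed in the real file (standard JSON does not allow comments):
```python
# check_prompts.py -- hypothetical helper, run manually from the project root
import json

from kg_services.ontology import MCPPrompt

with open("data/initial_prompts.json", encoding="utf-8") as f:
    prompt_entries = json.load(f)  # fails here if the JSON is malformed

for entry in prompt_entries:
    # TypeError here means a JSON key does not match an MCPPrompt field name
    prompt = MCPPrompt(**entry)
    print("OK:", entry.get("name", "<unnamed prompt>"))
```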
``` **Task 1.3: Update `InMemoryKG` Loading for New Prompt Fields** - Status: Pending - Description: Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.load_prompts_from_json` to correctly parse and store these new optional fields when creating `MCPPrompt` objects. - Acceptance Criteria: `InMemoryKG` correctly loads all fields of `MCPPrompt`, including new optional ones. Tests pass. - TDD: Update tests in `tests/kg_services/test_knowledge_graph.py` for `load_prompts_from_json` to verify the new fields are loaded. - Guidance for Claude / Cursor: ```cursor fix(kg): ensure InMemoryKG loads new sampling preference fields for prompts **Objective:** Update `InMemoryKG` to correctly load all fields of the enhanced `MCPPrompt` dataclass. **Action: Review and Confirm `kg_services/knowledge_graph.py`** 1. Open `@kg_services/knowledge_graph.py`. 2. Review the `InMemoryKG.load_prompts_from_json` method. 3. Because `MCPPrompt` is a dataclass and the new fields are optional with defaults, the line `prompt = MCPPrompt(**prompt_data)` should *already* handle loading these new fields if they exist in `prompt_data`, and use defaults if they don't. No explicit code change might be needed in this method itself for loading, assuming the JSON keys match the dataclass field names. **Action: Update tests in `tests/kg_services/test_knowledge_graph.py`** 1. Open `@tests/kg_services/test_knowledge_graph.py`. 2. In `test_load_prompts_from_json`: * When creating the temporary sample JSON for the test, include some of the new sampling preference fields for at least one prompt object. * When asserting the loaded `MCPPrompt` object, assert that these new fields have been correctly populated (or are `None`/default if not in the sample JSON). Claude, please confirm if `MCPPrompt(**prompt_data)` is sufficient for loading. If so, focus on generating the updated test case for `test_load_prompts_from_json` that verifies the new fields are loaded. ``` --- **Sprint 2 (MVP 5): Planner/Agent Logic to Construct Conceptual Sampling Request** * **Goal:** Enhance an agent (e.g., a new method in `SimplePlannerAgent` or a dedicated `SamplingAgentStub`) to construct the JSON structure of an MCP `sampling/createMessage` request, populating `modelPreferences` and other fields based on KG data from the selected `PlannedStep`. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 2: Planner/Agent Logic to Construct Conceptual Sampling Request **Task 2.1: Add Method to Agent for Constructing Sampling Request JSON** - Status: Pending - Description: In `agents/planner.py` (or a new `agents/sampling_suggester.py`): - Add a method like `construct_conceptual_sampling_request(self, plan: PlannedStep, task_context_text: str) -> Dict[str, Any]`. - This method takes the current `PlannedStep` (which contains the chosen Tool and Prompt) and some `task_context_text` (e.g., the original user query or a refined instruction). - It queries the `plan.prompt` (and/or `plan.tool`) for the sampling preference fields (e.g., `preferred_model_hints`, `cost_priority_score`, etc.). - It constructs a dictionary matching the MCP `sampling/createMessage` [params structure](https://modelcontextprotocol.io/docs/roots-sampling/sampling#message-format). - `messages`: Initially, just a simple user message with `task_context_text`. - `modelPreferences`: Populate `hints`, `costPriority`, `speedPriority`, `intelligencePriority` from the KG data. Handle `None` values gracefully (i.e., omit keys if no preference specified). 
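# NOTE: PlannedStep is assumed (as in the earlier MVPs) to expose `.tool` (MCPTool)
# and `.prompt` (MCPPrompt); all sampling preferences read below live on the prompt.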
- `systemPrompt`: Use `plan.prompt.default_system_prompt_hint`. - `includeContext`: Use `plan.prompt.sampling_context_inclusion_hint`. - `temperature`, `maxTokens`: Use defaults from `plan.prompt`. - Acceptance Criteria: Method correctly constructs the sampling request dictionary based on KG data. Tests pass. - TDD: In `tests/agents/test_planner.py` (or new test file), test `construct_conceptual_sampling_request` with various `PlannedStep` objects having different sampling preferences. Assert the output dictionary structure and values. - Guidance for Claude / Cursor: ```cursor feat(agent): implement construction of conceptual MCP sampling request **Objective:** Enable an agent to generate the JSON structure for an MCP `sampling/createMessage` request based on KG data. **Action 1: Modify `agents/planner.py` (or create new agent)** 1. Open `@agents/planner.py` (or if creating a new agent, e.g., `agents/sampling_coordinator.py`). Let's add to `SimplePlannerAgent` for now. 2. Import `Dict`, `Any`, `Optional`, `List` from `typing`. 3. Add the new method to `SimplePlannerAgent`: ```python # In agents/planner.py (SimplePlannerAgent class) # ... (existing methods) ... def construct_conceptual_sampling_request(self, plan: PlannedStep, task_context_text: str) -> Dict[str, Any]: """ Constructs a conceptual MCP sampling/createMessage request params dictionary based on the preferences stored in the plan's prompt. """ prompt_prefs = plan.prompt # Preferences are on the MCPPrompt object messages = [{"role": "user", "content": {"type": "text", "text": task_context_text}}] model_preferences: Dict[str, Any] = {} if prompt_prefs.preferred_model_hints: model_preferences["hints"] = [{"name": hint} for hint in prompt_prefs.preferred_model_hints] if prompt_prefs.cost_priority_score is not None: model_preferences["costPriority"] = prompt_prefs.cost_priority_score if prompt_prefs.speed_priority_score is not None: model_preferences["speedPriority"] = prompt_prefs.speed_priority_score if prompt_prefs.intelligence_priority_score is not None: model_preferences["intelligencePriority"] = prompt_prefs.intelligence_priority_score # Only include modelPreferences if it's not empty sampling_params: Dict[str, Any] = {"messages": messages} if model_preferences: # Only add if there's something in it sampling_params["modelPreferences"] = model_preferences if prompt_prefs.default_system_prompt_hint: sampling_params["systemPrompt"] = prompt_prefs.default_system_prompt_hint # Use the hint from prompt, default to 'thisServer' or 'none' if not set sampling_params["includeContext"] = prompt_prefs.sampling_context_inclusion_hint or "none" if prompt_prefs.default_sampling_temperature is not None: sampling_params["temperature"] = prompt_prefs.default_sampling_temperature # maxTokens is required by MCP spec for sampling/createMessage sampling_params["maxTokens"] = prompt_prefs.default_max_tokens_sampling or 256 # Default if not set # stopSequences and metadata are optional, omit for now unless specified # if prompt_prefs.stop_sequences: # sampling_params["stopSequences"] = prompt_prefs.stop_sequences return sampling_params # This is the "params" part of the MCP request ``` 4. Apply coding standards. **Action 2: Add tests to `tests/agents/test_planner.py`** 1. Open `@tests/agents/test_planner.py`. 2. Add a new test class or functions for `construct_conceptual_sampling_request`: * `test_construct_sampling_basic`: Provide a `PlannedStep` with a prompt having minimal sampling prefs. Assert basic structure (messages, maxTokens). 
* `test_construct_sampling_with_full_model_preferences`: Provide a prompt with all model preference fields set. Assert they appear correctly in the output dict. * `test_construct_sampling_handles_none_prefs`: Provide a prompt where optional prefs like `cost_priority_score` are `None`. Assert these keys are omitted from `modelPreferences`. * `test_construct_sampling_uses_default_max_tokens`: If `default_max_tokens_sampling` is None on prompt, assert the hardcoded default (e.g., 256) is used. Please generate the method implementation and its unit tests. ``` ``` --- **Sprint 3 (MVP 5): Gradio UI for Triggering and Displaying Conceptual Sampling Request** * **Goal:** Add a new button/section to the Gradio UI. When a plan is selected, the user can click this new button. This will trigger the agent to construct the conceptual `sampling/createMessage` JSON, which is then displayed in the UI. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 3: Gradio UI for Conceptual Sampling Request **Task 3.1: Add UI Elements for Sampling Request in `app.py`** - Status: Pending - Description: In `app.py`'s `gr.Blocks()` layout: - After the "Execute Plan (Simulated)" button and its output area, add: - A `gr.Markdown("--- \n ## 💡 Refine with AI Assistance (Conceptual Sampling)")`. - A new `gr.Button("🔬 Construct Conceptual Sampling Request")`, let's call it `construct_sampling_button`. - A new `gr.JSON(label="Conceptual MCP Sampling Request Parameters")` component for displaying the generated JSON, let's call it `sampling_request_json_output`. - (Optional Stretch for later in sprint) A `gr.Textbox(label="Refinement Task Context", placeholder="e.g., Refine this plan for brevity, or suggest alternative models.")` for user to provide `task_context_text`. For now, can use the original query. - Acceptance Criteria: New UI elements are added to the layout. - Guidance for Claude / Cursor: ```cursor feat(ui): add UI elements for conceptual sampling request **Objective:** Extend the Gradio UI in `app.py` with controls to trigger and display a conceptual sampling request. **Action: Modify `gr.Blocks()` layout in `app.py`** 1. Open `@app.py`. 2. In the `gr.Blocks()` definition, after the `execution_output_display` Markdown component, add: ```python # Inside gr.Blocks() # ... (after execution_output_display) with gr.Accordion("💡 Refine with AI Assistance (Conceptual MCP Sampling)", open=False): gr.Markdown( "This section demonstrates how the Knowledge Graph can inform an MCP `sampling/createMessage` request. " "It constructs the *parameters* for such a request based on the selected plan's prompt preferences." ) # For MVP5 Sprint 3, we'll use the original user query as context. # A dedicated textbox for sampling_task_context can be added later. # sampling_task_context_input = gr.Textbox(label="Refinement Task / Context for Sampling", # placeholder="e.g., 'Refine the prompt for conciseness.' or 'Suggest an alternative tool.'") construct_sampling_button = gr.Button("🔬 Construct Conceptual Sampling Request", elem_id="construct_sampling_button") sampling_request_json_output = gr.JSON(label="Conceptual MCP sampling/createMessage Params", elem_id="sampling_request_json_output") # Placeholder for optional LLM call result in Sprint 4 of MVP5 # sampling_llm_refinement_output = gr.Markdown(label="LLM Refinement Suggestion", elem_id="sampling_llm_refinement_output") ``` 3. Ensure the new components are correctly placed. 
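For orientation only (nothing to hard-code), with the example preferences added to "summarizer_short_form_v1" in Sprint 1, the dictionary rendered by `sampling_request_json_output` should look roughly like this once Task 3.2 wires up the handler:
```python
# Illustrative expected value -- actually produced at runtime by
# construct_conceptual_sampling_request() from the prompt's KG preferences.
expected_sampling_params = {
    "messages": [
        {"role": "user", "content": {"type": "text", "text": "<task context text>"}}
    ],
    "modelPreferences": {
        "hints": [{"name": "claude-3-haiku"}, {"name": "gpt-3.5-turbo-instruct"}],
        "costPriority": 0.8,
        "speedPriority": 0.9,
        "intelligencePriority": 0.3,
    },
    "includeContext": "none",
    "maxTokens": 100,
}
```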
``` **Task 3.2: Implement Gradio Handler for Constructing Sampling Request** - Status: Pending - Description: In `app.py`: - Create a new function `handle_construct_sampling_request(original_user_query: str)`. - This function will re-run `planner_agent_instance.generate_plan(original_user_query, top_k_plans=1)` to get the current `PlannedStep`. (State management for the "current plan" can be complex in Gradio without `gr.State` or more elaborate patterns; re-planning is simpler for a hackathon demo). - If a plan is found, call `planner_agent_instance.construct_conceptual_sampling_request(current_plan, original_user_query)` (using original query as `task_context_text` for now). - Return the generated dictionary to be displayed in `sampling_request_json_output`. - Handle cases where no plan is active. - Wire `construct_sampling_button.click` to this new handler. Inputs: `query_input`. Outputs: `sampling_request_json_output`. - Acceptance Criteria: Clicking the button generates and displays the conceptual sampling request JSON. Tests pass. - TDD: Unit test `handle_construct_sampling_request` in `tests/test_app_handlers.py` by mocking the planner's methods and asserting the returned JSON structure. - Guidance for Claude / Cursor: ```cursor feat(app): implement handler for constructing and displaying sampling request JSON **Objective:** Wire up the new UI button to generate and show the conceptual sampling request. **Action 1: Implement `handle_construct_sampling_request` in `app.py`** 1. Open `@app.py`. 2. Define the new handler function: ```python # In app.py # ... (imports, service init, other handlers) ... def handle_construct_sampling_request(original_user_query: str) -> Dict[str, Any]: if not planner_agent_instance: return {"error": "Backend services not available."} if not original_user_query.strip(): return {"info": "Original query is missing. Cannot determine current plan for sampling."} planned_steps = planner_agent_instance.generate_plan(original_user_query, top_k_plans=1) if not planned_steps: return {"info": "No active plan found to construct a sampling request from."} current_plan = planned_steps[0] # For now, use the original query as the primary message content for sampling. # A dedicated input for sampling task context can be added later. task_context_for_sampling = ( f"Original User Goal: '{original_user_query}'. " f"Current Planned Action: Use tool '{current_plan.tool.name}' with prompt '{current_plan.prompt.name}'. " f"Prompt Template: '{current_plan.prompt.template_string}'. " f"Consider refining this plan or suggesting alternative models based on stored preferences." ) conceptual_sampling_params = planner_agent_instance.construct_conceptual_sampling_request( current_plan, task_context_for_sampling ) return conceptual_sampling_params # This dictionary will be displayed by gr.JSON ``` **Action 2: Wire `construct_sampling_button` in `gr.Blocks()` in `app.py`** 1. Locate `construct_sampling_button`. 2. Add its `.click()` handler: ```python # Inside gr.Blocks() construct_sampling_button.click( fn=handle_construct_sampling_request, inputs=[query_input], # Use the main query input for context outputs=[sampling_request_json_output] ) ``` **Action 3: Add tests to `tests/test_app_handlers.py`** 1. Open `@tests/test_app_handlers.py`. 2. Add `test_handle_construct_sampling_request()`: * Mock `app.planner_agent_instance.generate_plan` to return a `PlannedStep`. * Mock `app.planner_agent_instance.construct_conceptual_sampling_request` to return a specific dict. 
* Call `handle_construct_sampling_request("test query")` and assert it returns the expected dict. * Test cases: no plan found by planner. Please generate the code for `handle_construct_sampling_request`, its wiring, and its unit tests. ``` ``` --- **Sprint 4 (MVP 5): (Optional Stretch) Simulate LLM Call for Sampling Refinement** * **Goal:** If time permits, make an actual LLM call using the conceptually constructed sampling request parameters (or a simplified version) to ask for a model hint or simple prompt refinement. Display this LLM suggestion. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 4: (Optional Stretch) Simulate LLM Call for Sampling Refinement **Task 4.1: Add LLM Call Logic to Agent for Sampling Refinement** - Status: Pending - Description: In `agents/planner.py` (or `SamplingCoordinator`): - Add a new method `get_sampling_refinement_suggestion(self, conceptual_sampling_params: Dict[str, Any], original_prompt_template: str) -> str`. - This method takes the `conceptual_sampling_params` and the `original_prompt_template`. - It constructs a new prompt *for an LLM (e.g., Claude Haiku, GPT-3.5-turbo via OpenAI/Azure)*. This meta-prompt asks the LLM: "Given these MCP sampling preferences: [relevant parts of `conceptual_sampling_params.modelPreferences`], and this original prompt template: '{{original_prompt_template}}', suggest one alternative `preferred_model_hint` (e.g., 'claude-3-opus' or 'gpt-4o-mini') OR suggest a one-sentence refinement to the prompt template to better align with an intelligence priority of [value]." (Pick one refinement task for simplicity). - It calls the chosen LLM API (using `EmbeddingService` if it's adapted for chat, or a new LLM client). - Returns the LLM's textual suggestion. - Acceptance Criteria: Method makes an LLM call and returns a textual suggestion. Tests (mocking LLM) pass. - Guidance for Claude / Cursor: ```cursor feat(agent): implement LLM call for sampling refinement suggestion **Objective:** Add a method to an agent that uses an LLM to suggest refinements based on conceptual sampling parameters. **Action 1: Modify `agents/planner.py` (SimplePlannerAgent)** 1. Open `@agents/planner.py`. 2. Ensure `EmbeddingService` can also make chat completion calls or add a new simple LLM service. For hackathon, re-using `EmbeddingService`'s client if it's OpenAI/Azure and adding a chat method is okay. ```python # In kg_services/embedder.py - IF REUSING FOR CHAT (simplified for hackathon) # class EmbeddingService: # ... # def get_chat_completion(self, system_prompt: str, user_prompt: str, model: str = "gpt-3.5-turbo") -> Optional[str]: # try: # # Assuming self.client is OpenAI or AzureOpenAI client # response = self.client.chat.completions.create( # model=model, # or your Azure deployment name for chat # messages=[ # {"role": "system", "content": system_prompt}, # {"role": "user", "content": user_prompt} # ] # ) # return response.choices[0].message.content # except Exception as e: # print(f"Error in get_chat_completion: {e}") # return None ``` 3. Add `get_sampling_refinement_suggestion` to `SimplePlannerAgent`: ```python # In agents/planner.py (SimplePlannerAgent class) # ... def get_sampling_refinement_suggestion( self, conceptual_sampling_params: Dict[str, Any], original_prompt_template: str, original_user_query: str ) -> str: if not self.embedder: # Assuming embedder can also do chat completions return "Error: LLM service for refinement not available." 
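# Build a short, human-readable summary of the KG-derived model preferences;
# it is interpolated into the meta-prompt sent to the LLM below.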
prefs_summary_parts = [] if "modelPreferences" in conceptual_sampling_params: mp = conceptual_sampling_params["modelPreferences"] if "hints" in mp: prefs_summary_parts.append(f"- Preferred Model Hints: {', '.join(h['name'] for h in mp['hints'])}") if "costPriority" in mp: prefs_summary_parts.append(f"- Cost Priority: {mp['costPriority']:.1f}") if "speedPriority" in mp: prefs_summary_parts.append(f"- Speed Priority: {mp['speedPriority']:.1f}") if "intelligencePriority" in mp: prefs_summary_parts.append(f"- Intelligence Priority: {mp['intelligencePriority']:.1f}") prefs_summary = "\n".join(prefs_summary_parts) if prefs_summary_parts else "No specific model preferences given." system_prompt_for_refinement = "You are an AI assistant helping to optimize LLM sampling requests." user_prompt_for_refinement = ( f"The user's original goal was: '{original_user_query}'.\n" f"An action plan involves using a prompt with this template:\n```\n{original_prompt_template}\n```\n" f"The conceptual MCP sampling preferences for this are:\n{prefs_summary}\n\n" f"Please suggest ONE of the following:\n" f"1. An alternative `preferred_model_hint` (e.g., 'claude-3-opus', 'gpt-4o-mini', 'mistral-large-latest') that might be suitable, OR\n" f"2. A one-sentence refinement to the prompt template text to better align with the preferences (especially if intelligence is prioritized).\n" f"Be concise." ) # Use one of your sponsor LLM API credits here (OpenAI, Anthropic, Mistral) # This uses the get_chat_completion method assumed to be in EmbeddingService # You would choose an appropriate model based on your credits, e.g. "gpt-3.5-turbo", "claude-3-haiku-20240307" # For Anthropic, the API call is different. For Mistral, also different. # This example assumes an OpenAI-compatible client in self.embedder # CHOOSE A MODEL YOU HAVE CREDITS FOR AND KNOW HOW TO CALL. # For example, if using OpenAI client in embedder: llm_suggestion = self.embedder.get_chat_completion( system_prompt_for_refinement, user_prompt_for_refinement, model="gpt-3.5-turbo" # Or your preferred model ) return llm_suggestion if llm_suggestion else "Could not get a refinement suggestion from the LLM." ``` 4. Apply coding standards. **Action 2: Add tests to `tests/agents/test_planner.py`** - Add `test_get_sampling_refinement_suggestion()`: - Mock `embedder.get_chat_completion` to return a sample textual suggestion. - Provide sample `conceptual_sampling_params` and `original_prompt_template`. - Assert the method returns the mocked suggestion. - Test the case where `get_chat_completion` returns `None`. Please generate the method, assuming `EmbeddingService` is updated with `get_chat_completion` for an OpenAI-compatible model, and its tests. Also add `get_chat_completion` to `EmbeddingService` in `@kg_services/embedder.py` and its test in `@tests/kg_services/test_embedder.py` (mocking the client call). ``` **Task 4.2: Update Gradio UI to Trigger and Display LLM Refinement Suggestion** - Status: Pending - Description: In `app.py`: - Add a new `gr.Markdown(label="LLM Refinement Suggestion")` component (`sampling_llm_refinement_output`) below `sampling_request_json_output`. - Modify `handle_construct_sampling_request`: - It should now also call `planner_agent_instance.get_sampling_refinement_suggestion()`. - It needs to return an additional value for `sampling_llm_refinement_output`. - Update `construct_sampling_button.click()` outputs. - Acceptance Criteria: UI displays the LLM's suggestion after constructing the conceptual sampling request. 
- Guidance for Claude / Cursor: ```cursor feat(ui): display LLM-generated sampling refinement suggestions **Objective:** Update `app.py` to call the new agent method for sampling refinement and display its output. **Action 1: Modify `gr.Blocks()` layout in `app.py`** 1. Open `@app.py`. 2. In the Accordion for "Refine with AI Assistance", below `sampling_request_json_output`, add: ```python # Inside the Accordion sampling_llm_refinement_output = gr.Markdown(label="LLM Refinement Suggestion", elem_id="sampling_llm_refinement_output") ``` **Action 2: Modify `handle_construct_sampling_request` in `app.py`** 1. Change its return signature to `-> Tuple[Dict[str, Any], str]`. 2. After generating `conceptual_sampling_params` and getting `current_plan`: ```python # Inside handle_construct_sampling_request # ... (after conceptual_sampling_params = ...) llm_suggestion_text = "Could not generate LLM refinement suggestion (Planner or LLM service error)." if planner_agent_instance and current_plan: # Ensure planner and plan exist try: llm_suggestion_text = planner_agent_instance.get_sampling_refinement_suggestion( conceptual_sampling_params, current_plan.prompt.template_string, # Pass original template original_user_query # Pass original query for context ) except Exception as e: print(f"Error getting LLM refinement suggestion: {e}") llm_suggestion_text = f"Error during refinement suggestion: {e}" return conceptual_sampling_params, llm_suggestion_text # Return tuple ``` **Action 3: Update `construct_sampling_button.click()` in `app.py`** 1. Update the `outputs` list: ```python # In construct_sampling_button.click() outputs=[sampling_request_json_output, sampling_llm_refinement_output] ``` **Action 4: Update `tests/test_app_handlers.py`** 1. Update tests for `handle_construct_sampling_request`: * Mock `planner_agent_instance.get_sampling_refinement_suggestion` to return a sample string. * Assert that the handler function now returns a tuple with the conceptual params dict and the suggestion string. Please generate the code modifications for `app.py` and its tests. ``` ``` --- **Sprint 5 (MVP 5): Final Testing, Documentation Update for MVP 5, & Submission Prep** * **Goal:** Test the conceptual sampling features, update all READMEs to reflect MVP 5's advanced conceptual demonstration, and prepare all artifacts. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 5: Final Testing, Documentation Update & Submission Prep **Task 5.1: Manual E2E Testing for Conceptual Sampling Features** - Status: Pending - Description: In the Gradio app: - Perform queries to get various `PlannedStep`s. - For each, click "Construct Conceptual Sampling Request". - Verify the generated JSON parameters reflect the KG preferences of the selected prompt. - (If Task 4.x was done) Verify the LLM refinement suggestion is displayed and makes sense. - Test with prompts that have minimal vs. rich sampling preferences in `data/initial_prompts.json`. - Acceptance Criteria: Conceptual sampling request generation and optional LLM refinement display work correctly. **Task 5.2: Update Project `README.md` and Hugging Face Space `README.md` for MVP 5** - Status: Pending - Description: Significantly update documentation: - Explain the new "Refine with AI Assistance (Conceptual Sampling)" section in the UI. - Detail how the KG (Tool/Prompt sampling preference fields) informs the construction of the `sampling/createMessage` parameters. - If the LLM refinement call was implemented, describe that feature. 
- Highlight this as an innovative use of MCP concepts and KG for advanced agent behavior. This is key for the "Most Innovative Use of MCP Award". - Acceptance Criteria: READMEs accurately and compellingly describe MVP 5's conceptual demonstration. - Guidance for Claude / Cursor: ```cursor docs(readme): update all documentation for MVP5 completion (conceptual sampling) **Objective:** Reflect the advanced conceptual sampling features in all project READMEs. **Action: Request Content for READMEs** Please draft updated text for the "How KGraph-MCP Works (Current MVP5 Functionality)" section for the main GitHub `README.md`. This section should clearly explain: 1. How `MCPPrompt` entities in the KG now store sampling preferences (model hints, priorities, etc.). 2. How the new UI section allows users to see a conceptual `sampling/createMessage` request. 3. How the `modelPreferences` and other parameters in this request are dynamically generated from the KG. 4. (If implemented) How an additional LLM call can provide refinement suggestions based on these preferences. Highlight that this demonstrates a pathway to more intelligent, self-optimizing agent interactions using MCP, even if the full sampling client loop isn't part of this hackathon MVP. This section is critical for the "Most Innovative Use of MCP" judging criterion. ``` **Task 5.3: Final Code Review, Cleanup, All Checks, Tag, and Video Prep** - Status: Pending - Description: - Final code review of all MVP5 changes. - `just install`, `lint`, `format`, `type-check`, `test`. - Commit (`chore(release): complete MVP5 - conceptual KG-informed sampling`). Push. Verify CI. - Update the main demo video to include a walkthrough of the conceptual sampling features. - Ensure all Hugging Face Space READMEs (main demo, Track 1 tools) are finalized with correct tags and links. - Acceptance Criteria: MVP5 is feature-complete, documented, demo video updated, and CI green. Project ready for final hackathon submission. ``` --- **End of MVP 5 - Sprint 5 & Overall MVP 5 Review:** * **What's Done:** * Hackathon MVP 5 ("KG-Informed Model Preferences for Sampling (Conceptual)") is complete. * The KG is enhanced with sampling preference metadata for Prompts. * The agent can construct a conceptual MCP `sampling/createMessage` request using this KG data. * The Gradio UI displays this conceptual request. * (Optionally) An LLM call provides refinement suggestions based on these preferences, showcased in the UI. * Documentation and demo video are updated to highlight this innovative aspect. * **Hackathon Submission Readiness:** The project now has a strong narrative for innovation, showing how a KG can drive not just tool selection but also the nuanced parameters of advanced MCP interactions like sampling. This completes the detailed sprint planning for all 5 MVPs. This structured, iterative approach, leveraging Claude effectively within Cursor, should set you up for a very successful hackathon! Remember to be flexible with these sprint plans during the actual event and adjust scope as needed. Good luck!
Here's a 5-sprint plan for **Hackathon MVP 5: "KG-Informed Model Preferences for Sampling (Conceptual)"**: **Recap of Hackathon MVP 5 Goal:** * **Goal:** Demonstrate how the Knowledge Graph can inform the `modelPreferences` and other aspects of a conceptual MCP `sampling/createMessage` request. Optionally, simulate an LLM call based on these preferences to suggest model hints or prompt refinements. * **Core Primitives:** Tool, Prompt, Resource (as context), conceptual Sampling. * **Builds Upon:** MVP 4 (live/simulated tool execution, UI structure). --- **Sprint Structure for MVP 5 (Building on MVP 4 Foundation):** **Sprint 1 (MVP 5): Enhance KG Ontology & Data for Model Preferences** * **Goal:** Extend the `MCPTool` and/or `MCPPrompt` dataclasses and the `data/initial_*.json` files to include metadata related to model preferences (cost, speed, intelligence needs) and other sampling hints. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 1: Enhance KG Ontology & Data for Model Preferences **Task 1.1: Update `MCPTool` and/or `MCPPrompt` Ontology for Sampling Preferences** - Status: Pending - Description: In `kg_services/ontology.py`: - Add new fields to `MCPTool` and/or `MCPPrompt` dataclasses to store sampling-related preferences. Examples: - `preferred_model_hints: Optional[List[str]] = None` (e.g., ["claude-3-haiku", "gpt-4o-mini"]) - `cost_priority_score: Optional[float] = None` (0.0 to 1.0) - `speed_priority_score: Optional[float] = None` (0.0 to 1.0) - `intelligence_priority_score: Optional[float] = None` (0.0 to 1.0) - `default_sampling_temperature: Optional[float] = None` - `default_max_tokens_sampling: Optional[int] = None` - `default_system_prompt_hint: Optional[str] = None` - Decide if these preferences best fit at the Tool level, Prompt level, or both. *For simplicity, start by adding them to `MCPPrompt` as a prompt might have specific needs.* - Acceptance Criteria: Dataclass(es) updated with new optional fields and type hints. Tests for dataclass instantiation updated. - TDD: Update tests in `tests/kg_services/test_ontology.py` to verify instantiation with these new fields. - Guidance for Claude / Cursor: ```cursor feat(ontology): add sampling preference fields to MCPPrompt **Objective:** Extend the `MCPPrompt` dataclass to store metadata relevant for constructing MCP sampling requests. **Action 1: Modify `kg_services/ontology.py`** 1. Open `@kg_services/ontology.py`. 2. Modify the `MCPPrompt` dataclass: ```python # In kg_services/ontology.py (MCPPrompt dataclass) # ... (existing fields) ... preferred_model_hints: Optional[List[str]] = field(default_factory=list) # e.g., ["claude-3-sonnet", "gpt-4o"] cost_priority_score: Optional[float] = None # Normalized 0.0 (low prio) to 1.0 (high prio) speed_priority_score: Optional[float] = None intelligence_priority_score: Optional[float] = None default_sampling_temperature: Optional[float] = None # e.g., 0.7 default_max_tokens_sampling: Optional[int] = None # e.g., 512 default_system_prompt_hint: Optional[str] = None # e.g., "You are an expert summarizer." # Context inclusion hint for this specific prompt when it initiates sampling sampling_context_inclusion_hint: Optional[str] = "thisServer" # "none", "thisServer", "allServers" ``` 3. Ensure `Optional` and `List` are imported from `typing`. 4. Apply coding standards from `@.cursor/rules/python_gradio_basic.mdc`. **Action 2: Update `tests/kg_services/test_ontology.py`** 1. Open `@tests/kg_services/test_ontology.py`. 2. 
Update `test_mcp_prompt_creation`: * Test instantiation with some of these new optional fields set. * Test that they default to `None` or empty list correctly if not provided. Please generate the modified `MCPPrompt` dataclass and its updated tests. ``` **Task 1.2: Update `data/initial_prompts.json` with Sampling Preference Data** - Status: Pending - Description: In the main project's `data/initial_prompts.json`, add values for these new sampling preference fields for at least 2-3 of your existing prompts. Choose diverse preferences (e.g., one prompt prioritizes speed, another intelligence). - Acceptance Criteria: `initial_prompts.json` is updated with sample sampling preference data for several prompts. - Guidance for Claude / Cursor: ```cursor chore(data): add sampling preference data to initial prompts **Objective:** Populate `initial_prompts.json` with examples of the new sampling preference fields. **Action: Modify `data/initial_prompts.json`** 1. Open `@data/initial_prompts.json` in the main KGraph-MCP project. 2. For 2-3 existing `MCPPrompt` objects, add some of the new fields: * Example for "summarizer_short_form_v1": ```json { // ... existing fields ... "target_tool_id": "summarizer_v1", "template_string": "Provide a very short summary (1-2 sentences) of the following text: {{text_input}}", "input_variables": ["text_input"], "preferred_model_hints": ["claude-3-haiku", "gpt-3.5-turbo-instruct"], "speed_priority_score": 0.9, "cost_priority_score": 0.8, "intelligence_priority_score": 0.3, "default_max_tokens_sampling": 100, "sampling_context_inclusion_hint": "none" // This prompt needs no extra context for its own sampling } ``` * Example for a more complex analysis prompt (if you have one, or adapt one): ```json { // ... (prompt for sentiment_analyzer_v1) ... "preferred_model_hints": ["claude-3-sonnet", "gpt-4o"], "speed_priority_score": 0.5, "cost_priority_score": 0.4, "intelligence_priority_score": 0.9, "default_sampling_temperature": 0.5, "default_max_tokens_sampling": 500, "default_system_prompt_hint": "You are a highly analytical assistant specialized in understanding nuanced text.", "sampling_context_inclusion_hint": "thisServer" // Might need context from its own server } ``` 3. Ensure the JSON remains valid. Please provide examples of how to update two existing prompt entries in `initial_prompts.json`. ``` **Task 1.3: Update `InMemoryKG` Loading for New Prompt Fields** - Status: Pending - Description: Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.load_prompts_from_json` to correctly parse and store these new optional fields when creating `MCPPrompt` objects. - Acceptance Criteria: `InMemoryKG` correctly loads all fields of `MCPPrompt`, including new optional ones. Tests pass. - TDD: Update tests in `tests/kg_services/test_knowledge_graph.py` for `load_prompts_from_json` to verify the new fields are loaded. - Guidance for Claude / Cursor: ```cursor fix(kg): ensure InMemoryKG loads new sampling preference fields for prompts **Objective:** Update `InMemoryKG` to correctly load all fields of the enhanced `MCPPrompt` dataclass. **Action: Review and Confirm `kg_services/knowledge_graph.py`** 1. Open `@kg_services/knowledge_graph.py`. 2. Review the `InMemoryKG.load_prompts_from_json` method. 3. Because `MCPPrompt` is a dataclass and the new fields are optional with defaults, the line `prompt = MCPPrompt(**prompt_data)` should *already* handle loading these new fields if they exist in `prompt_data`, and use defaults if they don't. 
No explicit code change might be needed in this method itself for loading, assuming the JSON keys match the dataclass field names. **Action: Update tests in `tests/kg_services/test_knowledge_graph.py`** 1. Open `@tests/kg_services/test_knowledge_graph.py`. 2. In `test_load_prompts_from_json`: * When creating the temporary sample JSON for the test, include some of the new sampling preference fields for at least one prompt object. * When asserting the loaded `MCPPrompt` object, assert that these new fields have been correctly populated (or are `None`/default if not in the sample JSON). Claude, please confirm if `MCPPrompt(**prompt_data)` is sufficient for loading. If so, focus on generating the updated test case for `test_load_prompts_from_json` that verifies the new fields are loaded. ``` --- **Sprint 2 (MVP 5): Planner/Agent Logic to Construct Conceptual Sampling Request** * **Goal:** Enhance an agent (e.g., a new method in `SimplePlannerAgent` or a dedicated `SamplingAgentStub`) to construct the JSON structure of an MCP `sampling/createMessage` request, populating `modelPreferences` and other fields based on KG data from the selected `PlannedStep`. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 2: Planner/Agent Logic to Construct Conceptual Sampling Request **Task 2.1: Add Method to Agent for Constructing Sampling Request JSON** - Status: Pending - Description: In `agents/planner.py` (or a new `agents/sampling_suggester.py`): - Add a method like `construct_conceptual_sampling_request(self, plan: PlannedStep, task_context_text: str) -> Dict[str, Any]`. - This method takes the current `PlannedStep` (which contains the chosen Tool and Prompt) and some `task_context_text` (e.g., the original user query or a refined instruction). - It queries the `plan.prompt` (and/or `plan.tool`) for the sampling preference fields (e.g., `preferred_model_hints`, `cost_priority_score`, etc.). - It constructs a dictionary matching the MCP `sampling/createMessage` [params structure](https://modelcontextprotocol.io/docs/roots-sampling/sampling#message-format). - `messages`: Initially, just a simple user message with `task_context_text`. - `modelPreferences`: Populate `hints`, `costPriority`, `speedPriority`, `intelligencePriority` from the KG data. Handle `None` values gracefully (i.e., omit keys if no preference specified). - `systemPrompt`: Use `plan.prompt.default_system_prompt_hint`. - `includeContext`: Use `plan.prompt.sampling_context_inclusion_hint`. - `temperature`, `maxTokens`: Use defaults from `plan.prompt`. - Acceptance Criteria: Method correctly constructs the sampling request dictionary based on KG data. Tests pass. - TDD: In `tests/agents/test_planner.py` (or new test file), test `construct_conceptual_sampling_request` with various `PlannedStep` objects having different sampling preferences. Assert the output dictionary structure and values. - Guidance for Claude / Cursor: ```cursor feat(agent): implement construction of conceptual MCP sampling request **Objective:** Enable an agent to generate the JSON structure for an MCP `sampling/createMessage` request based on KG data. **Action 1: Modify `agents/planner.py` (or create new agent)** 1. Open `@agents/planner.py` (or if creating a new agent, e.g., `agents/sampling_coordinator.py`). Let's add to `SimplePlannerAgent` for now. 2. Import `Dict`, `Any`, `Optional`, `List` from `typing`. 3. Add the new method to `SimplePlannerAgent`: ```python # In agents/planner.py (SimplePlannerAgent class) # ... (existing methods) ... 
def construct_conceptual_sampling_request(self, plan: PlannedStep, task_context_text: str) -> Dict[str, Any]: """ Constructs a conceptual MCP sampling/createMessage request params dictionary based on the preferences stored in the plan's prompt. """ prompt_prefs = plan.prompt # Preferences are on the MCPPrompt object messages = [{"role": "user", "content": {"type": "text", "text": task_context_text}}] model_preferences: Dict[str, Any] = {} if prompt_prefs.preferred_model_hints: model_preferences["hints"] = [{"name": hint} for hint in prompt_prefs.preferred_model_hints] if prompt_prefs.cost_priority_score is not None: model_preferences["costPriority"] = prompt_prefs.cost_priority_score if prompt_prefs.speed_priority_score is not None: model_preferences["speedPriority"] = prompt_prefs.speed_priority_score if prompt_prefs.intelligence_priority_score is not None: model_preferences["intelligencePriority"] = prompt_prefs.intelligence_priority_score # Only include modelPreferences if it's not empty sampling_params: Dict[str, Any] = {"messages": messages} if model_preferences: # Only add if there's something in it sampling_params["modelPreferences"] = model_preferences if prompt_prefs.default_system_prompt_hint: sampling_params["systemPrompt"] = prompt_prefs.default_system_prompt_hint # Use the hint from prompt, default to 'thisServer' or 'none' if not set sampling_params["includeContext"] = prompt_prefs.sampling_context_inclusion_hint or "none" if prompt_prefs.default_sampling_temperature is not None: sampling_params["temperature"] = prompt_prefs.default_sampling_temperature # maxTokens is required by MCP spec for sampling/createMessage sampling_params["maxTokens"] = prompt_prefs.default_max_tokens_sampling or 256 # Default if not set # stopSequences and metadata are optional, omit for now unless specified # if prompt_prefs.stop_sequences: # sampling_params["stopSequences"] = prompt_prefs.stop_sequences return sampling_params # This is the "params" part of the MCP request ``` 4. Apply coding standards. **Action 2: Add tests to `tests/agents/test_planner.py`** 1. Open `@tests/agents/test_planner.py`. 2. Add a new test class or functions for `construct_conceptual_sampling_request`: * `test_construct_sampling_basic`: Provide a `PlannedStep` with a prompt having minimal sampling prefs. Assert basic structure (messages, maxTokens). * `test_construct_sampling_with_full_model_preferences`: Provide a prompt with all model preference fields set. Assert they appear correctly in the output dict. * `test_construct_sampling_handles_none_prefs`: Provide a prompt where optional prefs like `cost_priority_score` are `None`. Assert these keys are omitted from `modelPreferences`. * `test_construct_sampling_uses_default_max_tokens`: If `default_max_tokens_sampling` is None on prompt, assert the hardcoded default (e.g., 256) is used. Please generate the method implementation and its unit tests. ``` ``` --- **Sprint 3 (MVP 5): Gradio UI for Triggering and Displaying Conceptual Sampling Request** * **Goal:** Add a new button/section to the Gradio UI. When a plan is selected, the user can click this new button. This will trigger the agent to construct the conceptual `sampling/createMessage` JSON, which is then displayed in the UI. 
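For orientation, here is a hedged sketch of the params dictionary that section would display, assuming the `summarizer_short_form_v1` preferences from Task 1.2 and the `construct_conceptual_sampling_request` method from Sprint 2 (the variable name below is purely illustrative, and `task_context_text` is whatever the handler passes in):

```python
# Illustrative only: expected return value of construct_conceptual_sampling_request()
# for the "summarizer_short_form_v1" preferences defined in Task 1.2.
expected_sampling_params = {
    "messages": [
        {"role": "user", "content": {"type": "text", "text": "<task_context_text>"}}
    ],
    "modelPreferences": {
        "hints": [{"name": "claude-3-haiku"}, {"name": "gpt-3.5-turbo-instruct"}],
        "costPriority": 0.8,
        "speedPriority": 0.9,
        "intelligencePriority": 0.3,
    },
    "includeContext": "none",  # from sampling_context_inclusion_hint
    "maxTokens": 100,          # from default_max_tokens_sampling
    # systemPrompt and temperature are omitted because this prompt defines no
    # default_system_prompt_hint or default_sampling_temperature.
}
```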
* **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 3: Gradio UI for Conceptual Sampling Request **Task 3.1: Add UI Elements for Sampling Request in `app.py`** - Status: Pending - Description: In `app.py`'s `gr.Blocks()` layout: - After the "Execute Plan (Simulated)" button and its output area, add: - A `gr.Markdown("--- \n ## 💡 Refine with AI Assistance (Conceptual Sampling)")`. - A new `gr.Button("🔬 Construct Conceptual Sampling Request")`, let's call it `construct_sampling_button`. - A new `gr.JSON(label="Conceptual MCP Sampling Request Parameters")` component for displaying the generated JSON, let's call it `sampling_request_json_output`. - (Optional Stretch for later in sprint) A `gr.Textbox(label="Refinement Task Context", placeholder="e.g., Refine this plan for brevity, or suggest alternative models.")` for user to provide `task_context_text`. For now, can use the original query. - Acceptance Criteria: New UI elements are added to the layout. - Guidance for Claude / Cursor: ```cursor feat(ui): add UI elements for conceptual sampling request **Objective:** Extend the Gradio UI in `app.py` with controls to trigger and display a conceptual sampling request. **Action: Modify `gr.Blocks()` layout in `app.py`** 1. Open `@app.py`. 2. In the `gr.Blocks()` definition, after the `execution_output_display` Markdown component, add: ```python # Inside gr.Blocks() # ... (after execution_output_display) with gr.Accordion("💡 Refine with AI Assistance (Conceptual MCP Sampling)", open=False): gr.Markdown( "This section demonstrates how the Knowledge Graph can inform an MCP `sampling/createMessage` request. " "It constructs the *parameters* for such a request based on the selected plan's prompt preferences." ) # For MVP5 Sprint 3, we'll use the original user query as context. # A dedicated textbox for sampling_task_context can be added later. # sampling_task_context_input = gr.Textbox(label="Refinement Task / Context for Sampling", # placeholder="e.g., 'Refine the prompt for conciseness.' or 'Suggest an alternative tool.'") construct_sampling_button = gr.Button("🔬 Construct Conceptual Sampling Request", elem_id="construct_sampling_button") sampling_request_json_output = gr.JSON(label="Conceptual MCP sampling/createMessage Params", elem_id="sampling_request_json_output") # Placeholder for optional LLM call result in Sprint 4 of MVP5 # sampling_llm_refinement_output = gr.Markdown(label="LLM Refinement Suggestion", elem_id="sampling_llm_refinement_output") ``` 3. Ensure the new components are correctly placed. ``` **Task 3.2: Implement Gradio Handler for Constructing Sampling Request** - Status: Pending - Description: In `app.py`: - Create a new function `handle_construct_sampling_request(original_user_query: str)`. - This function will re-run `planner_agent_instance.generate_plan(original_user_query, top_k_plans=1)` to get the current `PlannedStep`. (State management for the "current plan" can be complex in Gradio without `gr.State` or more elaborate patterns; re-planning is simpler for a hackathon demo). - If a plan is found, call `planner_agent_instance.construct_conceptual_sampling_request(current_plan, original_user_query)` (using original query as `task_context_text` for now). - Return the generated dictionary to be displayed in `sampling_request_json_output`. - Handle cases where no plan is active. - Wire `construct_sampling_button.click` to this new handler. Inputs: `query_input`. Outputs: `sampling_request_json_output`.
- Acceptance Criteria: Clicking the button generates and displays the conceptual sampling request JSON. Tests pass. - TDD: Unit test `handle_construct_sampling_request` in `tests/test_app_handlers.py` by mocking the planner's methods and asserting the returned JSON structure. - Guidance for Claude / Cursor: ```cursor feat(app): implement handler for constructing and displaying sampling request JSON **Objective:** Wire up the new UI button to generate and show the conceptual sampling request. **Action 1: Implement `handle_construct_sampling_request` in `app.py`** 1. Open `@app.py`. 2. Define the new handler function: ```python # In app.py # ... (imports, service init, other handlers) ... def handle_construct_sampling_request(original_user_query: str) -> Dict[str, Any]: if not planner_agent_instance: return {"error": "Backend services not available."} if not original_user_query.strip(): return {"info": "Original query is missing. Cannot determine current plan for sampling."} planned_steps = planner_agent_instance.generate_plan(original_user_query, top_k_plans=1) if not planned_steps: return {"info": "No active plan found to construct a sampling request from."} current_plan = planned_steps[0] # For now, use the original query as the primary message content for sampling. # A dedicated input for sampling task context can be added later. task_context_for_sampling = ( f"Original User Goal: '{original_user_query}'. " f"Current Planned Action: Use tool '{current_plan.tool.name}' with prompt '{current_plan.prompt.name}'. " f"Prompt Template: '{current_plan.prompt.template_string}'. " f"Consider refining this plan or suggesting alternative models based on stored preferences." ) conceptual_sampling_params = planner_agent_instance.construct_conceptual_sampling_request( current_plan, task_context_for_sampling ) return conceptual_sampling_params # This dictionary will be displayed by gr.JSON ``` **Action 2: Wire `construct_sampling_button` in `gr.Blocks()` in `app.py`** 1. Locate `construct_sampling_button`. 2. Add its `.click()` handler: ```python # Inside gr.Blocks() construct_sampling_button.click( fn=handle_construct_sampling_request, inputs=[query_input], # Use the main query input for context outputs=[sampling_request_json_output] ) ``` **Action 3: Add tests to `tests/test_app_handlers.py`** 1. Open `@tests/test_app_handlers.py`. 2. Add `test_handle_construct_sampling_request()`: * Mock `app.planner_agent_instance.generate_plan` to return a `PlannedStep`. * Mock `app.planner_agent_instance.construct_conceptual_sampling_request` to return a specific dict. * Call `handle_construct_sampling_request("test query")` and assert it returns the expected dict. * Test cases: no plan found by planner. Please generate the code for `handle_construct_sampling_request`, its wiring, and its unit tests. ``` ``` --- **Sprint 4 (MVP 5): (Optional Stretch) Simulate LLM Call for Sampling Refinement** * **Goal:** If time permits, make an actual LLM call using the conceptually constructed sampling request parameters (or a simplified version) to ask for a model hint or simple prompt refinement. Display this LLM suggestion. 
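The guidance below assumes an OpenAI-compatible chat client inside `EmbeddingService` (`self.client.chat.completions.create`). If your remaining credits are with Anthropic instead, the call shape differs; a minimal sketch of an equivalent helper is shown here. The function name `get_chat_completion_anthropic` is hypothetical, and you should swap in whichever Claude model you actually have credits for:

```python
# Hypothetical Anthropic-backed variant of get_chat_completion (sketch, not the
# project's actual implementation). Requires `pip install anthropic` and an
# ANTHROPIC_API_KEY in the environment.
from typing import Optional

import anthropic


def get_chat_completion_anthropic(
    system_prompt: str, user_prompt: str, model: str = "claude-3-haiku-20240307"
) -> Optional[str]:
    try:
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        response = client.messages.create(
            model=model,
            max_tokens=300,
            system=system_prompt,
            messages=[{"role": "user", "content": user_prompt}],
        )
        # The Messages API returns a list of content blocks; take the first text block.
        return response.content[0].text
    except Exception as e:
        print(f"Error in get_chat_completion_anthropic: {e}")
        return None
```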
* **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 4: (Optional Stretch) Simulate LLM Call for Sampling Refinement **Task 4.1: Add LLM Call Logic to Agent for Sampling Refinement** - Status: Pending - Description: In `agents/planner.py` (or `SamplingCoordinator`): - Add a new method `get_sampling_refinement_suggestion(self, conceptual_sampling_params: Dict[str, Any], original_prompt_template: str, original_user_query: str) -> str`. - This method takes the `conceptual_sampling_params`, the `original_prompt_template`, and the `original_user_query` (used as context in the meta-prompt). - It constructs a new meta-prompt *for an LLM* (e.g., Claude Haiku or GPT-3.5-turbo via OpenAI/Azure). This meta-prompt asks the LLM: "Given these MCP sampling preferences: [relevant parts of `conceptual_sampling_params.modelPreferences`], and this original prompt template: '{{original_prompt_template}}', suggest one alternative `preferred_model_hint` (e.g., 'claude-3-opus' or 'gpt-4o-mini') OR suggest a one-sentence refinement to the prompt template to better align with an intelligence priority of [value]." (Pick one refinement task for simplicity). - It calls the chosen LLM API (using `EmbeddingService` if it's adapted for chat, or a new LLM client). - Returns the LLM's textual suggestion. - Acceptance Criteria: Method makes an LLM call and returns a textual suggestion. Tests (mocking LLM) pass. - Guidance for Claude / Cursor: ```cursor feat(agent): implement LLM call for sampling refinement suggestion **Objective:** Add a method to an agent that uses an LLM to suggest refinements based on conceptual sampling parameters. **Action 1: Modify `agents/planner.py` (SimplePlannerAgent)** 1. Open `@agents/planner.py`. 2. Ensure `EmbeddingService` can also make chat completion calls, or add a new simple LLM service. For the hackathon, reusing `EmbeddingService`'s client (if it's OpenAI/Azure) and adding a chat method is okay. ```python # In kg_services/embedder.py - IF REUSING FOR CHAT (simplified for hackathon) # class EmbeddingService: # ... # def get_chat_completion(self, system_prompt: str, user_prompt: str, model: str = "gpt-3.5-turbo") -> Optional[str]: # try: # # Assuming self.client is OpenAI or AzureOpenAI client # response = self.client.chat.completions.create( # model=model, # or your Azure deployment name for chat # messages=[ # {"role": "system", "content": system_prompt}, # {"role": "user", "content": user_prompt} # ] # ) # return response.choices[0].message.content # except Exception as e: # print(f"Error in get_chat_completion: {e}") # return None ``` 3. Add `get_sampling_refinement_suggestion` to `SimplePlannerAgent`: ```python # In agents/planner.py (SimplePlannerAgent class) # ... def get_sampling_refinement_suggestion( self, conceptual_sampling_params: Dict[str, Any], original_prompt_template: str, original_user_query: str ) -> str: if not self.embedder: # Assuming embedder can also do chat completions return "Error: LLM service for refinement not available."
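# Summarize the stored model preferences into short bullet lines; this summary is embedded in the meta-prompt below.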
prefs_summary_parts = [] if "modelPreferences" in conceptual_sampling_params: mp = conceptual_sampling_params["modelPreferences"] if "hints" in mp: prefs_summary_parts.append(f"- Preferred Model Hints: {', '.join(h['name'] for h in mp['hints'])}") if "costPriority" in mp: prefs_summary_parts.append(f"- Cost Priority: {mp['costPriority']:.1f}") if "speedPriority" in mp: prefs_summary_parts.append(f"- Speed Priority: {mp['speedPriority']:.1f}") if "intelligencePriority" in mp: prefs_summary_parts.append(f"- Intelligence Priority: {mp['intelligencePriority']:.1f}") prefs_summary = "\n".join(prefs_summary_parts) if prefs_summary_parts else "No specific model preferences given." system_prompt_for_refinement = "You are an AI assistant helping to optimize LLM sampling requests." user_prompt_for_refinement = ( f"The user's original goal was: '{original_user_query}'.\n" f"An action plan involves using a prompt with this template:\n```\n{original_prompt_template}\n```\n" f"The conceptual MCP sampling preferences for this are:\n{prefs_summary}\n\n" f"Please suggest ONE of the following:\n" f"1. An alternative `preferred_model_hint` (e.g., 'claude-3-opus', 'gpt-4o-mini', 'mistral-large-latest') that might be suitable, OR\n" f"2. A one-sentence refinement to the prompt template text to better align with the preferences (especially if intelligence is prioritized).\n" f"Be concise." ) # Use one of your sponsor LLM API credits here (OpenAI, Anthropic, Mistral) # This uses the get_chat_completion method assumed to be in EmbeddingService # You would choose an appropriate model based on your credits, e.g. "gpt-3.5-turbo", "claude-3-haiku-20240307" # For Anthropic, the API call is different. For Mistral, also different. # This example assumes an OpenAI-compatible client in self.embedder # CHOOSE A MODEL YOU HAVE CREDITS FOR AND KNOW HOW TO CALL. # For example, if using OpenAI client in embedder: llm_suggestion = self.embedder.get_chat_completion( system_prompt_for_refinement, user_prompt_for_refinement, model="gpt-3.5-turbo" # Or your preferred model ) return llm_suggestion if llm_suggestion else "Could not get a refinement suggestion from the LLM." ``` 4. Apply coding standards. **Action 2: Add tests to `tests/agents/test_planner.py`** - Add `test_get_sampling_refinement_suggestion()`: - Mock `embedder.get_chat_completion` to return a sample textual suggestion. - Provide sample `conceptual_sampling_params` and `original_prompt_template`. - Assert the method returns the mocked suggestion. - Test the case where `get_chat_completion` returns `None`. Please generate the method, assuming `EmbeddingService` is updated with `get_chat_completion` for an OpenAI-compatible model, and its tests. Also add `get_chat_completion` to `EmbeddingService` in `@kg_services/embedder.py` and its test in `@tests/kg_services/test_embedder.py` (mocking the client call). ``` **Task 4.2: Update Gradio UI to Trigger and Display LLM Refinement Suggestion** - Status: Pending - Description: In `app.py`: - Add a new `gr.Markdown(label="LLM Refinement Suggestion")` component (`sampling_llm_refinement_output`) below `sampling_request_json_output`. - Modify `handle_construct_sampling_request`: - It should now also call `planner_agent_instance.get_sampling_refinement_suggestion()`. - It needs to return an additional value for `sampling_llm_refinement_output`. - Update `construct_sampling_button.click()` outputs. - Acceptance Criteria: UI displays the LLM's suggestion after constructing the conceptual sampling request. 
- Guidance for Claude / Cursor: ```cursor feat(ui): display LLM-generated sampling refinement suggestions **Objective:** Update `app.py` to call the new agent method for sampling refinement and display its output. **Action 1: Modify `gr.Blocks()` layout in `app.py`** 1. Open `@app.py`. 2. In the Accordion for "Refine with AI Assistance", below `sampling_request_json_output`, add: ```python # Inside the Accordion sampling_llm_refinement_output = gr.Markdown(label="LLM Refinement Suggestion", elem_id="sampling_llm_refinement_output") ``` **Action 2: Modify `handle_construct_sampling_request` in `app.py`** 1. Change its return signature to `-> Tuple[Dict[str, Any], str]` (import `Tuple` from `typing` if it isn't already). 2. After generating `conceptual_sampling_params` and getting `current_plan`: ```python # Inside handle_construct_sampling_request # ... (after conceptual_sampling_params = ...) llm_suggestion_text = "Could not generate LLM refinement suggestion (Planner or LLM service error)." if planner_agent_instance and current_plan: # Ensure planner and plan exist try: llm_suggestion_text = planner_agent_instance.get_sampling_refinement_suggestion( conceptual_sampling_params, current_plan.prompt.template_string, # Pass original template original_user_query # Pass original query for context ) except Exception as e: print(f"Error getting LLM refinement suggestion: {e}") llm_suggestion_text = f"Error during refinement suggestion: {e}" return conceptual_sampling_params, llm_suggestion_text # Return tuple ``` 3. Update the early-return paths (backend unavailable, empty query, no plan found) so they also return a two-element tuple (e.g., the info/error dict plus an empty string), keeping every code path consistent with the new two-output signature. **Action 3: Update `construct_sampling_button.click()` in `app.py`** 1. Update the `outputs` list: ```python # In construct_sampling_button.click() outputs=[sampling_request_json_output, sampling_llm_refinement_output] ``` **Action 4: Update `tests/test_app_handlers.py`** 1. Update tests for `handle_construct_sampling_request`: * Mock `planner_agent_instance.get_sampling_refinement_suggestion` to return a sample string. * Assert that the handler function now returns a tuple with the conceptual params dict and the suggestion string. * Verify the early-return cases also yield a two-element tuple. Please generate the code modifications for `app.py` and its tests. ``` ``` --- **Sprint 5 (MVP 5): Final Testing, Documentation Update for MVP 5, & Submission Prep** * **Goal:** Test the conceptual sampling features, update all READMEs to reflect MVP 5's advanced conceptual demonstration, and prepare all artifacts. * **Tasks for Claude / Cursor IDE:** ```markdown ### MVP 5 - Sprint 5: Final Testing, Documentation Update & Submission Prep **Task 5.1: Manual E2E Testing for Conceptual Sampling Features** - Status: Pending - Description: In the Gradio app: - Perform queries to get various `PlannedStep`s. - For each, click "Construct Conceptual Sampling Request". - Verify the generated JSON parameters reflect the KG preferences of the selected prompt. - (If Task 4.x was done) Verify the LLM refinement suggestion is displayed and makes sense. - Test with prompts that have minimal vs. rich sampling preferences in `data/initial_prompts.json`. - Acceptance Criteria: Conceptual sampling request generation and optional LLM refinement display work correctly. **Task 5.2: Update Project `README.md` and Hugging Face Space `README.md` for MVP 5** - Status: Pending - Description: Significantly update documentation: - Explain the new "Refine with AI Assistance (Conceptual Sampling)" section in the UI. - Detail how the KG (Tool/Prompt sampling preference fields) informs the construction of the `sampling/createMessage` parameters. - If the LLM refinement call was implemented, describe that feature.
- Highlight this as an innovative use of MCP concepts and KG for advanced agent behavior. This is key for the "Most Innovative Use of MCP Award". - Acceptance Criteria: READMEs accurately and compellingly describe MVP 5's conceptual demonstration. - Guidance for Claude / Cursor: ```cursor docs(readme): update all documentation for MVP5 completion (conceptual sampling) **Objective:** Reflect the advanced conceptual sampling features in all project READMEs. **Action: Request Content for READMEs** Please draft updated text for the "How KGraph-MCP Works (Current MVP5 Functionality)" section for the main GitHub `README.md`. This section should clearly explain: 1. How `MCPPrompt` entities in the KG now store sampling preferences (model hints, priorities, etc.). 2. How the new UI section allows users to see a conceptual `sampling/createMessage` request. 3. How the `modelPreferences` and other parameters in this request are dynamically generated from the KG. 4. (If implemented) How an additional LLM call can provide refinement suggestions based on these preferences. Highlight that this demonstrates a pathway to more intelligent, self-optimizing agent interactions using MCP, even if the full sampling client loop isn't part of this hackathon MVP. This section is critical for the "Most Innovative Use of MCP" judging criterion. ``` **Task 5.3: Final Code Review, Cleanup, All Checks, Tag, and Video Prep** - Status: Pending - Description: - Final code review of all MVP5 changes. - `just install`, `lint`, `format`, `type-check`, `test`. - Commit (`chore(release): complete MVP5 - conceptual KG-informed sampling`). Push. Verify CI. - Update the main demo video to include a walkthrough of the conceptual sampling features. - Ensure all Hugging Face Space READMEs (main demo, Track 1 tools) are finalized with correct tags and links. - Acceptance Criteria: MVP5 is feature-complete, documented, demo video updated, and CI green. Project ready for final hackathon submission. ``` --- **End of MVP 5 - Sprint 5 & Overall MVP 5 Review:** * **What's Done:** * Hackathon MVP 5 ("KG-Informed Model Preferences for Sampling (Conceptual)") is complete. * The KG is enhanced with sampling preference metadata for Prompts. * The agent can construct a conceptual MCP `sampling/createMessage` request using this KG data. * The Gradio UI displays this conceptual request. * (Optionally) An LLM call provides refinement suggestions based on these preferences, showcased in the UI. * Documentation and demo video are updated to highlight this innovative aspect. * **Hackathon Submission Readiness:** The project now has a strong narrative for innovation, showing how a KG can drive not just tool selection but also the nuanced parameters of advanced MCP interactions like sampling. This completes the detailed sprint planning for all 5 MVPs. This structured, iterative approach, leveraging Claude effectively within Cursor, should set you up for a very successful hackathon! Remember to be flexible with these sprint plans during the actual event and adjust scope as needed. Good luck! ---