# MVP 2 Sprint 2 - Comprehensive Plan

## Enhanced Planner for Tool+Prompt Pairs

**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and a corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: 🚀 **READY TO START**

## 🎯 Sprint 2 Objectives

### Goal Evolution: MVP1 → MVP2 Sprint 2

- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`

### Key Deliverables

1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for the new planner output
4. **Comprehensive Testing** - Full coverage of the new planning workflow

## 📋 Task Breakdown

### Task 2.1: Define PlannedStep Dataclass (60 mins)

**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`

**Objective**: Create a structured data representation for planner output

**Implementation**:

```python
@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use
```

**Testing Requirements**:

- Test `PlannedStep` creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional `relevance_score` functionality

### Task 2.2: Refactor SimplePlannerAgent (180 mins)

**Files**: `agents/planner.py`, `tests/agents/test_planner.py`

**Objective**: Implement combined tool+prompt selection logic

**Key Algorithm**:

1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank prompts against the user query
4. **PlannedStep Assembly**: Create structured output
**Implementation Strategy**:

```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # 4. Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```

**Testing Requirements**:

- Test the no-tools-found scenario
- Test tool found but no prompts scenario
- Test tool with single prompt selection
- Test tool with multiple prompts (semantic selection)
- Test `top_k_plans` limiting functionality

### Task 2.3: Update Application Integration (45 mins)

**Files**: `app.py`, `tests/test_app.py`

**Objective**: Update the backend to use the new planner method

**Changes Required**:

1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle the `PlannedStep` output format (temporary backward compatibility)
3. Ensure no UI crashes during the transition
**Implementation**:

```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}

    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}

    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```

### Task 2.4: Quality Assurance & Deployment (30 mins)

**Objective**: Ensure code quality and system stability

**Checklist**:

- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update `requirements.lock` if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success

## 🔧 Technical Architecture

### Data Flow Evolution

```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```

### New Components Introduced

1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward Compatible Interface** - Smooth transition support

### Integration Points

- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling

## 🧪 Testing Strategy

### Unit Test Coverage

- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
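The edge-case behavior listed above can be illustrated with a self-contained sketch. The planner below is a condensed, function-level stand-in for `SimplePlannerAgent` with hard-coded fake collaborators; real tests would mock the embedder and knowledge graph instead, and the class and field names of the fakes are assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tool:
    """Stand-in for MCPTool."""
    tool_id: str


@dataclass
class Prompt:
    """Stand-in for MCPPrompt."""
    prompt_id: str
    target_tool_id: str


class FakeKG:
    """Hypothetical in-memory knowledge graph double for tests."""

    def __init__(self, tools: Dict[str, Tool], prompts: Dict[str, Prompt]):
        self.tools, self.prompts = tools, prompts

    def find_similar_tools(self, query_embedding, top_k: int = 3) -> List[str]:
        return list(self.tools)[:top_k]  # stand-in for vector search

    def get_tool_by_id(self, tool_id: str) -> Tool:
        return self.tools[tool_id]


def generate_plan(kg: FakeKG, query_embedding) -> List[Tuple[Tool, Prompt]]:
    steps = []
    for tool_id in kg.find_similar_tools(query_embedding):
        tool = kg.get_tool_by_id(tool_id)
        prompts = [p for p in kg.prompts.values() if p.target_tool_id == tool.tool_id]
        if prompts:  # tools without prompts are skipped gracefully
            steps.append((tool, prompts[0]))
    return steps


# Scenario: no tools found → empty result, no exception
empty_kg = FakeKG(tools={}, prompts={})
assert generate_plan(empty_kg, [0.0]) == []

# Scenario: tool found but no prompts target it → gracefully skipped
orphan_kg = FakeKG(tools={"t1": Tool("t1")}, prompts={"p1": Prompt("p1", "other_tool")})
assert generate_plan(orphan_kg, [0.0]) == []
```

The same fakes extend naturally to the multiple-prompts and `top_k_plans` scenarios by adding more entries to the dictionaries.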
### Test Scenarios

1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures

### Manual Testing Checklist

- [ ] Application starts successfully with the new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No crashes in the UI during tool selection
- [ ] Logging shows enhanced planning information

## 📊 Success Metrics

| Metric | Target | Validation Method |
|--------|--------|-------------------|
| PlannedStep Creation | ✅ Complete | Unit tests pass |
| Tool+Prompt Selection | ✅ Semantic accuracy | Integration tests |
| Backward Compatibility | ✅ No breaking changes | Manual testing |
| Code Quality | ✅ All checks pass | CI pipeline |
| Test Coverage | ✅ >90% for new code | pytest coverage |

## 🔄 Sprint Dependencies

### Prerequisites (Completed in Sprint 1)

- ✅ MCPPrompt ontology established
- ✅ Knowledge graph extended for prompts
- ✅ Vector indexing for prompt search
- ✅ Initial prompt dataset created

### Deliverables for Sprint 3

- ✅ PlannedStep objects ready for UI display
- ✅ Enhanced planner generating structured output
- ✅ Backend integration supporting rich display
- ✅ Test coverage preventing regressions

## 🚨 Risk Mitigation

### Potential Challenges

1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, iterate
2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings, limit candidates
3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break into smaller, testable units
4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design, thorough testing
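The pre-computed-embeddings mitigation for the performance risk can be sketched as a simple warm-once cache: embed every prompt template at index-build time, then planning performs only one embedding call (the query) regardless of prompt count. The class and `embed_fn` callback below are illustrative assumptions, not the project's actual API:

```python
from typing import Callable, Dict, List


class PromptEmbeddingCache:
    """Hypothetical cache: embeds each prompt once up front, lookups are free."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed = embed_fn
        self._vectors: Dict[str, List[float]] = {}

    def warm(self, prompts: Dict[str, str]) -> None:
        """Embed every prompt template once, at index-build time."""
        for prompt_id, template in prompts.items():
            self._vectors[prompt_id] = self._embed(template)

    def get(self, prompt_id: str) -> List[float]:
        """Lookup only; never triggers an embedding call."""
        return self._vectors[prompt_id]


# Fake embedder that counts calls, standing in for the OpenAI embedding service
calls = []
def fake_embed(text: str) -> List[float]:
    calls.append(text)
    return [float(len(text))]

cache = PromptEmbeddingCache(fake_embed)
cache.warm({"p1": "Search for {query}", "p2": "Summarize {doc}"})
assert len(calls) == 2  # one embedding call per prompt, at warm time

cache.get("p1")
cache.get("p2")
assert len(calls) == 2  # repeated lookups do not re-embed
```

Combined with limiting candidate tools (the `top_k=3` in the strategy above), this keeps per-query latency dominated by a single embedding round-trip.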
## 🎯 Sprint 3 Preparation

### Ready for Next Sprint

After Sprint 2 completion, Sprint 3 can focus on:

- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows

---

*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*