# MVP 2 Sprint 2 - Comprehensive Plan
## Enhanced Planner for Tool+Prompt Pairs
**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and a corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: READY TO START
## Sprint 2 Objectives
### Goal Evolution: MVP1 → MVP2 Sprint 2
- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`
### Key Deliverables
1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for new planner output
4. **Comprehensive Testing** - Full coverage of the new planning workflow
## Task Breakdown
### Task 2.1: Define PlannedStep Dataclass (60 mins)
**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`
**Objective**: Create a structured data representation for planner output
**Implementation**:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Reserved for future ranking use
```
**Testing Requirements**:
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test the optional relevance_score field
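The creation test could look like the following sketch. The `MCPTool`/`MCPPrompt` stand-ins here are minimal stubs for the Sprint 1 ontology classes, whose real field sets are richer:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the Sprint 1 ontology classes (real fields may differ).
@dataclass
class MCPTool:
    tool_id: str
    name: str

@dataclass
class MCPPrompt:
    prompt_id: str
    target_tool_id: str
    template: str

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use

def test_planned_step_pairs_tool_and_prompt():
    tool = MCPTool(tool_id="t1", name="search")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Find {query}")
    step = PlannedStep(tool=tool, prompt=prompt)
    # The pair must be internally consistent: the prompt targets this tool
    assert step.tool.tool_id == step.prompt.target_tool_id
    # The optional score defaults to None
    assert step.relevance_score is None

test_planned_step_pairs_tool_and_prompt()
```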
### Task 2.2: Refactor SimplePlannerAgent (180 mins)
**Files**: `agents/planner.py`, `tests/agents/test_planner.py`
**Objective**: Implement combined tool+prompt selection logic
**Key Algorithm**:
1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get the prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank those prompts against the user query
4. **PlannedStep Assembly**: Create the structured output
**Implementation Strategy**:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get the query embedding
    query_embedding = self.embedder.get_embedding(user_query)
    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)
    # 3. For each tool, find and rank its prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [p for p in self.kg.prompts.values()
                   if p.target_tool_id == tool.tool_id]
        # 4. Select the best prompt semantically; skip tools with no usable prompt
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))
    return planned_steps[:top_k_plans]
```
**Testing Requirements**:
- Test the no-tools-found scenario
- Test the tool-found-but-no-prompts scenario
- Test a tool with a single prompt
- Test a tool with multiple prompts (semantic selection)
- Test the top_k_plans limiting behaviour
### Task 2.3: Update Application Integration (45 mins)
**Files**: `app.py`, `tests/test_app.py`
**Objective**: Update the backend to use the new planner method
**Changes Required**:
1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle the `PlannedStep` output format (temporary backward compatibility)
3. Ensure the UI does not crash during the transition
**Implementation**:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}
    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}
    # Temporary: extract just the tool for display (UI update lands in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
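The handler's three paths (no planner, no plans, happy path) can be exercised with a stubbed agent. This sketch re-declares the handler and a stand-in `format_tool_for_display` so it runs standalone; the real formatter's output shape may differ:

```python
from unittest.mock import MagicMock

planner_agent = None  # normally initialised at app startup

def format_tool_for_display(tool) -> dict:
    # Stand-in for the existing MVP1 formatter (real output shape may differ).
    return {"tool": tool.name}

def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}
    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}
    return format_tool_for_display(planned_steps[0].tool)

# Path 1: planner missing
assert handle_find_tools("x") == {"error": "Planner not available"}

# Path 2: planner returns no plans
planner_agent = MagicMock()
planner_agent.generate_plan.return_value = []
assert "info" in handle_find_tools("draw a chart")

# Path 3: happy path returns the formatted tool
step = MagicMock()
step.tool.name = "plotter"
planner_agent.generate_plan.return_value = [step]
assert handle_find_tools("draw a chart") == {"tool": "plotter"}
```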
### Task 2.4: Quality Assurance & Deployment (30 mins)
**Objective**: Ensure code quality and system stability
**Checklist**:
- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update requirements.lock if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success
## Technical Architecture
### Data Flow Evolution
```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```
### New Components Introduced
1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward-Compatible Interface** - Smooth transition support
### Integration Points
- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling
## Testing Strategy
### Unit Test Coverage
- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
### Test Scenarios
1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures
### Manual Testing Checklist
- [ ] Application starts successfully with the new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No UI crashes during tool selection
- [ ] Logging shows the enhanced planning information
## Success Metrics
| Metric | Target | Validation Method |
|--------|--------|-------------------|
| PlannedStep Creation | Complete | Unit tests pass |
| Tool+Prompt Selection | Semantic accuracy | Integration tests |
| Backward Compatibility | No breaking changes | Manual testing |
| Code Quality | All checks pass | CI pipeline |
| Test Coverage | >90% for new code | pytest coverage |
## Sprint Dependencies
### Prerequisites (Completed in Sprint 1)
- MCPPrompt ontology established
- Knowledge graph extended for prompts
- Vector indexing for prompt search
- Initial prompt dataset created
### Deliverables for Sprint 3
- PlannedStep objects ready for UI display
- Enhanced planner generating structured output
- Backend integration supporting rich display
- Test coverage preventing regressions
## Risk Mitigation
### Potential Challenges
1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, then iterate
2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings and limit the candidate set
3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break the logic into smaller, testable units
4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design and thorough testing
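For the performance risk above, one cheap mitigation is a wrapper that caches embeddings per text, so each prompt and each repeated query is embedded at most once. A sketch, assuming the `embedder.get_embedding(text)` interface used by the planner (`CachingEmbedder` is a hypothetical name, not an existing class):

```python
class CachingEmbedder:
    """Wraps an embedding client so repeated texts are served from a cache."""

    def __init__(self, client):
        self._client = client
        self._cache: dict[str, list[float]] = {}

    def get_embedding(self, text: str) -> list[float]:
        if text not in self._cache:
            # Only cache misses hit the underlying API
            self._cache[text] = self._client.get_embedding(text)
        return self._cache[text]
```

Because it exposes the same `get_embedding` method, it can be dropped in wherever the planner currently takes an embedder.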
## Sprint 3 Preparation
### Ready for Next Sprint
After Sprint 2 completes, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows
---
*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*