# MVP 2 Sprint 2 - Task Summary & Execution Guide

**Date**: 2025-06-08
**Sprint Goal**: Enhanced Planner for Tool+Prompt Pairs
**Status**: 🚀 **READY FOR EXECUTION**
**Task Management**: Tasks added to `tasks.json` (IDs 26-29)

## 🎯 Sprint Overview

Transform the `SimplePlannerAgent` from suggesting only tools to suggesting **tool+prompt pairs** as structured `PlannedStep` objects, enabling the next evolution toward complete tool+prompt guidance.

### Goal Evolution

- **Current (MVP1)**: `User Query → Tool Discovery → Tool Suggestion`
- **Sprint 2 Target**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`

## 📋 Task Execution Order

### Task 26: Define PlannedStep Dataclass (60 mins)

**Status**: Todo
**Dependencies**: None
**Priority**: 🔴 **HIGH** (Foundation for all other tasks)

**Execution Command for Claude**:

```
Implement Task 26: Define PlannedStep Dataclass

**Objective**: Create structured data representation for planner output combining MCPTool and MCPPrompt.

**Action 1: Modify `kg_services/ontology.py`**
1. Open @kg_services/ontology.py
2. Add PlannedStep dataclass below existing MCPPrompt class
3. Include fields: tool (MCPTool), prompt (MCPPrompt), relevance_score (Optional[float] = None)
4. Add proper type hints and imports
5. Apply coding standards from @.cursor/rules/python_gradio_basic.mdc

**Action 2: Add Tests in `tests/kg_services/test_ontology.py`**
1. Open @tests/kg_services/test_ontology.py
2. Add test_planned_step_creation() function
3. Test PlannedStep instantiation with valid MCPTool and MCPPrompt
4. Test type safety and field access
5. Test optional relevance_score functionality

Generate the complete implementation.
```

### Task 27: Refactor SimplePlannerAgent (180 mins)

**Status**: Todo
**Dependencies**: Task 26
**Priority**: 🔴 **HIGH** (Core logic transformation)

**Execution Command for Claude**:

```
Implement Task 27: Refactor SimplePlannerAgent for Tool+Prompt Planning

**Objective**: Implement combined tool+prompt selection logic with semantic ranking.

**Action 1: Modify `agents/planner.py`**
1. Open @agents/planner.py
2. Import PlannedStep from kg_services.ontology
3. Rename suggest_tools method to generate_plan
4. Implement algorithm:
   - Tool Selection: Use existing semantic search for tools
   - Prompt Filtering: Get prompts by target_tool_id
   - Prompt Ranking: Semantic similarity against query
   - PlannedStep Assembly: Create structured output
5. Add _select_best_prompt helper method
6. Return List[PlannedStep] instead of List[MCPTool]

**Action 2: Update `tests/agents/test_planner.py`**
1. Update all test methods for new generate_plan signature
2. Mock InMemoryKG prompt methods
3. Test scenarios: no tools, no prompts for tool, single prompt, multiple prompts
4. Verify PlannedStep output structure

Generate the complete refactored implementation.
```

### Task 28: Update Application Integration (45 mins)

**Status**: Todo
**Dependencies**: Task 27
**Priority**: 🟡 **MEDIUM** (Integration layer)

**Execution Command for Claude**:

```
Implement Task 28: Update Application Integration for New Planner

**Objective**: Ensure application backend uses enhanced planner without breaking UI.

**Action 1: Modify `app.py`**
1. Open @app.py
2. Update handle_find_tools function:
   - Change planner call from suggest_tools to generate_plan
   - Handle List[PlannedStep] return type
   - Extract tool from PlannedStep for current UI (temporary)
   - Add proper error handling for empty results
3. Import PlannedStep if needed

**Action 2: Update `tests/test_app.py`**
1. Update mocked planner method calls
2. Test new generate_plan integration
3. Verify backward compatibility for UI display

Maintain backward compatibility until Sprint 3 UI updates.
```
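The Task 28 change is the one piece not illustrated elsewhere in this document, so here is a minimal sketch of how `handle_find_tools` might adapt to the new planner while keeping the tool-only display. The handler signature, the way the planner instance is supplied, and the `name`/`description` fields on `MCPTool` are assumptions about the existing `app.py`, not confirmed details.

```python
# Hypothetical sketch only: the handler signature, the injected planner
# instance, and the MCPTool field names (name, description) are assumptions
# about the existing app.py rather than confirmed details.
from agents.planner import SimplePlannerAgent
from kg_services.ontology import PlannedStep


def handle_find_tools(user_query: str, planner: SimplePlannerAgent) -> str:
    """Run the enhanced planner and format results for the current tool-only UI."""
    if not user_query or not user_query.strip():
        return "Please enter a query to find tools."

    planned_steps: list[PlannedStep] = planner.generate_plan(user_query, top_k_plans=3)
    if not planned_steps:
        return "No suitable tools were found for this query."

    # Temporary backward compatibility (until Sprint 3): show only the tool from
    # each PlannedStep and ignore the selected prompt when rendering.
    lines = [f"- {step.tool.name}: {step.tool.description}" for step in planned_steps]
    return "\n".join(lines)
```

Extracting only the tool keeps the existing UI contract untouched; the prompt carried by each `PlannedStep` becomes visible once the Sprint 3 UI work lands.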
### Task 29: Quality Assurance & Deployment (30 mins)

**Status**: Todo
**Dependencies**: Task 28
**Priority**: 🟢 **LOW** (Quality gates)

**Execution Command for Claude**:

```
Implement Task 29: Quality Assurance & Deployment

**Objective**: Ensure code quality, system stability, and deployment readiness.

**Actions**:
1. Run `just lint` and fix any style issues
2. Run `just format` to apply formatting
3. Run `just type-check` and resolve type issues
4. Run `just test` and ensure all tests pass
5. Manual integration testing:
   - Verify application starts successfully
   - Test tool+prompt planning workflow
   - Confirm no UI crashes
6. Update requirements.lock if needed
7. Commit changes with conventional commit format
8. Push and verify CI pipeline

Document any issues found for Sprint 3.
```

## 🔧 Technical Implementation Details

### PlannedStep Structure

```python
@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None
```

### Enhanced Planning Algorithm

```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools (semantic search)
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)

        # Filter prompts for this tool
        prompts = [p for p in self.kg.prompts.values() if p.target_tool_id == tool.tool_id]

        # Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```

### Semantic Prompt Selection

```python
def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    if not prompts:
        return None
    if len(prompts) == 1:
        return prompts[0]

    best_prompt = None
    best_similarity = -1.0

    for prompt in prompts:
        # Create embedding text from prompt
        prompt_text = f"{prompt.name} - {prompt.description} - {prompt.use_case}"
        prompt_embedding = self.embedder.get_embedding(prompt_text)

        if prompt_embedding:
            similarity = self.kg._cosine_similarity(query_embedding, prompt_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_prompt = prompt

    return best_prompt
```

## 🧪 Testing Strategy

### Key Test Scenarios

1. **PlannedStep Creation**: Valid instantiation and field access
2. **No Tools Found**: Empty list return from generate_plan
3. **Tool Without Prompts**: Graceful handling and skipping
4. **Single Prompt for Tool**: Direct selection
5. **Multiple Prompts for Tool**: Semantic ranking selection
6. **Application Integration**: Backward compatible UI interaction
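As a concrete illustration of scenario 5 (multiple prompts for one tool), a hedged pytest sketch follows. It mirrors the algorithm sketches above, but the `SimplePlannerAgent` constructor arguments and the attribute names on the mocked knowledge graph, tool, and prompts are assumptions to be adapted against the real classes.

```python
# Hypothetical test sketch: constructor arguments and mocked attribute names
# (tool_id, target_tool_id, use_case, prompts, _cosine_similarity) are assumed
# from the algorithm sketches above, not taken from the real codebase.
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent
from kg_services.ontology import PlannedStep


def test_generate_plan_ranks_multiple_prompts():
    # Fake embeddings: the query is closest to the second prompt's text.
    embeddings = {
        "find weather": [1.0, 0.0],
        "Weather Lookup - Get current weather - travel": [0.2, 0.9],
        "Weather Forecast - Get weather for a query - planning": [0.9, 0.1],
    }
    embedder = MagicMock()
    embedder.get_embedding.side_effect = lambda text: embeddings.get(text, [0.0, 0.0])

    tool = MagicMock(tool_id="weather_tool")
    prompt_a = MagicMock(target_tool_id="weather_tool", use_case="travel")
    prompt_a.name, prompt_a.description = "Weather Lookup", "Get current weather"
    prompt_b = MagicMock(target_tool_id="weather_tool", use_case="planning")
    prompt_b.name, prompt_b.description = "Weather Forecast", "Get weather for a query"

    kg = MagicMock()
    kg.find_similar_tools.return_value = ["weather_tool"]
    kg.get_tool_by_id.return_value = tool
    kg.prompts = {"p1": prompt_a, "p2": prompt_b}
    kg._cosine_similarity.side_effect = lambda a, b: sum(x * y for x, y in zip(a, b))

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)  # constructor args assumed
    plans = planner.generate_plan("find weather", top_k_plans=1)

    assert len(plans) == 1
    assert isinstance(plans[0], PlannedStep)
    assert plans[0].prompt is prompt_b  # the semantically closer prompt wins
```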
### Test Coverage Targets

- **Unit Tests**: >95% coverage for new PlannedStep and planning logic
- **Integration Tests**: End-to-end workflow validation
- **Regression Tests**: Ensure no breaking changes to existing functionality

## 📊 Success Criteria

| Component | Success Metric | Validation |
|-----------|----------------|------------|
| PlannedStep | Dataclass works correctly | Unit tests pass |
| Enhanced Planner | Tool+prompt selection accurate | Integration tests |
| Application | No UI crashes, backward compatible | Manual testing |
| Code Quality | All quality checks pass | CI pipeline |

## 🔄 Sprint 3 Preparation

Upon Sprint 2 completion, the system will be ready for Sprint 3, which focuses on:

- **UI Enhancement**: Display rich PlannedStep information
- **Prompt Template Rendering**: Show template strings with variables
- **Interactive Elements**: Dynamic input field generation
- **User Experience**: Enhanced tool+prompt workflow interface

## 🚨 Potential Challenges & Mitigations

1. **Semantic Prompt Selection Complexity**
   - *Challenge*: Multiple prompts with similar semantics
   - *Mitigation*: Start with simple cosine similarity, add tie-breaking rules
2. **Performance with Prompt Embeddings**
   - *Challenge*: Additional API calls for prompt ranking
   - *Mitigation*: Use pre-computed embeddings where possible
3. **Backward Compatibility**
   - *Challenge*: UI expects tool-only format
   - *Mitigation*: Extract tool from PlannedStep for display
4. **Test Complexity**
   - *Challenge*: Mocking complex tool+prompt interactions
   - *Mitigation*: Use focused unit tests with clear test data

---

**Ready for Execution**: All tasks are well-defined with clear objectives, detailed implementation guidance, and comprehensive acceptance criteria. The task dependency chain ensures proper execution order and minimal blocking.

*Sprint 2 Task Summary created for MVP 2 - Enhanced Planner for Tool+Prompt Pairs*