# MVP 2 Sprint 2 - Task Summary & Execution Guide

**Date**: 2025-06-08
**Sprint Goal**: Enhanced Planner for Tool+Prompt Pairs
**Status**: **READY FOR EXECUTION**
**Task Management**: Tasks added to `tasks.json` (IDs 26-29)

## 🎯 Sprint Overview

Transform the `SimplePlannerAgent` so that it suggests **tool+prompt pairs**, returned as structured `PlannedStep` objects, instead of tools alone. This lays the groundwork for complete tool+prompt guidance.

### Goal Evolution

- **Current (MVP1)**: `User Query → Tool Discovery → Tool Suggestion`
- **Sprint 2 Target**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`

## Task Execution Order

### Task 26: Define PlannedStep Dataclass (60 mins)

**Status**: Todo
**Dependencies**: None
**Priority**: 🔴 **HIGH** (Foundation for all other tasks)

**Execution Command for Claude**:
```
Implement Task 26: Define PlannedStep Dataclass

**Objective**: Create structured data representation for planner output combining MCPTool and MCPPrompt.

**Action 1: Modify `kg_services/ontology.py`**
1. Open @kg_services/ontology.py
2. Add PlannedStep dataclass below existing MCPPrompt class
3. Include fields: tool (MCPTool), prompt (MCPPrompt), relevance_score (Optional[float] = None)
4. Add proper type hints and imports
5. Apply coding standards from @.cursor/rules/python_gradio_basic.mdc

**Action 2: Add Tests in `tests/kg_services/test_ontology.py`**
1. Open @tests/kg_services/test_ontology.py
2. Add test_planned_step_creation() function
3. Test PlannedStep instantiation with valid MCPTool and MCPPrompt
4. Test type safety and field access
5. Test optional relevance_score functionality

Generate the complete implementation.
```

### Task 27: Refactor SimplePlannerAgent (180 mins)

**Status**: Todo
**Dependencies**: Task 26
**Priority**: 🔴 **HIGH** (Core logic transformation)

**Execution Command for Claude**:
```
Implement Task 27: Refactor SimplePlannerAgent for Tool+Prompt Planning

**Objective**: Implement combined tool+prompt selection logic with semantic ranking.

**Action 1: Modify `agents/planner.py`**
1. Open @agents/planner.py
2. Import PlannedStep from kg_services.ontology
3. Rename suggest_tools method to generate_plan
4. Implement algorithm:
   - Tool Selection: Use existing semantic search for tools
   - Prompt Filtering: Get prompts by target_tool_id
   - Prompt Ranking: Semantic similarity against query
   - PlannedStep Assembly: Create structured output
5. Add _select_best_prompt helper method
6. Return List[PlannedStep] instead of List[MCPTool]

**Action 2: Update `tests/agents/test_planner.py`**
1. Update all test methods for new generate_plan signature
2. Mock InMemoryKG prompt methods
3. Test scenarios: no tools, no prompts for tool, single prompt, multiple prompts
4. Verify PlannedStep output structure

Generate the complete refactored implementation.
```

### Task 28: Update Application Integration (45 mins)

**Status**: Todo
**Dependencies**: Task 27
**Priority**: 🟡 **MEDIUM** (Integration layer)

**Execution Command for Claude**:
```
Implement Task 28: Update Application Integration for New Planner

**Objective**: Ensure application backend uses enhanced planner without breaking UI.

**Action 1: Modify `app.py`**
1. Open @app.py
2. Update handle_find_tools function:
   - Change planner call from suggest_tools to generate_plan
   - Handle List[PlannedStep] return type
   - Extract tool from PlannedStep for current UI (temporary)
   - Add proper error handling for empty results
3. Import PlannedStep if needed

**Action 2: Update `tests/test_app.py`**
1. Update mocked planner method calls
2. Test new generate_plan integration
3. Verify backward compatibility for UI display

Maintain backward compatibility until Sprint 3 UI updates.
```
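
The "extract tool from PlannedStep" step in Task 28 could look roughly like the sketch below. It is only an illustration: the `MCPTool` and `MCPPrompt` classes are simplified stand-ins for the real ontology classes, and `steps_to_tool_payload` and the dictionary shape it returns are hypothetical, since the format the current UI expects is not specified here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MCPTool:  # hypothetical stand-in for the real ontology class
    tool_id: str
    name: str
    description: str


@dataclass
class MCPPrompt:  # hypothetical stand-in for the real ontology class
    prompt_id: str
    name: str
    target_tool_id: str


@dataclass
class PlannedStep:
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None


def steps_to_tool_payload(steps: List[PlannedStep]) -> Dict[str, Any]:
    """Flatten planner output into a tool-only payload (assumed UI shape)."""
    if not steps:
        # An empty plan is a valid outcome, not an exception.
        return {"status": "no_results", "tools": []}
    tools = [
        {"tool_id": s.tool.tool_id, "name": s.tool.name, "description": s.tool.description}
        for s in steps
    ]
    return {"status": "ok", "tools": tools}


tool = MCPTool("summarizer_v1", "Summarizer", "Condenses long text")
prompt = MCPPrompt("p1", "Short summary", "summarizer_v1")
payload = steps_to_tool_payload([PlannedStep(tool=tool, prompt=prompt)])
empty_payload = steps_to_tool_payload([])
```

Keeping the flattening in one small function means Sprint 3 can swap it for a richer renderer without touching the planner call itself.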

### Task 29: Quality Assurance & Deployment (30 mins)

**Status**: Todo
**Dependencies**: Task 28
**Priority**: 🟢 **LOW** (Quality gates)

**Execution Command for Claude**:
```
Implement Task 29: Quality Assurance & Deployment

**Objective**: Ensure code quality, system stability, and deployment readiness.

**Actions**:
1. Run `just lint` and fix any style issues
2. Run `just format` to apply formatting
3. Run `just type-check` and resolve type issues
4. Run `just test` and ensure all tests pass
5. Manual integration testing:
   - Verify application starts successfully
   - Test tool+prompt planning workflow
   - Confirm no UI crashes
6. Update requirements.lock if needed
7. Commit changes with conventional commit format
8. Push and verify CI pipeline

Document any issues found for Sprint 3.
```

## 🔧 Technical Implementation Details

### PlannedStep Structure

```python
from dataclasses import dataclass
from typing import Optional


# MCPTool and MCPPrompt are defined earlier in kg_services/ontology.py
@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None
```
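
As a rough usage illustration, the sketch below builds a `PlannedStep` with and without a score. The `MCPTool` and `MCPPrompt` classes here are minimal hypothetical stand-ins (the real ones in `kg_services/ontology.py` likely carry more fields), and the sample IDs are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPTool:  # hypothetical stand-in for the real ontology class
    tool_id: str
    name: str


@dataclass
class MCPPrompt:  # hypothetical stand-in for the real ontology class
    prompt_id: str
    name: str
    target_tool_id: str


@dataclass
class PlannedStep:
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None


tool = MCPTool(tool_id="summarizer_v1", name="Summarizer")
prompt = MCPPrompt(prompt_id="p1", name="Short summary", target_tool_id="summarizer_v1")

# relevance_score is optional and defaults to None when the planner sets no score.
step = PlannedStep(tool=tool, prompt=prompt)
scored_step = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
```

Note that a plain dataclass does not check that `prompt.target_tool_id` matches `tool.tool_id`; that pairing is the planner's responsibility.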

### Enhanced Planning Algorithm

```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding; without one there is nothing to rank against
    query_embedding = self.embedder.get_embedding(user_query)
    if not query_embedding:
        return []

    # 2. Find candidate tools (semantic search)
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        if tool is None:
            continue  # Skip IDs that do not resolve to a tool

        # Filter prompts for this tool
        prompts = [p for p in self.kg.prompts.values()
                   if p.target_tool_id == tool.tool_id]

        # Select best prompt semantically; tools without prompts are skipped
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```

### Semantic Prompt Selection

```python
def _select_best_prompt(self, prompts: List[MCPPrompt],
                        query_embedding: List[float]) -> Optional[MCPPrompt]:
    if not prompts:
        return None
    if len(prompts) == 1:
        return prompts[0]

    best_prompt = None
    # Start below any possible cosine similarity so a score of -1.0 can still win
    best_similarity = float("-inf")
    for prompt in prompts:
        # Create embedding text from prompt
        prompt_text = f"{prompt.name} - {prompt.description} - {prompt.use_case}"
        prompt_embedding = self.embedder.get_embedding(prompt_text)
        if prompt_embedding:
            similarity = self.kg._cosine_similarity(query_embedding, prompt_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_prompt = prompt

    # None if no prompt could be embedded
    return best_prompt
```
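
The ranking above relies on cosine similarity between the query and prompt embeddings. The body of `self.kg._cosine_similarity` is not shown in this guide, so the following is only a minimal pure-Python sketch of the standard formula; the function name and the choice to return `0.0` for empty, mismatched, or zero-length vectors are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if not a or len(a) != len(b):
        return 0.0  # assumed fallback for empty or mismatched input
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Scores fall between -1 and 1, so a fallback of `0.0` still ranks a degenerate embedding above a genuinely opposite one; whether that is desirable depends on how the real helper treats bad input.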

## 🧪 Testing Strategy

### Key Test Scenarios

1. **PlannedStep Creation**: Valid instantiation and field access
2. **No Tools Found**: Empty list return from generate_plan
3. **Tool Without Prompts**: Graceful handling and skipping
4. **Single Prompt for Tool**: Direct selection
5. **Multiple Prompts for Tool**: Semantic ranking selection
6. **Application Integration**: Backward compatible UI interaction
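
Scenarios 2 through 5 can be exercised without any embedding API by using small fakes. The sketch below is one possible shape, not the project's actual test suite: `FakeEmbedder`, `FakeKG`, the stand-in dataclasses, and the free-function `generate_plan` (a condensed version of the planning loop) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MCPTool:  # hypothetical stand-in
    tool_id: str
    name: str


@dataclass
class MCPPrompt:  # hypothetical stand-in
    prompt_id: str
    name: str
    description: str
    use_case: str
    target_tool_id: str


@dataclass
class PlannedStep:
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None


class FakeEmbedder:
    """Keyword-based embedder so the tests make no API calls."""

    def get_embedding(self, text: str) -> List[float]:
        return [1.0, 0.0] if "summar" in text.lower() else [0.0, 1.0]


class FakeKG:
    """In-memory stand-in exposing only what the planner sketch touches."""

    def __init__(self, tools: List[MCPTool], prompts: List[MCPPrompt]) -> None:
        self.tools: Dict[str, MCPTool] = {t.tool_id: t for t in tools}
        self.prompts: Dict[str, MCPPrompt] = {p.prompt_id: p for p in prompts}

    def find_similar_tools(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
        return list(self.tools)[:top_k]  # no real ranking in the fake

    def get_tool_by_id(self, tool_id: str) -> Optional[MCPTool]:
        return self.tools.get(tool_id)

    def _cosine_similarity(self, a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))  # fake vectors are unit length


def generate_plan(kg: FakeKG, embedder: FakeEmbedder, query: str,
                  top_k_plans: int = 1) -> List[PlannedStep]:
    """Condensed version of the planning loop, written as a free function."""
    query_embedding = embedder.get_embedding(query)
    steps: List[PlannedStep] = []
    for tool_id in kg.find_similar_tools(query_embedding):
        tool = kg.get_tool_by_id(tool_id)
        prompts = [p for p in kg.prompts.values() if p.target_tool_id == tool_id]
        if tool is None or not prompts:
            continue  # tools without prompts are skipped
        best = max(prompts, key=lambda p: kg._cosine_similarity(
            query_embedding,
            embedder.get_embedding(f"{p.name} - {p.description} - {p.use_case}")))
        steps.append(PlannedStep(tool=tool, prompt=best))
    return steps[:top_k_plans]


TOOL = MCPTool("text_tool", "Text Tool")
SUMMARY = MCPPrompt("p_sum", "Summarize text", "Condense a document", "summarization", "text_tool")
TRANSLATE = MCPPrompt("p_tr", "Translate", "Translate to French", "localization", "text_tool")


def test_no_tools_returns_empty_plan() -> None:
    assert generate_plan(FakeKG([], []), FakeEmbedder(), "summarize this") == []


def test_tool_without_prompts_is_skipped() -> None:
    assert generate_plan(FakeKG([TOOL], []), FakeEmbedder(), "summarize this") == []


def test_single_prompt_is_selected() -> None:
    plan = generate_plan(FakeKG([TOOL], [TRANSLATE]), FakeEmbedder(), "summarize this")
    assert [s.prompt.prompt_id for s in plan] == ["p_tr"]


def test_multiple_prompts_picks_closest() -> None:
    plan = generate_plan(FakeKG([TOOL], [TRANSLATE, SUMMARY]), FakeEmbedder(), "summarize this")
    assert plan[0].prompt.prompt_id == "p_sum"
```

Deterministic fakes keep the assertions about selection logic separate from embedding quality, which is hard to pin down in a unit test.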

### Test Coverage Targets

- **Unit Tests**: >95% coverage for new PlannedStep and planning logic
- **Integration Tests**: End-to-end workflow validation
- **Regression Tests**: Ensure no breaking changes to existing functionality

## Success Criteria

| Component | Success Metric | Validation |
|-----------|---------------|------------|
| PlannedStep | Dataclass works correctly | Unit tests pass |
| Enhanced Planner | Tool+prompt selection accurate | Integration tests |
| Application | No UI crashes, backward compatible | Manual testing |
| Code Quality | All quality checks pass | CI pipeline |

## Sprint 3 Preparation

Once Sprint 2 is complete, the system will be ready for Sprint 3, which focuses on:

- **UI Enhancement**: Display rich PlannedStep information
- **Prompt Template Rendering**: Show template strings with variables
- **Interactive Elements**: Dynamic input field generation
- **User Experience**: Enhanced tool+prompt workflow interface

## 🚨 Potential Challenges & Mitigations

1. **Semantic Prompt Selection Complexity**
   - *Challenge*: Multiple prompts with similar semantics
   - *Mitigation*: Start with simple cosine similarity, add tie-breaking rules
2. **Performance with Prompt Embeddings**
   - *Challenge*: Additional API calls for prompt ranking
   - *Mitigation*: Use pre-computed embeddings where possible
3. **Backward Compatibility**
   - *Challenge*: UI expects tool-only format
   - *Mitigation*: Extract tool from PlannedStep for display
4. **Test Complexity**
   - *Challenge*: Mocking complex tool+prompt interactions
   - *Mitigation*: Use focused unit tests with clear test data
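
The first two mitigations (tie-breaking rules and pre-computed embeddings) might be sketched as follows. Both `PromptEmbeddingCache` and `pick_best` are hypothetical helpers that do not exist in the codebase as described here, and the "smallest ID wins" rule is just one possible deterministic tie-breaker.

```python
from typing import Callable, Dict, List, Tuple


class PromptEmbeddingCache:
    """Memoizes prompt embeddings by prompt ID to avoid repeated API calls."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}

    def get(self, prompt_id: str, text: str) -> List[float]:
        # Only the first request for a given ID reaches the embedder.
        if prompt_id not in self._cache:
            self._cache[prompt_id] = self._embed_fn(text)
        return self._cache[prompt_id]


def pick_best(scored: List[Tuple[str, float]]) -> str:
    """Return the ID with the highest score; ties go to the smallest ID."""
    if not scored:
        raise ValueError("scored must not be empty")
    return min(scored, key=lambda item: (-item[1], item[0]))[0]


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Stand-in embedder that records each call."""
    calls.append(text)
    return [float(len(text))]


cache = PromptEmbeddingCache(fake_embed)
first = cache.get("p1", "abc")
second = cache.get("p1", "abc")  # served from the cache, no second call
winner = pick_best([("b", 0.9), ("a", 0.9), ("c", 0.5)])
```

Because the cache is keyed by prompt ID alone, it would return a stale vector if a prompt's text changed; a real implementation would need some invalidation strategy.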

---

**Ready for Execution**: All tasks are well defined, with clear objectives, detailed implementation guidance, and comprehensive acceptance criteria. The task dependency chain ensures proper execution order and minimal blocking.

*Sprint 2 Task Summary created for MVP 2 - Enhanced Planner for Tool+Prompt Pairs*