# MVP 2 Sprint 2 - Comprehensive Plan

## Enhanced Planner for Tool+Prompt Pairs

**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and a corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: 🚀 **READY TO START**

## 🎯 Sprint 2 Objectives

### Goal Evolution: MVP1 → MVP2 Sprint 2

- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`

### Key Deliverables

1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for the new planner output
4. **Comprehensive Testing** - Full coverage of the new planning workflow

## 📋 Task Breakdown

### Task 2.1: Define PlannedStep Dataclass (60 mins)

**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`

**Objective**: Create a structured data representation for planner output

**Implementation**:

```python
@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use
```

**Testing Requirements**:

- Test `PlannedStep` creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional `relevance_score` functionality

### Task 2.2: Refactor SimplePlannerAgent (180 mins)

**Files**: `agents/planner.py`, `tests/agents/test_planner.py`

**Objective**: Implement combined tool+prompt selection logic

**Key Algorithm**:

1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank prompts against the user query
4. **PlannedStep Assembly**: Create structured output
**Implementation Strategy**:

```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # 4. Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```

**Testing Requirements**:

- Test the no-tools-found scenario
- Test tool found but no prompts scenario
- Test tool with single prompt selection
- Test tool with multiple prompts (semantic selection)
- Test `top_k_plans` limiting functionality

### Task 2.3: Update Application Integration (45 mins)

**Files**: `app.py`, `tests/test_app.py`

**Objective**: Update the backend to use the new planner method

**Changes Required**:

1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle the `PlannedStep` output format (temporary backward compatibility)
3. Ensure no UI crashes during the transition
**Implementation**:

```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}

    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}

    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```

### Task 2.4: Quality Assurance & Deployment (30 mins)

**Objective**: Ensure code quality and system stability

**Checklist**:

- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update `requirements.lock` if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success

## 🔧 Technical Architecture

### Data Flow Evolution

```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```

### New Components Introduced

1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward Compatible Interface** - Smooth transition support

### Integration Points

- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling

## 🧪 Testing Strategy

### Unit Test Coverage

- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
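The edge-case behavior listed above can be illustrated with a self-contained sketch. The planner below is a condensed, function-level stand-in for `SimplePlannerAgent` with hard-coded fake collaborators; real tests would mock the embedder and knowledge graph instead, and the class and field names of the fakes are assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tool:
    """Stand-in for MCPTool."""
    tool_id: str


@dataclass
class Prompt:
    """Stand-in for MCPPrompt."""
    prompt_id: str
    target_tool_id: str


class FakeKG:
    """Hypothetical in-memory knowledge graph double for tests."""

    def __init__(self, tools: Dict[str, Tool], prompts: Dict[str, Prompt]):
        self.tools, self.prompts = tools, prompts

    def find_similar_tools(self, query_embedding, top_k: int = 3) -> List[str]:
        return list(self.tools)[:top_k]  # stand-in for vector search

    def get_tool_by_id(self, tool_id: str) -> Tool:
        return self.tools[tool_id]


def generate_plan(kg: FakeKG, query_embedding) -> List[Tuple[Tool, Prompt]]:
    steps = []
    for tool_id in kg.find_similar_tools(query_embedding):
        tool = kg.get_tool_by_id(tool_id)
        prompts = [p for p in kg.prompts.values() if p.target_tool_id == tool.tool_id]
        if prompts:  # tools without prompts are skipped gracefully
            steps.append((tool, prompts[0]))
    return steps


# Scenario: no tools found → empty result, no exception
empty_kg = FakeKG(tools={}, prompts={})
assert generate_plan(empty_kg, [0.0]) == []

# Scenario: tool found but no prompts target it → gracefully skipped
orphan_kg = FakeKG(tools={"t1": Tool("t1")}, prompts={"p1": Prompt("p1", "other_tool")})
assert generate_plan(orphan_kg, [0.0]) == []
```

The same fakes extend naturally to the multiple-prompts and `top_k_plans` scenarios by adding more entries to the dictionaries.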
### Test Scenarios

1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures

### Manual Testing Checklist

- [ ] Application starts successfully with the new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No crashes in the UI during tool selection
- [ ] Logging shows enhanced planning information

## 📊 Success Metrics

| Metric | Target | Validation Method |
|--------|--------|-------------------|
| PlannedStep Creation | ✅ Complete | Unit tests pass |
| Tool+Prompt Selection | ✅ Semantic accuracy | Integration tests |
| Backward Compatibility | ✅ No breaking changes | Manual testing |
| Code Quality | ✅ All checks pass | CI pipeline |
| Test Coverage | ✅ >90% for new code | pytest coverage |

## 🔄 Sprint Dependencies

### Prerequisites (Completed in Sprint 1)

- ✅ MCPPrompt ontology established
- ✅ Knowledge graph extended for prompts
- ✅ Vector indexing for prompt search
- ✅ Initial prompt dataset created

### Deliverables for Sprint 3

- ✅ PlannedStep objects ready for UI display
- ✅ Enhanced planner generating structured output
- ✅ Backend integration supporting rich display
- ✅ Test coverage preventing regressions

## 🚨 Risk Mitigation

### Potential Challenges

1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, iterate
2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings, limit candidates
3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break into smaller, testable units
4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design, thorough testing
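The pre-computed-embeddings mitigation for the performance risk can be sketched as a simple warm-once cache: embed every prompt template at index-build time, then planning performs only one embedding call (the query) regardless of prompt count. The class and `embed_fn` callback below are illustrative assumptions, not the project's actual API:

```python
from typing import Callable, Dict, List


class PromptEmbeddingCache:
    """Hypothetical cache: embeds each prompt once up front, lookups are free."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed = embed_fn
        self._vectors: Dict[str, List[float]] = {}

    def warm(self, prompts: Dict[str, str]) -> None:
        """Embed every prompt template once, at index-build time."""
        for prompt_id, template in prompts.items():
            self._vectors[prompt_id] = self._embed(template)

    def get(self, prompt_id: str) -> List[float]:
        """Lookup only; never triggers an embedding call."""
        return self._vectors[prompt_id]


# Fake embedder that counts calls, standing in for the OpenAI embedding service
calls = []
def fake_embed(text: str) -> List[float]:
    calls.append(text)
    return [float(len(text))]

cache = PromptEmbeddingCache(fake_embed)
cache.warm({"p1": "Search for {query}", "p2": "Summarize {doc}"})
assert len(calls) == 2  # one embedding call per prompt, at warm time

cache.get("p1")
cache.get("p2")
assert len(calls) == 2  # repeated lookups do not re-embed
```

Combined with limiting candidate tools (the `top_k=3` in the strategy above), this keeps per-query latency dominated by a single embedding round-trip.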
## 🎯 Sprint 3 Preparation

### Ready for Next Sprint

After Sprint 2 completion, Sprint 3 can focus on:

- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows

---

*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*