# MVP 2 Sprint 2 - Comprehensive Plan
## Enhanced Planner for Tool+Prompt Pairs

**Date**: 2025-06-08  
**Sprint Goal**: Modify `SimplePlannerAgent` to select both relevant `MCPTool` and corresponding `MCPPrompt`, returning structured `PlannedStep` objects  
**Duration**: 3-5 hours  
**Status**: 🚀 **READY TO START**

## 🎯 Sprint 2 Objectives

### Goal Evolution: MVP1 → MVP2 Sprint 2
- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`

### Key Deliverables
1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic  
3. **Updated Application Integration** - Backend support for new planner output
4. **Comprehensive Testing** - Full coverage of new planning workflow

## 📋 Task Breakdown

### Task 2.1: Define PlannedStep Dataclass (60 mins)
**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`

**Objective**: Create structured data representation for planner output

**Implementation**:
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use
```

**Testing Requirements**:
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional relevance_score functionality
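A matching unit test for `tests/kg_services/test_ontology.py` could start from the sketch below. The `MCPTool`/`MCPPrompt` stand-ins and their fields are illustrative assumptions; the real classes live in `kg_services/ontology.py`:

```python
from dataclasses import dataclass
from typing import Optional


# Stand-ins for the real ontology classes; field names are assumptions.
@dataclass
class MCPTool:
    tool_id: str
    name: str


@dataclass
class MCPPrompt:
    prompt_id: str
    target_tool_id: str
    template: str


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None


def test_planned_step_creation():
    tool = MCPTool(tool_id="t1", name="csv_reader")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Read {path}")
    step = PlannedStep(tool=tool, prompt=prompt)
    # The pair is internally consistent and the optional score defaults to None.
    assert step.tool.tool_id == step.prompt.target_tool_id
    assert step.relevance_score is None
```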

### Task 2.2: Refactor SimplePlannerAgent (180 mins)
**Files**: `agents/planner.py`, `tests/agents/test_planner.py`

**Objective**: Implement combined tool+prompt selection logic

**Key Algorithm**:
1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank prompts against user query
4. **PlannedStep Assembly**: Create structured output

**Implementation Strategy**:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)
    
    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)
    
    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [p for p in self.kg.prompts.values() 
                  if p.target_tool_id == tool.tool_id]
        
        # 4. Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))
    
    return planned_steps[:top_k_plans]
```
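The `_select_best_prompt` helper referenced in the strategy above is not spelled out; in line with the "start with simple cosine similarity" mitigation noted later in this plan, one minimal realization is the sketch below. It assumes each `MCPPrompt` carries a pre-computed `embedding` list of floats:

```python
import math
from types import SimpleNamespace


def select_best_prompt(prompts, query_embedding):
    """Return the prompt most similar to the query, or None if none given.

    Sketch only: assumes each prompt has a pre-computed `embedding`
    attribute (a list of floats) matching the query embedding's dimension.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    if not prompts:
        return None
    return max(prompts, key=lambda p: cosine(p.embedding, query_embedding))


# Example: the prompt pointing in the query's direction wins.
p1 = SimpleNamespace(embedding=[1.0, 0.0])
p2 = SimpleNamespace(embedding=[0.0, 1.0])
best = select_best_prompt([p1, p2], [0.9, 0.1])
```

Using pre-computed embeddings here keeps the ranking step free of API calls, which also addresses the performance risk noted under Risk Mitigation.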

**Testing Requirements**:
- Test no tools found scenario
- Test tool found but no prompts scenario  
- Test tool with single prompt selection
- Test tool with multiple prompts - semantic selection
- Test top_k_plans limiting functionality

### Task 2.3: Update Application Integration (45 mins)
**Files**: `app.py`, `tests/test_app.py`

**Objective**: Update backend to use new planner method

**Changes Required**:
1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle `PlannedStep` output format (temporary backward compatibility)
3. Ensure no UI crashes during transition

**Implementation**:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}
    
    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}
    
    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
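The `format_tool_for_display` helper used above is assumed to exist in `app.py`; a minimal sketch of what it might return is shown here, with the `name`/`description` field names being illustrative assumptions about `MCPTool`:

```python
from types import SimpleNamespace


def format_tool_for_display(tool) -> dict:
    """Shape a tool object into the dict the UI layer expects.

    Sketch only: the attribute names are assumptions for illustration.
    """
    return {
        "tool_name": getattr(tool, "name", "unknown"),
        "description": getattr(tool, "description", ""),
    }


# Example usage with a stand-in tool object.
demo = format_tool_for_display(
    SimpleNamespace(name="csv_reader", description="Reads CSV files")
)
```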

### Task 2.4: Quality Assurance & Deployment (30 mins)
**Objective**: Ensure code quality and system stability

**Checklist**:
- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation  
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update requirements.lock if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success

## 🔧 Technical Architecture

### Data Flow Evolution
```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```

### New Components Introduced
1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward Compatible Interface** - Smooth transition support

### Integration Points
- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling

## 🧪 Testing Strategy

### Unit Test Coverage
- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios

### Test Scenarios
1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures
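Scenario 3 (tool without prompts) can be pinned down with a mock-based unit test. The sketch below uses a self-contained `PlannerStub` that mirrors the `generate_plan` logic from Task 2.2, since the real `SimplePlannerAgent` constructor is not shown here; dependency injection of `kg` and `embedder` is an assumption:

```python
from unittest.mock import MagicMock


class PlannerStub:
    """Self-contained stand-in mirroring the generate_plan sketch above."""

    def __init__(self, kg, embedder):
        self.kg = kg
        self.embedder = embedder

    def generate_plan(self, user_query, top_k_plans=1):
        query_embedding = self.embedder.get_embedding(user_query)
        steps = []
        for tool_id in self.kg.find_similar_tools(query_embedding, top_k=3):
            tool = self.kg.get_tool_by_id(tool_id)
            prompts = [p for p in self.kg.prompts.values()
                       if p.target_tool_id == tool.tool_id]
            if prompts:  # scenario 3: skip tools with no matching prompt
                steps.append((tool, prompts[0]))
        return steps[:top_k_plans]


def test_tool_without_prompts_yields_no_steps():
    kg = MagicMock()
    kg.find_similar_tools.return_value = ["tool-1"]
    kg.get_tool_by_id.return_value = MagicMock(tool_id="tool-1")
    kg.prompts = {}  # the tool exists, but no prompt targets it

    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1, 0.2]

    planner = PlannerStub(kg=kg, embedder=embedder)
    assert planner.generate_plan("read a csv file") == []
```

Mocking the knowledge graph and embedder at the boundary keeps the test fast and avoids real API calls, which is the "break into smaller, testable units" mitigation in practice.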

### Manual Testing Checklist
- [ ] Application starts successfully with new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No crashes in UI during tool selection
- [ ] Logging shows enhanced planning information

## 📊 Success Metrics

| Metric | Target | Validation Method |
|--------|--------|------------------|
| PlannedStep Creation | ✅ Complete | Unit tests pass |
| Tool+Prompt Selection | ✅ Semantic accuracy | Integration tests |
| Backward Compatibility | ✅ No breaking changes | Manual testing |
| Code Quality | ✅ All checks pass | CI pipeline |
| Test Coverage | ✅ >90% for new code | pytest coverage |

## 🔄 Sprint Dependencies

### Prerequisites (Completed in Sprint 1)
- ✅ MCPPrompt ontology established
- ✅ Knowledge graph extended for prompts
- ✅ Vector indexing for prompt search
- ✅ Initial prompt dataset created

### Deliverables for Sprint 3
- ✅ PlannedStep objects ready for UI display
- ✅ Enhanced planner generating structured output
- ✅ Backend integration supporting rich display
- ✅ Test coverage preventing regressions

## 🚨 Risk Mitigation

### Potential Challenges
1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, iterate

2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings, limit candidates

3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break into smaller, testable units

4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design, thorough testing

## 🎯 Sprint 3 Preparation

### Ready for Next Sprint
After Sprint 2 completion, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows

---

*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*  
*Estimated effort: 3-5 hours*  
*Focus: Backend logic enhancement and structured output*