# MVP 2 Sprint 2 - Comprehensive Plan
## Enhanced Planner for Tool+Prompt Pairs
**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and its corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: 🚀 **READY TO START**
## 🎯 Sprint 2 Objectives
### Goal Evolution: MVP1 → MVP2 Sprint 2
- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`
### Key Deliverables
1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for new planner output
4. **Comprehensive Testing** - Full coverage of new planning workflow
## 📋 Task Breakdown
### Task 2.1: Define PlannedStep Dataclass (60 mins)
**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`
**Objective**: Create structured data representation for planner output
**Implementation**:
```python
from dataclasses import dataclass
from typing import Optional

# MCPTool and MCPPrompt are defined earlier in this module (kg_services/ontology.py).

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use
```
**Testing Requirements** (see the sketch after this list):
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional relevance_score functionality
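A minimal pytest sketch of these cases; the `MCPTool`/`MCPPrompt` constructor fields shown are assumptions and should be adjusted to the actual Sprint 1 ontology:
```python
import pytest

from kg_services.ontology import MCPTool, MCPPrompt, PlannedStep


def test_planned_step_creation():
    # Field names on MCPTool/MCPPrompt are assumed; align with the real ontology.
    tool = MCPTool(tool_id="t1", name="web_search", description="Searches the web")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Search for {query}")
    step = PlannedStep(tool=tool, prompt=prompt)
    assert step.tool.tool_id == "t1"
    assert step.prompt.target_tool_id == step.tool.tool_id
    assert step.relevance_score is None  # optional field defaults to None


def test_planned_step_relevance_score():
    tool = MCPTool(tool_id="t1", name="web_search", description="Searches the web")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Search for {query}")
    step = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert step.relevance_score == pytest.approx(0.87)
```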
### Task 2.2: Refactor SimplePlannerAgent (180 mins)
**Files**: `agents/planner.py`, `tests/agents/test_planner.py`
**Objective**: Implement combined tool+prompt selection logic
**Key Algorithm**:
1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank prompts against user query
4. **PlannedStep Assembly**: Create structured output
**Implementation Strategy**:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools via semantic search
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, gather the prompts that target it
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [p for p in self.kg.prompts.values()
                   if p.target_tool_id == tool.tool_id]

        # 4. Select the best prompt semantically; tools with no prompts are skipped
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```
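The strategy above calls a `_select_best_prompt` helper that is not defined yet. A minimal sketch, assuming each `MCPPrompt` carries a pre-computed `embedding` vector from the Sprint 1 indexing work (an assumption), using the plain cosine similarity suggested under Risk Mitigation below:
```python
import math
from typing import List, Optional


def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    """Return the prompt most similar to the query, or None if none exist."""
    best_prompt: Optional[MCPPrompt] = None
    best_score = -1.0
    for prompt in prompts:
        # Assumes prompt.embedding was pre-computed during Sprint 1 indexing.
        score = self._cosine_similarity(prompt.embedding, query_embedding)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt


def _cosine_similarity(self, a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```
Keeping the ranking to a single cosine-similarity pass keeps results easy to inspect before any more elaborate scoring is layered on.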
**Testing Requirements** (one scenario sketched after this list):
- Test no tools found scenario
- Test tool found but no prompts scenario
- Test tool with single prompt selection
- Test tool with multiple prompts - semantic selection
- Test top_k_plans limiting functionality
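As a concrete example, the "tool found but no prompts" scenario could be mocked roughly as follows; the `SimplePlannerAgent(kg=..., embedder=...)` constructor shape is an assumption:
```python
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent


def test_tool_without_prompts_is_skipped():
    kg = MagicMock()
    kg.find_similar_tools.return_value = ["t1"]
    kg.get_tool_by_id.return_value = MagicMock(tool_id="t1")
    kg.prompts = {}  # no prompts target this tool

    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1, 0.2, 0.3]

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    assert planner.generate_plan("summarize this page") == []
```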
### Task 2.3: Update Application Integration (45 mins)
**Files**: `app.py`, `tests/test_app.py`
**Objective**: Update backend to use new planner method
**Changes Required**:
1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle `PlannedStep` output format (temporary backward compatibility)
3. Ensure no UI crashes during transition
**Implementation**:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}

    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}

    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
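`format_tool_for_display` is assumed to be the existing MVP1 display helper. If it still needs to be written, a hypothetical minimal version might look like:
```python
def format_tool_for_display(tool: MCPTool) -> dict:
    # Hypothetical shape; match whatever the MVP1 UI already consumes.
    return {
        "tool_id": tool.tool_id,
        "name": tool.name,
        "description": tool.description,
    }
```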
### Task 2.4: Quality Assurance & Deployment (30 mins)
**Objective**: Ensure code quality and system stability
**Checklist**:
- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update requirements.lock if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success
## 🔧 Technical Architecture
### Data Flow Evolution
```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```
### New Components Introduced
1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward Compatible Interface** - Smooth transition support
### Integration Points
- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling
## 🧪 Testing Strategy
### Unit Test Coverage
- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
### Test Scenarios
1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures (the API-failure case is sketched below)
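For the API-failure edge case, one hedged sketch; it assumes embedder errors propagate out of `generate_plan`, so swap the assertion for an empty-list check if graceful degradation is chosen instead:
```python
import pytest
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent


def test_embedding_api_failure():
    kg = MagicMock()
    embedder = MagicMock()
    embedder.get_embedding.side_effect = RuntimeError("OpenAI API unavailable")

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    # Assumption: embedder failures propagate to the caller.
    with pytest.raises(RuntimeError):
        planner.generate_plan("any query")
```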
### Manual Testing Checklist
- [ ] Application starts successfully with new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No crashes in UI during tool selection
- [ ] Logging shows enhanced planning information
## 📊 Success Metrics
| Metric | Target | Validation Method |
|--------|--------|------------------|
| PlannedStep Creation | ✅ Complete | Unit tests pass |
| Tool+Prompt Selection | ✅ Semantic accuracy | Integration tests |
| Backward Compatibility | ✅ No breaking changes | Manual testing |
| Code Quality | ✅ All checks pass | CI pipeline |
| Test Coverage | ✅ >90% for new code | pytest coverage |
## 🔗 Sprint Dependencies
### Prerequisites (Completed in Sprint 1)
- ✅ MCPPrompt ontology established
- ✅ Knowledge graph extended for prompts
- ✅ Vector indexing for prompt search
- ✅ Initial prompt dataset created
### Deliverables for Sprint 3
- ✅ PlannedStep objects ready for UI display
- ✅ Enhanced planner generating structured output
- ✅ Backend integration supporting rich display
- ✅ Test coverage preventing regressions
## 🚨 Risk Mitigation
### Potential Challenges
1. **Semantic Prompt Selection Complexity**
- *Risk*: Overly complex ranking logic
- *Mitigation*: Start with simple cosine similarity, iterate
2. **Performance with Multiple Prompts**
- *Risk*: Slow response times
- *Mitigation*: Use pre-computed embeddings, limit candidates
3. **Test Complexity**
- *Risk*: Difficult to mock complex interactions
- *Mitigation*: Break into smaller, testable units
4. **Backward Compatibility**
- *Risk*: Breaking existing functionality
- *Mitigation*: Careful interface design, thorough testing
## 🎯 Sprint 3 Preparation
### Ready for Next Sprint
After Sprint 2 completion, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows
---
*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*