# MVP 2 Sprint 2 - Comprehensive Plan
## Enhanced Planner for Tool+Prompt Pairs
**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and its corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: πŸš€ **READY TO START**
## 🎯 Sprint 2 Objectives
### Goal Evolution: MVP1 β†’ MVP2 Sprint 2
- **MVP1**: `User Query β†’ Tool Discovery β†’ Tool Suggestion`
- **MVP2 Sprint 2**: `User Query β†’ Tool Discovery β†’ Prompt Selection β†’ (Tool + Prompt) Suggestion`
### Key Deliverables
1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for new planner output
4. **Comprehensive Testing** - Full coverage of new planning workflow
## πŸ“‹ Task Breakdown
### Task 2.1: Define PlannedStep Dataclass (60 mins)
**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`
**Objective**: Create structured data representation for planner output
**Implementation**:
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool      # defined earlier in kg_services/ontology.py
    prompt: MCPPrompt  # defined earlier in kg_services/ontology.py
    relevance_score: Optional[float] = None  # Reserved for future ranking use
```
**Testing Requirements**:
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional relevance_score functionality
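A minimal pytest sketch covering these cases might look like the following (the `MCPTool`/`MCPPrompt` constructor arguments are illustrative assumptions; use the actual field definitions from Sprint 1):
```python
import pytest

from kg_services.ontology import MCPPrompt, MCPTool, PlannedStep


def test_planned_step_creation():
    # Illustrative constructor arguments; adjust to the real Sprint 1 fields.
    tool = MCPTool(tool_id="summarizer", name="Summarizer", description="Summarizes text")
    prompt = MCPPrompt(
        prompt_id="summarize_v1", target_tool_id="summarizer", template="Summarize: {text}"
    )
    step = PlannedStep(tool=tool, prompt=prompt)
    assert step.tool.tool_id == step.prompt.target_tool_id
    assert step.relevance_score is None  # optional field defaults to None


def test_planned_step_relevance_score():
    tool = MCPTool(tool_id="summarizer", name="Summarizer", description="Summarizes text")
    prompt = MCPPrompt(
        prompt_id="summarize_v1", target_tool_id="summarizer", template="Summarize: {text}"
    )
    step = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert step.relevance_score == pytest.approx(0.87)
```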
### Task 2.2: Refactor SimplePlannerAgent (180 mins)
**Files**: `agents/planner.py`, `tests/agents/test_planner.py`
**Objective**: Implement combined tool+prompt selection logic
**Key Algorithm**:
1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank prompts against user query
4. **PlannedStep Assembly**: Create structured output
**Implementation Strategy**:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # 4. Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```
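`_select_best_prompt` is referenced above but not yet defined. A minimal sketch using cosine similarity over pre-computed prompt embeddings might look like this (`get_prompt_embedding` and `prompt_id` are hypothetical names for Sprint 1's vector-index lookup; substitute whatever accessor the knowledge graph actually exposes):
```python
from typing import List, Optional

import numpy as np


def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    """Return the prompt most similar to the query, or None if no candidates."""
    if not prompts:
        return None

    query_vec = np.asarray(query_embedding, dtype=float)
    query_norm = np.linalg.norm(query_vec)

    best_prompt: Optional[MCPPrompt] = None
    best_score = -1.0
    for prompt in prompts:
        # Hypothetical accessor: assumes Sprint 1's vector index can return
        # the stored embedding for a prompt by ID.
        prompt_vec = np.asarray(self.kg.get_prompt_embedding(prompt.prompt_id), dtype=float)
        score = float(query_vec @ prompt_vec / (query_norm * np.linalg.norm(prompt_vec)))
        if score > best_score:
            best_prompt, best_score = prompt, score

    return best_prompt
```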
**Testing Requirements**:
- Test no tools found scenario
- Test tool found but no prompts scenario
- Test tool with single prompt selection
- Test tool with multiple prompts - semantic selection
- Test top_k_plans limiting functionality
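As a concrete example, the "tool found but no prompts" scenario could be mocked like this (the `SimplePlannerAgent(embedder=..., kg=...)` constructor is an assumption inferred from the attributes used in the sketch above):
```python
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent


def test_generate_plan_skips_tool_without_prompts():
    # Assumed constructor wiring; adjust if the agent builds its own dependencies.
    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1, 0.2, 0.3]

    kg = MagicMock()
    kg.find_similar_tools.return_value = ["tool_without_prompts"]
    kg.get_tool_by_id.return_value = MagicMock(tool_id="tool_without_prompts")
    kg.prompts = {}  # no prompts target this tool

    planner = SimplePlannerAgent(embedder=embedder, kg=kg)
    assert planner.generate_plan("summarize this article") == []
```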
### Task 2.3: Update Application Integration (45 mins)
**Files**: `app.py`, `tests/test_app.py`
**Objective**: Update backend to use new planner method
**Changes Required**:
1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle `PlannedStep` output format (temporary backward compatibility)
3. Ensure no UI crashes during transition
**Implementation**:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}

    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}

    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
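`format_tool_for_display` is expected to be the existing MVP1 helper; if it needs to be (re)created, a minimal placeholder consistent with the fields used elsewhere in this plan might be:
```python
def format_tool_for_display(tool: MCPTool) -> dict:
    # Field names beyond tool_id are illustrative; mirror MVP1's display format
    # so the existing UI keeps rendering unchanged during the transition.
    return {
        "tool_id": tool.tool_id,
        "name": tool.name,
        "description": tool.description,
    }
```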
### Task 2.4: Quality Assurance & Deployment (30 mins)
**Objective**: Ensure code quality and system stability
**Checklist**:
- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update requirements.lock if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success
## πŸ”§ Technical Architecture
### Data Flow Evolution
```
User Query
↓
Query Embedding (OpenAI)
↓
Tool Semantic Search (Knowledge Graph)
↓
Prompt Filtering (by target_tool_id)
↓
Prompt Semantic Ranking (vs Query)
↓
PlannedStep Assembly
↓
Structured Output (Tool + Prompt)
```
### New Components Introduced
1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward Compatible Interface** - Smooth transition support
### Integration Points
- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling
## πŸ§ͺ Testing Strategy
### Unit Test Coverage
- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
### Test Scenarios
1. **Happy Path**: Query β†’ Tool β†’ Prompt β†’ PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures
### Manual Testing Checklist
- [ ] Application starts successfully with new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No crashes in UI during tool selection
- [ ] Logging shows enhanced planning information
## πŸ“Š Success Metrics
| Metric | Target | Validation Method |
|--------|--------|------------------|
| PlannedStep Creation | βœ… Complete | Unit tests pass |
| Tool+Prompt Selection | βœ… Semantic accuracy | Integration tests |
| Backward Compatibility | βœ… No breaking changes | Manual testing |
| Code Quality | βœ… All checks pass | CI pipeline |
| Test Coverage | βœ… >90% for new code | pytest coverage |
## πŸ”„ Sprint Dependencies
### Prerequisites (Completed in Sprint 1)
- βœ… MCPPrompt ontology established
- βœ… Knowledge graph extended for prompts
- βœ… Vector indexing for prompt search
- βœ… Initial prompt dataset created
### Deliverables for Sprint 3
- βœ… PlannedStep objects ready for UI display
- βœ… Enhanced planner generating structured output
- βœ… Backend integration supporting rich display
- βœ… Test coverage preventing regressions
## 🚨 Risk Mitigation
### Potential Challenges
1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, iterate
2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings and limit candidates (see the cache sketch after this list)
3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break into smaller, testable units
4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design, thorough testing
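For mitigation 2, one low-effort approach is to embed every prompt once at load time so per-query ranking never hits the embedding API. A hypothetical helper (using `template` as the text to embed is an assumption):
```python
from typing import Dict, List


class PromptEmbeddingCache:
    """Pre-computes one embedding per prompt at load time (hypothetical helper)."""

    def __init__(self, embedder, prompts: Dict[str, MCPPrompt]):
        # One embedding API call per prompt, paid once up front.
        self._cache: Dict[str, List[float]] = {
            prompt_id: embedder.get_embedding(prompt.template)
            for prompt_id, prompt in prompts.items()
        }

    def get(self, prompt_id: str) -> List[float]:
        return self._cache[prompt_id]
```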
## 🎯 Sprint 3 Preparation
### Ready for Next Sprint
After Sprint 2 completion, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows
---
*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*