# MVP 2 Sprint 2 - Task Summary & Execution Guide
**Date**: 2025-06-08
**Sprint Goal**: Enhanced Planner for Tool+Prompt Pairs
**Status**: πŸš€ **READY FOR EXECUTION**
**Task Management**: Tasks added to `tasks.json` (IDs 26-29)
## 🎯 Sprint Overview
Transform the `SimplePlannerAgent` from suggesting only tools to suggesting **tool+prompt pairs** as structured `PlannedStep` objects, enabling the next evolution toward complete tool+prompt guidance.
### Goal Evolution
- **Current (MVP1)**: `User Query β†’ Tool Discovery β†’ Tool Suggestion`
- **Sprint 2 Target**: `User Query β†’ Tool Discovery β†’ Prompt Selection β†’ (Tool + Prompt) Suggestion`
## πŸ“‹ Task Execution Order
### Task 26: Define PlannedStep Dataclass (60 mins)
**Status**: Todo
**Dependencies**: None
**Priority**: πŸ”΄ **HIGH** (Foundation for all other tasks)
**Execution Command for Claude**:
```
Implement Task 26: Define PlannedStep Dataclass
**Objective**: Create structured data representation for planner output combining MCPTool and MCPPrompt.
**Action 1: Modify `kg_services/ontology.py`**
1. Open @kg_services/ontology.py
2. Add PlannedStep dataclass below existing MCPPrompt class
3. Include fields: tool (MCPTool), prompt (MCPPrompt), relevance_score (Optional[float] = None)
4. Add proper type hints and imports
5. Apply coding standards from @.cursor/rules/python_gradio_basic.mdc
**Action 2: Add Tests in `tests/kg_services/test_ontology.py`**
1. Open @tests/kg_services/test_ontology.py
2. Add test_planned_step_creation() function
3. Test PlannedStep instantiation with valid MCPTool and MCPPrompt
4. Test type safety and field access
5. Test optional relevance_score functionality
Generate the complete implementation.
```
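As a reference for Action 2, here is a minimal pytest sketch of what `test_planned_step_creation()` could look like. The `MCPTool`/`MCPPrompt` constructor fields shown are assumptions and should be aligned with the real dataclasses in `kg_services/ontology.py`:
```python
# Hypothetical test sketch; MCPTool/MCPPrompt constructor fields are assumptions
# and should be aligned with the real dataclasses in kg_services/ontology.py.
from kg_services.ontology import MCPPrompt, MCPTool, PlannedStep


def test_planned_step_creation() -> None:
    tool = MCPTool(
        tool_id="pdf_extractor",
        name="PDF Extractor",
        description="Extracts text from PDF documents",
    )
    prompt = MCPPrompt(
        prompt_id="pdf_summary",
        name="Summarize PDF",
        description="Summarize extracted PDF text",
        use_case="Condense long documents",
        target_tool_id="pdf_extractor",
    )

    # relevance_score is optional and defaults to None
    step = PlannedStep(tool=tool, prompt=prompt)
    assert step.tool.tool_id == "pdf_extractor"
    assert step.prompt.target_tool_id == step.tool.tool_id
    assert step.relevance_score is None

    # The optional score can also be supplied explicitly
    scored = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert scored.relevance_score == 0.87
```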
### Task 27: Refactor SimplePlannerAgent (180 mins)
**Status**: Todo
**Dependencies**: Task 26
**Priority**: πŸ”΄ **HIGH** (Core logic transformation)
**Execution Command for Claude**:
```
Implement Task 27: Refactor SimplePlannerAgent for Tool+Prompt Planning
**Objective**: Implement combined tool+prompt selection logic with semantic ranking.
**Action 1: Modify `agents/planner.py`**
1. Open @agents/planner.py
2. Import PlannedStep from kg_services.ontology
3. Rename suggest_tools method to generate_plan
4. Implement algorithm:
- Tool Selection: Use existing semantic search for tools
- Prompt Filtering: Get prompts by target_tool_id
- Prompt Ranking: Semantic similarity against query
- PlannedStep Assembly: Create structured output
5. Add _select_best_prompt helper method
6. Return List[PlannedStep] instead of List[MCPTool]
**Action 2: Update `tests/agents/test_planner.py`**
1. Update all test methods for new generate_plan signature
2. Mock InMemoryKG prompt methods
3. Test scenarios: no tools, no prompts for tool, single prompt, multiple prompts
4. Verify PlannedStep output structure
Generate the complete refactored implementation.
```
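For Action 2, a hedged sketch of one `generate_plan` test using mocks. The `SimplePlannerAgent` constructor arguments and the mock wiring are assumptions that need to be matched against the real `agents/planner.py`:
```python
# Illustrative sketch only; the SimplePlannerAgent constructor arguments and
# mock wiring are assumptions to be matched against the real agents/planner.py.
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent
from kg_services.ontology import PlannedStep


def test_generate_plan_returns_planned_steps() -> None:
    kg = MagicMock()
    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1, 0.2, 0.3]

    tool = MagicMock(tool_id="tool_1")
    prompt = MagicMock(target_tool_id="tool_1")
    kg.find_similar_tools.return_value = ["tool_1"]
    kg.get_tool_by_id.return_value = tool
    kg.prompts = {"prompt_1": prompt}  # single matching prompt -> direct selection

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)  # assumed signature
    plans = planner.generate_plan("summarize this document", top_k_plans=1)

    assert len(plans) == 1
    assert isinstance(plans[0], PlannedStep)
    assert plans[0].tool is tool
    assert plans[0].prompt is prompt
```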
### Task 28: Update Application Integration (45 mins)
**Status**: Todo
**Dependencies**: Task 27
**Priority**: 🟑 **MEDIUM** (Integration layer)
**Execution Command for Claude**:
```
Implement Task 28: Update Application Integration for New Planner
**Objective**: Ensure application backend uses enhanced planner without breaking UI.
**Action 1: Modify `app.py`**
1. Open @app.py
2. Update handle_find_tools function:
- Change planner call from suggest_tools to generate_plan
- Handle List[PlannedStep] return type
- Extract tool from PlannedStep for current UI (temporary)
- Add proper error handling for empty results
3. Import PlannedStep if needed
**Action 2: Update `tests/test_app.py`**
1. Update mocked planner method calls
2. Test new generate_plan integration
3. Verify backward compatibility for UI display
Maintain backward compatibility until Sprint 3 UI updates.
```
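A possible shape for the `handle_find_tools` change. The existing function signature, the module-level `planner` instance, and the display string are assumptions about the current `app.py`:
```python
# Hedged sketch of the app.py change; the existing handle_find_tools signature,
# the module-level `planner` instance, and the display string are assumptions.
def handle_find_tools(user_query: str) -> str:
    plans = planner.generate_plan(user_query, top_k_plans=1)
    if not plans:
        return "No suitable tools found for this query."

    # Temporary backward compatibility: surface only the tool from the
    # PlannedStep until the Sprint 3 UI renders the full tool+prompt pair.
    step = plans[0]
    return f"Suggested tool: {step.tool.name} - {step.tool.description}"
```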
### Task 29: Quality Assurance & Deployment (30 mins)
**Status**: Todo
**Dependencies**: Task 28
**Priority**: 🟒 **LOW** (Quality gates)
**Execution Command for Claude**:
```
Implement Task 29: Quality Assurance & Deployment
**Objective**: Ensure code quality, system stability, and deployment readiness.
**Actions**:
1. Run `just lint` and fix any style issues
2. Run `just format` to apply formatting
3. Run `just type-check` and resolve type issues
4. Run `just test` and ensure all tests pass
5. Manual integration testing:
- Verify application starts successfully
- Test tool+prompt planning workflow
- Confirm no UI crashes
6. Update requirements.lock if needed
7. Commit changes with conventional commit format
8. Push and verify CI pipeline
Document any issues found for Sprint 3.
```
## πŸ”§ Technical Implementation Details
### PlannedStep Structure
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None
```
### Enhanced Planning Algorithm
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools (semantic search)
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)

        # Filter prompts for this tool
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```
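An illustrative call, assuming the planner is already wired up elsewhere and that tools expose a `name` field; `relevance_score` prints as `None` because the sketch above does not populate it:
```python
# Assumed wiring; `planner` is constructed elsewhere. relevance_score prints as
# None because the algorithm above does not populate it.
steps = planner.generate_plan("extract tables from a quarterly PDF report", top_k_plans=1)
for step in steps:
    print(f"{step.tool.name} + {step.prompt.name} (score: {step.relevance_score})")
```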
### Semantic Prompt Selection
```python
def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    if not prompts:
        return None
    if len(prompts) == 1:
        return prompts[0]

    best_prompt = None
    best_similarity = -1.0
    for prompt in prompts:
        # Create embedding text from prompt
        prompt_text = f"{prompt.name} - {prompt.description} - {prompt.use_case}"
        prompt_embedding = self.embedder.get_embedding(prompt_text)
        if prompt_embedding:
            similarity = self.kg._cosine_similarity(query_embedding, prompt_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_prompt = prompt

    return best_prompt
```
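The ranking above leans on the KG's existing `_cosine_similarity` helper. Its real implementation may differ; presumably it computes the standard formula, roughly:
```python
# Assumed shape of the existing InMemoryKG._cosine_similarity helper, shown only
# to make the ranking criterion explicit; the real implementation may differ.
import math
from typing import List


def _cosine_similarity(vec_a: List[float], vec_b: List[float]) -> float:
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```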
## πŸ§ͺ Testing Strategy
### Key Test Scenarios
1. **PlannedStep Creation**: Valid instantiation and field access
2. **No Tools Found**: Empty list return from generate_plan
3. **Tool Without Prompts**: Graceful handling and skipping
4. **Single Prompt for Tool**: Direct selection
5. **Multiple Prompts for Tool**: Semantic ranking selection
6. **Application Integration**: Backward compatible UI interaction
### Test Coverage Targets
- **Unit Tests**: >95% coverage for new PlannedStep and planning logic
- **Integration Tests**: End-to-end workflow validation
- **Regression Tests**: Ensure no breaking changes to existing functionality
## πŸ“Š Success Criteria
| Component | Success Metric | Validation |
|-----------|---------------|------------|
| PlannedStep | Dataclass works correctly | Unit tests pass |
| Enhanced Planner | Tool+prompt selection accurate | Integration tests |
| Application | No UI crashes, backward compatible | Manual testing |
| Code Quality | All quality checks pass | CI pipeline |
## πŸ”„ Sprint 3 Preparation
Upon Sprint 2 completion, the system will be ready for Sprint 3, which focuses on:
- **UI Enhancement**: Display rich PlannedStep information
- **Prompt Template Rendering**: Show template strings with variables
- **Interactive Elements**: Dynamic input field generation
- **User Experience**: Enhanced tool+prompt workflow interface
## 🚨 Potential Challenges & Mitigations
1. **Semantic Prompt Selection Complexity**
- *Challenge*: Multiple prompts with similar semantics
- *Mitigation*: Start with simple cosine similarity, add tie-breaking rules
2. **Performance with Prompt Embeddings**
- *Challenge*: Additional API calls for prompt ranking
   - *Mitigation*: Use pre-computed embeddings where possible (see the caching sketch after this list)
3. **Backward Compatibility**
- *Challenge*: UI expects tool-only format
- *Mitigation*: Extract tool from PlannedStep for display
4. **Test Complexity**
- *Challenge*: Mocking complex tool+prompt interactions
- *Mitigation*: Use focused unit tests with clear test data
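Regarding mitigation 2, a hypothetical caching wrapper (not existing code) illustrating how prompt embeddings could be computed once and reused across queries:
```python
# Hypothetical caching wrapper (not existing code): compute each prompt's
# embedding once, keyed by prompt id, and reuse it on later queries.
from typing import Dict, List, Optional


class PromptEmbeddingCache:
    def __init__(self, embedder) -> None:
        self._embedder = embedder
        self._cache: Dict[str, Optional[List[float]]] = {}

    def get(self, prompt_id: str, prompt_text: str) -> Optional[List[float]]:
        if prompt_id not in self._cache:
            self._cache[prompt_id] = self._embedder.get_embedding(prompt_text)
        return self._cache[prompt_id]
```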
---
**Ready for Execution**: All tasks are well-defined with clear objectives, detailed implementation guidance, and comprehensive acceptance criteria. The task dependency chain ensures proper execution order and minimal blocking.
*Sprint 2 Task Summary created for MVP 2 - Enhanced Planner for Tool+Prompt Pairs*