MVP 2 Sprint 2 - Comprehensive Plan
Enhanced Planner for Tool+Prompt Pairs
Date: 2025-06-08
Sprint Goal: Modify SimplePlannerAgent to select both a relevant MCPTool and a corresponding MCPPrompt, returning structured PlannedStep objects
Duration: 3-5 hours
Status: 🚀 READY TO START
🎯 Sprint 2 Objectives
Goal Evolution: MVP1 → MVP2 Sprint 2
- MVP1: User Query → Tool Discovery → Tool Suggestion
- MVP2 Sprint 2: User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion
Key Deliverables
- PlannedStep Ontology - New dataclass for structured tool+prompt pairs
- Enhanced SimplePlannerAgent - Semantic tool+prompt selection logic
- Updated Application Integration - Backend support for new planner output
- Comprehensive Testing - Full coverage of new planning workflow
📋 Task Breakdown
Task 2.1: Define PlannedStep Dataclass (60 mins)
Files: kg_services/ontology.py, tests/kg_services/test_ontology.py
Objective: Create structured data representation for planner output
Implementation:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use
```
Testing Requirements:
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test optional relevance_score functionality
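A minimal test sketch covering these requirements. Only `tool_id` and `target_tool_id` are fixed by the planner logic in Task 2.2; the other MCPTool/MCPPrompt constructor arguments shown are illustrative assumptions:

```python
# tests/kg_services/test_ontology.py -- sketch; MCPTool/MCPPrompt fields are assumed
from kg_services.ontology import MCPPrompt, MCPTool, PlannedStep

def test_planned_step_holds_matching_pair():
    tool = MCPTool(tool_id="t1", name="web_search", description="Searches the web")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Search for {query}")
    step = PlannedStep(tool=tool, prompt=prompt)
    # The pair is consistent and the optional score defaults to None
    assert step.prompt.target_tool_id == step.tool.tool_id
    assert step.relevance_score is None

def test_planned_step_accepts_relevance_score():
    tool = MCPTool(tool_id="t1", name="web_search", description="Searches the web")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Search for {query}")
    step = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert 0.0 <= step.relevance_score <= 1.0
```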
Task 2.2: Refactor SimplePlannerAgent (180 mins)
Files: agents/planner.py, tests/agents/test_planner.py
Objective: Implement combined tool+prompt selection logic
Key Algorithm:
- Tool Selection: Find relevant tools using semantic search
- Prompt Filtering: Get prompts targeting each selected tool
- Prompt Ranking: Semantically rank prompts against user query
- PlannedStep Assembly: Create structured output
Implementation Strategy:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [p for p in self.kg.prompts.values()
                   if p.target_tool_id == tool.tool_id]

        # 4. Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```
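The helper `_select_best_prompt` is referenced above but not specified in this plan. A minimal sketch, assuming each MCPPrompt carries a pre-computed `embedding` vector from the Sprint 1 indexing, is plain cosine similarity:

```python
import numpy as np

# Method of SimplePlannerAgent; a sketch, not the final ranking logic
def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    """Return the prompt most similar to the query, or None if no candidates."""
    if not prompts:
        return None
    query_vec = np.asarray(query_embedding, dtype=float)
    query_norm = np.linalg.norm(query_vec)
    best_prompt, best_score = None, -1.0
    for prompt in prompts:
        prompt_vec = np.asarray(prompt.embedding, dtype=float)  # assumed attribute
        score = float(query_vec @ prompt_vec) / (query_norm * np.linalg.norm(prompt_vec))
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt
```

The winning score would also be a natural value for the optional relevance_score field on PlannedStep.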
Testing Requirements:
- Test no tools found scenario
- Test tool found but no prompts scenario
- Test tool with single prompt selection
- Test tool with multiple prompts - semantic selection
- Test top_k_plans limiting functionality
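A sketch of the first two scenarios, mocking the knowledge graph and embedder. The constructor keyword arguments are assumptions; the real SimplePlannerAgent signature may differ:

```python
# tests/agents/test_planner.py -- sketch; constructor kwargs are assumed
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent

def _make_mocks():
    kg = MagicMock()
    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1] * 1536
    return kg, embedder

def test_no_tools_found_returns_empty_plan():
    kg, embedder = _make_mocks()
    kg.find_similar_tools.return_value = []  # semantic search yields nothing
    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    assert planner.generate_plan("completely unknown task") == []

def test_tool_without_prompts_is_skipped():
    kg, embedder = _make_mocks()
    kg.find_similar_tools.return_value = ["t1"]
    kg.get_tool_by_id.return_value = MagicMock(tool_id="t1")
    kg.prompts = {}  # no prompts target t1, so no PlannedStep is produced
    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    assert planner.generate_plan("search the web") == []
```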
Task 2.3: Update Application Integration (45 mins)
Files: app.py, tests/test_app.py
Objective: Update backend to use new planner method
Changes Required:
- Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
- Handle `PlannedStep` output format (temporary backward compatibility)
- Ensure no UI crashes during transition
Implementation:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}

    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}

    # Temporary: extract tool for display (UI update in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
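Here `format_tool_for_display` carries over from the MVP1 display path. For reference, a minimal sketch; the returned field names are assumptions and should match whatever the current UI already renders:

```python
def format_tool_for_display(tool: MCPTool) -> dict:
    """Flatten an MCPTool into the dict shape the current UI renders."""
    return {
        "tool_id": tool.tool_id,
        "name": tool.name,                # assumed MCPTool field
        "description": tool.description,  # assumed MCPTool field
    }
```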
Task 2.4: Quality Assurance & Deployment (30 mins)
Objective: Ensure code quality and system stability
Checklist:
- Run `just lint` - Code style compliance
- Run `just format` - Automatic formatting
- Run `just type-check` - Type safety validation
- Run `just test` - Full test suite execution
- Manual integration testing
- Update requirements.lock if needed
- Commit and push changes
- Verify CI pipeline success
🔧 Technical Architecture
Data Flow Evolution
User Query
↓
Query Embedding (OpenAI)
↓
Tool Semantic Search (Knowledge Graph)
↓
Prompt Filtering (by target_tool_id)
↓
Prompt Semantic Ranking (vs Query)
↓
PlannedStep Assembly
↓
Structured Output (Tool + Prompt)
New Components Introduced
- PlannedStep Dataclass - Structured output format
- Enhanced Planning Logic - Tool+prompt selection
- Semantic Prompt Ranking - Context-aware prompt selection
- Backward Compatible Interface - Smooth transition support
Integration Points
- Knowledge Graph: Extended prompt search capabilities
- Embedding Service: Dual-purpose tool+prompt ranking
- Application Layer: Updated method signatures and handling
🧪 Testing Strategy
Unit Test Coverage
- PlannedStep Tests: Creation, validation, type safety
- Planner Logic Tests: All selection scenarios and edge cases
- Integration Tests: End-to-end workflow validation
- Error Handling Tests: Graceful failure scenarios
Test Scenarios
- Happy Path: Query → Tool → Prompt → PlannedStep
- No Tools Found: Empty result handling
- Tool Without Prompts: Graceful skipping
- Multiple Prompts: Semantic selection validation
- Edge Cases: Empty queries, API failures (empty-query case sketched below)
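For the empty-query edge case, the expectation can be pinned down as follows; note that degrading to an empty plan rather than raising is an assumption about desired behavior, and the mocked constructor matches the sketch in Task 2.2:

```python
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent

def test_empty_query_yields_empty_plan():
    # Assumption: an empty query degrades to "no plans", not an exception
    kg = MagicMock()
    kg.find_similar_tools.return_value = []
    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.0] * 1536
    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    assert planner.generate_plan("") == []
```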
Manual Testing Checklist
- Application starts successfully with new planner
- Tool suggestions still work (backward compatibility)
- No crashes in UI during tool selection
- Logging shows enhanced planning information
📊 Success Metrics
| Metric | Target | Validation Method |
|---|---|---|
| PlannedStep Creation | ✅ Complete | Unit tests pass |
| Tool+Prompt Selection | ✅ Semantic accuracy | Integration tests |
| Backward Compatibility | ✅ No breaking changes | Manual testing |
| Code Quality | ✅ All checks pass | CI pipeline |
| Test Coverage | ✅ >90% for new code | pytest coverage |
🔗 Sprint Dependencies
Prerequisites (Completed in Sprint 1)
- ✅ MCPPrompt ontology established
- ✅ Knowledge graph extended for prompts
- ✅ Vector indexing for prompt search
- ✅ Initial prompt dataset created
Deliverables for Sprint 3
- ✅ PlannedStep objects ready for UI display
- ✅ Enhanced planner generating structured output
- ✅ Backend integration supporting rich display
- ✅ Test coverage preventing regressions
🚨 Risk Mitigation
Potential Challenges
Semantic Prompt Selection Complexity
- Risk: Overly complex ranking logic
- Mitigation: Start with simple cosine similarity, iterate
Performance with Multiple Prompts
- Risk: Slow response times
- Mitigation: Use pre-computed embeddings, limit candidates
Test Complexity
- Risk: Difficult to mock complex interactions
- Mitigation: Break into smaller, testable units
Backward Compatibility
- Risk: Breaking existing functionality
- Mitigation: Careful interface design, thorough testing
🎯 Sprint 3 Preparation
Ready for Next Sprint
After Sprint 2 completion, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows
Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs
Estimated effort: 3-5 hours
Focus: Backend logic enhancement and structured output