# MVP 2 Sprint 2 - Comprehensive Plan
## Enhanced Planner for Tool+Prompt Pairs
**Date**: 2025-06-08
**Sprint Goal**: Modify `SimplePlannerAgent` to select both a relevant `MCPTool` and a corresponding `MCPPrompt`, returning structured `PlannedStep` objects
**Duration**: 3-5 hours
**Status**: READY TO START
## Sprint 2 Objectives
### Goal Evolution: MVP1 → MVP2 Sprint 2
- **MVP1**: `User Query → Tool Discovery → Tool Suggestion`
- **MVP2 Sprint 2**: `User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion`
### Key Deliverables
1. **PlannedStep Ontology** - New dataclass for structured tool+prompt pairs
2. **Enhanced SimplePlannerAgent** - Semantic tool+prompt selection logic
3. **Updated Application Integration** - Backend support for new planner output
4. **Comprehensive Testing** - Full coverage of the new planning workflow
## Task Breakdown
### Task 2.1: Define PlannedStep Dataclass (60 mins)
**Files**: `kg_services/ontology.py`, `tests/kg_services/test_ontology.py`
**Objective**: Create a structured data representation for planner output
**Implementation**:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Reserved for future ranking use
```
**Testing Requirements**:
- Test PlannedStep creation with valid tool+prompt pairs
- Validate type safety and field access
- Test the optional relevance_score field
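The creation test could look like the following sketch. The `MCPTool`/`MCPPrompt` stand-ins here are minimal stubs for the Sprint 1 ontology classes, whose real field sets are richer:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-ins for the Sprint 1 ontology classes (real fields may differ).
@dataclass
class MCPTool:
    tool_id: str
    name: str

@dataclass
class MCPPrompt:
    prompt_id: str
    target_tool_id: str
    template: str

@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""
    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None  # Future use

def test_planned_step_pairs_tool_and_prompt():
    tool = MCPTool(tool_id="t1", name="search")
    prompt = MCPPrompt(prompt_id="p1", target_tool_id="t1", template="Find {query}")
    step = PlannedStep(tool=tool, prompt=prompt)
    # The pair must be internally consistent: the prompt targets this tool
    assert step.tool.tool_id == step.prompt.target_tool_id
    # The optional score defaults to None
    assert step.relevance_score is None

test_planned_step_pairs_tool_and_prompt()
```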
### Task 2.2: Refactor SimplePlannerAgent (180 mins)
**Files**: `agents/planner.py`, `tests/agents/test_planner.py`
**Objective**: Implement combined tool+prompt selection logic
**Key Algorithm**:
1. **Tool Selection**: Find relevant tools using semantic search
2. **Prompt Filtering**: Get the prompts targeting each selected tool
3. **Prompt Ranking**: Semantically rank those prompts against the user query
4. **PlannedStep Assembly**: Create the structured output
**Implementation Strategy**:
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get the query embedding
    query_embedding = self.embedder.get_embedding(user_query)
    # 2. Find candidate tools
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)
    # 3. For each tool, find and rank its prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        prompts = [p for p in self.kg.prompts.values()
                   if p.target_tool_id == tool.tool_id]
        # 4. Select the best prompt semantically; skip tools with no usable prompt
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))
    return planned_steps[:top_k_plans]
```
**Testing Requirements**:
- Test the no-tools-found scenario
- Test the tool-found-but-no-prompts scenario
- Test a tool with a single prompt
- Test a tool with multiple prompts (semantic selection)
- Test the top_k_plans limiting behaviour
### Task 2.3: Update Application Integration (45 mins)
**Files**: `app.py`, `tests/test_app.py`
**Objective**: Update the backend to use the new planner method
**Changes Required**:
1. Update `handle_find_tools` to call `generate_plan()` instead of `suggest_tools()`
2. Handle the `PlannedStep` output format (temporary backward compatibility)
3. Ensure the UI does not crash during the transition
**Implementation**:
```python
def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}
    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}
    # Temporary: extract just the tool for display (UI update lands in Sprint 3)
    first_plan = planned_steps[0]
    return format_tool_for_display(first_plan.tool)
```
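The handler's three paths (no planner, no plans, happy path) can be exercised with a stubbed agent. This sketch re-declares the handler and a stand-in `format_tool_for_display` so it runs standalone; the real formatter's output shape may differ:

```python
from unittest.mock import MagicMock

planner_agent = None  # normally initialised at app startup

def format_tool_for_display(tool) -> dict:
    # Stand-in for the existing MVP1 formatter (real output shape may differ).
    return {"tool": tool.name}

def handle_find_tools(query: str) -> dict:
    if not planner_agent:
        return {"error": "Planner not available"}
    planned_steps = planner_agent.generate_plan(query, top_k_plans=1)
    if not planned_steps:
        return {"info": f"No actionable plans found for: '{query}'"}
    return format_tool_for_display(planned_steps[0].tool)

# Path 1: planner missing
assert handle_find_tools("x") == {"error": "Planner not available"}

# Path 2: planner returns no plans
planner_agent = MagicMock()
planner_agent.generate_plan.return_value = []
assert "info" in handle_find_tools("draw a chart")

# Path 3: happy path returns the formatted tool
step = MagicMock()
step.tool.name = "plotter"
planner_agent.generate_plan.return_value = [step]
assert handle_find_tools("draw a chart") == {"tool": "plotter"}
```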
### Task 2.4: Quality Assurance & Deployment (30 mins)
**Objective**: Ensure code quality and system stability
**Checklist**:
- [ ] Run `just lint` - Code style compliance
- [ ] Run `just format` - Automatic formatting
- [ ] Run `just type-check` - Type safety validation
- [ ] Run `just test` - Full test suite execution
- [ ] Manual integration testing
- [ ] Update requirements.lock if needed
- [ ] Commit and push changes
- [ ] Verify CI pipeline success
## Technical Architecture
### Data Flow Evolution
```
User Query
    ↓
Query Embedding (OpenAI)
    ↓
Tool Semantic Search (Knowledge Graph)
    ↓
Prompt Filtering (by target_tool_id)
    ↓
Prompt Semantic Ranking (vs Query)
    ↓
PlannedStep Assembly
    ↓
Structured Output (Tool + Prompt)
```
### New Components Introduced
1. **PlannedStep Dataclass** - Structured output format
2. **Enhanced Planning Logic** - Tool+prompt selection
3. **Semantic Prompt Ranking** - Context-aware prompt selection
4. **Backward-Compatible Interface** - Smooth transition support
### Integration Points
- **Knowledge Graph**: Extended prompt search capabilities
- **Embedding Service**: Dual-purpose tool+prompt ranking
- **Application Layer**: Updated method signatures and handling
## Testing Strategy
### Unit Test Coverage
- **PlannedStep Tests**: Creation, validation, type safety
- **Planner Logic Tests**: All selection scenarios and edge cases
- **Integration Tests**: End-to-end workflow validation
- **Error Handling Tests**: Graceful failure scenarios
### Test Scenarios
1. **Happy Path**: Query → Tool → Prompt → PlannedStep
2. **No Tools Found**: Empty result handling
3. **Tool Without Prompts**: Graceful skipping
4. **Multiple Prompts**: Semantic selection validation
5. **Edge Cases**: Empty queries, API failures
### Manual Testing Checklist
- [ ] Application starts successfully with the new planner
- [ ] Tool suggestions still work (backward compatibility)
- [ ] No UI crashes during tool selection
- [ ] Logging shows the enhanced planning information
## Success Metrics
| Metric | Target | Validation Method |
|--------|--------|-------------------|
| PlannedStep Creation | Complete | Unit tests pass |
| Tool+Prompt Selection | Semantic accuracy | Integration tests |
| Backward Compatibility | No breaking changes | Manual testing |
| Code Quality | All checks pass | CI pipeline |
| Test Coverage | >90% for new code | pytest coverage |
## Sprint Dependencies
### Prerequisites (Completed in Sprint 1)
- MCPPrompt ontology established
- Knowledge graph extended for prompts
- Vector indexing for prompt search
- Initial prompt dataset created
### Deliverables for Sprint 3
- PlannedStep objects ready for UI display
- Enhanced planner generating structured output
- Backend integration supporting rich display
- Test coverage preventing regressions
## Risk Mitigation
### Potential Challenges
1. **Semantic Prompt Selection Complexity**
   - *Risk*: Overly complex ranking logic
   - *Mitigation*: Start with simple cosine similarity, then iterate
2. **Performance with Multiple Prompts**
   - *Risk*: Slow response times
   - *Mitigation*: Use pre-computed embeddings and limit the candidate set
3. **Test Complexity**
   - *Risk*: Difficult to mock complex interactions
   - *Mitigation*: Break the logic into smaller, testable units
4. **Backward Compatibility**
   - *Risk*: Breaking existing functionality
   - *Mitigation*: Careful interface design and thorough testing
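For the performance risk above, one cheap mitigation is a wrapper that caches embeddings per text, so each prompt and each repeated query is embedded at most once. A sketch, assuming the `embedder.get_embedding(text)` interface used by the planner (`CachingEmbedder` is a hypothetical name, not an existing class):

```python
class CachingEmbedder:
    """Wraps an embedding client so repeated texts are served from a cache."""

    def __init__(self, client):
        self._client = client
        self._cache: dict[str, list[float]] = {}

    def get_embedding(self, text: str) -> list[float]:
        if text not in self._cache:
            # Only cache misses hit the underlying API
            self._cache[text] = self._client.get_embedding(text)
        return self._cache[text]
```

Because it exposes the same `get_embedding` method, it can be dropped in wherever the planner currently takes an embedder.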
## Sprint 3 Preparation
### Ready for Next Sprint
After Sprint 2 completes, Sprint 3 can focus on:
- UI enhancements to display PlannedStep information
- Rich prompt template display with variables
- Interactive input field generation
- Enhanced user experience for tool+prompt workflows
---
*Plan created for MVP 2 Sprint 2 - Enhanced Planner for Tool+Prompt Pairs*
*Estimated effort: 3-5 hours*
*Focus: Backend logic enhancement and structured output*