# MVP 2 Sprint 2 - Task Summary & Execution Guide
**Date**: 2025-06-08
**Sprint Goal**: Enhanced Planner for Tool+Prompt Pairs
**Status**: πŸš€ **READY FOR EXECUTION**
**Task Management**: Tasks added to `tasks.json` (IDs 26-29)
## 🎯 Sprint Overview
Transform the `SimplePlannerAgent` from suggesting only tools to suggesting **tool+prompt pairs** as structured `PlannedStep` objects, enabling the next evolution toward complete tool+prompt guidance.
### Goal Evolution
- **Current (MVP1)**: `User Query β†’ Tool Discovery β†’ Tool Suggestion`
- **Sprint 2 Target**: `User Query β†’ Tool Discovery β†’ Prompt Selection β†’ (Tool + Prompt) Suggestion`
## πŸ“‹ Task Execution Order
### Task 26: Define PlannedStep Dataclass (60 mins)
**Status**: Todo
**Dependencies**: None
**Priority**: πŸ”΄ **HIGH** (Foundation for all other tasks)
**Execution Command for Claude**:
```
Implement Task 26: Define PlannedStep Dataclass
**Objective**: Create structured data representation for planner output combining MCPTool and MCPPrompt.
**Action 1: Modify `kg_services/ontology.py`**
1. Open @kg_services/ontology.py
2. Add PlannedStep dataclass below existing MCPPrompt class
3. Include fields: tool (MCPTool), prompt (MCPPrompt), relevance_score (Optional[float] = None)
4. Add proper type hints and imports
5. Apply coding standards from @.cursor/rules/python_gradio_basic.mdc
**Action 2: Add Tests in `tests/kg_services/test_ontology.py`**
1. Open @tests/kg_services/test_ontology.py
2. Add test_planned_step_creation() function
3. Test PlannedStep instantiation with valid MCPTool and MCPPrompt
4. Test type safety and field access
5. Test optional relevance_score functionality
Generate the complete implementation.
```
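As a reference for Action 2, here is a minimal pytest sketch of what `test_planned_step_creation()` could look like. The `MCPTool`/`MCPPrompt` constructor fields shown are assumptions and should be aligned with the real dataclasses in `kg_services/ontology.py`:
```python
# Hypothetical test sketch; MCPTool/MCPPrompt constructor fields are assumptions
# and should be aligned with the real dataclasses in kg_services/ontology.py.
from kg_services.ontology import MCPPrompt, MCPTool, PlannedStep


def test_planned_step_creation() -> None:
    tool = MCPTool(
        tool_id="pdf_extractor",
        name="PDF Extractor",
        description="Extracts text from PDF documents",
    )
    prompt = MCPPrompt(
        prompt_id="pdf_summary",
        name="Summarize PDF",
        description="Summarize extracted PDF text",
        use_case="Condense long documents",
        target_tool_id="pdf_extractor",
    )

    # relevance_score is optional and defaults to None
    step = PlannedStep(tool=tool, prompt=prompt)
    assert step.tool.tool_id == "pdf_extractor"
    assert step.prompt.target_tool_id == step.tool.tool_id
    assert step.relevance_score is None

    # The optional score can also be supplied explicitly
    scored = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert scored.relevance_score == 0.87
```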
### Task 27: Refactor SimplePlannerAgent (180 mins)
**Status**: Todo
**Dependencies**: Task 26
**Priority**: πŸ”΄ **HIGH** (Core logic transformation)
**Execution Command for Claude**:
```
Implement Task 27: Refactor SimplePlannerAgent for Tool+Prompt Planning
**Objective**: Implement combined tool+prompt selection logic with semantic ranking.
**Action 1: Modify `agents/planner.py`**
1. Open @agents/planner.py
2. Import PlannedStep from kg_services.ontology
3. Rename suggest_tools method to generate_plan
4. Implement algorithm:
- Tool Selection: Use existing semantic search for tools
- Prompt Filtering: Get prompts by target_tool_id
- Prompt Ranking: Semantic similarity against query
- PlannedStep Assembly: Create structured output
5. Add _select_best_prompt helper method
6. Return List[PlannedStep] instead of List[MCPTool]
**Action 2: Update `tests/agents/test_planner.py`**
1. Update all test methods for new generate_plan signature
2. Mock InMemoryKG prompt methods
3. Test scenarios: no tools, no prompts for tool, single prompt, multiple prompts
4. Verify PlannedStep output structure
Generate the complete refactored implementation.
```
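For Action 2, a hedged sketch of one `generate_plan` test using mocks. The `SimplePlannerAgent` constructor arguments and the mock wiring are assumptions that need to be matched against the real `agents/planner.py`:
```python
# Illustrative sketch only; the SimplePlannerAgent constructor arguments and
# mock wiring are assumptions to be matched against the real agents/planner.py.
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent
from kg_services.ontology import PlannedStep


def test_generate_plan_returns_planned_steps() -> None:
    kg = MagicMock()
    embedder = MagicMock()
    embedder.get_embedding.return_value = [0.1, 0.2, 0.3]

    tool = MagicMock(tool_id="tool_1")
    prompt = MagicMock(target_tool_id="tool_1")
    kg.find_similar_tools.return_value = ["tool_1"]
    kg.get_tool_by_id.return_value = tool
    kg.prompts = {"prompt_1": prompt}  # single matching prompt -> direct selection

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)  # assumed signature
    plans = planner.generate_plan("summarize this document", top_k_plans=1)

    assert len(plans) == 1
    assert isinstance(plans[0], PlannedStep)
    assert plans[0].tool is tool
    assert plans[0].prompt is prompt
```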
### Task 28: Update Application Integration (45 mins)
**Status**: Todo
**Dependencies**: Task 27
**Priority**: 🟑 **MEDIUM** (Integration layer)
**Execution Command for Claude**:
```
Implement Task 28: Update Application Integration for New Planner
**Objective**: Ensure application backend uses enhanced planner without breaking UI.
**Action 1: Modify `app.py`**
1. Open @app.py
2. Update handle_find_tools function:
- Change planner call from suggest_tools to generate_plan
- Handle List[PlannedStep] return type
- Extract tool from PlannedStep for current UI (temporary)
- Add proper error handling for empty results
3. Import PlannedStep if needed
**Action 2: Update `tests/test_app.py`**
1. Update mocked planner method calls
2. Test new generate_plan integration
3. Verify backward compatibility for UI display
Maintain backward compatibility until Sprint 3 UI updates.
```
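A possible shape for the `handle_find_tools` change. The existing function signature, the module-level `planner` instance, and the display string are assumptions about the current `app.py`:
```python
# Hedged sketch of the app.py change; the existing handle_find_tools signature,
# the module-level `planner` instance, and the display string are assumptions.
def handle_find_tools(user_query: str) -> str:
    plans = planner.generate_plan(user_query, top_k_plans=1)
    if not plans:
        return "No suitable tools found for this query."

    # Temporary backward compatibility: surface only the tool from the
    # PlannedStep until the Sprint 3 UI renders the full tool+prompt pair.
    step = plans[0]
    return f"Suggested tool: {step.tool.name} - {step.tool.description}"
```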
### Task 29: Quality Assurance & Deployment (30 mins)
**Status**: Todo
**Dependencies**: Task 28
**Priority**: 🟒 **LOW** (Quality gates)
**Execution Command for Claude**:
```
Implement Task 29: Quality Assurance & Deployment
**Objective**: Ensure code quality, system stability, and deployment readiness.
**Actions**:
1. Run `just lint` and fix any style issues
2. Run `just format` to apply formatting
3. Run `just type-check` and resolve type issues
4. Run `just test` and ensure all tests pass
5. Manual integration testing:
- Verify application starts successfully
- Test tool+prompt planning workflow
- Confirm no UI crashes
6. Update requirements.lock if needed
7. Commit changes with conventional commit format
8. Push and verify CI pipeline
Document any issues found for Sprint 3.
```
## πŸ”§ Technical Implementation Details
### PlannedStep Structure
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool
    prompt: MCPPrompt
    relevance_score: Optional[float] = None
```
### Enhanced Planning Algorithm
```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools (semantic search)
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)

        # Filter prompts for this tool
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```
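An illustrative call, assuming the planner is already wired up elsewhere and that tools expose a `name` field; `relevance_score` prints as `None` because the sketch above does not populate it:
```python
# Assumed wiring; `planner` is constructed elsewhere. relevance_score prints as
# None because the algorithm above does not populate it.
steps = planner.generate_plan("extract tables from a quarterly PDF report", top_k_plans=1)
for step in steps:
    print(f"{step.tool.name} + {step.prompt.name} (score: {step.relevance_score})")
```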
### Semantic Prompt Selection
```python
def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    if not prompts:
        return None
    if len(prompts) == 1:
        return prompts[0]

    best_prompt = None
    best_similarity = -1.0
    for prompt in prompts:
        # Create embedding text from prompt
        prompt_text = f"{prompt.name} - {prompt.description} - {prompt.use_case}"
        prompt_embedding = self.embedder.get_embedding(prompt_text)
        if prompt_embedding:
            similarity = self.kg._cosine_similarity(query_embedding, prompt_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_prompt = prompt

    return best_prompt
```
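The ranking above leans on the KG's existing `_cosine_similarity` helper. Its real implementation may differ; presumably it computes the standard formula, roughly:
```python
# Assumed shape of the existing InMemoryKG._cosine_similarity helper, shown only
# to make the ranking criterion explicit; the real implementation may differ.
import math
from typing import List


def _cosine_similarity(vec_a: List[float], vec_b: List[float]) -> float:
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```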
## πŸ§ͺ Testing Strategy
### Key Test Scenarios
1. **PlannedStep Creation**: Valid instantiation and field access
2. **No Tools Found**: Empty list return from generate_plan
3. **Tool Without Prompts**: Graceful handling and skipping
4. **Single Prompt for Tool**: Direct selection
5. **Multiple Prompts for Tool**: Semantic ranking selection
6. **Application Integration**: Backward compatible UI interaction
### Test Coverage Targets
- **Unit Tests**: >95% coverage for new PlannedStep and planning logic
- **Integration Tests**: End-to-end workflow validation
- **Regression Tests**: Ensure no breaking changes to existing functionality
## πŸ“Š Success Criteria
| Component | Success Metric | Validation |
|-----------|---------------|------------|
| PlannedStep | Dataclass works correctly | Unit tests pass |
| Enhanced Planner | Tool+prompt selection accurate | Integration tests |
| Application | No UI crashes, backward compatible | Manual testing |
| Code Quality | All quality checks pass | CI pipeline |
## πŸ”„ Sprint 3 Preparation
Upon Sprint 2 completion, the system will be ready for Sprint 3, which focuses on:
- **UI Enhancement**: Display rich PlannedStep information
- **Prompt Template Rendering**: Show template strings with variables
- **Interactive Elements**: Dynamic input field generation
- **User Experience**: Enhanced tool+prompt workflow interface
## 🚨 Potential Challenges & Mitigations
1. **Semantic Prompt Selection Complexity**
- *Challenge*: Multiple prompts with similar semantics
- *Mitigation*: Start with simple cosine similarity, add tie-breaking rules
2. **Performance with Prompt Embeddings**
- *Challenge*: Additional API calls for prompt ranking
   - *Mitigation*: Use pre-computed embeddings where possible (see the caching sketch after this list)
3. **Backward Compatibility**
- *Challenge*: UI expects tool-only format
- *Mitigation*: Extract tool from PlannedStep for display
4. **Test Complexity**
- *Challenge*: Mocking complex tool+prompt interactions
- *Mitigation*: Use focused unit tests with clear test data
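Regarding mitigation 2, a hypothetical caching wrapper (not existing code) illustrating how prompt embeddings could be computed once and reused across queries:
```python
# Hypothetical caching wrapper (not existing code): compute each prompt's
# embedding once, keyed by prompt id, and reuse it on later queries.
from typing import Dict, List, Optional


class PromptEmbeddingCache:
    def __init__(self, embedder) -> None:
        self._embedder = embedder
        self._cache: Dict[str, Optional[List[float]]] = {}

    def get(self, prompt_id: str, prompt_text: str) -> Optional[List[float]]:
        if prompt_id not in self._cache:
            self._cache[prompt_id] = self._embedder.get_embedding(prompt_text)
        return self._cache[prompt_id]
```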
---
**Ready for Execution**: All tasks are well-defined with clear objectives, detailed implementation guidance, and comprehensive acceptance criteria. The task dependency chain ensures proper execution order and minimal blocking.
*Sprint 2 Task Summary created for MVP 2 - Enhanced Planner for Tool+Prompt Pairs*