
# MVP 2 Sprint 2 - Task Summary & Execution Guide

**Date**: 2025-06-08
**Sprint Goal**: Enhanced Planner for Tool+Prompt Pairs
**Status**: 🚀 READY FOR EXECUTION
**Task Management**: Tasks added to `tasks.json` (IDs 26-29)

## 🎯 Sprint Overview

Transform the SimplePlannerAgent from suggesting only tools to suggesting tool+prompt pairs as structured PlannedStep objects, enabling the next evolution toward complete tool+prompt guidance.

### Goal Evolution

- **Current (MVP1)**: User Query → Tool Discovery → Tool Suggestion
- **Sprint 2 Target**: User Query → Tool Discovery → Prompt Selection → (Tool + Prompt) Suggestion

## 📋 Task Execution Order

### Task 26: Define PlannedStep Dataclass (60 mins)

**Status**: Todo
**Dependencies**: None
**Priority**: 🔴 HIGH (Foundation for all other tasks)

**Execution Command for Claude:**

```
Implement Task 26: Define PlannedStep Dataclass

**Objective**: Create structured data representation for planner output combining MCPTool and MCPPrompt.

**Action 1: Modify `kg_services/ontology.py`**
1. Open @kg_services/ontology.py  
2. Add PlannedStep dataclass below existing MCPPrompt class
3. Include fields: tool (MCPTool), prompt (MCPPrompt), relevance_score (Optional[float] = None)
4. Add proper type hints and imports
5. Apply coding standards from @.cursor/rules/python_gradio_basic.mdc

**Action 2: Add Tests in `tests/kg_services/test_ontology.py`**  
1. Open @tests/kg_services/test_ontology.py
2. Add test_planned_step_creation() function
3. Test PlannedStep instantiation with valid MCPTool and MCPPrompt
4. Test type safety and field access
5. Test optional relevance_score functionality

Generate the complete implementation.
```
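For reference, here is a minimal sketch of what Action 2's test might look like, assuming `MCPTool` and `MCPPrompt` expose the fields used elsewhere in this document (`tool_id`, `name`, `description`, `use_case`, `target_tool_id`); the actual constructors in `kg_services/ontology.py` may differ:

```python
# tests/kg_services/test_ontology.py -- illustrative sketch; the real MCPTool
# and MCPPrompt constructors may take different fields than assumed here.
from kg_services.ontology import MCPPrompt, MCPTool, PlannedStep


def test_planned_step_creation() -> None:
    # Hypothetical fixture data; constructor fields are assumptions.
    tool = MCPTool(tool_id="t1", name="CSV Analyzer", description="Analyzes CSV files")
    prompt = MCPPrompt(
        name="Summarize CSV",
        description="Summarize the contents of a CSV file",
        use_case="quick data overview",
        target_tool_id="t1",
    )

    step = PlannedStep(tool=tool, prompt=prompt)

    # Field access and the relevance_score default.
    assert step.tool is tool
    assert step.prompt is prompt
    assert step.relevance_score is None

    # relevance_score can also be supplied explicitly.
    scored = PlannedStep(tool=tool, prompt=prompt, relevance_score=0.87)
    assert scored.relevance_score == 0.87
```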

### Task 27: Refactor SimplePlannerAgent (180 mins)

**Status**: Todo
**Dependencies**: Task 26
**Priority**: 🔴 HIGH (Core logic transformation)

**Execution Command for Claude:**

```
Implement Task 27: Refactor SimplePlannerAgent for Tool+Prompt Planning

**Objective**: Implement combined tool+prompt selection logic with semantic ranking.

**Action 1: Modify `agents/planner.py`**
1. Open @agents/planner.py
2. Import PlannedStep from kg_services.ontology
3. Rename suggest_tools method to generate_plan
4. Implement algorithm:
   - Tool Selection: Use existing semantic search for tools
   - Prompt Filtering: Get prompts by target_tool_id
   - Prompt Ranking: Semantic similarity against query
   - PlannedStep Assembly: Create structured output
5. Add _select_best_prompt helper method
6. Return List[PlannedStep] instead of List[MCPTool]

**Action 2: Update `tests/agents/test_planner.py`**
1. Update all test methods for new generate_plan signature
2. Mock InMemoryKG prompt methods
3. Test scenarios: no tools, no prompts for tool, single prompt, multiple prompts
4. Verify PlannedStep output structure

Generate the complete refactored implementation.
```
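One possible shape for the updated tests, assuming `SimplePlannerAgent` accepts `kg` and `embedder` collaborators that can be mocked (constructor and method names are assumptions based on this summary):

```python
# tests/agents/test_planner.py -- sketch only; the SimplePlannerAgent
# constructor and InMemoryKG method names are assumptions.
from unittest.mock import MagicMock

from agents.planner import SimplePlannerAgent


def test_generate_plan_skips_tools_without_prompts() -> None:
    kg = MagicMock()
    embedder = MagicMock()

    embedder.get_embedding.return_value = [0.1, 0.2, 0.3]
    kg.find_similar_tools.return_value = ["t1"]
    kg.get_tool_by_id.return_value = MagicMock(tool_id="t1")
    kg.prompts = {}  # no prompts registered for any tool

    planner = SimplePlannerAgent(kg=kg, embedder=embedder)
    plans = planner.generate_plan("analyze my csv file")

    # A tool with no matching prompts is skipped, so no plans are produced.
    assert plans == []
```

The remaining scenarios (no tools, single prompt, multiple prompts) follow the same pattern, with `kg.prompts` populated and similarity scores stubbed as needed.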

### Task 28: Update Application Integration (45 mins)

**Status**: Todo
**Dependencies**: Task 27
**Priority**: 🟡 MEDIUM (Integration layer)

**Execution Command for Claude:**

```
Implement Task 28: Update Application Integration for New Planner

**Objective**: Ensure application backend uses enhanced planner without breaking UI.

**Action 1: Modify `app.py`**
1. Open @app.py
2. Update handle_find_tools function:
   - Change planner call from suggest_tools to generate_plan
   - Handle List[PlannedStep] return type
   - Extract tool from PlannedStep for current UI (temporary)
   - Add proper error handling for empty results
3. Import PlannedStep if needed

**Action 2: Update `tests/test_app.py`**
1. Update mocked planner method calls
2. Test new generate_plan integration
3. Verify backward compatibility for UI display

Maintain backward compatibility until Sprint 3 UI updates.
```
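A minimal sketch of the backward-compatible handler, assuming a module-level `planner` instance and a plain-text UI output (the existing `handle_find_tools` body is not shown in this summary, so details are assumptions):

```python
# app.py -- sketch of the adapted handler; the module-level `planner`
# instance and the display format are assumptions based on this summary.
from typing import List

from kg_services.ontology import PlannedStep


def handle_find_tools(user_query: str) -> str:
    planned_steps: List[PlannedStep] = planner.generate_plan(user_query)

    if not planned_steps:
        return "No matching tools found. Try rephrasing your query."

    # Temporary: the current UI displays tools only, so extract the tool from
    # each PlannedStep until the Sprint 3 UI can render prompts as well.
    return "\n".join(
        f"{step.tool.name}: {step.tool.description}" for step in planned_steps
    )
```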

### Task 29: Quality Assurance & Deployment (30 mins)

**Status**: Todo
**Dependencies**: Task 28
**Priority**: 🟢 LOW (Quality gates)

**Execution Command for Claude:**

```
Implement Task 29: Quality Assurance & Deployment

**Objective**: Ensure code quality, system stability, and deployment readiness.

**Actions**:
1. Run `just lint` and fix any style issues
2. Run `just format` to apply formatting
3. Run `just type-check` and resolve type issues
4. Run `just test` and ensure all tests pass
5. Manual integration testing:
   - Verify application starts successfully
   - Test tool+prompt planning workflow
   - Confirm no UI crashes
6. Update requirements.lock if needed
7. Commit changes with conventional commit format
8. Push and verify CI pipeline

Document any issues found for Sprint 3.
```
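For step 7, a conventional commit message for this sprint might look like `feat(planner): suggest tool+prompt pairs as PlannedStep objects` (illustrative only).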

## 🔧 Technical Implementation Details

### PlannedStep Structure

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedStep:
    """Represents a planned step combining a tool and its prompt."""

    tool: MCPTool  # MCPTool and MCPPrompt are defined earlier in ontology.py
    prompt: MCPPrompt
    relevance_score: Optional[float] = None
```

### Enhanced Planning Algorithm

```python
def generate_plan(self, user_query: str, top_k_plans: int = 1) -> List[PlannedStep]:
    # 1. Get query embedding
    query_embedding = self.embedder.get_embedding(user_query)

    # 2. Find candidate tools (semantic search)
    tool_ids = self.kg.find_similar_tools(query_embedding, top_k=3)

    # 3. For each tool, find and rank prompts
    planned_steps = []
    for tool_id in tool_ids:
        tool = self.kg.get_tool_by_id(tool_id)
        if tool is None:
            continue  # skip stale or unknown tool IDs

        # Filter prompts for this tool
        prompts = [
            p for p in self.kg.prompts.values()
            if p.target_tool_id == tool.tool_id
        ]

        # Select best prompt semantically
        best_prompt = self._select_best_prompt(prompts, query_embedding)

        # Tools without a usable prompt are skipped entirely
        if best_prompt:
            planned_steps.append(PlannedStep(tool=tool, prompt=best_prompt))

    return planned_steps[:top_k_plans]
```

### Semantic Prompt Selection

```python
def _select_best_prompt(
    self, prompts: List[MCPPrompt], query_embedding: List[float]
) -> Optional[MCPPrompt]:
    if not prompts:
        return None
    if len(prompts) == 1:
        return prompts[0]

    best_prompt = None
    best_similarity = -1.0

    for prompt in prompts:
        # Create embedding text from prompt
        prompt_text = f"{prompt.name} - {prompt.description} - {prompt.use_case}"
        prompt_embedding = self.embedder.get_embedding(prompt_text)

        if prompt_embedding:
            similarity = self.kg._cosine_similarity(query_embedding, prompt_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_prompt = prompt

    return best_prompt
```
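The `_cosine_similarity` helper used above is not shown in this summary; a typical pure-Python implementation on the knowledge graph class might look like the following sketch (not necessarily the project's actual code):

```python
import math
from typing import List


def _cosine_similarity(self, vec_a: List[float], vec_b: List[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors, in [-1.0, 1.0]."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embeddings carry no directional information
    return dot / (norm_a * norm_b)
```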

## 🧪 Testing Strategy

### Key Test Scenarios

1. **PlannedStep Creation**: Valid instantiation and field access
2. **No Tools Found**: generate_plan returns an empty list
3. **Tool Without Prompts**: Graceful handling and skipping
4. **Single Prompt for Tool**: Direct selection
5. **Multiple Prompts for Tool**: Semantic ranking selection
6. **Application Integration**: Backward-compatible UI interaction

### Test Coverage Targets

- **Unit Tests**: >95% coverage for new PlannedStep and planning logic
- **Integration Tests**: End-to-end workflow validation
- **Regression Tests**: Ensure no breaking changes to existing functionality

## 📊 Success Criteria

| Component | Success Metric | Validation |
|-----------|----------------|------------|
| PlannedStep | Dataclass works correctly | Unit tests pass |
| Enhanced Planner | Tool+prompt selection accurate | Integration tests |
| Application | No UI crashes, backward compatible | Manual testing |
| Code Quality | All quality checks pass | CI pipeline |

## 🔄 Sprint 3 Preparation

Upon Sprint 2 completion, the system will be ready for Sprint 3, which focuses on:

- **UI Enhancement**: Display rich PlannedStep information
- **Prompt Template Rendering**: Show template strings with variables
- **Interactive Elements**: Dynamic input field generation
- **User Experience**: Enhanced tool+prompt workflow interface

## 🚨 Potential Challenges & Mitigations

1. **Semantic Prompt Selection Complexity**
   - **Challenge**: Multiple prompts with similar semantics
   - **Mitigation**: Start with simple cosine similarity, add tie-breaking rules
2. **Performance with Prompt Embeddings**
   - **Challenge**: Additional API calls for prompt ranking
   - **Mitigation**: Use pre-computed embeddings where possible (see the caching sketch after this list)
3. **Backward Compatibility**
   - **Challenge**: UI expects tool-only format
   - **Mitigation**: Extract tool from PlannedStep for display
4. **Test Complexity**
   - **Challenge**: Mocking complex tool+prompt interactions
   - **Mitigation**: Use focused unit tests with clear test data
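To illustrate mitigation 2, here is a minimal sketch of an embedding cache that wraps the embedder, assuming it exposes the `get_embedding(text)` method used throughout this document:

```python
from typing import Dict, List, Optional


class CachedEmbedder:
    """Wraps an embedder and memoizes results to avoid repeat API calls.

    Sketch only: assumes the wrapped embedder exposes get_embedding(text),
    as the planner code in this document does.
    """

    def __init__(self, embedder) -> None:
        self._embedder = embedder
        self._cache: Dict[str, Optional[List[float]]] = {}

    def get_embedding(self, text: str) -> Optional[List[float]]:
        if text not in self._cache:
            self._cache[text] = self._embedder.get_embedding(text)
        return self._cache[text]
```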

**Ready for Execution**: All tasks are well-defined with clear objectives, detailed implementation guidance, and comprehensive acceptance criteria. The task dependency chain ensures proper execution order and minimal blocking.

*Sprint 2 Task Summary created for MVP 2 - Enhanced Planner for Tool+Prompt Pairs*