# Sprint 2 (MVP 1): Real Embeddings & Semantic Search Logic

## Sprint Overview

- **Goal:** Integrate actual LLM calls for generating embeddings, build the vector index within the `InMemoryKG`, and implement the core semantic search functionality. *Still no UI; the focus is on backend KG capabilities.*
- **Duration:** Estimated 3-5 hours (flexible within Hackathon Day 1, following Sprint 1).
- **Core Primitives Focused On:** Tool (its description being embedded and searched).
- **Key Artifacts by End of Sprint:**
  - `kg_services/embedder.py`: `EmbeddingService.get_embedding` method now makes live API calls.
  - `kg_services/knowledge_graph.py`: `InMemoryKG.build_vector_index` now uses real embeddings, and `InMemoryKG.find_similar_tools` performs actual cosine similarity search.
  - Updated unit tests, including tests that mock the LLM API calls.
  - Updated `requirements.txt` (if new LLM client libraries were added) and `requirements.lock`.
  - All code linted, formatted, type-checked, and passing CI.

## Task List

### Task 2.1: Implement Live LLM API Call in `EmbeddingService`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/embedder.py`'s `EmbeddingService.get_embedding` method to make actual API calls to your chosen LLM provider (OpenAI, Anthropic, or Azure OpenAI) to generate text embeddings.
  - Ensure API keys are handled securely via environment variables (e.g., loaded using `python-dotenv` for local dev, and set as secrets in GitHub Actions/Hugging Face Spaces).
  - Add the necessary LLM client libraries (e.g., `openai`, `anthropic`) to `requirements.txt` if not already there.
- **Acceptance Criteria:**
  1. The `get_embedding` method successfully calls the chosen LLM API and returns a valid embedding vector (a list of floats).
  2. Handles potential API errors gracefully (e.g., logs an error and returns `None`, or raises a custom exception).
  3. `requirements.txt` is updated with the LLM client library.
  4. Unit tests (with API mocking) pass.
- **TDD Approach:** In `tests/kg_services/test_embedder.py`, refactor/add tests (a sketch of this mocking pattern follows this task):
  - `test_get_embedding_live_success`: Mocks the LLM client's `create` (or equivalent) method to return a sample successful embedding response. Verifies the method processes this correctly.
  - `test_get_embedding_api_error`: Mocks the LLM client to raise an API error. Verifies `get_embedding` handles this gracefully.
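A minimal sketch of these two tests, assuming the OpenAI SDK (`openai>=1.0`) and an `EmbeddingService` shaped like the one in the Implementation Guidance below (i.e., holding the SDK client in a `client` attribute). Those names and the test bodies are assumptions, not a prescribed API; adjust them to your actual provider and code:

```python
# tests/kg_services/test_embedder.py (sketch; attribute names are assumptions)
from unittest.mock import MagicMock

import pytest
from openai import OpenAIError

from kg_services.embedder import EmbeddingService


def test_get_embedding_live_success(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")  # satisfy client construction
    service = EmbeddingService()
    # Canned response shaped like the SDK's: response.data[0].embedding.
    fake_response = MagicMock()
    fake_response.data = [MagicMock(embedding=[0.1, 0.2, 0.3])]
    service.client = MagicMock()
    service.client.embeddings.create.return_value = fake_response

    assert service.get_embedding("summarize text") == [0.1, 0.2, 0.3]
    service.client.embeddings.create.assert_called_once()


def test_get_embedding_api_error(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    service = EmbeddingService()
    service.client = MagicMock()
    service.client.embeddings.create.side_effect = OpenAIError("boom")

    # The service should log the failure and signal it with None.
    assert service.get_embedding("summarize text") is None
```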
### Task 2.2: Implement Real Vector Index Building in `InMemoryKG`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.build_vector_index` method.
  - It should now iterate through the loaded `self.tools`.
  - For each tool, construct a descriptive text string (e.g., from `name`, `description`, `tags`).
  - Use the (now live) `EmbeddingService` instance to get a real embedding for this text.
  - Store these real embeddings in `self.tool_embeddings` and the corresponding `tool_id`s in `self.tool_ids_for_vectors`.
  - Handle cases where `get_embedding` might return `None` (e.g., skip that tool, or use a zero vector with a warning).
- **Acceptance Criteria:**
  1. `build_vector_index` populates `self.tool_embeddings` with actual vectors from the LLM API.
  2. Correctly associates embeddings with `tool_id`s.
  3. Handles potential embedding failures for individual tools.
  4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py` (a combined sketch for this task and Task 2.3 follows Task 2.3):
  - `test_build_vector_index_with_real_embeddings`:
    - Needs a mock `EmbeddingService` that returns predictable (but distinct) vectors for different inputs.
    - Load sample tools into `InMemoryKG`.
    - Call `build_vector_index` with the mock embedder.
    - Assert that `self.tool_embeddings` contains the expected number of vectors and that they match what the mock embedder would have returned.
    - Assert `self.tool_ids_for_vectors` is populated correctly.
  - `test_build_vector_index_handles_embedding_failure`:
    - Mock `EmbeddingService.get_embedding` to return `None` for one of the tools.
    - Assert that the index is built for the other tools and the failed one is handled (e.g., skipped or given a zero vector).

### Task 2.3: Implement Cosine Similarity Search in `InMemoryKG`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.find_similar_tools` method.
  - It should now use `numpy` to perform cosine similarity calculations between the input `query_embedding` and each of the real embeddings stored in `self.tool_embeddings`.
  - Return the `tool_id`s of the `top_k` most similar tools.
  - Ensure `numpy` is in `requirements.txt`.
- **Acceptance Criteria:**
  1. `find_similar_tools` correctly calculates cosine similarities and returns the `top_k` tool IDs.
  2. Handles the empty `self.tool_embeddings` case.
  3. `numpy` is listed in `requirements.txt`.
  4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py` (see the sketch after this task):
  - `test_cosine_similarity_calculation` (if `_cosine_similarity` is a helper, test it directly with known vectors).
  - `test_find_similar_tools_with_populated_index`:
    - Manually set `kg.tool_embeddings` and `kg.tool_ids_for_vectors` with a few known vectors and IDs.
    - Provide a `query_embedding` that is known to be most similar to one of them.
    - Call `find_similar_tools` and assert that the correct `tool_id`(s) are returned in the correct order.
  - `test_find_similar_tools_empty_index`: Assert it returns an empty list.
  - `test_find_similar_tools_top_k_respected`: Test with different `top_k` values.
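A minimal sketch of the index and search tests above, assuming `InMemoryKG()` takes no constructor arguments and exposes `tools`, `tool_embeddings`, and `tool_ids_for_vectors` as plain attributes (all assumptions; adjust to your actual interface):

```python
# tests/kg_services/test_knowledge_graph.py (sketch; names are assumptions)
from types import SimpleNamespace
from unittest.mock import MagicMock

from kg_services.knowledge_graph import InMemoryKG


def _tool(name: str, description: str, tags: list) -> SimpleNamespace:
    # Lightweight stand-in exposing only the fields used to build index text.
    return SimpleNamespace(name=name, description=description, tags=tags)


def test_build_vector_index_with_mock_embedder() -> None:
    kg = InMemoryKG()
    kg.tools = {
        "tool_a": _tool("Summarizer", "Condenses long text", ["nlp"]),
        "tool_b": _tool("Plotter", "Draws charts from data", ["viz"]),
    }
    embedder = MagicMock()
    # Predictable, distinct vectors keep the assertions simple.
    embedder.get_embedding.side_effect = [[1.0, 0.0], [0.0, 1.0]]

    kg.build_vector_index(embedder)

    assert kg.tool_embeddings == [[1.0, 0.0], [0.0, 1.0]]
    assert kg.tool_ids_for_vectors == ["tool_a", "tool_b"]


def test_find_similar_tools_with_populated_index() -> None:
    kg = InMemoryKG()
    kg.tool_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    kg.tool_ids_for_vectors = ["tool_a", "tool_b"]

    # [0.9, 0.1] points almost exactly along tool_a's vector.
    assert kg.find_similar_tools([0.9, 0.1], top_k=1) == ["tool_a"]
    assert kg.find_similar_tools([0.9, 0.1], top_k=2) == ["tool_a", "tool_b"]


def test_find_similar_tools_empty_index() -> None:
    assert InMemoryKG().find_similar_tools([1.0, 0.0]) == []
```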
### Task 2.4: Update Dependencies & Run All Checks

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:**
  1. Ensure `requirements.txt` includes `openai` (or `anthropic`) and `numpy`.
  2. Ensure `requirements-dev.txt` includes `python-dotenv`. (`unittest.mock` is part of the standard library, so it needs no entry; only an external mocking library would.)
  3. Regenerate `requirements.lock`: `uv pip compile requirements.txt requirements-dev.txt --all-extras -o requirements.lock`.
  4. Run `just install` (or `uv pip sync requirements.lock`).
  5. Run `just lint`, `just format`, `just type-check`, `just test`.
  6. Commit all changes.
  7. Push to GitHub and verify the CI pipeline passes.

  *Note: Live API calls in CI tests are usually avoided. Ensure your tests for `EmbeddingService` use mocks; the `build_vector_index` tests should also use a mocked embedder.*
- **Acceptance Criteria:**
  1. `requirements.lock` is updated.
  2. All `just` checks pass locally.
  3. Code is committed and pushed.
  4. The GitHub Actions CI pipeline passes for the sprint's commits (leveraging mocks for API calls).

## End of Sprint 2 Review

- **What's Done:**
  - `EmbeddingService` can now generate real embeddings using an LLM API.
  - `InMemoryKG` can build a vector index using these real embeddings.
  - `InMemoryKG` can perform semantic search (cosine similarity) over the indexed tools.
  - Unit tests cover the new functionality, using mocks for external API calls.
  - The backend logic for tool suggestion based on semantic similarity is complete.
- **What's Next (Sprint 3):**
  - Implement the `SimplePlannerAgent` logic that ties together the `EmbeddingService` and `InMemoryKG` to process a user query and suggest tools.
- **Blockers/Issues:**
  - API key setup and management (ensure it's smooth for local dev and that CI doesn't expose keys).
  - Potential rate limits or costs if generating many embeddings for testing (though for 3-5 tools, this should be minimal).

## Implementation Guidance

### Task 2.1 Implementation Details

A minimal sketch of the refactored service, assuming the OpenAI SDK (`openai>=1.0`) and its `text-embedding-3-small` model; swap in the `anthropic` or `AzureOpenAI` client as needed:

```python
# In kg_services/embedder.py
import logging
from typing import List, Optional

from dotenv import load_dotenv
from openai import OpenAI, OpenAIError

load_dotenv()  # load OPENAI_API_KEY from a local .env file, if one exists
logger = logging.getLogger(__name__)


class EmbeddingService:
    def __init__(self, model: str = "text-embedding-3-small") -> None:
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def get_embedding(self, text: str) -> Optional[List[float]]:
        try:
            response = self.client.embeddings.create(model=self.model, input=text)
            return response.data[0].embedding
        except OpenAIError as exc:
            logger.error("Embedding request failed: %s", exc)
            return None
```

### Task 2.2 Implementation Details

A sketch of the rebuilt index method; this variant skips tools whose embedding failed (with a warning) rather than inserting a zero vector:

```python
# In kg_services/knowledge_graph.py
import logging

from kg_services.embedder import EmbeddingService

logger = logging.getLogger(__name__)


class InMemoryKG:
    ...  # existing attributes from Sprint 1: tools, tool_embeddings, tool_ids_for_vectors

    def build_vector_index(self, embedder: EmbeddingService) -> None:
        # Rebuild from scratch so repeated calls stay consistent.
        self.tool_embeddings = []
        self.tool_ids_for_vectors = []
        for tool_id, tool in self.tools.items():
            text_to_embed = f"{tool.name} - {tool.description} Tags: {', '.join(tool.tags)}"
            embedding = embedder.get_embedding(text_to_embed)
            if embedding:
                self.tool_embeddings.append(embedding)
                self.tool_ids_for_vectors.append(tool_id)
            else:
                # Alternative: append a zero vector and keep the tool_id.
                logger.warning("No embedding for tool %s; skipping it.", tool_id)
```

### Task 2.3 Implementation Details

A sketch of the similarity helper and search method, using `numpy` for the dot product and norms:

```python
# In kg_services/knowledge_graph.py (continuing the InMemoryKG class)
from typing import List

import numpy as np


class InMemoryKG:
    ...

    def _cosine_similarity(self, vec1: List[float], vec2: List[float]) -> float:
        a, b = np.asarray(vec1), np.asarray(vec2)
        norm_product = np.linalg.norm(a) * np.linalg.norm(b)
        if norm_product == 0.0:
            return 0.0  # a zero-norm vector has no meaningful direction
        return float(np.dot(a, b) / norm_product)

    def find_similar_tools(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
        if not self.tool_embeddings or not query_embedding:
            return []
        # Score each indexed tool against the query, then keep the best top_k.
        scored = [
            (self._cosine_similarity(query_embedding, emb), tool_id)
            for emb, tool_id in zip(self.tool_embeddings, self.tool_ids_for_vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool_id for _, tool_id in scored[:top_k]]
```
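Finally, a throwaway smoke-test script can exercise the whole loop locally. This is a hypothetical helper, not one of the sprint's artifacts, and the sample tool data is a stand-in for however `InMemoryKG` is actually populated; it makes live API calls, so keep it out of CI:

```python
# scripts/demo_semantic_search.py (hypothetical; requires a real OPENAI_API_KEY)
from types import SimpleNamespace

from kg_services.embedder import EmbeddingService
from kg_services.knowledge_graph import InMemoryKG

kg = InMemoryKG()
# Stand-in for the tools loaded in Sprint 1; only name, description, and tags
# are needed to build the index text.
kg.tools = {
    "summarizer": SimpleNamespace(
        name="Summarizer", description="Condenses long text", tags=["nlp"]
    ),
    "plotter": SimpleNamespace(
        name="Plotter", description="Draws charts from data", tags=["viz"]
    ),
}

embedder = EmbeddingService()
kg.build_vector_index(embedder)  # two live embedding calls

query = embedder.get_embedding("shorten this article for me")
if query is not None:
    print(kg.find_similar_tools(query, top_k=1))  # expected: ['summarizer']
```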