# Sprint 2 (MVP 1): Real Embeddings & Semantic Search Logic

## Sprint Overview

- **Goal:** Integrate actual LLM calls for generating embeddings, build the vector index within the `InMemoryKG`, and implement the core semantic search functionality. *Still no UI; the focus is on backend KG capabilities.*
- **Duration:** Estimated 3-5 hours (flexible within Hackathon Day 1, following Sprint 1).
- **Core Primitives Focused On:** Tool (its description being embedded and searched).
- **Key Artifacts by End of Sprint:**
  - `kg_services/embedder.py`: `EmbeddingService.get_embedding` method now makes live API calls.
  - `kg_services/knowledge_graph.py`: `InMemoryKG.build_vector_index` now uses real embeddings, and `InMemoryKG.find_similar_tools` performs actual cosine similarity search.
  - Updated unit tests, including tests that mock the LLM API calls.
  - Updated `requirements.txt` (if new LLM client libraries were added) and `requirements.lock`.
  - All code linted, formatted, type-checked, and passing CI.

## Task List

### Task 2.1: Implement Live LLM API Call in `EmbeddingService`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/embedder.py`'s `EmbeddingService.get_embedding` method to make actual API calls to your chosen LLM provider (OpenAI, Anthropic, or Azure OpenAI) to generate text embeddings.
  - Ensure API keys are handled securely via environment variables (e.g., loaded using `python-dotenv` for local dev, and set as secrets in GitHub Actions/Hugging Face Spaces).
  - Add the necessary LLM client libraries (e.g., `openai`, `anthropic`) to `requirements.txt` if not already there.
- **Acceptance Criteria:**
  1. The `get_embedding` method successfully calls the chosen LLM API and returns a valid embedding vector (a list of floats).
  2. Handles potential API errors gracefully (e.g., logs an error and returns `None`, or raises a custom exception).
  3. `requirements.txt` is updated with the LLM client library.
  4. Unit tests (with API mocking) pass.
- **TDD Approach:** In `tests/kg_services/test_embedder.py`, refactor/add tests (a sketch of this mocking pattern follows this task):
  - `test_get_embedding_live_success`: Mocks the LLM client's `create` (or equivalent) method to return a sample successful embedding response. Verifies the method processes this correctly.
  - `test_get_embedding_api_error`: Mocks the LLM client to raise an API error. Verifies `get_embedding` handles this gracefully.
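A minimal sketch of these two tests, assuming the OpenAI SDK (`openai>=1.0`) and an `EmbeddingService` shaped like the one in the Implementation Guidance below (i.e., holding the SDK client in a `client` attribute). Those names and the test bodies are assumptions, not a prescribed API; adjust them to your actual provider and code:

```python
# tests/kg_services/test_embedder.py (sketch; attribute names are assumptions)
from unittest.mock import MagicMock

import pytest
from openai import OpenAIError

from kg_services.embedder import EmbeddingService


def test_get_embedding_live_success(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")  # satisfy client construction
    service = EmbeddingService()
    # Canned response shaped like the SDK's: response.data[0].embedding.
    fake_response = MagicMock()
    fake_response.data = [MagicMock(embedding=[0.1, 0.2, 0.3])]
    service.client = MagicMock()
    service.client.embeddings.create.return_value = fake_response

    assert service.get_embedding("summarize text") == [0.1, 0.2, 0.3]
    service.client.embeddings.create.assert_called_once()


def test_get_embedding_api_error(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    service = EmbeddingService()
    service.client = MagicMock()
    service.client.embeddings.create.side_effect = OpenAIError("boom")

    # The service should log the failure and signal it with None.
    assert service.get_embedding("summarize text") is None
```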
### Task 2.2: Implement Real Vector Index Building in `InMemoryKG`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.build_vector_index` method.
  - It should now iterate through the loaded `self.tools`.
  - For each tool, construct a descriptive text string (e.g., from `name`, `description`, `tags`).
  - Use the (now live) `EmbeddingService` instance to get a real embedding for this text.
  - Store these real embeddings in `self.tool_embeddings` and the corresponding `tool_id`s in `self.tool_ids_for_vectors`.
  - Handle cases where `get_embedding` might return `None` (e.g., skip that tool, or use a zero vector with a warning).
- **Acceptance Criteria:**
  1. `build_vector_index` populates `self.tool_embeddings` with actual vectors from the LLM API.
  2. Correctly associates embeddings with `tool_id`s.
  3. Handles potential embedding failures for individual tools.
  4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py` (a combined sketch for this task and Task 2.3 follows Task 2.3):
  - `test_build_vector_index_with_real_embeddings`:
    - Needs a mock `EmbeddingService` that returns predictable (but distinct) vectors for different inputs.
    - Load sample tools into `InMemoryKG`.
    - Call `build_vector_index` with the mock embedder.
    - Assert that `self.tool_embeddings` contains the expected number of vectors and that they match what the mock embedder would have returned.
    - Assert `self.tool_ids_for_vectors` is populated correctly.
  - `test_build_vector_index_handles_embedding_failure`:
    - Mock `EmbeddingService.get_embedding` to return `None` for one of the tools.
    - Assert that the index is built for the other tools and the failed one is handled (e.g., skipped or given a zero vector).

### Task 2.3: Implement Cosine Similarity Search in `InMemoryKG`

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.find_similar_tools` method.
  - It should now use `numpy` to perform cosine similarity calculations between the input `query_embedding` and each of the real embeddings stored in `self.tool_embeddings`.
  - Return the `tool_id`s of the `top_k` most similar tools.
  - Ensure `numpy` is in `requirements.txt`.
- **Acceptance Criteria:**
  1. `find_similar_tools` correctly calculates cosine similarities and returns the `top_k` tool IDs.
  2. Handles the empty `self.tool_embeddings` case.
  3. `numpy` is listed in `requirements.txt`.
  4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py` (see the sketch after this task):
  - `test_cosine_similarity_calculation` (if `_cosine_similarity` is a helper, test it directly with known vectors).
  - `test_find_similar_tools_with_populated_index`:
    - Manually set `kg.tool_embeddings` and `kg.tool_ids_for_vectors` with a few known vectors and IDs.
    - Provide a `query_embedding` that is known to be most similar to one of them.
    - Call `find_similar_tools` and assert that the correct `tool_id`(s) are returned in the correct order.
  - `test_find_similar_tools_empty_index`: Assert it returns an empty list.
  - `test_find_similar_tools_top_k_respected`: Test with different `top_k` values.
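A minimal sketch of the index and search tests above, assuming `InMemoryKG()` takes no constructor arguments and exposes `tools`, `tool_embeddings`, and `tool_ids_for_vectors` as plain attributes (all assumptions; adjust to your actual interface):

```python
# tests/kg_services/test_knowledge_graph.py (sketch; names are assumptions)
from types import SimpleNamespace
from unittest.mock import MagicMock

from kg_services.knowledge_graph import InMemoryKG


def _tool(name: str, description: str, tags: list) -> SimpleNamespace:
    # Lightweight stand-in exposing only the fields used to build index text.
    return SimpleNamespace(name=name, description=description, tags=tags)


def test_build_vector_index_with_mock_embedder() -> None:
    kg = InMemoryKG()
    kg.tools = {
        "tool_a": _tool("Summarizer", "Condenses long text", ["nlp"]),
        "tool_b": _tool("Plotter", "Draws charts from data", ["viz"]),
    }
    embedder = MagicMock()
    # Predictable, distinct vectors keep the assertions simple.
    embedder.get_embedding.side_effect = [[1.0, 0.0], [0.0, 1.0]]

    kg.build_vector_index(embedder)

    assert kg.tool_embeddings == [[1.0, 0.0], [0.0, 1.0]]
    assert kg.tool_ids_for_vectors == ["tool_a", "tool_b"]


def test_find_similar_tools_with_populated_index() -> None:
    kg = InMemoryKG()
    kg.tool_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    kg.tool_ids_for_vectors = ["tool_a", "tool_b"]

    # [0.9, 0.1] points almost exactly along tool_a's vector.
    assert kg.find_similar_tools([0.9, 0.1], top_k=1) == ["tool_a"]
    assert kg.find_similar_tools([0.9, 0.1], top_k=2) == ["tool_a", "tool_b"]


def test_find_similar_tools_empty_index() -> None:
    assert InMemoryKG().find_similar_tools([1.0, 0.0]) == []
```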
### Task 2.4: Update Dependencies & Run All Checks

- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:**
  1. Ensure `requirements.txt` includes `openai` (or `anthropic`) and `numpy`.
  2. Ensure `requirements-dev.txt` includes `python-dotenv`. (`unittest.mock` is part of the standard library, so it needs no entry; only an external mocking library would.)
  3. Regenerate `requirements.lock`: `uv pip compile requirements.txt requirements-dev.txt --all-extras -o requirements.lock`.
  4. Run `just install` (or `uv pip sync requirements.lock`).
  5. Run `just lint`, `just format`, `just type-check`, `just test`.
  6. Commit all changes.
  7. Push to GitHub and verify the CI pipeline passes.

  *Note: Live API calls in CI tests are usually avoided. Ensure your tests for `EmbeddingService` use mocks; the `build_vector_index` tests should also use a mocked embedder.*
- **Acceptance Criteria:**
  1. `requirements.lock` is updated.
  2. All `just` checks pass locally.
  3. Code is committed and pushed.
  4. The GitHub Actions CI pipeline passes for the sprint's commits (leveraging mocks for API calls).

## End of Sprint 2 Review

- **What's Done:**
  - `EmbeddingService` can now generate real embeddings using an LLM API.
  - `InMemoryKG` can build a vector index using these real embeddings.
  - `InMemoryKG` can perform semantic search (cosine similarity) over the indexed tools.
  - Unit tests cover the new functionality, using mocks for external API calls.
  - The backend logic for tool suggestion based on semantic similarity is complete.
- **What's Next (Sprint 3):**
  - Implement the `SimplePlannerAgent` logic that ties together the `EmbeddingService` and `InMemoryKG` to process a user query and suggest tools.
- **Blockers/Issues:**
  - API key setup and management (ensure it's smooth for local dev and that CI doesn't expose keys).
  - Potential rate limits or costs if generating many embeddings for testing (though for 3-5 tools, this should be minimal).

## Implementation Guidance

### Task 2.1 Implementation Details

A minimal sketch of the refactored service, assuming the OpenAI SDK (`openai>=1.0`) and its `text-embedding-3-small` model; swap in the `anthropic` or `AzureOpenAI` client as needed:

```python
# In kg_services/embedder.py
import logging
from typing import List, Optional

from dotenv import load_dotenv
from openai import OpenAI, OpenAIError

load_dotenv()  # load OPENAI_API_KEY from a local .env file, if one exists
logger = logging.getLogger(__name__)


class EmbeddingService:
    def __init__(self, model: str = "text-embedding-3-small") -> None:
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def get_embedding(self, text: str) -> Optional[List[float]]:
        try:
            response = self.client.embeddings.create(model=self.model, input=text)
            return response.data[0].embedding
        except OpenAIError as exc:
            logger.error("Embedding request failed: %s", exc)
            return None
```

### Task 2.2 Implementation Details

A sketch of the rebuilt index method; this variant skips tools whose embedding failed (with a warning) rather than inserting a zero vector:

```python
# In kg_services/knowledge_graph.py
import logging

from kg_services.embedder import EmbeddingService

logger = logging.getLogger(__name__)


class InMemoryKG:
    ...  # existing attributes from Sprint 1: tools, tool_embeddings, tool_ids_for_vectors

    def build_vector_index(self, embedder: EmbeddingService) -> None:
        # Rebuild from scratch so repeated calls stay consistent.
        self.tool_embeddings = []
        self.tool_ids_for_vectors = []
        for tool_id, tool in self.tools.items():
            text_to_embed = f"{tool.name} - {tool.description} Tags: {', '.join(tool.tags)}"
            embedding = embedder.get_embedding(text_to_embed)
            if embedding:
                self.tool_embeddings.append(embedding)
                self.tool_ids_for_vectors.append(tool_id)
            else:
                # Alternative: append a zero vector and keep the tool_id.
                logger.warning("No embedding for tool %s; skipping it.", tool_id)
```

### Task 2.3 Implementation Details

A sketch of the similarity helper and search method, using `numpy` for the dot product and norms:

```python
# In kg_services/knowledge_graph.py (continuing the InMemoryKG class)
from typing import List

import numpy as np


class InMemoryKG:
    ...

    def _cosine_similarity(self, vec1: List[float], vec2: List[float]) -> float:
        a, b = np.asarray(vec1), np.asarray(vec2)
        norm_product = np.linalg.norm(a) * np.linalg.norm(b)
        if norm_product == 0.0:
            return 0.0  # a zero-norm vector has no meaningful direction
        return float(np.dot(a, b) / norm_product)

    def find_similar_tools(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
        if not self.tool_embeddings or not query_embedding:
            return []
        # Score each indexed tool against the query, then keep the best top_k.
        scored = [
            (self._cosine_similarity(query_embedding, emb), tool_id)
            for emb, tool_id in zip(self.tool_embeddings, self.tool_ids_for_vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool_id for _, tool_id in scored[:top_k]]
```
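Finally, a throwaway smoke-test script can exercise the whole loop locally. This is a hypothetical helper, not one of the sprint's artifacts, and the sample tool data is a stand-in for however `InMemoryKG` is actually populated; it makes live API calls, so keep it out of CI:

```python
# scripts/demo_semantic_search.py (hypothetical; requires a real OPENAI_API_KEY)
from types import SimpleNamespace

from kg_services.embedder import EmbeddingService
from kg_services.knowledge_graph import InMemoryKG

kg = InMemoryKG()
# Stand-in for the tools loaded in Sprint 1; only name, description, and tags
# are needed to build the index text.
kg.tools = {
    "summarizer": SimpleNamespace(
        name="Summarizer", description="Condenses long text", tags=["nlp"]
    ),
    "plotter": SimpleNamespace(
        name="Plotter", description="Draws charts from data", tags=["viz"]
    ),
}

embedder = EmbeddingService()
kg.build_vector_index(embedder)  # two live embedding calls

query = embedder.get_embedding("shorten this article for me")
if query is not None:
    print(kg.find_similar_tools(query, top_k=1))  # expected: ['summarizer']
```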