# Sprint 2 (MVP 1): Real Embeddings & Semantic Search Logic
## Sprint Overview
- **Goal:** Integrate actual LLM calls for generating embeddings, build the vector index within the `InMemoryKG`, and implement the core semantic search functionality. *Still no UI, focus on backend KG capabilities.*
- **Duration:** Estimated 3-5 hours (flexible within Hackathon Day 1, following Sprint 1).
- **Core Primitives Focused On:** Tool (its description being embedded and searched).
- **Key Artifacts by End of Sprint:**
- `kg_services/embedder.py`: `EmbeddingService.get_embedding` method now makes live API calls.
- `kg_services/knowledge_graph.py`: `InMemoryKG.build_vector_index` now uses real embeddings, and `InMemoryKG.find_similar_tools` performs actual cosine similarity search.
- Updated unit tests, potentially including tests that mock the LLM API calls.
- Updated `requirements.txt` (if new LLM client libraries were added) and `requirements.lock`.
- All code linted, formatted, type-checked, and passing CI.
## Task List
### Task 2.1: Implement Live LLM API Call in `EmbeddingService`
- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/embedder.py`'s `EmbeddingService.get_embedding` method to make actual API calls to your chosen LLM provider (OpenAI, Anthropic, or Azure OpenAI) to generate text embeddings.
- Ensure API keys are handled securely via environment variables (e.g., loaded using `python-dotenv` for local dev, and set as secrets in GitHub Actions/Hugging Face Spaces).
- Add necessary LLM client libraries (e.g., `openai`, `anthropic`) to `requirements.txt` if not already there.
- **Acceptance Criteria:**
1. `get_embedding` method successfully calls the chosen LLM API and returns a valid embedding vector (list of floats).
2. Handles potential API errors gracefully (e.g., logs an error and returns `None` or raises a custom exception).
3. `requirements.txt` updated with LLM client library.
4. Unit tests (with API mocking) pass.
- **TDD Approach:** In `tests/kg_services/test_embedder.py`, refactor/add tests:
- `test_get_embedding_live_success`: Mocks the LLM client's `create` (or equivalent) method to return a sample successful embedding response. Verifies the method processes this correctly.
- `test_get_embedding_api_error`: Mocks the LLM client to raise an API error. Verifies `get_embedding` handles this gracefully.
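The mocking pattern behind both tests can be sketched in isolation. This assumes the service wraps an OpenAI-style client exposing `embeddings.create` (as in the `openai>=1.x` SDK); the attribute path and response shape are assumptions to adapt to whatever client your `EmbeddingService` actually holds:

```python
# Sketch of the mocking pattern for an OpenAI-style embeddings client.
# The attribute path (client.embeddings.create) mirrors the openai>=1.x SDK;
# adjust it to whatever your EmbeddingService actually calls.
from unittest.mock import MagicMock

# Success path: configure the mock to return an object shaped like the real response.
mock_client = MagicMock()
fake_response = MagicMock()
fake_response.data = [MagicMock(embedding=[0.1, 0.2, 0.3])]
mock_client.embeddings.create.return_value = fake_response

response = mock_client.embeddings.create(model="text-embedding-3-small", input="hi")
assert response.data[0].embedding == [0.1, 0.2, 0.3]

# Error path: make the mock raise, so get_embedding's except branch is exercised.
mock_client.embeddings.create.side_effect = RuntimeError("rate limited")
handled = False
try:
    mock_client.embeddings.create(model="text-embedding-3-small", input="hi")
except RuntimeError:
    handled = True
assert handled
```

In the real tests you would inject `mock_client` into an `EmbeddingService` instance (e.g. via `unittest.mock.patch` on the client constructor) and assert on `get_embedding`'s return value instead of on the mock directly.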
### Task 2.2: Implement Real Vector Index Building in `InMemoryKG`
- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.build_vector_index` method.
- It should now iterate through the loaded `self.tools`.
- For each tool, construct a descriptive text string (e.g., from `name`, `description`, `tags`).
- Use the (now live) `EmbeddingService` instance to get a real embedding for this text.
- Store these real embeddings in `self.tool_embeddings` and corresponding `tool_id`s in `self.tool_ids_for_vectors`.
- Handle cases where `get_embedding` might return `None` (e.g., skip that tool or use a zero vector with a warning).
- **Acceptance Criteria:**
1. `build_vector_index` populates `self.tool_embeddings` with actual vectors from the LLM API.
2. Correctly associates embeddings with `tool_id`s.
3. Handles potential embedding failures for individual tools.
4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py`:
- `test_build_vector_index_with_real_embeddings`:
- Needs a mock `EmbeddingService` that returns predictable (but distinct) vectors for different inputs.
- Load sample tools into `InMemoryKG`.
- Call `build_vector_index` with the mock embedder.
- Assert that `self.tool_embeddings` contains the expected number of vectors and that they match what the mock embedder would have returned.
- Assert `self.tool_ids_for_vectors` is populated correctly.
- `test_build_vector_index_handles_embedding_failure`:
- Mock `EmbeddingService.get_embedding` to return `None` for one of the tools.
- Assert that the index is built for other tools and the failed one is handled (e.g., skipped or has a zero vector).
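One way to get "predictable but distinct" vectors without touching any API is a hash-based fake embedder. `FakeEmbedder` and its `fail_on` parameter are illustrative names, not part of the codebase; only the `get_embedding` signature matches the service sketched in this sprint:

```python
# A deterministic stand-in for EmbeddingService: distinct vectors per input
# text, with an optional simulated failure for one input.
import hashlib
from typing import List, Optional


class FakeEmbedder:
    def __init__(self, dim: int = 4, fail_on: Optional[str] = None) -> None:
        self.dim = dim
        self.fail_on = fail_on

    def get_embedding(self, text: str) -> Optional[List[float]]:
        if text == self.fail_on:
            return None  # simulates an API failure for this input
        digest = hashlib.sha256(text.encode()).digest()
        # Map the first `dim` digest bytes to floats in [0, 1]; deterministic per text.
        return [b / 255.0 for b in digest[: self.dim]]


embedder = FakeEmbedder(dim=4)
v1 = embedder.get_embedding("Tool A - searches the web")
v2 = embedder.get_embedding("Tool B - summarizes text")
assert v1 != v2 and len(v1) == 4                                  # distinct, fixed-size
assert v1 == embedder.get_embedding("Tool A - searches the web")  # deterministic
assert FakeEmbedder(fail_on="bad").get_embedding("bad") is None   # simulated failure
```

Passing `FakeEmbedder()` into `build_vector_index` covers the success test, and `FakeEmbedder(fail_on=...)` covers the failure-handling test, with no mocking framework required.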
### Task 2.3: Implement Cosine Similarity Search in `InMemoryKG`
- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:** Modify `kg_services/knowledge_graph.py`'s `InMemoryKG.find_similar_tools` method.
- It should now use `numpy` to perform cosine similarity calculations between the input `query_embedding` and each of the real embeddings stored in `self.tool_embeddings`.
- Return the `tool_id`s of the `top_k` most similar tools.
- Ensure `numpy` is in `requirements.txt`.
- **Acceptance Criteria:**
1. `find_similar_tools` correctly calculates cosine similarities and returns the top_k tool IDs.
2. Handles empty `self.tool_embeddings` case.
3. `numpy` is listed in `requirements.txt`.
4. Unit tests pass.
- **TDD Approach:** In `tests/kg_services/test_knowledge_graph.py`:
- `test_cosine_similarity_calculation` (if `_cosine_similarity` is a helper, test it directly with known vectors).
- `test_find_similar_tools_with_populated_index`:
- Manually set `kg.tool_embeddings` and `kg.tool_ids_for_vectors` with a few known vectors and IDs.
- Provide a `query_embedding` that is known to be most similar to one of them.
- Call `find_similar_tools` and assert that the correct `tool_id`(s) are returned in the correct order.
- `test_find_similar_tools_empty_index`: Assert it returns an empty list.
- `test_find_similar_tools_top_k_respected`: Test with different `top_k` values.
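For `test_cosine_similarity_calculation`, parallel, orthogonal, and opposite vectors make good known cases because the expected scores are exact. `cosine_similarity` below is a standalone sketch of the `_cosine_similarity` helper, written as a free function so it runs outside the class:

```python
# Known-vector checks for cosine similarity; no tolerance fiddling needed
# because these particular inputs produce exact results.
import numpy as np


def cosine_similarity(vec1, vec2) -> float:
    a, b = np.asarray(vec1, dtype=float), np.asarray(vec2, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        return 0.0  # a zero vector has no direction; define similarity as 0
    return float(np.dot(a, b) / norm_product)


assert cosine_similarity([1, 0], [1, 0]) == 1.0    # identical direction
assert cosine_similarity([1, 0], [0, 1]) == 0.0    # orthogonal
assert cosine_similarity([1, 0], [-1, 0]) == -1.0  # opposite direction
assert cosine_similarity([0, 0], [1, 0]) == 0.0    # zero-vector guard
```

For vectors that are not axis-aligned, compare with `math.isclose` rather than `==`, since the norms are then irrational and subject to float rounding.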
### Task 2.4: Update Dependencies & Run All Checks
- **Status:** Todo
- **Parent MVP:** MVP 1
- **Parent Sprint (MVP 1):** Sprint 2
- **Description:**
1. Ensure `requirements.txt` includes `openai` (or `anthropic`) and `numpy`.
    2. Ensure `requirements-dev.txt` includes `python-dotenv`. (`unittest.mock` is part of the standard library, so it needs no entry; only an external mocking library would.)
3. Regenerate `requirements.lock`: `uv pip compile requirements.txt requirements-dev.txt --all-extras -o requirements.lock`.
4. Run `just install` (or `uv pip sync requirements.lock`).
5. Run `just lint`, `just format`, `just type-check`, `just test`.
6. Commit all changes.
7. Push to GitHub and verify CI pipeline passes. *Note: Live API calls in CI for tests are usually avoided. Ensure your tests for `EmbeddingService` use mocks. The `build_vector_index` tests should also use a mocked embedder.*
- **Acceptance Criteria:**
1. `requirements.lock` is updated.
2. All `just` checks pass locally.
3. Code committed and pushed.
4. GitHub Actions CI pipeline passes for the sprint's commits (leveraging mocks for API calls).
## End of Sprint 2 Review
- **What's Done:**
- `EmbeddingService` can now generate real embeddings using an LLM API.
- `InMemoryKG` can build a vector index using these real embeddings.
- `InMemoryKG` can perform semantic search (cosine similarity) over the indexed tools.
- Unit tests cover the new functionalities, using mocks for external API calls.
- The backend logic for tool suggestion based on semantic similarity is complete.
- **What's Next (Sprint 3):**
- Implement the `SimplePlannerAgent` logic that ties together the `EmbeddingService` and `InMemoryKG` to process a user query and suggest tools.
- **Blockers/Issues:**
- API key setup and management (ensure it's smooth for local dev and CI doesn't expose keys).
- Potential rate limits or costs if generating many embeddings for testing (though for 3-5 tools, this should be minimal).
## Implementation Guidance
### Task 2.1 Implementation Details
```python
# In kg_services/embedder.py -- example wiring for the OpenAI client; the
# same shape works for Anthropic or Azure OpenAI with the matching SDK.
import logging
from typing import List, Optional

from dotenv import load_dotenv
from openai import OpenAI, OpenAIError

load_dotenv()  # no-op when no .env file exists (e.g. in CI)
logger = logging.getLogger(__name__)


class EmbeddingService:
    def __init__(self, model: str = "text-embedding-3-small") -> None:
        # The client reads OPENAI_API_KEY from the environment; never hard-code it.
        self.client = OpenAI()
        self.model = model

    def get_embedding(self, text: str) -> Optional[List[float]]:
        try:
            response = self.client.embeddings.create(model=self.model, input=text.strip())
            return response.data[0].embedding
        except OpenAIError:
            logger.exception("Embedding request failed for text: %.50s", text)
            return None
```
### Task 2.2 Implementation Details
```python
# In kg_services/knowledge_graph.py -- method of InMemoryKG;
# `embedder` is the (now live) EmbeddingService.
def build_vector_index(self, embedder: EmbeddingService) -> None:
    """Embed each tool's descriptive text and cache the vectors."""
    self.tool_embeddings = []
    self.tool_ids_for_vectors = []
    for tool_id, tool in self.tools.items():
        text_to_embed = f"{tool.name} - {tool.description} Tags: {', '.join(tool.tags)}"
        embedding = embedder.get_embedding(text_to_embed)
        if embedding:  # covers both None and an empty vector
            self.tool_embeddings.append(embedding)
            self.tool_ids_for_vectors.append(tool_id)
        else:
            # Skip tools whose embedding failed; they simply won't be searchable.
            logger.warning("No embedding for tool %s; skipping", tool_id)
```
### Task 2.3 Implementation Details
```python
# In kg_services/knowledge_graph.py -- both are methods of InMemoryKG.
import numpy as np

def _cosine_similarity(self, vec1: List[float], vec2: List[float]) -> float:
    a, b = np.asarray(vec1, dtype=float), np.asarray(vec2, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        return 0.0  # a zero vector has no direction; define similarity as 0
    return float(np.dot(a, b) / norm_product)

def find_similar_tools(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
    if not self.tool_embeddings or not query_embedding:
        return []
    scored = [
        (self._cosine_similarity(query_embedding, emb), tool_id)
        for emb, tool_id in zip(self.tool_embeddings, self.tool_ids_for_vectors)
    ]
    # Highest similarity first; keep only the top_k tool IDs.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool_id for _, tool_id in scored[:top_k]]
```