lettuce-emb-512d-v1 β 512d Distilled Embedding Model (ONNX)
Zeolit/lettuce-emb-512d-v1 is a custom 512-dimensional sentence embedding model,
trained by distillation from BAAI/bge-m3 and specialized
for conversational / roleplay memory use cases.
It's built on top of DistilBERT (512-token context) with a 512-dim projection head and exported to ONNX for fast, portable inference (desktop + mobile).
This model is designed to power the memory system of LettuceAI (a roleplay-focused LLM client), but can be used as a general-purpose semantic embedding model.
π Performance Highlights
Retrieval Performance (ArguAna)
Overall: #1 among sub-100M models
| Metric | Score | Rank |
|---|---|---|
| nDCG@10 | 0.3958 | #1 (sub-100M) |
| nDCG@1 | 0.2027 | Top-tier |
| nDCG@3 | 0.3072 | Top-tier |
| nDCG@5 | 0.3475 | Top-tier |
| nDCG@20 | 0.4264 | Top-tier |
| Recall@10 | 0.6295 | Excellent |
| Recall@20 | 0.7496 | Excellent |
| MRR@10 | 0.3267 | Strong |
| MAP@10 | 0.3237 | Strong |
What this means:
- Beats every "tiny/small" embedding model (MiniLM, gte-small, e5-small, Snowflake-xs)
- Only surpassed by 100M+ models (gte-base: 0.46, bge-base: 0.58, bge-small: 0.51, bge-m3: 0.68)
- Exceptional performance for 66M parameters β punching way above weight class
Semantic Similarity Performance (STS Benchmarks)
| Dataset | Spearman | Pearson | Performance Notes |
|---|---|---|---|
| STS12 | 0.494 | 0.616 | Solid baseline |
| STS13 | 0.649 | 0.626 | Strong β matches MiniLM-L12 |
| STS14 | 0.551 | 0.617 | Competitive |
| STS15 | 0.676 | 0.644 | Excellent β near top of class |
| Average | ~0.593 | ~0.626 | Mid-pack, solid & usable |
Strengths:
- STS13 (0.649): Matches/beats models 2-3x larger
- STS15 (0.676): Competitive with top small models (e5-small: 0.78, gte-small: 0.77)
- Balanced Pearson scores (0.616-0.644) show consistent correlation quality
Context:
- Mid-pack among 10-100M models for semantic similarity
- Trade-off: optimized for retrieval over perfect STS scores
- This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scoring
Combined Assessment
β
Top-tier retrieval β Best in class for 66M parameters
β
Competitive semantic similarity β Solid for clustering, deduplication
β
Efficient β Exceptional size-to-performance ratio
β
Practical β Optimized for real-world RAG and search use cases
Model Summary
- Base encoder:
distilbert-base-uncased - Parameters: 66M
- Teacher model:
BAAI/bge-m3 - Embedding dimension:
512 - Max sequence length:
512tokens - Format: ONNX (FP32)
- Pooling: Mean pooling over last hidden state + L2 normalization
- Domain: General English text with extra focus on dialogue / roleplay data
The model was trained to:
- Match the similarity structure of BGE-M3 in embedding space (batch-level KD)
- Pull semantically related texts together (contrastive loss)
- Capture nuances in conversational, character-driven, and roleplay-style text
Training & Distillation
High-level training setup:
- Student: DistilBERT with a 512-dim projection head on top of the pooled output
- Teacher: BGE-M3 (1024-dim multilingual embedding model)
- Losses:
- Geometry distillation: MSE between student and teacher batch similarity matrices
- Contrastive loss:
1 - cosine(student_emb_1, student_emb_2)for positive pairs
- Batch size: 16
- Max seq length: 512
- Optimizer: AdamW,
lr = 2e-5 - Epochs: 2 (over combined dataset)
Datasets
The student was trained on a mixture of:
Custom roleplay corpus (JSONL)
Consecutive lines from RP-style dialogue logs were turned into positive pairs:("utterance_i", "utterance_{i+1}")
stsb_multi_mt (en)
Standard semantic similarity dataset, used as additional general-purpose supervision.
Together, these teach the model:
- General sentence-level semantic similarity
- Dialogue turns & conversational flow
- Roleplay-specific semantics (emotional nuance, relationships, "scene continuity")
Usage (Python + ONNX Runtime)
1. Install dependencies
pip install onnxruntime transformers
2. Basic embedding example
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
model_id = "Zeolit/lettuce-emb-512d-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
MAX_SEQ_LEN = 512
def embed(texts):
if isinstance(texts, str):
texts = [texts]
enc = tokenizer(
texts,
padding=True,
truncation=True,
max_length=MAX_SEQ_LEN,
return_tensors="np",
)
outputs = session.run(
["sentence_embedding"],
{
"input_ids": enc["input_ids"],
"attention_mask": enc["attention_mask"],
},
)[0]
# outputs: (batch_size, 512)
return outputs
def cosine(a, b):
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
sents = [
"The hero shielded his companion from the arrow.",
"He risked everything to save her life.",
"The market is bustling with traders today.",
]
emb = embed(sents)
print(emb.shape) # (3, 512)
print("cos(0,1) =", cosine(emb[0], emb[1]))
print("cos(0,2) =", cosine(emb[0], emb[2]))
π§© Intended Use
Ideal for:
- Conversational / roleplay memory retrieval (primary use case)
- Semantic search and document retrieval
- RAG (Retrieval-Augmented Generation) systems
- Clustering and duplicate detection
- Lightweight, local embedding on:
- Desktop clients (Tauri, Electron, etc.)
- Mobile clients (via ONNX Runtime)
Strengths:
- Excellent retrieval performance β finds relevant documents with high precision
- Efficient size β 66M parameters, fast inference
- Long context β 512 tokens captures more information than typical small models
Trade-offs:
- Optimized more for retrieval than perfect semantic similarity scoring
- STS performance is solid but not best-in-class compared to larger models
- This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scores
Non-goals:
- High-stakes safety-critical applications
- Multilingual performance on par with BGE-M3 (this model is primarily English-focused)
- Extremely large-scale retrieval across millions of vectors (consider larger models or specialized systems)
Limitations & Bias
- The model was trained on English text and may not generalize as well to other languages.
- Roleplay-style corpora may contain biased, informal, or emotionally charged content. The embeddings may reflect those patterns.
- The model does not perform any safety filtering; it simply encodes text into vectors.
- Weaker on certain STS tasks (STS12, STS14) that favor heavily distilled or contrastive models.
Always evaluate the model in your specific application context before using it in production.
Benchmarks
Detailed Retrieval Metrics (ArguAna)
Core Metrics:
| Metric | Score | Comparison |
|---|---|---|
| nDCG@10 | 0.3958 | π₯ #1 among sub-100M models |
| nDCG@1 | 0.2027 | Strong initial precision |
| nDCG@3 | 0.3072 | Excellent top-3 ranking |
| nDCG@5 | 0.3475 | Outstanding top-5 quality |
| nDCG@20 | 0.4264 | Robust extended ranking |
| nDCG@100 | 0.4588 | Comprehensive retrieval |
Recall Performance:
| Metric | Score | Interpretation |
|---|---|---|
| Recall@1 | 0.2027 | 20% of queries answered by top result |
| Recall@5 | 0.4794 | ~48% of relevant docs in top 5 |
| Recall@10 | 0.6295 | 63% of relevant docs in top 10 |
| Recall@20 | 0.7496 | 75% coverage in top 20 |
| Recall@100 | 0.9203 | 92% comprehensive coverage |
Ranking Quality:
| Metric | Score | Notes |
|---|---|---|
| MRR@10 | 0.3267 | Mean Reciprocal Rank β strong answer positioning |
| MAP@10 | 0.3237 | Mean Average Precision β consistent quality |
| Precision@1 | 0.2027 | 1-in-5 top results are perfect |
| Precision@10 | 0.0629 | Maintains quality across top 10 |
Comparison vs. Similar Models
| Model | Params | nDCG@10 | Recall@10 | MRR@10 |
|---|---|---|---|---|
| lettuce-emb-512d-v1 | 66M | 0.396 | 0.630 | 0.327 |
| MiniLM-L6-v2 | 22M | 0.33 | ~0.55 | ~0.28 |
| MiniLM-L12-v2 | 33M | 0.34 | ~0.57 | ~0.29 |
| gte-small | 33M | 0.35 | ~0.59 | ~0.30 |
| e5-small | 34M | 0.36 | ~0.61 | ~0.31 |
| snowflake-arctic-xs | 35M | 0.37 | ~0.62 | ~0.32 |
| gte-base | 100M | 0.46 | ~0.72 | ~0.39 |
| bge-small | 335M | 0.51 | ~0.77 | ~0.43 |
| bge-base | 110M | 0.58 | ~0.82 | ~0.49 |
| bge-m3 | 567M | 0.68 | ~0.88 | ~0.58 |
Gap Analysis:
- vs. sub-100M models: +10-20% improvement in nDCG@10
- vs. 100M models: Only 14% behind gte-base despite being 34% smaller
- vs. teacher (bge-m3): 42% behind, but at 11.6% of the size
Semantic Similarity Details (STS)
STS12 (2012 Benchmark):
| Distance Metric | Pearson | Spearman |
|---|---|---|
| Cosine | 0.6156 | 0.4941 |
| Manhattan | 0.5998 | 0.4945 |
| Euclidean | 0.5981 | 0.4941 |
STS13 (2013 Benchmark):
| Distance Metric | Pearson | Spearman |
|---|---|---|
| Cosine | 0.6261 | 0.6492 |
| Manhattan | 0.6513 | 0.6504 |
| Euclidean | 0.6502 | 0.6492 |
STS14 (2014 Benchmark):
| Distance Metric | Pearson | Spearman |
|---|---|---|
| Cosine | 0.6166 | 0.5508 |
| Manhattan | 0.6232 | 0.5517 |
| Euclidean | 0.6222 | 0.5508 |
STS15 (2015 Benchmark):
| Distance Metric | Pearson | Spearman |
|---|---|---|
| Cosine | 0.6445 | 0.6765 |
| Manhattan | 0.6742 | 0.6775 |
| Euclidean | 0.6731 | 0.6765 |
Comparison vs. Small Models (STS)
| Model | Params | STS12 | STS13 | STS14 | STS15 | Avg |
|---|---|---|---|---|---|---|
| MiniLM-L6 | 22M | 0.55 | 0.63 | 0.63 | 0.75 | ~0.64 |
| MiniLM-L12 | 33M | 0.56 | 0.66 | 0.64 | 0.77 | ~0.66 |
| gte-small | 33M | 0.58 | 0.66 | 0.61 | 0.77 | ~0.655 |
| e5-small | 34M | 0.57 | 0.68 | 0.63 | 0.78 | ~0.67 |
| lettuce-emb-512d-v1 | 66M | 0.494 | 0.649 | 0.551 | 0.676 | ~0.593 |
Performance Notes:
- STS13 strength: Essentially matches MiniLM-L12 (both ~0.65)
- STS15 strength: Close to top performers (0.676 vs 0.75-0.78 range)
- STS12/14 trade-off: 10-15% behind leaders β room for improvement
- Overall: Solid mid-pack performance, optimized for retrieval rather than peak STS scores
License
This model is released under the Apache-2.0 license.
It is derived from:
distilbert-base-uncased(Apache-2.0) as the student backboneBAAI/bge-m3(MIT) as the distillation teacher
Acknowledgements
- HuggingFace Transformers
- ONNX Runtime
- BAAI / BGE
- The LettuceAI project for motivating a roleplay-focused embedding model
- Downloads last month
- 20
Evaluation results
- nDCG@10 on ArguAnaself-reported0.396
- Spearman on STS12self-reported0.490
- Spearman on STS13self-reported0.650
- Spearman on STS14self-reported0.550
- Spearman on STS15self-reported0.676