lettuce-emb-512d-v1 – 512d Distilled Embedding Model (ONNX)

Zeolit/lettuce-emb-512d-v1 is a custom 512-dimensional sentence embedding model, trained by distillation from BAAI/bge-m3 and specialized for conversational / roleplay memory use cases.

It's built on top of DistilBERT (512-token context) with a 512-dim projection head and exported to ONNX for fast, portable inference (desktop + mobile).

This model is designed to power the memory system of LettuceAI (a roleplay-focused LLM client), but can be used as a general-purpose semantic embedding model.


πŸ† Performance Highlights

Retrieval Performance (ArguAna)

Overall: #1 among sub-100M models

Metric      Score     Rank
nDCG@10     0.3958    #1 (sub-100M)
nDCG@1      0.2027    Top-tier
nDCG@3      0.3072    Top-tier
nDCG@5      0.3475    Top-tier
nDCG@20     0.4264    Top-tier
Recall@10   0.6295    Excellent
Recall@20   0.7496    Excellent
MRR@10      0.3267    Strong
MAP@10      0.3237    Strong

What this means:

  • Beats every "tiny/small" embedding model (MiniLM, gte-small, e5-small, Snowflake-xs)
  • Only surpassed by 100M+ models (gte-base: 0.46, bge-base: 0.58, bge-small: 0.51, bge-m3: 0.68)
  • Exceptional performance for 66M parameters – punches well above its weight class

Semantic Similarity Performance (STS Benchmarks)

Dataset    Spearman   Pearson   Performance Notes
STS12      0.494      0.616     Solid baseline
STS13      0.649      0.626     Strong – matches MiniLM-L12
STS14      0.551      0.617     Competitive
STS15      0.676      0.644     Excellent – near top of class
Average    ~0.593     ~0.626    Mid-pack, solid & usable

Strengths:

  • STS13 (0.649): essentially matches MiniLM-L12 and gte-small (~0.66)
  • STS15 (0.676): approaches the top small models (e5-small: 0.78, gte-small: 0.77)
  • Balanced Pearson scores (0.616-0.644) show consistent correlation quality

Context:

  • Mid-pack among 10-100M models for semantic similarity
  • Trade-off: optimized for retrieval over perfect STS scores
  • This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scoring

Combined Assessment

✅ Top-tier retrieval – Best in class for 66M parameters
✅ Competitive semantic similarity – Solid for clustering, deduplication
✅ Efficient – Exceptional size-to-performance ratio
✅ Practical – Optimized for real-world RAG and search use cases


Model Summary

  • Base encoder: distilbert-base-uncased
  • Parameters: 66M
  • Teacher model: BAAI/bge-m3
  • Embedding dimension: 512
  • Max sequence length: 512 tokens
  • Format: ONNX (FP32)
  • Pooling: Mean pooling over last hidden state + L2 normalization (see the sketch after this list)
  • Domain: General English text with extra focus on dialogue / roleplay data
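
The pooling described above is the standard masked mean over token embeddings followed by L2 normalization. A minimal numpy sketch, for anyone re-implementing or re-exporting the head (the published ONNX graph already includes these steps; in the full model the 512-dim projection head sits on top of the pooled output):

import numpy as np

def mean_pool_and_normalize(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask[..., None].astype(np.float32)
    summed = (last_hidden_state * mask).sum(axis=1)        # sum over non-padding tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)         # number of real tokens per row
    pooled = summed / counts                               # mean pooling
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # L2 normalization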

The model was trained to:

  • Match the similarity structure of BGE-M3 in embedding space (batch-level KD)
  • Pull semantically related texts together (contrastive loss)
  • Capture nuances in conversational, character-driven, and roleplay-style text

Training & Distillation

High-level training setup:

  • Student: DistilBERT with a 512-dim projection head on top of the pooled output
  • Teacher: BGE-M3 (1024-dim multilingual embedding model)
  • Losses (see the sketch after this list):
    • Geometry distillation: MSE between student and teacher batch similarity matrices
    • Contrastive loss: 1 - cosine(student_emb_1, student_emb_2) for positive pairs
  • Batch size: 16
  • Max seq length: 512
  • Optimizer: AdamW, lr = 2e-5
  • Epochs: 2 (over combined dataset)
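
A minimal sketch of the two losses, assuming L2-normalized student and teacher embeddings for a batch of positive pairs (numpy, for illustration only; the actual training script is not included in this repo):

import numpy as np

def geometry_distillation_loss(student_emb, teacher_emb):
    # MSE between the batch similarity matrices of student and teacher
    # (rows are assumed L2-normalized, so X @ X.T is a cosine-similarity matrix)
    s_sim = student_emb @ student_emb.T
    t_sim = teacher_emb @ teacher_emb.T
    return float(np.mean((s_sim - t_sim) ** 2))

def contrastive_loss(student_emb_1, student_emb_2):
    # 1 - cosine(student_emb_1, student_emb_2), averaged over the positive pairs
    cos = np.sum(student_emb_1 * student_emb_2, axis=1)
    return float(np.mean(1.0 - cos))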

Datasets

The student was trained on a mixture of:

  1. Custom roleplay corpus (JSONL)
    Consecutive lines from RP-style dialogue logs were turned into positive pairs (see the sketch after this list):

    • ("utterance_i", "utterance_{i+1}")
  2. stsb_multi_mt (en)
    Standard semantic similarity dataset, used as additional general-purpose supervision.
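
A minimal sketch of how such positive pairs can be built from a JSONL dialogue log (the "text" field name is illustrative; the actual corpus schema is not published):

import json

def build_rp_pairs(jsonl_path):
    # consecutive dialogue turns become ("utterance_i", "utterance_{i+1}") positive pairs
    with open(jsonl_path, "r", encoding="utf-8") as f:
        turns = [json.loads(line)["text"] for line in f if line.strip()]
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]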

Together, these teach the model:

  • General sentence-level semantic similarity
  • Dialogue turns & conversational flow
  • Roleplay-specific semantics (emotional nuance, relationships, "scene continuity")

Usage (Python + ONNX Runtime)

1. Install dependencies

pip install onnxruntime transformers

2. Basic embedding example

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

model_id = "Zeolit/lettuce-emb-512d-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

MAX_SEQ_LEN = 512

def embed(texts):
    if isinstance(texts, str):
        texts = [texts]

    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=MAX_SEQ_LEN,
        return_tensors="np",
    )

    outputs = session.run(
        ["sentence_embedding"],
        {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
        },
    )[0]
    # outputs: (batch_size, 512), L2-normalized sentence embeddings
    return outputs

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sents = [
    "The hero shielded his companion from the arrow.",
    "He risked everything to save her life.",
    "The market is bustling with traders today.",
]

emb = embed(sents)
print(emb.shape)  # (3, 512)
print("cos(0,1) =", cosine(emb[0], emb[1]))
print("cos(0,2) =", cosine(emb[0], emb[2]))

🧩 Intended Use

Ideal for:

  • Conversational / roleplay memory retrieval (primary use case; see the retrieval sketch after this list)
  • Semantic search and document retrieval
  • RAG (Retrieval-Augmented Generation) systems
  • Clustering and duplicate detection
  • Lightweight, local embedding on:
    • Desktop clients (Tauri, Electron, etc.)
    • Mobile clients (via ONNX Runtime)
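
As a rough illustration of the memory-retrieval use case, here is a minimal top-k search over a small memory store, reusing the embed() helper from the usage section (the memory texts are made up; since the embeddings are L2-normalized, the dot product equals cosine similarity):

import numpy as np

memories = [
    "She promised to meet him at the old lighthouse at dawn.",
    "The tavern keeper still owes the party fifty gold pieces.",
    "Their last argument ended with her storming out into the rain.",
]
memory_vecs = embed(memories)  # (num_memories, 512), already L2-normalized

def retrieve(query, top_k=2):
    query_vec = embed(query)[0]
    scores = memory_vecs @ query_vec          # cosine similarity per memory
    best = np.argsort(-scores)[:top_k]        # indices of the top-k scores
    return [(memories[i], float(scores[i])) for i in best]

print(retrieve("Why is she angry with him?"))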

Strengths:

  • Excellent retrieval performance – finds relevant documents with high precision
  • Efficient size – 66M parameters, fast inference
  • Long context – a 512-token window captures more information than typical small models

Trade-offs:

  • Optimized more for retrieval than perfect semantic similarity scoring
  • STS performance is solid but not best-in-class compared to larger models
  • This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scores

Non-goals:

  • High-stakes safety-critical applications
  • Multilingual performance on par with BGE-M3 (this model is primarily English-focused)
  • Extremely large-scale retrieval across millions of vectors (consider larger models or specialized systems)

Limitations & Bias

  • The model was trained on English text and may not generalize as well to other languages.
  • Roleplay-style corpora may contain biased, informal, or emotionally charged content. The embeddings may reflect those patterns.
  • The model does not perform any safety filtering; it simply encodes text into vectors.
  • Weaker on certain STS tasks (STS12, STS14) that favor heavily distilled or contrastive models.

Always evaluate the model in your specific application context before using it in production.


Benchmarks

Detailed Retrieval Metrics (ArguAna)

Core Metrics:

Metric      Score     Comparison
nDCG@10     0.3958    🥇 #1 among sub-100M models
nDCG@1      0.2027    Strong initial precision
nDCG@3      0.3072    Excellent top-3 ranking
nDCG@5      0.3475    Outstanding top-5 quality
nDCG@20     0.4264    Robust extended ranking
nDCG@100    0.4588    Comprehensive retrieval

Recall Performance:

Metric       Score     Interpretation
Recall@1     0.2027    20% of queries answered by top result
Recall@5     0.4794    ~48% of relevant docs in top 5
Recall@10    0.6295    63% of relevant docs in top 10
Recall@20    0.7496    75% coverage in top 20
Recall@100   0.9203    92% comprehensive coverage

Ranking Quality:

Metric        Score     Notes
MRR@10        0.3267    Mean Reciprocal Rank – strong answer positioning
MAP@10        0.3237    Mean Average Precision – consistent quality
Precision@1   0.2027    ~1 in 5 queries has a relevant top result
Precision@10  0.0629    Maintains quality across top 10
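
For reference, a minimal sketch of how these metrics are computed for a single query with binary relevance labels (illustrative only, not the exact BEIR/MTEB evaluation code):

import math

def recall_at_k(ranked_ids, relevant_ids, k):
    # fraction of the relevant documents that appear in the top-k results
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k):
    # reciprocal rank of the first relevant document within the top-k (0 if none)
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    # binary-relevance nDCG: DCG of the ranking divided by DCG of an ideal ranking
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1)
              if doc_id in relevant_ids)
    idcg = sum(1.0 / math.log2(rank + 1)
               for rank in range(1, min(len(relevant_ids), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0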

Comparison vs. Similar Models

Model                 Params   nDCG@10   Recall@10   MRR@10
lettuce-emb-512d-v1   66M      0.396     0.630       0.327
MiniLM-L6-v2          22M      0.33      ~0.55       ~0.28
MiniLM-L12-v2         33M      0.34      ~0.57       ~0.29
gte-small             33M      0.35      ~0.59       ~0.30
e5-small              34M      0.36      ~0.61       ~0.31
snowflake-arctic-xs   35M      0.37      ~0.62       ~0.32
gte-base              100M     0.46      ~0.72       ~0.39
bge-small             335M     0.51      ~0.77       ~0.43
bge-base              110M     0.58      ~0.82       ~0.49
bge-m3                567M     0.68      ~0.88       ~0.58

Gap Analysis:

  • vs. sub-100M models: +10-20% improvement in nDCG@10
  • vs. 100M models: Only 14% behind gte-base despite being 34% smaller
  • vs. teacher (bge-m3): 42% behind, but at 11.6% of the size

Semantic Similarity Details (STS)

STS12 (2012 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6156    0.4941
Manhattan         0.5998    0.4945
Euclidean         0.5981    0.4941

STS13 (2013 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6261    0.6492
Manhattan         0.6513    0.6504
Euclidean         0.6502    0.6492

STS14 (2014 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6166    0.5508
Manhattan         0.6232    0.5517
Euclidean         0.6222    0.5508

STS15 (2015 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6445    0.6765
Manhattan         0.6742    0.6775
Euclidean         0.6731    0.6765
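
The Pearson/Spearman numbers above follow the standard STS protocol: score each sentence pair with a similarity (or negated distance) in embedding space, then correlate those scores with the human gold ratings. A minimal sketch of the cosine case using scipy (an extra dependency; emb1/emb2 hold the embeddings of the two sides of each pair, gold_scores the human ratings):

import numpy as np
from scipy.stats import pearsonr, spearmanr

def sts_correlations(emb1, emb2, gold_scores):
    # cosine similarity per pair (embeddings assumed L2-normalized)
    cos_scores = np.sum(emb1 * emb2, axis=1)
    return pearsonr(cos_scores, gold_scores)[0], spearmanr(cos_scores, gold_scores)[0]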

Comparison vs. Small Models (STS)

Model                 Params   STS12   STS13   STS14   STS15   Avg
MiniLM-L6             22M      0.55    0.63    0.63    0.75    ~0.64
MiniLM-L12            33M      0.56    0.66    0.64    0.77    ~0.66
gte-small             33M      0.58    0.66    0.61    0.77    ~0.655
e5-small              34M      0.57    0.68    0.63    0.78    ~0.67
lettuce-emb-512d-v1   66M      0.494   0.649   0.551   0.676   ~0.593

Performance Notes:

  • STS13 strength: Essentially matches MiniLM-L12 (both ~0.65)
  • STS15 strength: Close to top performers (0.676 vs 0.75-0.78 range)
  • STS12/14 trade-off: 10-15% behind leaders – room for improvement
  • Overall: Solid mid-pack performance, optimized for retrieval rather than peak STS scores

License

This model is released under the Apache-2.0 license.

It is derived from:

  • distilbert-base-uncased (Apache-2.0) as the student backbone
  • BAAI/bge-m3 (MIT) as the distillation teacher

Acknowledgements
