lettuce-emb-512d-v1 – 512d Distilled Embedding Model (ONNX)

Zeolit/lettuce-emb-512d-v1 is a custom 512-dimensional sentence embedding model, trained by distillation from BAAI/bge-m3 and specialized for conversational / roleplay memory use cases.

It's built on top of DistilBERT (512-token context) with a 512-dim projection head and exported to ONNX for fast, portable inference (desktop + mobile).

This model is designed to power the memory system of LettuceAI (a roleplay-focused LLM client), but can be used as a general-purpose semantic embedding model.


πŸ† Performance Highlights

Retrieval Performance (ArguAna)

Overall: #1 among sub-100M models

Metric      Score     Rank
nDCG@10     0.3958    #1 (sub-100M)
nDCG@1      0.2027    Top-tier
nDCG@3      0.3072    Top-tier
nDCG@5      0.3475    Top-tier
nDCG@20     0.4264    Top-tier
Recall@10   0.6295    Excellent
Recall@20   0.7496    Excellent
MRR@10      0.3267    Strong
MAP@10      0.3237    Strong

What this means:

  • Beats every "tiny/small" embedding model (MiniLM, gte-small, e5-small, Snowflake-xs)
  • Only surpassed by 100M+ models (gte-base: 0.46, bge-base: 0.58, bge-small: 0.51, bge-m3: 0.68)
  • Exceptional performance for 66M parameters – punches well above its weight class

Semantic Similarity Performance (STS Benchmarks)

Dataset    Spearman   Pearson   Performance Notes
STS12      0.494      0.616     Solid baseline
STS13      0.649      0.626     Strong – matches MiniLM-L12
STS14      0.551      0.617     Competitive
STS15      0.676      0.644     Excellent – near top of class
Average    ~0.593     ~0.626    Mid-pack, solid & usable

Strengths:

  • STS13 (0.649): essentially matches MiniLM-L12 and gte-small (~0.66)
  • STS15 (0.676): approaches the top small models (e5-small: 0.78, gte-small: 0.77)
  • Balanced Pearson scores (0.616-0.644) show consistent correlation quality

Context:

  • Mid-pack among 10-100M models for semantic similarity
  • Trade-off: optimized for retrieval over perfect STS scores
  • This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scoring

Combined Assessment

✅ Top-tier retrieval – Best in class for 66M parameters
✅ Competitive semantic similarity – Solid for clustering, deduplication
✅ Efficient – Exceptional size-to-performance ratio
✅ Practical – Optimized for real-world RAG and search use cases


Model Summary

  • Base encoder: distilbert-base-uncased
  • Parameters: 66M
  • Teacher model: BAAI/bge-m3
  • Embedding dimension: 512
  • Max sequence length: 512 tokens
  • Format: ONNX (FP32)
  • Pooling: Mean pooling over last hidden state + L2 normalization (see the sketch after this list)
  • Domain: General English text with extra focus on dialogue / roleplay data
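
The pooling described above is the standard masked mean over token embeddings followed by L2 normalization. A minimal numpy sketch, for anyone re-implementing or re-exporting the head (the published ONNX graph already includes these steps; in the full model the 512-dim projection head sits on top of the pooled output):

import numpy as np

def mean_pool_and_normalize(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask[..., None].astype(np.float32)
    summed = (last_hidden_state * mask).sum(axis=1)        # sum over non-padding tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)         # number of real tokens per row
    pooled = summed / counts                               # mean pooling
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # L2 normalization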

The model was trained to:

  • Match the similarity structure of BGE-M3 in embedding space (batch-level KD)
  • Pull semantically related texts together (contrastive loss)
  • Capture nuances in conversational, character-driven, and roleplay-style text

Training & Distillation

High-level training setup:

  • Student: DistilBERT with a 512-dim projection head on top of the pooled output
  • Teacher: BGE-M3 (1024-dim multilingual embedding model)
  • Losses (see the sketch after this list):
    • Geometry distillation: MSE between student and teacher batch similarity matrices
    • Contrastive loss: 1 - cosine(student_emb_1, student_emb_2) for positive pairs
  • Batch size: 16
  • Max seq length: 512
  • Optimizer: AdamW, lr = 2e-5
  • Epochs: 2 (over combined dataset)
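
A minimal sketch of the two losses, assuming L2-normalized student and teacher embeddings for a batch of positive pairs (numpy, for illustration only; the actual training script is not included in this repo):

import numpy as np

def geometry_distillation_loss(student_emb, teacher_emb):
    # MSE between the batch similarity matrices of student and teacher
    # (rows are assumed L2-normalized, so X @ X.T is a cosine-similarity matrix)
    s_sim = student_emb @ student_emb.T
    t_sim = teacher_emb @ teacher_emb.T
    return float(np.mean((s_sim - t_sim) ** 2))

def contrastive_loss(student_emb_1, student_emb_2):
    # 1 - cosine(student_emb_1, student_emb_2), averaged over the positive pairs
    cos = np.sum(student_emb_1 * student_emb_2, axis=1)
    return float(np.mean(1.0 - cos))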

Datasets

The student was trained on a mixture of:

  1. Custom roleplay corpus (JSONL)
    Consecutive lines from RP-style dialogue logs were turned into positive pairs (see the sketch after this list):

    • ("utterance_i", "utterance_{i+1}")
  2. stsb_multi_mt (en)
    Standard semantic similarity dataset, used as additional general-purpose supervision.
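
A minimal sketch of how such positive pairs can be built from a JSONL dialogue log (the "text" field name is illustrative; the actual corpus schema is not published):

import json

def build_rp_pairs(jsonl_path):
    # consecutive dialogue turns become ("utterance_i", "utterance_{i+1}") positive pairs
    with open(jsonl_path, "r", encoding="utf-8") as f:
        turns = [json.loads(line)["text"] for line in f if line.strip()]
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]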

Together, these teach the model:

  • General sentence-level semantic similarity
  • Dialogue turns & conversational flow
  • Roleplay-specific semantics (emotional nuance, relationships, "scene continuity")

Usage (Python + ONNX Runtime)

1. Install dependencies

pip install onnxruntime transformers

2. Basic embedding example

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

model_id = "Zeolit/lettuce-emb-512d-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

MAX_SEQ_LEN = 512

def embed(texts):
    if isinstance(texts, str):
        texts = [texts]

    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=MAX_SEQ_LEN,
        return_tensors="np",
    )

    outputs = session.run(
        ["sentence_embedding"],
        {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
        },
    )[0]
    # outputs: (batch_size, 512), L2-normalized sentence embeddings
    return outputs

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sents = [
    "The hero shielded his companion from the arrow.",
    "He risked everything to save her life.",
    "The market is bustling with traders today.",
]

emb = embed(sents)
print(emb.shape)  # (3, 512)
print("cos(0,1) =", cosine(emb[0], emb[1]))
print("cos(0,2) =", cosine(emb[0], emb[2]))

🧩 Intended Use

Ideal for:

  • Conversational / roleplay memory retrieval (primary use case; see the retrieval sketch after this list)
  • Semantic search and document retrieval
  • RAG (Retrieval-Augmented Generation) systems
  • Clustering and duplicate detection
  • Lightweight, local embedding on:
    • Desktop clients (Tauri, Electron, etc.)
    • Mobile clients (via ONNX Runtime)
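
As a rough illustration of the memory-retrieval use case, here is a minimal top-k search over a small memory store, reusing the embed() helper from the usage section (the memory texts are made up; since the embeddings are L2-normalized, the dot product equals cosine similarity):

import numpy as np

memories = [
    "She promised to meet him at the old lighthouse at dawn.",
    "The tavern keeper still owes the party fifty gold pieces.",
    "Their last argument ended with her storming out into the rain.",
]
memory_vecs = embed(memories)  # (num_memories, 512), already L2-normalized

def retrieve(query, top_k=2):
    query_vec = embed(query)[0]
    scores = memory_vecs @ query_vec          # cosine similarity per memory
    best = np.argsort(-scores)[:top_k]        # indices of the top-k scores
    return [(memories[i], float(scores[i])) for i in best]

print(retrieve("Why is she angry with him?"))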

Strengths:

  • Excellent retrieval performance – finds relevant documents with high precision
  • Efficient size – 66M parameters, fast inference
  • Long context – a 512-token window captures more information than typical small models

Trade-offs:

  • Optimized more for retrieval than perfect semantic similarity scoring
  • STS performance is solid but not best-in-class compared to larger models
  • This is a deliberate design choice for practical applications where finding the right document matters more than perfect similarity scores

Non-goals:

  • High-stakes safety-critical applications
  • Multilingual performance on par with BGE-M3 (this model is primarily English-focused)
  • Extremely large-scale retrieval across millions of vectors (consider larger models or specialized systems)

Limitations & Bias

  • The model was trained on English text and may not generalize as well to other languages.
  • Roleplay-style corpora may contain biased, informal, or emotionally charged content. The embeddings may reflect those patterns.
  • The model does not perform any safety filtering; it simply encodes text into vectors.
  • Weaker on certain STS tasks (STS12, STS14) that favor heavily distilled or contrastive models.

Always evaluate the model in your specific application context before using it in production.


Benchmarks

Detailed Retrieval Metrics (ArguAna)

Core Metrics:

Metric      Score     Comparison
nDCG@10     0.3958    🥇 #1 among sub-100M models
nDCG@1      0.2027    Strong initial precision
nDCG@3      0.3072    Excellent top-3 ranking
nDCG@5      0.3475    Outstanding top-5 quality
nDCG@20     0.4264    Robust extended ranking
nDCG@100    0.4588    Comprehensive retrieval

Recall Performance:

Metric       Score     Interpretation
Recall@1     0.2027    20% of queries answered by top result
Recall@5     0.4794    ~48% of relevant docs in top 5
Recall@10    0.6295    63% of relevant docs in top 10
Recall@20    0.7496    75% coverage in top 20
Recall@100   0.9203    92% comprehensive coverage

Ranking Quality:

Metric        Score     Notes
MRR@10        0.3267    Mean Reciprocal Rank – strong answer positioning
MAP@10        0.3237    Mean Average Precision – consistent quality
Precision@1   0.2027    ~1 in 5 queries has a relevant top result
Precision@10  0.0629    Maintains quality across top 10
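
For reference, a minimal sketch of how these metrics are computed for a single query with binary relevance labels (illustrative only, not the exact BEIR/MTEB evaluation code):

import math

def recall_at_k(ranked_ids, relevant_ids, k):
    # fraction of the relevant documents that appear in the top-k results
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def mrr_at_k(ranked_ids, relevant_ids, k):
    # reciprocal rank of the first relevant document within the top-k (0 if none)
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    # binary-relevance nDCG: DCG of the ranking divided by DCG of an ideal ranking
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1)
              if doc_id in relevant_ids)
    idcg = sum(1.0 / math.log2(rank + 1)
               for rank in range(1, min(len(relevant_ids), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0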

Comparison vs. Similar Models

Model                 Params   nDCG@10   Recall@10   MRR@10
lettuce-emb-512d-v1   66M      0.396     0.630       0.327
MiniLM-L6-v2          22M      0.33      ~0.55       ~0.28
MiniLM-L12-v2         33M      0.34      ~0.57       ~0.29
gte-small             33M      0.35      ~0.59       ~0.30
e5-small              34M      0.36      ~0.61       ~0.31
snowflake-arctic-xs   35M      0.37      ~0.62       ~0.32
gte-base              100M     0.46      ~0.72       ~0.39
bge-small             335M     0.51      ~0.77       ~0.43
bge-base              110M     0.58      ~0.82       ~0.49
bge-m3                567M     0.68      ~0.88       ~0.58

Gap Analysis:

  • vs. sub-100M models: +10-20% improvement in nDCG@10
  • vs. 100M models: Only 14% behind gte-base despite being 34% smaller
  • vs. teacher (bge-m3): 42% behind, but at 11.6% of the size

Semantic Similarity Details (STS)

STS12 (2012 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6156    0.4941
Manhattan         0.5998    0.4945
Euclidean         0.5981    0.4941

STS13 (2013 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6261    0.6492
Manhattan         0.6513    0.6504
Euclidean         0.6502    0.6492

STS14 (2014 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6166    0.5508
Manhattan         0.6232    0.5517
Euclidean         0.6222    0.5508

STS15 (2015 Benchmark):

Distance Metric   Pearson   Spearman
Cosine            0.6445    0.6765
Manhattan         0.6742    0.6775
Euclidean         0.6731    0.6765
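
The Pearson/Spearman numbers above follow the standard STS protocol: score each sentence pair with a similarity (or negated distance) in embedding space, then correlate those scores with the human gold ratings. A minimal sketch of the cosine case using scipy (an extra dependency; emb1/emb2 hold the embeddings of the two sides of each pair, gold_scores the human ratings):

import numpy as np
from scipy.stats import pearsonr, spearmanr

def sts_correlations(emb1, emb2, gold_scores):
    # cosine similarity per pair (embeddings assumed L2-normalized)
    cos_scores = np.sum(emb1 * emb2, axis=1)
    return pearsonr(cos_scores, gold_scores)[0], spearmanr(cos_scores, gold_scores)[0]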

Comparison vs. Small Models (STS)

Model                 Params   STS12   STS13   STS14   STS15   Avg
MiniLM-L6             22M      0.55    0.63    0.63    0.75    ~0.64
MiniLM-L12            33M      0.56    0.66    0.64    0.77    ~0.66
gte-small             33M      0.58    0.66    0.61    0.77    ~0.655
e5-small              34M      0.57    0.68    0.63    0.78    ~0.67
lettuce-emb-512d-v1   66M      0.494   0.649   0.551   0.676   ~0.593

Performance Notes:

  • STS13 strength: Essentially matches MiniLM-L12 (both ~0.65)
  • STS15 strength: Close to top performers (0.676 vs 0.75-0.78 range)
  • STS12/14 trade-off: 10-15% behind leaders – room for improvement
  • Overall: Solid mid-pack performance, optimized for retrieval rather than peak STS scores

License

This model is released under the Apache-2.0 license.

It is derived from:

  • distilbert-base-uncased (Apache-2.0) as the student backbone
  • BAAI/bge-m3 (MIT) as the distillation teacher

Acknowledgements
