EmbeddingGemma looks really promising — especially the focus on efficiency. In my own experiments, I quantized DistilBERT (fine-tuned on SST-2) into an int8 ONNX model and saw latency reductions of around 20–30% with almost no accuracy drop. I'm curious whether the team has explored quantization-aware training (QAT) for EmbeddingGemma, or whether post-training quantization (PTQ) is sufficient to preserve embedding quality? Here's my demo repo for reference: https://www.linkedin.com/posts/dr-mm-alam-93991120b_demofirst-aichips-edgeai-activity-7381674484098883584-0Rwn/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADVZuP0BheDJgKL8dWk-bNo7Yd4zhsOnNL4
— Masoom Alam (mmalam786)