--- pipeline_tag: sentence-similarity tags: - sentence-transformers - feature-extraction - sentence-similarity - mteb - arctic - embedding - snowflake2_m_uint8 - snowflake - transformers.js license: apache-2.0 language: - af - ar - az - be - bg - bn - ca - ceb - cs - cy - da - de - el - en - es - et - eu - fa - fi - fr - gl - gu - he - hi - hr - ht - hu - hy - id - is - it - ja - jv - ka - kk - km - kn - ko - ky - lo - lt - lv - mk - ml - mn - mr - ms - my - ne - nl - pa - pl - pt - qu - ro - ru - si - sk - sl - so - sq - sr - sv - sw - ta - te - th - tl - tr - uk - ur - vi - yo - zh --- # Final Update, September 20, 2025 This model is obsolete now, please use https://huggingface.co/electroglyph/snowflake-arctic-embed-m-v2.0-ONNX-uint8 This model is still fine, but my latest one is a little more accurate # Update I've updated this model to be compatible with Fastembed. I removed the `sentence_embedding` output and quantized the main model output instead. This now outputs a dimension 768 multivector. To use the output you should use CLS pooling with normalization disabled. # snowflake2_m_uint8 This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0 I have added a linear quantization node before the `token_embeddings` output so that it directly outputs a dimension 768 uint8 multivector. This is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections. I took the liberty of removing the `sentence_embedding` output (since I would've had to re-create it), I can add it back in if anybody wants it. # Quantization method Linear quantization for the scale -7 to 7. Here's what the graph of the original output looks like: ![original model graph](./graph_old.png) Here's what the new graph in this model looks like: ![modified model graph](./graph_new.png) # Benchmark I used beir-qdrant with the scifact dataset. quantized output (this model): ``` ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.64619, 'NDCG@5': 0.6687, 'NDCG@10': 0.69228, 'NDCG@100': 0.72204, 'NDCG@1000': 0.72747} recall: {'Recall@1': 0.56094, 'Recall@3': 0.68394, 'Recall@5': 0.73983, 'Recall@10': 0.80689, 'Recall@100': 0.94833, 'Recall@1000': 0.99333} precision: {'P@1': 0.59333, 'P@3': 0.25, 'P@5': 0.16467, 'P@10': 0.09167, 'P@100': 0.01077, 'P@1000': 0.00112} ``` unquantized output (model_uint8.onnx): ``` ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.65417, 'NDCG@5': 0.6741, 'NDCG@10': 0.69675, 'NDCG@100': 0.7242, 'NDCG@1000': 0.7305} recall: {'Recall@1': 0.56094, 'Recall@3': 0.69728, 'Recall@5': 0.74817, 'Recall@10': 0.81356, 'Recall@100': 0.945, 'Recall@1000': 0.99667} precision: {'P@1': 0.59333, 'P@3': 0.25444, 'P@5': 0.16667, 'P@10': 0.09233, 'P@100': 0.01073, 'P@1000': 0.00113} ``` # Example inference/benchmark code and how to use the model with Fastembed After installing beir-qdrant make sure to upgrade fastembed. ```python # pip install qdrant_client beir-qdrant # pip install -U fastembed from fastembed import TextEmbedding from fastembed.common.model_description import PoolingType, ModelSource from beir import util from beir.datasets.data_loader import GenericDataLoader from beir.retrieval.evaluation import EvaluateRetrieval from qdrant_client import QdrantClient from qdrant_client.models import Datatype from beir_qdrant.retrieval.models.fastembed import DenseFastEmbedModelAdapter from beir_qdrant.retrieval.search.dense import DenseQdrantSearch TextEmbedding.add_custom_model( model="electroglyph/snowflake2_m_uint8", pooling=PoolingType.CLS, normalization=False, sources=ModelSource(hf="electroglyph/snowflake2_m_uint8"), dim=768, model_file="snowflake2_m_uint8.onnx", ) dataset = "scifact" url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset) data_path = util.download_and_unzip(url, "datasets") corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test") qdrant_client = QdrantClient("http://localhost:6333") model = DenseQdrantSearch( qdrant_client, model=DenseFastEmbedModelAdapter( model_name="electroglyph/snowflake2_m_uint8" ), collection_name="scifact-uint8", initialize=True, datatype=Datatype.UINT8 ) retriever = EvaluateRetrieval(model) results = retriever.retrieve(corpus, queries) ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values) print(f"ndcg: {ndcg}\nrecall: {recall}\nprecision: {precision}") ```