PyLate model based on ozayezerceli/ettin-encoder-32M-TR

This is a PyLate model finetuned from ozayezerceli/ettin-encoder-32M-TR on the msmarco-tr dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
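
Concretely, the relevance of a document to a query is the sum, over query tokens, of each query token's maximum dot-product similarity against the document's token embeddings. A minimal NumPy sketch of this operator (array names and shapes are illustrative, not part of the PyLate API):

import numpy as np

def maxsim(query_embeddings: np.ndarray, document_embeddings: np.ndarray) -> float:
    # query_embeddings: (num_query_tokens, 128), document_embeddings: (num_document_tokens, 128)
    similarities = query_embeddings @ document_embeddings.T  # token-to-token similarity matrix
    # For each query token, keep its best-matching document token, then sum over query tokens
    return float(similarities.max(axis=1).sum())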

Model Details

Model Description

  • Model Type: PyLate model
  • Base model: ozayezerceli/ettin-encoder-32M-TR
  • Document Length: 180 tokens
  • Query Length: 32 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: MaxSim
  • Training Dataset: msmarco-tr
  • Language: tr

Model Sources

Full Model Architecture

ColBERT(
  (0): Transformer({'max_seq_length': 179, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 384, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)
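
In other words, each input text is encoded into one 128-dimensional vector per token (384-dimensional ModernBERT hidden states projected down to 128 by the Dense layer) rather than a single pooled vector. A quick sanity check, assuming model.encode returns one array per input as in recent PyLate releases:

from pylate import models

model = models.ColBERT(model_name_or_path="newmindai/col-ettin-encoder-32M-TR")
embeddings = model.encode(["örnek belge metni"], is_query=False)
print(embeddings[0].shape)  # expected: (number_of_tokens, 128)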

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

Use this model with PyLate to index and retrieve documents. The index uses FastPLAID for efficient similarity search.

Indexing documents

Load the ColBERT model and initialize the PLAID index, then encode and index your documents:

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="pylate_model_id",
)

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:

# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)

Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and then retrieve the top-k documents to get the ids and relevance scores of the top matches:

# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries, not documents
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
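
The returned scores object holds one list of matches per query. Assuming each match is a dictionary exposing the document id and its MaxSim score, as in recent PyLate releases, the results can be inspected like this:

# Hypothetical inspection of the results from the snippet above
queries = ["query for document 3", "query for document 1"]
for query, matches in zip(queries, scores):
    for match in matches:
        print(query, "->", match["id"], match["score"])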

Reranking

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use the rank.rerank function and pass the queries and documents to rerank:

from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="pylate_model_id",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
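
As with retrieval, reranked_documents is expected to contain one list per query, with each candidate's id and MaxSim score sorted from most to least relevant (field names may differ slightly between PyLate versions):

# Hypothetical inspection of the reranking output from the snippet above
for query, ranking in zip(queries, reranked_documents):
    best = ranking[0]
    print(query, "-> best match:", best["id"], "with score", best["score"])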

Training Details

Training Dataset

msmarco-tr

  • Dataset: msmarco-tr at ffad30a
  • Size: 910,904 training samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 6 tokens, mean 17.07 tokens, max 32 tokens
    • positive (string): min 25 tokens, mean 31.99 tokens, max 32 tokens
    • negative (string): min 21 tokens, mean 31.99 tokens, max 32 tokens
  • Samples:
    • query: sinir dokusundaki miyelin kılıfı nerede
      positive: Miyelin, bir tabaka oluşturan akson dielektrik (elektriksel olarak yalıtkan) malzemeyi çevreleyen yağlı bir beyaz maddedir, miyelin kılıfı, genellikle sadece bir nöronun aksonu etrafında bulunur. Sinir sisteminin düzgün çalışması için gereklidir. Bir tür glial hücrenin bir dış büyümesidir. Miyelin kılıfının üretimi miyelinasyon olarak adlandırılır. İnsanlarda, miyelin kılıfı 14'üncü haftada başlar.
      negative: İnsanlarda, dört temel doku tipi vardır: epitel dokusu, bağ dokusu, kas dokusu ve sinir dokusu. Her genel doku tipi içinde, belirli doku tipleri vardır. Bunu bir futbol takımı gibi düşünün.Her biri sahada kendi 'iş' olan bireysel oyuncular vardır.n insanlar, dört temel doku tipi vardır: epitel dokusu, bağ dokusu, kas dokusu ve sinir dokusu. Bu genel doku tipinde, her bir genel doku tipinde vardır.
    • query: Okulların Makine Mühendisliğini Sundukları Şeyler
      positive: Makine Mühendisliği Teknolojisi Dereceleri için Üst Okullar. Pennsylvania Eyalet Üniversitesi - Harrisburg, Purdue Üniversitesi ve Houston Üniversitesi, makine mühendisliği teknolojisi (MET) alanında lisans derecesi sunan üç okuldur. Bu üniversitelerdeki MET programları hakkında daha fazla bilgi edinmek için okumaya devam edin.
      negative: Mühendis tanımı, motorların veya makinelerin tasarımında, yapımında ve kullanımında veya çeşitli mühendislik dallarından herhangi birinde eğitimli ve yetenekli bir kişi: bir makine mühendisi; bir inşaat mühendisi. Daha fazla bilgi için bkz.
    • query: kim navigatör karıştırma valfleri taşır
      positive: BRADLEY THERMOSTATIC MIXING VANAS. Bradley Corporation, armatür ve sıhhi tesisat ürünlerinin üretiminde lider, dört hat üretir. termostatik karıştırma valfleri (TMVs). Bradley Navigator Yüksek Düşük termostatik karıştırma valfleri vardır. Dıştan gelen talebin çok düşük olduğu uygulamalar için idealdir.
      negative: Hidrolik Valfler. Eaton valfleri, tüm dünyadaki pazarlarda müşterilerimiz için rekabet avantajı sağlar. Geniş bir seçenek yelpazesinde benzersiz kalite sunan yüksek değerli hidrolik valf ürünlerimiz, gerçek endüstri liderlerinin tüm özelliklerini ve performans seviyelerini içerir. Endüstriyel Valfler.
  • Loss: pylate.losses.contrastive.Contrastive

Evaluation Dataset

msmarco-tr

  • Dataset: msmarco-tr at ffad30a
  • Size: 9,202 evaluation samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 6 tokens, mean 16.9 tokens, max 32 tokens
    • positive (string): min 29 tokens, mean 32.0 tokens, max 32 tokens
    • negative (string): min 28 tokens, mean 31.99 tokens, max 32 tokens
  • Samples:
    • query: Ermin hangi hayvandır
      positive: 1 Aslında ermine kelimesi beyaz kürklü bir hayvanı ifade ederken, sırt üstü kahverengi kürklü ve karnında baş ve beyaz kürklü bireyler için stoat kullanılır.
      negative: Dünyada kaç hayvan türü var? İşte kaba bir sayım ve bilim adamlarının sayılara nasıl ulaştıklarına dair kısa bir açıklama. Dünyada kaç hayvan türü var? İşte kaba bir sayım ve bilim adamlarının sayılara nasıl ulaştıklarına dair kısa bir açıklama. Kaç hayvan türü var? https://www.thoughtco.com/how-many-animal-türleri-on-planet-130923 Strauss, Bob.
    • query: Abacus nereden çıktı
      positive: Abacus: Kısa Bir Tarih. Abacus, kökeni Yunanca abax veya abakon (masa veya tablet anlamına gelir) kelimelerinden gelen ve muhtemelen kum anlamına gelen Semitik abq kelimesinden kaynaklanan Latince bir kelimedir. Abacus, büyük sayıları saymak için kullanılan birçok sayma cihazından biridir.
      negative: Hücre apeksinde, bir flagellum için çapa alanı olan bazal gövdedir. Bazal cisimler, dokuz periferik mikrotübül üçlüsü ile centrioles'inkine benzer bir alt yapıya sahiptir (görüntünün alt merkezindeki yapıya bakınız).
    • query: Başın arkasında radyasyon tedavisi yüz kızarıklığına neden olur mu
      positive: Radyasyon Terapisinin En Yaygın Yan Etkileri. Cilt reaksiyonu: Radyasyon tedavisinin yaygın bir yan etkisi, tedavi edilen vücut bölgesinde cilt tahrişidir. Cilt reaksiyonu, hafif kızarıklık ve kuruluktan (güneş yanığına benzer) bazı hastalarda cildin şiddetli soyulmasına (desquamation) kadar değişebilir.
      negative: Bu açıklama amfizemi işaret edebilir. Bu, sigara içme geçmişiniz varsa daha da muhtemeldir. Radyasyon terapisi bilinen nedenlerden biri değildir. Bu konuda daha fazla cevap almak ve semptomlarınızı çözmeye yardımcı olmak için bir pulmonologla takip etmenizi isteyeceğim. Umarım bu, sorgunuzu tamamen ele alır. Sigara içme geçmişiniz varsa, daha da fazla umut eder. Radyasyon terapisi, bu sorunun çözümüne yardımcı olmanızı ve bu sorunun cevabını takip etmenizi isterim.
  • Loss: pylate.losses.contrastive.Contrastive

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 3e-06
  • num_train_epochs: 1
  • fp16: True
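
For reference, a run with these hyperparameters and the Contrastive loss could be reproduced along the following lines; the dataset identifier, split name, and output directory are placeholders, not values taken from this card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from pylate import losses, models, utils

# Base model and the contrastive loss used for this model
model = models.ColBERT(model_name_or_path="ozayezerceli/ettin-encoder-32M-TR")
train_loss = losses.Contrastive(model=model)

# (query, positive, negative) triplets; replace with the actual msmarco-tr dataset id on the Hub
train_dataset = load_dataset("msmarco-tr-dataset-id", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=3e-6,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()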

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0000 1 3.619
0.0018 100 2.4315
0.0035 200 1.6119
0.0053 300 1.304
0.0070 400 1.0853
0.0088 500 0.9767
0.0105 600 0.8725
0.0123 700 0.8096
0.0141 800 0.8027
0.0158 900 0.7689
0.0176 1000 0.73
0.0193 1100 0.7424
0.0211 1200 0.754
0.0228 1300 0.716
0.0246 1400 0.6821
0.0263 1500 0.6501
0.0281 1600 0.6927
0.0299 1700 0.6392
0.0316 1800 0.6635
0.0334 1900 0.6913
0.0351 2000 0.6061
0.0369 2100 0.6164
0.0386 2200 0.6423
0.0404 2300 0.6177
0.0422 2400 0.6503
0.0439 2500 0.5883
0.0457 2600 0.6803
0.0474 2700 0.6367
0.0492 2800 0.5826
0.0509 2900 0.605
0.0527 3000 0.6439
0.0545 3100 0.5931
0.0562 3200 0.5527
0.0580 3300 0.5603
0.0597 3400 0.6135
0.0615 3500 0.5449
0.0632 3600 0.5591
0.0650 3700 0.5143
0.0667 3800 0.5616
0.0685 3900 0.5587
0.0703 4000 0.5455
0.0720 4100 0.59
0.0738 4200 0.5328
0.0755 4300 0.5635
0.0773 4400 0.5446
0.0790 4500 0.5268
0.0808 4600 0.5113
0.0826 4700 0.5601
0.0843 4800 0.5118
0.0861 4900 0.5096
0.0878 5000 0.5721
0.0896 5100 0.5159
0.0913 5200 0.5057
0.0931 5300 0.5588
0.0948 5400 0.5312
0.0966 5500 0.5405
0.0984 5600 0.4901
0.1001 5700 0.4862
0.1019 5800 0.4865
0.1036 5900 0.5623
0.1054 6000 0.5505
0.1071 6100 0.5264
0.1089 6200 0.5081
0.1107 6300 0.5181
0.1124 6400 0.5189
0.1142 6500 0.4838
0.1159 6600 0.4445
0.1177 6700 0.4773
0.1194 6800 0.4977
0.1212 6900 0.5432
0.1230 7000 0.4468
0.1247 7100 0.5046
0.1265 7200 0.5403
0.1282 7300 0.4603
0.1300 7400 0.5605
0.1317 7500 0.5327
0.1335 7600 0.4809
0.1352 7700 0.4764
0.1370 7800 0.475
0.1388 7900 0.4966
0.1405 8000 0.4853
0.1423 8100 0.5495
0.1440 8200 0.4416
0.1458 8300 0.4711
0.1475 8400 0.4707
0.1493 8500 0.4964
0.1511 8600 0.496
0.1528 8700 0.4244
0.1546 8800 0.4495
0.1563 8900 0.5045
0.1581 9000 0.4765
0.1598 9100 0.457
0.1616 9200 0.4489
0.1634 9300 0.4894
0.1651 9400 0.4661
0.1669 9500 0.4305
0.1686 9600 0.4714
0.1704 9700 0.4451
0.1721 9800 0.4976
0.1739 9900 0.3982
0.1756 10000 0.4563
0.1774 10100 0.5073
0.1792 10200 0.4425
0.1809 10300 0.4225
0.1827 10400 0.4651
0.1844 10500 0.4615
0.1862 10600 0.451
0.1879 10700 0.4564
0.1897 10800 0.4465
0.1915 10900 0.4763
0.1932 11000 0.4234
0.1950 11100 0.4708
0.1967 11200 0.4921
0.1985 11300 0.3989
0.2002 11400 0.4292
0.2020 11500 0.4408
0.2038 11600 0.4336
0.2055 11700 0.4584
0.2073 11800 0.4263
0.2090 11900 0.449
0.2108 12000 0.4176
0.2125 12100 0.4277
0.2143 12200 0.449
0.2160 12300 0.4344
0.2178 12400 0.448
0.2196 12500 0.4215
0.2213 12600 0.4207
0.2231 12700 0.4515
0.2248 12800 0.4721
0.2266 12900 0.3955
0.2283 13000 0.3967
0.2301 13100 0.4249
0.2319 13200 0.3995
0.2336 13300 0.4344
0.2354 13400 0.4372
0.2371 13500 0.409
0.2389 13600 0.4563
0.2406 13700 0.4774
0.2424 13800 0.4741
0.2442 13900 0.4025
0.2459 14000 0.4317
0.2477 14100 0.4447
0.2494 14200 0.4075
0.2512 14300 0.3978
0.2529 14400 0.4557
0.2547 14500 0.5149
0.2564 14600 0.5241
0.2582 14700 0.4658
0.2600 14800 0.4291
0.2617 14900 0.4024
0.2635 15000 0.385
0.2652 15100 0.419
0.2670 15200 0.4326
0.2687 15300 0.3958
0.2705 15400 0.4686
0.2723 15500 0.391
0.2740 15600 0.3902
0.2758 15700 0.4507
0.2775 15800 0.4086
0.2793 15900 0.4593
0.2810 16000 0.3708
0.2828 16100 0.3855
0.2845 16200 0.4328
0.2863 16300 0.4165
0.2881 16400 0.4213
0.2898 16500 0.4252
0.2916 16600 0.4378
0.2933 16700 0.3989
0.2951 16800 0.4109
0.2968 16900 0.3761
0.2986 17000 0.4226
0.3004 17100 0.3868
0.3021 17200 0.3842
0.3039 17300 0.3906
0.3056 17400 0.4098
0.3074 17500 0.4119
0.3091 17600 0.4069
0.3109 17700 0.4371
0.3127 17800 0.4051
0.3144 17900 0.4056
0.3162 18000 0.4019
0.3179 18100 0.4082
0.3197 18200 0.4271
0.3214 18300 0.327
0.3232 18400 0.4342
0.3249 18500 0.3808
0.3267 18600 0.3694
0.3285 18700 0.4273
0.3302 18800 0.4286
0.3320 18900 0.3844
0.3337 19000 0.3815
0.3355 19100 0.3941
0.3372 19200 0.3767
0.3390 19300 0.3954
0.3408 19400 0.3917
0.3425 19500 0.3736
0.3443 19600 0.3689
0.3460 19700 0.4003
0.3478 19800 0.3904
0.3495 19900 0.4236
0.3513 20000 0.3917
0.3531 20100 0.3587
0.3548 20200 0.3901
0.3566 20300 0.4165
0.3583 20400 0.3609
0.3601 20500 0.4269
0.3618 20600 0.3774
0.3636 20700 0.4004
0.3653 20800 0.3942
0.3671 20900 0.4091
0.3689 21000 0.3615
0.3706 21100 0.4067
0.3724 21200 0.3496
0.3741 21300 0.3757
0.3759 21400 0.4088
0.3776 21500 0.4301
0.3794 21600 0.3786
0.3812 21700 0.4224
0.3829 21800 0.4049
0.3847 21900 0.3983
0.3864 22000 0.3848
0.3882 22100 0.3807
0.3899 22200 0.3476
0.3917 22300 0.4042
0.3935 22400 0.3554
0.3952 22500 0.409
0.3970 22600 0.3966
0.3987 22700 0.3726
0.4005 22800 0.3709
0.4022 22900 0.3839
0.4040 23000 0.3556
0.4057 23100 0.3789
0.4075 23200 0.3793
0.4093 23300 0.3772
0.4110 23400 0.3775
0.4128 23500 0.3532
0.4145 23600 0.414
0.4163 23700 0.3801
0.4180 23800 0.4054
0.4198 23900 0.3479
0.4216 24000 0.4083
0.4233 24100 0.36
0.4251 24200 0.3935
0.4268 24300 0.3503
0.4286 24400 0.3264
0.4303 24500 0.4038
0.4321 24600 0.3518
0.4339 24700 0.3463
0.4356 24800 0.3509
0.4374 24900 0.3425
0.4391 25000 0.3336
0.4409 25100 0.4048
0.4426 25200 0.3399
0.4444 25300 0.3854
0.4461 25400 0.3817
0.4479 25500 0.3507
0.4497 25600 0.3793
0.4514 25700 0.3534
0.4532 25800 0.3893
0.4549 25900 0.3581
0.4567 26000 0.3576
0.4584 26100 0.3706
0.4602 26200 0.3781
0.4620 26300 0.3886
0.4637 26400 0.3205
0.4655 26500 0.3832
0.4672 26600 0.4126
0.4690 26700 0.3276
0.4707 26800 0.3718
0.4725 26900 0.4142
0.4742 27000 0.3287
0.4760 27100 0.3847
0.4778 27200 0.3567
0.4795 27300 0.372
0.4813 27400 0.384
0.4830 27500 0.3728
0.4848 27600 0.3795
0.4865 27700 0.3653
0.4883 27800 0.3419
0.4901 27900 0.3697
0.4918 28000 0.3441
0.4936 28100 0.3829
0.4953 28200 0.3886
0.4971 28300 0.389
0.4988 28400 0.3833
0.5006 28500 0.3488
0.5024 28600 0.3559
0.5041 28700 0.3922
0.5059 28800 0.3616
0.5076 28900 0.3908
0.5094 29000 0.3875
0.5111 29100 0.3577
0.5129 29200 0.3834
0.5146 29300 0.3792
0.5164 29400 0.3793
0.5182 29500 0.3549
0.5199 29600 0.3363
0.5217 29700 0.3467
0.5234 29800 0.3289
0.5252 29900 0.4189
0.5269 30000 0.3805
0.5287 30100 0.416
0.5305 30200 0.3853
0.5322 30300 0.374
0.5340 30400 0.3798
0.5357 30500 0.3489
0.5375 30600 0.3962
0.5392 30700 0.4032
0.5410 30800 0.3946
0.5428 30900 0.3468
0.5445 31000 0.3582
0.5463 31100 0.3604
0.5480 31200 0.345
0.5498 31300 0.3459
0.5515 31400 0.3461
0.5533 31500 0.3658
0.5550 31600 0.3708
0.5568 31700 0.3546
0.5586 31800 0.3971
0.5603 31900 0.3584
0.5621 32000 0.3197
0.5638 32100 0.3789
0.5656 32200 0.3573
0.5673 32300 0.3439
0.5691 32400 0.3366
0.5709 32500 0.3197
0.5726 32600 0.3508
0.5744 32700 0.4047
0.5761 32800 0.317
0.5779 32900 0.3543
0.5796 33000 0.3923
0.5814 33100 0.346
0.5832 33200 0.3733
0.5849 33300 0.3145
0.5867 33400 0.3408
0.5884 33500 0.4192
0.5902 33600 0.3588
0.5919 33700 0.3377
0.5937 33800 0.3478
0.5954 33900 0.3373
0.5972 34000 0.355
0.5990 34100 0.3262
0.6007 34200 0.3327
0.6025 34300 0.3705
0.6042 34400 0.3229
0.6060 34500 0.3487
0.6077 34600 0.3598
0.6095 34700 0.3499
0.6113 34800 0.3414
0.6130 34900 0.3534
0.6148 35000 0.3292
0.6165 35100 0.3487
0.6183 35200 0.3465
0.6200 35300 0.3653
0.6218 35400 0.3145
0.6236 35500 0.3787
0.6253 35600 0.3302
0.6271 35700 0.3348
0.6288 35800 0.355
0.6306 35900 0.3697
0.6323 36000 0.3532
0.6341 36100 0.3799
0.6358 36200 0.333
0.6376 36300 0.3614
0.6394 36400 0.3268
0.6411 36500 0.3295
0.6429 36600 0.3527
0.6446 36700 0.3267
0.6464 36800 0.3626
0.6481 36900 0.322
0.6499 37000 0.3311
0.6517 37100 0.3336
0.6534 37200 0.3547
0.6552 37300 0.3631
0.6569 37400 0.3328
0.6587 37500 0.3483
0.6604 37600 0.3553
0.6622 37700 0.3419
0.6639 37800 0.368
0.6657 37900 0.3218
0.6675 38000 0.3351
0.6692 38100 0.3906
0.6710 38200 0.3555
0.6727 38300 0.3557
0.6745 38400 0.3411
0.6762 38500 0.329
0.6780 38600 0.3554
0.6798 38700 0.3765
0.6815 38800 0.3867
0.6833 38900 0.3112
0.6850 39000 0.316
0.6868 39100 0.3006
0.6885 39200 0.3202
0.6903 39300 0.3337
0.6921 39400 0.3384
0.6938 39500 0.3845
0.6956 39600 0.3808
0.6973 39700 0.3612
0.6991 39800 0.3269
0.7008 39900 0.3425
0.7026 40000 0.3833
0.7043 40100 0.3548
0.7061 40200 0.3518
0.7079 40300 0.3281
0.7096 40400 0.3627
0.7114 40500 0.3398
0.7131 40600 0.3139
0.7149 40700 0.3155
0.7166 40800 0.341
0.7184 40900 0.3401
0.7202 41000 0.3678
0.7219 41100 0.3134
0.7237 41200 0.32
0.7254 41300 0.3497
0.7272 41400 0.3561
0.7289 41500 0.3501
0.7307 41600 0.3404
0.7325 41700 0.3193
0.7342 41800 0.3517
0.7360 41900 0.3446
0.7377 42000 0.3302
0.7395 42100 0.3384
0.7412 42200 0.3506
0.7430 42300 0.3595
0.7447 42400 0.3088
0.7465 42500 0.338
0.7483 42600 0.3416
0.7500 42700 0.3678
0.7518 42800 0.3949
0.7535 42900 0.3258
0.7553 43000 0.342
0.7570 43100 0.3443
0.7588 43200 0.3364
0.7606 43300 0.3707
0.7623 43400 0.3485
0.7641 43500 0.3374
0.7658 43600 0.2852
0.7676 43700 0.3529
0.7693 43800 0.366
0.7711 43900 0.334
0.7729 44000 0.334
0.7746 44100 0.3827
0.7764 44200 0.3711
0.7781 44300 0.3501
0.7799 44400 0.3291
0.7816 44500 0.3249
0.7834 44600 0.3402
0.7851 44700 0.3452
0.7869 44800 0.3606
0.7887 44900 0.3503
0.7904 45000 0.3513
0.7922 45100 0.3245
0.7939 45200 0.3252
0.7957 45300 0.332
0.7974 45400 0.3306
0.7992 45500 0.3038
0.8010 45600 0.3345
0.8027 45700 0.343
0.8045 45800 0.3069
0.8062 45900 0.3775
0.8080 46000 0.3268
0.8097 46100 0.3168
0.8115 46200 0.2599
0.8133 46300 0.2762
0.8150 46400 0.3322
0.8168 46500 0.3384
0.8185 46600 0.3319
0.8203 46700 0.3151
0.8220 46800 0.3132
0.8238 46900 0.3474
0.8255 47000 0.3414
0.8273 47100 0.3143
0.8291 47200 0.3334
0.8308 47300 0.3312
0.8326 47400 0.322
0.8343 47500 0.3121
0.8361 47600 0.3206
0.8378 47700 0.3384
0.8396 47800 0.3505
0.8414 47900 0.3309
0.8431 48000 0.3456
0.8449 48100 0.3759
0.8466 48200 0.3352
0.8484 48300 0.3063
0.8501 48400 0.3239
0.8519 48500 0.3247
0.8536 48600 0.316
0.8554 48700 0.3099
0.8572 48800 0.3655
0.8589 48900 0.3145
0.8607 49000 0.3206
0.8624 49100 0.3528
0.8642 49200 0.3615
0.8659 49300 0.3213
0.8677 49400 0.3162
0.8695 49500 0.3326
0.8712 49600 0.321
0.8730 49700 0.2965
0.8747 49800 0.3473
0.8765 49900 0.2954
0.8782 50000 0.3059
0.8800 50100 0.3537
0.8818 50200 0.3537
0.8835 50300 0.3764
0.8853 50400 0.2796
0.8870 50500 0.3295
0.8888 50600 0.3075
0.8905 50700 0.3451
0.8923 50800 0.345
0.8940 50900 0.3299
0.8958 51000 0.3451
0.8976 51100 0.3225
0.8993 51200 0.3458
0.9011 51300 0.3225
0.9028 51400 0.3429
0.9046 51500 0.3253
0.9063 51600 0.3442
0.9081 51700 0.314
0.9099 51800 0.314
0.9116 51900 0.362
0.9134 52000 0.3216
0.9151 52100 0.3273
0.9169 52200 0.3118
0.9186 52300 0.3297
0.9204 52400 0.3391
0.9222 52500 0.3739
0.9239 52600 0.3481
0.9257 52700 0.3357
0.9274 52800 0.3202
0.9292 52900 0.3445
0.9309 53000 0.3548
0.9327 53100 0.3385
0.9344 53200 0.3264
0.9362 53300 0.3479
0.9380 53400 0.3402
0.9397 53500 0.3075
0.9415 53600 0.3269
0.9432 53700 0.2983
0.9450 53800 0.3265
0.9467 53900 0.3615
0.9485 54000 0.3459
0.9503 54100 0.3278
0.9520 54200 0.3496
0.9538 54300 0.333
0.9555 54400 0.3234
0.9573 54500 0.3302
0.9590 54600 0.3425
0.9608 54700 0.3263
0.9626 54800 0.3454
0.9643 54900 0.3443
0.9661 55000 0.3343
0.9678 55100 0.3204
0.9696 55200 0.3089
0.9713 55300 0.3572
0.9731 55400 0.3134
0.9748 55500 0.3189
0.9766 55600 0.3195
0.9784 55700 0.3498
0.9801 55800 0.3635
0.9819 55900 0.3368
0.9836 56000 0.3309
0.9854 56100 0.3437
0.9871 56200 0.3375
0.9889 56300 0.331
0.9907 56400 0.3245
0.9924 56500 0.3188
0.9942 56600 0.2976
0.9959 56700 0.3017
0.9977 56800 0.3497
0.9994 56900 0.359

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • PyLate: 1.3.4
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084"
}

PyLate

@misc{PyLate,
    title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
    author={Chaffin, Antoine and Sourty, Raphaël},
    url={https://github.com/lightonai/pylate},
    year={2024}
}