---
language:
  - en
tags:
  - unsloth
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:106628
  - loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
  - source_sentence: ace-v
    sentences:
      - >-
        The floor plan was drafted at 1/4 inch scale where each quarter inch
        equals one foot.
      - Fingerprint examiners follow the ACE-V methodology for identification.
      - Most modern streaming services offer content in 1080p full HD quality.
  - source_sentence: adult learner
    sentences:
      - The adult learner brings valuable life experience to the classroom.
      - Accounts payable represents money owed to suppliers and vendors.
      - The inspection confirmed all above grade work met code requirements.
  - source_sentence: 1/4 inch scale
    sentences:
      - Precise adjustments require accurate action gauge readings.
      - The quality inspector identified adhesion failure in the sample.
      - >-
        The architect created drawings at 1/4 inch scale for the client
        presentation.
  - source_sentence: acrylic paint
    sentences:
      - Artists prefer acrylic paint for its fast drying time.
      - The company reported strong adjusted EBITDA growth this quarter.
      - The clinic specializes in adolescent health services.
  - source_sentence: adult learning
    sentences:
      - Solar developers calculate AEP, or annual energy production.
      - The course was designed using adult learning best practices.
      - >-
        The wizard cast Abi-Dalzim's horrid wilting, draining moisture from
        enemies.
datasets:
  - electroglyph/technical
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on Alibaba-NLP/gte-modernbert-base

This model was finetuned with Unsloth.

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base on the technical dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-modernbert-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: technical
  • Language: en

### Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
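
The Pooling block uses CLS pooling (`pooling_mode_cls_token: True`): the sentence embedding is simply the hidden state of the first token. A minimal sketch of that behavior against the base model using plain transformers (illustrative only, not part of the generated card; it assumes a transformers release with native ModernBERT support):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-modernbert-base")
model = AutoModel.from_pretrained("Alibaba-NLP/gte-modernbert-base")

batch = tokenizer(["acrylic paint"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# CLS pooling: keep only the first token's hidden state per sentence
embedding = hidden[:, 0]  # (batch, 768)
```

Since the configured similarity function is cosine similarity, the embedding's scale does not affect downstream scores.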

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("electroglyph/notebook_test")
# Run inference
sentences = [
    'adult learning',
    'The course was designed using adult learning best practices.',
    'Solar developers calculate AEP, or annual energy production.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7228, 0.1468],
#         [0.7228, 1.0000, 0.1683],
#         [0.1468, 0.1683, 1.0000]])
```
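
The same two calls also support a simple semantic-search loop. A small sketch (not part of the generated card; it reuses the repo id assumed above) that ranks candidate sentences against a query:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("electroglyph/notebook_test")

query = "adult learning"
corpus = [
    "The course was designed using adult learning best practices.",
    "Solar developers calculate AEP, or annual energy production.",
    "Artists prefer acrylic paint for its fast drying time.",
]

# Encode the query and the candidates, then rank candidates by cosine similarity
query_emb = model.encode([query])
corpus_emb = model.encode(corpus)
scores = model.similarity(query_emb, corpus_emb)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx]:.4f}  {corpus[idx]}")
```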

## Training Details

### Training Dataset

#### technical

  • Dataset: technical at 05eeb90
  • Size: 106,628 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

|         | anchor                                           | positive                                          |
|---------|--------------------------------------------------|---------------------------------------------------|
| type    | string                                           | string                                            |
| details | min: 3 tokens, mean: 4.93 tokens, max: 12 tokens | min: 9 tokens, mean: 13.75 tokens, max: 25 tokens |

  • Samples:

| anchor | positive                                                                               |
|--------|----------------------------------------------------------------------------------------|
| .308   | The .308 Winchester is a popular rifle cartridge used for hunting and target shooting. |
| .308   | Many precision rifles are chambered in .308 for its excellent long-range accuracy.     |
| .308   | The sniper selected a .308 caliber round for the mission.                              |

  • Loss: MultipleNegativesRankingLoss (sketched below) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
    
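To make the objective concrete, here is a minimal sketch (not from the original card) of what MultipleNegativesRankingLoss computes with `scale=20.0` and cosine similarity: every anchor in a batch is scored against every positive, the matching pair sits on the diagonal, and all other in-batch positives act as negatives.

```python
import torch
import torch.nn.functional as F

def mnrl(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Sketch of MultipleNegativesRankingLoss over a batch of (anchor, positive) embeddings."""
    # Cosine similarity between every anchor and every positive: shape (B, B)
    scores = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) * scale
    # Anchor i's true positive is positive i; the rest of the row are in-batch negatives
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Toy usage with random 768-d embeddings
a, p = torch.randn(8, 768), torch.randn(8, 768)
print(mnrl(a, p))
```

The large batch size below (333) matters here: each anchor sees 332 in-batch negatives per step.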

### Training Hyperparameters

#### Non-Default Hyperparameters

  • per_device_train_batch_size: 333
  • learning_rate: 3e-05
  • num_train_epochs: 5
  • lr_scheduler_type: constant_with_warmup
  • warmup_steps: 100
  • fp16: True
  • batch_sampler: no_duplicates
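
A hedged sketch of how these non-default settings map onto the sentence-transformers v3+ training API. The card indicates the actual run used Unsloth with a PEFT/LoRA adapter, so treat this plain fine-tuning version as illustrative only; the `output_dir` name is hypothetical:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("Alibaba-NLP/gte-modernbert-base")
train_dataset = load_dataset("electroglyph/technical", split="train")  # anchor/positive pairs
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="gte-modernbert-technical",       # hypothetical output path
    per_device_train_batch_size=333,
    learning_rate=3e-5,
    num_train_epochs=5,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=100,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,   # avoid duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```

The `no_duplicates` batch sampler is a sensible pairing with this loss: a duplicate anchor or positive in the batch would otherwise be scored as a false negative.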

#### All Hyperparameters
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 333
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1558 | 50   | 3.4086 |
| 0.3115 | 100  | 3.3329 |
| 0.4673 | 150  | 3.2148 |
| 0.6231 | 200  | 2.9797 |
| 0.7788 | 250  | 2.7541 |
| 0.9346 | 300  | 2.5277 |
| 1.0903 | 350  | 2.3069 |
| 1.2461 | 400  | 2.1593 |
| 1.4019 | 450  | 2.0781 |
| 1.5576 | 500  | 1.9385 |
| 1.7134 | 550  | 1.9052 |
| 1.8692 | 600  | 1.8768 |
| 2.0249 | 650  | 1.8272 |
| 2.1807 | 700  | 1.7906 |
| 2.3364 | 750  | 1.7607 |
| 2.4922 | 800  | 1.7375 |
| 2.6480 | 850  | 1.6952 |
| 2.8037 | 900  | 1.6664 |
| 2.9595 | 950  | 1.6216 |
| 3.1153 | 1000 | 1.5601 |
| 3.2710 | 1050 | 1.5710 |
| 3.4268 | 1100 | 1.5735 |
| 3.5826 | 1150 | 1.5455 |
| 3.7383 | 1200 | 1.5577 |
| 3.8941 | 1250 | 1.5426 |
| 4.0498 | 1300 | 1.5276 |
| 4.2056 | 1350 | 1.5178 |
| 4.3614 | 1400 | 1.4611 |
| 4.5171 | 1450 | 1.4822 |
| 4.6729 | 1500 | 1.4987 |
| 4.8287 | 1550 | 1.4507 |
| 4.9844 | 1600 | 1.4501 |

### Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```