---
language:
- en
tags:
- unsloth
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:106628
- loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
- source_sentence: ace-v
sentences:
- The floor plan was drafted at 1/4 inch scale where each quarter inch equals one
foot.
- Fingerprint examiners follow the ACE-V methodology for identification.
- Most modern streaming services offer content in 1080p full HD quality.
- source_sentence: adult learner
sentences:
- The adult learner brings valuable life experience to the classroom.
- Accounts payable represents money owed to suppliers and vendors.
- The inspection confirmed all above grade work met code requirements.
- source_sentence: 1/4 inch scale
sentences:
- Precise adjustments require accurate action gauge readings.
- The quality inspector identified adhesion failure in the sample.
- The architect created drawings at 1/4 inch scale for the client presentation.
- source_sentence: acrylic paint
sentences:
- Artists prefer acrylic paint for its fast drying time.
- The company reported strong adjusted EBITDA growth this quarter.
- The clinic specializes in adolescent health services.
- source_sentence: adult learning
sentences:
- Solar developers calculate AEP, or annual energy production.
- The course was designed using adult learning best practices.
- The wizard cast Abi-Dalzim's horrid wilting, draining moisture from enemies.
datasets:
- electroglyph/technical
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
This model was finetuned with [Unsloth](https://github.com/unslothai/unsloth).
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) on the [technical](https://huggingface.co/datasets/electroglyph/technical) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- [technical](https://huggingface.co/datasets/electroglyph/technical)
- **Language:** en
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
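The `architecture: 'PeftModelForFeatureExtraction'` entry indicates that PEFT (LoRA) adapters were attached during the Unsloth finetune. As a minimal sketch, the pooling setup can be inspected after loading; the model ID below is a placeholder, as in the usage example further down:
```python
from sentence_transformers import SentenceTransformer

# Placeholder ID; replace with this repository's actual Hub ID.
model = SentenceTransformer("sentence_transformers_model_id")

print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768

# Module (1) is the Pooling layer; CLS pooling takes the embedding from
# the [CLS] token rather than averaging over all token embeddings.
print(model[1].get_pooling_mode_str())           # "cls"
```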
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub ("sentence_transformers_model_id" is a placeholder; use this model's Hub repository ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'adult learning',
    'The course was designed using adult learning best practices.',
    'Solar developers calculate AEP, or annual energy production.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7228, 0.1468],
#         [0.7228, 1.0000, 0.1683],
#         [0.1468, 0.1683, 1.0000]])
```
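The same embeddings also support semantic search over a corpus. Here is a minimal sketch using `util.semantic_search`; the corpus reuses illustrative sentences from this card, and the model ID is again a placeholder:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder ID

# Illustrative corpus, taken from the widget examples above
corpus = [
    "Fingerprint examiners follow the ACE-V methodology for identification.",
    "Artists prefer acrylic paint for its fast drying time.",
    "The architect created drawings at 1/4 inch scale for the client presentation.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("ace-v", convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```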
## Training Details
### Training Dataset
#### technical
* Dataset: [technical](https://huggingface.co/datasets/electroglyph/technical) at [05eeb90](https://huggingface.co/datasets/electroglyph/technical/tree/05eeb90e13d6bca725a5888f1ba206b2878f9c97)
* Size: 106,628 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | .308   | The .308 Winchester is a popular rifle cartridge used for hunting and target shooting. |
  | .308   | Many precision rifles are chambered in .308 for its excellent long-range accuracy. |
  | .308   | The sniper selected a .308 caliber round for the mission. |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```
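For orientation, the sketch below shows roughly how this dataset and loss pair up in the sentence-transformers API. It is not the verbatim training script (which also involved Unsloth and LoRA adapters), and the `train` split name is an assumption:
```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# (anchor, positive) pairs; for each anchor, the other positives in the
# batch act as in-batch negatives.
train_dataset = load_dataset("electroglyph/technical", split="train")

model = SentenceTransformer("Alibaba-NLP/gte-modernbert-base")

# scale=20.0 with cosine similarity matches the parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0)
```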
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 333
- `learning_rate`: 3e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: constant_with_warmup
- `warmup_steps`: 100
- `fp16`: True
- `batch_sampler`: no_duplicates
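A sketch of how these values map onto `SentenceTransformerTrainingArguments`; the output directory is a placeholder, and this is not the verbatim training script:
```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    per_device_train_batch_size=333,
    learning_rate=3e-5,
    num_train_epochs=5,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=100,
    fp16=True,
    # no_duplicates avoids repeating an anchor within a batch, which would
    # create false negatives for MultipleNegativesRankingLoss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```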
#### All Hyperparameters