EricLee/text2vec-roberta-512

Sentence Similarity

feature-extraction

text-embeddings-inference


EricLee committed on Jun 6, 2023

Commit f033a37 · 1 parent: 6d463d2

Update README.md

Files changed (1)

README.md +16 -1

README.md CHANGED

@@ -9,5 +9,20 @@ tags:
 datasets:
 - shibing624/nli_zh
 pipeline_tag: sentence-similarity
 ---
-A derivative of https://huggingface.co/shibing624/text2vec-base-chinese in which MacBERT is replaced with hfl/chinese-roberta-wwm-ext and max_seq_length is expanded from 128 to 512; all other training conditions are kept unchanged.

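+Introduction:
+Reference: https://github.com/shibing624/text2vec
+Built on the CoSENT model architecture with hfl/chinese-roberta-wwm-ext as the base model, fine-tuned again on the Chinese STS-B dataset, with max_seq_length extended from the original 128 to 512.
+eval_spearman: 0.833
+---
+Downstream tasks:
+The model can be loaded with either the text2vec library or the sentence-transformers library.
+Text embeddings: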
+```python
+>>> from text2vec import SentenceModel, EncoderType
+>>> model = SentenceModel('EricLee/text2vec-roberta-512', encoder_type=EncoderType.FIRST_LAST_AVG, max_seq_length=512)
+>>> embedding = model.encode("今天天气不错啊")  # "The weather is nice today"
+>>> embedding.shape
+(768,)
+```
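The introduction says the model is trained with the CoSENT architecture. CoSENT's objective is a ranking loss over pairs of sentence pairs: whenever the gold label says pair i is more similar than pair j, the model is penalized unless cos(pair i) exceeds cos(pair j), via loss = log(1 + Σ exp(λ(cos_j - cos_i))). The NumPy sketch below illustrates that loss. The function name `cosent_loss` and the scale λ = 20 are assumptions for illustration, not code taken from text2vec.

```python
import numpy as np


def cosent_loss(emb_a: np.ndarray, emb_b: np.ndarray,
                labels: np.ndarray, scale: float = 20.0) -> float:
    """Hypothetical CoSENT loss for a batch of sentence pairs.

    emb_a, emb_b: [batch, dim] embeddings of the two sentences in each pair.
    labels: [batch] gold similarity scores; only their relative order matters.
    scale: temperature lambda (20.0 is an assumed default).
    """
    # Scaled cosine similarity of each pair.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = scale * np.sum(a * b, axis=1)  # [batch]

    # diff[i, j] = cos_j - cos_i; it should be negative when label_i > label_j.
    diff = cos[None, :] - cos[:, None]
    mask = labels[:, None] > labels[None, :]  # True where pair i should rank above pair j

    # log(1 + sum(exp(diff[mask]))) computed stably as a log-sum-exp with a leading 0 term.
    terms = np.concatenate([[0.0], diff[mask]])
    m = terms.max()
    return float(m + np.log(np.sum(np.exp(terms - m))))
```

When the predicted cosines are ordered like the labels, every exponent is negative and the loss approaches 0; a reversed ordering drives it up roughly linearly in λ.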
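The usage example selects `EncoderType.FIRST_LAST_AVG`, which builds a sentence vector by averaging token states from the first and last encoder layers and then averaging those two vectors. Below is a minimal NumPy sketch of that pooling, plus the cosine similarity used to compare two sentence vectors. It assumes the hidden states are already available as arrays. The helper names are hypothetical, and excluding padding tokens through the attention mask is this sketch's own choice, which text2vec's implementation may not make.

```python
import numpy as np


def first_last_avg(first_layer: np.ndarray, last_layer: np.ndarray,
                   attention_mask: np.ndarray) -> np.ndarray:
    """Hypothetical first-last-average pooling.

    first_layer, last_layer: [batch, seq_len, hidden] token states.
    attention_mask: [batch, seq_len] with 1 for real tokens and 0 for padding.
    Returns a [batch, hidden] array of sentence vectors.
    """
    mask = attention_mask[:, :, None].astype(first_layer.dtype)  # [batch, seq_len, 1]
    count = np.clip(mask.sum(axis=1), 1e-9, None)  # real tokens per sentence, avoids division by 0
    first_avg = (first_layer * mask).sum(axis=1) / count  # mean over real tokens, first layer
    last_avg = (last_layer * mask).sum(axis=1) / count    # mean over real tokens, last layer
    return (first_avg + last_avg) / 2.0  # average the two layer means


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

With a real model, `first_layer` and `last_layer` would come from the encoder's hidden states for a tokenized batch. Comparing two sentences then reduces to calling `cosine_similarity` on their pooled vectors.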
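The card reports `eval_spearman: 0.833` without showing how it is computed. In STS-B-style evaluation this is typically Spearman's rank correlation between the model's cosine similarities and the gold similarity scores. The sketch below assumes that setup and computes ρ as the Pearson correlation of average ranks, where tied values share a rank. The helper names are hypothetical; SciPy's `scipy.stats.spearmanr` computes the same statistic.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values receive the average of their positions."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def spearman(pred: np.ndarray, gold: np.ndarray) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rp, rg = average_ranks(pred), average_ranks(gold)
    return float(np.corrcoef(rp, rg)[0, 1])
```

Because only ranks enter the formula, any monotonic rescaling of the cosine scores leaves ρ unchanged, which suits similarity scores that are not calibrated to the label scale.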