Update README.md
README.md
CHANGED
|
@@ -9,5 +9,20 @@ tags:
|
|
| 9 |
datasets:
|
| 10 |
- shibing624/nli_zh
|
| 11 |
pipeline_tag: sentence-similarity
|
|
|
|
| 12 |
---
|
| 13 |
-
| 9 |
datasets:
|
| 10 |
- shibing624/nli_zh
|
| 11 |
pipeline_tag: sentence-similarity
|
| 12 |
+
|
| 13 |
---
|
| 14 |
+
Introduction:
|
| 15 |
+
Reference: https://github.com/shibing624/text2vec
|
| 16 |
+
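Built on the CoSENT model architecture with hfl/chinese-roberta-wwm-ext as the base model. The model was fine-tuned again on the Chinese STS-B dataset, and max_seq_length was extended from the original 128 to 512.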
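The README names the CoSENT architecture without spelling out its training objective. As a rough sketch, assuming the pairwise ranking loss usually associated with CoSENT (predicted cosine similarities should be ordered the same way as the gold labels), the objective could look like the following. The function name `cosent_loss` and the `scale` value of 20 are illustrative assumptions, not part of the text2vec API or of this model's actual training configuration.

```python
import math


def cosent_loss(cos_sims, labels, scale=20.0):
    """CoSENT-style ranking loss over a batch of sentence pairs (sketch).

    cos_sims[i] is the predicted cosine similarity of sentence pair i,
    labels[i] is its gold similarity score. `scale` is an assumed value.
    """
    # The leading 0.0 accounts for the "1 +" inside the logarithm.
    terms = [0.0]
    for s_high, y_high in zip(cos_sims, labels):
        for s_low, y_low in zip(cos_sims, labels):
            # Penalise cases where a pair with a higher gold label
            # does not also receive a higher cosine similarity.
            if y_high > y_low:
                terms.append(scale * (s_low - s_high))
    # Numerically stable log-sum-exp over all collected terms.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Toy usage: the first ordering agrees with the labels, the second does not.
well_ordered = cosent_loss([0.9, 0.1], [1, 0])
badly_ordered = cosent_loss([0.1, 0.9], [1, 0])
print(well_ordered < badly_ordered)
```

The loss depends only on the relative order of the similarities, which matches how the model is evaluated with a rank correlation.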
|
| 17 |
+
eval_spearman: 0.833
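The README does not say how eval_spearman was computed. A minimal sketch, assuming it is the usual Spearman rank correlation between the model's cosine similarities and the gold STS-B scores, is shown below in plain Python. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data standing in for predicted similarities and gold scores.
predicted = [0.91, 0.15, 0.62, 0.40]
gold = [5, 0, 3, 1]
print(spearman(predicted, gold))
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used; the version above only illustrates what the metric measures.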
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
Downstream tasks:
|
| 21 |
+
The model can be loaded with either the text2vec library or the sentence-transformers library.
|
| 22 |
+
Text embedding:
|
| 23 |
+
```
|
| 24 |
+
>>> from text2vec import SentenceModel, EncoderType
|
| 25 |
+
>>> model = SentenceModel('EricLee/text2vec-roberta-512', encoder_type=EncoderType.FIRST_LAST_AVG, max_seq_length=512)
|
| 26 |
+
>>> print("Embedding shape:", model.encode("今天天气不错啊").shape)
|
| 27 |
+
Embedding shape: (768,)
|
| 28 |
+
```
|
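Since the model card declares `pipeline_tag: sentence-similarity`, a typical next step after encoding is to compare two embeddings with cosine similarity. The sketch below uses short toy vectors in place of the 768-dimensional outputs of `model.encode`; the function name `cosine_similarity` and the example vectors are illustrative assumptions, not part of text2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for two sentence embeddings; real ones would come from
# model.encode(...) and have 768 dimensions.
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.6, 0.05]
print(cosine_similarity(emb_a, emb_b))
```

Values close to 1 indicate that the two sentences are treated as semantically similar, and values near 0 indicate little relation.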