Update README.md
README.md CHANGED

@@ -18,7 +18,7 @@ gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:
 
 In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.
 
-We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid)
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).
 
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 | Model | LC | WR | Avg. Length |
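The preference-pair construction described in the hunk above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the toy scores stand in for reward-model outputs (the README uses RLHFlow/ArmoRM-Llama3-8B-v0.1), and `build_preference_pair` is a hypothetical helper name.

```python
def build_preference_pair(outputs, scores):
    """Form a preference pair from scored candidate outputs for one prompt:
    the maximum-score output becomes 'chosen', the minimum-score 'rejected'."""
    best = max(range(len(outputs)), key=lambda i: scores[i])
    worst = min(range(len(outputs)), key=lambda i: scores[i])
    return {"chosen": outputs[best], "rejected": outputs[worst]}

# Toy example; in practice each score would come from the reward model.
pair = build_preference_pair(
    ["response A", "response B", "response C"],
    [0.31, 0.87, 0.12],
)
print(pair)  # {'chosen': 'response B', 'rejected': 'response C'}
```

Applying this per prompt over a set of sampled completions yields the chosen/rejected columns of a preference dataset such as the one linked above.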