Update README.md
README.md CHANGED

@@ -18,7 +18,7 @@ gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:
 
 In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.
 
-We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid)
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).
 
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 | Model | LC | WR | Avg. Length |
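The preference-pair construction described in the hunk above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the toy scores stand in for reward-model outputs (the README uses RLHFlow/ArmoRM-Llama3-8B-v0.1), and `build_preference_pair` is a hypothetical helper name.

```python
def build_preference_pair(outputs, scores):
    """Form a preference pair from scored candidate outputs for one prompt:
    the maximum-score output becomes 'chosen', the minimum-score 'rejected'."""
    best = max(range(len(outputs)), key=lambda i: scores[i])
    worst = min(range(len(outputs)), key=lambda i: scores[i])
    return {"chosen": outputs[best], "rejected": outputs[worst]}

# Toy example; in practice each score would come from the reward model.
pair = build_preference_pair(
    ["response A", "response B", "response C"],
    [0.31, 0.87, 0.12],
)
print(pair)  # {'chosen': 'response B', 'rejected': 'response C'}
```

Applying this per prompt over a set of sampled completions yields the chosen/rejected columns of a preference dataset such as the one linked above.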