---
license: apache-2.0
---

[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)!
This is a direct preference optimization (DPO) model produced by:

1. Taking the UltraChat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
2. Loading the model into the [GPT-NeoX library](https://github.com/EleutherAI/gpt-neox) and running DPO on the Zephyr 7B recipe (UltraFeedback). Example usage for running post-training in GPT-NeoX is at https://github.com/EleutherAI/gpt-neox/tree/main/post-training.
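The core of step 2 is the DPO objective, which trains the policy to prefer chosen over rejected responses relative to a frozen reference model. Below is a minimal illustrative PyTorch sketch of the standard DPO loss; it is not the GPT-NeoX implementation, and `dpo_loss` and the toy tensors are hypothetical names for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the policy
    (or the frozen reference model) assigns to the chosen / rejected response.
    """
    # Implicit rewards: log-ratio of policy vs. reference per response.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin); minimized when the policy widens the
    # margin between chosen and rejected relative to the reference.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()

# Toy batch of two preference pairs (made-up log-probabilities).
policy_chosen = torch.tensor([-10.0, -12.0])
policy_rejected = torch.tensor([-14.0, -13.0])
ref_chosen = torch.tensor([-11.0, -12.5])
ref_rejected = torch.tensor([-13.0, -12.8])
loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
print(float(loss))
```

With a positive margin in favor of the chosen responses, as in the toy batch above, the loss falls below log 2, the value at zero margin.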

| Model | gsm8k 5-shot flexible-extract | MMLU 5-shot acc | ARC Challenge 25-shot acc_norm | HellaSwag 10-shot acc_norm | Winogrande 5-shot acc | TruthfulQA mc2 0-shot acc |
|-------|-------------------------------|-----------------|--------------------------------|----------------------------|-----------------------|---------------------------|