qanthony committed · Commit 9a6b5e0 · verified · 1 Parent(s): 77cec35

Update README.md

Files changed (1): README.md (+3 −1)
README.md CHANGED
@@ -2,9 +2,11 @@
  license: apache-2.0
  ---
 
+ [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)!
+
  This is a direct preference optimization (DPO) model produced by:
  1. Taking the ultrachat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
- 2. Loading the model into the [GPT-NeoX library](https://github.com/EleutherAI/gpt-neox) and running DPO on the Zephyr 7B recipe (ultrafeedback). The full instructions (config, settings, etc) for reproducing this model in GPT-NeoX is at: TODO
+ 2. Loading the model into the [GPT-NeoX library](https://github.com/EleutherAI/gpt-neox) and running DPO on the Zephyr 7B recipe (ultrafeedback). Example usage for running post-training in GPT-NeoX is at: https://github.com/EleutherAI/gpt-neox/tree/main/post-training
 
  | Model | gsm8k 5-shot flexible-extract | MMLU 5-shot acc | ARC Challenge 25-shot acc_norm | HellaSwag 10-shot acc_norm | Winogrande 5-shot acc | TruthfulQA mc2 0-shot acc |
  |-------|-------------------------------|-----------------|--------------------------------|----------------------------|------------------------|---------------------------|
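The DPO objective the README refers to can be sketched in a few lines. This is a minimal, illustrative version of the standard DPO loss (Rafailov et al., 2023), not code from the GPT-NeoX implementation; the function name and its scalar log-probability inputs are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss.

    Inputs are the summed token log-probabilities of the chosen and
    rejected responses under the trainable policy and under the frozen
    reference model (here, the ultrachat SFT checkpoint).
    """
    # Implicit reward of each response, measured relative to the reference
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the policy is pushed away from the reference
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))

# When the policy already favors the chosen response relative to the
# reference, the loss falls below log(2) (the value at zero margin).
print(dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1))
```

In training, `beta` is the main knob: small values keep the policy close to the reference model, larger values push harder toward the preference data.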