---
license: apache-2.0
---

[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)!
This is a direct preference optimization (DPO) model produced by:

1. Taking the UltraChat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
2. Loading the model into the [GPT-NeoX library](https://github.com/EleutherAI/gpt-neox) and running DPO on the Zephyr 7B recipe (UltraFeedback). Example usage for running post-training in GPT-NeoX is at https://github.com/EleutherAI/gpt-neox/tree/main/post-training.
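The core of step 2 is the DPO objective, which trains the policy to prefer chosen over rejected responses relative to a frozen reference model. Below is a minimal illustrative PyTorch sketch of the standard DPO loss; it is not the GPT-NeoX implementation, and `dpo_loss` and the toy tensors are hypothetical names for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the policy
    (or the frozen reference model) assigns to the chosen / rejected response.
    """
    # Implicit rewards: log-ratio of policy vs. reference per response.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin); minimized when the policy widens the
    # margin between chosen and rejected relative to the reference.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()

# Toy batch of two preference pairs (made-up log-probabilities).
policy_chosen = torch.tensor([-10.0, -12.0])
policy_rejected = torch.tensor([-14.0, -13.0])
ref_chosen = torch.tensor([-11.0, -12.5])
ref_rejected = torch.tensor([-13.0, -12.8])
loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
print(float(loss))
```

With a positive margin in favor of the chosen responses, as in the toy batch above, the loss falls below log 2, the value at zero margin.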

| Model | gsm8k 5-shot flexible-extract | MMLU 5-shot acc | ARC Challenge 25-shot acc_norm | HellaSwag 10-shot acc_norm | Winogrande 5-shot acc | TruthfulQA mc2 0-shot acc |
|-------|-------------------------------|-----------------|--------------------------------|----------------------------|-----------------------|---------------------------|