add blog links
README.md CHANGED
@@ -2,7 +2,11 @@
 license: apache-2.0
 ---
 
-[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)!
+[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)! For more information on this joint effort between EleutherAI and SynthLabs, view our associated blog posts:
+
+SynthLabs: https://www.synthlabs.ai/blog/rlhf-and-rlaif-in-gpt-neox
+
+EleutherAI: https://www.eleuther.ai/rlhf-and-rlaif-in-gpt-neox
 
 This is a direct preference optimization (DPO) model produced by:
 1. Taking the ultrachat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
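
For reference, the DPO objective mentioned in the README can be written as a short sketch. This is illustrative only, assuming summed per-response token log-probabilities under the trained policy and the frozen SFT reference model; it is not the GPT-NeoX implementation, and the function and argument names here are hypothetical.

```python
# Minimal sketch of the standard DPO objective (illustrative, not the
# GPT-NeoX implementation). Inputs are assumed to be summed token
# log-probabilities of the chosen/rejected responses under the trained
# policy and the frozen SFT reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps_chosen, policy_logps_rejected,
             ref_logps_chosen, ref_logps_rejected, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_logratios = policy_logps_chosen - ref_logps_chosen
    rejected_logratios = policy_logps_rejected - ref_logps_rejected
    # DPO pushes the chosen log-ratio above the rejected one, scaled by beta.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```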