add blog links
README.md CHANGED
@@ -2,7 +2,11 @@
 license: apache-2.0
 ---
 
-[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)!
+[GPT-NeoX](https://github.com/EleutherAI/gpt-neox) now supports preference learning (SFT, DPO, KTO)! For more information on this joint effort between EleutherAI and SynthLabs, view our associated blog posts:
+
+SynthLabs: https://www.synthlabs.ai/blog/rlhf-and-rlaif-in-gpt-neox
+
+EleutherAI: https://www.eleuther.ai/rlhf-and-rlaif-in-gpt-neox
 
 This is a direct preference optimization (DPO) model produced by:
 1. Taking the ultrachat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
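
For reference, the DPO objective mentioned in the README can be written as a short sketch. This is illustrative only, assuming summed per-response token log-probabilities under the trained policy and the frozen SFT reference model; it is not the GPT-NeoX implementation, and the function and argument names here are hypothetical.

```python
# Minimal sketch of the standard DPO objective (illustrative, not the
# GPT-NeoX implementation). Inputs are assumed to be summed token
# log-probabilities of the chosen/rejected responses under the trained
# policy and the frozen SFT reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logps_chosen, policy_logps_rejected,
             ref_logps_chosen, ref_logps_rejected, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_logratios = policy_logps_chosen - ref_logps_chosen
    rejected_logratios = policy_logps_rejected - ref_logps_rejected
    # DPO pushes the chosen log-ratio above the rejected one, scaled by beta.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```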