GPT-NeoX now supports preference learning (SFT, DPO, and KTO)! For more on this joint effort between EleutherAI and SynthLabs, see our associated blog posts:

SynthLabs: https://www.synthlabs.ai/blog/rlhf-and-rlaif-in-gpt-neox

EleutherAI: https://www.eleuther.ai/rlhf-and-rlaif-in-gpt-neox

This is a direct preference optimization (DPO) model produced by:

  1. Taking the UltraChat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
  2. Loading that checkpoint into the GPT-NeoX library and running DPO following the Zephyr 7B recipe (UltraFeedback); a minimal sketch of the DPO objective follows this list. Example usage for running post-training in GPT-NeoX is at: https://github.com/EleutherAI/gpt-neox/tree/main/post-training
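For reference, DPO fine-tunes the policy directly on preference pairs against a frozen reference model (here, the UltraChat SFT checkpoint), with no separate reward model. The snippet below is a minimal PyTorch sketch of the DPO objective for illustration only; it is not the GPT-NeoX implementation, whose configs and scripts live in the post-training directory linked above.

```python
# Minimal, illustrative sketch of the DPO objective (Rafailov et al., 2023).
# This is NOT the GPT-NeoX implementation; see the post-training directory
# linked above for the actual training code and configs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each input is a batch of per-sequence log-probabilities (summed over
    response tokens) under the policy or the frozen reference model."""
    # Implicit rewards: scaled log-ratios of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Benchmark results for the resulting model, compared with Zephyr-7B-beta, are shown below.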
| Model | GSM8K (5-shot, flexible-extract) | MMLU (5-shot, acc) | ARC-Challenge (25-shot, acc_norm) | HellaSwag (10-shot, acc_norm) | Winogrande (5-shot, acc) | TruthfulQA (0-shot, mc2 acc) |
|---|---|---|---|---|---|---|
| NeoX DPO from Zephyr-SFT | 64.1 | 41.8 | 60.0 | 63.2 | 85.2 | 79.2 |
| Zephyr-7B-beta | 62.5 | 34.3 | 59.8 | 63.6 | 84.4 | 77.6 |
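The metric names above (flexible-extract, acc_norm, mc2) follow lm-evaluation-harness conventions. As a hedged sketch, not the exact evaluation setup used here, comparable numbers could be obtained roughly as follows, assuming the DPO checkpoint has been exported to a Hugging Face-compatible format; the model path is a placeholder.

```python
# Hedged sketch: scoring the benchmarks above with EleutherAI's
# lm-evaluation-harness (pip install lm_eval). The model path is a placeholder.
import lm_eval

MODEL_ARGS = "pretrained=path/to/exported-neox-dpo-checkpoint,dtype=bfloat16"

# (task, num_fewshot) pairs mirroring the shot counts in the table header.
TASKS = [
    ("gsm8k", 5),
    ("mmlu", 5),
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("winogrande", 5),
    ("truthfulqa_mc2", 0),
]

for task, shots in TASKS:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=MODEL_ARGS,
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, results["results"][task])
```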