Laura O'Mahony
lomahony
AI & ML interests
PhD student
Organizations
None yet
pythia-helpful-epoch2
Pythia-2.8b, supervised fine-tuned (SFT) and DPO fine-tuned on the helpful subset of the Anthropic-hh-rlhf dataset for a second epoch.
- lomahony/pythia-2.8b-helpful-sft-epoch2
  Text Generation • 3B • Updated • 9
- lomahony/pythia-1b-helpful-sft-epoch2
  Text Generation • 1B • Updated • 10
- lomahony/pythia-1.4b-helpful-sft-epoch2
  Text Generation • 1B • Updated • 8
- lomahony/pythia-410m-helpful-sft-epoch2
  Text Generation • 0.4B • Updated • 7
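As a rough illustration of how one of the checkpoints listed above might be loaded for text generation, here is a minimal sketch using the Hugging Face `transformers` library. The helper names (`repo_id`, `hh_prompt`, `generate`) are hypothetical and not part of these repositories. The sketch also assumes that each repository ships a tokenizer and that the models respond to the `Human:` / `Assistant:` turn format used in the Anthropic-hh-rlhf data; neither assumption is confirmed by this page.

```python
def repo_id(size: str) -> str:
    """Build a repository name following the naming pattern in the list above."""
    return f"lomahony/pythia-{size}-helpful-sft-epoch2"


def hh_prompt(question: str) -> str:
    """Wrap a question in the Human/Assistant turn format of Anthropic-hh-rlhf.

    Assumption: the fine-tuned models expect this format.
    """
    return f"\n\nHuman: {question}\n\nAssistant:"


def generate(size: str, question: str, max_new_tokens: int = 64) -> str:
    """Download a checkpoint and generate a reply (needs network access)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size)
    # Assumption: the repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tokenizer(hh_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("410m", "How do I bake bread?")` would fetch the smallest checkpoint in the list; the larger sizes (`1b`, `1.4b`, `2.8b`) follow the same naming pattern.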
Pythia-hh-all-sft-dpo
Pythia models, supervised fine-tuned (SFT) and DPO fine-tuned on the full Anthropic-hh-rlhf dataset for 1 epoch.
pythia-helpful-1epoch
Pythia-2.8b, supervised fine-tuned (SFT) and DPO fine-tuned on the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
Pythia-helpful 3 epochs