How to use sileod/deberta-v3-large-tasksource-rlhf-reward-model with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="sileod/deberta-v3-large-tasksource-rlhf-reward-model")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("sileod/deberta-v3-large-tasksource-rlhf-reward-model")
    model = AutoModelForSequenceClassification.from_pretrained("sileod/deberta-v3-large-tasksource-rlhf-reward-model")
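The direct-load path returns raw logits rather than a ready-made score. The following is a minimal sketch of getting those logits for one text and normalizing them with a softmax. The helper names `raw_logits` and `softmax` are hypothetical, and the card does not say how the output classes map to a reward, so inspect `model.config.id2label` before interpreting the numbers.

```python
import math
from typing import List, Sequence

MODEL_ID = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize a row of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def raw_logits(text: str) -> List[float]:
    """Run the model once on `text` and return its unnormalized logits.

    Imports are local so that `softmax` stays usable without torch installed.
    Calling this function downloads the model weights on first use.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return logits.tolist()
```

A call such as `softmax(raw_logits(dialogue_text))` gives one probability per output class. Which class, if any, should be read as "preferred" is an assumption to verify against the model configuration.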
metadata
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
- name: deberta-v3-large-tasksource-rlhf-reward-model
  results:
  - task:
      type: text-classification
      name: RLHF
    dataset:
      type: rlhf
      name: Anthropic/hh-rlhf
      split: validation
    metrics:
    - type: accuracy
      value: 0.7516
      verified: true
Reward model based on deberta-v3-large-tasksource-nli, fine-tuned on Anthropic/hh-rlhf for 1 epoch with a learning rate of 1e-5.
The data are described in the paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Validation accuracy is 75.16%, currently the best publicly reported result (vs 69.25% for OpenAssistant/reward-model-deberta-v3-large-v2).
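The card does not state how the validation accuracy was computed. Anthropic/hh-rlhf stores a `chosen` and a `rejected` text per example, and a common convention for such preference data is the fraction of pairs in which the reward model scores `chosen` above `rejected`. The sketch below assumes that convention; `pairwise_accuracy` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever maps a text to a scalar reward.

```python
from typing import Callable, Iterable, Mapping


def pairwise_accuracy(
    pairs: Iterable[Mapping[str, str]],
    score_fn: Callable[[str], float],
) -> float:
    """Fraction of pairs where `chosen` is scored strictly above `rejected`.

    Each pair is a mapping with "chosen" and "rejected" text fields,
    matching the column names used by Anthropic/hh-rlhf.
    """
    total = 0
    correct = 0
    for pair in pairs:
        total += 1
        if score_fn(pair["chosen"]) > score_fn(pair["rejected"]):
            correct += 1
    if total == 0:
        raise ValueError("no pairs to evaluate")
    return correct / total


# Toy check with a stand-in scorer (text length), not the real model.
toy_pairs = [
    {"chosen": "a longer reply", "rejected": "short"},
    {"chosen": "hi", "rejected": "a much longer reply"},
]
toy_accuracy = pairwise_accuracy(toy_pairs, score_fn=len)
```

With the toy scorer, one of the two pairs is ranked correctly, so `toy_accuracy` is 0.5. Ties count as errors here, which is a design choice of this sketch and not something the card specifies.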