Sentence Selection ORPO LoRA

Fine-tuned LoRA adapter for sentence selection in debate contexts.

Training Details

  • Base Model: Qwen3-30B-A3B (via an SFT fine-tuned version)
  • Method: ORPO (Odds Ratio Preference Optimization)
  • Task: Select relevant sentence IDs from academic texts to support claims (format illustrated below)
  • F1 Score: 0.247 on the held-out set

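The exact prompt and output schema are not documented in this card; as a purely hypothetical illustration of the task, sentences carry IDs and the model is expected to return the IDs that support the claim:

# Hypothetical task format; the claim, sentences, and target below are invented
# for illustration and are not taken from the training data.
example_prompt = (
    "Claim: Remote work increases self-reported productivity.\n"
    "[S1] The study surveyed 1,200 employees across three industries.\n"
    "[S2] Self-reported output rose among fully remote participants.\n"
    "[S3] The authors note several sampling limitations.\n"
    "Return a JSON list of the sentence IDs that support the claim."
)
example_target = '["S2"]'
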
Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, "debaterhub/sentence-selection-orpo-lora")
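
A minimal generation sketch, assuming a chat-style prompt and default decoding settings (neither is specified in this card); it reuses the hypothetical example_prompt from the illustration above:

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": example_prompt}],  # hypothetical prompt defined earlier
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))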

Key Findings

  1. Format-consistent DPO pairs are essential for ORPO training
  2. Two epochs are optimal (more causes overfitting)
  3. Noise augmentation fixes positional bias in the training data (see the sketch after this list)
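
The data-preparation code is not part of this card, so the following is only a sketch of what findings 1 and 3 could look like in practice; the function name, record fields, and the reading of "noise augmentation" as sentence-order shuffling are assumptions. The chosen and rejected completions share one JSON output format, and sentence order is shuffled so that the correct IDs do not sit at fixed positions in the prompt.

import json
import random

def build_preference_pair(claim, sentences, gold_ids, distractor_ids, rng=random):
    """Build one DPO-style record for ORPO training (hypothetical schema)."""
    # Noise augmentation: shuffle sentence order so the gold sentences are not
    # always presented at the same positions.
    order = list(sentences)
    rng.shuffle(order)
    prompt_lines = [f"Claim: {claim}"]
    prompt_lines += [f"[{sid}] {sentences[sid]}" for sid in order]
    prompt_lines.append("Return a JSON list of the sentence IDs that support the claim.")
    # Format consistency: both completions use the exact same JSON schema, so the
    # preference signal reflects which IDs were selected, not formatting differences.
    return {
        "prompt": "\n".join(prompt_lines),
        "chosen": json.dumps(sorted(gold_ids)),
        "rejected": json.dumps(sorted(distractor_ids)),
    }

Keeping both completions in one schema keeps the odds-ratio preference term focused on sentence choice rather than surface formatting.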