daichira/structured-3k-mix-sft
Viewer • Updated • 3k • 831
How to use perryhsb/structeval-qwen3-4b-lora-dpo with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "perryhsb/structeval-qwen3-4b-lora-dpo")This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth) with both SFT and DPO training stages.
This repository contains LoRA adapter weights only. The base model must be loaded separately.
This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).
The model outputs structured data directly without Chain-of-Thought reasoning, reducing parse failures and token waste.
SFT Stage:
DPO Stage:
SFT Stage:
DPO Stage:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "perryhsb/structeval-qwen3-4b-lora-dpo"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
base,
torch_dtype=torch.float16,
device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
All training data sourced from permitted datasets listed in the competition rules. Non-LLM augmentation methods only (regex, format parsers, rule-based conversion).
Base model
Qwen/Qwen3-4B-Instruct-2507