qwen3_vl_8b_grpo_sld

Model Description

Qwen3-VL-8B fine-tuned with GRPO for SLD (Single-Line Diagram) component detection

  • Base Model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Task: SLD component detection and localization

Training Details

Training Framework

  • Method: GRPO with Unsloth
  • LoRA Configuration:
    • Rank (r): 16
    • Alpha: 16
    • Dropout: 0
    • Target Modules: Attention and MLP layers
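
As a rough illustration of what these LoRA settings imply, the sketch below records them in a small dataclass and derives two quantities: the standard LoRA scaling factor (alpha / r) and the number of extra trainable parameters per adapted linear layer. The module names and the 4096 x 4096 layer shape are assumptions for illustration only; the card states just "Attention and MLP layers" and does not list module names or layer sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int = 16            # rank, as stated in the card
    alpha: int = 16        # alpha, as stated in the card
    dropout: float = 0.0   # dropout, as stated in the card
    # ASSUMPTION: typical projection names for attention and MLP blocks.
    # The card does not list the exact module names.
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.alpha / self.r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight.
    return r * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)
# ASSUMPTION: 4096 x 4096 is a hypothetical layer shape, not taken from the card.
print(lora_param_count(4096, 4096, settings.r))
```

With rank and alpha both at 16, the scaling factor is 1.0, so the low-rank update is applied without extra amplification or damping.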

Training Data

  • Dataset: SLD component detection dataset
  • Format: Component bounding boxes with metadata
  • Components: Electrical panels (TSS, PSU-P, PP-series) annotated with voltage and current (ampere) ratings

Training Parameters

  • Group Size: 8 trajectories per example
  • Batch Size: 2
  • Learning Rate: 5e-6
  • Temperature: 0.7
  • Reward Functions:
    • IoU with ground truth
    • Efficiency (fewer steps)
    • Centering (component in crop center)
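
To make the "group size" parameter concrete, here is a minimal sketch of the group-relative advantage at the core of GRPO: each of the 8 trajectories sampled for one example is scored, and its reward is normalized against the mean and standard deviation of its own group. The reward values are hypothetical, and the use of the population standard deviation and the `eps` term are assumptions; the card does not describe the exact normalization used in training.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    ASSUMPTION: population standard deviation plus a small eps for
    numerical stability; the actual training code may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical combined rewards for one example with a group size of 8.
rewards = [0.9, 0.1, 0.5, 0.5, 0.7, 0.3, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Trajectories that score above their group mean receive a positive advantage and are reinforced; those below receive a negative one. If every trajectory in a group earns the same reward, all advantages are zero and that example contributes no learning signal.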

Usage

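from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

model_id = "pavan01729/qwen3_vl_8b_grpo_sld"
image_path = "path/to/sld_diagram.png"

# Load model. Qwen3-VL is a vision-language model, so it is loaded with the
# image-text-to-text auto class rather than AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load the diagram as an RGB image
image = Image.open(image_path).convert("RGB")

# Prepare inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Locate the component TSS in this diagram"}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(new_tokens, skip_special_tokens=True)
print(result)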
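
The decoded result is free text, so downstream code usually needs to pull the coordinates out of it. The card does not document the output format, so the sketch below makes an explicit assumption: the response contains a bracketed list of four numbers in `[x1, y1, x2, y2]` order. The helper name `extract_first_box` and the sample response string are hypothetical.

```python
import re

# Matches a bracketed list of exactly four numbers, e.g. [120, 45, 310, 200].
_NUM = r"(-?\d+(?:\.\d+)?)"
_BOX_PATTERN = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def extract_first_box(text):
    """Return the first (x1, y1, x2, y2) box found in text, or None.

    ASSUMPTION: the model emits boxes as four comma-separated numbers in
    square brackets; adjust the pattern if the real format differs.
    """
    match = _BOX_PATTERN.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    # Reorder so that (x1, y1) is the top-left corner.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


# Hypothetical response text, not an actual output of this model.
sample = '{"bbox_2d": [120, 45, 310, 200], "label": "TSS"}'
print(extract_first_box(sample))
```

Whether the coordinates are absolute pixels or relative to a fixed grid depends on the model's training format, which should be checked against real outputs before the boxes are drawn on an image.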

Model Performance

This model was trained using GRPO to optimize for:

  1. Accurate bounding box prediction (IoU score)
  2. Efficient component search (minimal steps)
  3. Centered component detection (component in crop center)
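
The three objectives above can be sketched as scalar reward terms. IoU has a standard definition; the centering and efficiency formulas and the weights used to combine them are assumptions chosen for illustration, since the card names the signals but does not define them. Boxes and crops are `(x1, y1, x2, y2)` tuples.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centering_score(box, crop):
    """ASSUMED form: 1.0 when the box center sits at the crop center,
    falling linearly to 0.0 at the distance of a crop corner."""
    box_cx, box_cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    crop_cx, crop_cy = (crop[0] + crop[2]) / 2, (crop[1] + crop[3]) / 2
    half_diag = math.hypot(crop[2] - crop[0], crop[3] - crop[1]) / 2
    if half_diag == 0:
        return 0.0
    dist = math.hypot(box_cx - crop_cx, box_cy - crop_cy)
    return max(0.0, 1.0 - dist / half_diag)


def efficiency_score(steps_used, max_steps):
    """ASSUMED form: 1.0 for a single step, decreasing linearly to 0.0
    when the full step budget is used."""
    if max_steps <= 1:
        return 1.0
    return max(0.0, 1.0 - (steps_used - 1) / (max_steps - 1))


def total_reward(pred, truth, crop, steps_used, max_steps,
                 weights=(1.0, 0.2, 0.2)):
    # ASSUMPTION: a weighted sum with hypothetical weights.
    w_iou, w_eff, w_center = weights
    return (w_iou * iou(pred, truth)
            + w_eff * efficiency_score(steps_used, max_steps)
            + w_center * centering_score(truth, crop))


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(total_reward((0, 0, 10, 10), (0, 0, 10, 10), (-20, -20, 30, 30), 1, 5))
```

Giving IoU the largest weight keeps localization accuracy as the dominant signal, with the efficiency and centering terms acting as tie-breakers between equally accurate trajectories.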

Limitations

  • Trained specifically on electrical SLD diagrams
  • Best performance on components similar to training data
  • Requires high-resolution input images for accurate detection

Citation

@misc{qwen3_vl_grpo_sld,
  title = {qwen3_vl_8b_grpo_sld},
  author = {SLD Training Team},
  year = {2024},
  note = {GRPO-trained Qwen3-VL-8B for SLD component detection}
}

License

Apache 2.0
