DiVE-k QWEN2.5-7B (CUB)

Overview

DiVE-k QWEN2.5-7B-CUB is a vision-language model fine-tuned with DiVE-k (Differential Visual Reasoning using Top-k Generations) for fine-grained visual recognition on CUB.

DiVE-k reformulates fine-grained image classification as a differential reasoning problem. Instead of training the model to predict a single label, it leverages the model’s own top-k predictions to construct a multiple-choice reasoning task. The model is then trained using reinforcement learning to select the correct answer among visually similar candidates, encouraging deeper visual discrimination and reasoning.
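
The construction described above can be illustrated with a minimal sketch. It is not taken from the DiVE-k repository: the helper names `build_mcq` and `choice_reward`, the prompt wording, the rule that swaps the ground-truth label in when it is missing from the top-k list, and the binary exact-match reward are all assumptions made for illustration.

```python
import random
from string import ascii_uppercase


def build_mcq(top_k, true_label, seed=0):
    """Turn a model's top-k label guesses into a multiple-choice question.

    Hypothetical helper: the option format and prompt wording are
    illustrative assumptions, not the DiVE-k repository's format.
    """
    # Deduplicate the guesses while preserving their rank order.
    options = list(dict.fromkeys(top_k))
    # Assumption: if the ground truth is missing, it replaces the
    # lowest-ranked guess so the question always has a correct answer.
    if true_label not in options:
        options = options[:-1] + [true_label]
    # Shuffle so the position of the correct answer carries no signal.
    random.Random(seed).shuffle(options)
    letters = ascii_uppercase[: len(options)]
    lines = [f"{letter}. {name}" for letter, name in zip(letters, options)]
    prompt = (
        "Which species is shown in the image? "
        "Compare the candidates and answer with a single letter.\n"
        + "\n".join(lines)
    )
    answer = letters[options.index(true_label)]
    return prompt, answer


def choice_reward(model_output, answer):
    """Assumed binary reward: 1.0 if the model's letter matches, else 0.0."""
    chosen = model_output.strip().rstrip(".").upper()
    return 1.0 if chosen == answer else 0.0


if __name__ == "__main__":
    # Visually similar candidates, as a top-k list might contain.
    guesses = ["Fish Crow", "American Crow", "Common Raven"]
    prompt, answer = build_mcq(guesses, true_label="American Crow")
    print(prompt)
    print("correct option:", answer)
    print("reward for a correct pick:", choice_reward(answer, answer))
```

Because the options come from the model's own top-k guesses, the distractors are the classes it already confuses with the target, which is what makes the resulting question a test of fine-grained discrimination. The repository linked below remains the reference for the actual prompt format and reward definition.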

This approach improves zero-shot and base-to-novel generalization performance by teaching the model to compare subtle visual differences between competing hypotheses.

The training framework, data construction, and evaluation pipeline are described in detail in the DiVE-k repository.

👉 Source code: https://github.com/raja-kumar/DiVE-k

Example Usage

Please refer to the official DiVE-k GitHub repository for:

Model loading
Inference pipelines
Fine-grained classification setup
Training and evaluation scripts

👉 https://github.com/raja-kumar/DiVE-k

Citation

If you use this model or the DiVE-k framework in your research, please cite:

@misc{kumar2025divekdifferentialvisualreasoning,
  title={DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition},
  author={Raja Kumar and Arka Sadhu and Ram Nevatia},
  year={2025},
  eprint={2511.18305},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Safetensors

Model size: 8B params
Tensor type: BF16

Paper

DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition (arXiv:2511.18305, published Nov 23, 2025)