Cosine-Beta-KD-Task

A 1.7B-parameter multimodal LLM checkpoint distilled with Cosine-KD + Beta-KD (task-level uncertainty weighting), built on MobileVLM with MobileLLaMA-1.4B-Chat as the language backbone.

This checkpoint corresponds to the Beta-KD (Task) row of the model zoo in Beta-KD: Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models.

Model Details

| Item | Value |
|---|---|
| Architecture | MobileVLM (CLIP visual encoder + LDP projector + MobileLLaMA LLM) |
| Language model | MobileLLaMA 1.4B |
| Distillation losses | Cosine-KD (logit alignment) + Beta-KD task-level uncertainty loss |
| Training step | checkpoint-18000 |
| Total params | ~1.7B |
| Precision | fp16 |
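
As a rough illustration of what "logit alignment" with a cosine objective can look like, the sketch below computes a cosine-distance loss between student and teacher logits for a single token. This is an assumption about the general shape of such a loss, not the definition used in the Beta-KD paper; the function name `cosine_kd_loss` and the `eps` parameter are hypothetical. The Beta-KD task-level uncertainty term is not sketched because its form is not given in this card.

```python
import math


def cosine_kd_loss(student_logits, teacher_logits, eps=1e-8):
    """Hypothetical per-token loss: 1 - cosine similarity of the two logit vectors.

    Assumption: "logit alignment" means matching the direction of the logit
    vectors. The actual Cosine-KD loss may differ (e.g. in normalization,
    temperature, or how tokens are averaged).
    """
    dot = sum(s * t for s, t in zip(student_logits, teacher_logits))
    s_norm = math.sqrt(sum(s * s for s in student_logits))
    t_norm = math.sqrt(sum(t * t for t in teacher_logits))
    # eps guards against division by zero for all-zero logit vectors.
    return 1.0 - dot / max(s_norm * t_norm, eps)


# Logits pointing in the same direction give a loss near 0, regardless of scale.
aligned = cosine_kd_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Logits pointing in opposite directions give a loss near 2.
opposed = cosine_kd_loss([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
```

A loss of this shape is invariant to the scale of the logits, so it penalizes only disagreement in direction between student and teacher.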

Evaluation

Evaluated on six standard multimodal benchmarks with greedy decoding (no beam search), matching the chat-demo behavior.

| Method | LLM | MMEP | MMEA | GQA | VQAT | POPE | MMBdev | SQAI | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Cosine-KD baseline | MobileLLaMA 1.4B | 1308.4 | 65.4 | 59.9 | 52.2 | 84.6 | 57.1 | 61.3 | 63.4 |
| + Beta-KD (Task) (this model) | MobileLLaMA 1.4B | 1352.0 | 67.6 | 60.8 | 53.9 | 85.4 | 59.1 | 61.2 | 64.7 |
| + Beta-KD (Instance) | MobileLLaMA 1.4B | 1350.3 | 67.5 | 61.2 | 54.2 | 86.0 | 60.2 | 62.9 | 65.3 |
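
The card does not say how the Avg. column is computed. The reported numbers are consistent with two conventions: Avg. is the plain mean of the six columns MMEA through SQAI, and MMEA is MMEP divided by 20. Both are inferred from the numbers, not stated by the authors. The short check below reproduces them from the table values; the helper name `check_row` is hypothetical.

```python
# Scores copied from the evaluation table above.
ROWS = {
    "Cosine-KD baseline": {
        "MMEP": 1308.4, "MMEA": 65.4, "GQA": 59.9, "VQAT": 52.2,
        "POPE": 84.6, "MMBdev": 57.1, "SQAI": 61.3, "Avg.": 63.4,
    },
    "+ Beta-KD (Task)": {
        "MMEP": 1352.0, "MMEA": 67.6, "GQA": 60.8, "VQAT": 53.9,
        "POPE": 85.4, "MMBdev": 59.1, "SQAI": 61.2, "Avg.": 64.7,
    },
    "+ Beta-KD (Instance)": {
        "MMEP": 1350.3, "MMEA": 67.5, "GQA": 61.2, "VQAT": 54.2,
        "POPE": 86.0, "MMBdev": 60.2, "SQAI": 62.9, "Avg.": 65.3,
    },
}

# Assumption: these six columns enter the average; MMEP (raw scale) does not.
AVG_COLUMNS = ["MMEA", "GQA", "VQAT", "POPE", "MMBdev", "SQAI"]


def check_row(scores):
    """Return (recomputed average, MMEP / 20), each rounded to one decimal."""
    mean = sum(scores[c] for c in AVG_COLUMNS) / len(AVG_COLUMNS)
    return round(mean, 1), round(scores["MMEP"] / 20, 1)


for method, scores in ROWS.items():
    avg, mme_scaled = check_row(scores)
    print(f"{method}: avg={avg} (reported {scores['Avg.']}), "
          f"MMEP/20={mme_scaled} (reported MMEA {scores['MMEA']})")
```

Under this reading, the task-level variant improves the average by 1.3 points over the Cosine-KD baseline (63.4 to 64.7).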

Usage

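import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "jsun39/Cosine-Beta-KD-Task"

# trust_remote_code=True allows transformers to run custom code shipped in the repo.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# Load the fp16 weights, then move the model to the GPU (requires a CUDA device).
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()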

For full inference (image + text), follow the inference example in the Beta-KD repo, which describes the visual encoder / projector loading, image preprocessing, and chat template.

Files

This repo contains only the files needed for inference:

  • pytorch_model.bin — fp16 weights
  • config.json, generation_config.json
  • tokenizer.model, tokenizer_config.json, special_tokens_map.json

DeepSpeed optimizer / RNG / trainer states are intentionally not uploaded.
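
After downloading the repo to a local directory, a quick check like the one below can confirm that every file listed above is present before loading. It is a minimal sketch: the function name `missing_files` and the example path are hypothetical, and only the file names come from this card.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "generation_config.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
]


def missing_files(checkpoint_dir):
    """Return the expected file names that are not present in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("./Cosine-Beta-KD-Task")` returns an empty list when the local copy is complete.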

Citation

@article{sun2026betakd,
  title   = {Beta-KD: Uncertainty-Aware Knowledge Distillation for Multimodal
             Large Language Models},
  author  = {Sun, Jingchen and Han, Shaobo and Patel, Deep and Kohno, Wataru and Jin, Can and Chen, Changyou},
  journal = {CVPR},
  year    = {2026}
}

License

Released under the Apache-2.0 license, inheriting from MobileVLM and MobileLLaMA. The visual encoder and any third-party data follow their original licenses.
