Request for Benchmark Evaluation

#1
by ginigen-ai - opened

Please enter the name of the model you would like us to evaluate.

LiquidAI/LFM2.5-1.2B-Thinking
LiquidAI/LFM2.5-1.2B-Instruct
LiquidAI/LFM2.5-1.2B-JP
LiquidAI/LFM2.5-1.2B-Base

And

Qwen/Qwen3.5-27B
Qwen/Qwen3.5-9B
Qwen/Qwen3.5-9B-Base
Qwen/Qwen3.5-4B
Qwen/Qwen3.5-4B-Base
Qwen/Qwen3.5-2B
Qwen/Qwen3.5-2B-Base
Qwen/Qwen3.5-0.8B
Qwen/Qwen3.5-0.8B-Base

Thx! Or is it possible to release the eval code so we can run the experiments ourselves? 😀

Can you please evaluate the following model:

Alibaba-Apsara/DASD-4B-Thinking

Thank you both for the suggestions!

We've added all requested models to our evaluation queue:

  • LiquidAI/LFM2.5-1.2B (Instruct & Thinking)
  • Qwen/Qwen3.5 series (0.8B, 2B, 4B)
  • Alibaba-Apsara/DASD-4B-Thinking

Priority models will be evaluated and added to the leaderboard
in the coming updates.

On eval code: We're working on a public evaluation pipeline
that cleanly separates the grading logic from answer keys.
Will share when it's ready.
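A split like that could look roughly like the following. This is only a minimal sketch of the idea, not the actual pipeline; the function names and data shapes here are hypothetical. The grader takes the answer key as an argument, so the key can stay private while a published hash lets anyone verify that scores were computed against the same key.

```python
import hashlib
import json


def grade(predictions: dict, answer_key: dict) -> float:
    """Score predictions against a separately held answer key.

    Both arguments map question IDs to answer strings; the key never
    needs to live in the same repo as this grading logic.
    """
    correct = sum(
        1
        for qid, answer in answer_key.items()
        if predictions.get(qid, "").strip().lower() == answer.strip().lower()
    )
    return correct / len(answer_key)


def key_fingerprint(answer_key: dict) -> str:
    """Publish a hash of the private key so results stay verifiable
    without releasing the key itself."""
    blob = json.dumps(answer_key, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

With this layout, the grading code can be open-sourced as-is, while the answer key is distributed only to the leaderboard maintainers (or committed as a hash for later verification).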

NEW LISTING: Llama-3.2-1B

Hi! Can you evaluate the GRM family?
OrionLLM/GRM-7b
OrionLLM/GRM-1.5b
