Request for Benchmark Evaluation
Please enter the name of the model you would like us to evaluate.
LiquidAI/LFM2.5-1.2B-Thinking
LiquidAI/LFM2.5-1.2B-Instruct
LiquidAI/LFM2.5-1.2B-JP
LiquidAI/LFM2.5-1.2B-Base
And
Qwen/Qwen3.5-27B
Qwen/Qwen3.5-9B
Qwen/Qwen3.5-9B-Base
Qwen/Qwen3.5-4B
Qwen/Qwen3.5-4B-Base
Qwen/Qwen3.5-2B
Qwen/Qwen3.5-2B-Base
Qwen/Qwen3.5-0.8B
Qwen/Qwen3.5-0.8B-Base
Thanks! Or would it be possible to release the eval code so we can run the experiments ourselves?
Can you please evaluate the following model:
Alibaba-Apsara/DASD-4B-Thinking
Thank you both for the suggestions!
We've added all requested models to our evaluation queue:
- LiquidAI/LFM2.5-1.2B (Instruct & Thinking)
- Qwen/Qwen3.5 series (0.8B, 2B, 4B)
- Alibaba-Apsara/DASD-4B-Thinking
Priority models will be evaluated and added to the leaderboard in upcoming updates.
On eval code: we're working on a public evaluation pipeline that cleanly separates the grading logic from the answer keys. We'll share it when it's ready.
Hi! Can you evaluate the GRM family?
OrionLLM/GRM-7b
OrionLLM/GRM-1.5b