OpenCompass

community

https://opencompass.org.cn/

OpenCompassX

open-compass



Recent Activity


Papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward


nebulae09

authored 3 papers 2 days ago

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Paper • 2509.24709 • Published Sep 29 • 6

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published 19 days ago • 15

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Paper • 2512.05111 • Published 3 days ago • 40

yuhangzang

authored a paper 2 days ago

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Paper • 2512.05111 • Published 3 days ago • 40

Sudanl

authored a paper 4 days ago

How Far Are We from Genuinely Useful Deep Research Agents?

Paper • 2512.01948 • Published 6 days ago • 50

jnanliu

authored 2 papers 5 days ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published 26 days ago • 2

Rectifying LLM Thought from Lens of Optimization

Paper • 2512.01925 • Published 6 days ago • 23

Sudanl

updated 3 models 11 days ago

opencompass/CompassVerifier-32B

33B • Updated 11 days ago • 29 • 7

opencompass/CompassVerifier-7B

8B • Updated 11 days ago • 948 • 4

opencompass/CompassVerifier-3B

3B • Updated 11 days ago • 736 • 5

vansin

in opencompass/RISEBench_Gallery 12 days ago

CPU quota limit, can't start

#1 opened 12 days ago by

yuhangzang

authored 2 papers 13 days ago

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation

Paper • 2510.11063 • Published Oct 13 • 1

Think Visually, Reason Textually: Vision-Language Synergy in ARC

Paper • 2511.15703 • Published 18 days ago • 8

Sudanl

authored a paper 16 days ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published 26 days ago • 2

jnanliu

authored a paper 17 days ago

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published 19 days ago • 15

Sudanl

updated a Space 18 days ago

ATLAS Benchmark

ATLAS for Frontier Scientific Benchmark

Shz

published a dataset 18 days ago

opencompass/ATLAS

Viewer • Updated 18 days ago • 798 • 52

Shz

updated a dataset 18 days ago

opencompass/ATLAS

Viewer • Updated 18 days ago • 798 • 52

Sudanl

authored a paper 18 days ago

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published 19 days ago • 15

Sudanl

published a Space 19 days ago

ATLAS Benchmark

ATLAS for Frontier Scientific Benchmark