SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence Paper • 2512.22334 • Published 13 days ago • 32
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 21 days ago • 111
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards Paper • 2510.08529 • Published Oct 9, 2025 • 18
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines Paper • 2509.21320 • Published Sep 25, 2025 • 101
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published Sep 9, 2025 • 31
Transition Models: Rethinking the Generative Learning Objective Paper • 2509.04394 • Published Sep 4, 2025 • 28
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28, 2025 • 140
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published Aug 25, 2025 • 49
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery Paper • 2508.14111 • Published Aug 18, 2025 • 33
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12, 2025 • 73
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 131
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published May 22, 2025 • 120
Towards Physically Plausible Video Generation via VLM Planning Paper • 2503.23368 • Published Mar 30, 2025 • 40
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing Paper • 2503.04629 • Published Mar 6, 2025 • 19
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback Paper • 2501.03916 • Published Jan 7, 2025 • 16
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 19