- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning (arXiv:2507.01006, published Jul 1)
- SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks (arXiv:2507.01001, published Jul 1)
- LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? (arXiv:2506.11928, published Jun 13)
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner (arXiv:2506.09003, published Jun 10)
- AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy (arXiv:2506.13284, published Jun 16)
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (arXiv:2506.01939, published Jun 2)
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention (arXiv:2506.13585, published Jun 16)
- Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure (arXiv:2506.12278, published Jun 13)
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding (arXiv:2501.12380, published Jan 21)