SWE-Factory/DeepSWE-Agent-Kimi-K2-Trajectories-Rejection-Sampling Viewer • Updated 17 days ago • 729 • 10
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations Paper • 2507.22968 • Published Jul 30, 2025 • 24
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks Paper • 2506.10954 • Published Jun 12, 2025 • 52
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution Paper • 2505.04606 • Published May 7, 2025 • 9