The Smol Training Playbook 📚 • The secrets to building world-class LLMs
LLM-in-Sandbox Elicits General Agentic Intelligence Paper • 2601.16206 • Published 29 days ago • 84
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 29 days ago • 90
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition Paper • 2507.18130 • Published Jul 24, 2025 • 1
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol Paper • 2510.01260 • Published Sep 25, 2025 • 3
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 176
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20, 2025 • 43