AlexCuadron/SWE-Bench-Verified-O1-native-tool-calling-reasoning-high-results • Updated Jan 14, 2025 • 500 • 1.33k • 3
Open LLM Leaderboard 🏆 • 13.8k • Track, rank and evaluate open LLMs and chatbots