QueEn: A Large Language Model for Quechua-English Translation Paper • 2412.05184 • Published Dec 6, 2024 • 1
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025 • 126
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 174
ARE: Scaling Up Agent Environments and Evaluations Paper • 2509.17158 • Published Sep 21, 2025 • 35
ReAct: Synergizing Reasoning and Acting in Language Models Paper • 2210.03629 • Published Oct 6, 2022 • 31
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10, 2025 • 15
Small Language Models are the Future of Agentic AI Paper • 2506.02153 • Published Jun 2, 2025 • 23
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First Paper • 2509.00997 • Published Aug 31, 2025 • 2
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation Paper • 2407.04822 • Published Jul 5, 2024 • 5
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 20
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 160
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Paper • 2304.08244 • Published Apr 14, 2023 • 1
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? Paper • 2508.01780 • Published Aug 3, 2025 • 20
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published Aug 21, 2025 • 46