MixEval

Community organization. Website: https://mixeval.github.io/

Members: NiJinjie, Psycoy

AI & ML interests

LLM & LMM evaluation

Recent Activity

yuexiang96 authored three papers (8 days before this page was captured):

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Simulating Environments with Reasoning Models for Agent Training


Models: 0

None public yet.

Datasets: 2

MixEval/MixEval-X

Updated Feb 15 • 7.68k rows • 415 downloads • 10 likes

MixEval/MixEval

Updated Sep 27, 2024 • 5k rows • 424 downloads • 24 likes