Paul Jones

pauljones0

AI & ML interests

the password is cheese

Recent Activity

liked a model about 2 months ago

OmniSVG/OmniSVG

commented on a paper 9 months ago

$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

commented on a paper 10 months ago

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs

Organizations

None yet

upvoted a paper 10 months ago

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Paper • 2502.19414 • Published Feb 26, 2025 • 20