DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published 8 days ago • 53
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published Apr 14 • 84
Learning to Identify Critical States for Reinforcement Learning from Videos Paper • 2308.07795 • Published Aug 15, 2023 • 7
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records Paper • 2308.14089 • Published Aug 27, 2023 • 30