Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities • Paper • 2505.02567 • Published 9 days ago • 67
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers • Paper • 2504.20752 • Published 15 days ago • 86
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play • Paper • 2505.02707 • Published 9 days ago • 79
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation • Paper • 2505.04512 • Published 7 days ago • 33
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • 2505.03335 • Published 9 days ago • 135
Step1X-Edit: A Practical Framework for General Image Editing • Paper • 2504.17761 • Published 20 days ago • 88
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? • Paper • 2504.13837 • Published 26 days ago • 122
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation • Paper • 2504.14538 • Published 25 days ago • 28
Clinical knowledge in LLMs does not translate to human interactions • Paper • 2504.18919 • Published 18 days ago • 24
Reinforcement Learning for Reasoning in Large Language Models with One Training Example • Paper • 2504.20571 • Published 16 days ago • 90
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model • Paper • 2504.08685 • Published Apr 11 • 123
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published about 1 month ago • 255
OmniSVG: A Unified Scalable Vector Graphics Generation Model • Paper • 2504.06263 • Published Apr 8 • 160
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens • Paper • 2504.07096 • Published Apr 9 • 73
Wan: Open and Advanced Large-Scale Video Generative Models • Paper • 2503.20314 • Published Mar 26 • 51