DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents Paper • 2510.19336 • Published Oct 22, 2025 • 16
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12, 2025 • 73
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1, 2025 • 95
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1, 2025 • 67
Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens Paper • 2503.08377 • Published Mar 11, 2025 • 2
PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control Paper • 2412.01223 • Published Dec 2, 2024 • 1