Shao

Castielll

AI & ML interests

None yet

Organizations

None yet

upvoted 10 papers over 1 year ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29, 2024 • 53

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 63

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 50

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

Paper • 2406.12428 • Published Jun 18, 2024 • 1

Stable Audio Open

Paper • 2407.14358 • Published Jul 19, 2024 • 26

Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16, 2024 • 27

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11, 2024 • 16

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 48

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Paper • 2404.05674 • Published Apr 8, 2024 • 15