Generate text based on images and text input
Demo of a collection of Qwen3-VL models
Chat with multimodal gemma-3-12b-it or gemma-3-4b-it models