Demo of a collection of multimodal VLMs on Hugging Face [OCR / others]
Generate new person images with swapped clothes or poses