eltorio/ROCO-radiology
A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.
This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling zero-shot classification and semantic search for radiology concepts.
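import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
# Load the fine-tuned weights; the processor comes from the base CLIP checkpoint
model = CLIPModel.from_pretrained("spicy03/CLIP-ROCO-v1")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()
# Predict
image = Image.open("chest_xray.jpg").convert("RGB")  # CLIP expects 3-channel input
labels = ["Pneumonia", "Normal", "Edema"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# Softmax over the label axis gives one probability per candidate label
probs = outputs.logits_per_image.softmax(dim=1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")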
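The semantic search use case mentioned above can be sketched as ranking stored image embeddings by cosine similarity to a text query embedding. This is a minimal illustration rather than the model author's method. The helper names `cosine_similarity` and `rank_images`, the file names, and the three-dimensional toy vectors are all hypothetical. In practice the embeddings would presumably come from the model's `get_text_features` and `get_image_features` methods, converted to plain lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the query first."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: real CLIP ViT-B/32 embeddings are 512-dimensional.
toy_index = {
    "chest_xray.jpg": [0.9, 0.1, 0.0],
    "brain_mri.jpg": [0.0, 0.2, 0.9],
    "abdomen_ct.jpg": [0.5, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a text query

results = rank_images(query, toy_index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Cosine similarity is used here because CLIP compares normalized image and text embeddings. For a large collection, a vector index would likely replace the linear scan.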
Base model: openai/clip-vit-base-patch32