# ColGemma3 Base Model

This model is based on `google/gemma-3-4b-it` using the ColGemma3 architecture.
## Model Details
- Base Model: google/gemma-3-4b-it
- Model Type: ColGemma3 (Multi-vector late interaction retrieval)
- Embedding Dimension: 128
- Projection Layer: Randomly initialized (requires training)
## Architecture

ColGemma3 extends Gemma3Model for late interaction retrieval:

- Custom projection layer: hidden_size → 128 dimensions
- L2 normalization per token
- MaxSim scoring for multi-vector retrieval (sketched below)
- Supports both image and text inputs
**Note:** The projection layer (`custom_text_proj`) is randomly initialized and needs to be trained for actual use.
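For reference, MaxSim can be expressed in a few lines of PyTorch. This is a minimal sketch, not the implementation behind `processor.score`; the `maxsim` name and the assumption of equal-length, padded, per-token L2-normalized inputs are illustrative only.

```python
import torch

def maxsim(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late-interaction MaxSim scores.

    query_emb: (num_queries, q_len, dim); doc_emb: (num_docs, d_len, dim),
    both L2-normalized per token. Returns (num_queries, num_docs).
    Padding/attention masks are ignored here for brevity.
    """
    # Token-level similarities: (num_queries, num_docs, q_len, d_len)
    sim = torch.einsum("qnd,pmd->qpnm", query_emb, doc_emb)
    # For each query token, keep its best-matching document token, then sum.
    return sim.max(dim=3).values.sum(dim=2)
```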
## Usage

```python
import torch
from PIL import Image

from colpali_engine.models import ColGemma3, ColGemmaProcessor3

# Load model and processor
model = ColGemma3.from_pretrained(
    "Nayana-cognitivelab/colgemma",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = ColGemmaProcessor3.from_pretrained("Nayana-cognitivelab/colgemma")

# Process images
images = [Image.open("doc.png")]
batch_images = processor.process_images(images).to(model.device)

# Process queries
queries = ["What is this document about?"]
batch_queries = processor.process_queries(queries).to(model.device)

# Generate multi-vector embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)  # (batch, seq_len, 128)
    query_embeddings = model(**batch_queries)  # (batch, seq_len, 128)

# Compute similarity scores using MaxSim
scores = processor.score(query_embeddings, image_embeddings)
print(scores)  # (num_queries, num_images)
```
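To turn the score matrix into a ranking, sort each row. The snippet below is a hypothetical continuation of the example above, assuming `scores` is a plain tensor:

```python
# Rank images for each query, best MaxSim score first.
ranking = scores.argsort(dim=1, descending=True)
print(ranking[:, 0])  # index of the top-ranked image per query
```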
## Training

This model serves as a starting point for ColGemma3 training: the projection layer needs to be trained on your retrieval task (see the sketch below).
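As an illustration of what that training could look like, here is a minimal, hypothetical in-batch contrastive step over MaxSim scores, reusing the `maxsim` sketch from the Architecture section. It is not the colpali-engine training recipe; the `training_step` helper and the assumption that query i's positive document is document i in the batch are illustrative only.

```python
import torch
import torch.nn.functional as F

def training_step(model, batch_queries, batch_images, optimizer):
    # Multi-vector embeddings for a batch of aligned (query, document) pairs.
    query_emb = model(**batch_queries)   # (B, q_len, 128)
    doc_emb = model(**batch_images)      # (B, d_len, 128)
    scores = maxsim(query_emb, doc_emb)  # (B, B) late-interaction scores
    # In-batch softmax contrastive loss: each query's positive document
    # sits on the diagonal of the score matrix.
    labels = torch.arange(scores.size(0), device=scores.device)
    loss = F.cross_entropy(scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```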
## Citation

```bibtex
@misc{colpali2024,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Faysse, Manuel and others},
  year={2024},
}
```