
ColGemma3 Base Model

This model is based on google/gemma-3-4b-it using the ColGemma3 architecture.

Model Details

  • Base Model: google/gemma-3-4b-it
  • Model Type: ColGemma3 (Multi-vector late interaction retrieval)
  • Embedding Dimension: 128
  • Projection Layer: Randomly initialized (requires training)

Architecture

ColGemma3 extends Gemma3Model for late interaction retrieval:

  • Custom projection layer: hidden_size → 128 dimensions
  • L2 normalization per token
  • MaxSim scoring for multi-vector retrieval
  • Supports both image and text inputs

Note: The projection layer (custom_text_proj) is randomly initialized and needs to be trained for actual use.
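
The three steps listed above (projection, per-token L2 normalization, MaxSim scoring) can be illustrated with a minimal PyTorch sketch. This is not the ColGemma3 implementation: the helper names project_and_normalize and maxsim_scores are hypothetical, the hidden size of 64 is a toy value, and padding masks are ignored for brevity.

```python
import torch
import torch.nn.functional as F


def project_and_normalize(hidden_states, proj):
    """Map (batch, seq_len, hidden_size) states to unit-norm (batch, seq_len, dim) embeddings."""
    return F.normalize(proj(hidden_states), p=2, dim=-1)


def maxsim_scores(query_emb, doc_emb):
    """Late-interaction scores: (nq, lq, d) x (nd, ld, d) -> (nq, nd)."""
    # Dot product between every query token and every document token.
    sim = torch.einsum("qid,pjd->qpij", query_emb, doc_emb)
    # Best-matching document token per query token, summed over query tokens.
    return sim.max(dim=-1).values.sum(dim=-1)


torch.manual_seed(0)
hidden_size, dim = 64, 128  # hidden_size is a toy value, not the real model's
proj = torch.nn.Linear(hidden_size, dim)  # randomly initialized, like custom_text_proj

q = project_and_normalize(torch.randn(2, 5, hidden_size), proj)
d = project_and_normalize(torch.randn(3, 7, hidden_size), proj)
scores = maxsim_scores(q, d)  # one row per query, one column per document
```

Because every token embedding has unit norm, each per-token similarity is a cosine similarity, so a query's score is bounded by its number of tokens. With a randomly initialized projection, as in this checkpoint, the scores carry no retrieval signal until the layer is trained.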

Usage

import torch
from PIL import Image
from colpali_engine.models import ColGemma3, ColGemmaProcessor3

# Load model and processor
model = ColGemma3.from_pretrained(
    "Nayana-cognitivelab/colgemma",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = ColGemmaProcessor3.from_pretrained("Nayana-cognitivelab/colgemma")

# Process images
images = [Image.open("doc.png")]
batch_images = processor.process_images(images).to(model.device)

# Process queries
queries = ["What is this document about?"]
batch_queries = processor.process_queries(queries).to(model.device)

# Generate multi-vector embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)  # (batch, seq_len, 128)
    query_embeddings = model(**batch_queries)  # (batch, seq_len, 128)

# Compute similarity scores using MaxSim
scores = processor.score(query_embeddings, image_embeddings)
print(scores)  # (num_queries, num_images)

Training

This model serves as a starting point for ColGemma3 training. The projection layer needs to be trained on your retrieval task.
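
The card does not say which training objective to use. As one possible starting point, the sketch below shows an in-batch contrastive loss over MaxSim scores, a common choice for late-interaction retrievers. The function name in_batch_contrastive_loss, the toy hidden size, and the random tensors standing in for backbone outputs are all assumptions made for illustration; padding masks are omitted.

```python
import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(query_emb, doc_emb):
    """Cross-entropy over MaxSim scores; query i is assumed to match document i.

    query_emb: (batch, lq, d), doc_emb: (batch, ld, d), both L2-normalized per token.
    """
    sim = torch.einsum("qid,pjd->qpij", query_emb, doc_emb)
    scores = sim.max(dim=-1).values.sum(dim=-1)  # (batch, batch)
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)


torch.manual_seed(0)
hidden_size, dim = 64, 128  # toy hidden size, not the real model's
proj = torch.nn.Linear(hidden_size, dim)  # stands in for the untrained projection layer

# Random tensors stand in for backbone hidden states of 4 query/document pairs.
q = F.normalize(proj(torch.randn(4, 5, hidden_size)), dim=-1)
d = F.normalize(proj(torch.randn(4, 7, hidden_size)), dim=-1)

loss = in_batch_contrastive_loss(q, d)
loss.backward()  # gradients reach the projection weights
```

In this setup the other documents in the batch act as negatives for each query, so larger batches give a harder contrast. Whether to train only the projection layer or also fine-tune the backbone is a separate decision that the card leaves open.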

Citation

@misc{colpali2024,
    title={ColPali: Efficient Document Retrieval with Vision Language Models},
    author={Manuel Faysse et al.},
    year={2024},
}