# ColGemma3 Base Model

This model is based on `google/gemma-3-4b-it` using the ColGemma3 architecture.
## Model Details
- Base Model: google/gemma-3-4b-it
- Model Type: ColGemma3 (Multi-vector late interaction retrieval)
- Embedding Dimension: 128
- Projection Layer: Randomly initialized (requires training)
## Architecture

ColGemma3 extends Gemma3Model for late interaction retrieval:

- Custom projection layer: hidden_size → 128 dimensions
- L2 normalization per token
- MaxSim scoring for multi-vector retrieval (sketched below)
- Supports both image and text inputs
**Note:** The projection layer (`custom_text_proj`) is randomly initialized and needs to be trained for actual use.
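For reference, MaxSim can be expressed in a few lines of PyTorch. This is a minimal sketch, not the implementation behind `processor.score`; the `maxsim` name and the assumption of equal-length, padded, per-token L2-normalized inputs are illustrative only.

```python
import torch

def maxsim(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late-interaction MaxSim scores.

    query_emb: (num_queries, q_len, dim); doc_emb: (num_docs, d_len, dim),
    both L2-normalized per token. Returns (num_queries, num_docs).
    Padding/attention masks are ignored here for brevity.
    """
    # Token-level similarities: (num_queries, num_docs, q_len, d_len)
    sim = torch.einsum("qnd,pmd->qpnm", query_emb, doc_emb)
    # For each query token, keep its best-matching document token, then sum.
    return sim.max(dim=3).values.sum(dim=2)
```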
## Usage

```python
import torch
from PIL import Image

from colpali_engine.models import ColGemma3, ColGemmaProcessor3

# Load model and processor
model = ColGemma3.from_pretrained(
    "Nayana-cognitivelab/colgemma",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = ColGemmaProcessor3.from_pretrained("Nayana-cognitivelab/colgemma")

# Process images
images = [Image.open("doc.png")]
batch_images = processor.process_images(images).to(model.device)

# Process queries
queries = ["What is this document about?"]
batch_queries = processor.process_queries(queries).to(model.device)

# Generate multi-vector embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)  # (batch, seq_len, 128)
    query_embeddings = model(**batch_queries)  # (batch, seq_len, 128)

# Compute similarity scores using MaxSim
scores = processor.score(query_embeddings, image_embeddings)
print(scores)  # (num_queries, num_images)
```
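To turn the score matrix into a ranking, sort each row. The snippet below is a hypothetical continuation of the example above, assuming `scores` is a plain tensor:

```python
# Rank images for each query, best MaxSim score first.
ranking = scores.argsort(dim=1, descending=True)
print(ranking[:, 0])  # index of the top-ranked image per query
```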
## Training

This model serves as a starting point for ColGemma3 training: the projection layer needs to be trained on your retrieval task (see the sketch below).
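As an illustration of what that training could look like, here is a minimal, hypothetical in-batch contrastive step over MaxSim scores, reusing the `maxsim` sketch from the Architecture section. It is not the colpali-engine training recipe; the `training_step` helper and the assumption that query i's positive document is document i in the batch are illustrative only.

```python
import torch
import torch.nn.functional as F

def training_step(model, batch_queries, batch_images, optimizer):
    # Multi-vector embeddings for a batch of aligned (query, document) pairs.
    query_emb = model(**batch_queries)   # (B, q_len, 128)
    doc_emb = model(**batch_images)      # (B, d_len, 128)
    scores = maxsim(query_emb, doc_emb)  # (B, B) late-interaction scores
    # In-batch softmax contrastive loss: each query's positive document
    # sits on the diagonal of the score matrix.
    labels = torch.arange(scores.size(0), device=scores.device)
    loss = F.cross_entropy(scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```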
## Citation

```bibtex
@misc{colpali2024,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Faysse, Manuel and others},
  year={2024},
}
```