jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 8 days ago • 21
Article ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models? 6 days ago • 16
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 6 days ago • 16
ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models Paper • 2602.16609 • Published 7 days ago • 6
OriOn Collection Visual long document VLMs based on Mistral-Small-3.1-24B-Instruct-2503 and Qwen3-VL-32B-Instruct • 5 items • Updated 6 days ago • 4
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling 13 days ago • 46
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 6 days ago • 13
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 82
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+, and also with the CrossEncoderNanoBEIREvaluator if a bm25 column is present • 18 items • Updated 26 days ago • 14
Article TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval Dec 4, 2025 • 19
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 73
Simple Projection Variants Improve ColBERT Performance Paper • 2510.12327 • Published Oct 14, 2025 • 7
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report Paper • 2510.14880 • Published Oct 16, 2025 • 19
Article Welcome EmbeddingGemma, Google's new efficient embedding model Sep 4, 2025 • 273
Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published Jul 15, 2025 • 31
BioClinical ModernBERT Collection This project was a collaboration between members of the Dana-Farber Cancer Institute, LightOn, MIT, OpenEvidence and Microsoft. • 3 items • Updated Sep 9, 2025 • 11