RULER: What's the Real Context Size of Your Long-Context Language Models? • 2404.06654 • Published Apr 9, 2024
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens • 2603.23516 • Published Mar 2026
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization • 2504.13173 • Published Apr 17, 2025
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding • 2603.22458 • Published Mar 2026
Perception Encoder: The best visual embeddings are not at the output of the network • 2504.13181 • Published Apr 17, 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • 2502.14786 • Published Feb 20, 2025
Learning Transferable Architectures for Scalable Image Recognition • 1707.07012 • Published Jul 21, 2017
CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks • 2401.14109 • Published Jan 25, 2024
A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU • 2305.17473 • Published May 27, 2023
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • 2402.19427 • Published Feb 29, 2024
Short window attention enables long-term memorization • 2509.24552 • Published Sep 29, 2025
Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration • 2602.11937 • Published Feb 12, 2026
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs • 2411.19146 • Published Nov 28, 2024