Papers - Multilingual
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper
• 2304.08999
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper
• 2309.09400
Robust Open-Vocabulary Translation from Visual Text Representations
Paper
• 2104.08211
Poro 34B and the Blessing of Multilinguality
Paper
• 2404.01856
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper
• 2404.04167
One Wide Feedforward is All You Need
Paper
• 2309.01826
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
• 2404.12387
Paper
• 2407.10671
Meltemi: The first open Large Language Model for Greek
Paper
• 2407.20743
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Paper
• 2407.20729
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
Paper
• 2407.20581
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper
• 2308.11466
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper
• 2105.13626
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Paper
• 2103.06874