How to use DeepPavlov/rubert-base-cased with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="DeepPavlov/rubert-base-cased")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased")
```
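The feature-extraction pipeline returns one 768-dimensional vector per subtoken. A minimal sketch of collapsing those into a single sentence embedding by mean pooling (the example sentence and the pooling strategy are illustrative assumptions, not part of the model card):

```python
# Minimal sketch: mean-pool per-token features into one sentence vector.
# The example sentence and pooling choice are illustrative assumptions.
import numpy as np
from transformers import pipeline

pipe = pipeline("feature-extraction", model="DeepPavlov/rubert-base-cased")

# Output shape: [1, num_subtokens, 768] as nested Python lists.
features = pipe("Привет, мир!")
sentence_embedding = np.mean(features[0], axis=0)
print(sentence_embedding.shape)  # (768,)
```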
metadata:

```yaml
language:
- ru
```
# rubert-base-cased
RuBERT (Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT[1].
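The stated architecture can be sanity-checked against the checkpoint's configuration. A small sketch (the commented values are expectations taken from the description above, not guaranteed output):

```python
# Sketch: inspect the checkpoint config to confirm the stated architecture.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("DeepPavlov/rubert-base-cased")
print(config.num_hidden_layers)    # 12 layers, per the description above
print(config.hidden_size)          # 768 hidden units
print(config.num_attention_heads)  # 12 attention heads
print(config.vocab_size)           # size of the Russian subtoken vocabulary
```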
08.11.2021: uploaded the model with MLM and NSP heads
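Since the checkpoint ships with the MLM head, it can be exercised directly through the fill-mask pipeline. A minimal sketch (the Russian example sentence is an illustrative assumption):

```python
# Sketch: exercise the MLM head via the fill-mask pipeline.
# The example sentence is an illustrative assumption.
from transformers import pipeline

fill = pipeline("fill-mask", model="DeepPavlov/rubert-base-cased")

# BERT-style checkpoints use [MASK] as the mask token.
for pred in fill("Москва является столицей [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```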
[1]: Kuratov, Y., Arkhipov, M. (2019). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint arXiv:1905.07213.