How to use jmzk96/PCSciBERT_uncased with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="jmzk96/PCSciBERT_uncased")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("jmzk96/PCSciBERT_uncased")
    model = AutoModelForMaskedLM.from_pretrained("jmzk96/PCSciBERT_uncased")
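As a minimal sketch of how the fill-mask pipeline might be queried, the helper below wraps the pipeline call and tidies its output. The function names `format_predictions` and `top_fillers` are hypothetical and not part of the model card. The sketch assumes the `transformers` library is installed and that the model can be downloaded from the Hub.

```python
def format_predictions(predictions):
    """Reduce fill-mask output dicts to (token, score) pairs for display."""
    return [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in predictions]


def top_fillers(sentence, top_k=5):
    """Return the top_k candidate tokens for the single mask in `sentence`.

    Hypothetical helper: loads the model on every call, which is simple but slow.
    """
    # Imported lazily so that format_predictions can be used without transformers.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="jmzk96/PCSciBERT_uncased")
    # For one mask token, the pipeline returns a list of dicts that include
    # the keys "token_str" and "score".
    return format_predictions(pipe(sentence, top_k=top_k))
```

A call such as `top_fillers("the higgs boson was discovered at the large hadron [MASK].")` would return a list of candidate tokens with their scores; the sentence is only an illustrative input, and `[MASK]` is assumed to be the tokenizer's mask token, as is standard for BERT-style vocabularies.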
PCSciBERT_uncased was initialized from the uncased variant of SciBERT (https://huggingface.co/allenai/scibert_scivocab_uncased) and further pre-trained on text from 1,560,661 arXiv research articles in the physics and computer science domains. The tokenizer for PCSciBERT_uncased uses the same vocabulary as allenai/scibert_scivocab_uncased.
The model was also evaluated on downstream named entity recognition using the adsabs/WIESP2022-NER and CS-NER (https://github.com/jd-coderepos/contributions-ner-cs/tree/main) datasets. Overall, PCSciBERT_uncased achieved higher micro F1 scores than SciBERT (uncased) on both the WIESP (micro F1: 81.54%) and CS-NER (micro F1: 75.67%) datasets.
It improves on the performance of SciBERT (uncased) by 0.26% on the CS-NER test dataset and by 0.8% on the WIESP test dataset.
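For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score is computed from per-label entity counts. The label names and counts are invented for illustration and are not taken from the WIESP or CS-NER evaluations, and this is not necessarily the exact evaluation script used for the scores above.

```python
def micro_f1(counts):
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute F1 once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-label counts (true positives, false positives, false negatives).
toy_counts = {
    "LabelA": {"tp": 8, "fp": 2, "fn": 2},
    "LabelB": {"tp": 2, "fp": 3, "fn": 3},
}
print(f"micro F1 = {micro_f1(toy_counts):.4f}")
```

Because the counts are pooled before averaging, frequent labels influence the score more than rare ones, which is the main difference from a macro-averaged F1.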