CDS-BART
CDS-BART is designed as an easy-to-use tool that lowers the barrier for researchers working on mRNA vaccine and therapeutic development. The model is based on BART and pre-trained on mRNA data spanning nine taxonomic groups from the NCBI RefSeq database. As a BART-based foundation model, it can be fine-tuned for various mRNA downstream tasks such as mRFP expression and mRNA stability prediction (a minimal fine-tuning sketch is given after the notes below).
Model Description
- Developed by: Jadamba Erkhembayar, Sangheon Lee, Hyunjin Shin, Hyekyoung Lee, Jinhee Hong
- Funded by: Mogam Institute for Biomedical Research
- Model type: BART
- Training data: NCBI RefSeq
- License: MIT License
Load tokenizer and model
The following example loads the pre-trained denoising model and its tokenizer. BartModel was pre-trained on a denoising objective and can be used for sequence representation tasks.
from transformers import (
BartTokenizerFast,
BartModel,
)
# Load tokenizer
tokenizer = BartTokenizerFast.from_pretrained("mogam-ai/CDS-BART-denoising")
# Load pre-trained model
model = BartModel.from_pretrained("mogam-ai/CDS-BART-denoising")
Example code
example_sequences = [
'ACGCGAGCGUCAUUUCGCGGGGCAUAUGUA'
]
encoded = tokenizer(
example_sequences,
max_length=850,
truncation=True,
padding="max_length",
return_tensors="pt"
)
output = model(
    input_ids=encoded['input_ids'],
    attention_mask=encoded['attention_mask']
)
hidden_states = output.last_hidden_state
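The hidden states above have shape (batch, sequence_length, hidden_size). To turn them into one fixed-size embedding per sequence, a common choice is masked mean pooling over the token dimension; the sketch below assumes this pooling strategy, which is illustrative rather than prescribed by the model card.
# Masked mean pooling over tokens (illustrative pooling choice;
# padding positions are excluded via the attention mask).
mask = encoded['attention_mask'].unsqueeze(-1).float()   # (batch, seq_len, 1)
embedding = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)                                   # (batch, hidden_size)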
Notes
- The maximum input length of the tokenizer is 850 tokens, which covers mRNA sequences of up to around 4,000 nt.
- Sequence embeddings can be extracted from the model (see the pooling sketch above).
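For downstream tasks such as the mRFP expression and mRNA stability prediction mentioned above, one standard route is to fine-tune with a task head via Hugging Face's BartForSequenceClassification. The sketch below assumes a single-target regression setup with placeholder data and hyperparameters; it is a minimal illustration, not the authors' fine-tuning recipe.
import torch
from transformers import BartTokenizerFast, BartForSequenceClassification

tokenizer = BartTokenizerFast.from_pretrained("mogam-ai/CDS-BART-denoising")

# Attach a freshly initialized regression head to the pre-trained weights.
# num_labels=1 with problem_type="regression" selects an MSE loss.
model = BartForSequenceClassification.from_pretrained(
    "mogam-ai/CDS-BART-denoising",
    num_labels=1,
    problem_type="regression",
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# `sequences` and `targets` stand in for a hypothetical labeled dataset.
sequences = ['ACGCGAGCGUCAUUUCGCGGGGCAUAUGUA']
targets = torch.tensor([[0.7]])   # placeholder target value

batch = tokenizer(
    sequences,
    max_length=850,
    truncation=True,
    padding="max_length",
    return_tensors="pt"
)

model.train()
outputs = model(**batch, labels=targets)   # MSE loss against the targets
outputs.loss.backward()
optimizer.step()
In practice this single step would be wrapped in a loop over a DataLoader and run for multiple epochs.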
Model Sources
- Repository: https://github.com/mogam-ai/CDS-BART
- Paper: to be added