How to use
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M-CDS",
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")
embedding = model(**inputs).last_hidden_state  # [1, sequence_length, 1280]
Model Variants
The following base_model options are available for embedding generation. Pass either the short name (a key) or the full model name (a value) as the base_model argument.
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}
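To illustrate how a short name and a full model name refer to the same checkpoint, here is a minimal sketch that normalizes either form to the full name. The helper `resolve_base_model` is hypothetical and is not part of the wrapper; the wrapper's own resolution logic may differ.

```python
# Mapping copied from the variants table in this README.
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}


def resolve_base_model(name: str) -> str:
    """Hypothetical helper: map a short name or a full name to the full name."""
    if name in VARIANTS:
        # Short name (a key): look up the full model name.
        return VARIANTS[name]
    if name in VARIANTS.values():
        # Already a full model name (a value): return it unchanged.
        return name
    raise ValueError(
        f"Unknown base_model {name!r}; expected one of "
        f"{sorted(VARIANTS)} or {sorted(VARIANTS.values())}"
    )


print(resolve_base_model("aido_rna_650m_cds"))
print(resolve_base_model("genbio-ai/AIDO.RNA-1.6B"))
```

Checking a name this way before calling from_pretrained surfaces a typo as a clear error rather than a failed download.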
Performance Vs Original AIDO.RNA Models
The two snippets below check that the wrapper produces the same embeddings as the original AIDO.RNA models. Both print the same summary statistics for the same input sequence.
Original AIDO.RNA code snippet:
import torch
from modelgenerator.tasks import Embed

model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
transformed_batch = model.transform({"sequences": [dna]})
embedding = model(transformed_batch)  # [1, sequence_length, 1280]

# Pool over the sequence dimension, then average over the hidden dimension.
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean))  # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
Modified code snippet using the wrapper:
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M",
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")
embedding = model(**inputs).last_hidden_state  # [1, sequence_length, 1280]

# Pool over the sequence dimension, then average over the hidden dimension.
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean))  # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
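The printed summaries above agree to four decimal places. For a stricter check, the two embedding tensors could be compared element by element. The following is a minimal sketch of such a comparison, assuming both embeddings are available as tensors of shape [1, sequence_length, hidden_size]. The helper names `pooled_summary` and `embeddings_match` and the tolerance of 1e-4 are illustrative assumptions, and random tensors stand in for the real model outputs so the example runs without downloading any weights.

```python
import torch


def pooled_summary(embedding: torch.Tensor) -> tuple[float, float]:
    """Return the two scalars printed above: the mean of the mean-pooled
    embedding and the mean of the max-pooled embedding."""
    mean_pooled = embedding.mean(dim=1)        # [batch, hidden_size]
    max_pooled = embedding.max(dim=1).values   # [batch, hidden_size]
    return mean_pooled.mean().item(), max_pooled.mean().item()


def embeddings_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-4) -> bool:
    """Element-wise comparison; the tolerance is an assumed value."""
    if a.shape != b.shape:
        return False
    return torch.allclose(a, b, atol=atol)


# Stand-in tensors (assumption): replace these with the embeddings produced
# by the original code and by the wrapper.
torch.manual_seed(0)
original = torch.randn(1, 8, 1280)
wrapped = original.clone()

print(embeddings_match(original, wrapped))
print(pooled_summary(original) == pooled_summary(wrapped))
```

An element-wise check catches differences that averaged statistics can hide, such as token positions shifted by one.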
License Notice
This repository contains modified versions of GenBio AI code. Modifications include:
- Removal of the reliance on the modelgenerator package
- Support for loading specific AIDO.RNA models via the base_model argument
Not all of the original functionality may be preserved. These changes were made to better integrate with the mRNABench framework, which focuses on embedding generation for mRNA sequences. Most of the required code was copied directly from the original GenBio AI repository with minimal changes, so please refer to the original repository for full implementation details.
When using this repository, please adhere to the original license terms of the GenBio AI code. This license can be found in this directory as LICENSE.
Original Repository
The original AIDO.RNA models and code are available at: https://github.com/genbio-ai/ModelGenerator