How to use

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)

model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M-CDS",
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")

embedding = model(**inputs).last_hidden_state # [1, sequence_length, 1280]

Model Variants

The following base models are available for embedding generation. Pass either the short name (a key below) or the full model name (the corresponding value) as the base_model argument.

VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}
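
To illustrate how a short name and a full model name can refer to the same checkpoint, here is a minimal sketch of a lookup over the table above. The function name resolve_base_model is hypothetical and is not part of the wrapper; the wrapper's actual resolution logic may differ.

```python
# Hypothetical helper for illustration only; not part of the wrapper's API.
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}


def resolve_base_model(name: str) -> str:
    """Return the full model name for a short key or an already-full name."""
    if name in VARIANTS:
        # Short name (key): map it to the full model name.
        return VARIANTS[name]
    if name in VARIANTS.values():
        # Already a full model name: return it unchanged.
        return name
    raise ValueError(
        f"Unknown base_model {name!r}; expected one of "
        f"{sorted(VARIANTS)} or {sorted(VARIANTS.values())}"
    )


print(resolve_base_model("aido_rna_650m_cds"))
print(resolve_base_model("genbio-ai/AIDO.RNA-1.6B"))
```

Under this sketch, both spellings of a variant resolve to the same full model name, and an unrecognized name fails early with a message listing the valid options.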

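Performance vs. Original AIDO.RNA Models

The two snippets below check that the wrapper produces the same embeddings as the original AIDO.RNA models. Both compute the same summary statistics for the same input sequence, and the printed values match.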

Original AIDO.RNA code snippet:

from modelgenerator.tasks import Embed
import torch

model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
transformed_batch = model.transform({"sequences": [dna]})
embedding = model(transformed_batch) # [1, sequence_length, 1280]

embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean)) # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)

embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max)) # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)

Modified code snippet using the wrapper:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)

model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M",
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")

embedding = model(**inputs).last_hidden_state # [1, sequence_length, 1280]

embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean)) # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)

embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max)) # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
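
The comparison above is done by eye on printed values. As a minimal sketch, the same check can be written as a tolerance-based comparison of the reported statistics. The function name summaries_agree, the dictionary keys, and the tolerance of 1e-4 (chosen to match the four printed decimals) are all assumptions made for illustration.

```python
import math

# Summary statistics copied from the printed outputs of the two snippets.
original = {"mean_of_mean": 0.0005, "mean_of_max": 1.5583}
wrapper = {"mean_of_mean": 0.0005, "mean_of_max": 1.5583}


def summaries_agree(a: dict, b: dict, abs_tol: float = 1e-4) -> bool:
    """Return True if both dicts have the same keys and close values."""
    if a.keys() != b.keys():
        return False
    # abs_tol is an assumed tolerance, not a value from the original repository.
    return all(math.isclose(a[k], b[k], abs_tol=abs_tol) for k in a)


print(summaries_agree(original, wrapper))
```

Matching summary statistics are a quick sanity check rather than proof of identical embeddings. A stricter test would compare the two embedding tensors element by element, for example with torch.allclose.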

License Notice

This repository contains modified versions of GenBio AI code. Modifications include:

  • Removed the dependency on the modelgenerator package
  • Added the ability to load a specific AIDO.RNA model via the base_model argument

Some of the original functionality may not be preserved. These changes were made to better integrate with the mRNABench framework, which focuses on embedding generation for mRNA sequences. Most of the required code was copied directly from the original GenBio AI repository with minimal changes, so please refer to the original repository for full implementation details.

When using this repository, please adhere to the original license terms of the GenBio AI code. This license can be found in this directory as LICENSE.

Original Repository

The original AIDO.RNA models and code are available at: https://github.com/genbio-ai/ModelGenerator
