
🧬 NTv3: A Foundation Model for Genomics

NTv3 is a series of foundation models designed to understand and generate genomic sequences. It unifies representation learning, functional prediction, and controllable sequence generation within a single, efficient U-Net-like architecture, and it models long-range dependencies of up to 1 Mb of context at nucleotide resolution. Pretrained on 9 trillion base pairs, NTv3 excels at functional-track prediction and genome annotation across 24 animal and plant species, and it can be fine-tuned into a controllable generative model for genomic sequence design.

This repository hosts the generative model based on NTv3, capable of context-aware DNA sequence generation with desired activity levels. It builds on the post-trained NTv3 model with MDLM-based fine-tuning. For more details, please refer to the NTv3 paper.
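In MDLM-based fine-tuning, a sequence is corrupted by independently replacing tokens with a mask token, and the model is trained to recover the originals. A minimal sketch of that forward (masking) step, where the function name, the `[MASK]` token string, and the per-token masking probability are illustrative assumptions rather than NTv3's actual implementation:

```python
import random

def mdlm_mask(sequence, mask_rate, mask_token="[MASK]", rng=None):
    """Corrupt a token sequence for masked discrete language modeling (MDLM).

    Each token is independently replaced by mask_token with probability
    mask_rate; the denoising model is trained to predict the original tokens.
    All names here are illustrative, not NTv3's actual code.
    """
    rng = rng or random.Random(0)
    return [mask_token if rng.random() < mask_rate else tok for tok in sequence]

seq = list("ATCGATCG")
noisy = mdlm_mask(seq, mask_rate=0.5)  # roughly half the bases become [MASK]
```

In the full MDLM objective the mask rate is sampled from a noise schedule rather than fixed, which this sketch omits for brevity.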

βš–οΈ License Summary

  1. The Licensed Models are only available under this License for Non-Commercial Purposes.
  2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
  3. You may not use the Licensed Models or any of their Outputs:
    1. for any Commercial Purpose, unless agreed by Us under a separate licence;
    2. to train, improve or otherwise influence the functionality or performance of any third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
    3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same licence as the Licensed Models; or
    4. in violation of any applicable laws and regulations.

📋 Model Summary

  • Architecture: Conditioned U-Net with adaptive layer norms + Transformer stack
  • Training: Masked Discrete Language Modeling (MDLM)
  • Conditioning: Species + Activity levels (0-4)
  • Tokenizer: Character-level over A T C G N + special tokens
  • Dependencies: transformers >= 4.55.0
  • Input size: trained on 4,096 bp sequences, with a 249 bp generation length
  • Note: the model uses custom code, so pass trust_remote_code=True when loading
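A minimal sketch of what such a character-level DNA tokenizer looks like. The vocabulary ordering, the special tokens, and the fallback-to-N behavior are illustrative assumptions; the model's actual tokenizer ships with its custom code and is loaded via trust_remote_code:

```python
# Illustrative character-level DNA tokenizer; the special tokens and id
# assignments are assumptions, not NTv3's actual vocabulary.
SPECIAL_TOKENS = ["[PAD]", "[MASK]"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list("ATCGN"))}

def encode(seq: str) -> list[int]:
    # Map each base to its id; unknown characters fall back to N (ambiguous base).
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]

def decode(ids: list[int]) -> str:
    # Invert the vocabulary to recover the token string for each id.
    inv = {i: tok for tok, i in VOCAB.items()}
    return "".join(inv[i] for i in ids)
```

Character-level tokenization keeps the model at true nucleotide resolution: every position in the input corresponds to exactly one base, which is what allows per-nucleotide prediction and generation.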