Evo-2

B+

Arc Institute

DNA Language Model | Genome Generation | Regulatory Element Prediction | open
Updated 1 month ago
NextIn take

Evo-2 is what happens when you treat DNA like language and throw serious compute at it. The results suggest that genomic foundation models might be as transformative for synthetic biology as protein models have been for drug discovery.
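To make "treating DNA like language" concrete, here is a minimal sketch of nucleotide-level tokenization and the next-token training pairs an autoregressive model learns from. The vocabulary and the helper names (`VOCAB`, `encode`, `next_token_pairs`) are illustrative assumptions, not Evo-2's actual tokenizer or API.

```python
# Hypothetical single-nucleotide vocabulary; a real model's tokenizer may differ.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(seq: str) -> list[int]:
    """Map a DNA string to integer token IDs, one per nucleotide.

    Unknown characters fall back to the ambiguity token "N".
    """
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def next_token_pairs(ids: list[int]) -> list[tuple[list[int], int]]:
    """Build (prefix, next token) pairs, the basic unit of autoregressive training."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


ids = encode("ACGTn")
pairs = next_token_pairs(ids)
```

Each pair asks the model to predict one nucleotide from everything before it, which is the same objective used for text, applied to a four-letter alphabet.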

Specifications

Architecture: StripedHyena, a long-context DNA sequence model with hybrid attention
Parameters: ~7B
Training Data: 300B nucleotides from OpenGenome (prokaryotic and eukaryotic genomes)
License: Apache 2.0. Fully open weights, code, and training data.
Hardware: 4x A100 80GB for the full model; smaller checkpoints available
Inference Cost: Self-hosted, moderate GPU requirements
API Available: No
Weights Available: Yes

Benchmark Performance

Benchmark | Score
ProteinGym | N/A

When to Use This

  • Regulatory element prediction and design
  • Synthetic genome design and gene-level generation
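One common way a sequence language model is applied to regulatory prediction is to compare the likelihood it assigns to a reference sequence against a variant. The sketch below illustrates that scoring pattern with a toy k-mer Markov model standing in for the language model. It does not call Evo-2, and `fit_markov`, `log_likelihood`, and `variant_score` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def fit_markov(train_seq: str, k: int = 2):
    """Count k-mer contexts and the base that follows each one.

    This is a toy stand-in for a DNA language model, not Evo-2.
    """
    ctx_counts, next_counts = Counter(), Counter()
    for i in range(k, len(train_seq)):
        ctx = train_seq[i - k:i]
        ctx_counts[ctx] += 1
        next_counts[(ctx, train_seq[i])] += 1
    return ctx_counts, next_counts, k


def log_likelihood(seq: str, model) -> float:
    """Sum of log P(base | previous k bases), with add-one smoothing over A, C, G, T."""
    ctx_counts, next_counts, k = model
    total = 0.0
    for i in range(k, len(seq)):
        ctx = seq[i - k:i]
        p = (next_counts[(ctx, seq[i])] + 1) / (ctx_counts[ctx] + 4)
        total += math.log(p)
    return total


def variant_score(ref: str, alt: str, model) -> float:
    """Log-likelihood ratio; negative means the model finds the variant less likely."""
    return log_likelihood(alt, model) - log_likelihood(ref, model)


# Toy "training genome": a repeated TATA-like motif.
model = fit_markov("TATAAATATAAATATAAATATAAA")
score = variant_score("TATAAATA", "TATGAATA", model)
```

With a real model, the same pattern would swap `log_likelihood` for the model's per-token log-probabilities; how well such scores track biological function still has to be checked against experimental data.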

When NOT to Use This

  • Protein-level tasks — use ESM-3 or structure models instead
  • Clinical applications — too early for regulated use

Production Readiness

research

Known Users

  • Arc Institute internal
  • Synthetic biology research labs

Grade Rationale

B+

Evo-2 is the first DNA foundation model that can generate coherent gene-length sequences and predict regulatory elements with biological validity. It gets a B+ rather than a higher grade because the field is still figuring out how to evaluate and deploy DNA language models: the potential is enormous, but the applications are early.

Sources

Update History

2026-03-01: Initial entry