Evo-2
B+
Arc Institute
DNA Language Model, Genome Generation, Regulatory Element Prediction, open
Updated 1 month ago
In take
Evo-2 is what happens when you treat DNA like language and throw serious compute at it. The results suggest that genomic foundation models might be as transformative for synbio as protein models have been for drug discovery.
Specifications
| Field | Value |
|---|---|
| Architecture | StripedHyena 2, a long-context DNA sequence model that mixes convolutional and attention layers |
| Parameters | 7B and 40B |
| Training Data | OpenGenome2: roughly 9 trillion nucleotides from prokaryotic and eukaryotic genomes |
| License | Apache 2.0. Weights, code, and training data are openly released. |
| Hardware | 4x A100 80GB for full model, smaller checkpoints available |
| Inference Cost | Self-hosted — moderate GPU requirements |
| API Available | No |
| Weights Available | Yes |
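To put the hardware row in context, here is a back-of-the-envelope estimate of the memory needed just to hold the weights of a 7B-parameter checkpoint. It is a rough sketch, not a measured figure: the helper name `weight_memory_gb` and the choice of bf16 and fp32 precision are assumptions for illustration, and the estimate ignores activations and long-context state, which add to the real footprint.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# 7B parameters, as listed in the specifications table.
params = 7e9

bf16_gb = weight_memory_gb(params, bytes_per_param=2)  # 16-bit weights
fp32_gb = weight_memory_gb(params, bytes_per_param=4)  # 32-bit weights

print(f"bf16 weights: {bf16_gb:.0f} GB, fp32 weights: {fp32_gb:.0f} GB")
```

The gap between this weights-only number and the listed multi-GPU setup would come from activations, long input contexts, and batch size, none of which this estimate models.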
Benchmark Performance
| Benchmark | Score |
|---|---|
| ProteinGym | N/A |
When to Use This
- Regulatory element prediction and design
- Synthetic genome design and gene-level generation
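One common way to apply a sequence language model to regulatory questions is to compare the model's log-likelihood of a reference sequence with that of a mutated copy. The sketch below illustrates that idea only. It does not call Evo-2: `toy_score` is a hypothetical stand-in for a real model's sequence log-likelihood, and `mutate` and `variant_delta` are illustrative names, not part of any library.

```python
import math
from typing import Callable


def mutate(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_delta(score_fn: Callable[[str], float], ref_seq: str, pos: int, alt: str) -> float:
    """Log-likelihood of the mutated sequence minus that of the reference.

    Negative values mean the scorer finds the variant less likely than the reference.
    """
    return score_fn(mutate(ref_seq, pos, alt)) - score_fn(ref_seq)


def toy_score(seq: str) -> float:
    """Hypothetical stand-in scorer: fixed per-base probabilities favoring A and T.

    A real workflow would replace this with the model's sequence log-likelihood.
    """
    logp = {"A": math.log(0.3), "T": math.log(0.3), "C": math.log(0.2), "G": math.log(0.2)}
    return sum(logp[base] for base in seq)


ref = "TATAAAGG"
delta = variant_delta(toy_score, ref, pos=2, alt="G")  # T -> G at index 2
print(f"delta log-likelihood: {delta:.3f}")
```

Passing the scorer in as a function keeps the comparison logic independent of any particular model, so the toy scorer can be swapped for a real one without touching `variant_delta`.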
When NOT to Use This
- Protein-level tasks: use ESM-3 or structure models instead
- Clinical applications: too early for regulated use
Production Readiness
research
Known Users
- Arc Institute internal
- Synthetic biology research labs
Grade Rationale
B+
Among the first DNA foundation models to generate coherent gene-length sequences and predict regulatory elements with biological validity. B+ because the field is still figuring out how to evaluate and deploy DNA language models: the potential is enormous, but the applications are early.
Update History
2026-03-01: Initial entry