Evo-2

B+

Arc Institute

DNA Language Model, Genome Generation, Regulatory Element Prediction, open

Updated 1 month ago

NextIn take

Evo-2 is what happens when you treat DNA like language and throw serious compute at it — the results suggest that genomic foundation models might be as transformative for synbio as protein models have been for drug discovery.

Specifications

Architecture	StripedHyena 2: long-context DNA sequence model that combines convolutional (Hyena) operators with attention
Parameters	7B and 40B
Training Data	~9.3T nucleotides from OpenGenome2, covering prokaryotic and eukaryotic genomes
License	Apache 2.0. Weights, code, and training data are all open.
Hardware	4x A100 80GB for full model, smaller checkpoints available
Inference Cost	Self-hosted — moderate GPU requirements
API Available	No
Weights Available	Yes
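
As a rough sanity check on the Hardware row, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts weights only; activation memory at long context is not modeled and can be substantial. The function names and the 0.8 headroom factor are illustrative assumptions, not part of any Evo-2 tooling.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def weights_fit(n_params: float, n_gpus: int, gpu_mem_gb: float,
                bytes_per_param: int = 2, headroom: float = 0.8) -> bool:
    """True if the weights fit in a fraction (headroom) of total GPU memory.

    headroom is an assumed safety margin that leaves room for activations
    and framework overhead; it is not a measured value.
    """
    budget_gb = n_gpus * gpu_mem_gb * headroom
    return weight_memory_gb(n_params, bytes_per_param) <= budget_gb


if __name__ == "__main__":
    seven_b = 7e9  # a 7B-parameter checkpoint
    print(f"7B weights at 16-bit: {weight_memory_gb(seven_b):.1f} GB")
    print("Fits on 4x 80 GB GPUs:", weights_fit(seven_b, 4, 80))
```

By this estimate, the weights of a 7B checkpoint are a small share of a 4x 80 GB budget, so the listed hardware is likely driven by larger checkpoints or long-context inference rather than by 7B weights alone.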

Benchmark Performance

Benchmark	Score	Rank	Notes
ProteinGym	N/A	—	DNA-level model — evaluated on genomic tasks, not protein benchmarks

When to Use This

+Regulatory element prediction and design
+Synthetic genome design and gene-level generation

When NOT to Use This

−Protein-level tasks — use ESM-3 or structure models instead
−Clinical applications — too early for regulated use

Production Readiness

research

Known Users

Arc Institute internal
Synthetic biology research labs

Grade Rationale

B+

The first DNA foundation model that can generate coherent gene-length sequences and predict regulatory elements with biological validity. B+ because the field is still figuring out how to evaluate and deploy DNA language models — the potential is enormous but the applications are early.

Update History

2026-03-01	Initial entry
