CASP15
Trust
A
CASP remains the single most important benchmark in structural biology — if you can't perform here, nothing else matters. But it's become a victim of its own success. AlphaFold2 effectively solved the single-domain problem in CASP14, which means CASP15 is really measuring the next frontier: complexes, RNA, and ligands. That's where the real differentiation lives now. Don't over-index on CASP rankings for production decisions — a model that's 2% worse on GDT-TS but runs 100x faster and ships open weights is more useful to 90% of teams.
What It Measures
Protein structure prediction accuracy across multiple categories: single-domain proteins, multi-domain assemblies, RNA structures, and protein-ligand complexes. Evaluated via GDT-TS and lDDT scores against experimentally determined structures released after the prediction window closes. The gold standard for structure prediction since 1994.
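To make the headline metric concrete, here is a minimal sketch of a GDT-TS-style calculation: the mean fraction of C-alpha atoms falling within 1, 2, 4, and 8 Å of the reference, expressed as a percentage. This is an illustrative simplification — real CASP scoring searches over optimal superpositions per cutoff, whereas this sketch assumes the predicted and reference coordinates are already superposed. The function name and toy coordinates are hypothetical.

```python
import numpy as np

def gdt_ts(pred_ca: np.ndarray, ref_ca: np.ndarray) -> float:
    """Simplified GDT-TS: mean fraction of CA atoms within 1/2/4/8 Å.

    Assumes pred_ca and ref_ca are (N, 3) arrays that are already
    optimally superposed; CASP's actual scoring optimizes the
    superposition for each distance cutoff separately.
    """
    # Per-residue deviation between predicted and reference CA positions
    dev = np.linalg.norm(pred_ca - ref_ca, axis=1)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of residues under each cutoff, averaged, scaled to 0-100
    fractions = [np.mean(dev <= c) for c in cutoffs]
    return 100.0 * float(np.mean(fractions))

# Toy example: 4 residues deviating by 0.5, 1.5, 3.0, and 10.0 Å
ref = np.zeros((4, 3))
pred = np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
print(gdt_ts(pred, ref))  # → 56.25
```

The 1/2/4/8 Å ladder is what makes GDT-TS a global fold metric: an 8 Å tolerance forgives exactly the kind of binding-site-level error that matters in drug design, which is why a strong GDT-TS score doesn't guarantee useful pocket geometry.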
What It Doesn't Measure
Speed of inference, computational cost, ease of deployment, or real-world applicability. CASP targets are curated academic structures — they don't represent the messy, incomplete inputs you get in drug discovery pipelines. It also doesn't measure performance on intrinsically disordered proteins, membrane proteins in native environments, or large multi-component molecular machines.
Maintainer
Protein Structure Prediction Center (UC Davis)
https://predictioncenter.org/casp15/
Known Limitations
Biennial cadence means results are always 1–2 years stale by the time they matter. Target selection is biased toward "interesting" academic structures, not drug-discovery-relevant ones. The scoring rubric rewards global fold accuracy over local binding-site precision — you can score well on CASP and still be useless for drug design. Server categories mix automated and semi-automated methods in ways that muddy comparisons. The community has also raised concerns about potential data leakage from preprint structures.