CryoSTAR: leveraging structural priors and constraints for cryo-EM heterogeneous reconstruction | Nature Methods
Nature Methods (2024)
Resolving conformational heterogeneity in cryogenic electron microscopy datasets remains an important challenge in structural biology. Previous methods have often been restricted to working exclusively on volumetric densities, neglecting the potential of incorporating any preexisting structural knowledge as prior or constraints. Here we present cryoSTAR, which harnesses atomic model information as structural regularization to elucidate such heterogeneity. Our method uniquely outputs both coarse-grained models and density maps, showcasing the molecular conformational changes at different levels. Validated against four diverse experimental datasets, spanning large complexes, a membrane protein and a small single-chain protein, our results consistently demonstrate an efficient and effective solution to conformational heterogeneity with minimal human bias. By integrating atomic model insights with cryogenic electron microscopy data, cryoSTAR represents a meaningful step forward, paving the way for a deeper understanding of dynamic biological processes.
The datasets from EMPIAR-10180, EMPIAR-10073, EMPIAR-10059 and EMPIAR-10827 were analyzed in this study. The reference atomic models were obtained from the PDB (5NRL, 5GAN, 1G88, 5IRX, 7RQW, 1AKE and 4AKE).
CryoSTAR software is freely available at https://github.com/bytedance/cryostar under the Apache License, version 2.0.
We thank Z. Zheng, Y. Wang and D. Xue for their insightful discussions on the project. We also thank H. Li for the feedback on the project that helped shape this study and M. Cianfrocco for the suggestions on the manuscript. The work is conducted and supported by ByteDance Research.
These authors contributed equally: Yilai Li, Yi Zhou, Jing Yuan.
ByteDance Research, San Jose, CA, USA
Yilai Li
ByteDance Research, Shanghai, China
Yi Zhou, Jing Yuan & Fei Ye
ByteDance Research, Los Angeles, CA, USA
Quanquan Gu
Y.L., Y.Z., J.Y., F.Y. and Q.G. conceived the work. Y.Z. and J.Y. implemented the cryoSTAR method. Y.L., Y.Z. and J.Y. designed, performed and analyzed the experiments. Y.L., Y.Z., J.Y. and Q.G. wrote the paper with feedback from all the authors. Q.G. supervised the project.
Correspondence to Quanquan Gu.
The innovative aspects of the method we have presented in this manuscript have been described in a provisional patent application.
Nature Methods thanks José-Maria Carazo, Tim Grant and Fred Sigworth for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Examples of the particles from the synthetic dataset. A total of 50,000 particles were simulated from 50 continuous conformational states from closed to open (1,000 particles for each conformation). The signal-to-noise ratio is 0.0001. b, Colored series of coarse-grained models and particle density maps generated by cryoSTAR, sampling along the first principal component of the latent space. c, The Cα-RMSD between the output of cryoSTAR and the ground truth at different conformational states (left: closed state; right: open state). The middle curve denotes the mean and the light blue regions denote one standard deviation. d, The FSC curves between the particle density maps and the ground truth densities at each conformation. e, PCA visualization of the cryoSTAR latent space, where the color depth represents the particle population. f, The coarse-grained models and particle density maps generated by sampling along the first principal component in the latent space, as marked in e with the corresponding colors. Unmasked CGModel-Map FSC curves are calculated for each sample and the resolutions at the 0.5 cutoff are reported.
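The Cα-RMSD curve with its one-standard-deviation band in panel c can be illustrated with a minimal NumPy sketch. It is not the authors' evaluation code: it assumes the predicted and ground-truth Cα coordinates are already in the same frame (no superposition is applied), and the function names `ca_rmsd` and `rmsd_band` are hypothetical.

```python
import numpy as np


def ca_rmsd(pred, truth):
    """RMSD between two (N, 3) coordinate arrays given in the same frame.

    Assumption: no rigid-body superposition is performed here.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Mean squared per-atom distance, then square root.
    return float(np.sqrt(np.mean(np.sum((pred - truth) ** 2, axis=1))))


def rmsd_band(rmsds_per_state):
    """Mean curve and one-standard-deviation band.

    Input is a (n_states, n_particles) array of RMSD values; the return
    value is (mean, lower, upper), each of length n_states.
    """
    r = np.asarray(rmsds_per_state, dtype=float)
    mean = r.mean(axis=1)
    sigma = r.std(axis=1)  # population standard deviation (ddof=0)
    return mean, mean - sigma, mean + sigma
```

Plotting `mean` against the conformational state index, with the region between `lower` and `upper` shaded, gives a curve of the kind the caption describes.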
a, Four density maps (top and side views) generated by cryoDRGN, a method without applying structural regularization, sampling along the first principal component of the latent space, using the same isosurface levels. CryoDRGN failed to reveal the motions in the TRPV1 dataset, because it focused on the area with the highest variability, which is the nanodisc region. This is typical when applying cryoDRGN to membrane proteins. b, With the help of structural regularization, cryoSTAR successfully uncovers the hidden motion in the TRPV1 dataset.
Here, we used a reference model (PDB: 5IRX) that does not cover the flexible region in the TRPV1 consensus density map. CryoSTAR does not find the continuous motion in the dataset (EMPIAR-10059). a, PCA visualization of the cryoSTAR latent space, where the color depth represents the particle population. b, Colored series of ten coarse-grained models and two particle density maps (top and side views) generated by cryoSTAR, sampling along the first principal component of the latent space. c, Coarse-grained models and particle density maps generated by sampling along the first principal component in the latent space, as marked in a with the corresponding colors and numbers. CGModel-Map FSC curves of the protein (excluding the nanodisc densities, mask not shown) are calculated for each sample and the resolutions at the 0.5 cutoff are reported. No motions were found. The output coarse-grained models were biased because the reference model was incomplete, but the generated densities were not biased (compared with Fig. 5). All density maps are shown using the same isosurface levels.
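Several of these captions describe sampling along the first principal component of the latent space. The sketch below shows one common way to do this with NumPy. It is an illustration under stated assumptions, not the cryoSTAR implementation: the function name `sample_along_pc` is hypothetical, and spacing the samples evenly between the 5th and 95th percentiles of the projected scores is an assumed choice.

```python
import numpy as np


def sample_along_pc(latents, n_samples=10, pc_index=0, lo_pct=5.0, hi_pct=95.0):
    """Return n_samples latent points evenly spaced along one principal axis.

    latents: (n_particles, latent_dim) array of per-particle embeddings.
    The percentile range is an assumption made for this sketch.
    """
    z = np.asarray(latents, dtype=float)
    if z.ndim != 2:
        raise ValueError("expected an array of shape (n_particles, latent_dim)")
    mean = z.mean(axis=0)
    centered = z - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[pc_index]
    # Project every particle onto the chosen axis.
    scores = centered @ axis
    lo, hi = np.percentile(scores, [lo_pct, hi_pct])
    steps = np.linspace(lo, hi, n_samples)
    # Map the evenly spaced scores back into latent space.
    return mean + steps[:, None] * axis
```

Each returned row would then be passed to the decoder to produce one coarse-grained model or density map in a series such as the ten models shown in panel b.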
a, PCA visualization of the cryoSTAR latent space, where the color depth represents the particle population. The prediction result from AlphaFold2 is docked into the consensus density map and shown in the bottom right corner. b, c, Colored series of ten coarse-grained models and two particle density maps generated by cryoSTAR, sampling along the first and second principal components of the latent space, respectively. d, e, The coarse-grained models and particle density maps generated by sampling along the first and second principal components in the latent space, respectively, as marked in a with the corresponding colors. Unmasked CGModel-Map FSC curves are calculated for each sample and the resolutions at the 0.5 cutoff are reported. Two different types of motion of the ARD tail are uncovered. All density maps are shown using the same isosurface levels. Although the PCA decompositions differed from those in Fig. 6, the conformations revealed were similar. The CGModel-Map FSC values were slightly worse due to the inaccuracy of the AlphaFold2 prediction result.
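The FSC values reported at the 0.5 cutoff can be illustrated with a short NumPy sketch of Fourier shell correlation between two volumes. It is a simplified illustration rather than the authors' pipeline: it assumes two cubic real-space volumes on the same grid (for a CGModel-Map FSC, the coarse-grained model would first have to be converted to a density, which is not shown), applies no mask, and reads the resolution at the first shell below the cutoff without interpolation. The function names are hypothetical.

```python
import numpy as np


def fsc_curve(vol_a, vol_b):
    """FSC per integer-radius Fourier shell for two cubic volumes."""
    a = np.asarray(vol_a, dtype=float)
    b = np.asarray(vol_b, dtype=float)
    if a.shape != b.shape or a.ndim != 3 or len(set(a.shape)) != 1:
        raise ValueError("expected two cubic volumes of equal shape")
    n = a.shape[0]
    fa = np.fft.fftn(a)
    fb = np.fft.fftn(b)
    # Integer frequency index along each axis, then radial shell index.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    keep = shell < n_shells  # discard shells at or beyond Nyquist
    idx = shell[keep]
    # Shell-wise sums of the cross term and of each volume's power.
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep], minlength=n_shells)
    pa = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep], minlength=n_shells)
    pb = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep], minlength=n_shells)
    denom = np.sqrt(pa * pb)
    return np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)


def resolution_at_cutoff(fsc, box_size, pixel_size, cutoff=0.5):
    """Resolution in Å at the first shell where FSC falls below the cutoff.

    Returns None if the curve never drops below the cutoff.
    """
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc[1:] < cutoff)[0]  # skip the DC shell
    if below.size == 0:
        return None
    shell = int(below[0]) + 1
    return box_size * pixel_size / shell
```

A production implementation would typically interpolate between shells and may apply a soft mask before the transform; both are omitted here for brevity.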
a, Four density maps generated by cryoDRGN, sampling along the first principal component of the latent space, using the same isosurface levels. Without the structural regularization from the reference atomic model, the resulting output density may exhibit discontinuities and artifacts, including density disappearance while traversing along the latent space (3 and 4). b, Flex volume sampled along the first latent dimension from 3DFlex (default parameters). The estimated motion is much smaller. Despite the reported FSC resolution of 2.85 Å, the flex densities are not continuous, with a severe artifact at the Ankyrin-like repeat domain (ARD). c, Structural regularization helps cryoSTAR to uncover the continuous motion without discernible artifacts in the reconstructed density maps.
Supplementary Tables 1 and 2.
Video of cryoSTAR results for the precatalytic spliceosome (EMPIAR-10180).
Video of cryoSTAR results for U4/U6.U5 tri-snRNP (EMPIAR-10073).
Video of cryoSTAR results for the TRPV1 channel (EMPIAR-10059).
Video of cryoSTAR results for α-LCT (EMPIAR-10827).
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Li, Y., Zhou, Y., Yuan, J. et al. CryoSTAR: leveraging structural priors and constraints for cryo-EM heterogeneous reconstruction. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02486-1
Received: 06 December 2023
Accepted: 25 September 2024
Published: 29 October 2024
DOI: https://doi.org/10.1038/s41592-024-02486-1