Integrative Structure Modeling of Macromolecular Assemblies from Proteomics Data*

  • Keren Lasker
    Correspondence
    Supported by the Clore Foundation Ph.D. Scholars program and carried out research in partial fulfillment of the requirements for the Ph.D. degree at Tel Aviv University. To whom correspondence may be addressed.
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158

    §Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
  • Jeremy L. Phillips
    Correspondence
    To whom correspondence may be addressed.
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158

    **Graduate Group in Biological and Medical Informatics, University of California, San Francisco, California 94158
  • Daniel Russel
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Javier Velázquez-Muriel
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Dina Schneidman-Duhovny
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Elina Tjioe
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Ben Webb
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Avner Schlessinger
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Andrej Sali
    Correspondence
    Supported by the Sandler Family Supporting Foundation, National Science Foundation Grant IIS-0705196, Ron Conway, Mike Homer, Hewlett-Packard, NetApp, IBM, and Intel. To whom correspondence may be addressed: University of California, MC 2552, Byers Hall at Mission Bay, Suite 503B, 1700 4th St., San Francisco, CA 94158. Tel.: 415-514-4227; Fax: 415-514-4231;
    Affiliations
    ‡Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3) and University of California, San Francisco, California 94158
  • Author Footnotes
    * This work was supported, in whole or in part, by National Institutes of Health Grants R01 GM54762, U54 RR022220, PN2 EY016525, and R01 GM083960 (to A. Sali).
    2 K. Lasker, unpublished data.
    1 The abbreviations used are: EM, electron microscopy; FRET, fluorescence resonance energy transfer; SAXS, small angle x-ray scattering; NPC, nuclear pore complex; AAA-ATPase, adenosine triphosphatase associated with diverse cellular activities; RNAPII, RNA polymerase II; Rpb, RNA polymerase II subunit; H-RNAPII, human RNA polymerase II; Y2H, yeast two-hybrid; RMSD, root mean square deviation.
    ¶ Both authors contributed equally to this work.
    §§ Supported by a Weizmann Institute Advancing Women in Science postdoctoral fellowship.
Open Access. Published: May 27, 2010. DOI: https://doi.org/10.1074/mcp.R110.000067
      Proteomics techniques have been used to generate comprehensive lists of protein interactions in a number of species. However, relatively little is known about how these interactions result in functional multiprotein complexes. This gap can be bridged by combining data from proteomics experiments with data from established structure determination techniques. Correspondingly, integrative computational methods are being developed to provide descriptions of protein complexes at varying levels of accuracy and resolution, ranging from complex compositions to detailed atomic structures.
      A 3-D enhanced version of this article is available. The text is identical to this version but includes interactive figures.
      Viewing the enhanced version of this article requires the use of a browser plug-in. Please install the plug-in when prompted. http://www.thesgc.org/iSee/MCP/9/8/e2.html

      MOTIVATION: STRUCTURES FOR MECHANISTIC UNDERSTANDING OF PROCESSES

      The cell contains hundreds of functional macromolecular assemblies responsible for performing critical cellular processes (
      • Alberts B.
      The cell as a collection of protein machines: preparing the next generation of molecular biologists.
      ,
      • Abbott A.
      Proteomics: the society of proteins.
      ). These include, among others, the ribosome (translation) (
      • Schmeing T.M.
      • Ramakrishnan V.
      What recent ribosome structures have revealed about the mechanism of translation.
      ,
      • Allen G.S.
      • Frank J.
      Structural insights on the translation initiation complex: ghosts of a universal initiation complex.
      ), chaperonins (protein folding) (
      • Horwich A.L.
      • Fenton W.A.
      Chaperonin-mediated protein folding: using a central cavity to kinetically assist polypeptide chain folding.
      ,
      • Spiess C.
      • Meyer A.S.
      • Reissmann S.
      • Frydman J.
      Mechanism of the eukaryotic chaperonin: protein folding in the chamber of secrets.
      ), RNA polymerase (RNA synthesis) (
      • Cramer P.
      • Armache K.J.
      • Baumli S.
      • Benkert S.
      • Brueckner F.
      • Buchen C.
      • Damsma G.E.
      • Dengl S.
      • Geiger S.R.
      • Jasiak A.J.
      • Jawhari A.
      • Jennebach S.
      • Kamenski T.
      • Kettenberger H.
      • Kuhn C.D.
      • Lehmann E.
      • Leike K.
      • Sydow J.F.
      • Vannini A.
      Structure of eukaryotic RNA polymerases.
      ), and the proteasome (protein degradation) (
      • Cheng Y.
      Toward an atomic model of the 26S proteasome.
      ,
      • Murata S.
      • Yashiroda H.
      • Tanaka K.
      Molecular mechanisms of proteasome assembly.
      ,
      • Förster F.
      • Lasker K.
      • Nickell S.
      • Sali A.
      • Baumeister W.
      Towards an integrated structural model of the 26S proteasome.
      ). A macromolecular machine is often built around a stable core of proteins that defines the basic function of the complex. This core assembly can be modulated through interactions with peripheral protein components, resulting in a multitude of functionally relevant states (
      • Gavin A.C.
      • Bösche M.
      • Krause R.
      • Grandi P.
      • Marzioch M.
      • Bauer A.
      • Schultz J.
      • Rick J.M.
      • Michon A.M.
      • Cruciat C.M.
      • Remor M.
      • Höfert C.
      • Schelder M.
      • Brajenovic M.
      • Ruffner H.
      • Merino A.
      • Klein K.
      • Hudak M.
      • Dickson D.
      • Rudi T.
      • Gnau V.
      • Bauch A.
      • Bastuck S.
      • Huhse B.
      • Leutwein C.
      • Heurtier M.A.
      • Copley R.R.
      • Edelmann A.
      • Querfurth E.
      • Rybin V.
      • Drewes G.
      • Raida M.
      • Bouwmeester T.
      • Bork P.
      • Seraphin B.
      • Kuster B.
      • Neubauer G.
      • Superti-Furga G.
      Functional organization of the yeast proteome by systematic analysis of protein complexes.
      ). A structural description of an assembly in all of its states often facilitates a mechanistic understanding of the corresponding process (
      • Schmeing T.M.
      • Ramakrishnan V.
      What recent ribosome structures have revealed about the mechanism of translation.
      ,
      • Mitra K.
      • Frank J.
      Ribosome dynamics: insights from atomic structure modeling into cryo-electron microscopy maps.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Sali A.
      • Rout M.P.
      The molecular architecture of the nuclear pore complex.
      ). Thus, a critical challenge in structural biology is to identify biologically relevant states of macromolecular assemblies and to determine the structures of these states at the highest possible resolution.

      ASSEMBLY STRUCTURES OFTEN CANNOT BE RESOLVED BY A SINGLE TECHNIQUE

      The structures of macromolecular assemblies in their biologically significant states generally cannot be resolved to atomic resolution by a single technique (
      • Robinson C.V.
      • Sali A.
      • Baumeister W.
      The molecular sociology of the cell.
      ). Although x-ray crystallography remains the most powerful approach for visualizing a static snapshot of a complex at atomic resolution, it is limited to samples that can be purified in large quantities and crystallized (
      • Blundell T.L.
      • Johnson L.
      ). Similarly, NMR spectroscopy results in an ensemble of structures of a system in solution (
      • Bonvin A.M.
      • Boelens R.
      • Kaptein R.
      NMR analysis of protein interactions.
      ,
      • Fiaux J.
      • Bertelsen E.B.
      • Horwich A.L.
      • Wüthrich K.
      NMR analysis of a 900K GroEL GroES complex.
      ,
      • Neudecker P.
      • Lundström P.
      • Kay L.E.
      Relaxation dispersion NMR spectroscopy as a tool for detailed studies of protein folding.
      ), but the technique is limited by the size of the complex and sample availability. Electron microscopy (EM)
      techniques provide an alternative approach for visualizing multiple conformations of complexes in vitro and even within cells (
      • Stahlberg H.
      • Walz T.
      Molecular electron microscopy: state of the art and current challenges.
      ,
      • Chiu W.
      • Baker M.L.
      • Jiang W.
      • Dougherty M.
      • Schmid M.F.
      Electron cryomicroscopy of biological machines at subnanometer resolution.
      ,
      • Lucic V.
      • Leis A.
      • Baumeister W.
      Cryo-electron tomography of cells: connecting structure and function.
      ,
      • Frank J.
      ). However, in most cases, the resolution of an electron density map is too low to provide a full mechanistic description of a protein complex. Additional techniques, such as high throughput proteomics methods (
      • Berggård T.
      • Linse S.
      • James P.
      Methods for the detection and analysis of protein-protein interactions.
      ), small angle x-ray scattering (SAXS) (
      • Svergun D.I.
      • Petoukhov M.V.
      • Koch M.H.
      Determination of domain structure of proteins from X-ray solution scattering.
      ,
      • Hura G.L.
      • Menon A.L.
      • Hammel M.
      • Rambo R.P.
      • Poole 2nd, F.L.
      • Tsutakawa S.E.
      • Jenney Jr., F.E.
      • Classen S.
      • Frankel K.A.
      • Hopkins R.C.
      • Yang S.J.
      • Scott J.W.
      • Dillard B.D.
      • Adams M.W.
      • Tainer J.A.
      Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS).
      ), and fluorescence resonance energy transfer (FRET) spectroscopy (
      • Joo C.
      • Balci H.
      • Ishitsuka Y.
      • Buranachai C.
      • Ha T.
      Advances in single-molecule fluorescence methods for molecular biology.
      ), are generally limited by low resolution (
      • Robinson C.V.
      • Sali A.
      • Baumeister W.
      The molecular sociology of the cell.
      ) and at times by low accuracy (
      • Hart G.T.
      • Ramani A.K.
      • Marcotte E.M.
      How complete are current yeast and human protein-interaction networks?.
      ,
      • Collins S.R.
      • Kemmeren P.
      • Zhao X.C.
      • Greenblatt J.F.
      • Spencer F.
      • Holstege F.C.
      • Weissman J.S.
      • Krogan N.J.
      Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae.
      ,
      • Cusick M.E.
      • Yu H.
      • Smolyar A.
      • Venkatesan K.
      • Carvunis A.R.
      • Simonis N.
      • Rual J.F.
      • Borick H.
      • Braun P.
      • Dreze M.
      • Vandenhaute J.
      • Galli M.
      • Yazaki J.
      • Hill D.E.
      • Ecker J.R.
      • Roth F.P.
      • Vidal M.
      Literature-curated protein interaction datasets.
      ) of the corresponding structural information.

      INTEGRATIVE STRUCTURE DETERMINATION

      The limitations in the resolution, accuracy, and coverage of individual experimental methods can be bridged by simultaneous consideration of multiple types of information. Examples of techniques that specialize in integrating a few types of experimental data include (i) combining electron density maps of complexes with atomic structures of protein components to build high resolution structures of protein complexes (
      • Topf M.
      • Lasker K.
      • Webb B.
      • Wolfson H.
      • Chiu W.
      • Sali A.
      Protein structure fitting and refinement guided by cryo-EM density.
      ,
      • Topf M.
      • Baker M.L.
      • Marti-Renom M.A.
      • Chiu W.
      • Sali A.
      Refinement of protein structures by iterative comparative modeling and CryoEM density fitting.
      ,
      • Lasker K.
      • Topf M.
      • Sali A.
      • Wolfson H.J.
      Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
      ,

      Lasker, K., Sali, A., Wolfson, H. J., (in press) Determining macromolecular assembly structures by molecular docking and fitting into an electron density map. Proteins,

      ,
      • Lindert S.
      • Stewart P.L.
      • Meiler J.
      Hybrid approaches: applying computational methods in cryo-electron microscopy.
      ); (ii) using atomic models to estimate the phases required for converting diffraction data into electron density maps (
      • Qian B.
      • Raman S.
      • Das R.
      • Bradley P.
      • McCoy A.J.
      • Read R.J.
      • Baker D.
      High-resolution structure prediction and the crystallographic phase problem.
      ); (iii) inferring the binary interaction map of a complex from affinity purification, mass spectrometry, and comparative modeling data (
      • Taverner T.
      • Hernández H.
      • Sharon M.
      • Ruotolo B.T.
      • Matak-Vinković D.
      • Devos D.
      • Russell R.B.
      • Robinson C.V.
      Subunit architecture of intact protein complexes from mass spectrometry and homology modeling.
      ); and (iv) incorporating NMR-derived data into protein structure prediction (
      • Bowers P.M.
      • Strauss C.E.
      • Baker D.
      De novo protein structure determination using sparse NMR data.
      ,
      • Raman S.
      • Lange O.F.
      • Rossi P.
      • Tyka M.
      • Wang X.
      • Aramini J.
      • Liu G.
      • Ramelot T.A.
      • Eletsky A.
      • Szyperski T.
      • Kennedy M.A.
      • Prestegard J.
      • Montelione G.T.
      • Baker D.
      NMR structure determination for larger proteins using backbone-only data.
      ).
      Recently, a number of macromolecular structures have been resolved by such integrative methods. For instance, the constituent proteins in the nuclear pore complex (NPC) were localized based on the shape and symmetry of the NPC from cryo-EM, positions of the proteins from immuno-EM, relative proximities of proteins from affinity purification, and the shapes of proteins from ultracentrifugation (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Sali A.
      • Rout M.P.
      The molecular architecture of the nuclear pore complex.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ). An atomic model of the AAA-ATPase ring of the 26 S proteasome was determined primarily by fitting comparative models of subunits into a single-particle cryo-EM map subject to protein interactions identified by proteomics (
      • Förster F.
      • Lasker K.
      • Beck F.
      • Nickell S.
      • Sali A.
      • Baumeister W.
      An Atomic Model AAA-ATPase/20S core particle sub-complex of the 26S proteasome.
      ). A structural model for a complete clathrin lattice (
      • Fotin A.
      • Cheng Y.
      • Sliz P.
      • Grigorieff N.
      • Harrison S.C.
      • Kirchhausen T.
      • Walz T.
      Molecular model for a complete clathrin lattice from electron cryomicroscopy.
      ) and a mechanistic model of the clathrin lattice assembly-disassembly cycle driven by chaperone Hsc70 (
      • Xing Y.
      • Böcking T.
      • Wolf M.
      • Grigorieff N.
      • Kirchhausen T.
      • Harrison S.C.
      Structure of clathrin coat with bound Hsc70 and auxilin: mechanism of Hsc70-facilitated disassembly.
      ) were suggested by combining data obtained by x-ray crystallography and single-particle cryo-EM. The architecture of RNA polymerase II in complex with its initiation factors was determined by combining known crystal structures with data from chemical cross-linking coupled to mass spectrometry (
      • Chen Z.A.
      • Jawhari A.
      • Fischer L.
      • Buchen C.
      • Tahir S.
      • Kamenski T.
      • Rasmussen M.
      • Lariviere L.
      • Bukowski-Wills J.C.
      • Nilges M.
      • Cramer P.
      • Rappsilber J.
      Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry.
      ). An NMR solution structure for the interface between two subunits in the human immunodeficiency virus type 1 capsid was fitted to an electron density map of the whole complex, revealing a relative orientation of subunits different from that in the corresponding crystal structure (
      • Byeon I.J.
      • Meng X.
      • Jung J.
      • Zhao G.
      • Yang R.
      • Ahn J.
      • Shi J.
      • Concel J.
      • Aiken C.
      • Zhang P.
      • Gronenborn A.M.
      Structural convergence between Cryo-EM and NMR reveals intersubunit interactions critical for HIV-1 capsid function.
      ).

      UNIFIED APPROACH FOR INTEGRATIVE MODELING

      As outlined above, the types of data available vary from study to study and from system to system (Fig. 1 and Table I). Therefore, a unified approach for integrative modeling is needed that can incorporate any type of information about a macromolecular assembly into the determination of its structure. This information may include physical theories, statistical preferences extracted from biological databases, and heterogeneous experimental data at different resolutions, ranging from atomic structures to sets of interacting proteins. We have proposed a single unified approach that can leverage all such information to describe a macromolecular structure (
      • Robinson C.V.
      • Sali A.
      • Baumeister W.
      The molecular sociology of the cell.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ,
      • Alber F.
      • Förster F.
      • Korkin D.
      • Topf M.
      • Sali A.
      Integrating diverse data for structure determination of macromolecular assemblies.
      ). This approach consists of an iterative series of four steps: 1) generation of data informative about the structure being determined, 2) design of the system representation and translation of the data into spatial restraints, 3) calculation of an ensemble of structures that satisfy the spatial restraints, and 4) analysis of the ensemble. In this procedure, spatial restraints derived from data about the structure are summed into a scoring function that assesses how well a structural model of an assembly agrees with the data. The scoring function is used to optimize the structural models and to generate a final ensemble of solutions that agrees with the data as much as possible. This four-step approach, by design, benefits from synergy among the input data sets, minimizing the drawbacks of incomplete, inaccurate, and/or imprecise data sets; although each individual restraint may contain little structural information, the concurrent satisfaction of all restraints derived from independent experiments may drastically reduce the degeneracy of the final structural models.
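      The four steps can be illustrated with a toy calculation. The sketch below is not the authors' implementation; it assumes a coarse representation of one bead per protein, harmonic distance restraints, and a plain Metropolis Monte Carlo optimizer, and every name in it (distance_restraint, total_score, optimize, build_ensemble, distance_spread) as well as the 10 Å target distances are hypothetical choices made only for illustration.

```python
import math
import random


def distance_restraint(i, j, target, weight=1.0):
    """Step 2: translate one datum into a harmonic restraint on beads i and j."""
    def score(coords):
        return weight * (math.dist(coords[i], coords[j]) - target) ** 2
    return score


def total_score(coords, restraints):
    """Scoring function: the sum of all individual restraint violations."""
    return sum(restraint(coords) for restraint in restraints)


def optimize(n_beads, restraints, rng, steps=20000, step_size=0.5, temperature=0.1):
    """Metropolis Monte Carlo from a random start; returns (best_coords, best_score)."""
    coords = [[rng.uniform(-20.0, 20.0) for _ in range(3)] for _ in range(n_beads)]
    current = total_score(coords, restraints)
    best_coords, best = [c[:] for c in coords], current
    for _ in range(steps):
        k = rng.randrange(n_beads)
        old = coords[k]
        coords[k] = [x + rng.uniform(-step_size, step_size) for x in old]
        trial = total_score(coords, restraints)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if trial <= current or rng.random() < math.exp((current - trial) / temperature):
            current = trial
            if trial < best:
                best_coords, best = [c[:] for c in coords], trial
        else:
            coords[k] = old  # reject the move and restore the bead
    return best_coords, best


def build_ensemble(n_beads, restraints, n_runs=5, threshold=1.0, seed=0):
    """Step 3: keep the independent runs whose best score is below the threshold."""
    rng = random.Random(seed)
    runs = [optimize(n_beads, restraints, rng) for _ in range(n_runs)]
    return [(coords, score) for coords, score in runs if score < threshold]


def distance_spread(ensemble, i, j):
    """Step 4: variability of one inter-bead distance across the ensemble."""
    distances = [math.dist(coords[i], coords[j]) for coords, _ in ensemble]
    return min(distances), max(distances)


# Step 1 (hypothetical data): three pairwise proximities of about 10 Angstroms.
restraints = [
    distance_restraint(0, 1, 10.0),
    distance_restraint(1, 2, 10.0),
    distance_restraint(0, 2, 10.0),
]
ensemble = build_ensemble(3, restraints)
if ensemble:
    print(len(ensemble), "models; spread of d(0,2):", distance_spread(ensemble, 0, 2))
```

      In this toy setting, dropping the third restraint leaves the distance between beads 0 and 2 bounded only by the triangle inequality, so the ensemble can be expected to show a much wider spread for that distance. This is a small-scale analogue of the reduction in degeneracy obtained when independent restraints are satisfied concurrently.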
      Fig. 1. Structural information about a protein assembly. Standard proteomics, biophysical, and computational methods can collectively determine the copy numbers (stoichiometry) and types (composition) of assembly components and predict or experimentally determine protein-protein connectivities (interactivity among a group of proteins) and protein-protein interactions (direct physical interactions). Many of these techniques are capable of a high degree of throughput, allowing for collection of a high volume of data about components of an assembly in a short period of time. Additional biophysical methods can determine distances between components in an assembly, positions of the components, and their relative orientations. Integration of data from varied methods, including low resolution proteomics data, generally increases the accuracy, precision, coverage, and efficiency of structure determination. Methods listed include the following: mass spectrometry (
      • Gingras A.C.
      • Gstaiger M.
      • Raught B.
      • Aebersold R.
      Analysis of protein complexes using mass spectrometry.
      ,

      Zhou, M., Robinson, C. V., (in press) When proteomics meets structural biology. Trends Biochem. Sci.

      ,
      • Bich C.
      • Zenobi R.
      Mass spectrometry of large complexes.
      ), quantitative immunoblotting (
      • Towbin H.
      • Staehelin T.
      • Gordon J.
      Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications.
      ), genetic interactions (
      • Roguev A.
      • Bandyopadhyay S.
      • Zofall M.
      • Zhang K.
      • Fischer T.
      • Collins S.R.
      • Qu H.
      • Shales M.
      • Park H.O.
      • Hayles J.
      • Hoe K.L.
      • Kim D.U.
      • Ideker T.
      • Grewal S.I.
      • Weissman J.S.
      • Krogan N.J.
      Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast.
      ,
      • Phillips P.C.
      Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems.
      ), bioinformatics predictions of protein-protein interactions (
      • Skrabanek L.
      • Saini H.K.
      • Bader G.D.
      • Enright A.J.
      Computational prediction of protein-protein interactions.
      ), affinity purification (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Sali A.
      • Rout M.P.
      The molecular architecture of the nuclear pore complex.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ,
      • Gavin A.C.
      • Aloy P.
      • Grandi P.
      • Krause R.
      • Boesche M.
      • Marzioch M.
      • Rau C.
      • Jensen L.J.
      • Bastuck S.
      • Dümpelfeld B.
      • Edelmann A.
      • Heurtier M.A.
      • Hoffman V.
      • Hoefert C.
      • Klein K.
      • Hudak M.
      • Michon A.M.
      • Schelder M.
      • Schirle M.
      • Remor M.
      • Rudi T.
      • Hooper S.
      • Bauer A.
      • Bouwmeester T.
      • Casari G.
      • Drewes G.
      • Neubauer G.
      • Rick J.M.
      • Kuster B.
      • Bork P.
      • Russell R.B.
      • Superti-Furga G.
      Proteome survey reveals modularity of the yeast cell machinery.
      ,
      • Krogan N.J.
      • Cagney G.
      • Yu H.
      • Zhong G.
      • Guo X.
      • Ignatchenko A.
      • Li J.
      • Pu S.
      • Datta N.
      • Tikuisis A.P.
      • Punna T.
      • Peregrín-Alvarez J.M.
      • Shales M.
      • Zhang X.
      • Davey M.
      • Robinson M.D.
      • Paccanaro A.
      • Bray J.E.
      • Sheung A.
      • Beattie B.
      • Richards D.P.
      • Canadien V.
      • Lalev A.
      • Mena F.
      • Wong P.
      • Starostine A.
      • Canete M.M.
      • Vlasblom J.
      • Wu S.
      • Orsi C.
      • Collins S.R.
      • Chandran S.
      • Haw R.
      • Rilstone J.J.
      • Gandi K.
      • Thompson N.J.
      • Musso G.
      • St Onge P.
      • Ghanny S.
      • Lam M.H.
      • Butland G.
      • Altaf-Ul A.M.
      • Kanaya S.
      • Shilatifard A.
      • O'Shea E.
      • Weissman J.S.
      • Ingles C.J.
      • Hughes T.R.
      • Parkinson J.
      • Gerstein M.
      • Wodak S.J.
      • Emili A.
      • Greenblatt J.F.
      Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      ), surface plasmon resonance (SPR) (
      • Visser N.F.
      • Heck A.J.
      Surface plasmon resonance mass spectrometry in proteomics.
      ), Y2H (
      • Stelzl U.
      • Worm U.
      • Lalowski M.
      • Haenig C.
      • Brembeck F.H.
      • Goehler H.
      • Stroedicke M.
      • Zenkner M.
      • Schoenherr A.
      • Koeppen S.
      • Timm J.
      • Mintzlaff S.
      • Abraham C.
      • Bock N.
      • Kietzmann S.
      • Goedde A.
      • Toksöz E.
      • Droege A.
      • Krobitsch S.
      • Korn B.
      • Birchmeier W.
      • Lehrach H.
      • Wanker E.E.
      A human protein-protein interaction network: a resource for annotating the proteome.
      ,
      • Rual J.F.
      • Venkatesan K.
      • Hao T.
      • Hirozane-Kishikawa T.
      • Dricot A.
      • Li N.
      • Berriz G.F.
      • Gibbons F.D.
      • Dreze M.
      • Ayivi-Guedehoussou N.
      • Klitgord N.
      • Simon C.
      • Boxem M.
      • Milstein S.
      • Rosenberg J.
      • Goldberg D.S.
      • Zhang L.V.
      • Wong S.L.
      • Franklin G.
      • Li S.
      • Albala J.S.
      • Lim J.
      • Fraughton C.
      • Llamosas E.
      • Cevik S.
      • Bex C.
      • Lamesch P.
      • Sikorski R.S.
      • Vandenhaute J.
      • Zoghbi H.Y.
      • Smolyar A.
      • Bosak S.
      • Sequerra R.
      • Doucette-Stamm L.
      • Cusick M.E.
      • Hill D.E.
      • Roth F.P.
      • Vidal M.
      Towards a proteome-scale map of the human protein-protein interaction network.
      ,
      • Giot L.
      • Bader J.S.
      • Brouwer C.
      • Chaudhuri A.
      • Kuang B.
      • Li Y.
      • Hao Y.L.
      • Ooi C.E.
      • Godwin B.
      • Vitols E.
      • Vijayadamodar G.
      • Pochart P.
      • Machineni H.
      • Welsh M.
      • Kong Y.
      • Zerhusen B.
      • Malcolm R.
      • Varrone Z.
      • Collis A.
      • Minto M.
      • Burgess S.
      • McDaniel L.
      • Stimpson E.
      • Spriggs F.
      • Williams J.
      • Neurath K.
      • Ioime N.
      • Agee M.
      • Voss E.
      • Furtak K.
      • Renzulli R.
      • Aanensen N.
      • Carrolla S.
      • Bickelhaupt E.
      • Lazovatsky Y.
      • DaSilva A.
      • Zhong J.
      • Stanyon C.A.
      • Finley Jr., R.L.
      • White K.P.
      • Braverman M.
      • Jarvie T.
      • Gold S.
      • Leach M.
      • Knight J.
      • Shimkets R.A.
      • McKenna M.P.
      • Chant J.
      • Rothberg J.M.
      A protein interaction map of Drosophila melanogaster.
      ,
      • Walhout A.J.
      • Sordella R.
      • Lu X.
      • Hartley J.L.
      • Temple G.F.
      • Brasch M.A.
      • Thierry-Mieg N.
      • Vidal M.
      Protein interaction mapping in C. elegans using proteins involved in vulval development.
      ,
      • Uetz P.
      • Giot L.
      • Cagney G.
      • Mansfield T.A.
      • Judson R.S.
      • Knight J.R.
      • Lockshon D.
      • Narayan V.
      • Srinivasan M.
      • Pochart P.
      • Qureshi-Emili A.
      • Li Y.
      • Godwin B.
      • Conover D.
      • Kalbfleisch T.
      • Vijayadamodar G.
      • Yang M.
      • Johnston M.
      • Fields S.
      • Rothberg J.M.
      A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.
      ,
      • Ito T.
      • Chiba T.
      • Ozawa R.
      • Yoshida M.
      • Hattori M.
      • Sakaki Y.
      A comprehensive two-hybrid analysis to explore the yeast protein interactome.
      ), protein microarrays (
      • Stoevesandt O.
      • Taussig M.J.
      • He M.
      Protein microarrays: high-throughput tools for proteomics.
      ,
      • Wolf-Yadlin A.
      • Sevecka M.
      • MacBeath G.
      Dissecting protein function and signaling using protein microarrays.
      ,
      • Korf U.
      • Wiemann S.
      Protein microarrays as a discovery tool for studying protein-protein interactions.
      ), protein-fragment complementation assay (PCA) (
      • Kerppola T.K.
      Visualization of molecular interactions by fluorescence complementation.
      ,
      • Remy I.
      • Michnick S.W.
      Application of protein-fragment complementation assays in cell biology.
      ), calorimetry (
      • Freyer M.W.
      • Lewis E.A.
      Isothermal titration calorimetry: experimental design, data analysis, and probing macromolecule/ligand binding and kinetic interactions.
      ,
      • Velazquez-Campoy A.
      • Leavitt S.A.
      • Freire E.
      Characterization of protein-protein interactions by isothermal titration calorimetry.
      ), FRET (
      • Piston D.W.
      • Kremers G.J.
      Fluorescent protein FRET: the good, the bad and the ugly.
      ), bioluminescence resonance energy transfer (BRET) (
      • Pfleger K.D.
      • Eidne K.A.
      Illuminating insights into protein-protein interactions using bioluminescence resonance energy transfer (BRET).
      ), SAXS (
      • Svergun D.I.
      • Petoukhov M.V.
      • Koch M.H.
      Determination of domain structure of proteins from X-ray solution scattering.
      ,
      • Hura G.L.
      • Menon A.L.
      • Hammel M.
      • Rambo R.P.
      • Poole 2nd, F.L.
      • Tsutakawa S.E.
      • Jenney Jr., F.E.
      • Classen S.
      • Frankel K.A.
      • Hopkins R.C.
      • Yang S.J.
      • Scott J.W.
      • Dillard B.D.
      • Adams M.W.
      • Tainer J.A.
      Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS).
      ), electron tomography (ET) (
      • Lucic V.
      • Leis A.
      • Baumeister W.
      Cryo-electron tomography of cells: connecting structure and function.
      ), EM (
      • Stahlberg H.
      • Walz T.
      Molecular electron microscopy: state of the art and current challenges.
      ,
      • Chiu W.
      • Baker M.L.
      • Jiang W.
      • Dougherty M.
      • Schmid M.F.
      Electron cryomicroscopy of biological machines at subnanometer resolution.
      ,
      • Frank J.
      ), gold labeling (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ,
      • Lucocq J.
      Quantification of structures and gold labeling in transmission electron microscopy.
      ,
      • Hainfeld J.F.
      • Powell R.D.
      New frontiers in gold labeling.
      ), green fluorescent protein (GFP) labeling (
      • Drummond S.P.
      • Allen T.D.
      From live-cell imaging to scanning electron microscopy (SEM): the use of green fluorescent protein (GFP) as a common label.
      ), protein-protein docking (
      • Vajda S.
      • Kozakov D.
      Convergence and combination of methods in protein-protein docking.
      ), cross-linking (
      • Taverner T.
      • Hernández H.
      • Sharon M.
      • Ruotolo B.T.
      • Matak-Vinković D.
      • Devos D.
      • Russell R.B.
      • Robinson C.V.
      Subunit architecture of intact protein complexes from mass spectrometry and homology modeling.
      ,
      • Chen Z.A.
      • Jawhari A.
      • Fischer L.
      • Buchen C.
      • Tahir S.
      • Kamenski T.
      • Rasmussen M.
      • Lariviere L.
      • Bukowski-Wills J.C.
      • Nilges M.
      • Cramer P.
      • Rappsilber J.
      Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry.
      ,
      • Sinz A.
      Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions.
      ,
      • Trester-Zedlitz M.
      • Kamada K.
      • Burley S.K.
      • Fenyö D.
      • Chait B.T.
      • Muir T.W.
      A modular cross-linking approach for exploring protein interactions.
      ), hydrogen/deuterium (H/D) exchange (
      • Tsutsui Y.
      • Wintrode P.L.
      Hydrogen/deuterium exchange-mass spectrometry: a powerful tool for probing protein structure, dynamics and interactions.
      ), limited proteolysis (
      • Dokudovskaya S.
      • Williams R.
      • Devos D.
      • Sali A.
      • Chait B.T.
      • Rout M.P.
      Protease accessibility laddering: a proteomic tool for probing protein structure.
      ), footprinting (
      • Guan J.Q.
      • Chance M.R.
      Structural proteomics of macromolecular assemblies using oxidative footprinting and mass spectrometry.
      ), x-ray crystallography (
      • Blundell T.L.
      • Johnson L.
      ), and NMR spectroscopy (
      • Bonvin A.M.
      • Boelens R.
      • Kaptein R.
      NMR analysis of protein interactions.
      ,
      • Fiaux J.
      • Bertelsen E.B.
      • Horwich A.L.
      • Wüthrich K.
      NMR analysis of a 900K GroEL GroES complex.
      ,
      • Neudecker P.
      • Lundström P.
      • Kay L.E.
      Relaxation dispersion NMR spectroscopy as a tool for detailed studies of protein folding.
      ).
      Table I. Common restraints that can be used for integrative structure determination
      Restraint | Description | Source of information
      Excluded volume restraint (a) | Prevents steric clashes between system particles | Physical first principles
      Geometric complementarity restraint (a) | Restrains a protein interface to the tightest possible packing | Physical first principles
      Statistical potential restraint | Restrains a structure to have contact frequencies similar to those in structurally defined complexes | Physical first principles, all previously determined protein structures
      Distance restraint (a) | Restrains the distance between two particles | FRET, BRET, cross-linking, homology to a known structure
      Protein localization restraint | Restrains a protein to a specific position | Immuno-EM, gold labeling, GFP labeling
      Protein connectivity restraint (a) | Restrains all proteins in a set to interact directly or indirectly | Affinity purification
      Angle restraint | Restrains the angle between three particles | EM, SAXS, homology to a known structure
      Complex diameter restraint | Restrains the distance between the two most distant particles in a protein or complex | EM, SAXS
      Symmetry restraint | Maintains the same configuration of equivalent particles across multiple symmetry units | EM, SAXS, homology to a known structure
      EM quality-of-fit restraint (a) | Restrains the model to overlap with a density map | EM, SAXS
      Radial distribution function restraint | Restrains the correlation between experimentally measured and computed radial distribution functions | SAXS
      (a) Restraints used to determine the structure of H-RNAPII.

      PROTEOMICS AS A KEY DATA SOURCE FOR INTEGRATIVE MODELING

      Proteomics techniques have emerged as a powerful tool for mapping protein interactions in the cell. However, data produced by these techniques are rarely formally incorporated into macromolecular structure determination efforts. Here, we focus on the potential of proteomics techniques to contribute to the integrative modeling of macromolecular assemblies. Specifically, we describe how protein binding and association data can be interpreted as spatial restraints on a protein complex and thus reduce ambiguity in its structural description. These ideas have already been applied to determine the molecular architecture of the NPC (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Sali A.
      • Rout M.P.
      The molecular architecture of the nuclear pore complex.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ) and a pseudoatomic model of the 20 S/AAA-ATPase ring of the 26 S proteasome (
      • Förster F.
      • Lasker K.
      • Nickell S.
      • Sali A.
      • Baumeister W.
      Towards an integrated structural model of the 26S proteasome.
      ,
      • Förster F.
      • Lasker K.
      • Beck F.
      • Nickell S.
      • Sali A.
      • Baumeister W.
      An Atomic Model AAA-ATPase/20S core particle sub-complex of the 26S proteasome.
      ,
      • Nickell S.
      • Beck F.
      • Scheres S.H.
      • Korinek A.
      • Förster F.
      • Lasker K.
      • Mihalache O.
      • Sun N.
      • Nagy I.
      • Sali A.
      • Plitzko J.M.
      • Carazo J.M.
      • Mann M.
      • Baumeister W.
      Insights into the molecular architecture of the 26S proteasome.
      ). Below, we illustrate our integrative modeling approach by using real experimental data to determine the known architecture of the human RNA polymerase II.

      INTEGRATIVE STRUCTURE CHARACTERIZATION OF HUMAN RNA POLYMERASE II (RNAPII)

      The eukaryotic RNAPII is a central multiprotein machine that synthesizes messenger RNAs and small nuclear RNAs. It is composed of 12 protein subunits with a total molecular mass of 514 kDa (Fig. 2). Ten subunits (Rpb1, Rpb2, Rpb3, Rpb5, Rpb6, Rpb8, Rpb9, Rpb10, Rpb11, and Rpb12) form a structurally conserved core, whereas the Rpb4-Rpb7 heterodimer is located on the periphery (
      • Jasiak A.J.
      • Hartmann H.
      • Karakasili E.
      • Kalocsay M.
      • Flatley A.
      • Kremmer E.
      • Strässer K.
      • Martin D.E.
      • Söding J.
      • Cramer P.
      Genome-associated RNA polymerase II includes the dissociable Rpb4/7 subcomplex.
      ,
      • Hahn S.
      Structure and mechanism of the RNA polymerase II transcription machinery.
      ). Although the atomic structure of the Saccharomyces cerevisiae RNAPII has been solved by x-ray crystallography (
      • Cramer P.
      • Bushnell D.A.
      • Kornberg R.D.
      Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution.
      ), the human RNAPII (H-RNAPII) has not been determined at atomic resolution mostly because of difficulties in obtaining sufficient quantities of pure sample (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ). However, the molecular architecture of the H-RNAPII can be informed by that of its yeast homolog based on the homology between their constituent proteins (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ).
      Fig. 2. Determining the molecular architecture of human RNAPII. Top, data gathering. Comparative models of the H-RNAPII subunits were obtained from the ModBase database (
      • Pieper U.
      • Chiang R.
      • Seffernick J.J.
      • Brown S.D.
      • Glasner M.E.
      • Kelly L.
      • Eswar N.
      • Sauder J.M.
      • Bonanno J.B.
      • Swaminathan S.
      • Burley S.K.
      • Zheng X.
      • Chance M.R.
      • Almo S.C.
      • Gerlt J.A.
      • Raushel F.M.
      • Jacobson M.P.
      • Babbitt P.C.
      • Sali A.
      Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies.
      ). A density map of H-RNAPII at 20-Å resolution (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ) was obtained from the EM data bank (
      • Henrick K.
      • Newman R.
      • Tagari M.
      • Chagoyen M.
      EMDep: a web-based system for the deposition and validation of high-resolution electron microscopy macromolecular structural information.
      ). Proteomics data for S. cerevisiae RNAPII subunits were obtained from BioGRID (
      • Stark C.
      • Breitkreutz B.J.
      • Reguly T.
      • Boucher L.
      • Breitkreutz A.
      • Tyers M.
      BioGRID: a general repository for interaction datasets.
      ). All pairwise direct interactions are visualized in a single graph with solid edges, and each pulldown experiment is presented as a separate graph with dashed edges to indicate the missing underlying binary interaction network. Pulldowns Rpb1-Rpb2-Rpb3-Rpb4-Rpb5-Rpb8 and Rpb1-Rpb2-Rpb3-Rpb8-Rpb10 are missing some edges for clarity. Gray edges indicate interactions present in BioGRID but not in the yeast RNAPII crystallographic structure. Middle, scoring. The scoring function is the sum of the distance (illustrated between Rpb4 and Rpb7), connectivity (illustrated between Rpb1, Rpb2, Rpb3, Rpb8 and Rpb10), EM quality-of-fit (illustrated between the H-RNAPII density map and Rpb1), and geometric complementarity (illustrated between Rpb4 and Rpb7) restraints. Bottom, optimization. The configuration of the subunits in H-RNAPII was optimized using an extension of the divide-and-conquer MultiFit protocol to incorporate proteomics-based restraints. The optimization procedure resulted in a single model that satisfied all of the input restraints.
      Below, we demonstrate that our integrative structure determination procedure can be used to accurately model the known architecture of H-RNAPII using only proteomics-derived protein interactions, an electron density map at 20-Å resolution, comparative models of the protein subunits based on yeast and human crystallographic structures, and geometric complementarity between the interacting subunits. We describe the input data used for the modeling, the translation of these data into spatial restraints, an optimization procedure for determining the models that satisfy the restraints, and an analysis of the resulting set of solutions. We use a previously determined crystallographic structure of the full complex in yeast (
      • Kettenberger H.
      • Armache K.J.
      • Cramer P.
      Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS.
      ) to evaluate the results.

      Data Generation by Experiments

      Different techniques produce data that differ in types of measured features as well as in the accuracy, resolution, and coverage of the measurements (Fig. 1). An interpretation of the data in terms of a spatial restraint involves identifying the restrained structural components and the allowed values of the restrained feature implied by the data. For example, a result of a cross-linking experiment might be used to restrain the distance between two proteins (
      • Förster F.
      • Lasker K.
      • Beck F.
      • Nickell S.
      • Sali A.
      • Baumeister W.
      An Atomic Model AAA-ATPase/20S core particle sub-complex of the 26S proteasome.
      ,
      • Maiolica A.
      • Cittaro D.
      • Borsotti D.
      • Sennels L.
      • Ciferri C.
      • Tarricone C.
      • Musacchio A.
      • Rappsilber J.
      Structural analysis of multiprotein complexes by cross-linking, mass spectrometry, and database searching.
      ) or within one protein (
      • Young M.M.
      • Tang N.
      • Hempel J.C.
      • Oshiro C.M.
      • Taylor E.W.
      • Kuntz I.D.
      • Gibson B.W.
      • Dollinger G.
      High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry.
      ); the restraint parameters are a function of the length and flexibility of the cross-linker.
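      As a rough illustration of how a cross-link could become a distance restraint, the sketch below uses a flat-bottom (upper-bound) harmonic penalty: the score is zero while the two cross-linked positions are within reach of the linker and grows quadratically beyond it. The function name, the `slack` term standing in for linker flexibility, and the numeric values are assumptions made for this example; they are not taken from the study.

```python
import math


def crosslink_upper_bound_score(p1, p2, linker_length, slack=0.0, k=1.0):
    """Hypothetical cross-link restraint (illustrative only).

    p1, p2        -- (x, y, z) coordinates of the two cross-linked positions, in Angstroms
    linker_length -- nominal span of the cross-linker, in Angstroms
    slack         -- extra tolerance standing in for linker and side-chain flexibility
    k             -- weight of the quadratic penalty
    """
    distance = math.dist(p1, p2)
    excess = distance - (linker_length + slack)
    # No penalty while the positions are within reach of the linker.
    return k * excess ** 2 if excess > 0 else 0.0


if __name__ == "__main__":
    # Within reach of a 12 Angstrom linker: restraint satisfied.
    print(crosslink_upper_bound_score((0, 0, 0), (10, 0, 0), 12.0))
    # 5 Angstroms beyond linker plus slack: quadratic penalty.
    print(crosslink_upper_bound_score((0, 0, 0), (20, 0, 0), 12.0, slack=3.0))
```

      An upper bound is used here rather than a two-sided harmonic because a cross-link only indicates that two positions can come close together, not that they sit at a fixed separation.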
      To determine the molecular architecture of the H-RNAPII, we use structural homologs of individual human protein subunits found in the ModBase database (
      • Pieper U.
      • Chiang R.
      • Seffernick J.J.
      • Brown S.D.
      • Glasner M.E.
      • Kelly L.
      • Eswar N.
      • Sauder J.M.
      • Bonanno J.B.
      • Swaminathan S.
      • Burley S.K.
      • Zheng X.
      • Chance M.R.
      • Almo S.C.
      • Gerlt J.A.
      • Raushel F.M.
      • Jacobson M.P.
      • Babbitt P.C.
      • Sali A.
      Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies.
      ) (Table II), proteomics data for yeast RNAPII subunits extracted from the BioGRID database (
      • Stark C.
      • Breitkreutz B.J.
      • Reguly T.
      • Boucher L.
      • Breitkreutz A.
      • Tyers M.
      BioGRID: a general repository for interaction datasets.
      ) (Table III), and an assembly electron density map of H-RNAPII determined at 20-Å resolution by single-particle cryo-EM (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ) deposited in the EM data bank (
      • Henrick K.
      • Newman R.
      • Tagari M.
      • Chagoyen M.
      EMDep: a web-based system for the deposition and validation of high-resolution electron microscopy macromolecular structural information.
      ).
      Table II. Representation of H-RNAPII
      Subunit (name, UniProt accession no.) | Sequence identity (%) (a) | Number of residues, residue range | Template (Protein Data Bank code and chain, residue range)
      Rpb1, P24928 | 55 | 1970, 11–1475 | 1i6h A, 7–1445
      Rpb2, P30876 | 63 | 1174, 15–1171 | 2vum B, 20–1216
      Rpb3, P19387 | 47 | 275, 7–264 | 1twf C, 6–263
      Rpb4, O15514 | 100 | 142, 14–142 | 2c35 A, 14–142
      Rpb5, P19385 | 22 | 10, 146–209 | 1hmj A, 10–73
      Rpb6, P61218 | 100 | 127, 1–127 | 1qkl A, 801–927
      Rpb7, P62487 | 100 | 172, 1–171 | 2c35 B, 1–171
      Rpb8, P52434 | 100 | 150, 1–150 | 2f3i A, 1–150
      Rpb9, P36954 | 47 | 125, 15–124 | 1twf I, 5–111
      Rpb10, P62875 | 73 | 67, 1–64 | 1twf J, 1–65
      Rpb11, P52435 | 52 | 1–105, 117 | 1twf K, 1–105
      Rpb12, P53803 | 38 | 1–70, 70 | 2e2h L, 70
      (a) The percentage of sequence identity between the target subunit and the template as calculated from their alignment used for comparative modeling.
      Table III. Proteomics data used for modeling the architecture of RNAPII
      Interacting subunits | Source method
      Rpb1, Rpb2, Rpb10 | Affinity capture-MS (Krogan et al.)
      Rpb1, Rpb2, Rpb3, Rpb4, Rpb5, Rpb8 | Affinity capture-MS (Krogan et al.)
      Rpb1, Rpb2, Rpb8 | Affinity capture-MS (Krogan et al.)
      Rpb1, Rpb2, Rpb3, Rpb8, Rpb10 | Affinity capture-MS (Krogan et al.)
      Rpb1, Rpb2, Rpb6 | Affinity capture-MS (Krogan et al.)
      Rpb1, Rpb5 | Y2H (Flores et al.; Zaros et al.)
      Rpb1, Rpb8 | Y2H (Briand et al.)
      Rpb1, Rpb9 | Y2H (Flores et al.)
      Rpb2, Rpb3 | PCA (Tarassov et al.)
      Rpb2, Rpb6 | PCA (Tarassov et al.)
      Rpb2, Rpb10 | Y2H (Flores et al.)
      Rpb3, Rpb11 | Y2H (Flores et al.), reconstituted complex (Sareen et al.), PCA (Tarassov et al.)
      Rpb4, Rpb7 | Y2H (Tan et al.; Qi and Zakian; Sampath et al.; Khazak et al.; Sareen et al.; Selitrennik et al.), reconstituted complex (Orlicky et al.)
      Sources cited in Table III:
      • Krogan N.J., Cagney G., Yu H., et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      • Flores A., Briand J.F., Gadal O., Andrau J.C., Rubbi L., Van Mullem V., Boschiero C., Goussot M., Marck C., Carles C., Thuriaux P., Sentenac A., Werner M. A protein-protein interaction map of yeast RNA polymerase III.
      • Zaros C., Briand J.F., Boulard Y., Labarre-Mariotte S., Garcia-Lopez M.C., Thuriaux P., Navarro F. Functional organization of the Rpb5 subunit shared by the three yeast RNA polymerases.
      • Briand J.F., Navarro F., Rematier P., Boschiero C., Labarre S., Werner M., Shpakovski G.V., Thuriaux P. Partners of Rpb8p, a small subunit shared by yeast RNA polymerases I, II and III.
      • Tarassov K., Messier V., Landry C.R., Radinovic S., Serna Molina M.M., Shames I., Malitskaya Y., Vogel J., Bussey H., Michnick S.W. An in vivo map of the yeast protein interactome.
      • Sareen A., Choudhry P., Mehta S., Sharma N. Mapping the interaction site of Rpb4 and Rpb7 subunits of RNA polymerase II in Saccharomyces cerevisiae.
      • Tan Q., Prysak M.H., Woychik N.A. Loss of the Rpb4/Rpb7 subcomplex in a mutant form of the Rpb6 subunit shared by RNA polymerases I, II, and III.
      • Qi H., Zakian V.A. The Saccharomyces telomere-binding protein Cdc13p interacts with both the catalytic subunit of DNA polymerase alpha and the telomerase-associated est1 protein.
      • Sampath V., Rekha N., Srinivasan N., Sadhale P. The conserved and non-conserved regions of Rpb4 are involved in multiple phenotypes in Saccharomyces cerevisiae.
      • Khazak V., Sadhale P.P., Woychik N.A., Brent R., Golemis E.A. Human RNA polymerase II subunit hsRPB7 functions in yeast and influences stress survival and cell morphology.
      • Selitrennik M., Duek L., Lotan R., Choder M. Nucleocytoplasmic shuttling of the Rpb4p and Rpb7p subunits of Saccharomyces cerevisiae RNA polymerase II by two pathways.
      • Orlicky S.M., Tran P.T., Sayre M.H., Edwards A.M. Dissociable Rpb4-Rpb7 subassembly of RNA polymerase II binds to single-strand nucleic acid and mediates a post-recruitment step in transcription initiation.
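      The affinity capture rows in Table III only say that a set of subunits was pulled down together, not which pairs touch. One way to score such a set, consistent with the connectivity restraint in Table I (all proteins interact directly or indirectly), is to penalize only the edges of a minimum spanning tree over the pairwise gaps between subunits. The sketch below is a simplified illustration under stated assumptions: each subunit is reduced to a single sphere, and the function names, the `contact_gap` tolerance, and the coordinates are hypothetical rather than taken from the study.

```python
import math


def sphere_gap(a, b):
    """Surface-to-surface gap between two spheres given as (x, y, z, radius)."""
    return math.dist(a[:3], b[:3]) - a[3] - b[3]


def connectivity_score(spheres, contact_gap=0.0, k=1.0):
    """Penalty for a pulldown set that is not connected (illustrative only).

    spheres -- dict mapping a subunit name to (x, y, z, radius)
    Builds a minimum spanning tree over pairwise gaps with Prim's algorithm
    and adds a quadratic penalty for every tree edge wider than contact_gap.
    """
    names = list(spheres)
    if len(names) < 2:
        return 0.0
    in_tree = {names[0]}
    score = 0.0
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to any subunit outside it.
        gap, nxt = min(
            (sphere_gap(spheres[u], spheres[v]), v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        in_tree.add(nxt)
        excess = gap - contact_gap
        if excess > 0:
            score += k * excess ** 2
    return score


if __name__ == "__main__":
    # Three spheres in a touching chain: the two end spheres do not touch
    # each other, yet the set is connected, so the score is zero.
    chain = {"Rpb1": (0, 0, 0, 5), "Rpb2": (10, 0, 0, 5), "Rpb8": (20, 0, 0, 5)}
    print(connectivity_score(chain))
    # Moving the last sphere away opens a 10 Angstrom gap in the tree.
    broken = dict(chain, Rpb8=(30, 0, 0, 5))
    print(connectivity_score(broken))
```

      Scoring only the tree edges means that the restraint is satisfied by any connected arrangement and does not force every pair in the pulldown into contact.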

      System Representation

      The first step in integrative structure determination is deciding on an appropriate representation for the system to be modeled as dictated by the resolution of the available data. At the finest representation granularity, an assembly structure can be represented by particles corresponding to its atoms, each associated with attributes such as position, radius, charge, and mass. Alternatively, a single particle may be a sphere corresponding to a group of atoms, a whole amino acid residue, a secondary structure segment, a domain, a protein, a “subcomplex” consisting of a subset of proteins in a complete assembly, or even an entire assembly. Given the availability of high accuracy comparative models for the H-RNAPII subunits, we represent the structures of its subunits at atomic resolution. We use atomic models found in the ModBase database of comparative models for domains in ∼2.4 million protein sequences that are detectably related to known structures (Table II) (
      • Pieper U.
      • Eswar N.
      • Webb B.M.
      • Eramian D.
      • Kelly L.
      • Barkan D.T.
      • Carter H.
      • Mankoo P.
      • Karchin R.
      • Marti-Renom M.A.
      • Davis F.P.
      • Sali A.
      MODBASE, a database of annotated comparative protein structure models and associated resources.
      ).
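      The idea of particles at different granularities can be sketched with a small data structure. The example below is only an illustration: the `Particle` class, the `coarse_grain` helper, and the choice of a mass-weighted center with an enclosing radius are assumptions for this sketch, not the representation code used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """A sphere with the attributes named in the text (hypothetical layout)."""
    x: float
    y: float
    z: float
    radius: float
    mass: float
    charge: float = 0.0


def coarse_grain(particles):
    """Merge a group of particles (e.g. the atoms of a residue) into one sphere.

    The merged sphere sits at the mass-weighted center of the group and is
    just large enough to enclose every member sphere.
    """
    total_mass = sum(p.mass for p in particles)
    cx = sum(p.x * p.mass for p in particles) / total_mass
    cy = sum(p.y * p.mass for p in particles) / total_mass
    cz = sum(p.z * p.mass for p in particles) / total_mass
    radius = max(
        math.dist((cx, cy, cz), (p.x, p.y, p.z)) + p.radius for p in particles
    )
    total_charge = sum(p.charge for p in particles)
    return Particle(cx, cy, cz, radius, total_mass, total_charge)


if __name__ == "__main__":
    atoms = [Particle(0.0, 0.0, 0.0, 1.0, 1.0), Particle(2.0, 0.0, 0.0, 1.0, 1.0)]
    print(coarse_grain(atoms))
```

      The same helper can be applied repeatedly, for instance atoms to residues and residues to domains, which mirrors the range of granularities listed above.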

      Translation of Data into Spatial Restraints

      A restraint is a function that reaches its minimum if the restrained feature (e.g. distance) is consistent with the data on which the restraint is based. Beyond that, a restraint can, in principle, have any functional form. For example, a restraint is frequently a harmonic function (of the form k·x², where x is the distance from the mean and k is proportional to the force constant) of the restrained feature. A restrained feature may be any structural attribute of a protein or assembly, including contact, proximity, charge, distance, angle, chirality, surface area, volume, excluded volume, shape, symmetry, and localization of particles or sets of particles (Table I). Below, we highlight some restraints in the context of the H-RNAPII structure determination process.
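      The harmonic form described above, and the way individual restraints add up to a single scoring function (as in the Fig. 2 legend), can be written down in a few lines. This is a minimal sketch: the function names, the dictionary of coordinates, and the numeric values are hypothetical and chosen only to show that the score is zero when the restrained feature matches the data.

```python
import math


def harmonic(value, mean, k=1.0):
    """Harmonic term k * x**2, where x is the deviation of the feature from its mean."""
    return k * (value - mean) ** 2


def distance_restraint(name_i, name_j, mean, k=1.0):
    """Return a scoring function that restrains the distance between two named particles."""
    def score(coords):
        return harmonic(math.dist(coords[name_i], coords[name_j]), mean, k)
    return score


def total_score(coords, restraints):
    """Scoring function as a plain sum of individual restraint scores."""
    return sum(restraint(coords) for restraint in restraints)


if __name__ == "__main__":
    coords = {"A": (0.0, 0.0, 0.0), "B": (3.0, 4.0, 0.0)}  # A and B are 5 apart
    satisfied = distance_restraint("A", "B", mean=5.0)
    violated = distance_restraint("A", "B", mean=3.0)
    print(total_score(coords, [satisfied]))
    print(total_score(coords, [satisfied, violated]))
```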

      Dealing with Ambiguity

      Structural interpretation of data can be ambiguous, especially for proteomics data sets. For instance, if multiple copies of a protein exist in an assembly, a protein-protein interaction derived from a proteomics experiment may not be uniquely assigned to a specific pair of copies. Such ambiguous information must be translated into a restraint that considers all possible structural interpretations of the data; for example, an interaction between two protein types in an assembly with two symmetry units can occur either between the protein copies within each unit or between proteins across the two units (or both). We refer to such restraints as conditional restraints (
      • Alber F.
      • Förster F.
      • Korkin D.
      • Topf M.
      • Sali A.
      Integrating diverse data for structure determination of macromolecular assemblies.
      ).
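      One simple way to express such an ambiguous interaction is to score every possible assignment and keep the best one, so that the restraint is satisfied as soon as any single pair of copies is close enough. The sketch below takes that reading; the minimum-over-alternatives form, the function names, and the coordinates are assumptions for illustration and may differ from the formulation in the cited work.

```python
import math
from itertools import product


def upper_bound_penalty(distance, d_max, k=1.0):
    """Zero within d_max, quadratic beyond it."""
    return k * (distance - d_max) ** 2 if distance > d_max else 0.0


def conditional_restraint(copies_a, copies_b, d_max, k=1.0):
    """Score an interaction between two protein types with several copies each.

    copies_a, copies_b -- lists of (x, y, z) positions, one per copy
    Every copy pair is a possible interpretation of the data; the restraint
    takes the score of the best-satisfied pair.
    """
    return min(
        upper_bound_penalty(math.dist(a, b), d_max, k)
        for a, b in product(copies_a, copies_b)
    )


if __name__ == "__main__":
    # Two symmetry units: only the second copy of A is near a copy of B.
    copies_a = [(0, 0, 0), (100, 0, 0)]
    copies_b = [(105, 0, 0), (50, 0, 0)]
    print(conditional_restraint(copies_a, copies_b, d_max=10.0))
    # No pair within reach: the penalty of the closest pair is reported.
    print(conditional_restraint([(0, 0, 0)], [(15, 0, 0), (30, 0, 0)], d_max=10.0))
```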

      Distance Restraints from Proteomics

      We used direct physical interactions between eight pairs of eukaryotic RNAPII subunits as determined by the yeast two-hybrid (Y2H) system (
      • Flores A.
      • Briand J.F.
      • Gadal O.
      • Andrau J.C.
      • Rubbi L.
      • Van Mullem V.
      • Boschiero C.
      • Goussot M.
      • Marck C.
      • Carles C.
      • Thuriaux P.
      • Sentenac A.
      • Werner M.
      A protein-protein interaction map of yeast RNA polymerase III.
      ,
      • Zaros C.
      • Briand J.F.
      • Boulard Y.
      • Labarre-Mariotte S.
      • Garcia-Lopez M.C.
      • Thuriaux P.
      • Navarro F.
      Functional organization of the Rpb5 subunit shared by the three yeast RNA polymerases.
      ,
      • Briand J.F.
      • Navarro F.
      • Rematier P.
      • Boschiero C.
      • Labarre S.
      • Werner M.
      • Shpakovski G.V.
      • Thuriaux P.
      Partners of Rpb8p, a small subunit shared by yeast RNA polymerases I, II and III.
      ,
      • Tan Q.
      • Prysak M.H.
      • Woychik N.A.
      Loss of the Rpb4/Rpb7 subcomplex in a mutant form of the Rpb6 subunit shared by RNA polymerases I, II, and III.
      ,
      • Qi H.
      • Zakian V.A.
      The Saccharomyces telomere-binding protein Cdc13p interacts with both the catalytic subunit of DNA polymerase alpha and the telomerase-associated est1 protein.
      ,
      • Sampath V.
      • Rekha N.
      • Srinivasan N.
      • Sadhale P.
      The conserved and non-conserved regions of Rpb4 are involved in multiple phenotypes in Saccharomyces cerevisiae.
      ,
      • Khazak V.
      • Sadhale P.P.
      • Woychik N.A.
      • Brent R.
      • Golemis E.A.
      Human RNA polymerase II subunit hsRPB7 functions in yeast and influences stress survival and cell morphology.
      ,
      • Sareen A.
      • Choudhry P.
      • Mehta S.
      • Sharma N.
      Mapping the interaction site of Rpb4 and Rpb7 subunits of RNA polymerase II in Saccharomyces cerevisiae.
      ,
      • Selitrennik M.
      • Duek L.
      • Lotan R.
      • Choder M.
      Nucleocytoplasmic shuttling of the Rpb4p and Rpb7p subunits of Saccharomyces cerevisiae RNA polymerase II by two pathways.
      ), protein complementation assays (
      • Tarassov K.
      • Messier V.
      • Landry C.R.
      • Radinovic S.
      • Serna Molina M.M.
      • Shames I.
      • Malitskaya Y.
      • Vogel J.
      • Bussey H.
      • Michnick S.W.
      An in vivo map of the yeast protein interactome.
      ), co-localization (
      • Jasiak A.J.
      • Hartmann H.
      • Karakasili E.
      • Kalocsay M.
      • Flatley A.
      • Kremmer E.
      • Strässer K.
      • Martin D.E.
      • Söding J.
      • Cramer P.
      Genome-associated RNA polymerase II includes the dissociable Rpb4/7 subcomplex.
      ), and complex reconstitution experiments (
      • Benga W.J.
      • Grandemange S.
      • Shpakovski G.V.
      • Shematorova E.K.
      • Kedinger C.
      • Vigneron M.
      Distinct regions of RPB11 are required for heterodimerization with RPB3 in human and yeast RNA polymerase II.
      ,
      • Orlicky S.M.
      • Tran P.T.
      • Sayre M.H.
      • Edwards A.M.
      Dissociable Rpb4-Rpb7 subassembly of RNA polymerase II binds to single-strand nucleic acid and mediates a post-recruitment step in transcription initiation.
      ) (Table III). These interacting pairs were retrieved from the BioGRID database. Because we aim here to illustrate only what proteomics could do for structure determination, we selected true positive pairwise interactions and ignored the false positives; a discussion of techniques for addressing false positive interactions follows under “Dealing with Incorrect Data, Incomplete Data, and Multiple States”. There are also “indirect” interaction data in BioGRID. However, because BioGRID does not annotate which interactions are physical as opposed to indirect, we encoded as contact distance restraints only those experimentally measured interactions that have been detected by “pairwise” methods listed above.
      In general, distance restraints may operate on multiple scales, ranging from the distance between two atoms or residues to the distance between two protein centers in an assembly. For example, if a direct interaction between two proteins has been identified, we may apply a restraint that penalizes deviations from a specified distance between the two protein centers. This distance restraint scores equally all relative orientations between the two proteins with the same intercenter distance. When the shape of the interacting proteins is known, we can achieve a more accurate score at the cost of additional computational time by restraining the distance between the closest pair of particles across the protein-protein interface. Because we do not know a priori which two atoms, residues, or domains are closest to each other, this ambiguity must be handled by a conditional restraint.
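      The two scales of distance described above can be contrasted with a short sketch. It assumes each protein is given as a list of particle coordinates; the function names are illustrative only.

```python
import math


def centroid(particles):
    """Mean position of a list of (x, y, z) particle coordinates."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(3))


def center_distance(protein_a, protein_b):
    """Distance between protein centers; blind to relative orientation."""
    return math.dist(centroid(protein_a), centroid(protein_b))


def closest_pair_distance(protein_a, protein_b):
    """Distance between the closest pair of particles across the interface.

    Costs len(protein_a) * len(protein_b) distance evaluations, and the
    closest pair is not known in advance, so all pairs are examined.
    """
    return min(math.dist(a, b) for a in protein_a for b in protein_b)
```

      Either distance can then be passed to a penalty function such as a harmonic term.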

      Connectivity Restraints from Proteomics

      In addition to the pairwise interactions described in the previous section, we also chose to use five sets of physically interacting RNAPII subunits as revealed by affinity purification and mass spectrometry (Table III). We searched three major large scale proteomics data sets (
      • Ho Y.
      • Gruhler A.
      • Heilbut A.
      • Bader G.D.
      • Moore L.
      • Adams S.L.
      • Millar A.
      • Taylor P.
      • Bennett K.
      • Boutilier K.
      • Yang L.
      • Wolting C.
      • Donaldson I.
      • Schandorff S.
      • Shewnarane J.
      • Vo M.
      • Taggart J.
      • Goudreault M.
      • Muskat B.
      • Alfarano C.
      • Dewar D.
      • Lin Z.
      • Michalickova K.
      • Willems A.R.
      • Sassi H.
      • Nielsen P.A.
      • Rasmussen K.J.
      • Andersen J.R.
      • Johansen L.E.
      • Hansen L.H.
      • Jespersen H.
      • Podtelejnikov A.
      • Nielsen E.
      • Crawford J.
      • Poulsen V.
      • Sørensen B.D.
      • Matthiesen J.
      • Hendrickson R.C.
      • Gleeson F.
      • Pawson T.
      • Moran M.F.
      • Durocher D.
      • Mann M.
      • Hogue C.W.
      • Figeys D.
      • Tyers M.
      Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry.
      ,
      • Gavin A.C.
      • Aloy P.
      • Grandi P.
      • Krause R.
      • Boesche M.
      • Marzioch M.
      • Rau C.
      • Jensen L.J.
      • Bastuck S.
      • Dümpelfeld B.
      • Edelmann A.
      • Heurtier M.A.
      • Hoffman V.
      • Hoefert C.
      • Klein K.
      • Hudak M.
      • Michon A.M.
      • Schelder M.
      • Schirle M.
      • Remor M.
      • Rudi T.
      • Hooper S.
      • Bauer A.
      • Bouwmeester T.
      • Casari G.
      • Drewes G.
      • Neubauer G.
      • Rick J.M.
      • Kuster B.
      • Bork P.
      • Russell R.B.
      • Superti-Furga G.
      Proteome survey reveals modularity of the yeast cell machinery.
      ,
      • Krogan N.J.
      • Cagney G.
      • Yu H.
      • Zhong G.
      • Guo X.
      • Ignatchenko A.
      • Li J.
      • Pu S.
      • Datta N.
      • Tikuisis A.P.
      • Punna T.
      • Peregrín-Alvarez J.M.
      • Shales M.
      • Zhang X.
      • Davey M.
      • Robinson M.D.
      • Paccanaro A.
      • Bray J.E.
      • Sheung A.
      • Beattie B.
      • Richards D.P.
      • Canadien V.
      • Lalev A.
      • Mena F.
      • Wong P.
      • Starostine A.
      • Canete M.M.
      • Vlasblom J.
      • Wu S.
      • Orsi C.
      • Collins S.R.
      • Chandran S.
      • Haw R.
      • Rilstone J.J.
      • Gandi K.
      • Thompson N.J.
      • Musso G.
      • St Onge P.
      • Ghanny S.
      • Lam M.H.
      • Butland G.
      • Altaf-Ul A.M.
      • Kanaya S.
      • Shilatifard A.
      • O'Shea E.
      • Weissman J.S.
      • Ingles C.J.
      • Hughes T.R.
      • Parkinson J.
      • Gerstein M.
      • Wodak S.J.
      • Emili A.
      • Greenblatt J.F.
      Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      ) for all sets of interacting components that consist of RNAPII subunits only. We then disregarded sets of more than six subunits because such large affinity purification sets are relatively uninformative about the RNAPII structure (their inclusion does not significantly alter the results of our calculations; data not shown). In addition, because the majority of the sets (71 of the 103) were found in the Krogan et al. (
      • Krogan N.J.
      • Cagney G.
      • Yu H.
      • Zhong G.
      • Guo X.
      • Ignatchenko A.
      • Li J.
      • Pu S.
      • Datta N.
      • Tikuisis A.P.
      • Punna T.
      • Peregrín-Alvarez J.M.
      • Shales M.
      • Zhang X.
      • Davey M.
      • Robinson M.D.
      • Paccanaro A.
      • Bray J.E.
      • Sheung A.
      • Beattie B.
      • Richards D.P.
      • Canadien V.
      • Lalev A.
      • Mena F.
      • Wong P.
      • Starostine A.
      • Canete M.M.
      • Vlasblom J.
      • Wu S.
      • Orsi C.
      • Collins S.R.
      • Chandran S.
      • Haw R.
      • Rilstone J.J.
      • Gandi K.
      • Thompson N.J.
      • Musso G.
      • St Onge P.
      • Ghanny S.
      • Lam M.H.
      • Butland G.
      • Altaf-Ul A.M.
      • Kanaya S.
      • Shilatifard A.
      • O'Shea E.
      • Weissman J.S.
      • Ingles C.J.
      • Hughes T.R.
      • Parkinson J.
      • Gerstein M.
      • Wodak S.J.
      • Emili A.
      • Greenblatt J.F.
      Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      ) data set, we used only the Krogan et al. (
      • Krogan N.J.
      • Cagney G.
      • Yu H.
      • Zhong G.
      • Guo X.
      • Ignatchenko A.
      • Li J.
      • Pu S.
      • Datta N.
      • Tikuisis A.P.
      • Punna T.
      • Peregrín-Alvarez J.M.
      • Shales M.
      • Zhang X.
      • Davey M.
      • Robinson M.D.
      • Paccanaro A.
      • Bray J.E.
      • Sheung A.
      • Beattie B.
      • Richards D.P.
      • Canadien V.
      • Lalev A.
      • Mena F.
      • Wong P.
      • Starostine A.
      • Canete M.M.
      • Vlasblom J.
      • Wu S.
      • Orsi C.
      • Collins S.R.
      • Chandran S.
      • Haw R.
      • Rilstone J.J.
      • Gandi K.
      • Thompson N.J.
      • Musso G.
      • St Onge P.
      • Ghanny S.
      • Lam M.H.
      • Butland G.
      • Altaf-Ul A.M.
      • Kanaya S.
      • Shilatifard A.
      • O'Shea E.
      • Weissman J.S.
      • Ingles C.J.
      • Hughes T.R.
      • Parkinson J.
      • Gerstein M.
      • Wodak S.J.
      • Emili A.
      • Greenblatt J.F.
      Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      ) data set for our calculations. For affinity purification data, we know that at least one copy of each protein in a set directly interacts with at least one copy of another protein in the set; however, affinity purification data do not provide information on the stoichiometry of the proteins in the set, the number of complexes with distinct stoichiometry and configuration, or exactly which binary interactions occur, thus resulting in a great deal of ambiguity in the structural interpretation of the results. Because of this ambiguity, each affinity-purified set is encoded as a connectivity restraint that optimizes the assignment of binary interactions to proteins in the set along with the configuration of proteins (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ). A putative binary interaction network for the proteins that best satisfies all available data for the system is assigned during each evaluation of the connectivity restraint during the optimization procedure.
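      A hedged sketch of how such a connectivity restraint could be evaluated: treat the proteins in one affinity-purified set as graph nodes, weight each candidate binary interaction by its contact penalty in the current configuration, and take the minimum spanning tree as the putative interaction network. The spanning-tree formulation, the single-copy simplification, and all names are our assumptions for illustration.

```python
import math


def contact_penalty(a, b, d_contact, k=1.0):
    """Harmonic penalty on the distance in excess of a contact threshold."""
    excess = max(0.0, math.dist(a, b) - d_contact)
    return k * excess ** 2


def connectivity_restraint(centers, d_contact, k=1.0):
    """Score one affinity-purified set with Prim's minimum spanning tree.

    centers: dict mapping protein name -> (x, y, z) center (one copy each).
    Returns (score, edges), where edges is the putative binary interaction
    network that connects all proteins at the lowest total penalty.
    """
    names = list(centers)
    if not names:
        return 0.0, []
    in_tree = {names[0]}
    edges, score = [], 0.0
    while len(in_tree) < len(names):
        # Cheapest edge that links the tree to a protein not yet connected.
        penalty, u, v = min(
            (contact_penalty(centers[u], centers[v], d_contact, k), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        score += penalty
        in_tree.add(v)
        edges.append((u, v))
    return score, edges
```

      Because the tree is recomputed at every evaluation, the assigned network can change as the configuration changes during optimization.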

      Quality-of-fit Restraint from an Electron Density Map

      The fit of a model into an assembly density map is usually assessed by a cross-correlation measure between the assembly density and the model smoothed to the resolution of the map (
      • Frank J.
      ). Here, the configurations of the H-RNAPII subunits were restrained to fit an electron density map of the H-RNAPII complex (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ).
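      A minimal sketch of a cross-correlation measure between two density grids follows. It assumes both maps are already sampled on the same grid and flattened to equal-length lists, and that the model has already been smoothed to the resolution of the map; the mean-subtracted normalization is one common choice, not necessarily the one used for H-RNAPII.

```python
import math


def cross_correlation(map_density, model_density):
    """Normalized cross-correlation between two equal-length density lists.

    Returns a value in [-1, 1]; 1 indicates a perfect linear match.
    Returns 0.0 if either map has no variation.
    """
    n = len(map_density)
    mean_a = sum(map_density) / n
    mean_b = sum(model_density) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(map_density, model_density))
    var_a = sum((a - mean_a) ** 2 for a in map_density)
    var_b = sum((b - mean_b) ** 2 for b in model_density)
    norm = math.sqrt(var_a * var_b)
    return cov / norm if norm > 0.0 else 0.0
```

      A quality-of-fit restraint to be minimized could then be written as 1 minus this coefficient.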

      Excluded Volume Restraint

      Molecules take up space that cannot be occupied by other molecules. This space filling property provides a key restraint on the conformations of the assembly. If the atomic structure is known, as is the case for H-RNAPII, the van der Waals radius for each atom is typically used to define the excluded volume (
      • Connolly M.L.
      Solvent-accessible surfaces of proteins and nucleic acids.
      ). When the structure of a molecule is not known, it can be represented by a sphere; the volume of the sphere can be estimated from its composition (e.g. the number of residues in a protein (
      • Shen M.
      • Davis F.P.
      • Sali A.
      The optimal size of a globular protein domain: a simple sphere-packing model.
      )).
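      The sphere representation can be sketched as follows. The average volume per residue used here (134 Å³) is an illustrative assumption, not the value from the cited sphere-packing model, and the squared-overlap penalty is one simple choice among many.

```python
import math
from itertools import combinations

# Assumed average volume per residue in cubic angstroms (illustrative only).
ASSUMED_RESIDUE_VOLUME = 134.0


def sphere_radius(n_residues, residue_volume=ASSUMED_RESIDUE_VOLUME):
    """Radius of a sphere whose volume equals n_residues * residue_volume."""
    volume = n_residues * residue_volume
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def excluded_volume_restraint(spheres, k=1.0):
    """Sum of squared overlaps over all pairs of spheres.

    spheres: list of ((x, y, z), radius) tuples.
    """
    score = 0.0
    for (c1, r1), (c2, r2) in combinations(spheres, 2):
        overlap = r1 + r2 - math.dist(c1, c2)
        if overlap > 0.0:
            score += k * overlap ** 2
    return score
```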

      Geometric Complementarity Restraint from First Principles

      Protein-protein interfaces are typically geometrically complementary, characterized by tight packing with little space between them. This geometric complementarity is commonly used as a restraint in protein-protein docking (
      • Katchalski-Katzir E.
      • Shariv I.
      • Eisenstein M.
      • Friesem A.A.
      • Aflalo C.
      • Vakser I.A.
      Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques.
      ,
      • Duhovny D.
      • Nussinov R.
      • Wolfson H.J.
      Efficient unbound docking of rigid molecules.
      ). Because atomic models are used for H-RNAPII subunit structures, this consideration was enforced with an explicit restraint. The geometric complementarity restraint may be less informative if used on coarsely represented subunits.

      Additional Restraints

      Although not applied in our integrative structure determination of H-RNAPII, many additional restraint types can also be used.

      Radial Distribution Restraint

      An approximate radial distribution function of an assembly can be measured by an SAXS experiment (
      • Svergun D.I.
      • Petoukhov M.V.
      • Koch M.H.
      Determination of domain structure of proteins from X-ray solution scattering.
      ,
      • Hura G.L.
      • Menon A.L.
      • Hammel M.
      • Rambo R.P.
      • Poole 2nd, F.L.
      • Tsutakawa S.E.
      • Jenney Jr., F.E.
      • Classen S.
      • Frankel K.A.
      • Hopkins R.C.
      • Yang S.J.
      • Scott J.W.
      • Dillard B.D.
      • Adams M.W.
      • Tainer J.A.
      Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS).
      ). Correspondingly, the SAXS restraint on a model can penalize the difference between the experimental and computed radial distribution functions (
      • Förster F.
      • Webb B.
      • Krukenberg K.A.
      • Tsuruta H.
      • Agard D.A.
      • Sali A.
      Integration of small-angle X-ray scattering data into structural modeling of proteins and their assemblies.
      ). This restraint was used, for example, to select among several putative configurations of domains for the chaperone Hsp90 (
      • Krukenberg K.A.
      • Förster F.
      • Rice L.M.
      • Sali A.
      • Agard D.A.
      Multiple conformations of E. coli Hsp90 in solution: insights into the conformational dynamics of Hsp90.
      ).
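      As a rough sketch of such a penalty, a model's radial distribution can be approximated by a normalized histogram of interparticle distances and compared bin by bin with an experimental one. The binning scheme and the sum-of-squares penalty are assumptions for illustration; a real SAXS restraint involves form factors and experimental errors that are omitted here.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width, n_bins):
    """Normalized histogram of all pairwise distances between particles."""
    hist = [0.0] * n_bins
    for a, b in combinations(coords, 2):
        i = int(math.dist(a, b) / bin_width)
        if i < n_bins:
            hist[i] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def radial_distribution_penalty(experimental, computed):
    """Sum of squared differences between two binned distributions."""
    return sum((e - c) ** 2 for e, c in zip(experimental, computed))
```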

      Symmetry Restraint

      Symmetry is a recurrent theme in macromolecular assembly structures (
      • Goodsell D.S.
      • Olson A.J.
      Structural symmetry and protein function.
      ,
      • Tama F.
      • Brooks C.L.
      Symmetry, form, and shape: guiding principles for robustness in macromolecular machines.
      ,
      • Levy E.D.
      • Boeri Erba E.
      • Robinson C.V.
      • Teichmann S.A.
      Assembly reflects evolution of protein complexes.
      ). For example, cyclic, helical, dihedral, and icosahedral symmetries are found in many important molecular machines such as viruses, the NPC, and chaperonins. The similarity between corresponding particle configurations in each symmetry unit can be enforced by imposing a restraint that maintains the same particle-particle distances within each unit (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ,
      • Alber F.
      • Kim M.F.
      • Sali A.
      Structural characterization of assemblies from overall shape and subcomplex compositions.
      ).
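      A minimal sketch of such a symmetry restraint for two units, assuming the particles in each unit are listed in corresponding order. The harmonic penalty on distance differences and the names used are illustrative assumptions.

```python
import math
from itertools import combinations


def symmetry_restraint(unit_a, unit_b, k=1.0):
    """Penalize differences between corresponding intra-unit distances.

    unit_a, unit_b: equal-length lists of (x, y, z) coordinates, where
    particle i in unit_a corresponds to particle i in unit_b.
    """
    score = 0.0
    for i, j in combinations(range(len(unit_a)), 2):
        d_a = math.dist(unit_a[i], unit_a[j])
        d_b = math.dist(unit_b[i], unit_b[j])
        score += k * (d_a - d_b) ** 2
    return score
```

      The score is zero whenever one unit is a rigid-body copy of the other, because rigid transformations preserve internal distances.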

      Physical Energy and Statistical Potential Restraints

      Positions and orientations of interacting proteins can also be restrained by potentials based on the laws of physics (
      • Brooks B.R.
      • Bruccoleri R.E.
      • Olafson B.D.
      • States D.J.
      • Swaminathan S.
      • Karplus M.
      ,
      • Pearlman D.
      AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules.
      ,
      • Van Der Spoel D.
      • Lindahl E.
      • Hess B.
      • Groenhof G.
      • Mark A.E.
      • Berendsen H.J.
      GROMACS: fast, flexible, and free.
      ,
      • Jorgensen W.L.
      • Tirado-Rives J.
      The OPLS Potential Functions for Proteins. Energy Minimizations for Crystals of Cyclic Peptides and Crambin.
      ) as well as statistical potentials extracted from databases of known protein structures (
      • Shen M.Y.
      • Sali A.
      Statistical potential for assessment and prediction of protein structures.
      ,
      • Zhang C.
      • Liu S.
      • Zhou Y.
      Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential.
      ,
      • Melo F.
      • Sánchez R.
      • Sali A.
      Statistical potentials for fold assessment.
      ,
      • Misura K.M.
      • Chivian D.
      • Rohl C.A.
      • Kim D.E.
      • Baker D.
      Physically realistic homology models built with ROSETTA can be more accurate than their templates.
      ,
      • Simons K.T.
      • Kooperberg C.
      • Huang E.
      • Baker D.
      Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
      ,
      • Simons K.T.
      • Ruczinski I.
      • Kooperberg C.
      • Fox B.A.
      • Bystroff C.
      • Baker D.
      Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins.
      ). For example, a statistical potential can be derived from the observed distance distributions or contact frequencies of different atom type pairs in structurally defined proteins or complexes (
      • Sippl M.J.
      Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins.
      ,
      • Davis F.P.
      • Sali A.
      PIBASE: a comprehensive database of structurally defined protein interfaces.
      ,
      • Davis F.P.
      • Braberg H.
      • Shen M.Y.
      • Pieper U.
      • Sali A.
      • Madhusudhan M.S.
      Protein complex compositions predicted by structural similarity.
      ,
      • Eswar N.
      • Eramian D.
      • Webb B.
      • Shen M.Y.
      • Sali A.
      Protein structure modeling with MODELLER.
      ).

      Combining Restraints into a Scoring Function

      Once the data sets are encoded as restraints, they are combined into a scoring function, usually the sum of all the restraints. In this sum, the degree of uncertainty encoded by each restraint is effectively its weight. Ideally, the restraint on a spatial feature should be a probability density function on the feature given the corresponding measurement (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ); for example, the lower and upper bounds on a distance should reflect the uncertainty of the corresponding distance measurement and its interpretation.
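      To illustrate how uncertainty acts as a weight, the sketch below assumes each restraint is the negative logarithm of a Gaussian density (up to a constant), so a larger standard deviation gives a weaker restraint. The Gaussian form and the names are our assumptions, chosen only to make the idea concrete.

```python
def gaussian_restraint(value, mean, sigma):
    """Negative log of a Gaussian density, dropping the constant term.

    A larger sigma (more uncertain data) yields a smaller penalty for the
    same deviation, so sigma effectively sets the restraint weight.
    """
    return (value - mean) ** 2 / (2.0 * sigma ** 2)


def scoring_function(model, restraints):
    """Sum of all restraint terms evaluated on a model.

    restraints: list of callables, each mapping a model to a float.
    """
    return sum(r(model) for r in restraints)
```

      For instance, a 2 Å deviation scores 2.0 with sigma = 1 Å but only 0.5 with sigma = 2 Å.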

      Calculation of an Ensemble of Structures by Satisfaction of Spatial Restraints

      Next, all structural models that minimize the scoring function and therefore fit the original data must be found. An optimization procedure performs a search through the space of all possible macromolecular complex configurations by minimizing the violations of all restraints simultaneously. It is helpful to have many optimization methods available and to choose one that works best with a given representation and set of restraints. We have implemented several different optimizers as part of the Integrative Modeling Platform package. These optimizers can be classified as whole-system and divide-and-conquer optimizers.

      Whole-system Optimizers

      In this class of optimizers, an algorithm usually starts with a random initial configuration. The space of conformations is then explored iteratively by computing the next assembly configuration based on the values of all restraints for the configuration in the current optimization step with the intent of moving closer to the minimum value of the scoring function. Optimizers in this class include traditional conjugate gradients (
      • Shanno D.F.
      • Phua K.H.
      Minimization of unrestrained multivariate functions.
      ), quasi-Newton (
      • Ponder J.W.
      • Richards F.M.
      An efficient newton-like method for molecular mechanics energy minimization of large molecules.
      ) and molecular dynamics schemes (
      • Karplus M.
      • McCammon J.A.
      Molecular dynamics simulations of biomolecules.
      ), Monte Carlo procedures as well as more sophisticated methods such as self-guided Langevin dynamics (
      • Wu X.
      • Brooks B.R.
      Self-guided Langevin dynamics simulation method.
      ), and the replica exchange protocol (
      • Sugita Y.
      • Okamoto Y.
      Replica-exchange molecular dynamics method for protein folding.
      ). Because of the stochastic nature of these optimizations and the need to find all good scoring solutions, many independent runs are generally performed, each starting with a different random initial configuration.
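      The idea of many independent stochastic runs can be sketched with a basic Metropolis Monte Carlo search over a generic coordinate vector. This is a toy illustration under our own assumptions (uniform trial moves, fixed temperature), not the optimizer implemented in the Integrative Modeling Platform.

```python
import math
import random


def monte_carlo(score, start, step=1.0, temperature=1.0, n_steps=1000, rng=None):
    """Metropolis search; returns the best configuration seen and its score."""
    rng = rng or random.Random()
    current, current_score = list(start), score(start)
    best, best_score = list(current), current_score
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step, step) for x in current]
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


def independent_runs(score, n_dims, n_runs, box=10.0, seed=0, **kwargs):
    """Run several searches from random starts; return results sorted by score."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        start = [rng.uniform(-box, box) for _ in range(n_dims)]
        results.append(monte_carlo(score, start, rng=rng, **kwargs))
    return sorted(results, key=lambda r: r[1])
```

      Collecting the results of all runs, rather than only the single best, is what makes it possible to examine every good-scoring solution afterwards.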

      Divide-and-conquer Optimizers

      Divide-and-conquer optimizers can separate the particles and restraints in a system into smaller “suboptimizations,” ultimately resulting in more rapid sampling of structures. We have recently suggested a general divide-and-conquer approach to more efficiently sample protein assembly configurations (
      • Lasker K.
      • Topf M.
      • Sali A.
      • Wolfson H.J.
      Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
      ). In this approach, the set of variables is decomposed into relatively uncoupled but potentially overlapping subsets that can be sampled independently of each other (i.e. are not required to be sampled together in a single calculation and can be sampled in parallel) and then efficiently gathered to compute the global minimum. The strength of this approach is derived from the decomposition procedure, which helps to reduce the size of the search space from exponential in the number of components in the whole system to exponential in the number of components in the largest subset. Similar ideas have been used for various modeling tasks such as side chain packing (
      • Canutescu A.A.
      • Shelenkov A.A.
      • Dunbrack Jr, R.L.
      A graph-theory algorithm for rapid protein side-chain prediction.
      ,
      • Xu J.
      • Jiao F.
      • Berger B.
      A tree-decomposition approach to protein structure prediction.
      ,
      • Yanover C.
      • Schueler-Furman O.
      • Weiss Y.
      Minimizing and learning energy functions for side-chain prediction.
      ), sequence-structure threading (
      • Xu J.
      • Jiao F.
      • Berger B.
      A tree-decomposition approach to protein structure prediction.
      ), ab initio RNA folding (
      • Zhao J.
      • Malmberg R.L.
      • Cai L.
      Rapid ab initio prediction of RNA pseudoknots via graph tree decomposition.
      ), and prediction of quaternary structures of multiprotein complexes (
      • Inbar Y.
      • Benyamini H.
      • Nussinov R.
      • Wolfson H.J.
      Prediction of multimolecular assemblies by multiple docking.
      ).
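      The reduction in search space can be made concrete with a small calculation. It assumes every component has the same number of discrete candidate states and ignores the cost of gathering the subset solutions, so it is only a back-of-the-envelope sketch with hypothetical names.

```python
def exhaustive_size(n_components, n_states):
    """Number of configurations when all components are sampled together."""
    return n_states ** n_components


def decomposed_size(subsets, n_states):
    """Number of configurations when each subset is sampled independently.

    subsets: list of component lists; subsets may overlap.
    The total is dominated by the largest subset.
    """
    return sum(n_states ** len(subset) for subset in subsets)
```

      For 10 components with 10 states each, exhaustive sampling covers 10¹⁰ configurations, whereas three overlapping subsets of four components cover only 3 · 10⁴.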

      Use of Restraints to Restrain the Search Space for Optimization

      Efficiency can be increased by designing an optimization scheme to avoid considering configurations that clearly violate a subset of the data. Examples include segmenting an electron density map for the entire assembly into components that likely correspond to individual proteins prior to fitting the assembly proteins into the map (
      • Lasker K.
      • Topf M.
      • Sali A.
      • Wolfson H.J.
      Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
      ), eliminating geometrically unlikely protein-protein docking solutions (
      • Katchalski-Katzir E.
      • Shariv I.
      • Eisenstein M.
      • Friesem A.A.
      • Aflalo C.
      • Vakser I.A.
      Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques.
      ,
      • Schneidman-Duhovny D.
      • Inbar Y.
      • Polak V.
      • Shatsky M.
      • Halperin I.
      • Benyamini H.
      • Barzilai A.
      • Dror O.
      • Haspel N.
      • Nussinov R.
      • Wolfson H.J.
      Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking.
      ), and restricting the search space to symmetric configurations (
      • Schneidman-Duhovny D.
      • Inbar Y.
      • Nussinov R.
      • Wolfson H.J.
      Geometry-based flexible and symmetric protein docking.
      ,
      • André I.
      • Bradley P.
      • Wang C.
      • Baker D.
      Prediction of the structure of symmetrical protein assemblies.
      ).

      Human RNAPII Optimization

      For our H-RNAPII example, we used the sum of the distance, connectivity, EM quality-of-fit, and geometric complementarity restraints described above as a scoring function. The configuration of the subunits in H-RNAPII was optimized using an extension of the divide-and-conquer MultiFit protocol (Fig. 2) (
      • Lasker K.
      • Topf M.
      • Sali A.
      • Wolfson H.J.
      Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
      ,

      Lasker, K., Sali, A., Wolfson, H. J., (in press) Determining macromolecular assembly structures by molecular docking and fitting into an electron density map. Proteins,

      ).
      K. Lasker, unpublished data.
      We began by segmenting the electron density map into 12 regions, each one of which served to localize one of the 12 constituent H-RNAPII proteins. This procedure resulted in 479,001,600 (12!) possible H-RNAPII subunit configurations. Next, we eliminated all H-RNAPII subunit configurations that did not satisfy a majority of the proteomics restraints (Table III), keeping only 2,576 configurations for further refinement. We then refined each of these 2,576 configurations to optimize the EM quality-of-fit and geometric complementarity restraints using the standard MultiFit protocol (
      • Lasker K.
      • Topf M.
      • Sali A.
      • Wolfson H.J.
      Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
      ); 63 of the 2,576 configurations resulted in refined models with “good” scores. These models had equivalent positions for Rpb1, Rpb2 and Rpb3; however, the models varied in the positions of the remaining subunits. Finally, we filtered the 63 models by all proteomics restraints, resulting in a single model that satisfied all proteomics restraints as well as the EM quality-of-fit and geometric complementarity restraints (Fig. 3).
      Fig. 3. Comparison of the crystallographic structure of yeast RNAPII and the integrative model of human RNAPII. I, a–d, atomic representations of the integrative model of H-RNAPII and the reference structure in two views; the reference structure is composed of human subunits individually superposed on their orthologs in the yeast RNAPII structure. The configuration of the H-RNAPII subunits (a and c) is very similar to that in the reference structure (b and d); the Cα RMSD is only 11.4 Å. II, e–h, coarse representations of the H-RNAPII model (e and g) and the reference structure (f and h) in the same two views as in a–d further illustrate the high similarity between the model and the reference. In the coarse representation, sets of 30 contiguous residues are shown as a single bead. III, i and j, protein contact maps for the H-RNAPII model and the reference structure (white, no contact; gray, weak contact; black, contact). The maps are essentially identical, differing only in the interactions of Rpb6 with Rpb2 and Rpb3, and Rpb1 with Rpb12.
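      The enumerate-then-filter step can be sketched on a toy system. The code below counts subunit-to-region assignments and keeps those that satisfy a majority of pairwise interaction restraints, treating an interaction as satisfied when the two subunits occupy adjacent regions. The adjacency criterion, the strict-majority threshold, and all names are assumptions for illustration, not the MultiFit implementation.

```python
import math
from itertools import permutations


def count_assignments(n_subunits):
    """Number of ways to place n subunits into n segmented regions (n!)."""
    return math.factorial(n_subunits)


def filter_assignments(subunits, adjacent_regions, interactions, min_fraction=0.5):
    """Keep assignments satisfying more than min_fraction of the interactions.

    adjacent_regions: set of frozensets {i, j} of neighboring region indices.
    interactions: list of (subunit_a, subunit_b) pairs from proteomics data.
    Exhaustive enumeration is practical only for small toy systems.
    """
    kept = []
    for perm in permutations(range(len(subunits))):
        region_of = dict(zip(subunits, perm))
        satisfied = sum(
            frozenset((region_of[a], region_of[b])) in adjacent_regions
            for a, b in interactions
        )
        if satisfied > min_fraction * len(interactions):
            kept.append(region_of)
    return kept
```

      Here count_assignments(12) reproduces the 479,001,600 configurations quoted above.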

      Analysis of the Ensemble

      Precision

      There are three possible outcomes of an optimization procedure. First, if only a single structural model satisfies all restraints and thus all input information, there is probably sufficient data for prediction of the unique native state. Second, if two or more different models are consistent with the restraints, the data are insufficient to define the single native state, or there are multiple significantly populated states. If the number of distinct models is small, the structural differences between the models may suggest additional experiments to narrow down the possible solutions. Third, if no models satisfy all input information, the data or their interpretation in terms of the restraints are incorrect. For example, it might be that a complex exists in several functional states and that the available data cover more than one of them.
      In the case of the H-RNAPII model, optimization resulted in a single model that satisfied all the data. Thus, sufficient information was available to predict the positions and orientations of the H-RNAPII subunits. The ensemble of possible models in the absence of proteomics data was much larger (2,576 coarse configurations) and defined the structure far less precisely. Therefore, proteomics data were crucial for providing an unambiguous determination of a precise molecular architecture of H-RNAPII.

      Accuracy

      Assessing the accuracy of a structure, defined as the difference between the model and the native structure, is difficult but important (
      • Alber F.
      • Förster F.
      • Korkin D.
      • Topf M.
      • Sali A.
      Integrating diverse data for structure determination of macromolecular assemblies.
      ). It is impossible to know with certainty the accuracy of the proposed structure without knowing the real native structure. Nevertheless, our confidence can be modulated by five considerations: (a) self-consistency of independent experimental data; (b) structural similarity among all configurations in the ensemble that satisfy the input restraints; (c) simulations where a native structure is assumed, corresponding restraints are simulated from it, and the resulting calculated structure is compared with the assumed native structure; (d) confirmatory spatial data that were not used in the calculation of the structure (e.g. a criterion similar to the crystallographic free R-factor (
      • Brünger A.T.
      Free R value: a novel statistical quantity for assessing the accuracy of crystal structures.
      ) can be used to assess both the model accuracy and the harmony among the input restraints); and (e) patterns emerging from a mapping of independent and unused data on the structure that are unlikely to occur by chance (
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Sali A.
      • Rout M.P.
      The molecular architecture of the nuclear pore complex.
      ,
      • Alber F.
      • Dokudovskaya S.
      • Veenhoff L.M.
      • Zhang W.
      • Kipper J.
      • Devos D.
      • Suprapto A.
      • Karni-Schmidt O.
      • Williams R.
      • Chait B.T.
      • Rout M.P.
      • Sali A.
      Determining the architectures of macromolecular assemblies.
      ).
      In the case of H-RNAPII, we can estimate the accuracy directly because we know the crystallographic structure of the yeast RNAPII, which is likely to be highly similar to that of H-RNAPII (
      • Kostek S.A.
      • Grob P.
      • De Carlo S.
      • Lipscomb J.S.
      • Garczarek F.
      • Nogales E.
      Molecular architecture and conformational flexibility of human RNA polymerase II.
      ) (cf. the high degree of sequence similarity between yeast and human subunit orthologs (Table II) and the high correlation coefficient of 0.65 between the crystallographic yeast RNAPII structure and the electron density map of H-RNAPII). The H-RNAPII model clearly recapitulates the molecular architecture of yeast RNAPII (Fig. 3), preserving all of its protein interactions. More quantitatively, the subunits in the H-RNAPII model have a Cα root mean square deviation (RMSD) of only 11.4 Å from the human subunits individually superposed on their orthologs in the yeast RNAPII structure.
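      For readers unfamiliar with the measure, the following is a minimal sketch of a Cα RMSD between a model and a reference whose atoms are already matched one to one and expressed in the same coordinate frame. The function name `ca_rmsd` and the input layout (lists of (x, y, z) tuples in Å) are assumptions made for illustration; no superposition is performed here.

```python
import math

def ca_rmsd(model, reference):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances over all matched atom pairs.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))
```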

      Dealing with Incorrect Data, Incomplete Data, and Multiple States

      Proteome-wide protein-protein interaction maps have been produced by high throughput assays, such as affinity purification (
      • Gavin A.C.
      • Bösche M.
      • Krause R.
      • Grandi P.
      • Marzioch M.
      • Bauer A.
      • Schultz J.
      • Rick J.M.
      • Michon A.M.
      • Cruciat C.M.
      • Remor M.
      • Höfert C.
      • Schelder M.
      • Brajenovic M.
      • Ruffner H.
      • Merino A.
      • Klein K.
      • Hudak M.
      • Dickson D.
      • Rudi T.
      • Gnau V.
      • Bauch A.
      • Bastuck S.
      • Huhse B.
      • Leutwein C.
      • Heurtier M.A.
      • Copley R.R.
      • Edelmann A.
      • Querfurth E.
      • Rybin V.
      • Drewes G.
      • Raida M.
      • Bouwmeester T.
      • Bork P.
      • Seraphin B.
      • Kuster B.
      • Neubauer G.
      • Superti-Furga G.
      Functional organization of the yeast proteome by systematic analysis of protein complexes.
      ,
      • Gavin A.C.
      • Aloy P.
      • Grandi P.
      • Krause R.
      • Boesche M.
      • Marzioch M.
      • Rau C.
      • Jensen L.J.
      • Bastuck S.
      • Dümpelfeld B.
      • Edelmann A.
      • Heurtier M.A.
      • Hoffman V.
      • Hoefert C.
      • Klein K.
      • Hudak M.
      • Michon A.M.
      • Schelder M.
      • Schirle M.
      • Remor M.
      • Rudi T.
      • Hooper S.
      • Bauer A.
      • Bouwmeester T.
      • Casari G.
      • Drewes G.
      • Neubauer G.
      • Rick J.M.
      • Kuster B.
      • Bork P.
      • Russell R.B.
      • Superti-Furga G.
      Proteome survey reveals modularity of the yeast cell machinery.
      ) and yeast two-hybrid system (
      • Stelzl U.
      • Worm U.
      • Lalowski M.
      • Haenig C.
      • Brembeck F.H.
      • Goehler H.
      • Stroedicke M.
      • Zenkner M.
      • Schoenherr A.
      • Koeppen S.
      • Timm J.
      • Mintzlaff S.
      • Abraham C.
      • Bock N.
      • Kietzmann S.
      • Goedde A.
      • Toksöz E.
      • Droege A.
      • Krobitsch S.
      • Korn B.
      • Birchmeier W.
      • Lehrach H.
      • Wanker E.E.
      A human protein-protein interaction network: a resource for annotating the proteome.
      ,
      • Rual J.F.
      • Venkatesan K.
      • Hao T.
      • Hirozane-Kishikawa T.
      • Dricot A.
      • Li N.
      • Berriz G.F.
      • Gibbons F.D.
      • Dreze M.
      • Ayivi-Guedehoussou N.
      • Klitgord N.
      • Simon C.
      • Boxem M.
      • Milstein S.
      • Rosenberg J.
      • Goldberg D.S.
      • Zhang L.V.
      • Wong S.L.
      • Franklin G.
      • Li S.
      • Albala J.S.
      • Lim J.
      • Fraughton C.
      • Llamosas E.
      • Cevik S.
      • Bex C.
      • Lamesch P.
      • Sikorski R.S.
      • Vandenhaute J.
      • Zoghbi H.Y.
      • Smolyar A.
      • Bosak S.
      • Sequerra R.
      • Doucette-Stamm L.
      • Cusick M.E.
      • Hill D.E.
      • Roth F.P.
      • Vidal M.
      Towards a proteome-scale map of the human protein-protein interaction network.
      ,
      • Giot L.
      • Bader J.S.
      • Brouwer C.
      • Chaudhuri A.
      • Kuang B.
      • Li Y.
      • Hao Y.L.
      • Ooi C.E.
      • Godwin B.
      • Vitols E.
      • Vijayadamodar G.
      • Pochart P.
      • Machineni H.
      • Welsh M.
      • Kong Y.
      • Zerhusen B.
      • Malcolm R.
      • Varrone Z.
      • Collis A.
      • Minto M.
      • Burgess S.
      • McDaniel L.
      • Stimpson E.
      • Spriggs F.
      • Williams J.
      • Neurath K.
      • Ioime N.
      • Agee M.
      • Voss E.
      • Furtak K.
      • Renzulli R.
      • Aanensen N.
      • Carrolla S.
      • Bickelhaupt E.
      • Lazovatsky Y.
      • DaSilva A.
      • Zhong J.
      • Stanyon C.A.
      • Finley Jr., R.L.
      • White K.P.
      • Braverman M.
      • Jarvie T.
      • Gold S.
      • Leach M.
      • Knight J.
      • Shimkets R.A.
      • McKenna M.P.
      • Chant J.
      • Rothberg J.M.
      A protein interaction map of Drosophila melanogaster.
      ,
      • Walhout A.J.
      • Sordella R.
      • Lu X.
      • Hartley J.L.
      • Temple G.F.
      • Brasch M.A.
      • Thierry-Mieg N.
      • Vidal M.
      Protein interaction mapping in C. elegans using proteins involved in vulval development.
      ,
      • Uetz P.
      • Giot L.
      • Cagney G.
      • Mansfield T.A.
      • Judson R.S.
      • Knight J.R.
      • Lockshon D.
      • Narayan V.
      • Srinivasan M.
      • Pochart P.
      • Qureshi-Emili A.
      • Li Y.
      • Godwin B.
      • Conover D.
      • Kalbfleisch T.
      • Vijayadamodar G.
      • Yang M.
      • Johnston M.
      • Fields S.
      • Rothberg J.M.
      A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.
      ,
      • Ito T.
      • Chiba T.
      • Ozawa R.
      • Yoshida M.
      • Hattori M.
      • Sakaki Y.
      A comprehensive two-hybrid analysis to explore the yeast protein interactome.
      ). However, these data sets can be limited in three respects (
      • Bader G.D.
      • Hogue C.W.
      Analyzing yeast protein-protein interaction data obtained from different sources.
      ,
      • von Mering C.
      • Krause R.
      • Snel B.
      • Cornell M.
      • Oliver S.G.
      • Fields S.
      • Bork P.
      Comparative assessment of large-scale data sets of protein-protein interactions.
      ,
      • Mann M.
      • Kelleher N.L.
      Precision proteomics: the case for high resolution and high mass accuracy.
      ). First, the data can be incomplete in the sense that a number of interactions insufficient to describe the studied system were detected. Second, the data can be inaccurate in the sense that some detected interactions do not apply to the studied system. Third, the data can be “frustrated” in the sense that different subsets of the data apply to compositionally and/or conformationally different states of the studied system. For example, prior to filtering, a significant fraction of the affinity purification data for RNAPII subunits corresponds to false positive interactions (defined as a set of interacting subunits that do not have a connecting interaction path in the crystallographic structure of the complex (
      • Kettenberger H.
      • Armache K.J.
      • Cramer P.
      Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS.
      )). In particular, 31, 35, and 0% of the 71, 26, and six affinity purification sets with two or more RNAPII subunits as reported by Krogan et al. (
      • Krogan N.J.
      • Cagney G.
      • Yu H.
      • Zhong G.
      • Guo X.
      • Ignatchenko A.
      • Li J.
      • Pu S.
      • Datta N.
      • Tikuisis A.P.
      • Punna T.
      • Peregrín-Alvarez J.M.
      • Shales M.
      • Zhang X.
      • Davey M.
      • Robinson M.D.
      • Paccanaro A.
      • Bray J.E.
      • Sheung A.
      • Beattie B.
      • Richards D.P.
      • Canadien V.
      • Lalev A.
      • Mena F.
      • Wong P.
      • Starostine A.
      • Canete M.M.
      • Vlasblom J.
      • Wu S.
      • Orsi C.
      • Collins S.R.
      • Chandran S.
      • Haw R.
      • Rilstone J.J.
      • Gandi K.
      • Thompson N.J.
      • Musso G.
      • St Onge P.
      • Ghanny S.
      • Lam M.H.
      • Butland G.
      • Altaf-Ul A.M.
      • Kanaya S.
      • Shilatifard A.
      • O'Shea E.
      • Weissman J.S.
      • Ingles C.J.
      • Hughes T.R.
      • Parkinson J.
      • Gerstein M.
      • Wodak S.J.
      • Emili A.
      • Greenblatt J.F.
      Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
      ), Gavin et al. (
      • Gavin A.C.
      • Aloy P.
      • Grandi P.
      • Krause R.
      • Boesche M.
      • Marzioch M.
      • Rau C.
      • Jensen L.J.
      • Bastuck S.
      • Dümpelfeld B.
      • Edelmann A.
      • Heurtier M.A.
      • Hoffman V.
      • Hoefert C.
      • Klein K.
      • Hudak M.
      • Michon A.M.
      • Schelder M.
      • Schirle M.
      • Remor M.
      • Rudi T.
      • Hooper S.
      • Bauer A.
      • Bouwmeester T.
      • Casari G.
      • Drewes G.
      • Neubauer G.
      • Rick J.M.
      • Kuster B.
      • Bork P.
      • Russell R.B.
      • Superti-Furga G.
      Proteome survey reveals modularity of the yeast cell machinery.
      ), and Ho et al. (
      • Ho Y.
      • Gruhler A.
      • Heilbut A.
      • Bader G.D.
      • Moore L.
      • Adams S.L.
      • Millar A.
      • Taylor P.
      • Bennett K.
      • Boutilier K.
      • Yang L.
      • Wolting C.
      • Donaldson I.
      • Schandorff S.
      • Shewnarane J.
      • Vo M.
      • Taggart J.
      • Goudreault M.
      • Muskat B.
      • Alfarano C.
      • Dewar D.
      • Lin Z.
      • Michalickova K.
      • Willems A.R.
      • Sassi H.
      • Nielsen P.A.
      • Rasmussen K.J.
      • Andersen J.R.
      • Johansen L.E.
      • Hansen L.H.
      • Jespersen H.
      • Podtelejnikov A.
      • Nielsen E.
      • Crawford J.
      • Poulsen V.
      • Sørensen B.D.
      • Matthiesen J.
      • Hendrickson R.C.
      • Gleeson F.
      • Pawson T.
      • Moran M.F.
      • Durocher D.
      • Mann M.
      • Hogue C.W.
      • Figeys D.
      • Tyers M.
      Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry.
      ) were false positives, respectively. In addition, 33% of the 12 reported binary interactions extracted from the BioGRID database were false positives.
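      One possible reading of the false positive definition above is that a purification set is a false positive when its subunits do not form a connected group in the contact graph derived from the crystallographic structure. The sketch below implements that reading under stated assumptions: the subunit labels are placeholders, the contact list is hypothetical, and requiring the path to stay within the purified set is our interpretation rather than a detail given in the text.

```python
from collections import deque

def is_false_positive(subunits, contacts):
    """True if the subunits are not connected through contacts among themselves."""
    nodes = set(subunits)
    if len(nodes) < 2:
        return False
    # Build the contact graph restricted to the purified subunits.
    adjacency = {n: set() for n in nodes}
    for a, b in contacts:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Breadth-first search from an arbitrary subunit.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen != nodes

def false_positive_rate(purification_sets, contacts):
    """Fraction of purification sets flagged as false positives."""
    flagged = sum(is_false_positive(s, contacts) for s in purification_sets)
    return flagged / len(purification_sets)
```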
      A reasonable goal of structural modeling is to find the minimum number of system states that account for the observed data. If the data sets are correct and complete and describe a single state of the system, the optimization procedure should, in principle, result in a single solution that satisfies the data. If the data sets are inaccurate or incomplete, irrespective of the number of system states, the sampling should result in different states, some of which may or may not satisfy all the data. Next, we describe these possible outcomes in more detail.

      Correct, Complete Data, Single State

      The optimization procedure should result in a single solution that satisfies all restraints. If the data set is redundant, it is possible to cross-validate the solution by rerunning the modeling procedure using only random subsets of the data (
      • Duda R.O.
      • Hart P.E.
      • Stork D.G.
      ).
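      The cross-validation idea can be sketched as follows: solve once with all restraints, then repeatedly solve with random subsets and record how often the result agrees with the full-data solution. Everything named here (`cross_validate`, `solve`, `agrees`, the 70% subset fraction, and the number of trials) is a hypothetical choice for illustration and not a parameter taken from this work.

```python
import random

def cross_validate(restraints, solve, agrees, n_trials=20, fraction=0.7, seed=0):
    """Fraction of random-subset runs whose solution agrees with the full-data one.

    restraints -- list of restraint objects
    solve      -- hypothetical callable: list of restraints -> model
    agrees     -- hypothetical callable: (model, model) -> bool
    """
    rng = random.Random(seed)
    reference = solve(restraints)
    subset_size = max(1, int(round(fraction * len(restraints))))
    hits = 0
    for _ in range(n_trials):
        subset = rng.sample(restraints, subset_size)
        if agrees(solve(subset), reference):
            hits += 1
    return hits / n_trials
```

      A value near 1 suggests that the data are redundant enough for subsets to reproduce the solution; a low value suggests that the solution depends on a few individual restraints.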

      Correct, Incomplete Data, Single State

      The optimization procedure should produce multiple solutions, all of which should satisfy all restraints. For example, this situation may occur when the proteomics data do not apply to all subunits of a system or only cover a small subset of interactions. It is possible to identify the least precisely localized components of the system within the set of solutions, directing future experiments for the largest possible gain in the next iteration of integrative modeling.
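      As a rough illustration of how the least precisely localized component might be identified, the sketch below computes, for each component, the root mean square fluctuation of its position across an ensemble of solutions. It assumes that each solution is a dictionary mapping a component name to an (x, y, z) centroid and that all solutions share one reference frame; these conventions and the function names are ours.

```python
import math

def localization_spread(ensemble):
    """Per-component RMS fluctuation of position across an ensemble of models."""
    spread = {}
    for name in ensemble[0]:
        points = [model[name] for model in ensemble]
        n = len(points)
        centroid = [sum(p[i] for p in points) / n for i in range(3)]
        # Mean squared distance of each placement from the mean position.
        msd = sum(
            sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in points
        ) / n
        spread[name] = math.sqrt(msd)
    return spread

def least_localized(ensemble):
    """Name of the component with the largest positional spread."""
    spread = localization_spread(ensemble)
    return max(spread, key=spread.get)
```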

      Incorrect, Complete Data, Single State

      The optimization procedure should produce multiple solutions, each satisfying a fraction of the restraints. If there are redundant correct data, it may be possible to identify the conflicting incorrect data by cross-validation.

      Incorrect, Incomplete Data, Single State

      The optimization procedure should produce multiple solutions, each satisfying a fraction of the restraints. It is difficult to identify the incorrect data as well as to detect a solution corresponding to the correct state. This situation arose in a preliminary attempt to model the molecular architecture of the 19 S regulatory particle of the 26 S proteasome (
      • Nickell S.
      • Beck F.
      • Scheres S.H.
      • Korinek A.
      • Förster F.
      • Lasker K.
      • Mihalache O.
      • Sun N.
      • Nagy I.
      • Sali A.
      • Plitzko J.M.
      • Carazo J.M.
      • Mann M.
      • Baumeister W.
      Insights into the molecular architecture of the 26S proteasome.
      ). In that case, we have concluded that additional data are required.

      Multiple States

      Even when all data are correct and complete, the optimization procedure may be inadequate and produce multiple solutions, each satisfying only a fraction of the restraints. The same outcome is obtained when using incorrect data. Thus, multiple states are difficult to deconvolve from incorrect data (such as false positive interactions from proteomics).
      In conclusion, when no solution is found that satisfies all data, it is difficult to identify the correct state(s). Formally, a similar problem exists in protein structure determination based on NMR spectroscopy. There, structural features, such as interatomic distances and dihedral angles, are obtained experimentally and used in the form of spatial restraints for finding the set of structural models that satisfies these restraints. One approach to dealing with incorrect data for one or more states looks at the frequency with which each restraint is violated in an ensemble of calculated structures (
      • Clore G.M.
      • Robien M.A.
      • Gronenborn A.M.
      Exploring the limits of precision and accuracy of protein structures determined by nuclear magnetic resonance spectroscopy.
      ,
      • Clore G.M.
      • Gronenborn A.M.
      New methods of structure refinement for macromolecular structure determination by NMR.
      ); if a given restraint is violated often, the bounds on the distances allowed by the restraint can be loosened. Other approaches use cross-validation to assess the completeness of the experimental restraints (
      • Brünger A.T.
      • Clore G.M.
      • Gronenborn A.M.
      • Saffrich R.
      • Nilges M.
      Assessing the quality of solution nuclear magnetic resonance structures by complete cross-validation.
      ). Another development, the inferential structure determination method, formulates structure determination as an inference problem, handling incorrect and incomplete data as well as multiple states in a Bayesian framework (
      • Chen Z.A.
      • Jawhari A.
      • Fischer L.
      • Buchen C.
      • Tahir S.
      • Kamenski T.
      • Rasmussen M.
      • Lariviere L.
      • Bukowski-Wills J.C.
      • Nilges M.
      • Cramer P.
      • Rappsilber J.
      Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry.
      ). Adaptations of these methods and development of new methods should improve future handling of incorrect and incomplete data in integrative structure determination of conformationally and compositionally heterogeneous assemblies.
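      The violation-frequency approach mentioned above can be sketched in a few lines. The example assumes that each restraint is a tuple (i, j, lower, upper) bounding the distance between two named points and that each model maps names to (x, y, z) coordinates; this representation and the function name are hypothetical and do not reproduce the cited NMR protocols.

```python
import math

def violation_frequencies(ensemble, restraints):
    """Fraction of models in the ensemble that violate each distance restraint."""
    frequencies = []
    for i, j, lower, upper in restraints:
        violated = sum(
            1
            for model in ensemble
            if not lower <= math.dist(model[i], model[j]) <= upper
        )
        frequencies.append(violated / len(ensemble))
    return frequencies
```

      Restraints with a high violation frequency are candidates for loosened bounds or for closer inspection as possibly incorrect data.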

      DISCUSSION

      As illustrated above, proteomics techniques can now facilitate the characterization of the structure of macromolecular assemblies via integrative modeling. We have demonstrated that by using atomic subunit structures, an electron density map of their assembly, and proteomics data restraining relative subunit proximities, we can extend the scope of macromolecular structure determination beyond what is possible with single methods. Specifically, using the RNAPII structure as an example, we have shown that proteomics data, although traditionally not considered a source of formal structural information, can play a key role in assembly structure determination.
      One key challenge for integrating proteomics data into structure determination remains the treatment of assemblies that exist in multiple functional states, corresponding to different configurations and compositions of the assembly. Although integrative methods can already restrain the structure of the modeled assembly based on all available information, some of the proteomics data may in fact apply to only a subset of all functional states of the assembly. For example, proteomics techniques often detect peripheral interactions that are not part of the core assembly but could be vital for one of the biologically relevant states. Thus, future protocols need to be able to simultaneously determine structures for all biologically relevant states. These methods will need to associate specific interactions with specific functionally relevant states of an assembly as well as remove false positive interactions that are not relevant to a given state.
      As the quantity and variety of experimental data about macromolecular assemblies grow, integrative structure determination will be vital for characterization of these machines and the corresponding cellular processes. Methods are needed that more accurately translate heterogeneous data into spatial restraints and combine these restraints into a scoring function. New sampling and optimization schemes should improve the accuracy and level of detail with which we can describe assemblies. In addition, as a generalization of treating systems with multiple configurations and compositions, we should address the challenge of characterizing the dynamics of macromolecular assemblies by satisfying both spatial and temporal restraints for a system of multiple components. As integrative structure determination techniques advance, we will be able to describe an increasing number of key cellular structures, progressing toward a comprehensive structural, temporal, and logical model of the cell.

      Acknowledgments

      We thank Frank Alber, Michael P. Rout, Brian Chait, Wolfgang Baumeister, and Friedrich Förster for discussions about integrative structure determination based on proteomics data; Haim Wolfson for collaborating on optimization methods; and Hannes Braberg and Javier Fernandez-Martinez for discussing interpretation of proteomics data.

      REFERENCES

        • Alberts B.
        The cell as a collection of protein machines: preparing the next generation of molecular biologists.
        Cell. 1998; 92: 291-294
        • Abbott A.
        Proteomics: the society of proteins.
        Nature. 2002; 417: 894-896
        • Schmeing T.M.
        • Ramakrishnan V.
        What recent ribosome structures have revealed about the mechanism of translation.
        Nature. 2009; 461: 1234-1242
        • Allen G.S.
        • Frank J.
        Structural insights on the translation initiation complex: ghosts of a universal initiation complex.
        Mol. Microbiol. 2007; 63: 941-950
        • Horwich A.L.
        • Fenton W.A.
        Chaperonin-mediated protein folding: using a central cavity to kinetically assist polypeptide chain folding.
        Q. Rev. Biophys. 2009; 42: 83-116
        • Spiess C.
        • Meyer A.S.
        • Reissmann S.
        • Frydman J.
        Mechanism of the eukaryotic chaperonin: protein folding in the chamber of secrets.
        Trends Cell Biol. 2004; 14: 598-604
        • Cramer P.
        • Armache K.J.
        • Baumli S.
        • Benkert S.
        • Brueckner F.
        • Buchen C.
        • Damsma G.E.
        • Dengl S.
        • Geiger S.R.
        • Jasiak A.J.
        • Jawhari A.
        • Jennebach S.
        • Kamenski T.
        • Kettenberger H.
        • Kuhn C.D.
        • Lehmann E.
        • Leike K.
        • Sydow J.F.
        • Vannini A.
        Structure of eukaryotic RNA polymerases.
        Annu. Rev. Biophys. 2008; 37: 337-352
        • Cheng Y.
        Toward an atomic model of the 26S proteasome.
        Curr. Opin. Struct. Biol. 2009; 19: 203-208
        • Murata S.
        • Yashiroda H.
        • Tanaka K.
        Molecular mechanisms of proteasome assembly.
        Nat. Rev. Mol. Cell Biol. 2009; 10: 104-115
        • Förster F.
        • Lasker K.
        • Nickell S.
        • Sali A.
        • Baumeister W.
        Towards an integrated structural model of the 26S proteasome.
        Mol. Cell. Proteomics. 2010;
        • Gavin A.C.
        • Bösche M.
        • Krause R.
        • Grandi P.
        • Marzioch M.
        • Bauer A.
        • Schultz J.
        • Rick J.M.
        • Michon A.M.
        • Cruciat C.M.
        • Remor M.
        • Höfert C.
        • Schelder M.
        • Brajenovic M.
        • Ruffner H.
        • Merino A.
        • Klein K.
        • Hudak M.
        • Dickson D.
        • Rudi T.
        • Gnau V.
        • Bauch A.
        • Bastuck S.
        • Huhse B.
        • Leutwein C.
        • Heurtier M.A.
        • Copley R.R.
        • Edelmann A.
        • Querfurth E.
        • Rybin V.
        • Drewes G.
        • Raida M.
        • Bouwmeester T.
        • Bork P.
        • Seraphin B.
        • Kuster B.
        • Neubauer G.
        • Superti-Furga G.
        Functional organization of the yeast proteome by systematic analysis of protein complexes.
        Nature. 2002; 415: 141-147
        • Mitra K.
        • Frank J.
        Ribosome dynamics: insights from atomic structure modeling into cryo-electron microscopy maps.
        Annu. Rev. Biophys. Biomol. Struct. 2006; 35: 299-317
        • Alber F.
        • Dokudovskaya S.
        • Veenhoff L.M.
        • Zhang W.
        • Kipper J.
        • Devos D.
        • Suprapto A.
        • Karni-Schmidt O.
        • Williams R.
        • Chait B.T.
        • Sali A.
        • Rout M.P.
        The molecular architecture of the nuclear pore complex.
        Nature. 2007; 450: 695-701
        • Robinson C.V.
        • Sali A.
        • Baumeister W.
        The molecular sociology of the cell.
        Nature. 2007; 450: 973-982
        • Blundell T.L.
        • Johnson L.
        Protein Crystallography. Academic Press, New York, 1976
        • Bonvin A.M.
        • Boelens R.
        • Kaptein R.
        NMR analysis of protein interactions.
        Curr. Opin. Chem. Biol. 2005; 9: 501-508
        • Fiaux J.
        • Bertelsen E.B.
        • Horwich A.L.
        • Wüthrich K.
        NMR analysis of a 900K GroEL GroES complex.
        Nature. 2002; 418: 207-211
        • Neudecker P.
        • Lundström P.
        • Kay L.E.
        Relaxation dispersion NMR spectroscopy as a tool for detailed studies of protein folding.
        Biophys. J. 2009; 96: 2045-2054
        • Stahlberg H.
        • Walz T.
        Molecular electron microscopy: state of the art and current challenges.
        ACS Chem. Biol. 2008; 3: 268-281
        • Chiu W.
        • Baker M.L.
        • Jiang W.
        • Dougherty M.
        • Schmid M.F.
        Electron cryomicroscopy of biological machines at subnanometer resolution.
        Structure. 2005; 13: 363-372
        • Lucic V.
        • Leis A.
        • Baumeister W.
        Cryo-electron tomography of cells: connecting structure and function.
        Histochem. Cell Biol. 2008; 130: 185-196
        • Frank J.
        Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford University Press, New York, 2006
        • Berggård T.
        • Linse S.
        • James P.
        Methods for the detection and analysis of protein-protein interactions.
        Proteomics. 2007; 7: 2833-2842
        • Svergun D.I.
        • Petoukhov M.V.
        • Koch M.H.
        Determination of domain structure of proteins from X-ray solution scattering.
        Biophys. J. 2001; 80: 2946-2953
        • Hura G.L.
        • Menon A.L.
        • Hammel M.
        • Rambo R.P.
        • Poole 2nd, F.L.
        • Tsutakawa S.E.
        • Jenney Jr., F.E.
        • Classen S.
        • Frankel K.A.
        • Hopkins R.C.
        • Yang S.J.
        • Scott J.W.
        • Dillard B.D.
        • Adams M.W.
        • Tainer J.A.
        Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS).
        Nat. Methods. 2009; 6: 606-612
        • Joo C.
        • Balci H.
        • Ishitsuka Y.
        • Buranachai C.
        • Ha T.
        Advances in single-molecule fluorescence methods for molecular biology.
        Annu. Rev. Biochem. 2008; 77: 51-76
        • Hart G.T.
        • Ramani A.K.
        • Marcotte E.M.
        How complete are current yeast and human protein-interaction networks?
        Genome Biol. 2006; 7: 120
        • Collins S.R.
        • Kemmeren P.
        • Zhao X.C.
        • Greenblatt J.F.
        • Spencer F.
        • Holstege F.C.
        • Weissman J.S.
        • Krogan N.J.
        Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae.
        Mol. Cell. Proteomics. 2007; 6: 439-450
        • Cusick M.E.
        • Yu H.
        • Smolyar A.
        • Venkatesan K.
        • Carvunis A.R.
        • Simonis N.
        • Rual J.F.
        • Borick H.
        • Braun P.
        • Dreze M.
        • Vandenhaute J.
        • Galli M.
        • Yazaki J.
        • Hill D.E.
        • Ecker J.R.
        • Roth F.P.
        • Vidal M.
        Literature-curated protein interaction datasets.
        Nat. Methods. 2009; 6: 39-46
        • Topf M.
        • Lasker K.
        • Webb B.
        • Wolfson H.
        • Chiu W.
        • Sali A.
        Protein structure fitting and refinement guided by cryo-EM density.
        Structure. 2008; 16: 295-307
        • Topf M.
        • Baker M.L.
        • Marti-Renom M.A.
        • Chiu W.
        • Sali A.
        Refinement of protein structures by iterative comparative modeling and CryoEM density fitting.
        J. Mol. Biol. 2006; 357: 1655-1668
        • Lasker K.
        • Topf M.
        • Sali A.
        • Wolfson H.J.
        Inferential optimization for simultaneous fitting of multiple components into a cryoEM map of their assembly.
        J. Mol. Biol. 2009; 388: 180-194
        • Lasker K.
        • Sali A.
        • Wolfson H.J.
        Determining macromolecular assembly structures by molecular docking and fitting into an electron density map.
        Proteins. (in press)
        • Lindert S.
        • Stewart P.L.
        • Meiler J.
        Hybrid approaches: applying computational methods in cryo-electron microscopy.
        Curr. Opin. Struct. Biol. 2009; 19: 218-225
        • Qian B.
        • Raman S.
        • Das R.
        • Bradley P.
        • McCoy A.J.
        • Read R.J.
        • Baker D.
        High-resolution structure prediction and the crystallographic phase problem.
        Nature. 2007; 450: 259-264
        • Taverner T.
        • Hernández H.
        • Sharon M.
        • Ruotolo B.T.
        • Matak-Vinković D.
        • Devos D.
        • Russell R.B.
        • Robinson C.V.
        Subunit architecture of intact protein complexes from mass spectrometry and homology modeling.
        Acc. Chem. Res. 2008; 41: 617-627
        • Bowers P.M.
        • Strauss C.E.
        • Baker D.
        De novo protein structure determination using sparse NMR data.
        J. Biomol. NMR. 2000; 18: 311-318
        • Raman S.
        • Lange O.F.
        • Rossi P.
        • Tyka M.
        • Wang X.
        • Aramini J.
        • Liu G.
        • Ramelot T.A.
        • Eletsky A.
        • Szyperski T.
        • Kennedy M.A.
        • Prestegard J.
        • Montelione G.T.
        • Baker D.
        NMR structure determination for larger proteins using backbone-only data.
        Science. 2010; 327: 1014-1018
        • Alber F.
        • Dokudovskaya S.
        • Veenhoff L.M.
        • Zhang W.
        • Kipper J.
        • Devos D.
        • Suprapto A.
        • Karni-Schmidt O.
        • Williams R.
        • Chait B.T.
        • Rout M.P.
        • Sali A.
        Determining the architectures of macromolecular assemblies.
        Nature. 2007; 450: 683-694
        • Förster F.
        • Lasker K.
        • Beck F.
        • Nickell S.
        • Sali A.
        • Baumeister W.
        An Atomic Model AAA-ATPase/20S core particle sub-complex of the 26S proteasome.
        Biochem. Biophys. Res. Commun. 2009; 388: 228-233
        • Fotin A.
        • Cheng Y.
        • Sliz P.
        • Grigorieff N.
        • Harrison S.C.
        • Kirchhausen T.
        • Walz T.
        Molecular model for a complete clathrin lattice from electron cryomicroscopy.
        Nature. 2004; 432: 573-579
        • Xing Y.
        • Böcking T.
        • Wolf M.
        • Grigorieff N.
        • Kirchhausen T.
        • Harrison S.C.
        Structure of clathrin coat with bound Hsc70 and auxilin: mechanism of Hsc70-facilitated disassembly.
        EMBO J. 2010; 29: 655-665
        • Chen Z.A.
        • Jawhari A.
        • Fischer L.
        • Buchen C.
        • Tahir S.
        • Kamenski T.
        • Rasmussen M.
        • Lariviere L.
        • Bukowski-Wills J.C.
        • Nilges M.
        • Cramer P.
        • Rappsilber J.
        Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry.
        EMBO J. 2010; 29: 717-726
        • Byeon I.J.
        • Meng X.
        • Jung J.
        • Zhao G.
        • Yang R.
        • Ahn J.
        • Shi J.
        • Concel J.
        • Aiken C.
        • Zhang P.
        • Gronenborn A.M.
        Structural convergence between Cryo-EM and NMR reveals intersubunit interactions critical for HIV-1 capsid function.
        Cell. 2009; 139: 780-790
        • Alber F.
        • Förster F.
        • Korkin D.
        • Topf M.
        • Sali A.
        Integrating diverse data for structure determination of macromolecular assemblies.
        Annu. Rev. Biochem. 2008; 77: 443-477
        • Nickell S.
        • Beck F.
        • Scheres S.H.
        • Korinek A.
        • Förster F.
        • Lasker K.
        • Mihalache O.
        • Sun N.
        • Nagy I.
        • Sali A.
        • Plitzko J.M.
        • Carazo J.M.
        • Mann M.
        • Baumeister W.
        Insights into the molecular architecture of the 26S proteasome.
        Proc. Natl. Acad. Sci. U.S.A. 2009; 106: 11943-11947
        • Jasiak A.J.
        • Hartmann H.
        • Karakasili E.
        • Kalocsay M.
        • Flatley A.
        • Kremmer E.
        • Strässer K.
        • Martin D.E.
        • Söding J.
        • Cramer P.
        Genome-associated RNA polymerase II includes the dissociable Rpb4/7 subcomplex.
        J. Biol. Chem. 2008; 283: 26423-26427
        • Hahn S.
        Structure and mechanism of the RNA polymerase II transcription machinery.
        Nat. Struct. Mol. Biol. 2004; 11: 394-403
        • Cramer P.
        • Bushnell D.A.
        • Kornberg R.D.
        Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution.
        Science. 2001; 292: 1863-1876
        • Kostek S.A.
        • Grob P.
        • De Carlo S.
        • Lipscomb J.S.
        • Garczarek F.
        • Nogales E.
        Molecular architecture and conformational flexibility of human RNA polymerase II.
        Structure. 2006; 14: 1691-1700
        • Kettenberger H.
        • Armache K.J.
        • Cramer P.
        Complete RNA polymerase II elongation complex structure and its interactions with NTP and TFIIS.
        Mol. Cell. 2004; 16: 955-965
        • Maiolica A.
        • Cittaro D.
        • Borsotti D.
        • Sennels L.
        • Ciferri C.
        • Tarricone C.
        • Musacchio A.
        • Rappsilber J.
        Structural analysis of multiprotein complexes by cross-linking, mass spectrometry, and database searching.
        Mol. Cell. Proteomics. 2007; 6: 2200-2211
        • Young M.M.
        • Tang N.
        • Hempel J.C.
        • Oshiro C.M.
        • Taylor E.W.
        • Kuntz I.D.
        • Gibson B.W.
        • Dollinger G.
        High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry.
        Proc. Natl. Acad. Sci. U.S.A. 2000; 97: 5802-5806
        • Pieper U.
        • Chiang R.
        • Seffernick J.J.
        • Brown S.D.
        • Glasner M.E.
        • Kelly L.
        • Eswar N.
        • Sauder J.M.
        • Bonanno J.B.
        • Swaminathan S.
        • Burley S.K.
        • Zheng X.
        • Chance M.R.
        • Almo S.C.
        • Gerlt J.A.
        • Raushel F.M.
        • Jacobson M.P.
        • Babbitt P.C.
        • Sali A.
        Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies.
        J. Struct. Funct. Genomics. 2009; 10: 107-125
        • Stark C.
        • Breitkreutz B.J.
        • Reguly T.
        • Boucher L.
        • Breitkreutz A.
        • Tyers M.
        BioGRID: a general repository for interaction datasets.
        Nucleic Acids Res. 2006; 34: D535-D539
        • Henrick K.
        • Newman R.
        • Tagari M.
        • Chagoyen M.
        EMDep: a web-based system for the deposition and validation of high-resolution electron microscopy macromolecular structural information.
        J. Struct. Biol. 2003; 144: 228-237
        • Pieper U.
        • Eswar N.
        • Webb B.M.
        • Eramian D.
        • Kelly L.
        • Barkan D.T.
        • Carter H.
        • Mankoo P.
        • Karchin R.
        • Marti-Renom M.A.
        • Davis F.P.
        • Sali A.
        MODBASE, a database of annotated comparative protein structure models and associated resources.
        Nucleic Acids Res. 2009; 37: D347-D354
        • Flores A.
        • Briand J.F.
        • Gadal O.
        • Andrau J.C.
        • Rubbi L.
        • Van Mullem V.
        • Boschiero C.
        • Goussot M.
        • Marck C.
        • Carles C.
        • Thuriaux P.
        • Sentenac A.
        • Werner M.
        A protein-protein interaction map of yeast RNA polymerase III.
        Proc. Natl. Acad. Sci. U.S.A. 1999; 96: 7815-7820
        • Zaros C.
        • Briand J.F.
        • Boulard Y.
        • Labarre-Mariotte S.
        • Garcia-Lopez M.C.
        • Thuriaux P.
        • Navarro F.
        Functional organization of the Rpb5 subunit shared by the three yeast RNA polymerases.
        Nucleic Acids Res. 2007; 35: 634-647
        • Briand J.F.
        • Navarro F.
        • Rematier P.
        • Boschiero C.
        • Labarre S.
        • Werner M.
        • Shpakovski G.V.
        • Thuriaux P.
        Partners of Rpb8p, a small subunit shared by yeast RNA polymerases I, II and III.
        Mol. Cell. Biol. 2001; 21: 6056-6065
        • Tan Q.
        • Prysak M.H.
        • Woychik N.A.
        Loss of the Rpb4/Rpb7 subcomplex in a mutant form of the Rpb6 subunit shared by RNA polymerases I, II, and III.
        Mol. Cell. Biol. 2003; 23: 3329-3338
        • Qi H.
        • Zakian V.A.
        The Saccharomyces telomere-binding protein Cdc13p interacts with both the catalytic subunit of DNA polymerase alpha and the telomerase-associated est1 protein.
        Genes Dev. 2000; 14: 1777-1788
        • Sampath V.
        • Rekha N.
        • Srinivasan N.
        • Sadhale P.
        The conserved and non-conserved regions of Rpb4 are involved in multiple phenotypes in Saccharomyces cerevisiae.
        J. Biol. Chem. 2003; 278: 51566-51576
        • Khazak V.
        • Sadhale P.P.
        • Woychik N.A.
        • Brent R.
        • Golemis E.A.
        Human RNA polymerase II subunit hsRPB7 functions in yeast and influences stress survival and cell morphology.
        Mol. Biol. Cell. 1995; 6: 759-775
        • Sareen A.
        • Choudhry P.
        • Mehta S.
        • Sharma N.
        Mapping the interaction site of Rpb4 and Rpb7 subunits of RNA polymerase II in Saccharomyces cerevisiae.
        Biochem. Biophys. Res. Commun. 2005; 332: 763-770
        • Selitrennik M.
        • Duek L.
        • Lotan R.
        • Choder M.
        Nucleocytoplasmic shuttling of the Rpb4p and Rpb7p subunits of Saccharomyces cerevisiae RNA polymerase II by two pathways.
        Eukaryot. Cell. 2006; 5: 2092-2103
        • Tarassov K.
        • Messier V.
        • Landry C.R.
        • Radinovic S.
        • Serna Molina M.M.
        • Shames I.
        • Malitskaya Y.
        • Vogel J.
        • Bussey H.
        • Michnick S.W.
        An in vivo map of the yeast protein interactome.
        Science. 2008; 320: 1465-1470
        • Benga W.J.
        • Grandemange S.
        • Shpakovski G.V.
        • Shematorova E.K.
        • Kedinger C.
        • Vigneron M.
        Distinct regions of RPB11 are required for heterodimerization with RPB3 in human and yeast RNA polymerase II.
        Nucleic Acids Res. 2005; 33: 3582-3590
        • Orlicky S.M.
        • Tran P.T.
        • Sayre M.H.
        • Edwards A.M.
        Dissociable Rpb4-Rpb7 subassembly of RNA polymerase II binds to single-strand nucleic acid and mediates a post-recruitment step in transcription initiation.
        J. Biol. Chem. 2001; 276: 10097-10102
        • Ho Y.
        • Gruhler A.
        • Heilbut A.
        • Bader G.D.
        • Moore L.
        • Adams S.L.
        • Millar A.
        • Taylor P.
        • Bennett K.
        • Boutilier K.
        • Yang L.
        • Wolting C.
        • Donaldson I.
        • Schandorff S.
        • Shewnarane J.
        • Vo M.
        • Taggart J.
        • Goudreault M.
        • Muskat B.
        • Alfarano C.
        • Dewar D.
        • Lin Z.
        • Michalickova K.
        • Willems A.R.
        • Sassi H.
        • Nielsen P.A.
        • Rasmussen K.J.
        • Andersen J.R.
        • Johansen L.E.
        • Hansen L.H.
        • Jespersen H.
        • Podtelejnikov A.
        • Nielsen E.
        • Crawford J.
        • Poulsen V.
        • Sørensen B.D.
        • Matthiesen J.
        • Hendrickson R.C.
        • Gleeson F.
        • Pawson T.
        • Moran M.F.
        • Durocher D.
        • Mann M.
        • Hogue C.W.
        • Figeys D.
        • Tyers M.
        Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry.
        Nature. 2002; 415: 180-183
        • Gavin A.C.
        • Aloy P.
        • Grandi P.
        • Krause R.
        • Boesche M.
        • Marzioch M.
        • Rau C.
        • Jensen L.J.
        • Bastuck S.
        • Dümpelfeld B.
        • Edelmann A.
        • Heurtier M.A.
        • Hoffman V.
        • Hoefert C.
        • Klein K.
        • Hudak M.
        • Michon A.M.
        • Schelder M.
        • Schirle M.
        • Remor M.
        • Rudi T.
        • Hooper S.
        • Bauer A.
        • Bouwmeester T.
        • Casari G.
        • Drewes G.
        • Neubauer G.
        • Rick J.M.
        • Kuster B.
        • Bork P.
        • Russell R.B.
        • Superti-Furga G.
        Proteome survey reveals modularity of the yeast cell machinery.
        Nature. 2006; 440: 631-636
        • Krogan N.J.
        • Cagney G.
        • Yu H.
        • Zhong G.
        • Guo X.
        • Ignatchenko A.
        • Li J.
        • Pu S.
        • Datta N.
        • Tikuisis A.P.
        • Punna T.
        • Peregrín-Alvarez J.M.
        • Shales M.
        • Zhang X.
        • Davey M.
        • Robinson M.D.
        • Paccanaro A.
        • Bray J.E.
        • Sheung A.
        • Beattie B.
        • Richards D.P.
        • Canadien V.
        • Lalev A.
        • Mena F.
        • Wong P.
        • Starostine A.
        • Canete M.M.
        • Vlasblom J.
        • Wu S.
        • Orsi C.
        • Collins S.R.
        • Chandran S.
        • Haw R.
        • Rilstone J.J.
        • Gandi K.
        • Thompson N.J.
        • Musso G.
        • St Onge P.
        • Ghanny S.
        • Lam M.H.
        • Butland G.
        • Altaf-Ul A.M.
        • Kanaya S.
        • Shilatifard A.
        • O'Shea E.
        • Weissman J.S.
        • Ingles C.J.
        • Hughes T.R.
        • Parkinson J.
        • Gerstein M.
        • Wodak S.J.
        • Emili A.
        • Greenblatt J.F.
        Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
        Nature. 2006; 440: 637-643
        • Connolly M.L.
        Solvent-accessible surfaces of proteins and nucleic acids.
        Science. 1983; 221: 709-713
        • Shen M.
        • Davis F.P.
        • Sali A.
        The optimal size of a globular protein domain: a simple sphere-packing model.
        Chem. Phys. Lett. 2005; 405: 224-228
        • Katchalski-Katzir E.
        • Shariv I.
        • Eisenstein M.
        • Friesem A.A.
        • Aflalo C.
        • Vakser I.A.
        Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques.
        Proc. Natl. Acad. Sci. U.S.A. 1992; 89: 2195-2199
        • Duhovny D.
        • Nussinov R.
        • Wolfson H.J.
        Efficient unbound docking of rigid molecules.
        in: Guigó R. Gusfield D. Second International Workshop on Algorithms in Bioinformatics. Springer-Verlag, London 2002: 185-200
        • Förster F.
        • Webb B.
        • Krukenberg K.A.
        • Tsuruta H.
        • Agard D.A.
        • Sali A.
        Integration of small-angle X-ray scattering data into structural modeling of proteins and their assemblies.
        J. Mol. Biol. 2008; 382: 1089-1106
        • Krukenberg K.A.
        • Förster F.
        • Rice L.M.
        • Sali A.
        • Agard D.A.
        Multiple conformations of E. coli Hsp90 in solution: insights into the conformational dynamics of Hsp90.
        Structure. 2008; 16: 755-765
        • Goodsell D.S.
        • Olson A.J.
        Structural symmetry and protein function.
        Annu. Rev. Biophys. Biomol. Struct. 2000; 29: 105-153
        • Tama F.
        • Brooks C.L.
        Symmetry, form, and shape: guiding principles for robustness in macromolecular machines.
        Annu. Rev. Biophys. Biomol. Struct. 2006; 35: 115-133
        • Levy E.D.
        • Boeri Erba E.
        • Robinson C.V.
        • Teichmann S.A.
        Assembly reflects evolution of protein complexes.
        Nature. 2008; 453: 1262-1265
        • Alber F.
        • Kim M.F.
        • Sali A.
        Structural characterization of assemblies from overall shape and subcomplex compositions.
        Structure. 2005; 13: 435-445
        • Brooks B.R.
        • Bruccoleri R.E.
        • Olafson B.D.
        • States D.J.
        • Swaminathan S.
        • Karplus M.
        CHARMM: a Program for Macromolecular Energy, Minimization, and Dynamics Calculations. Wiley, New York 1983
        • Pearlman D.
        AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules.
        Comput. Phys. Commun. 1995; 91: 1-41
        • Van Der Spoel D.
        • Lindahl E.
        • Hess B.
        • Groenhof G.
        • Mark A.E.
        • Berendsen H.J.
        GROMACS: fast, flexible, and free.
        J. Comput. Chem. 2005; 26: 1701-1718
        • Jorgensen W.L.
        • Tirado-Rives J.
        The OPLS Potential Functions for Proteins. Energy Minimizations for Crystals of Cyclic Peptides and Crambin.
        J. Am. Chem. Soc. 1988; 110: 657-666
        • Shen M.Y.
        • Sali A.
        Statistical potential for assessment and prediction of protein structures.
        Protein Sci. 2006; 15: 2507-2524
        • Zhang C.
        • Liu S.
        • Zhou Y.
        Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential.
        Protein Sci. 2004; 13: 391-399
        • Melo F.
        • Sánchez R.
        • Sali A.
        Statistical potentials for fold assessment.
        Protein Sci. 2002; 11: 430-448