
Molecular & Cellular Proteomics 1:763-780, 2002.
© 2002 by The American Society for Biochemistry and Molecular Biology, Inc.


Report

Defining the Mandate of Proteomics in the Post-Genomics Era: Workshop Report

©2002 National Academy of Sciences, Washington, D.C., USA. Reprinted with permission from the National Academies Press for the National Academy of Sciences. All rights reserved. The original report may be viewed online at http://www.nap.edu/catalog/10209.html

National Research Council Steering Committee: George L. Kenyon, (Chair)*,a, David M. DeMarinib, Elaine Fuchsc, David J. Galasd, Jack F. Kirsche, Thomas S. Leyh, (Contributing Author)f, Walter H. Moosg, Gregory A. Petskoh, Dagmar Ringei, Gerald M. Rubinj and Laura C. Sheahan, (Staff Director)k

a College of Pharmacy, University of Michigan, Ann Arbor, MI 48109-1065
b U.S. Environmental Protection Agency, Research Triangle Park, NC 27711
c The Rockefeller University, New York, NY 10021
d Keck Graduate Institute of Applied Life Sciences, Claremont, CA 91711
e Department of Molecular and Cellular Biology, University of California-Berkeley, Berkeley, CA 94720-3206
f Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY 10461-1975
g MitoKor, San Diego, CA 92121
h Department of Chemistry and Biochemistry, Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, MA 02254-9110
i W.M. Keck Institute for Cellular Visualization, Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, MA 02254-9110
j Howard Hughes Medical Institute, Chevy Chase, MD 20815-6789
k Program Officer, Board on International Scientific Organizations, Policy and Global Affairs, The National Academies, Washington, D.C. 20418


    ABSTRACT
 TOP
 ABSTRACT
 PROTEOMICS
 DISCUSSION OF GENERAL TOPICS...
 LESSONS LEARNED FROM THE...
 SOURCES OF PROTEINS
 PROTEIN SEPARATION
 PROTEIN IDENTIFICATION
 DATA COLLECTION
 PROTEOMICS AND THE PROBLEM...
 APPLICATIONS
 COMPUTATIONAL METHODS AND...
 PROTEOMICS: A COORDINATED...
 CONCLUSION
 REFERENCES
 
Research in proteomics is the next step after genomics in understanding life processes at the molecular level. In the largest sense proteomics encompasses knowledge of the structure, function and expression of all proteins in the biochemical or biological contexts of all organisms. Since that is an impossible goal to achieve, at least in our lifetimes, it is appropriate to set more realistic, achievable goals for the field. Up to now, primarily for reasons of feasibility, scientists have tended to concentrate on accumulating information about the nature of proteins and their absolute and relative levels of expression in cells (the primary tools for this have been 2D gel electrophoresis and mass spectrometry). Although these data have been useful and will continue to be so, the information inherent in the broader definition of proteomics must also be obtained if the true promise of the growing field is to be realized. Acquiring this knowledge is the challenge for researchers in proteomics and the means to support these endeavors need to be provided. An attempt has been made to present the major issues confronting the field of proteomics and two clear messages come through in this report. The first is that the mandate of proteomics is and should be much broader than is frequently recognized. The second is that proteomics is much more complicated than sequencing genomes. This will require new technologies but it is highly likely that many of these will be developed. Looking back 10 to 20 years from now, the question is: Will we have done the job wisely or wastefully? This report summarizes the presentations made at a symposium at the National Academy of Sciences on February 25, 2002.


Due to the rising interest in proteomics research worldwide, a symposium entitled "Defining the Mandate of Proteomics in the Post-Genomics Era" was held at the National Academy of Sciences on February 25, 2002, in Washington, D.C. Most of the attendees were invited because of their strong interest in proteomics, proteins, or drug discovery. They came from industry, both large and small, academia, and government. Most were from the United States, but an effort was made to invite people from outside the United States. Four of the 10 speakers came from outside of the United States. Six young scientists from around the world received travel fellowships to attend the meeting. The attendees heard about recent advances in the field that will greatly accelerate the process of accumulating and interpreting much of this additional needed data and information.

The planning committee selected speakers (see Table I) and designed the symposium in the hope that one of the outcomes of the meeting would be helping to set the field on as wise a path as possible for the future. After the presentations attendees were involved in individual breakout sessions on a variety of topics, including


 
TABLE I Symposium speakers and affiliations

 
The thoughts and ideas of the speakers and those expressed in the breakout sessions were captured by recorders to assist in the preparation of this report. While other organizations and meetings have addressed many of the issues facing proteomics, we hope that participants and readers of this report will look back on this meeting as the field progresses and find that it was of some help in defining the current efforts and applications, as well as providing direction to the advancing state of the art.


    PROTEOMICS
Now that the DNA sequences of the human genome and genomes of dozens of other organisms are essentially known, the biomedical and biological communities are placing increased emphasis on proteomics, the study of the proteins that are the gene products. Proteomics, a word derived from "protein" and "genomics," needs further definition, as do proteomics initiatives, especially since many in the scientific community are asking for a human proteome project.

Historically one can point back to meetings and articles over 20 years ago, when scientists began to think about mapping the entire set of human proteins (see, for example, B. F. C. Clark, "Towards a Total Human Protein Map" (1)). Indeed, Congress was considering a project called the "Human Protein Index," long before the Human Genome Project had been conceived. The Human Protein Index project was developed in the late 70’s by Norman G. Anderson and N. Leigh Anderson at the Department of Energy’s Argonne National Laboratory (2). Its objective was to enumerate the human proteins (what would now be called the human proteome) by separation on 2-D gels and thus define their genes from the protein end, the only approach possible in those days before large scale DNA sequencing was possible. But this effort was perhaps ahead of its time given the lack of suitable technologies and shifting political sands. Instead, the rise of genomics took center stage. An Australian postdoctoral student, Marc Wilkins, is often credited with coining the term "proteomics" in 1994 (3) at a time when only one proteomics company existed (Large Scale Biology Corporation).

Today many proteomics initiatives are underway in industry and elsewhere, such as the Human Proteomics Initiative (HPI), an effort begun in 2000 by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. The goal of the HPI is to annotate each known protein, providing information that includes the description of protein function, domain structure, subcellular location, post-translational modifications, splice variants, and similarities to other mammalian proteins (4). Another major proteomics effort is led by the Human Proteome Organization (HUPO), a worldwide organization that engages in scientific and educational activities to encourage the spread of proteomics technologies and to disseminate knowledge pertaining to the human proteome and that of model organisms (5).

On which goals should these national and international efforts focus? Should they be limited to human proteomics or like the Human Genome Project, include key model organisms? Perhaps the proteomes of the human pathogens should be included as well (e.g., the malaria parasite and other infectious microorganisms), and if so, in what order of priority? Should development of more efficient instrumentation (e.g., mass spectrometers, X-ray diffractometers, nuclear magnetic resonance spectrometers) and improved computational methodologies (e.g., high-speed computers and software useful in bioinformatics) be emphasized? What should be the role of major federal funding agencies (e.g., the National Institutes of Health, the National Science Foundation, the U.S. Environmental Protection Agency, and the U.S. Department of Agriculture)? What should be the role of academic laboratories? Should projects be supported mostly by individual research grants or program project (group effort) grants? What should be the role of the private sector, particularly those companies large and small that have a major stake in exploiting the results of the various genome projects and proteomics initiatives? How can all of these stakeholders cooperate most effectively while still maintaining proprietary information where appropriate? Should the overall goal be to understand the structure and function of all known proteins or should only those known to be involved in diseases be emphasized? After all, one must first understand function if one is to fully understand dysfunction. Is enough emphasis being given to the functional aspects of proteomics? Are studies on post-translational modifications of proteins and subsequent functional aspects included in "proteomics?" Hence the interest in organizing the one-day symposium reported herein.


    DISCUSSION OF GENERAL TOPICS COVERED AT SYMPOSIUM
Marvin Cassman, former director of the National Institute of General Medical Sciences and now at the University of California, San Francisco and the Institute for Quantitative Biomedical Research, began with a definition of the term "proteomics." He was one of many speakers to express an opinion on the subject, and it was clear that proteomics means many (or at least different) things to different people. Some definitions include "high-throughput" and some do not. Obviously proteomics is not merely protein chemistry. Symposium chair and Dean of the University of Michigan College of Pharmacy, George Kenyon, commented, "Proteomics is not just a mass spectrum of a spot on a gel." Perhaps the most useful definition of proteomics for our purposes is the broadest: Proteomics represents the effort to establish the identities, quantities, structures, and biochemical and cellular functions of all proteins in an organism, organ, or organelle, and how these properties vary in space, time, or physiological state.

Somewhat limited operational definitions of proteomics were offered by some of the speakers. For instance, "In one sense it makes no difference at all. Why should you call something proteomics or call it something else?" Dr. Cassman continued, "What we call things often conditions how we organize our thinking and our efforts." He explained that genome-driven target selection coupled to high-throughput technologies is what he believes structural genomics means. "It means you are using the genomes as the primary source for target selection." However, structural proteomics uses these features "plus the additive feature of full coverage of protein space, that is, completeness," stated Dr. Cassman. The goal of completeness is not intended to suggest, however, that smaller scale experiments, even including high-throughput analysis of specific tissues or subsets of proteins, would not be considered part of proteomics.

Of course there are many "-omics" along with proteomics including genomics, metabolomics, transcriptomics, interactomics and so on, which are collectively involved in the mandate of defining proteomics. However, we will restrain ourselves from commenting on other "-omics." Functional genomics and functional proteomics (which can encompass other ‘omics’ as mentioned) are closely juxtaposed on a continuum along the path of discovering the detailed secrets of life and life processes.

The general topics covered at the symposium included

Dr. Cassman defined proteomics as a set of related options: "the analysis of complete complements of proteins present in defined cell or tissue environments (i.e., context-dependent) and their variation in space and time" (with credit given to Stan Fields for his contributions to this definition). One example of a proteomic effort is the Protein Structure Initiative of the National Institute of General Medical Sciences (NIGMS), which has as a goal the generation of a complete complement of protein structures in nature through the combination of direct structure determination and homology modeling. Although it requires high-throughput technology and genomic data to use for target determination, the goal of "completeness" is what distinguishes the effort as proteomics, according to Dr. Cassman.

The second part of his definition is exemplified by the use of microarrays to identify characteristic markers for cancer progression in specific tissue samples. These studies involve image and pattern recognition tools, which yield large-scale visualization of specific cell-dependent, context-dependent proteomic outputs.

The third part of the definition involves examining proteomic outputs in time and space. This requires not only the application of bioinformatics tools but also computational biology, that is, the use of modeling and simulation. Complex systems analysis could be considered an important element in the larger picture of defining a proteome, and such analysis will require theoretical modeling of systems. Several examples of NIGMS initiatives that focus on mathematical modeling of complex biological systems were provided. One example of this is the protein structure initiative or structural genomics as some may call it, which is discussed later in this report.

While we may be far off in terms of defining a complete human proteome, approaching proteomics on an organellar basis provides goals that are perhaps achievable in our lifetimes. Remember that the first DNA genomes sequenced were those of the bacteriophage, in the 1970s, followed in 1981 with the DNA sequencing of a human mitochondrial genome.

Consider also that the mitochondrion, which is estimated to be composed of about 2,000 proteins, presents a considerably more manageable problem and a microcosm of whole cell proteomics. With this in mind Nobel laureate Sir John Walker, head of the Dunn Medical Research Council Unit in Cambridge, UK, discussed his proteomic studies of mitochondria directed to resolving specific biological issues. Dr. Walker’s work includes the definition of the protein complement assembled in the respiratory enzyme known as complex I, the identification of the biochemical functions of a family of transport proteins found only in mitochondria, and the discovery of phosphorylation-dephosphorylation pathways in mitochondria. These studies rely not only on mass spectrometric and bioinformatics tools but also on biochemistry and genetics. In Dr. Walker’s view such an integrated approach is proving more rewarding than attempts to analyze the global complement of proteins in the organelle, in terms of both understanding the biology of mitochondria and developing new methods. It is also possible to focus on subcompartments of mitochondria, such as the inner mitochondrial membrane of so much interest to bioenergeticists.

In this report we have tried to avoid being constrained by a narrow definition of proteomics (e.g., merely quantitating protein levels) and have used the broad definition given earlier to allow a wide-ranging discussion of goals, techniques, opportunities, and challenges.


    LESSONS LEARNED FROM THE HUMAN GENOME PROJECT
Francis Collins, director of the National Human Genome Research Institute, spoke about lessons learned from the Human Genome Project that might be applicable to the discussion of a public large-scale proteomics initiative (see Table II). He began his presentation by taking issue with the term "post-genomics era." He queried whether this means that from the beginning of the universe until 2001 we were in the "pre-genome era," and then suddenly, "bang," we moved into the post-genome era (leading one to wonder what happened to the genome era). He suggested that it was presumptuous to say that the Human Genome Project is already behind us. He pointed out that proteomics is a subset of genomics, and genomics is more than sequencing genomes, which will be ongoing for decades to come. His comments are especially relevant given that the human genome was still only about 69 percent complete at the time of the meeting.


 
TABLE II Lessons learned from the Human Genome Project: Comments from Francis Collins

 
Dr. Collins concurred with other participants in delivering the sobering message that a large-scale proteomics effort is orders of magnitude more complicated and difficult than the sequencing of the human genome. (As if 100 trillion cells making up an organism and billions of base pairs in genomes are not enough complexity already!) The concept of a complete dataset of all human proteins is therefore very difficult to imagine. There are many challenges as stated below.

Dr. Collins said that the most important area for investment in proteomics right now is technology development so that we can move these methods in the direction of being able to tackle a mammalian proteome without facing enormous costs and problems with quality of the data.

A number of resources for genomics research continue to be generated that may help inform a proteomics effort, including multiple coverage of certain genomes and more specifically:

Dr. Collins referred to one publication: "Global Analysis of Protein Activities Using Proteome Chips" (8). He finished his presentation with a particular recommendation, not from a scientist but from a famous athlete (hockey star Wayne Gretzky). When asked how he came to be so good at playing hockey, and why he always seemed to score the key goals, Gretzky said, "It is very simple. You have got to skate where the puck is going to be." In the field of proteomics Dr. Collins said he was not sure where exactly the puck was going to be, but there were a lot of "Wayne Gretzkys" at the meeting, and Dr. Collins was glad to get a chance to listen to them.


    SOURCES OF PROTEINS
By definition any proteomics effort aims at ‘completeness’ of information. This part of the symposium addressed primarily the comprehensiveness or completeness of any assembled library of proteins and the quality of the materials. It was noted that protein expression in a given cell varies from none to abundant. Historically, for practical reasons, the abundant proteins have been investigated most extensively; however, some of the rarely expressed proteins and proteins that appear only in disease states may be among the more interesting. Joshua LaBaer, Harvard Medical School, noted that the function of all proteins can be studied regardless of in vivo levels once a copy of the gene and adequate expression vectors are available. Ideally it would be desirable to have an available repository or library containing one clone for every spliced variant in the proteome. The size of that library will not be known for some time, but an intermediate realizable objective would be a repository consisting of one clone for every gene. These clones should be "expression ready"; that is, they should contain only the cDNA from the initiation site to the stop codons. It seems likely that we should have "some idea of all the different cDNAs" in the genome in the near future. The expressed proteins could be studied functionally and often identified by mass spectrometry. In general it is fairly easy to produce large quantities of proteins in insect cells or bacteria, but in certain cases it may be necessary to express them in their native cells in order to address such problems as localization or post-translational modifications.

Dr. LaBaer compared the complexities of studying mammalian systems with those in yeast. There are approximately 6,000 genes in yeast compared to a much larger number in humans. Moreover, the genome in yeast is relatively simple; for example, there are only about 220 intron-containing genes in yeast, whereas a much larger fraction of mammalian genes contain introns and alternative splicing substantially increases the number of expressed proteins.
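
The notion of an "expression ready" clone, one that carries only the coding sequence from the initiation site to the stop codon, can be illustrated with a minimal sketch. The function name, the toy sequence, and the rule of taking the first ATG that leads to an in-frame stop are assumptions made here for illustration; they are not a description of how the repository discussed at the symposium was built.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def expression_ready_orf(cdna: str) -> Optional[str]:
    """Return the first ATG-to-stop open reading frame, or None if there is none.

    Simplification (assumption): the first ATG that reaches an in-frame stop
    codon is taken as the initiation site; untranslated flanks are discarded.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop found; try the next ATG downstream.
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical toy cDNA: lowercase flanks stand in for untranslated regions.
example = "ggcaccATGGCTTGGAAATAAgcttt"
print(expression_ready_orf(example))
```

Real full-length clone curation also has to deal with alternative start sites, splice variants, and sequence validation, none of which this sketch attempts.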

To this end Dr. LaBaer described the FLEX Gene repository, which is currently being assembled by a consortium of about 20 different public and private research laboratories. "FLEX" stands for Full Length Expression ready. This repository will enable scientists to move several genes simultaneously from the master vector to any expression vector, which will allow researchers to screen for function by high-throughput experimentation. It is the intention of this consortium to make this collection of all human genes broadly available without restrictions on their use. The four self-defined objectives of the consortium are (1) identification of the genes, (2) assembly of clones, (3) sequence validation, and (4) distribution to the scientific community. One example of the success of this effort resulted in the identification of two new genes that are likely involved in the migration of breast cancer cells through a membrane. The collaboration of public and private research groups raises certain legal issues, which include consideration of antitrust law.

Recombination-based cloning was presented as a high-throughput technology to enable the ready transfer of cDNAs from the supplied vector to one’s own preferred expression vector. Dr. LaBaer described a protein purification scheme that was developed by a graduate student in his laboratory, Pascal Braun. "In the case of human proteins," Dr. LaBaer explained, "where it is not easy to produce these proteins in human cells, [the availability of large numbers of purified proteins] will require the use of heterologous [expression] systems such as bacteria." "To develop these methods," continued Dr. LaBaer, "Braun transferred a collection of 30 cancer genes into four different expression vectors, each one adding a different epitope tag. [Braun] then developed a two-hour automated protocol for purifying 96 proteins in parallel [and] has now purified over 330 different proteins using this approach." Braun and Yanhui Hu of the lab created a database that correlates the success of purification with various features of the proteins such as pI, GO annotation, subcellular localization, and domain structure. Dr. LaBaer said they found that the presence of certain domains such as SH2 domains or SH3 domains can predict success in purification.

Dr. LaBaer concluded with a description of a database derived from a computer program that searches the primary literature for abstracts that mention both a gene and a disease. The assumption is that a significant number of such occurrences may identify groups of genes associated with a given disease. This effort was presented as a task in progress, and interested scientists were invited to experiment with the database (9).
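
The co-occurrence idea described above, counting abstracts that mention both a gene and a disease, can be sketched in a few lines. The gene names, disease name, and abstract texts below are hypothetical placeholders, and plain case-insensitive substring matching is an assumption; the program described at the symposium may have worked quite differently.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(abstracts, genes, diseases):
    """Count, for each (gene, disease) pair, the abstracts mentioning both.

    Matching is naive case-insensitive substring search (an assumption), so
    a name such as "GENE1" would also match inside "GENE10".
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_diseases = [d for d in diseases if d.lower() in lowered]
        # Each abstract contributes at most one count per pair.
        for pair in product(found_genes, found_diseases):
            counts[pair] += 1
    return counts


# Hypothetical toy inputs, not real literature.
toy_abstracts = [
    "GENE1 expression is altered in disease X samples.",
    "We report GENE1 and GENE2 variants in disease X.",
    "GENE2 is studied in an unrelated context.",
]
counts = cooccurrence_counts(toy_abstracts, ["GENE1", "GENE2"], ["disease X"])
for (gene, disease), n in counts.most_common():
    print(gene, disease, n)
```

A pair that co-occurs far more often than its individual mention frequencies would suggest is a candidate association; deciding what counts as "significant" would need a statistical test that this sketch leaves out.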

Brian T. Chait from Rockefeller University described a proteomics approach to understanding cellular function. His group is interested in the mechanisms by which materials enter and exit the nucleus, in the isolation of multiprotein complexes, and in the determination of their cellular localization. The basic concept is to introduce a particular affinity tag to one of the proteins at its natural location in the chromosome, which is done by replacing the endogenous gene by a gene that will code for a protein with a tag on it or, as he termed it, "a piece of molecular Velcro." So long as the multiprotein complex is stable, the tag allows isolation of the associated interacting proteins. An application to the nuclear pore complex, a group of proteins involved in nuclear trafficking, was described extensively. The complex as isolated has a molecular mass of 50 million daltons. Interestingly, in the initial purification experiments it contained about 180 interacting proteins, but upon further fractionation only around 50 were found to comprise the complex. The individual proteins are identified by mass spectrometry, which has the power to provide additional information about phosphorylation sites.

Preliminary experiments describing the use of this approach to follow proteins at different points in the cell cycle and in the regulation of chromatin were mentioned briefly. The genomic tagging and mapping approach can be used to gain analogous information about a number of other systems. Most importantly this approach can show where the protein is localized within the cell, how much is present, when the protein is present and for how long, with what it is interacting, and even something about the topology of the protein complexes.


    PROTEIN SEPARATION
After more than a decade of effort in gene sequencing, the number of human genes is still a matter of disagreement, speculation, and debate. From the point of view of proteomics, even the number of expressed proteins defies prediction based on our current understanding of human cell-type protein composition and its modulation by myriad undefined post-translational modifications. Their actual identification and the annotation of their function remain a challenge. This entire situation is not significantly better for yeast. It is thus not surprising that a key problem in proteomics at a practical level is the simplification of protein mixtures to a state in which their characterization by physicochemical methods is experimentally tractable. There are no documented, reliable, or reproducible strategies for separation of classes of proteins or even individual proteins from the very complex mixtures typically obtained in biological samples such as cell lysates. Clearly, not only does one wish to know which specific proteins are in a given sample but, ideally, one would wish to know whether specific proteins are part of a particular biologically significant compartment, complex, or subcomplex.

Denis Hochstrasser from the University of Geneva, a founder of GeneProt Inc., GeneBio SA, the Swiss Institute of Bioinformatics, and one of the pioneers in the identification of proteins in 2D gels, took the lead in dealing with the topic of protein separation. He stated at the outset that he wanted to play the role of "devil’s advocate": to describe some of the excitement in proteomics but also to describe some of the difficulties. He outlined the scale of potential proteins one can look for in the millimolar (10⁻³), micromolar (10⁻⁶), nanomolar (10⁻⁹), picomolar (10⁻¹²), femtomolar (10⁻¹⁵), attomolar (10⁻¹⁸), zeptomolar (10⁻²¹) and yoctomolar (10⁻²⁴) (which is less than one molecule per liter) ranges. When one considers human blood, for example, Hochstrasser noted, "typically you only see albumin, immunoglobulin, and transferrin," whereas cardiac markers such as troponin are present at nanomolar concentrations, and insulin-like growth factor or insulin are in the picomolar range. Parathyroid hormone is in the low picomolar range and Tumor Necrosis Factor is found in the femtomolar range (see Fig. 1).



 
FIG. 1. Potential plasma proteins observable at various concentration ranges. Millimolar, 10⁻³; micromolar, 10⁻⁶; nanomolar, 10⁻⁹; picomolar, 10⁻¹²; femtomolar, 10⁻¹⁵; attomolar, 10⁻¹⁸; zeptomolar, 10⁻²¹; yoctomolar, 10⁻²⁴.

 
Hochstrasser speculated that there is "a linear logarithmic relationship between the concentration in blood and the number of proteins." He suggested that if there are about 300,000 proteins in the human body or five to six times the number of genes "you probably could find any protein you have in the body, maybe one in the total blood volume, which would be just below Avogadro’s number (1 protein/L of plasma), because we have 6 or 7 liters of blood which makes about 4 liters of plasma, and if you have one in 4 liters, it is about at the yoctomolar (10⁻²⁴ M) level."
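
The arithmetic behind these statements can be checked with a short worked example. It uses only the Avogadro constant and the 4-liter plasma volume quoted above; the function names are illustrative choices, not part of the original presentation.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_per_liter(molar: float) -> float:
    """Number of molecules in one liter at the given molar concentration."""
    return molar * AVOGADRO


def molar_from_molecules(n_molecules: float, liters: float) -> float:
    """Molar concentration of n_molecules dissolved in the given volume."""
    return n_molecules / (AVOGADRO * liters)


# One yoctomolar (1e-24 M) corresponds to fewer than one molecule per liter.
print(f"{molecules_per_liter(1e-24):.2f} molecules per liter at 1 yM")

# A single molecule in about 4 liters of plasma, the figure quoted above.
print(f"{molar_from_molecules(1, 4.0):.2e} M for one molecule in 4 L")
```

One yoctomolar works out to roughly 0.6 molecules per liter, and a single molecule in 4 liters of plasma to about 4 × 10⁻²⁵ M, which is consistent with the "about at the yoctomolar level" description in the quotation.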

For experimental studies a considerable amount of starting material, such as blood, is needed to obtain levels of the various proteins high enough to be detected by today’s methods. Since a 2D gel has a dynamic range of only 10⁴, Hochstrasser stated, "if anyone used [a] 2D gel from crude plasma, you never go below the micromolar range." Hochstrasser noted, for example, that starting with 1 mL of sample leads to roughly a nanomolar limit of detection. He further explained that starting with a much larger volume (e.g., 5–10 liters of plasma) is necessary to achieve detectability in the lower picomolar range. Clearly, prefractionation of proteins, individually or as a subgroup, is essential to reach the dynamic range of detectability required for cell and tissue lysates as well as for plasma.

In subsequent discussion it became clear that even the best large-format 2D gels are inadequate for studies of the global range of expression, perhaps still inadequate by a factor of 10; therefore at least a 10-fold fractionation prior to large-format 2D gel separation would be required. Unfortunately, many membrane proteins do not enter 2D gels effectively. This presents a formidable challenge for the field.

In his presentation, Julio Celis from the Institute of Cancer Biology and the Danish Center for Human Genome Research in Aarhus, Denmark, also spoke about methods and challenges in the area of protein separation. He stated that "for the study of tissue biopsies the use of high-resolution 2D electrophoresis is the method of choice [for separations] as non-gel high-throughput technologies based on chromatography-mass spectrometry are not yet ready for the study of tissue samples." He stated that 2D gel technology in combination with mass spectrometry can be used to establish comprehensive databases of protein information that can be useful in the clinical setting. He also made the important point that data in a given cell type can be valuable to the study of other cell types since 80–90 percent of the proteins are believed to be shared by all cell types. While many structural and metabolic gene products may be the same between all cells, as one reviewer pointed out, cell-specific proteins will be important for understanding function and disease.

An afternoon breakout session, devoted to the topic of "protein separation and identification," was led by Julio Celis; Alain Van Dorsselaer, Louis Pasteur University, CNRS; and A. L. Burlingame of the University of California, San Francisco. Most of the 16 discussants were experts in mass spectrometry. The discussants concluded that the issue of sample preparation and purification has been sadly neglected at most meetings dealing with proteomics. There was the impression among some of the discussants that protein biochemists were developing and using methods to purify proteins that were not being adequately defined compositionally by mass spectrometrists interested in proteins. They envisioned setting up "core centers of excellence" in proteomics where innovation, mobility of people and ideas, and training can all occur. These core centers might also lead to spin-offs for the development of new instrumentation. Resources required to support a broad proteomic effort could be in the form of sample collections, standardization of data across platforms, and ligands that allow assaying of individual proteins, to name just a few. These centers would complement the work of scientists in individual, relatively small laboratories where more open-ended, curiosity-driven research can occur. Even when better strategies for protein mixture fractionation are in hand, new developments in mass spectrometry will be needed to extend the dynamic range of detectability of protein samples, especially for proteins that are post-translationally modified.


    PROTEIN IDENTIFICATION
 
Until we can identify each expressed protein in a cell or target tissue we cannot fully define the proteome. Current best practices have practical lower limits of protein detection in the nanomolar or picomolar range, which is 10 to 15 orders of magnitude less sensitive than what is needed for complete proteome definition, which, it was generally agreed, would require almost zeptomole or yoctomole sensitivity. The need for developing better and more sensitive methods of detection is pressing, and there are many opportunities to make significant technical advances. Insolubility is an important issue: membrane-associated proteins (which may comprise as much as 30 percent of the proteome) remain largely inaccessible experimentally. The lack of sensitivity in protein detection and identification remains among our greatest limitations. Each order of magnitude increase in sensitivity brings important new insights into proteome composition and behavior.

Denis Hochstrasser brought into sharp focus the disparity between the sensitivity of current protein-detection methods and the proteomics community’s expectations regarding the sensitivity required to identify the complete proteome of a target cell or tissue. Cell- and receptor-based assay systems can detect peptides in the femtomolar range. These methods define the lower limit of our detection ability but, unfortunately, are applicable only to small sets of proteins. 2D gel electrophoresis, still an experimental mainstay in the proteomics community, can detect protein concentrations as low as micromolar, a sensitivity sufficient to identify ~100 plasma proteins, not including modified forms. Under ideal conditions, including the sieving out of abundant proteins, mass spectrometry can extend sensitivity three orders of magnitude to the nanomolar level. Mass-spectral proteome screening is being carried out on an industrial scale by GeneProt Inc., one of the world’s first large-scale proteomic R&D centers. The facility houses 40 tandem mass (MSMS) spectrometers, each serving two high-performance liquid chromatography (HPLC) machines. With each spectrometer running two samples per hour, the facility is capable of performing 1,920 MSMS-characterized HPLC profiles per day, remarkably with very little human intervention. The sensitivity can be extended to the picomolar range by preparing single mass-spectrometry samples from 10–15 liters of the target.
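The daily throughput figure follows directly from the numbers given for the facility. The short sketch below reproduces the arithmetic; the only added assumption is that the instruments run around the clock, which is what the quoted total implies.

```python
SPECTROMETERS = 40                # tandem mass spectrometers in the facility
HPLC_PER_SPECTROMETER = 2         # HPLC machines served by each spectrometer
SAMPLES_PER_HOUR = 2              # samples run per spectrometer per hour
HOURS_PER_DAY = 24                # assumption: continuous, 24-hour operation

hplc_units = SPECTROMETERS * HPLC_PER_SPECTROMETER
profiles_per_day = SPECTROMETERS * SAMPLES_PER_HOUR * HOURS_PER_DAY

print(hplc_units)        # 80
print(profiles_per_day)  # 1920
```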

Using a strategy that circumvents the need to detect a protein or to know its molecular function, Dr. Hochstrasser and colleagues are synthesizing proteins as large as 25 kDa in sufficient quantity and purity to immediately search for the effects of overexpression after injection in living systems. This allows them to move quickly from interesting protein candidates identified using informatics screens to cellular or organism physiological response.

Dr. Hochstrasser underscored the well-known fact that mass-spectral identification is in general far more successful for peptides than proteins. He outlined a technological innovation that integrates the protein-separating resolution of 2D gels with the sensitivity of peptide-mass spectrometry. The method sandwiches a protease-impregnated membrane between a delivery membrane (that carries protein previously transferred from a 2D gel) and a capture membrane. The proteins are cleaved into peptides during electrophoretic transfer to the capture membrane, and are then desorbed from small registered sections of the membrane, using a laser, and finally delivered directly into the mass spectrometer. Dr. Hochstrasser called this new technology "The Molecular Scanner" (see Fig. 2).



FIG. 2. The molecular scanner integrates the protein-separating resolution of 2D gels with the sensitivity of peptide-mass spectrometry. The proteins are cleaved into peptides and then desorbed using a laser and finally delivered directly into the mass spectrometer.

 
In the "Emerging Technologies" breakout session, co-chaired by Ruth Van Bogelen, Pfizer Global Research, and Norman G. Anderson, Large Scale Biology Corporation, the need for specific new technologies was discussed at length by experts in the field. Dr. Van Bogelen commented that "detection and realizing that the dynamic range of proteins in cells is probably over six, seven, maybe even twelve orders of magnitude where we really want to be able to detect proteins that are present, at even less than one molecule per cell. We think those [proteins] are important, and we don’t have the capabilities to do that."

The consensus of the group was that many areas are in need of development if the goal of defining the composition and behavior of proteomes is to become a reality. The perceived needs ranged from technologies that can determine the organization of the cellular matrix and the functions of the proteins in it, to single-cell proteomics, and the comprehensive analysis of post-translational modifications. Important practical issues were raised, like the need to standardize data across different technology platforms and how to organize the enormous volume of information being created daily. "The bottom line is, there is just a lot of work to be done," said Dr. Van Bogelen. "We need money invested into developing technologies. And we really need to have students in this area who are moving this field into the next generation."


    DATA COLLECTION
 
"An essential element of proteomics is the intent to collect data on proteins systematically and, where applicable, quantitatively," said Ruedi Aebersold, co-founder of the Institute for Systems Biology in Seattle, Washington. "Systematic data collection," said Dr. Aebersold, "means that the measurements are made on all the proteins present in a sample, eventually all the proteins that constitute a proteome." It is expected that proteomic data will be useful for classification of cells and tissues in health and disease and, more ambitiously, for achieving a detailed understanding of biological mechanisms. Dr. Aebersold discussed the development of an automated quantitative approach by his laboratory to help achieve their goals.

The technologies to perform various types of proteomic measurement are not mature and thus are limited in capacity (see Fig. 3). Dr. Aebersold’s group has developed a general approach to quantitative proteomics based on automated tandem mass spectrometry, stable isotope dilution theory, and a suite of bioinformatics tools for data analysis (10). Dr. Aebersold described the approach as follows: "Stable isotope signatures are introduced into proteins at specific sites by means of chemical reactions. Later these signatures are deconvoluted by a mass spectrometer and serve as the basis for accurate quantification of each labeled protein. The objective of the initial implementation of this technology has been quantitative protein profiling. The method is based on a class of reagents called ‘isotope coded affinity tags’ (ICAT reagents) (see Fig. 4) and the method is schematically illustrated (see Fig. 5). By changing the specificity of the reagent, the approach becomes generic for different quantitative proteomic measurements. Work is underway to extend this approach to determine profiles of enzyme activities (an area pioneered by Ben Cravatt from the Scripps Research Institute), and to protein linkage analysis, and protein phosphorylation profiles."
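The quantification step described above, in which paired isotope signatures are compared in the mass spectrometer, can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions and is not the software used by Dr. Aebersold’s group: the intensity values are hypothetical, and taking the median of the peptide ratios as the protein-level estimate is an illustrative choice.

```python
from statistics import median
from typing import Iterable, Tuple


def peptide_ratio(light: float, heavy: float) -> float:
    """Relative abundance (heavy / light) for one labeled peptide pair."""
    if light <= 0 or heavy < 0:
        raise ValueError("intensities must be positive (light) and non-negative (heavy)")
    return heavy / light


def protein_ratio(pairs: Iterable[Tuple[float, float]]) -> float:
    """Protein-level ratio taken as the median of its peptide ratios.

    The median is an assumption made for this sketch; other summaries
    (for example, intensity-weighted means) are equally possible.
    """
    return median(peptide_ratio(light, heavy) for light, heavy in pairs)


# Hypothetical (light, heavy) peak intensities for three peptides of one protein.
pairs = [(1000.0, 2100.0), (500.0, 950.0), (800.0, 1600.0)]
ratio = protein_ratio(pairs)
print(ratio)  # 2.0
```

Because both forms of a labeled peptide are chemically equivalent, the ratio of their signals is taken to reflect the ratio of the protein in the two samples being compared.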



FIG. 3. Technologies for (quantitative) global analysis. The technologies to perform different proteomic measurements have reached various degrees of maturity and none of them is a fully mature technology. It is unlikely that a single experimental platform will be able to collect all types of proteomic data.

 


FIG. 4. Isotope-coded affinity tags (ICAT). A class of reagents called "isotope coded affinity tags" (ICAT reagents) are used to perform quantitative proteomic analysis based upon automated tandem mass spectrometry, stable isotope dilution theory, and a suite of bioinformatic tools for data analysis. Stable isotope signatures are introduced into proteins at specific sites via chemical reactions. These signatures are later deconvoluted by a mass spectrometer and serve as the basis for accurate quantification of each labeled protein. The objective of the initial implementation of this technology has been quantitative protein profiling. The method is schematically illustrated in Fig. 5.

 


FIG. 5. The basic ICAT approach. This figure schematically represents the ICAT approach to quantitative proteomics. By changing the specificity of the reagent, the approach becomes generic for different quantitative proteomic measurements.

 
Dr. Aebersold’s group has also developed a suite of software tools that use statistical methods to identify and eliminate poor quality spectra often observed during validation of automated liquid chromatography-MS/MS experiments. The software assigns a numerical value to each database search result, which indicates the probability that the search result is correct. Dr. Aebersold believes that such tools save considerable amounts of time and that they are essential for the adoption of community-accepted standards for protein identification by mass spectrometry.
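The idea of attaching a probability to each database search result can be sketched as follows. This is an illustrative example only: the class name, the threshold, and the probability values are hypothetical, and the actual statistical model used by the group is not described in this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchResult:
    spectrum_id: str      # hypothetical identifier for an MS/MS spectrum
    probability: float    # estimated probability that the assignment is correct


def accept(results: List[SearchResult], min_probability: float) -> List[SearchResult]:
    """Keep only results whose probability meets the threshold."""
    return [r for r in results if r.probability >= min_probability]


def expected_false_positives(accepted: List[SearchResult]) -> float:
    """Expected number of wrong assignments among the accepted results."""
    return sum(1.0 - r.probability for r in accepted)


# Hypothetical search results.
results = [
    SearchResult("spectrum_1", 0.99),
    SearchResult("spectrum_2", 0.95),
    SearchResult("spectrum_3", 0.40),
    SearchResult("spectrum_4", 0.10),
]

kept = accept(results, min_probability=0.90)
print(len(kept))                       # 2
print(expected_false_positives(kept))  # about 0.06
```

A useful property of probability scores, as opposed to raw search scores, is that they can be summed in this way to estimate the error rate of an accepted list, which is one reason they lend themselves to community-wide standards.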

In collaboration with Sciex (a manufacturer of mass spectrometers) Dr. Aebersold’s group has developed a mass spectrometry system that he refers to as "smart data acquisition." The system is based on a matrix-assisted laser desorption ionization (MALDI) quadrupole time-of-flight mass spectrometer (QSTAR, Sciex) and is illustrated in Fig. 6. This system allows one to quantify all the detected peptides first and then to selectively sequence only those that show an interesting quantitative change (11). "By focusing the sequencing efforts on those peptides that show a change in quantity, the analysis is focused on those peptides that are relevant to the question asked, and the number of required sequencing operations is reduced by approximately an order of magnitude," stated Dr. Aebersold.
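The "quantify first, sequence selectively" logic can be sketched in a few lines. The example below is a toy illustration, not the instrument control software: the two-fold cutoff, the peptide labels, and the abundance ratios are all hypothetical, and the data were chosen so that one peptide in ten changes.

```python
import math
from typing import Dict, List


def select_for_sequencing(ratios: Dict[str, float], min_fold_change: float = 2.0) -> List[str]:
    """Return peptides whose abundance ratio changes by at least min_fold_change
    in either direction (up or down). Ratios must be positive."""
    threshold = math.log2(min_fold_change)
    return [pid for pid, r in ratios.items() if abs(math.log2(r)) >= threshold]


# Hypothetical abundance ratios: 18 unchanged peptides and 2 changed ones.
ratios = {f"peptide_{i}": 1.0 for i in range(18)}
ratios.update({"peptide_up": 3.0, "peptide_down": 0.25})

selected = select_for_sequencing(ratios)
reduction = len(ratios) / len(selected)

print(selected)   # ['peptide_up', 'peptide_down']
print(reduction)  # 10.0
```

Comparing ratios on a log scale treats a three-fold increase and a four-fold decrease symmetrically, so both directions of change are passed on for sequencing.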



FIG. 6. Selective identification of differentially expressed proteins. In collaboration with Sciex (a manufacturer of mass spectrometers), Ruedi Aebersold’s group has developed a mass spectrometry system that allows one to quantify all the detected peptides first and then to selectively sequence only those that show an interesting quantitative behavior. The system is based on a matrix-assisted laser desorption ionization (MALDI) quadrupole time-of-flight mass spectrometer (QSTAR, Sciex) and is schematically illustrated in this figure. By focusing the sequencing efforts on those peptides that do show a change in quantity, the analysis is focused on those peptides that are relevant to the question asked and the number of required sequencing operations is reduced by approximately an order of magnitude (11).

 
The systematic and quantitative analysis of the properties that define protein activity and function within a defined context (i.e., proteomics) is essential for biology and medicine. "It appears unlikely that a single experimental platform will soon emerge that can collect all the different types of relevant data," stated Dr. Aebersold, "however, improved bioinformatics tools and smart data collection (either by themselves or in combination) have the potential to significantly increase the sample throughput in proteomics."


    PROTEOMICS AND THE PROBLEM OF FUNCTION
 
Proteomics represents an exploration of a great unknown. Some 40 percent or more of the sequences in the genomic databases represent open reading frames that code for proteins for which there is no assigned function or for which the annotated function is incomplete or incorrect. This presents an enormous challenge to the biological community.

The function of a protein can be defined in many different ways depending on the experiments being done and the questions being asked. It may be useful to preface the word "function" with an adjective that specifies the nature of the effect that the protein produces. "Chemical function" refers to the general type of reaction catalyzed in the case of an enzyme; for non-enzymes this term is not applicable. "Biochemical function" refers to the specific substrates used, the products produced and the mechanism of the transformation between them in the case of an enzyme; the specific molecules bound, and the response produced in the case of a receptor, scaffold, regulatory protein or channel, and so forth. "Cellular function" refers to the pathway(s) in which the protein operates. These pathways are created by the combined biochemical functions of the proteins involved. Note that there exists a hierarchy of functions in which a complete understanding of function at each level depends on the information from the previous levels. The hierarchy continues with functions defined at the level of a phenotype of, say, a knockout: this may be manifest in the effect on a single cell or on an organelle or entire organism. Finally, it is possible to define function at the level of the effect of the loss or mutation of that protein on the development of a higher organism from embryo through to adult.

Function is not a fixed property for many if not most proteins. There are many ways that gene products can be altered to elicit modified or completely new functions. For example, there exist

These modifications can modulate biochemical function either directly or indirectly by altering the pathway in which a gene product operates. Cellular function can be changed similarly. Cellular function, and in some cases even biochemical function can also be changed simply by changing the location where the protein is found in the cell or by binding it to another protein or small molecule. A proteomics study that aims at understanding function is incomplete without taking these aspects into account.

One breakout session addressed "Metabolic Pathways and Post-Translational Modifications," which defined "function" to the group participants, as reported by Edward Dennis, University of California, San Diego, and Eugene Bruce, National Science Foundation. It was noted that most speakers had emphasized inventorying, categorizing, high-throughput screening, methods, and qualities such as completeness in defining proteomics. While these were the goals of the genomics revolution, they should not be the goals of proteomics, stated some of the participants. In contrast with genomics, which is finite in scope, proteomics, especially when function is included, is essentially without limit. "Whatever is done, completeness will be very difficult, if not impossible, to achieve from the viewpoint of function in proteomics," remarked session co-chair Edward Dennis. "Proteomics is many orders of magnitude more complex than genomics. It has been suggested that there are about 300,000 human proteins, thinking only about splice variants and post-translational modifications."

The list goes on and on when trying to get one’s hands around the number of discrete proteins that exist. Thus, instead of trying to count proteins some researchers suggested focusing on the life cycle of a protein. During its lifetime a protein undergoes phases of translation, maturation, regulation, and termination. Each of these phases involves numerous discrete proteins that interact with each other, and each phase involves protein modifications; so a given protein exists in an enormous number of discrete compositions and complexes, as stated previously in the report. There also exist many states imposed by protein:protein interactions, which change the nature of a protein, by ligands including a variety of metal ions, by activators, inhibitors, inducers, and this list goes on and on. There are even non-enzymatic modifications like oxidation that occur; so a given protein may exist in the cell in different redox states. Thus, this is really a combinatorial problem, explained Dr. Dennis, with both transient, and one might call them somewhat permanent, changes that occur to the protein as it undergoes its life cycle. Structural changes in the protein conformation can also lead to the development of a disease state. Prion diseases are clear examples where modifications in the structural conformation of a benign protein can lead to changes in normal function. An understanding of all the structural and functional states for even a single gene product is a huge, complex task, but one that must be considered when annotating proteins.
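The combinatorial character of the problem can be made concrete with a simple count. The sketch below assumes, purely for illustration, that each modifiable site on a protein varies independently of the others, which gives an upper bound on the number of distinct states; the site counts are hypothetical.

```python
from math import prod
from typing import Sequence


def count_states(options_per_site: Sequence[int]) -> int:
    """Number of distinct states when each site independently takes one of
    options_per_site[i] forms (for example, 2 = modified or unmodified)."""
    return prod(options_per_site)


# Ten sites that are each either modified or unmodified.
ten_binary_sites = count_states([2] * 10)

# Two binary sites plus one site with three possible redox or ligand states.
mixed_sites = count_states([2, 2, 3])

print(ten_binary_sites)  # 1024
print(mixed_sites)       # 12
```

Even a modest number of sites yields thousands of possible states for a single gene product, before protein:protein interactions and changes in location are considered.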

A number of conclusions arise from these considerations. The first is that a complete description of the function of any gene product must include aspects of both spatial and temporal changes in the protein, including changes of state. Gerald Carlson from the University of Missouri, Kansas City, suggested that we are most interested in the steady-state proteins that exist at some point in metabolism, but we are also interested in looking at all other states on the way to and after steady state. To begin to handle proteomics conceptually one must integrate the experimental results with the enormous amount of data on the computational side, and it is a huge undertaking to even begin to figure out how to relate all those states that are so important.

An interesting example of a chemical function genomics program was given by Thomas Leyh, co-chair of the "Structure Function" breakout session, who outlined an initiative intended to provide a functional genomics counterpart to the structural initiative already under way. The core of this multifaceted program, the subject of a recent National Institutes of Health workshop, is to perform large-scale mutagenesis and protein functional studies to create a database that assigns catalytic, ligand-binding, or other functions to the highly conserved, non-structural core residues for every protein family. A compendium of molecular function annotation will be propagated across relevant databases to establish links and assign molecular function to specific biological phenomena. While the design of the program tightly couples it to the structural genomics initiative (whose mission is to provide a representative structure for each protein family) it also includes interfaces with programs dedicated to the identification of protein function and the development of bioinformatic recognition algorithms, among others. Such a compendium would be extremely valuable to the biochemical and other scientific communities, and the program would establish the classical structure-function equation on a genomic scale.

Proteomics is far more complex than a simple profiling of the protein content of a cell, even with potential modifications of the proteins and protein:protein interactions included. Profiling of gene expression or protein expression is a useful tool but in most instances gives little direct information about biochemical function, although sometimes cellular functions do emerge. Among other problems with these approaches the correlation between mRNA levels and protein levels is poor for all but the most highly expressed genes. The view of function presented here makes this complexity apparent. A final point was that the field needs more emphasis on what a protein does, not just which proteins exist under what conditions.

Bioinformatics—
There are two ways that function is being determined at this time on a genome-wide scale. One is essentially bioinformatics driven and the other uses structural information. Bioinformatics involves, among other things, sequence comparisons and structure comparisons. These can be carried out on a genome-wide scale, as are comparisons of profiles of gene expression. Proteomics, as it is currently implemented in most instances, is geared towards comparisons of datasets of profiles of protein expression, usually determined by mass spectrometry.

Sequence comparison can be powerful, especially if families of related sequences are identified. However, it is becoming apparent that not only can function diverge markedly when two sequences differ by 50 percent or more, but in some instances sequences that are more than 90 percent identical code for proteins that operate on completely different substrates and have no cross-reactivity. Assignment of biochemical function from sequence data alone should always be regarded as tentative without confirmatory experimental evidence. Most functional annotation errors in genomics databases probably arise this way.
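For readers unfamiliar with how a figure such as "90 percent identical" is obtained, the following sketch computes percent identity for two sequences that have already been aligned. It is one of several definitions in use: this version ignores alignment columns containing a gap, and the two short sequences are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gap columns under this definition
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two hypothetical ten-residue sequences that differ at a single position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQL")
print(identity)  # 90.0
```

A single substitution of this kind, if it falls in a binding or catalytic site, may be enough to change substrate specificity, which is why identity alone is a tentative basis for assigning function.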

Structural Proteomics—
Among the possible experimental ways of approaching the problem of function determination on a large scale, the one that has received the most emphasis thus far is the use of structural information. Predicated on the assumption that the three-dimensional structure of a protein will often provide information about its biochemical and cellular functions, the structural approach is being applied on a genome-wide scale in a number of independent initiatives. Although in many instances at least the chemical function of an enzyme can be guessed from its overall fold, even that deduction is often problematic, and assignment of higher levels of function is practically impossible without additional information. This problem is exacerbated when membrane-associated proteins are considered. Between 25 and 40 percent of the proteins in the cell are estimated to be membrane associated (depending on the organism). The database of membrane protein structures is very small and the methods for determining those structures are very difficult and uncertain.

Cheryl Arrowsmith, a structural biologist from the Ontario Center for Structural Proteomics at the University of Toronto, discussed her group’s research on structural proteomics. She distinguished structural proteomics from structural genomics on the grounds that her group works on proteins, not genes. The focus of her proteomics research is to use X-ray crystallography and NMR spectroscopy to determine the three-dimensional structures of proteins on a genome-wide scale. She is particularly interested in examining the extent to which protein structure can reveal protein function. The model system used is Methanobacterium thermoautotrophicum, whose sequence was completed at the time the project was initiated in 1998. Since that time, her laboratory has evaluated thousands of proteins by subcloning into bacterial expression systems and performing either NMR studies or X-ray diffraction on soluble and relatively clean purified protein. They have also evaluated hundreds of proteins from a number of different bacterial, viral, and yeast genomes. However, the number of proteins that yielded samples suitable for structure determination was low. "There is a huge attrition rate in going from cloned genes to those that can be readily expressed in bacteria, are soluble in bacteria, can be purified, give good crystals or promising NMR spectra, and these would be very good in terms of getting a structure." The overall attrition rate is about 85–95 percent of the genes that are tried; in other words, approximately 5–15 percent of bacterial or archaebacterial genes can be processed straight through to three-dimensional structures using a single protocol (e.g., single expression conditions, single purification procedure), according to Dr. Arrowsmith (12). The numbers are worse for eukaryotic systems. "Clearly one needs to try multiple procedures for protein expression, purification, and crystallization in order to improve the success rate for structures," said Dr. Arrowsmith.
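The attrition figures above can be turned into a rough planning estimate. The sketch below is illustrative only: the batch of 1,000 targets is hypothetical, and the second function assumes, for the sake of argument, that the four steps named in the quotation (expression, solubility, purification, and good crystals or spectra) each have the same yield, which the report does not state.

```python
def expected_structures(n_targets: int, success_rate: float) -> float:
    """Expected number of structures from n_targets at a given overall success rate."""
    return n_targets * success_rate


def per_stage_yield(overall_rate: float, n_stages: int) -> float:
    """Per-stage yield that would give overall_rate if all stages were equally lossy."""
    return overall_rate ** (1.0 / n_stages)


# Hypothetical batch of 1,000 cloned genes at the quoted 5-15 percent success rate.
low = expected_structures(1000, 0.05)    # about 50 structures
high = expected_structures(1000, 0.15)   # about 150 structures

# Equal-yield assumption across four stages: roughly 47 to 62 percent per stage.
stage_low = per_stage_yield(0.05, 4)
stage_high = per_stage_yield(0.15, 4)

print(low, high, stage_low, stage_high)
```

The point of the second calculation is that even moderately lossy individual steps compound into a large overall attrition rate, which is consistent with the suggestion that multiple procedures be tried at each step.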

She has confirmed these difficulties in a number of other species and systems, and she reported that many of the other National Institutes of Health centers participating in the project are seeing these sorts of statistics as well. Only in a few cases have they had the opportunity and actually gone on to do functional studies of these proteins. Even with proteins of known function, such as spermidine synthase, the determination of structure can be useful in proposing an atomic model and thus a better understanding of the mechanism of enzymatic function. Dr. Arrowsmith’s group was among the first to solve the structure of this protein. There are thousands of clones and proteins that have been prepared in the Ontario Center for Structural Proteomics and in many of the other centers; and these clones are available for further functional analysis. "I think this is a huge resource that is being generated, and it should be exploited through projects that emphasize [biochemical] functional analysis of proteins," said Dr. Arrowsmith.

Cellular Function—
Protein location can be determined by such genome-wide techniques as green fluorescent protein (GFP) tagging, and protein:protein interactions can be determined by affinity chromatography, immunoprecipitation, and yeast two-hybrid experiments. Databases resulting from these methods are beginning to emerge, but they are of uncertain accuracy. Recent comparisons of independently obtained databases for yeast proteins suggest that location determination is fairly robust but protein:protein interactions are at best determined with less than 50 percent overall accuracy. Clearly more reliable methods are needed, and efforts to create protein chips for profiling of interactions with proteins and small molecules appear promising.

One useful addition to the available arsenal of function-finding tools would be a database of three-dimensional motifs of biochemical function. Such a database would contain those structural elements that participate in ligand binding and catalysis for proteins of known function. This database could be searched in a manner similar to sequence database searches whenever a new protein structure is determined. Another useful tool would be, for each protein family, a database of mutations with functional characterization. Essentially this database would provide a link between a mutation at a particular site, a genetic lesion, a metabolic lesion and even a phenotype such as a disease.

Once again it was stressed that proteomics should be considered as a much broader field than would be apparent from early efforts, which have focused on cataloging levels of protein expression. Ideally it should encompass efforts to obtain complete functional descriptions for the gene products in a cell or organism. Because of the complexity of functional description, clearly more than one technique is required and no one existing technique should be emphasized in preference to any others. This goal may be beyond the reach of existing technologies, even for small numbers of proteins, but it is the direction in which the field must go.
