Table I

Interaction databases

The availability of high quality curated information on complexes and interactions characterized in different organisms is important not only for understanding biology but also for aiding the discovery process. Several national and international efforts are devoted to producing this information as well as to developing standards that facilitate its exchange between different databases. The contents of the major databases are summarized in the table. To keep up with the flood of publications dealing with the subject, database curators seek help from automatic text mining algorithms, which are rapidly gaining in accuracy (94). Nevertheless, the quality of literature-curated data can be an issue because low throughput studies, sometimes based on a single experiment, can be as error-prone as, or more error-prone than, the more advanced high throughput techniques. In general, databases do not produce confidence scores for the interactions they curate, and extraction of high quality interactions from the databases remains primarily the user's responsibility. Most of the listed databases store experimentally derived protein-protein interactions obtained through literature curation. The only exception so far is the STRING database (95), which stores three types of interactions: 1) experimentally derived protein-protein interactions imported from the other databases and derived from text mining of PubMed abstracts, 2) interactions computed from genomic features, and 3) interactions transferred from model organisms based on orthology. All the listed databases support the proteomics standards initiative-molecular interaction (PSI-MI) standards (see below). IntAct conforms most closely to the PSI-MI standards (96). Results deposited from high throughput TAP and yeast two-hybrid studies include lists of the identified interactions as well as information on the role of each interactor (bait or prey). BIND (97) and DIP (88) allow retrieval of TAP-MS complexes that contain a query protein.
The protein complexes in STRING, like those in SGD (98), are catalogued according to GO (89) annotations and thus do not necessarily correspond to physical complexes. BIND and BioGRID (26) also store genetic interactions (99) (not considered in the table). Raw data (TAP purifications and peptide identification confidence scores) from high throughput studies are not available for search or download in the databases. Model organism databases such as SGD (98), the Mouse Genome Database (100), WormBase (101), and FlyBase (102) usually do not independently archive protein-protein interactions. They either collaborate with major interaction databases by coordinating curation efforts (e.g. SGD and BioGRID) or provide links to them (e.g. FlyBase and BioGRID). In addition to these major interaction databases and model organism databases, the Human Protein Reference Database (103) archives 38,176 curated interactions in human, and MPACT (104) has 15,456 yeast interactions and hosts a catalogue of yeast protein complexes. BioGRID contains 38,609, 499, 22,524, 4,557, and 38,605 interactions in human, mouse, D. melanogaster, C. elegans, and S. cerevisiae, respectively. These figures were compiled in February 2008.

PSI (105) is a community wide standard for data representation in proteomics that facilitates data comparison, exchange, and verification. PSI-MI specifies the format for exchange of molecular interactions using a controlled vocabulary. MIMIx (minimum information required for reporting a molecular interaction experiment) (106) is a subset of the PSI-MI standard. It stipulates that a deposition must include the key information needed to define unambiguously the origin of the data and the method used to generate them and to cross-reference the partners of each deposited protein-protein link uniquely to other biological databases.
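
The MIMIx requirement described above can be illustrated with a minimal sketch of a completeness check on a single deposited interaction. The record layout and the field names (`publication_id`, `detection_method`, `interactor_a_xref`, `interactor_b_xref`) are hypothetical and chosen for illustration only; they are not the element names of the PSI-MI schema, and the actual MIMIx checklist contains more items than the three categories named in this legend.

```python
# Minimal sketch of a MIMIx-style completeness check.
# ASSUMPTION: the field names below are hypothetical placeholders,
# not the element names defined by the PSI-MI standard.
REQUIRED_FIELDS = {
    "publication_id": "origin of the data",
    "detection_method": "method used to generate the data",
    "interactor_a_xref": "unique database reference for the first partner",
    "interactor_b_xref": "unique database reference for the second partner",
}


def missing_mimix_fields(record):
    """Return the required field names that are absent or empty in `record`."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


# Placeholder values only; no real identifiers are implied.
complete_record = {
    "publication_id": "example-publication-id",
    "detection_method": "example-method-term",
    "interactor_a_xref": "example-accession-A",
    "interactor_b_xref": "example-accession-B",
}
incomplete_record = {
    "publication_id": "example-publication-id",
    "detection_method": "  ",  # whitespace only, treated as missing
}

for label, record in [("complete", complete_record), ("incomplete", incomplete_record)]:
    missing = missing_mimix_fields(record)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{label}: {status}")
```

A check of this kind only tests for the presence of the information; it does not verify that an identifier resolves in the referenced database or that a method term belongs to the controlled vocabulary.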

  • a Excluding genetic interactions.

  • b Includes 232 viruses and 56 phages of various organisms.

  • c Molecular Interactions database.

  • d MIPS mammalian protein-protein interaction database.