Submitted on April 17, 2006
Revised on June 1, 2006
Accepted on July 4, 2006
Analysis of high-throughput protein expression in Escherichia coli
Yair Benita, Michael J. Wise, Martin C. Lok, Ian Humphery-Smith, and Ronald S. Oosting
Psychopharmacology, Utrecht University, Utrecht 3584CA
Corresponding Author: ybenita{at}mac.com
The ability to efficiently produce hundreds of proteins in parallel is the most basic requirement of many aspects of proteomics. Overcoming the technical and financial barriers associated with high-throughput protein production is essential for the development of an experimental platform to query and browse the protein content of a cell (e.g. protein and antibody arrays). Proteins inherently differ from one another in their physicochemical properties; therefore, no single protocol can be expected to successfully express most proteins. Instead of optimizing a protocol to express a specific protein, we employed sequence analysis tools to estimate the probability of a specific protein being expressed successfully using a given protocol, thereby excluding a priori proteins with a low probability of success. A set of 547 proteins, to be used for antibody production and selection, was expressed in Escherichia coli using a high-throughput protein production pipeline. Protein properties derived from sequence data alone were correlated with successful expression, and general guidelines are given to increase the efficiency of similar pipelines. A second set of 68 proteins was expressed to investigate the link between successful protein expression and inclusion body formation. While more proteins were expressed in inclusion bodies, the formation of inclusion bodies was not a requirement for successful expression.