A Facile Method for High-throughput Co-expression of Protein Pairs

We developed a method to co-express protein pairs from collections of otherwise identical Escherichia coli plasmids expressing different ORFs by incorporating a 61-nucleotide sequence (LINK) into the plasmid to allow generation of tandem plasmids. Tandem plasmids are formed in a ligation-independent manner, propagate efficiently, and produce protein pairs in high quantities. This greatly facilitates co-expression for structural genomics projects that produce thousands of clones bearing identical origins and antibiotic markers.

Molecular & Cellular Proteomics 3:934-938, 2004.
Co-expression of proteins is an important objective for biochemical and structural analysis of protein complexes because it often improves the authenticity of their biological activity and the solubility of the protein partners (1,2). Although a variety of versatile systems are available for constructing large collections of expression plasmids, no convenient system exists for co-expressing protein pairs from such collections. Because plasmids sharing an identical replication origin and antibiotic resistance marker cannot be stably propagated together in the same cell, one or both of the ORFs must be moved to a new plasmid to achieve co-expression. To this end, a number of methods have been developed, including the use of two plasmids with different selectable markers and compatible origins of replication (3,4), a single plasmid containing two ORFs under control of separate promoters (5), or a single plasmid containing ORFs arranged in a polycistronic message (6). All of these methods are inconvenient in a high-throughput setting because they require ad hoc construction of the co-expression plasmid and often sequencing of the ORFs in the new constructs.
We have developed a facile and general method for protein co-expression in Escherichia coli that can use sets of ORFs in identical expression plasmids, with the simple requirement that the starting plasmid contain a 61-nucleotide sequence called LINK. This method takes advantage of our demonstration that two otherwise identical plasmids bearing different ORFs can be joined "head to tail" in a single tandem plasmid and propagated in E. coli (7). The LINK sequence expedites the joining of two plasmids using methods from ligation-independent cloning (LIC) (8) and generalizes the method to any pair of ORFs. We demonstrate that the resulting tandem plasmid, with two identical replication origins and antibiotic resistance markers, propagates efficiently in E. coli, and that the two proteins are readily co-expressed in quantities that would satisfy the most demanding structural biology applications. The method is simple and rapid and does not require sequencing of the ORFs in the resulting tandem plasmid.
The LINK sequence was first cloned into vector AVA0220, a derivative of the pET14b vector, by inserting it into the EagI-SalI sites using the oligonucleotides FLIP_Top (5′-GGCCGTAACAACACCATTTAAATGGAGTGGTTACAAATGGAGTGGTTAATTAACAACACCATTTG-3′) and FLIP_Bottom (5′-TCGACAAATGGTGTTGTTAATTAACCACTCCATTTGTAACCACTCCATTTAAATGGTGTTGTTAC-3′), resulting in vector AVA0229. The LINK sequence was moved to additional vectors (see below) by PCR amplification, digestion with appropriate restriction enzymes, and standard cloning procedures.
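As a quick consistency check on the oligonucleotide sequences given above, the following Python sketch (an illustration, not part of the original protocol) verifies that FLIP_Top and FLIP_Bottom anneal to give a 61-nucleotide duplex with EagI- and SalI-compatible 5′ overhangs, and that the duplex contains the SwaI and PacI sites. The helper names `revcomp` and `link_from_flip` are hypothetical; the sequences are copied from the text with line-break hyphens removed.

```python
# Illustrative check of the FLIP oligonucleotides (hypothetical helper names).
FLIP_TOP = "GGCCGTAACAACACCATTTAAATGGAGTGGTTACAAATGGAGTGGTTAATTAACAACACCATTTG"
FLIP_BOTTOM = "TCGACAAATGGTGTTGTTAATTAACCACTCCATTTGTAACCACTCCATTTAAATGGTGTTGTTAC"

EAGI_OVERHANG = "GGCC"  # 5' overhang left by EagI (C^GGCCG)
SALI_OVERHANG = "TCGA"  # 5' overhang left by SalI (G^TCGAC)
SWAI_SITE = "ATTTAAAT"
PACI_SITE = "TTAATTAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def link_from_flip(top: str, bottom: str) -> str:
    """Return the double-stranded LINK region formed by annealing the two oligos.

    Assumes each oligo carries a 4-nt sticky end at its 5' end and that the
    remaining bases of the two oligos are exact reverse complements.
    """
    if not top.startswith(EAGI_OVERHANG) or not bottom.startswith(SALI_OVERHANG):
        raise ValueError("unexpected sticky ends")
    link = top[len(EAGI_OVERHANG):]
    if revcomp(bottom[len(SALI_OVERHANG):]) != link:
        raise ValueError("oligos are not complementary")
    return link


link = link_from_flip(FLIP_TOP, FLIP_BOTTOM)
print(len(link), link.find(SWAI_SITE), link.find(PACI_SITE))  # 61 11 41
```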
AVA0469 vector was constructed from BG1861 by insertion of a fragment that contained the LINK sequence into the NcoI-EagI sites. In this construction, the LINK sequence is upstream of the T7 promoter and does not interfere with expression or LIC of ORFs. The sequence of this vector is reported in the supplemental materials.
Vector AVA0306 is a LIC vector containing the LINK sequence that expresses proteins as N-terminal His6-MBP-3C protease site-ORF fusion proteins. It was derived from the H-MBP-3C vector (9) in three steps: insertion of sequences necessary for LIC of ORFs, deletion of an endogenous SwaI site in the plasmid, and insertion of the LINK sequence into an M13 origin of replication, concomitantly destroying the M13 origin. First, to convert the H-MBP-3C vector into a LIC vector, its multiple cloning site was replaced with the LIC-site-containing DNA fragment (5′-GAATTCCTGGAAGTTCTGTTCCAGGGTCCTGGTTCGCGAATATTCTAGCTTTGTTTAAACAGCACGAACAAGTTCTGCAG-3′), which was inserted into the EcoRI-PstI sites of the H-MBP-3C vector (9) to add NruI and PmeI sites as well as sequences for LIC. Second, the existing SwaI site in the H-MBP-3C vector was deleted as follows: primers complementary to the sequences flanking the SwaI site (but not including it) were used to copy the plasmid by PCR, and the plasmid was rejoined via a BamHI site added through the oligonucleotides. The primer pair was Swa_to_Bam_F (5′-GCGGGATCCGTAAACGTTAATATTTTGTTAAAATTCGC-3′) and Swa_to_Bam_R (5′-GCGGGATCCCAATCTTCCTGTTTTTGGGGC-3′). Third, the DNA fragment containing the LINK sequence was amplified from vector AVA0229 using the primers BamHI_FLIP_F (5′-GCGGGATCCGCAACGCGGGCATCCC-3′) and pMal_FLIP_R (5′-GAGGCCGTTGAGCACCGCACTACGTGATTCCTTCTG-3′) and inserted into the BamHI-DraIII sites of the vector resulting from step two. The sequence of vector AVA0306 can be found in the supplemental materials.
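The restriction sites that the AVA0306 construction relies on can be checked directly against the sequences listed above. The sketch below is a simple site scanner under stated assumptions (the function `find_sites` and the site table are illustrative, not the authors' tools): it searches only the given strand, which suffices here because every recognition sequence in the table is palindromic, and it expands the N in the DraIII site CACNNNGTG to any base.

```python
import re

# Recognition sequences (5'->3'); all are palindromic, so one strand suffices.
SITES = {
    "EcoRI": "GAATTC",
    "NruI": "TCGCGA",
    "PmeI": "GTTTAAAC",
    "PstI": "CTGCAG",
    "BamHI": "GGATCC",
    "DraIII": "CACNNNGTG",
    "SwaI": "ATTTAAAT",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map enzyme name -> 0-based start positions of its site in seq."""
    hits = {}
    for name, site in SITES.items():
        pattern = site.replace("N", "[ACGT]")
        # The zero-width lookahead lets overlapping occurrences be reported.
        starts = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        if starts:
            hits[name] = starts
    return hits


LIC_FRAGMENT = ("GAATTCCTGGAAGTTCTGTTCCAGGGTCCTGGTTCGCGAATATTCTAGCT"
                "TTGTTTAAACAGCACGAACAAGTTCTGCAG")
PRIMERS = {
    "Swa_to_Bam_F": "GCGGGATCCGTAAACGTTAATATTTTGTTAAAATTCGC",
    "Swa_to_Bam_R": "GCGGGATCCCAATCTTCCTGTTTTTGGGGC",
    "BamHI_FLIP_F": "GCGGGATCCGCAACGCGGGCATCCC",
    "pMal_FLIP_R": "GAGGCCGTTGAGCACCGCACTACGTGATTCCTTCTG",
}

print("LIC fragment:", find_sites(LIC_FRAGMENT))
for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

In this scan the LIC fragment shows EcoRI and PstI at its ends, NruI and PmeI inside, and no SwaI site; each Swa_to_Bam and BamHI_FLIP primer carries a BamHI site, and pMal_FLIP_R carries the DraIII site.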
Tandem Plasmid Construction-Tandem plasmids were constructed in three steps. First, 1 µg of each plasmid was digested at the LINK site with 20 units of a restriction enzyme (SwaI or PacI) in the appropriate buffer for 1 h in a 20-µl reaction, followed by heat-inactivation of the restriction enzyme (65°C for 20 min). Second, 0.15 µg of digested plasmid (0.05 pmol; 3 µl of the heat-inactivated restriction reaction) was treated with 1 unit of T4 DNA polymerase in a 20-µl reaction containing 50 mM Tris-Cl, pH 8.0, 10 mM MgCl2, 5 µg/ml BSA, 5 mM DTT, and 2.5 mM dGTP (for SwaI-cleaved plasmid) or dCTP (for PacI-cleaved plasmid) at 22°C for 20 min to form single-stranded 5′ overhangs, followed by heat-inactivation of the enzyme. Third, 22.5 ng (3 µl) of T4 DNA polymerase-treated DNA from each of the two heat-inactivated reactions was mixed at room temperature, annealed at 65°C for 1 min, and supplemented with 2 µl of 25 mM EDTA. Then 1 µl of the resulting mixture was transformed into Nova Blue cells according to the manufacturer's protocol.
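The DNA amounts in this protocol follow from simple carry-over arithmetic, sketched below in Python. The function names are hypothetical, and the conversion to picomoles assumes an average mass of 650 g/mol per base pair (a common approximation not stated in the text); under that assumption, the stated 0.15 µg = 0.05 pmol corresponds to a plasmid of roughly 4.6 kb.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair


def carried_over_ng(mass_ng: float, reaction_ul: float, aliquot_ul: float) -> float:
    """Mass of DNA in an aliquot taken from a well-mixed reaction."""
    return mass_ng / reaction_ul * aliquot_ul


def implied_length_bp(mass_ng: float, pmol: float) -> float:
    """Plasmid length (bp) for which mass_ng of DNA equals pmol picomoles."""
    # ng -> g is 1e-9 and pmol -> mol is 1e-12, leaving a factor of 1000.
    return mass_ng * 1000.0 / (pmol * BP_MASS_G_PER_MOL)


digest_ng = 1000.0                                # 1 ug digested in 20 ul
t4_input_ng = carried_over_ng(digest_ng, 20, 3)   # 3 ul into the T4 reaction
anneal_ng = carried_over_ng(t4_input_ng, 20, 3)   # 3 ul into the annealing mix

print(t4_input_ng, anneal_ng, round(implied_length_bp(t4_input_ng, 0.05)))
# 150.0 22.5 4615
```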
Cell Growth-Plasmid-transformed BL21 (DE3) pLysS cells were grown from a single colony overnight at 37°C in 6.2 ml of Luria broth supplemented with 100 µg/ml ampicillin, then propagated for 16 more generations by serial dilution (equivalent to 406 liters of culture), and induced with 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) at 18°C for 16 h. Cells were harvested and lysed with SDS, and proteins were analyzed by SDS-PAGE.
The S. cerevisiae Trm8/Trm82 protein complex was purified in buffer containing 20 mM HEPES, pH 7.5, 10% glycerol, and 2 mM β-mercaptoethanol (BME) by IMAC, followed by overnight cleavage with 3C protease at 4°C to release the bound protein, gel filtration on a HiLoad Superdex 200 16/60 column, and dialysis.

RESULTS
To generate tandem plasmids from expression plasmids containing the LINK sequence, each ORF-expressing plasmid is cleaved at the LINK site with a restriction enzyme, and the plasmids are joined in a ligation-independent manner. The LINK sequence (Fig. 1a) features two octameric restriction sites (SwaI and PacI), each flanked by sequences (labeled 1-4) that produce 14-nucleotide 5′ overhangs upon restriction digestion and treatment with T4 DNA polymerase (Fig. 1b). The LINK sequence is designed such that the single-stranded overhangs around the SwaI site hybridize exactly with those around the PacI site, allowing ligation-independent formation of the correct tandem plasmids while preventing resealing of single plasmids and formation of incorrect products. Thus one plasmid must be cleaved with PacI and the other with SwaI to generate complementary overhangs, but either plasmid can be cleaved with either enzyme. As with LIC (8), cloning with the LINK sequence is simple, and the overwhelming majority of transformants contain the desired tandem plasmids. Fig. 2, a and b, illustrate the analysis of six tandem plasmids constructed with the LINK sequence, comprising all possible pairwise combinations of four plasmids, each expressing a distinct L. major ORF in a pET-derived vector. In each case, digestion of the tandem plasmids with restriction enzymes that excise the ORFs yields DNA fragments corresponding to the size of each ORF (Fig. 2a, compare lanes 1-4 with lanes 5-10), as well as a vector fragment of the expected size that is common to both parent and tandem plasmids. Furthermore, linearization with an enzyme that cleaves each plasmid only once shows that each tandem plasmid (Fig. 2b, lanes 5-10) is distinctly larger than its parent plasmids (Fig. 2b, lanes 1-4) and close to the size expected from the sum of the sizes of the two parents.
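The overhang logic described above can be reproduced in silico. The Python sketch below is an illustrative model, not the authors' design procedure: it assumes that T4 DNA polymerase, supplied with a single dNTP, trims each 3′ end until it reaches the first base in that strand matching the dNTP, and it treats two ends as annealing when the top-strand sequence of one overhang equals the top-strand-coordinate sequence covered by the other end's bottom-strand overhang. The names `t4_overhangs` and `ENZYMES` are hypothetical; LINK is the 61-nt sequence from FLIP_Top.

```python
LINK = "GTAACAACACCATTTAAATGGAGTGGTTACAAATGGAGTGGTTAATTAACAACACCATTTG"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# site, top-strand cut offset, bottom-strand cut offset (within the site)
ENZYMES = {
    "SwaI": ("ATTTAAAT", 4, 4),  # ATTT^AAAT, blunt
    "PacI": ("TTAATTAA", 5, 3),  # TTAAT^TAA, 2-nt 3' overhang
}


def t4_overhangs(seq: str, enzyme: str, dntp: str) -> tuple[str, str]:
    """Return (left_end, right_end) 5' overhangs in top-strand coordinates.

    left_end: region left single-stranded (bottom strand) on the end to the left of the cut
    right_end: region left single-stranded (top strand) on the end to the right of the cut
    """
    site, top_off, bottom_off = ENZYMES[enzyme]
    start = seq.index(site)
    top_cut, bottom_cut = start + top_off, start + bottom_off

    # Left end: the top strand's 3' end is trimmed leftward until a base == dntp.
    i = top_cut - 1
    while i >= 0 and seq[i] != dntp:
        i -= 1
    # Right end: the bottom strand's 3' end is trimmed rightward until its base
    # (the complement of the top-strand base) == dntp.
    j = bottom_cut
    while j < len(seq) and COMPLEMENT[seq[j]] != dntp:
        j += 1
    if i < 0 or j == len(seq):
        raise ValueError("no stopping base inside the sequence")
    return seq[i + 1:bottom_cut], seq[top_cut:j]


swa_left, swa_right = t4_overhangs(LINK, "SwaI", "G")  # dGTP for SwaI ends
pac_left, pac_right = t4_overhangs(LINK, "PacI", "C")  # dCTP for PacI ends

print(swa_left, swa_right, pac_left, pac_right)
# TAACAACACCATTT AAATGGAGTGGTTA AAATGGAGTGGTTA TAACAACACCATTT
# Cross pairs anneal, self pairs do not.
print(swa_left == pac_right, swa_right == pac_left, swa_left == swa_right)
# True True False
```

Under this model every end carries a 14-nt overhang, the two ends of a SwaI-cut plasmid cannot reanneal with each other, and each SwaI end matches the opposite PacI end of a partner plasmid, consistent with the design described in the text.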
The tandem plasmids propagate efficiently, allowing large-scale co-expression of proteins. To demonstrate this, we monitored plasmid maintenance and protein expression after extended growth. Starting from a single colony of each transformant, we first grew a 6.2-ml overnight culture and then propagated each strain for 16 more generations by serial dilution, equivalent to growing 406 liters of culture. We then assessed the plasmid content of the cells and induced the cells to express protein. Plasmids analyzed at this point were all the same size as before transformation into expression cells (compare Fig. 2, b and c), indicating that the tandem plasmids propagate as a unit during this growth period, with little observed recombination between the monomeric components. We speculate that the efficient propagation of tandem plasmids made from pBR322-derived vectors such as pET and pMal is due to the absence of a cer sequence, which is necessary for efficient recombinational resolution of dimers of ColE1 plasmids (10,11). Following induction after this prolonged growth, each strain containing a tandem plasmid expressed the corresponding pair of proteins at high levels, as judged by SDS-PAGE analysis of whole-cell lysates (Fig. 2d, compare lanes 1-4 with lanes 5-10). We note that tandem plasmids are more likely than other methods to yield comparable expression of the two ORFs, because the relative stoichiometry of the two ORFs is fixed and each ORF uses the same promoter.
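The culture-volume equivalence quoted above is doubling arithmetic: each generation doubles the cell number, so, assuming each dilution is grown back to the same cell density, 16 generations beyond a 6.2-ml culture correspond to 6.2 ml × 2^16. A minimal Python check (the function name is hypothetical):

```python
def equivalent_volume_l(start_ml: float, generations: int) -> float:
    """Culture volume (liters) reached if the starting culture were never diluted."""
    return start_ml * 2 ** generations / 1000.0


print(round(equivalent_volume_l(6.2, 16)))  # 406
```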
Proteins expected to form a complex are readily co-purified after co-expression from a tandem plasmid made with LINK. Fig. 2e illustrates this for two different protein pairs, each in a different LINK-containing vector. Lanes 1-4 show the analysis of the predicted complex of E2 and UEV from P. falciparum, identified in a yeast two-hybrid screen. Each ORF was cloned into AVA0469, a LINK version of the pET-derived LIC vector BG1861, to produce the corresponding His6-ORF fusion protein when expressed alone (lanes 1 and 2). When the two plasmids are combined as a tandem plasmid using the LINK sequence, both proteins are expressed (lane 3), and the E2/UEV protein pair is readily purified (lane 4). Similarly, lanes 6-9 illustrate this for the Trm8/Trm82 protein complex of S. cerevisiae (7), using a different LINK-containing vector (AVA0306), derived from a pMal vector, that expresses proteins as His6-MBP-3C protease site-ORF fusion proteins.

Fig. 1 (partial legend). … b, construction of tandem plasmids. Plasmids are digested with SwaI or PacI, followed by heat-inactivation, treatment with T4 DNA polymerase in the presence of dGTP or dCTP, respectively, to form 5′ overhangs, mixing of plasmids, and transformation into E. coli.

DISCUSSION
The LINK method is ideal for structural genomics applications for two major reasons.
First, it has enormous potential as a high-throughput co-expression method for large sets of ORF-expressing plasmids containing LINK: the method is not PCR-based and requires no DNA purification steps, so it can be easily automated. Moreover, because the manipulations of each plasmid occur at a site remote from the ORF, the ORFs in the tandem plasmid do not need to be sequenced. We note that the LINK sequence can be introduced into any nonessential region of a plasmid for subsequent use, and that a plasmid containing the LINK sequence can also be used for routine expression of individual proteins, because explicit testing showed that the sequence has no obvious negative effect on expression (data not shown).
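Because each plasmid is simply cut and trimmed at the LINK site, a pairwise screen reduces to a work list of per-plasmid treatments that can be reused across many pairs. The Python sketch below is hypothetical (the `Treatment` class and `pairwise_plan` function are illustrative, and a real liquid-handling script would add volumes and plate positions). It assigns SwaI/dGTP to the first member of each pair and PacI/dCTP to the second, matching the enzyme/dNTP pairing in the protocol. For four plasmids, as in Fig. 2, this sketch yields the six tandem combinations from six treated plasmid preparations.

```python
from dataclasses import dataclass
from itertools import combinations

# dNTP used with each enzyme in the T4 DNA polymerase step (from the protocol).
DNTP_FOR = {"SwaI": "dGTP", "PacI": "dCTP"}


@dataclass(frozen=True)
class Treatment:
    plasmid: str
    enzyme: str

    @property
    def dntp(self) -> str:
        return DNTP_FOR[self.enzyme]


def pairwise_plan(plasmids):
    """Return (sorted treatments to prepare, list of treatment pairs to anneal)."""
    pairs = [(Treatment(a, "SwaI"), Treatment(b, "PacI"))
             for a, b in combinations(plasmids, 2)]
    # Frozen dataclasses are hashable, so shared treatments collapse in the set.
    treatments = sorted({t for pair in pairs for t in pair},
                        key=lambda t: (t.enzyme, t.plasmid))
    return treatments, pairs


treatments, pairs = pairwise_plan(["p1", "p2", "p3", "p4"])
for t in treatments:
    print(f"digest {t.plasmid} with {t.enzyme}, trim with T4 pol + {t.dntp}")
for a, b in pairs:
    print(f"anneal {a.plasmid}/{a.enzyme} + {b.plasmid}/{b.enzyme}")
print(len(treatments), len(pairs))  # 6 6
```

With this assignment, N plasmids give N(N-1)/2 pairs from only 2N-2 treated preparations, which is what makes the scheme attractive for automation.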
Fig. 2 (partial legend). a, … (lanes 1-4), and products were compared with those of tandem plasmids (lanes 5-10) after gel electrophoresis; lane 11, DNA ladder. b, tandem plasmids are of the expected size. Plasmids in a were linearized with PacI, which cleaves each plasmid once (lanes 1-10), and resolved by gel electrophoresis; lane 11, DNA ladder. c, tandem plasmids propagate efficiently in expression cells. Plasmids in a were transformed into BL21 (DE3) pLysS cells, colonies were grown at 37°C using serial dilutions equivalent to 406 liters of culture, and DNA was analyzed as in b. d, proteins are co-expressed at high levels from tandem plasmids. Cells were grown for multiple generations as in c, induced with IPTG at 18°C for 16 h, harvested, and lysed with SDS; proteins were analyzed by SDS-PAGE (lanes 1-10); lane 11, broad range standards (Bio-Rad). e, co-expression and purification of two protein complexes. Lanes 1-4, analysis of an E2/UEV protein pair using vector AVA0469 in BL21 (DE3) pLysS cells. Lanes 1 and 2, expression of E2 and UEV protein fragments from individual plasmids; lane 3, co-expression of E2 and UEV from the tandem plasmid; lane 4, purified complex after IMAC and gel filtration. Lanes 6-9, similar analysis of the yeast Trm8/Trm82 protein complex, cloned in vector AVA0306 and expressed in BL21-Codon Plus (DE3)-RIL cells. Lane 9, purified complex after IMAC, 3C protease cleavage, and gel filtration. Lanes 5 and 10, standards.

Second, the method can be used for co-expression of virtually any pair of proteins because either of the two octameric restriction sites can be used to cleave a given plasmid, as long as the other site is used for the second plasmid. This feature allows the generation of tandem plasmids for more than 99% of typical protein pairs (Supplemental Table I), the exact percentage depending on GC content, average protein length, and distribution of sites within ORFs. In yeast, for example, 99.4% of all possible protein pairs can be obtained by this method. Indeed, the only pairs that cannot be joined are the exceedingly rare combinations of ORFs that contain the same octameric restriction site within their sequences, or those in which one ORF contains both octameric recognition sequences.
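The joinability rule stated for the second advantage can be written as a small function. In this Python sketch (hypothetical names; toy sequences rather than real ORFs), a pair is joinable if one ORF lacks the SwaI site and the other lacks the PacI site, so that each plasmid is cut only at its LINK. It assumes the vector backbone carries no other SwaI or PacI site, consistent with the removal of the endogenous SwaI site during construction of AVA0306. Both recognition sequences are palindromic, so scanning one strand is enough.

```python
from typing import Optional, Tuple

SITES = {"SwaI": "ATTTAAAT", "PacI": "TTAATTAA"}


def enzyme_assignment(orf_a: str, orf_b: str) -> Optional[Tuple[str, str]]:
    """Return (enzyme for plasmid A, enzyme for plasmid B), or None if the pair
    cannot be joined because every assignment would also cut inside an ORF."""
    for enz_a, enz_b in (("SwaI", "PacI"), ("PacI", "SwaI")):
        if SITES[enz_a] not in orf_a and SITES[enz_b] not in orf_b:
            return enz_a, enz_b
    return None


# Toy sequences chosen to contain the indicated sites (not real ORFs).
plain = "ATGGCTGCAGGCTGA"
has_swa = "ATGGCATTTAAATGCTGA"
has_pac = "ATGGCTTAATTAAGCTGA"
has_both = "ATGATTTAAATGGTTAATTAACTGA"

print(enzyme_assignment(plain, has_swa))    # ('SwaI', 'PacI')
print(enzyme_assignment(has_swa, has_pac))  # ('PacI', 'SwaI')
print(enzyme_assignment(has_swa, has_swa))  # None: same site in both ORFs
print(enzyme_assignment(has_both, plain))   # None: both sites in one ORF
```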
Finally, we note that the LINK method could be used to generate a random library of plasmids containing nearly every possible combination of protein pairs in a given genome for functional screening or selection. Applied to yeast, this would result in a library of 1.8 × 10^7 possible protein pairs, well within the transformation capability of E. coli.
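The library size quoted above is consistent with counting unordered pairs of distinct ORFs. The Python sketch below assumes roughly 6,000 yeast ORFs, a round figure that is not stated in the text; the function name is hypothetical.

```python
def unordered_pairs(n_orfs: int) -> int:
    """Number of distinct pairs of different ORFs, n(n-1)/2."""
    return n_orfs * (n_orfs - 1) // 2


n_pairs = unordered_pairs(6000)  # assumption: ~6,000 ORFs in S. cerevisiae
print(n_pairs, f"{n_pairs:.1e}")  # 17997000 1.8e+07
```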