Identification of protein binding sites in genomic DNA by two-dimensional gel electrophoresis

We describe a simple two-dimensional electrophoresis procedure to identify the recognition sites of DNA-binding proteins within large DNA molecules. Using this approach, we have mapped E.coli IHF (Integration Host Factor) binding sites within phage Lambda (48 kb) and phage Mu (39 kb) DNA. We are also able to visualize IHF binding sites in E.coli chromosomal DNA (4,700 kb). We present an extension of this technique using direct amplification by PCR of the isolated restriction fragments, which should permit the cloning of a collection of recognition sequences for DNA binding proteins in complex genomes.


INTRODUCTION
The isolation of the target sequence of a DNA-binding protein is an important step in characterizing its functional role. Using available techniques (e.g. footprinting, gel shift assay), a systematic search for protein binding sites can only be performed on small DNA molecules, or on cloned DNA fragments. We were interested in finding an approach which would allow the rapid isolation of binding sites for the protein IHF within a large set of DNA fragments. IHF is a small histone-like protein of E. coli consisting of two subunits which are the products of the himA and himD (hip) genes. It was first identified genetically by its function in the integration of bacteriophage λ both in vivo (1,2) and in vitro (3). The IHF protein appears to have a role in many cellular processes (see 4 for a review). It is involved in plasmid replication and segregation, site-specific recombination, transposition, transcription, and perhaps even translation of several prokaryotic genetic units. The pleiotropic effects of IHF are a direct consequence of its remarkable DNA-binding properties: unlike most histone-like proteins, IHF binds preferentially to specific sites in DNA (5); it makes most of its sequence-specific contacts in the minor groove of DNA (5,6); and it is among the strongest DNA-bending proteins (7,8).
Little is known about the number and the position of IHF sites in the E. coli chromosome. Most IHF binding sites have been found in extrachromosomal elements such as phage, plasmids or transposons. From footprinting analysis, several related consensus sequences have been proposed. The common motif is the sequence YAANNNNTTGATW (5, 9-11). Binding sites can, however, vary significantly from the consensus and several studies have shown that the context can also play a role in binding efficiency.
We describe here a simple procedure which allows the visualization and cloning of IHF binding sites within large collections of DNA fragments. This procedure consists of a simple two-dimensional band-shift assay, performed at two different temperatures. The combination of this approach with PCR offers a powerful way to recover rare protein binding sites from populations of DNA molecules (12). Individual binding sites can be amplified by PCR directly from the dried gel (13), and subsequently cloned in a suitable vector. We have begun to use this approach with several genomes over a broad range of complexity: small plasmids (5 kb), bacteriophages Lambda (48 kb) and Mu (39 kb), or the E. coli chromosome (4,700 kb). It should be applicable to the study of other DNA-binding proteins as well.

MATERIALS AND METHODS
DNA Techniques
To avoid partial digestion with methylation-sensitive restriction enzymes, unmethylated Lambda DNA was purchased from Promega. Plasmid pBR322::IS1 #11 is a pBR322 derivative with an IS1 insertion at coordinate 3291 (14). Mu DNA was a gift of R. Alazard (CNRS, Toulouse). E. coli chromosomal DNA (strain CSH5) was obtained by lysis of the cells with lysozyme and sarkosyl (0.6%), followed by extensive treatment with RNase A and proteinase K. It was then purified by standard protocols (15,16).
Digestions with restriction enzymes were performed according to the manufacturers' (Boehringer Mannheim; New England Biolabs; Biofinex) recommendations. The restriction fragments were end-labeled by filling-in, using Klenow polymerase (Biofinex or Boehringer Mannheim). In a standard labeling reaction (27 µl), 1 µg of DNA was resuspended in 10× MSB (medium salt buffer) containing 200 µM of each dCTP, dGTP, dTTP, 20 µCi of [α-32P]-dATP (>3000 Ci/mmol; Amersham) and 1 µl Klenow polymerase (1 U/µl). Modified conditions (500 µM of dGTP, dTTP and dATP, 15 µCi of [α-32P]-dCTP diluted in 1× Klenow buffer (50 mM Tris, 10 mM MgCl2, 1 mM DTT, 50 mg/ml BSA)) were used to label a second set of HinfI and DdeI restriction digestions, because the recognition sequences of those two restriction enzymes (G/ANTC and C/TNAG respectively) allow a selective labeling at the N position.
* To whom correspondence should be addressed
© 1991 Oxford University Press

Binding Reactions
Binding reactions were carried out by incubating several concentrations of labeled DNA (0.02 nM to 3 nM) with IHF (25 nM) for 30 min at 25°C in 1× binding buffer (Tris-HCl 50 mM pH 7.5, KCl 70 mM, MgCl2 7 mM, CaCl2 3 mM, EDTA 1.1 mM, glycerol 10%, BSA 200 µg/ml, β-mercaptoethanol 1 mM). The stock solution of IHF (5 µM) was sonicated twice for 10 seconds before use to disrupt protein aggregates. The reactions were placed on ice, and the samples were loaded on a polyacrylamide gel at 4°C without loading dye.
Two Dimensional Polyacrylamide Gel Electrophoresis
Standard vertical slab polyacrylamide (6% or 7.5% acrylamide; acryl/bis ratio 29:1) gels were used for electrophoresis in the first dimension. The sizes of the glass plates were 16 × 14.5 × 0.15 cm (narrow gels) or 30 × 15.5 × 0.15 cm (wide gels). The wide gels were run on an adjustable sequencing gel apparatus in which the top tank can be lowered. Electrophoresis was performed in a cold room (4°C) at 14 V/cm for approximately 2 hours. Running buffer was 1× TBE (15). The gels were prerun for several hours to permit the stabilization of conductivity.
For the second dimension, the lateral spacers were removed, leaving the gel between the glass plates. The gel was rotated by 90°, and placed in a horizontal electrophoresis apparatus. It was then submerged in buffer (1 cm above the gel) prewarmed to 70°C to disrupt the protein-DNA complexes, and the temperature throughout electrophoresis was maintained at 60°C by recirculating the buffer with a peristaltic pump through a closed tank placed in a 75°C waterbath. Taking into account the heat losses during circulation, the flow rate of the pump (approximately 200 ml/min) can be used to adjust the temperature of electrophoresis to 60°C in the second dimension. The gel was run for two hours at 7 V/cm, then dried and autoradiographed. When high DNA concentrations were loaded on the gel, staining with ethidium bromide was used to visualize the bands.
Addition of linkers to genomic DNA
3 µg (450 pmoles) of complementary 'catch' (12) linkers (5'-TTCTGTACACTCGAGATGAA-3' and 5'-TTCATCTCGAGTGTACAGAA-3') containing an internal XhoI cut site and an EcoRI hemisite at each end were labeled and phosphorylated at the same time, using 2.5 units of T4 polynucleotide kinase (Boehringer) and 100 µCi of [γ-32P]-ATP (3000 Ci/mmole; Amersham) diluted in suitable buffer. After 15 min at 37°C, the reaction volume was diluted two-fold with buffer; nonradioactive ATP (final concentration 1 mM) and kinase (10 units) were added and the incubation extended for 45 min. The reaction was inactivated for 10 min at 70°C, and transferred to a large beaker of water which was allowed to cool slowly from 70°C to room temperature to promote annealing of the two oligonucleotides. These were then purified from unincorporated nucleotides using a Sephadex G-25 spin column.
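The stated properties of the two linker oligonucleotides can be checked directly from their sequences. The following is a minimal sketch, not part of the original procedure: it tests that the two strands are reverse complements, that the duplex carries an XhoI site (CTCGAG), and that an EcoRI site (GAATTC) appears only when two linkers are joined end to end. The mass-to-pmol conversion assumes an average of about 330 g/mol per nucleotide, a common rule of thumb that the text does not state.

```python
# Linker sequences as printed in the text, written 5'->3'.
LINKER_A = "TTCTGTACACTCGAGATGAA"
LINKER_B = "TTCATCTCGAGTGTACAGAA"

XHOI_SITE = "CTCGAG"
ECORI_SITE = "GAATTC"

# Assumption (not from the text): ~330 g/mol per nucleotide of single-stranded DNA.
AVG_NT_MASS_G_PER_MOL = 330.0

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pmol_from_micrograms(mass_ug: float, length_nt: int) -> float:
    """Approximate amount (pmol) of a single-stranded oligo from its mass."""
    molar_mass = length_nt * AVG_NT_MASS_G_PER_MOL  # g/mol
    return mass_ug * 1e-6 / molar_mass * 1e12


def check_linkers() -> dict:
    """Collect the sequence-level checks described in the lead-in."""
    return {
        "strands_are_complementary": reverse_complement(LINKER_A) == LINKER_B,
        "xhoi_site_in_linker": XHOI_SITE in LINKER_A,
        "ecori_site_in_single_linker": ECORI_SITE in LINKER_A,
        # Head-to-tail ligation of two linker duplexes joins ...GAA to TTC...
        "ecori_site_in_linker_dimer": ECORI_SITE in LINKER_A + LINKER_A,
        "approx_pmol_in_3_ug": round(pmol_from_micrograms(3.0, len(LINKER_A))),
    }


if __name__ == "__main__":
    for name, value in check_linkers().items():
        print(f"{name}: {value}")
```

Under the assumed nucleotide mass, 3 µg of a 20-mer comes out close to the 450 pmoles quoted in the text. The EcoRI site created only at linker-linker junctions is consistent with the later EcoRI cleavage step used to remove linker multimers.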
The linkers were ligated to genomic DNA which had been successively restriction digested, treated with Klenow polymerase (12,15) to fill-in the ends (when needed) and EcoRI-methylase (Promega Biotech) to avoid digestion at internal EcoRI sites. The ligation mixture was then cleaved with EcoRI to eliminate linker multimers.

Amplification and Cloning of Shifted Bands
The DNA obtained in the previous experimental step was incubated with IHF and run on a 2D-gel as described above. The superimposition of the autoradiogram and the gel dried on Whatman 3MM paper allows one to cut out with precision a retarded band from the dried gel (13). The piece of gel was incubated in 200 µl PCR buffer (Tris pH 8.3 10 mM, KCl 50 mM, MgCl2 2.5 mM, gelatin 100 µg/ml), containing 1 µM of each linker, 200 µM each dNTP, and 10 units of Taq polymerase (Perkin-Elmer). A standard amplification cycle of 1 min at 94°C, 1 min at 54°C and 2 min at 72°C was repeated 25 times (17). The last cycle was followed by 10 min at 72°C to allow a complete extension of the PCR products. The major PCR product was purified on a 6% polyacrylamide gel (16), digested with XhoI, and cloned into XhoI-digested plasmid Bluescript SK or KS- (Stratagene) as described (15). The ligation mixture was transformed into E. coli strain DH5α and selected on LB with Ap (100 µg/ml), X-gal and IPTG.
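As a rough plausibility check on the cycling conditions, the sketch below estimates the melting temperature of the linker primers with the Wallace rule (2°C per A or T, 4°C per G or C) and adds up the programmed cycling time. The Wallace rule is only a coarse approximation for short oligonucleotides, and the text does not say how the 54°C annealing temperature was chosen, so this is an illustration rather than the authors' reasoning. The function names are hypothetical.

```python
# Linker oligonucleotides used as PCR primers (sequences from the text, 5'->3').
PRIMERS = ("TTCTGTACACTCGAGATGAA", "TTCATCTCGAGTGTACAGAA")


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C.

    Tm = 2 * (A + T) + 4 * (G + C); a coarse rule of thumb for short oligos.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def total_cycling_minutes(step_minutes=(1, 1, 2), cycles=25, final_extension_min=10):
    """Programmed time for the cycles plus the final extension.

    Defaults follow the text: 1 min 94 C, 1 min 54 C, 2 min 72 C, 25 cycles,
    then 10 min at 72 C. Ramp times between steps are ignored.
    """
    return sum(step_minutes) * cycles + final_extension_min


if __name__ == "__main__":
    for primer in PRIMERS:
        print(primer, "Wallace Tm estimate:", wallace_tm(primer), "C")
    print("Programmed cycling time:", total_cycling_minutes(), "min")
```

Both primers have the same base composition because they are reverse complements of each other, so the rule gives them the same estimate.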

Sequencing
The plasmids with PCR inserts were directly sequenced using the dideoxy chain termination method as described (18), after a rapid miniscale purification (19).

Computer Search
The simulation of different restriction digestions of Lambda DNA was carried out with a sequence database manipulation program (Seqman) of the IDEAS package (M. Kanehisa, NIH), and managed by a VAX operating system.
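The kind of restriction-digest simulation described here can be illustrated with a short script. This is a generic sketch and not the Seqman/IDEAS program: it finds every occurrence of a recognition site written with IUPAC codes in a linear sequence, cuts at a given offset within the site, and lists the fragments whose length is close to an observed size. The function names, the toy sequence and the tolerance parameter are all hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def site_regex(site: str):
    """Compile a recognition site (IUPAC codes) into a lookahead regex.

    The lookahead lets overlapping occurrences of the site be found.
    """
    return re.compile("(?=" + "".join(IUPAC[base] for base in site.upper()) + ")")


def digest(seq: str, site: str, cut_offset: int):
    """Digest a linear sequence; return fragments as (start, end) tuples.

    cut_offset is the number of bases of the site that stay on the left of
    the top-strand cut, e.g. 1 for HinfI (G/ANTC) and DdeI (C/TNAG).
    """
    seq = seq.upper()
    cuts = sorted({m.start() + cut_offset for m in site_regex(site).finditer(seq)})
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return list(zip(bounds[:-1], bounds[1:]))


def candidates_for_size(fragments, observed_bp: int, tolerance_bp: int = 0):
    """Return fragments whose length is within tolerance of an observed size."""
    return [
        (start, end)
        for start, end in fragments
        if abs((end - start) - observed_bp) <= tolerance_bp
    ]


if __name__ == "__main__":
    toy = "AAAAGATTCCCCCCCGAGTCTT"  # hypothetical sequence with two HinfI sites
    frags = digest(toy, "GANTC", cut_offset=1)
    print("fragments:", frags)
    print("candidates near 11 bp:", candidates_for_size(frags, 11, tolerance_bp=1))
```

When several fragments fall within the tolerance of one observed band, the map position stays ambiguous, which is the situation that digests with additional enzymes are meant to resolve.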

RESULTS
Two Dimensional Band-Shift Assay
To identify DNA fragments that bind to a protein, we have developed a simple and rapid procedure which is illustrated schematically in Figure 1. Polyacrylamide gels are run in two dimensions, successively. In the first dimension, a standard gel shift electrophoresis is performed (20,21). In the second, electrophoresis is performed perpendicularly, and at high temperature to destabilize the DNA-protein complex. The previously retarded fragments now migrate according to their real size, and are thus displaced from the diagonal formed by the fragments which were never bound by the protein. This system thus allows the identification of DNA fragments containing one or more specific binding sites for the protein by their migration 'off the diagonal'. Projection of the middle of each shifted band onto the diagonal then defines the size of the retarded band. DNA fragments carrying sequence-induced bends may also be slightly displaced from the diagonal (22, see below), irrespective of the presence of the protein.
Figure 1 (caption, beginning not recovered): [...] vertical polyacrylamide slab gel at 4°C. The retardation of the DNA-protein complex during the run is indicated by dotted arrows. The lateral spacers are then removed, and the gel, still between its glass plates, is placed in a horizontal gel apparatus (B). It is then submerged in buffer prewarmed to 70°C, and the temperature throughout electrophoresis is maintained at 60°C. DNA fragments that were complexed with protein run ahead of the diagonal in the second dimension. The size of the 'off the diagonal' bands can be estimated by projection on the diagonal (see text).
IHF Binding Sites on pBR322 Derivatives
To demonstrate the feasibility of this approach, we have used a plasmid carrying several IHF binding sites. This plasmid is a pBR322 derivative, pBR322::IS1 #11 (5.1 kb; 14), which contains a copy of insertion sequence IS1. IHF binds to a single site at each end of IS1 (7,9), but a point mutation in the right end of IS1 in pBR322::IS1 #11 abolishes binding by IHF. Additional IHF binding sites are located on the sequence of pBR322, near the beginning of the ampicillin resistance gene (9).
Plasmid pBR322::IS1 #11 (5 ng) was digested with HinfI and radioactively labeled. This generates 13 fragments, two of which should bind IHF: the 1632 bp fragment, carrying the pBR322 sites, and the 644 bp fragment carrying the left end of IS1. The DNA was incubated with IHF (25 nM) and subjected to two-dimensional electrophoresis. As expected, electrophoresis of end-labeled DNA yields a diagonal pattern (Figure 2, lane 1). Prior incubation of DNA with IHF (lane 2) results in the appearance of two detached bands corresponding to the 644 bp and the 1632 bp fragments. IHF binding to the pBR322 sites was clearly detected on smaller fragments after a two-dimensional analysis of DdeI digests of pBR322::IS1 #11 or pBR322 (data not shown).

IHF Sites in Lambda DNA
The phage λ genome (48,502 bp; 23) is an order of magnitude larger than pBR322, and several binding sites for IHF have been previously identified. They are located in the att locus (attachment site; 5), in the cos region (cohesive ends; 24, 25), upstream of the cro-cII genes (5, 26), P'R, PbL (10), and PL promoter (27).
In preliminary experiments, we varied both IHF and DNA concentrations and found that the best results are obtained at low (0.02 nM) DNA concentration with saturating but low (25 nM) concentrations of IHF. At an IHF concentration of 10 nM, the intensity of the signal of shifted bands was significantly reduced (data not shown); in contrast, at concentrations >50 nM, a 'smearing' effect is observed. This is probably due to non-specific binding of IHF. The addition of carrier DNA has a deleterious effect during the migration in the second dimension: an increased background interferes with the visualization of the shifted bands.
Examples of two-dimensional analysis of phage λ DNA are shown in Figure 3. Phage λ was digested first with HinfI, which generates 148 fragments ranging in size from 1800 bp to a few bp, and the end-labeled fragments were subjected to two-dimensional gel electrophoresis (Figure 3, panel A). Again, naked DNA (Figure 3A, lane 2) gives a diagonal pattern in which individual fragments are, however, too close to each other to be resolved. [...] (Figure 3A, lane 1), their sizes can readily be estimated from calibration plots such as that shown in Figure 4. The distance migrated in either dimension can be used for calibration, yielding approximately the same result. Strikingly, in the case of the second dimension (run at 60°C) these sequence-induced effects are abolished (28) and a straight line is obtained. The fragments containing the known strong IHF binding sites, located in the cos (24) and attP (5) regions, were easily identified on the autoradiograms shown in Figures 3A and 3B. Note that λ DNA was heated for 20 min at 67°C before labeling to denature the cohesive ends. Two shifted HinfI fragments of 314 and 723 bp are obtained for the cos region (24), containing the binding sites located on the right side of cosB, and on the left side of cosB respectively (24; 25). CosB comprises a region of about 40 bp to the left of cosN and a region of about 170 bp between cosN and Nu1. Each of the cos fragments is significantly retarded, as is the 317 bp fragment containing attP (5; Figure 3A).
To confirm the assignment of these bands, we have analyzed the two-dimensional pattern obtained after an additional digestion with XmnI (Figure 3B). XmnI has 25 cleavage sites on λ DNA, one of which is located in the cosB region, within the 314 bp HinfI fragment. There are no XmnI sites within the 723 bp HinfI fragment carrying cosB, nor in the 317 bp attP fragment. In Figure 3B, we can see that, as expected, the 317 bp (attP) and 723 bp (cosB) fragments are not affected by cutting with XmnI, whereas the 314 bp (cosB) fragment is replaced with a 281 bp [...] regions. Additional restriction digestions of the Lambda genome confirm the assignment of the known IHF binding sites described above. We have also selectively labeled the HinfI- and DdeI-digested fragments with dCTP (see Materials and Methods), which allows an additional discrimination.
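The size estimation from a calibration plot can be sketched numerically. The example below assumes that the logarithm of fragment size falls linearly with migration distance in the second dimension; this is the usual semi-log treatment of gel calibration and is an assumption here, since Figure 4 is not reproduced in the text. The marker distances and sizes are invented for illustration, and the function names are hypothetical.

```python
import math


def fit_log_size_calibration(markers):
    """Least-squares fit of log10(size_bp) = slope * distance + intercept.

    markers: iterable of (distance, size_bp) pairs taken from fragments on
    the diagonal, whose sizes are known from the restriction map.
    """
    xs = [float(d) for d, _ in markers]
    ys = [math.log10(s) for _, s in markers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size_bp(distance, slope, intercept):
    """Convert a second-dimension migration distance into a size estimate."""
    return 10 ** (slope * distance + intercept)


if __name__ == "__main__":
    # Hypothetical calibration points: (distance migrated in cm, size in bp).
    diagonal_markers = [(2.0, 1000), (5.0, 500), (9.0, 200), (12.0, 100)]
    slope, intercept = fit_log_size_calibration(diagonal_markers)
    # A shifted band is sized from its second-dimension distance alone,
    # which is what projecting it onto the diagonal amounts to.
    print(round(estimate_size_bp(7.0, slope, intercept)), "bp (illustrative)")
```

Because the first-dimension position of a shifted band reflects the protein-DNA complex rather than the DNA alone, only the second-dimension distance is used for sizing in this sketch.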

Unknown Binding Sites of Lambda: Localization by Restriction Analysis and Cloning of PCR Products
In principle, the unknown IHF binding sites in λ DNA can be identified by analyzing the size of shifted fragments with several restriction enzymes. A computer comparison of the size of these fragments and the restriction map of λ suggested their possible location. The sequences obtained were screened for the presence of an IHF consensus sequence. In contrast to the fragments described above, more than one map location is possible for several fragments. To solve this problem, we have ligated labeled double-stranded linkers to HaeIII-digested λ DNA in order to amplify the fragments by PCR (12,15). The shifted fragments were amplified from the dried gel. The PCR products were incubated with IHF, both before and after cloning, to demonstrate the presence of a binding site (data not shown). Amplification of the bands, followed by cloning in plasmid Bluescript, sequencing, and subsequent gel retardation of the insert, shows that several DNA fragments bind IHF with different affinities.
A 527 bp HaeIII fragment carries new IHF binding site(s) and is distinctly retarded by IHF. After a search for consensus binding sequences in this fragment, we found several potential sites for IHF. The best match (CAC.... TTGATT; coord. 34,254-34,266; 23) is located at the N-terminus of the ral gene, which protects λ from the host EcoB and EcoK restriction systems (29). This site was not detectable with the restriction analysis described above. We have calculated the similarity score of the DNA sequence according to the formula of Goodrich et al. (11) [...] respectively. The low concentration of IHF used in our assays (25-50 nM) implies that the sites reported here have a good affinity for IHF. A few other weaker sites remain to be characterized in λ using this approach.
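A consensus search of this kind can be sketched as a scan of both strands against the common motif YAANNNNTTGATW, counting mismatches at each position. This is a simple mismatch count and not the similarity score of Goodrich et al., whose formula is not given in the text. The toy sequence and the function names are hypothetical.

```python
# IUPAC codes mapped to the set of bases each one allows.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}

CONSENSUS = "YAANNNNTTGATW"  # common IHF motif quoted in the Introduction

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(window: str, consensus: str = CONSENSUS) -> int:
    """Number of positions where the window violates the consensus."""
    return sum(base not in IUPAC_SETS[code] for base, code in zip(window, consensus))


def scan(seq: str, max_mismatches: int = 1, consensus: str = CONSENSUS):
    """Return hits on both strands as (start, strand, site, n_mismatches).

    start is always given in top-strand coordinates (0-based); for '-' hits
    the site is written 5'->3' on the bottom strand.
    """
    seq = seq.upper()
    width = len(consensus)
    hits = []
    for strand, text in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(text) - width + 1):
            window = text[i:i + width]
            n = mismatches(window, consensus)
            if n <= max_mismatches:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, window, n))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGGCAAGCGTTTGATACCC"  # hypothetical sequence with one perfect site
    for hit in scan(toy, max_mismatches=1):
        print(hit)
```

Read against the consensus, the best match reported above (CAC....TTGATT) departs from it at the third position, C in place of A, whatever the four unspecified bases are. Allowing at least one mismatch is therefore needed to recover sites of that kind.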
Mu DNA
Contrary to λ, the sequence of phage Mu is only partially known (30). This includes the IHF binding site located near the left operator O1, whose role in transcription has been demonstrated (31). Another site is located downstream of the mom gene (J. Kur, S. Hattman, and W. Szybalski, pers. comm.). The autoradiogram shown in Figure 5 demonstrates the presence of 3 strong IHF binding sites in phage Mu DNA. The reagent concentrations were 25 nM for IHF and 0.01-0.02 nM for Mu DNA (Sau3AI digest). The 751 bp fragment contains the O1 operator since it carries one of the two HindIII sites of Mu (data not shown). The 134 bp fragment has the right size to carry the mom distal region. Digestions with other restriction enzymes have confirmed the assignment of these two sites. Finally, a shifted band corresponding to a size of approximately 700 bp carries an unknown binding site whose location can be narrowed down to a 260-280 bp interval after digestion with HinfI or HinfI + HindIII (data not shown). We are in the process of characterizing this band by PCR.
E. coli Chromosome
An interesting use of this technique would be to isolate a collection of IHF binding sites from the E. coli genome. The best two-dimensional gels (Figure 6) were obtained by increasing the amount of chromosomal DNA (35 ng per reaction) relative to that used for λ DNA, while increasing the IHF concentration only 2-3 fold (65 nM). After digestion with HinfI (recognition site G/ANTC) and labeling with dCTP opposite the N position (see Materials and Methods), at least 30 shifted bands are visible on the original autoradiogram (Figure 6). An oligonucleotide carrying a strong IHF binding site was included in the binding reaction as a control. Since only on average 7/16 of the chromosomal fragments have a C opposite the N position in at least one of their 3' ends and hence get radioactively labeled, we can estimate the number of IHF binding sites in the chromosome to be a minimum of about 70. The known IHF binding sites in E. coli number only 5 or 6 (4). Although many of them may correspond to binding sites at the ends of endogenous IS1 elements (7), or to the sites contributed by the integrated copy of phage λ, a significant number of new sites remain to be isolated and characterized.
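The 7/16 figure and the estimate of about 70 sites follow from a short calculation, reproduced here as a sketch. It assumes that each of the four bases is equally likely at the N position and that the two ends of a fragment are independent; both are simplifying assumptions that the text does not state explicitly. The function names are hypothetical.

```python
from fractions import Fraction


def fraction_labeled(p_label_per_end=Fraction(1, 4), ends=2):
    """Probability that at least one end of a fragment incorporates the label.

    With labeled dCTP as the only source of C, an end is labeled only when
    the base filled in at the N position is C (assumed probability 1/4).
    """
    return 1 - (1 - p_label_per_end) ** ends


def estimate_total_sites(visible_bands, labeled_fraction):
    """Scale the visible shifted bands up to all fragments, labeled or not."""
    return visible_bands / labeled_fraction


if __name__ == "__main__":
    labeled = fraction_labeled()
    total = estimate_total_sites(30, labeled)
    print("fraction of fragments labeled:", labeled)
    print("estimated shifted fragments in total:", float(total))
```

With at least 30 visible bands this gives 480/7, or roughly 69, in line with the stated minimum of about 70. The figure remains a lower bound because only bands that are visibly shifted are counted.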

DISCUSSION
We have described a simple method to visualize, clone and characterize protein binding sites from genomic DNA. Using this approach, we have identified most of the known IHF sites within Lambda DNA, and have localized a potential new site. The two dimensional electrophoresis of protein-DNA complexes has several advantages. It allows the estimation, in one autoradiogram, of the minimal number of binding sites for a protein in a given genome. Since the denaturation is performed simply by increasing the temperature, there is no need for denaturing buffer in the second run. This allows the processing of several samples on the same gel, without tedious additional manipulations, and eliminates any problem with detergents in subsequent experimental steps.
The technique is easily reproducible. We always obtain the same pattern of IHF-shifted bands in different assays with a given DNA sample, as well as after distinct labeling reactions. The relative intensity of each shifted band is also constant, which reflects the different binding affinities for the various sites. A low concentration of the protein is needed in the assay in order to ensure specific binding. The system is sensitive to the IHF concentration, which, in the case of λ, must be within a narrow range between 10 and 25 nM. Exceeding the upper limit of protein concentration can result in the appearance of smears on the gel. The DNA concentration may be varied over a 100-fold range (up to 1 µg), without deleterious effects on the binding reaction as seen on the gel. Although IHF is stable for at least 10 min at 100°C, its binding to DNA is abolished at temperatures above 50°C (data not shown). The heat lability of binding is a requirement for a protein to be used in this assay, a condition which should be met by many other DNA-binding proteins. It has recently been possible, using this approach, to visualize binding sites for eukaryotic transcription factors from unfractionated nuclear extracts (J. Wuarin and U. Schibler, pers. comm.).
The combination of two-dimensional electrophoresis with PCR offers a powerful way to isolate rare protein binding sites. By definition, all DNA fragments which are shifted ahead of the diagonal are of interest. They can be eluted and cloned individually, as was done for the λ fragments. But, in the case of a more complex genome, one could simply pool all the fragments which migrate off the diagonal and sort them individually after cloning in a plasmid. We are currently using this approach to generate a library of the IHF sites from the E. coli chromosome.

[...] assistance, to Otto Jenni for drawing the Figures, to Robert Alazard for Mu DNA. This work was supported by grants 31-9073.87 and 31-9438.88 from the Swiss National Science Foundation.