Cell, Vol. 89, 555-564, May 16, 1997, Copyright © 1997 by Cell Press

Crystal Structure of PI-SceI, a Homing Endonuclease with Protein Splicing Activity

Xiaoqun Duan1, Frederick S. Gimble3, and Florante A. Quiocho1,2
1 Structural and Computational Biology, Molecular Biophysics Program
2 Howard Hughes Medical Institute, Department of Biochemistry, Baylor College of Medicine, Houston, Texas 77030
3 Center for Macromolecular Design, Institute of Biosciences and Technology, and Department of Biochemistry and Biophysics, Texas A & M University, Houston, Texas 77030

Corresponding author: Florante A. Quiocho, 713 798 6565 (phone), 713 798 8516 (fax), faq@dino.bcm.tmc.edu.

Summary

PI-SceI is a bifunctional yeast protein that propagates its mobile gene by catalyzing protein splicing and site-specific DNA double-strand cleavage. Here, we report the 2.4 Å crystal structure of the PI-SceI protein. The structure is composed of two separate domains (I and II) with novel folds and different functions. Domain I, which is elongated and formed largely from seven β sheets, harbors the N- and C-terminal residues and two His residues that are implicated in protein splicing. Domain II, which is compact and is primarily composed of two similar α/β motifs related by local two-fold symmetry, contains the putative nuclease active site with a cluster of two acidic residues and one basic residue commonly found in restriction endonucleases. This report presents prototypic structures of domains with single endonuclease and protein splicing active sites.

Introduction

PI-SceI is a 454 amino acid (Mr 50 k) bifunctional protein encoded by a mobile selfish DNA at the VMA1 locus of S. cerevisiae (Gimble and Thorner, 1992 ). It occurs as an internal protein segment (termed the intein) situated between the N- and C-terminal segments (termed exteins) of an ATPase host protein, and initiates an intein homing process that results in the transfer of its DNA coding sequence to a recipient VMA1 locus that lacks the intein gene (Gimble and Thorner, 1992 ). First, it autocatalyzes its excision from the middle of a 119 kDa protein precursor by protein splicing and ligates the two flanking exteins to generate the 69 kDa vacuolar ATPase subunit (Hirata et al., 1990 ; Kane et al., 1990 ; reviewed in Cooper and Stevens, 1995 ). Second, the excised PI-SceI functions as a site-specific DNA endonuclease that makes a double-strand cut in the recipient VMA1 locus. Repair of this break during meiosis by yeast repair functions propagates the intein gene (Gimble and Thorner, 1992 ).
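The masses quoted above can be cross-checked with simple arithmetic. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da, a value that is not given in the text; the variable names are illustrative only.

```python
# Rough mass bookkeeping for the VMA1 precursor, intein, and spliced product.
# ASSUMPTION: ~110 Da per amino acid residue (a common rule of thumb, not from the text).
AVG_RESIDUE_MASS_DA = 110.0

intein_residues = 454      # length of PI-SceI stated in the text
precursor_kda = 119.0      # unspliced protein precursor
spliced_atpase_kda = 69.0  # ligated exteins (vacuolar ATPase subunit)

# Estimate of the intein mass from its length alone.
intein_kda_from_length = intein_residues * AVG_RESIDUE_MASS_DA / 1000.0

# Mass released by splicing, from the precursor and product masses.
intein_kda_from_balance = precursor_kda - spliced_atpase_kda

print(f"from length:  ~{intein_kda_from_length:.1f} kDa")
print(f"from balance: ~{intein_kda_from_balance:.1f} kDa")
```

Both estimates come out near 50 kDa, in line with the stated Mr of 50 k.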

Based on conserved sequence motifs or blocks, PI-SceI is related to two different groups of proteins found in organisms from each phylogenetic kingdom (for a review, see Mueller et al., 1993). Motifs within the central region (Blocks C and E [Pietrokovski, 1994]; LAGLI-DADG, or dodecapeptide sequence motifs) occur in proteins that process nucleic acids and have diverse biological functions. Some are endonucleases similar to PI-SceI that are part of homing inteins. Other endonucleases are encoded by open reading frames (ORFs) in either Group I or archaeal introns and initiate intron homing events analogous to intein homing. Another group is encoded by free-standing genes that are independent of introns or inteins. Interestingly, the dodecapeptide motifs also occur in mitochondrial RNA maturases situated in Group I introns that are required for efficient intron splicing. A second set of motifs at the N- and C-terminal regions of PI-SceI (Blocks A, B, F, and G; Pietrokovski, 1994) occurs in proteins that are known or hypothesized to catalyze protein splicing.

Protein splicing of the yeast VMA1 intein and others requires catalytic elements present in the intein and the first residue of the C-terminal extein (Cooper and Stevens, 1995 ). Mutagenesis studies have identified critical residues at the intein-extein borders (Hirata and Anraku, 1992 ; Cooper et al., 1993 ; Chong et al., 1996 ) and have provided strong evidence for a common protein splicing pathway (see Chong et al., 1996 for reaction pathway). First, the peptide bond at the N-terminal intein-extein junction is replaced with a thioester linkage through an N-to-S acyl rearrangement catalyzed by Cys-1 of the intein. Second, transesterification occurs as a result of nucleophilic attack of the N-terminal junction by Cys-455, the first residue of the C-terminal extein. This reaction yields a branched intermediate with two N termini derived from the intein and N-terminal extein. Third, aminosuccinimide formation by Asn-454, the C-terminal residue of the intein, is coupled to cleavage of the intein from the branched intermediate. Finally, conversion of the extein-extein thioester linkage into a peptide bond occurs by an S-to-N acyl shift.

To initiate intein homing, PI-SceI cuts the yeast genome exclusively at the VMA1 locus (Bremer et al., 1992). Site-specific cleavage occurs within a 31 bp asymmetrical recognition site to generate a 4 bp overhang with a 5' phosphate and requires Mg2+ as a cofactor (Gimble and Wang, 1996). PI-SceI exists as a monomer in solution (Gimble and Thorner, 1993) and also in contact with DNA (Wende et al., 1996). In the absence of Mg2+, the protein forms stable intermediates where it is bound solely to a 17 bp sequence adjacent to the cleavage site (minimal binding region) or to the entire substrate (Gimble and Wang, 1996; Wende et al., 1996). PI-SceI makes extensive major groove contacts in the cleavage site region, whereas it uses primarily phosphate backbone contacts within the minimal binding region (Gimble and Wang, 1996). Interestingly, PI-SceI significantly distorts the DNA (~60°-75°) when bound to the entire recognition site (Gimble and Wang, 1996; Wende et al., 1996). These findings can be understood in terms of a model where PI-SceI binds first to the minimal binding region, then contacts the cleavage site and stabilizes a DNA distortion that positions the labile phosphodiester bonds into the enzyme active site.

Here, we report the 2.4 Å X-ray crystal structure of the PI-SceI protein. The data clearly reveal the structural and functional duality of the enzyme. Amino acid residues comprising the nucleolytic active site, identified by a cluster of charged residues that are conserved in the protein family, reside in one domain, while those that participate in protein splicing are located in the other. Preliminary docking of a DNA model revealed new features of molecular recognition. Furthermore, examination of the structure immediately suggests an evolutionary model that explains the association of the two disparate activities.

Results and Discussion

Structure and Novel Domain Motifs
The three-dimensional structure of PI-SceI, the first for a homing endonuclease and a protein generated by protein splicing, was determined by multiwavelength anomalous dispersion (MAD) and has been refined at 2.4 Å resolution to an R factor and an Rfree of 19.2% and 23.8%, respectively (Table 1). A portion of the electron density is shown in Figure 1. The two independent molecules within the asymmetric unit are very similar and can be overlapped with an rms deviation of about 1 Å between α carbons. The structure is composed of two separate domains (I and II) connected by two peptide segments (Figure 2 and Figure 3). Both domains possess not only unusual folds but also different functional sites located on the opposite sides of the molecule. Domain I, comprising the first 182 and the last 44 residues, is an unusual elongated domain (about 3:1 axial ratio) composed almost entirely of β sheets (19 strands). The intervening continuous segment comprising residues 183-410 adopts an almost equal mixture of helices (7) and strands (9) forming the compact and globular domain II. A search (Holm and Sander, 1995) for protein or domain structures that are homologous with either domain in PI-SceI proved negative.
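For reference, the agreement statistics quoted above follow the conventional crystallographic definition, given here in its standard textbook form rather than as a formula taken from Table 1. The symbols F_o and F_c denote observed and calculated structure factor amplitudes, and the sums run over the reflections used.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{o}(hkl)| - |F_{c}(hkl)| \,\bigr|}{\sum_{hkl} |F_{o}(hkl)|}
```

Rfree is the same quantity evaluated only over a test set of reflections that was excluded from refinement, which is why it is normally somewhat higher than R.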


Figure 1. Stereo View of the Final 2.4 Å 2Fo-Fc Electron Density Map in the Region of the Putative Nuclease Active Site Containing Asp-218, Asp-326, and Lys-301

The map was contoured at 1σ.



Figure 2. Stereo Ribbon Drawing of the α Carbon Backbone Trace of PI-SceI

Domain I is shown at the bottom and domain II at the top. α helices and β strands are green and blue, respectively. All nine helices and selected strands spaced at nearly equal intervals are labeled based on the identifications shown in Figure 3. Amino and carboxy termini, which are part of the self-splicing site, are labeled (N) and (C), respectively (see Figure 4A for details). Labeled in red are the positions of the three residues in domain II (Asp-218 and Asp-326 at the C termini of α4 and α7, respectively, and Lys-301 in a loop) that form a charged cluster in the endonuclease active site (see Figure 4B for details). Relative to the figure, the splicing and endonucleolytic active sites are located in the back and front, respectively. The local two-fold symmetry axis between the similar secondary structure motifs in domain II is located between the two vertical helices (α4 and α7) at the center of the domain and perpendicular to the plane of the figure. This figure and Figure 4A and Figure 4B were drawn using RIBBON (Carson, 1991).



Figure 3. Topology Diagram of PI-SceI Structure

The circles (green) and triangles (red) represent α helices and β strands, respectively. The bold numbers identify the ten different β sheets. The two secondary structure motifs (α4-β14α5β15β16α6 and α7β19β20α8β21β22α9) related by local two-fold symmetry in domain II are enclosed in the rectangular box with dashed lines. Although the segment between α4 and β14, which forms a single hydrogen bond with β14, is technically not a β strand, it is topologically identical to β19 of the second motif (see also Figure 2). The closer proximity between α7 and β19 is meant to convey a one residue connection (Gly-327) between the two secondary structures.

Table 1. Summary of Crystallographic Analysis


The 19 strands in domain I are incorporated into a total of seven closely packed β sheets approximately divided into two dissimilar underlying substructures (Figure 2 and Figure 3). The elongated structure of domain I is primarily due to side-by-side arrangement of sheets 2, 10, 5, and 6, which constitute one substructure, with sheet 6 providing a long extension. The other shorter substructure, comprised of sheets 1, 3, and 4, forms a β sandwich with sheets 2, 10, and 5 (Figure 2). As domain I harbors both terminal residues (Cys-1 and Asn-454) (Figure 2 and Figure 4A), which are conserved between various inteins and are essential for protein splicing (Pietrokovski, 1994), it is associated with the self-splicing machinery. Consistent with its autocatalytic self-splicing function, both terminal residues are in close proximity and are lodged in a cavity surrounded by parts of strands 1, 3, 5, 7, 8, 25, 26, 27, and 28 (Figure 2 and Figure 4A).


Figure 4. Stereo Views of the Functional Sites in PI-SceI

(A) The protein self-splicing site containing the essential N-terminal Cys-1 and C-terminal Asn-454 residues viewed from the back of domain I, as shown in Figure 2. His-79 and His-453, positioned proximal to the terminal residues, are extremely conserved between self-splicing proteins and could perform as general acids/bases in the autocatalytic splicing reaction. Three of the β strands that are identified are further linked to subsequent strands in the site shown (e.g., β3 to β6, β7 to β8, and β24 to β28).

(B) The endonuclease active site containing the charged cluster of three residues (see text). The orientations of the two symmetry-related α4 and α7 dodecapeptide repeats are similar to those shown in Figure 2.


The compact domain II is mainly built up from two substructures, each with very similar secondary structure motifs (α4-β14α5β15β16α6 and α7β19β20α8β21β22α9) (Figure 2 and Figure 3). Moreover, the two motifs are related by local two-fold symmetry about an axis between the vertical parallel α4 and α7 with a relative twist of about 35° (Figure 2). Helices 4 and 7 contain the two dodecapeptide sequences that are distinguishing characteristics of homing endonucleases and maturases (Michel et al., 1982; Waring et al., 1982; Hensgens et al., 1983). The two vertical parallel helices are surrounded by two nearly horizontal pairs of symmetry-related helices (α5 and α6 in one motif related to α8 and α9, respectively, in the other). The approximate symmetry is also clearly apparent between the two β sheets in both motifs that flank the C-terminal ends of the two parallel helices and that form a concave twisted canopy above the symmetry-related helices (Figure 2). (Although the segment following α4 in one sheet is not quite a sheet strand, it is topologically equivalent to β19 of the other sheet [Figure 2 and Figure 3].) The similarity of the two motifs is evident from superpositioning of the α carbon atoms of the 63 pairs of residues in the overlapped secondary structures of both motifs, which show an rms deviation of 1.7 Å. The resulting overlapped sequences reveal 22% identity and 46% similarity.
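The two similarity measures quoted above, rms deviation over paired α carbons and percent sequence identity over the aligned positions, can be illustrated with a minimal sketch. It assumes the two coordinate sets have already been superposed and the sequences already aligned without gaps; the function names and toy data are hypothetical and are not taken from the PI-SceI coordinates.

```python
import math

def rmsd(coords_a, coords_b):
    """RMS deviation between paired (x, y, z) points that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying the same residue (no gaps allowed)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

# Toy example: three paired points, each displaced by exactly 1 Å along y.
motif_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
motif_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(motif_1, motif_2))            # 1.0
print(percent_identity("LAGL", "LAGV"))  # 75.0
```

On this definition, 22% identity over 63 aligned pairs corresponds to roughly 14 identical positions.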

Endonuclease Active Site
The nuclease active site apparently resides in domain II, at the C-terminal ends of the parallel helices 4 and 7. This location is consistent with several pieces of evidence. Two Asp residues (Asp-218 and Asp-326) that are located at the C termini of the parallel helices and Lys-301, found in a loop immediately after β18, form a charged cluster (Figure 2 and Figure 4B) that bears similarity with those commonly seen in the catalytic sites of homodimeric restriction endonucleases with previously determined three-dimensional structures (EcoRI, PvuII, EcoRV, and BamHI; for a review, see Aggarwal, 1995). Comparison of the PI-SceI Asp-218, Asp-326, Lys-301 triad with the identical residues in the active site of the EcoRV restriction endonuclease indicates an rms deviation of about 2 Å between α carbons. The Asp-218 and Asp-326 residues in PI-SceI occur at positions within the dodecapeptide motifs that are extremely conserved as acidic residues among the related homing endonucleases and maturases (Mueller et al., 1993; Pietrokovski, 1994). Conservation of the dodecapeptide motifs among the protein family may be a consequence of the fact that they comprise the two α helices that correctly position the active site residues. Lys-301 is a conserved residue within a separate motif found in inteins (Block D; Pietrokovski, 1994) and several maturases. Finally, substitutions of the two Asp residues by site-directed mutagenesis abolish DNA cleavage but not binding (Gimble and Stephens, 1995), and a Lys-301-to-Ala substitution leads to loss of catalytic activity (data not shown).

In spite of profound differences in the overall structures of the restriction endonucleases and PI-SceI (discussed below), the similar active site arrangement suggests an analogous hydrolytic mechanism. Asp-218 and Asp-326 of PI-SceI appear to be structurally equivalent to acidic residues in EcoRI, EcoRV, BamHI, and PvuII (Aggarwal, 1995 ) and presumably comprise part of the Mg2+ binding site. Although Mg2+ was present during crystal growth, its location has not been clearly established. There is a density between the two Asp carboxylate sidechains that, at present, is assigned to a water molecule (Figure 1). The exact role of the Mg2+ ion in the PI-SceI reaction pathway is unclear, but in the case of the restriction enzymes, its proposed function is to polarize the phosphate that is attacked. The close proximity of the two Asp residues implicated in Mg2+ binding likely requires that helices 4 and 7 mesh and remain in close contact. For EcoRV, it has been proposed that a second Mg2+ ion that is bound by a third acidic residue functions to activate the attacking water molecule (Baldwin et al., 1995 ; Vipond et al., 1995 ). However, there is no obvious equivalent acidic residue in PI-SceI structure or in the conserved blocks of the other homing endonucleases, so a two metal ion mechanism is unlikely for this enzyme family. The lysine residues found in the catalytic triads are thought to stabilize the doubly charged pentavalent transition state (Kostrewa and Winkler, 1995 ), and a similar role may be played by Lys-301.

Because PI-SceI is a monomer in solution (Gimble and Thorner, 1993) and even in the presence of DNA (Wende et al., 1996; F. S. G., unpublished data), there has been some speculation as to whether the enzyme contains one catalytic site that cuts both strands or two catalytic sites, each of which cleaves one strand. The structural and mutational data (see above) are fully consistent with there being one active site that lies close to the approximate center of symmetry in domain II. (Although the structure determined is that of two molecules in the asymmetric unit, related by local two-fold symmetry, the active sites of the two proteins are separated by about 40 Å.) We are unable to find another similar three charged residue cluster that: (1) would be at an appropriate distance (about 13 Å) to the first cluster for DNA cleavage to leave a 4 bp overhang, and (2) would be conserved among homing endonucleases (Pietrokovski, 1994). The data, taken together, do not support the suggestion of two active site domains, each containing a single dodecapeptide motif (Henke et al., 1995; Lykke-Andersen et al., 1996; Wende et al., 1996). General features of DNA cleavage by the Type IIs FokI endonuclease resemble those of PI-SceI, even though the sequences of the enzymes and the specific protein-DNA interactions are very different. Like PI-SceI, FokI recognizes an asymmetric binding site, binds DNA as a monomer (Skowron et al., 1993), and possesses a single active site (Waugh and Sauer, 1993). How a single catalytic center in FokI or in PI-SceI effects sequential scission of both DNA strands is unclear, but movement of either the protein or the DNA following first strand cleavage may be involved.

Potential DNA-Binding Sites and DNA Docking
The nature of the areas around the putative active site in domain II indicates potential sites for DNA binding. The obvious areas include the exposed surfaces of the two symmetry-related β sheets flanking the two Asp active site residues and the β-hairpin loops (between β15 and β16 and between β21 and β22) above the sheets (Figure 2). Loops are very often seen in structures of DNA-binding proteins involved in interacting with either the major or minor groove of DNA, including those of restriction endonucleases (Aggarwal, 1995). Sheets have also been observed as docking sites for DNA (e.g., in PvuII restriction endonuclease, MetJ and Arc repressor proteins; Somers and Phillips, 1992; Cheng et al., 1994; Raumann et al., 1994). Evidence for the involvement of loops and β sheets in domain II in DNA binding comes from proteolytic protein footprinting studies of I-PorI and I-DmoI, two archaeal homing endonucleases related to PI-SceI (Lykke-Andersen et al., 1996). In both endonucleases, the footprinting studies identified four sites for binding DNA that follow immediately and approximately 40-60 residues after each dodecapeptide repeat. Although the interpretation of this result as indicating the presence of two separate DNA-binding domains is incorrect (see above), the four sites map on our PI-SceI three-dimensional structure on exposed parts of the areas identified above. For example, in I-PorI, the two sites after the first dodecapeptide repeat coincide with the C terminus of the nonsheet segment immediately following the first dodecapeptide repeat helix (α4) and β14 and with the loop preceding β16 that is above the sheet (Figure 2 and Figure 3). The two sites after the second dodecapeptide repeat align with the loop between β19 and β20 and with the large loop between β21 and β22 above the sheet (Figure 2 and Figure 3). Furthermore, mutations located within the large loops and β sheets interfere with substrate binding (F. S. G., unpublished data).

To help comprehend the interaction of PI-SceI with its lengthy recognition site (31 bp or longer; Gimble and Wang, 1996 ; Wende et al., 1996 ), we carried out a preliminary docking of a DNA onto the structure using the program GRASP (Nicholls et al., 1993 ), which identifies surface contours and electrostatic charge potentials. In docking of a 30 bp B-form model (Figure 5), we were guided by five criteria. First, the scissile bonds of the DNA were placed as close as possible to the pair of Asp residues in the putative active site. Second, the two symmetry-related sheets, along with their loops, served as a platform for docking DNA; sheets 7 and 9 for base pairs to the left (or minus; see Gimble and Wang, 1996 ) and right (or plus) sequences, respectively, of the center of the cleavage site. This docking arrangement of the DNA takes into account the experimental observation that the left sequence requires fewer base pairs than the right for DNA binding and cleavage (Gimble and Stephens, 1995 ; Gimble and Wang, 1996 ). Moreover, the right sequence alone is sufficient for high affinity binding to PI-SceI (Gimble and Stephens, 1995 ; Gimble and Wang, 1996 ; Wende et al., 1996 ). The DNA-binding surface of sheet 7, which extends to the edge of the protein, is more limited than that of sheet 9, which extends toward the middle of the protein structure. Third, the four DNA-binding sites, two following each of the dodecapeptide repeat helices, which contain positively charged residues, are close to the DNA. Fourth, a bend of about 55° is directed toward the major groove at about +7 bp to the right of the center of the cleavage site, which lies close to the junction between the two domains. This is the approximate location of the bend detected experimentally (Gimble and Wang, 1996 ; Wende et al., 1996 ). This bend is necessitated by the angular orientation between domain I, especially the sheet 6 extension, and domain II (Figure 2 and Figure 5). 
Fifth, the DNA was docked as closely as possible to clusters of exposed, intensely positive, charged residues (Figure 5). In domain II, these clusters are located in the four binding sites indicated above. A heavy concentration of positive surfaces is also found in an area at the interface between the two domains and in the extended region of domain I. In support, the DNA makes numerous phosphate backbone contacts in the region that is thought to bind to the interdomain surface (Gimble and Wang, 1996 ).


Figure 5. An Approximate Docking Model of a B DNA to PI-SceI

Protein surface charge distribution was calculated and displayed by the program GRASP; potentials less than -10 kT, neutral, and greater than 10 kT are displayed in red, white, and blue, respectively. The orientation of the molecule (domain I to the right and domain II to the left) is related to that shown in Figure 2 by a counterclockwise rotation of about 90°. The negative surface at the active site is contributed by Asp-218 and Asp-326. Lys-301 lies in the positive surface slightly below and to the right of this negative surface. The DNA is oriented so that the top strand (5' to 3') starts from left and the center of the cleavage site on the active site. The discontinuity in the backbone representation is caused by the introduction of a bend of about 55° in the modeling. See text for further details.


In docking the DNA, it became apparent that the size or diameter of globular domain II is insufficient to accommodate the entire 30 bp DNA model, even in the presence of the bend. DNA binding would require the participation of domain I. The docking model indicates that domain II can recognize about 14 bp (from about -8 bp to +6 bp of the cleavage site). The additional 16 or more bp on the plus, or right, sequence of the cleavage site extends to the arm of domain I, which contains a high concentration of clusters of intense positive charge (Figure 5). The limited base sequence potentially recognized by domain II is consistent with the observation that endonucleases related only to this domain recognize much shorter DNA substrates (~14-20 bp). In addition, the sharp bend in the DNA, which has been experimentally observed in complexes of PI-SceI with DNA, may very well be due to the presence of the elongated domain I which, as the structure indicates, adopts a roughly equivalent bend relative to domain II.
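The base-pair bookkeeping behind this argument can be laid out explicitly. The sketch below takes the 30 bp model and the approximate -8 to +6 bp footprint from the text, and assumes a canonical B-form rise of about 3.4 Å per base pair, which is not stated in the text, to give a feel for the lengths involved. The helper name is hypothetical, and the lengths ignore the modeled bend.

```python
# ASSUMPTION: canonical B-DNA rise per base pair; the paper does not quote this value.
RISE_PER_BP = 3.4  # Å

def span_bp(first, last):
    """Base pairs covered between two positions relative to the cleavage-site center."""
    return last - first

model_bp = 30                          # length of the docked B-form model
domain_ii_bp = span_bp(-8, 6)          # stretch attributed to domain II
domain_i_bp = model_bp - domain_ii_bp  # remainder reaching the arm of domain I

for label, n_bp in [("domain II", domain_ii_bp),
                    ("domain I arm", domain_i_bp),
                    ("whole model", model_bp)]:
    # Straight-helix length; a real complex would be shortened by the bend.
    print(f"{label}: {n_bp} bp, ~{n_bp * RISE_PER_BP:.0f} Å along a straight helix")
```

Under these assumptions the 14 bp attributed to domain II and the remaining 16 bp add up to the 30 bp model, and the full duplex is on the order of 100 Å long, which is why a single globular domain cannot cover it.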

The Protein Splicing Catalytic Site
The structure of the PI-SceI intein represents the excised end product of protein splicing. The positions of the key junction amino acid residues (Figure 4A) identified by mutation are entirely consistent with their proposed roles in the reaction pathway of self-splicing (Hirata and Anraku, 1992 ; Cooper et al., 1993 ; Chong et al., 1996 ). Our structural analysis has also revealed the presence of two His residues that occur in domain I, close to both terminal residues (Figure 4A): His-79, which is invariant among inteins (Perler et al., 1997 ), and His-453, which is conserved but not essential for splicing (Cooper et al., 1993 ; Chong et al., 1996 ). Due to their nearly neutral pKa's, these residues could act as general acids or bases, functions required in a majority of the proposed steps in splicing mechanisms (discussed below). At the former N-terminal and C-terminal intein-extein junctions, Cys-1 and Asn-454 lie in close proximity (2.9 Å between SG and OD2), which would be consistent with Cys-455 acting as a nucleophile to cleave the thioester at Cys-1 and forming the branched intermediate. In addition, His-79 is closely situated to Cys-1 and may act as a proton donor/acceptor to facilitate the N-to-S acyl shift and transesterification reactions (Pietrokovski, 1994 ). Indeed, the imidazole side chain of His-79 is closely situated to Cys-1, which undergoes the N-to-S acyl shift. The distance between His-79 and Cys-455, the putative nucleophile that initiates transesterification, is unknown, since that residue is absent from the protein. Cleavage of the peptide bond between Asn-454 and Cys-455 is coupled to the cyclization of Asn-454 that yields a C-terminal amino succinimide (Chong et al., 1996 ). In the PI-SceI structure, this position contains Asn because the protein was generated by recombinant methods rather than by protein splicing. 
Mutagenesis studies have suggested that His-453 assists in the cyclization of Asn-454 (Cooper et al., 1993 ; Chong et al., 1996 ). In support of this idea, the PI-SceI crystal structure shows that the imidazole side chain of His-453 lies very close to Asn-454 (Figure 4A). Whether the structure of the excised PI-SceI intein resembles that of the extein-intein precursor is unknown, as conformational changes may occur during the splicing process.

Evolutionary Implications of the Structure
The bipartite domain structure of PI-SceI is likely paralleled by a separation of the protein splicing and endonucleolytic cleavage activities. In the case of the related PI-TliI intein, it has been demonstrated that mutations that abolish one activity have little or no effect on the other (Hodges et al., 1992 ). Moreover, substantial evidence suggests that the two domains and activities evolved independently. Proteins related to domain II are not always associated with protein splicing inteins and perform a variety of biological functions in different contexts (see Introduction). Similarly, there are two examples of inteins, from the Mycobacterium xenopi gyrA and Porphyra purpurea dnaB genes (Perler et al., 1997 ), that are only related to domain I.

Based on these observations, we hypothesize that the VMA1 intein is encoded by a composite gene that resulted from the invasion of an endonuclease ORF into a preexisting gene that encoded a protein with protein splicing activity or that later evolved this activity. The endonuclease ORF is likely to have been the mobilizing entity rather than the protein splicing ORF, because endonucleolytic activity is required for intein and intron homing. Furthermore, the fact that the endonuclease ORF is embedded in the middle of the protein splicing ORF provides additional circumstantial evidence that it was the invading entity. Once these genes were fused, we speculate that the entire endonuclease-splicing ORF functioned as a mobile element that inserted into the VMA1 locus. The symbiotic association of the endonuclease ORF with the splicing ORF benefits both entities. The endonuclease ORF is associated with a gene that encodes a polypeptide that safely removes itself and the endonuclease from the vacuolar H+-ATPase host protein and prevents any deleterious effects to the host. By allying itself with an endonuclease ORF, the splicing gene is assured of mobility within the same species and eventually to new species as well, perhaps by horizontal transmission. This scenario is analogous to proposed models that explain the association of Group I introns and endonuclease ORFs (Belfort, 1989 ; Lambowitz, 1989 ). There, the intron provides a means of removal to the endonuclease ORF by RNA splicing rather than protein splicing, and the endonuclease ORF allows the intron to mobilize to new locations in the genome. Evidence for the original invasion of the intron by the endonuclease ORF comes from phage T4, where it has been shown that the sunY intron-encoded I-TevII endonuclease can cleave a synthetic intron that lacks the endonuclease ORF (Loizos et al., 1994 ). The fused intron-endonuclease ORF could then move to new genomic locations. 
By analogy, if the PI-SceI endonuclease ORF invaded a protein splicing gene, a PI-SceI target site would be predicted to occur at the border between domains I and II. However, none is observed, perhaps due to the further evolution of the intein.

Comparison of Endonuclease Structures
With the exception of the similarity with the catalytic charged residue triad, PI-SceI endonuclease is very different from EcoRI, PvuII, EcoRV, and BamHI restriction endonucleases, which together do not show common features. The Type II restriction enzymes function as homodimers, with each subunit interacting with a separate half-site of a palindromic sequence, whereas PI-SceI is a monomeric protein that contacts an extended asymmetric site. This points to a difference in strategy; the use of a palindromic target limits the size of the recognition site and the selectivity, but the dimeric composition of restriction enzymes permits a smaller subunit size (Schleif, 1988). The PI-SceI monomer, on the other hand, is approximately twice as long as the restriction endonucleases but recognizes a significantly longer target. Consequently, the 6 bp binding sites of the four restriction enzymes reside in a U-shaped cleft formed between the two monomers (Aggarwal, 1995), whereas the deduced DNA-binding site of PI-SceI covers a much wider area of the protein surface (Figure 5). Although the three charged residues in the active site of the four restriction endonucleases are similar to those of PI-SceI, the locations of these residues differ between the restriction endonucleases and PI-SceI. They are clustered at a β-hairpin turn in the restriction endonucleases, whereas those of PI-SceI are clustered at the C termini of a pair of helices. Moreover, the charged active site residues in the four endonucleases are close to each other in the sequences, whereas those in PI-SceI are widely separated. Despite these differences, the cleavage mechanism of these endonucleases will likely prove to be very similar. However, understanding how the two asymmetric DNA strands are cleaved by one active site in homing endonucleases is a very interesting challenge.

The three-dimensional structure reported here paves the way for investigations that will elucidate the molecular recognition functions and catalytic activities of PI-SceI. Further experiments (e.g., site-directed mutagenesis) suggested by the structure will enable us to test several predictions that result from the detail that it provides.

Experimental Procedures

Protein Purifications and Crystallization
The procedures for overexpression and purification of recombinant wild-type PI-SceI have been described previously (Gimble and Stephens, 1995). Selenomethionyl (Se-Met) PI-SceI was overexpressed in E. coli DL41 (DE3), a methionine auxotroph strain (Hendrickson et al., 1990). A starter culture of E. coli strain DL41 (DE3) (gift of W. A. Hendrickson) containing the expression plasmid pT7PI-SceI ESARC (Gimble and Stephens, 1995) was grown overnight at 37°C in Luria broth with 100 µg/ml ampicillin. This culture was diluted 1/100 (v/v) into a defined medium (Hendrickson et al., 1990) lacking methionine but containing 50 µg/ml D,L-selenomethionine (Sigma) and grown overnight at 32°C. This culture was then diluted 1/200 (v/v) into the same medium and grown to an OD600 of 0.6 at 32°C, at which time isopropyl-1-thio-β-D-galactopyranoside was added to 0.5 mM. Growth was continued at 32°C overnight, and the protein was purified as previously described for the wild-type protein (Gimble and Thorner, 1993). The endonuclease activity of the Se-Met PI-SceI is indistinguishable from that of the wild-type protein.

Both wild-type and Se-Met PI-SceI were crystallized at room temperature using the hanging drop vapor diffusion method over a reservoir solution containing 4% PEG 6K, 10 mM β-mercaptoethanol (βME), 3 mM CdCl2, 1 mM MgCl2, and 100 mM Tris (pH 8.5). The 2 µl drops of the protein (8 mg/ml in 5 mM βME and 20 mM Tris [pH 8.0]) were mixed with an equal volume of the reservoir solution. Relative to the wild-type protein, Se-Met PI-SceI crystals were bigger, easier to reproduce, and diffracted to higher resolution. The crystals belong to space group P2₁ with unit cell parameters a = 59.6 Å, b = 102.9 Å, c = 87.4 Å, and β = 94.3° for the wild-type protein crystal measured with a laboratory area detector, and a = 59.8 Å, b = 102.4 Å, c = 87.1 Å, and β = 94.1° for the Se-Met protein measured using synchrotron data. There are two molecules per asymmetric unit, and the solvent content is about 53%. Crystals for data collection were stabilized in the mother liquor, which contained 20% glycerol, and flash-cooled to -170°C in liquid nitrogen.
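
The quoted solvent content can be checked against the unit cell with a standard Matthews-coefficient estimate. The sketch below is only an illustration: it assumes a molecular mass of 50 kDa (the approximate Mr given in the Introduction), the conventional factor of 1.23 for protein partial specific volume, and function names chosen here for clarity. It is not necessarily how the original estimate was made.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, molecules_per_cell, mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (molecules_per_cell * mass_da)


def solvent_fraction(vm, protein_factor=1.23):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - protein_factor / vm


# Se-Met crystal cell parameters as reported in the text.
volume = monoclinic_cell_volume(59.8, 102.4, 87.1, 94.1)

# P2(1) has 2 asymmetric units per cell; 2 molecules per asymmetric unit.
molecules_per_cell = 2 * 2

# Assumption: ~50 kDa per molecule (approximate Mr from the Introduction).
vm = matthews_coefficient(volume, molecules_per_cell, 50_000.0)
solvent = solvent_fraction(vm)

print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {solvent:.1%}")
```

With these inputs the estimate is a little under 54%, in line with the reported value of about 53%. The small difference depends on the exact molecular mass used.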

Data Collection
MAD data were collected from one frozen crystal at the HHMI X4A beam line of the National Synchrotron Light Source (NSLS). The optimal wavelengths for Se data collection at NSLS were determined by measuring a fluorescence scan with a scintillation counter, using the same frozen PI-SceI crystal from which the data were taken. The oscillation data were then collected at the absorption edge (0.9794 Å), the peak (0.9790 Å), and a remote wavelength (0.9656 Å) using inverse beam geometry with an oscillation angle of 1.2°. The data were indexed and integrated using the program DENZO and scaled within each wavelength using SCALEPACK (Otwinowski and Minor, 1997) without merging (Table 1). For the refinement, the data of the remote wavelength were processed to the limit of diffraction (2.4 Å) and merged, ignoring the anomalous difference.
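
To relate the three wavelengths to the selenium K absorption edge, they can be converted to photon energies with E (keV) = 12.39842 / wavelength (Å). The short sketch below is illustrative only. The constant is the standard value of hc in keV·Å, and the variable names are chosen here rather than taken from the text.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV * angstrom


def wavelength_to_energy_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in angstroms to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths reported for the MAD experiment.
wavelengths = {"edge": 0.9794, "peak": 0.9790, "remote": 0.9656}
energies = {name: wavelength_to_energy_kev(w) for name, w in wavelengths.items()}

for name, energy in energies.items():
    print(f"{name:>6}: {wavelengths[name]:.4f} A = {energy:.4f} keV")

# Energy offsets from the edge wavelength, in eV.
peak_offset_ev = (energies["peak"] - energies["edge"]) * 1000.0
remote_offset_ev = (energies["remote"] - energies["edge"]) * 1000.0
print(f"peak - edge = {peak_offset_ev:.1f} eV, remote - edge = {remote_offset_ev:.0f} eV")
```

The edge wavelength corresponds to about 12.66 keV, close to the tabulated Se K edge. The peak lies only a few eV above it, and the remote wavelength is roughly 180 eV higher in energy.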

Structure Determination
MAD phasing using reflections from 10 to 2.6 Å was done using the MADSYS suite of programs (Hendrickson, 1991). Scaled and unmerged data were used in phasing and were subjected to local scaling to reduce noise in the Bijvoet signal and in the dispersive signal among the three wavelengths. Scaled data were fitted in MADLSQ to approximate the published f' and f'' values.

As there are 8 Met residues in wild-type PI-SceI, we expected a total of 16 selenium sites in the two Se-Met PI-SceI molecules within the asymmetric unit. The positions of 8 initial selenium atoms were determined from the anomalous difference Patterson map of the peak wavelength with SHELXS (Sheldrick, 1991) and confirmed by the dispersive difference between the edge and remote wavelengths. Six other selenium sites were located in difference Fourier maps. The two remaining sites could not be determined; after completion of the structure refinement, we noticed that the missing site in each molecule likely corresponds to Met-372, which is located in a segment (residues 369-375) with little or no density in either molecule. This is presumably because the residues in this segment are highly exposed to solvent and adopt several spatially distinct conformations. All the other selenium sites were confirmed during fitting of the sequence.

The 14 selenium sites were refined by ASLSQ and input into MADFAZ to obtain phases at 2.6 Å resolution (Table 1). The resulting 2.6 Å electron density map showed a clear solvent boundary and several secondary structures and confirmed the presence of two molecules in the asymmetric unit. Density modification and two-fold noncrystallographic averaging using the SOLV and AVER options, respectively, of "dm" (Cowtan, 1994) in the CCP4 suite of programs greatly improved the quality of the map.

Model Building and Refinement
The skeleton generated by BONE (Jones et al., 1991) immediately revealed several long chains of secondary structures. Chain tracing and model building were carried out using the programs O (Jones et al., 1991) and CHAIN (Sack, 1988). Identification and initial fitting of segments of the amino acid sequence were facilitated by the locations of the selenomethionines. At several stages in fitting, improved maps were obtained using the experimental phases combined with the phases calculated from the partial structure by the program SIGMAA (Read, 1986) in the CCP4 package. For the protein refinement, the 2.4 Å data of the remote wavelength were used. The model was subjected to several cycles of molecular dynamics and restrained refinement with X-PLOR (Brünger, 1992) and manual rebuilding. The final refinement statistics are shown in Table 1. The following segments of residues, all located in loops at the protein surface, have no convincing density in either molecule in the asymmetric unit and are therefore missing from the structure: residues 93-102 between β9 and β10, 271-279 between β16 and α6, and 369-374 between β21 and β22 (Figure 2 and Figure 3). The coordinates have been deposited in the Protein Data Bank (ID code 1VDE).

Structure Analysis
The correctness of the final model was verified by examining its stereochemistry using the program PROCHECK (Laskowski et al., 1993) and its 3D-1D profile (Luthy et al., 1992). A Ramachandran plot showed 84% of the residues in the most favored regions for both structures and none in the disallowed regions. The assignment of the elements of the secondary structure was performed using the DSSP algorithm (Kabsch and Sander, 1983) as implemented in PROCHECK. Protein surface charge distribution was calculated and displayed by the program GRASP (Nicholls et al., 1993).

Acknowledgments

Correspondence regarding this paper should be addressed to F. A. Q. We thank C. Ogata of HHMI/Brookhaven NSLS and A. Nickitenko, A. Hodel, and Z. Wang of the F. A. Q. lab for assistance with data collection at the HHMI X4A beam line of NSLS and for helpful discussions; W. Hendrickson and A. DiGabriele for providing strain DL41 (DE3) and protocols; W. Meador for X-ray technical assistance; and J. Wang and E. Golunski for assistance in protein purification. This work was supported by National Institutes of Health grant R29 GM50815 (F. S. G.), funds from the Institute of Biosciences and Technology (F. S. G.), and funds from the Offices of Research and Information Technology of Baylor College of Medicine (F. A. Q.). X. D. is supported by an NIH-NIGMS Pre-Doctoral Training Grant (GM08280) to the Houston Area Molecular Biophysics Program. F. A. Q. is an Investigator of the Howard Hughes Medical Institute.

Received March 12, 1997; revised April 8, 1997.

References

Aggarwal, A.K. (1995). Structure and function of restriction endonucleases. Curr. Opin. Struct. Biol. 5, 11-19.

Baldwin, G.S., Vipond, I.B., and Halford, S.E. (1995). Rapid reaction analysis of the catalytic cycle of the EcoRV restriction endonuclease. Biochemistry 34, 705-714.

Belfort, M. (1989). Bacteriophage introns: parasites within parasites? Trends Genet. 5, 209-213.

Bremer, M.C.D., Gimble, F.S., Thorner, J., and Smith, C.L. (1992). VDE endonuclease cleaves Saccharomyces cerevisiae genomic DNA at a single site: physical mapping of the VMA1 gene. Nucl. Acids Res. 20, 5484.

Brünger, A.T. (1992). X-PLOR Version 3.1: A System for X-Ray Crystallography and NMR (New Haven, CT: Yale University Press).

Carson, M. (1991). Ribbons 2.0. J. Appl. Cryst. 24, 958-961.

Cheng, X., Balendiran, K., Schildkraut, I., and Anderson, J.E. (1994). Structure of PvuII endonuclease with cognate DNA. EMBO J. 13, 3927-3935.

Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F.B., and Xu, M.-Q. (1996). Protein splicing involving the Saccharomyces cerevisiae VMA intein. J. Biol. Chem. 271, 22159-22168.

Cooper, A.A., Chen, Y.-J., Lindorfer, M.A., and Stevens, T.H. (1993). Protein splicing of the yeast TFP1 intervening protein sequence: a model of self-excision. EMBO J. 12, 2575-2583.

Cooper, A.A., and Stevens, T.H. (1995). Protein splicing: self-splicing of genetically mobile elements at the protein level. Trends Biochem. Sci. 20, 351-356.

Cowtan, K. (1994). 'dm': an automated procedure for phase improvement by density modification. Joint CCP4 and ESF-EACBM Newslett. Protein Cryst. 31, 34-38.

Gimble, F.S., and Stephens, B.W. (1995). Substitutions in conserved dodecapeptide motifs that uncouple the DNA binding and DNA cleavage activities of PI-SceI endonuclease. J. Biol. Chem. 270, 5849-5856.

Gimble, F.S., and Thorner, J. (1992). Homing of a DNA endonuclease gene by meiotic gene conversion in Saccharomyces cerevisiae. Nature 357, 301-306.

Gimble, F.S., and Thorner, J. (1993). Purification and characterization of VDE, a site-specific endonuclease from the yeast Saccharomyces cerevisiae. J. Biol. Chem. 268, 21844-21853.

Gimble, F.S., and Wang, J. (1996). Substrate recognition and induced DNA distortion by the PI-SceI endonuclease, an enzyme generated by protein splicing. J. Mol. Biol. 263, 163-180.

Hendrickson, W.A., Horton, J.R., and LeMaster, D.M. (1990). Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9, 1665-1672.

Hendrickson, W.A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254, 51-58.

Henke, R.M., Butow, R.A., and Perlman, P.S. (1995). Maturase and endonuclease functions depend on separate conserved domains of the bifunctional protein encoded by the group I intron aI4a of yeast mitochondrial DNA. EMBO J. 14, 5094-5099.

Hensgens, L.A.M., Bonen, L., de Haan, M., van der Horst, G., and Grivell, L. (1983). Two intron sequences in yeast mitochondrial COX1 gene: homology among URF-containing introns and strain-dependent variation in flanking exons. Cell 32, 379-389.

Hirata, R., Ohsumi, Y., Nakano, A., Kawasaki, H., Suzuki, K., and Anraku, Y. (1990). Molecular structure of a gene, VMA1, encoding the catalytic subunit of H+-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J. Biol. Chem. 265, 6726-6733.

Hirata, R., and Anraku, Y. (1992). Mutations at the putative junction sites of the yeast VMA1 protein, the catalytic subunit of the vacuolar membrane H+-ATPase, inhibit its processing by protein splicing. Biochem. Biophys. Res. Comm. 188, 40-47.

Hodges, R.A., Perler, F.B., Noren, C.J., and Jack, W.E. (1992). Protein splicing removes intervening sequences in an archaea DNA polymerase. Nucl. Acids Res. 20, 6153-6157.

Holm, L., and Sander, C. (1995). Dali: a network tool for protein structure comparison. Trends Biochem. Sci. 20, 478-480.

Jones, T.A., Zou, J.Y., Kjeldgaard, M., and Cowan, S.W. (1991). Improved methods for building protein models in electron-density maps and the location of errors in these models. Acta Cryst. A 47, 110-119.

Kane, P.M., Yamashiro, C.T., Wolczyk, D.F., Neff, N., Goebl, M., and Stevens, T.H. (1990). Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H+-adenosine triphosphatase. Science 250, 651-657.

Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: pattern of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.

Kostrewa, D., and Winkler, F.K. (1995). Mg2+ binding to the active site of EcoRV endonuclease: a crystallographic study of complexes with substrate and product DNA at 2 Å resolution. Biochemistry 34, 683-696.

Lambowitz, A.M. (1989). Infectious introns. Cell 56, 323-326.

Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 282-291.

Loizos, N., Tillier, E.R.M., and Belfort, M. (1994). Evolution of mobile group I introns: recognition of intron sequences by an intron-encoded endonuclease. Proc. Natl. Acad. Sci. USA 91, 11983-11987.

Luthy, R., Bowie, J.U., and Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature 356, 83-85.

Lykke-Andersen, J., Garrett, R.A., and Kjems, J. (1996). Protein footprinting approach to mapping DNA binding sites of two archaeal homing enzymes: evidence for a two-domain protein structure. Nucl. Acids Res. 24, 3982-3989.

Michel, F., Jacquier, A., and Dujon, B. (1982). Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure. Biochimie 64, 867-881.

Mueller, J.E., Bryk, M., Loizos, N., and Belfort, M. (1993). Homing endonucleases. In Nucleases, S.M. Linn, R.S. Lloyd, and R.J. Roberts, eds. (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press), pp. 111-143.

Nicholls, A., Bharadwaj, R., and Honig, B. (1993). GRASP: graphical representation and analysis of surface properties. Biophys. J. 64, 166-170.

Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Meth. Enzymol. 276, 307-326.

Perler, F.B., Olsen, G.J., and Adam, E. (1997). Compilation and analysis of intein sequences. Nucl. Acids Res. 25, 1087-1094.

Pietrokovski, S. (1994). Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins. Protein Sci. 3, 2340-2350.

Raumann, B.E., Rould, M.A., Pabo, C.O., and Sauer, R.T. (1994). DNA recognition by β-sheets in the Arc repressor-operator crystal structure. Nature 367, 754-757.

Read, R.J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A 42, 140-149.

Sack, J.S. (1988). CHAIN--a crystallography molecular graphics program. J. Mol. Graphics 6, 224-245.

Schleif, R. (1988). DNA binding by proteins. Science 241, 1182-1187.

Sheldrick, G.M. (1991). Heavy atom location using SHELXS-90. In Isomorphous Replacement and Anomalous Scattering: Proceedings of the CCP4 Study Weekend, 25-26 January 1991, W. Wolf, P. Evans, and A.G.W. Leslie, eds. (Warrington, UK: SERC Daresbury Laboratory).

Skowron, P., Kaczorowski, T., Tucholski, J., and Podhajska, A.J. (1993). Atypical DNA-binding properties of class-IIS restriction endonucleases: evidence for recognition of the cognate sequence by a FokI monomer. Gene 125, 1-10.

Somers, W.S., and Phillips, S.E.V. (1992). Crystal structure of the met repressor-operator complex at 2.8 Å resolution reveals DNA recognition by β-strands. Nature 359, 387-393.

Vipond, I.B., Baldwin, G.S., and Halford, S.E. (1995). Divalent metal ions at the active sites of the EcoRV and EcoRI restriction endonucleases. Biochemistry 34, 697-704.

Waring, R.B., Davies, R.W., Scazzocchio, C., and Brown, T.A. (1982). Internal structure of a mitochondrial intron of Aspergillus nidulans. Proc. Natl. Acad. Sci. USA 79, 6332-6336.

Waugh, D.S., and Sauer, R.T. (1993). Single amino acid substitutions uncouple the DNA binding and strand scission activities of FokI endonuclease. Proc. Natl. Acad. Sci. USA 90, 9596-9600.

Wende, W., Grindl, W., Christ, F., Pingoud, A., and Pingoud, V. (1996). Binding, bending and cleavage of DNA substrates by the homing endonuclease PI-SceI. Nucl. Acids Res. 24, 4123-4132.

