Cell -- Duan et al. 89 (4):555

PI-SceI is a bifunctional yeast protein that propagates its mobile gene by catalyzing protein splicing and site-specific DNA double-strand cleavage. Here, we report the 2.4 Å crystal structure of the PI-SceI protein. The structure is composed of two separate domains (I and II) with novel folds and different functions. Domain I, which is elongated and formed largely from seven β sheets, harbors the N- and C-terminal residues and two His residues that are implicated in protein splicing. Domain II, which is compact and is primarily composed of two similar α/β motifs related by local two-fold symmetry, contains the putative nuclease active site with a cluster of two acidic residues and one basic residue commonly found in restriction endonucleases. This report presents prototypic structures of domains with single endonuclease and protein splicing active sites.

Introduction

PI-SceI is a 454 amino acid (M_r 50 k) bifunctional protein encoded by a mobile selfish DNA at the VMA1 locus of S. cerevisiae (Gimble and Thorner, 1992). It occurs as an internal protein segment (termed the intein) situated between the N- and C-terminal segments (termed exteins) of an ATPase host protein, and initiates an intein homing process that results in the transfer of its DNA coding sequence to a recipient VMA1 locus that lacks the intein gene (Gimble and Thorner, 1992). First, it autocatalyzes its excision from the middle of a 119 kDa protein precursor by protein splicing and ligates the two flanking exteins to generate the 69 kDa vacuolar ATPase subunit (Hirata et al., 1990; Kane et al., 1990; reviewed in Cooper and Stevens, 1995). Second, the excised PI-SceI functions as a site-specific DNA endonuclease that makes a double-strand cut in the recipient VMA1 locus. Repair of this break during meiosis by yeast repair functions propagates the intein gene (Gimble and Thorner, 1992).

Based on conserved sequence motifs or blocks, PI-SceI is related to two different groups of proteins found in organisms from each phylogenetic kingdom (for a review, see Mueller et al., 1993). Motifs within the central region (Blocks C and E; Pietrokovski, 1994; LAGLI-DADG, or dodecapeptide sequence motifs) occur in proteins that process nucleic acids and have diverse biological functions. Some are endonucleases similar to PI-SceI that are part of homing inteins. Other endonucleases are encoded by open reading frames (ORFs) in either Group I or archaeal introns and initiate intron homing events analogous to intein homing. Another group is encoded by free-standing genes that are independent of introns or inteins. Interestingly, the dodecapeptide motifs also occur in mitochondrial RNA maturases situated in Group I introns that are required for efficient intron splicing. A second set of motifs at the N- and C-terminal regions of PI-SceI (Blocks A, B, F, and G; Pietrokovski, 1994) occurs in proteins that are known or hypothesized to catalyze protein splicing.

Protein splicing of the yeast VMA1 intein and others requires catalytic elements present in the intein and the first residue of the C-terminal extein (Cooper and Stevens, 1995). Mutagenesis studies have identified critical residues at the intein-extein borders (Hirata and Anraku, 1992; Cooper et al., 1993; Chong et al., 1996) and have provided strong evidence for a common protein splicing pathway (see Chong et al., 1996 for reaction pathway). First, the peptide bond at the N-terminal intein-extein junction is replaced with a thioester linkage through an N-to-S acyl rearrangement catalyzed by Cys-1 of the intein. Second, transesterification occurs as a result of nucleophilic attack of the N-terminal junction by Cys-455, the first residue of the C-terminal extein. This reaction yields a branched intermediate with two N termini derived from the intein and N-terminal extein. Third, aminosuccinimide formation by Asn-454, the C-terminal residue of the intein, is coupled to cleavage of the intein from the branched intermediate. Finally, conversion of the extein-extein thioester linkage into a peptide bond occurs by an S-to-N acyl shift.

To initiate intein homing, PI-SceI cuts the yeast genome exclusively at the VMA1 locus (Bremer et al., 1992). Site-specific cleavage occurs within a 31 bp asymmetrical recognition site to generate a 4 bp overhang with a 5' phosphate and requires Mg²⁺ as a cofactor (Gimble and Wang, 1996). PI-SceI exists as a monomer in solution (Gimble and Thorner, 1993) and also in contact with DNA (Wende et al., 1996). In the absence of Mg²⁺, the protein forms stable intermediates where it is bound solely to a 17 bp sequence adjacent to the cleavage site (minimal binding region) or to the entire substrate (Gimble and Wang, 1996; Wende et al., 1996). PI-SceI makes extensive major groove contacts in the cleavage site region, whereas it uses primarily phosphate backbone contacts within the minimal binding region (Gimble and Wang, 1996). Interestingly, PI-SceI significantly distorts the DNA (60°-75°) when bound to the entire recognition site (Gimble and Wang, 1996; Wende et al., 1996). These findings can be understood in terms of a model where PI-SceI binds first to the minimal binding region, then contacts the cleavage site and stabilizes a DNA distortion that positions the labile phosphodiester bonds in the enzyme active site.

Here, we report the 2.4 Å X-ray crystal structure of the PI-SceI protein. The data clearly reveal the structural and functional duality of the enzyme. Amino acid residues comprising the nucleolytic active site, identified by a cluster of charged residues that are conserved in the protein family, reside in one domain, while those that participate in protein splicing are located in the other. Preliminary docking of a DNA model revealed new features of molecular recognition. Furthermore, examination of the structure immediately suggests an evolutionary model that explains the association of the two disparate activities.

Results and Discussion

Structure and Novel Domain Motifs
The three-dimensional structure of PI-SceI, the first for a homing endonuclease and a protein generated by protein splicing, was determined by multiwavelength anomalous dispersion (MAD) and has been refined at 2.4 Å resolution to an R factor and an R_free of 19.2% and 23.8%, respectively (Table 1). A portion of the electron density is shown in Figure 1. The two independent molecules within the asymmetric unit are very similar and can be overlapped with an rms deviation of about 1 Å between α carbons. The structure is composed of two separate domains (I and II) connected by two peptide segments (Figure 2 and Figure 3). Both domains possess not only unusual folds but also different functional sites located on the opposite sides of the molecule. Domain I, comprising the first 182 and the last 44 residues, is an unusual elongated domain (about 3:1 axial ratio) composed almost entirely of β sheets (19 strands). The intervening continuous segment comprising residues 183-410 adopts an almost equal mixture of helices (7) and strands (9) forming the compact and globular domain II. A search (Holm and Sander, 1995) for protein or domain structures that are homologous with either domain in PI-SceI proved negative.

Figure 1. Stereo View of the Final 2.4 Å 2Fo-Fc Electron Density Map in the Region of the Putative Nuclease Active Site Containing Asp-218, Asp-326, and Lys-301

The map was contoured at 1σ.

Figure 2. Stereo Ribbon Drawing of the α Carbon Backbone Trace of PI-SceI

Domain I is shown at the bottom and domain II at the top. α helices and β strands are green and blue, respectively. All nine helices and selected strands spaced at nearly equal intervals are labeled based on the identifications shown in Figure 3. Amino and carboxy termini, which are part of the self-splicing site, are labeled (N) and (C), respectively (see Figure 4A for details). Labeled in red are the positions of the three residues in domain II (Asp-218 and Asp-326 at the C termini of α4 and α7, respectively, and Lys-301 in a loop) that form a charged cluster in the endonuclease active site (see Figure 4B for details). Relative to the figure, the splicing and endonucleolytic active sites are located in the back and front, respectively. The local two-fold symmetry axis between the similar secondary structure motifs in domain II is located between the two vertical helices (α4 and α7) at the center of the domain and perpendicular to the plane of the figure. This figure and Figure 4A and Figure 4B were drawn using RIBBON (Carson, 1991).

Figure 3. Topology Diagram of PI-SceI Structure

The circles (green) and triangles (red) represent α helices and β strands, respectively. The bold numbers identify the ten different β sheets. The two secondary structure motifs (α4-β14α5β15β16α6 and α7β19β20α8β21β22α9) related by local two-fold symmetry in domain II are enclosed in the rectangular box with dashed lines. Although the segment between α4 and β14, which forms a single hydrogen bond with β14, is technically not a β strand, it is topologically identical to β19 of the second motif (see also Figure 2). The closer proximity between α7 and β19 is meant to convey a one residue connection (Gly-327) between the two secondary structures.

Table 1. Summary of Crystallographic Analysis

The 19 strands in domain I are incorporated into a total of 7 closely packed β sheets approximately divided into two dissimilar underlying substructures (Figure 2 and Figure 3). The elongated structure of domain I is primarily due to the side-by-side arrangement of sheets 2, 10, 5, and 6, which constitute one substructure, with sheet 6 providing a long extension. The other, shorter substructure, composed of sheets 1, 3, and 4, forms a β sandwich with sheets 2, 10, and 5 (Figure 2). As domain I harbors both terminal residues (Cys-1 and Asn-454) (Figure 2 and Figure 4A), which are conserved between various inteins and are essential for protein splicing (Pietrokovski, 1994), it is associated with the self-splicing machinery. Consistent with its autocatalytic self-splicing function, both terminal residues are in close proximity and are lodged in a cavity surrounded by parts of β strands 1, 3, 5, 7, 8, 25, 26, 27, and 28 (Figure 2 and Figure 4A).

Figure 4. Stereo Views of the Functional Sites in PI-SceI

(A) The protein self-splicing site containing the essential N-terminal Cys-1 and C-terminal Asn-454 residues viewed from the back of domain I, as shown in Figure 2. His-79 and His-453, positioned proximal to the terminal residues, are extremely conserved between self-splicing proteins and could perform as general acids/bases in the autocatalytic splicing reaction. Three of the ß strands that are identified are further linked to subsequent strands in the site shown (e.g., ß3 to ß6, ß7 to ß8, and ß24 to ß28).

(B) The endonuclease active site containing the charged cluster of three residues (see text). The orientations of the two symmetry-related α4 and α7 dodecapeptide repeats are similar to those shown in Figure 2.

The compact domain II is mainly built up from two substructures, each with very similar secondary structure motifs (α4-β14α5β15β16α6 and α7β19β20α8β21β22α9) (Figure 2 and Figure 3). Moreover, the two motifs are related by local two-fold symmetry about an axis between the vertical parallel α4 and α7 with a relative twist of about 35° (Figure 2). Helices α4 and α7 contain the two dodecapeptide sequences that are distinguishing characteristics of homing endonucleases and maturases (Michel et al., 1982; Waring et al., 1982; Hensgens et al., 1983). The two vertical parallel helices are surrounded by two nearly horizontal pairs of symmetry-related helices (α5 and α6 in one motif related to α8 and α9, respectively, in the other). The approximate symmetry is also clearly apparent between the two β sheets in both motifs that flank the C-terminal ends of the two parallel helices and that form a concave twisted canopy above the symmetry-related helices (Figure 2). (Although the segment following α4 in one sheet is not quite a sheet strand, it is topologically equivalent to β19 of the other sheet [Figure 2 and Figure 3].) The similarity of the two motifs is evident from superposition of the α-carbon atoms of the 63 pairs of residues in the overlapped secondary structures of both motifs, which show an rms deviation of 1.7 Å. The resulting overlapped sequences reveal 22% identity and 46% similarity.
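The two similarity measures quoted for the motifs (an rms deviation over paired α carbons and a percent sequence identity over the overlapped residues) can be illustrated with a minimal sketch. It assumes the two coordinate sets have already been superposed; the fitting step itself is not shown, and the function names and toy inputs are hypothetical rather than taken from the structure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired points.

    Assumes the two coordinate sets are already superposed and that
    point i in coords_a is paired with point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that carry the same residue."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

# Toy inputs (hypothetical, not PI-SceI data): two points each shifted by 1 Å.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
toy_identity = percent_identity("ACDE", "ACDF")
```

In practice the 63 residue pairs would first be brought into a common frame by a least-squares fit, and only then passed to a function like `rmsd`.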

Endonuclease Active Site
The nuclease active site apparently resides in domain II, at the C-terminal ends of the parallel helices α4 and α7. This location is consistent with several pieces of evidence. Two Asp residues (Asp-218 and Asp-326) that are located at the C termini of the parallel helices and Lys-301, found in a loop immediately after β18, form a charged cluster (Figure 2 and Figure 4B) that bears similarity with those commonly seen in the catalytic sites of homodimeric restriction endonucleases with previously determined three-dimensional structures (EcoRI, PvuII, EcoRV, and BamHI; for a review, see Aggarwal, 1995). Comparison of the PI-SceI Asp-218, Asp-326, Lys-301 triad with the identical residues in the active site of the EcoRV restriction endonuclease indicates an rms deviation of about 2 Å between α carbons. The Asp-218 and Asp-326 residues in PI-SceI occur at positions within the dodecapeptide motifs that are extremely conserved as acidic residues among the related homing endonucleases and maturases (Mueller et al., 1993; Pietrokovski, 1994). Conservation of the dodecapeptide motifs among the protein family may be a consequence of the fact that they comprise the two helices that correctly position the active site residues. Lys-301 is a conserved residue within a separate motif found in inteins (Block D; Pietrokovski, 1994) and several maturases. Finally, substitutions of the two Asp residues by site-directed mutagenesis abolish DNA cleavage but not binding (Gimble and Stephens, 1995), and a Lys-301-to-Ala substitution leads to loss of catalytic activity (data not shown).

In spite of profound differences in the overall structures of the restriction endonucleases and PI-SceI (discussed below), the similar active site arrangement suggests an analogous hydrolytic mechanism. Asp-218 and Asp-326 of PI-SceI appear to be structurally equivalent to acidic residues in EcoRI, EcoRV, BamHI, and PvuII (Aggarwal, 1995) and presumably comprise part of the Mg²⁺ binding site. Although Mg²⁺ was present during crystal growth, its location has not been clearly established. There is a density between the two Asp carboxylate side chains that, at present, is assigned to a water molecule (Figure 1). The exact role of the Mg²⁺ ion in the PI-SceI reaction pathway is unclear, but in the case of the restriction enzymes, its proposed function is to polarize the phosphate that is attacked. The close proximity of the two Asp residues implicated in Mg²⁺ binding likely requires that helices α4 and α7 mesh and remain in close contact. For EcoRV, it has been proposed that a second Mg²⁺ ion that is bound by a third acidic residue functions to activate the attacking water molecule (Baldwin et al., 1995; Vipond et al., 1995). However, there is no obvious equivalent acidic residue in the PI-SceI structure or in the conserved blocks of the other homing endonucleases, so a two metal ion mechanism is unlikely for this enzyme family. The lysine residues found in the catalytic triads are thought to stabilize the doubly charged pentavalent transition state (Kostrewa and Winkler, 1995), and a similar role may be played by Lys-301.

Because PI-SceI is a monomer in solution (Gimble and Thorner, 1993) and even in the presence of DNA (Wende et al., 1996; F. S. G., unpublished data), there has been some speculation as to whether the enzyme contains one catalytic site that cuts two strands or two catalytic sites, each of which cleaves one strand. The structural and mutational data (see above) are fully consistent with there being one active site that lies close to the approximate center of symmetry in domain II. (Although the structure determined is that of two molecules in the asymmetric unit, related by local two-fold symmetry, the active sites of the two proteins are separated by about 40 Å.) We are unable to find another similar three charged residue cluster that: (1) would be at an appropriate distance (about 13 Å) to the first cluster for DNA cleavage to leave a 4 bp overhang, and (2) would be conserved among homing endonucleases (Pietrokovski, 1994). The data, taken together, do not support the suggestion of two active site domains, each containing a single dodecapeptide motif (Henke et al., 1995; Lykke-Andersen et al., 1996; Wende et al., 1996). General features of DNA cleavage by the Type IIs FokI endonuclease resemble those of PI-SceI, even though the sequences of the enzymes and the specific protein-DNA interactions are very different. Like PI-SceI, FokI recognizes an asymmetric binding site, binds DNA as a monomer (Skowron et al., 1993), and possesses a single active site (Waugh and Sauer, 1993). How a single catalytic center in FokI or in PI-SceI effects sequential scission of both DNA strands is unclear, but movement of either the protein or the DNA following first strand cleavage may be involved.

Potential DNA-Binding Sites and DNA Docking
The nature of the areas around the putative active site in domain II indicates potential sites for DNA binding. The obvious areas include the exposed surfaces of the two symmetry-related β sheets flanking the two Asp active site residues and the β-hairpin loops (between β15 and β16 and between β21 and β22) above the sheets (Figure 2). Loops are very often seen in structures of DNA-binding proteins involved in interacting with either the major or minor groove of DNA, including those of restriction endonucleases (Aggarwal, 1995). Sheets have also been observed as docking sites for DNA (e.g., in PvuII restriction endonuclease and the MetJ and Arc repressor proteins; Somers and Phillips, 1992; Cheng et al., 1994; Raumann et al., 1994). Evidence for the involvement of loops and β sheets in domain II in DNA binding comes from proteolytic protein footprinting studies of I-PorI and I-DmoI, two archaeal homing endonucleases related to PI-SceI (Lykke-Andersen et al., 1996). In both endonucleases, the footprinting studies identified four sites for binding DNA that follow immediately and approximately 40-60 residues after each dodecapeptide repeat. Although the interpretation of this result as indicating the presence of two separate DNA-binding domains is incorrect (see above), the four sites map on our PI-SceI three-dimensional structure on exposed parts of the areas identified above. For example, in I-PorI, the two sites after the first dodecapeptide repeat coincide with the C terminus of the nonsheet segment immediately following the first dodecapeptide repeat helix (α4) and β14 and with the loop preceding β16 that is above the sheet (Figure 2 and Figure 3). The two sites after the second dodecapeptide repeat align with the loop between β19 and β20 and with the large loop between β21 and β22 above the sheet (Figure 2 and Figure 3). Furthermore, mutations located within the large loops and β sheets interfere with substrate binding (F. S. G., unpublished data).

To help comprehend the interaction of PI-SceI with its lengthy recognition site (31 bp or longer; Gimble and Wang, 1996; Wende et al., 1996), we carried out a preliminary docking of a DNA onto the structure using the program GRASP (Nicholls et al., 1993), which identifies surface contours and electrostatic charge potentials. In docking of a 30 bp B-form model (Figure 5), we were guided by five criteria. First, the scissile bonds of the DNA were placed as close as possible to the pair of Asp residues in the putative active site. Second, the two symmetry-related sheets, along with their loops, served as a platform for docking DNA; sheets 7 and 9 for base pairs to the left (or minus; see Gimble and Wang, 1996) and right (or plus) sequences, respectively, of the center of the cleavage site. This docking arrangement of the DNA takes into account the experimental observation that the left sequence requires fewer base pairs than the right for DNA binding and cleavage (Gimble and Stephens, 1995; Gimble and Wang, 1996). Moreover, the right sequence alone is sufficient for high affinity binding to PI-SceI (Gimble and Stephens, 1995; Gimble and Wang, 1996; Wende et al., 1996). The DNA-binding surface of sheet 7, which extends to the edge of the protein, is more limited than that of sheet 9, which extends toward the middle of the protein structure. Third, the four DNA-binding sites, two following each of the dodecapeptide repeat helices, which contain positively charged residues, are close to the DNA. Fourth, a bend of about 55° is directed toward the major groove at about +7 bp to the right of the center of the cleavage site, which lies close to the junction between the two domains. This is the approximate location of the bend detected experimentally (Gimble and Wang, 1996; Wende et al., 1996). This bend is necessitated by the angular orientation between domain I, especially the sheet 6 extension, and domain II (Figure 2 and Figure 5). Fifth, the DNA was docked as closely as possible to clusters of exposed, intensely positive, charged residues (Figure 5). In domain II, these clusters are located in the four binding sites indicated above. A heavy concentration of positive surfaces is also found in an area at the interface between the two domains and in the extended region of domain I. In support, the DNA makes numerous phosphate backbone contacts in the region that is thought to bind to the interdomain surface (Gimble and Wang, 1996).

Figure 5. An Approximate Docking Model of a B DNA to PI-SceI

Protein surface charge distribution was calculated and displayed by the program GRASP; potentials less than -10 kT, neutral, and greater than 10 kT are displayed in red, white, and blue, respectively. The orientation of the molecule (domain I to the right and domain II to the left) is related to that shown in Figure 2 by a counterclockwise rotation of about 90°. The negative surface at the active site is contributed by Asp-218 and Asp-326. Lys-301 lies in the positive surface slightly below and to the right of this negative surface. The DNA is oriented so that the top strand (5' to 3') starts from the left and the center of the cleavage site lies on the active site. The discontinuity in the backbone representation is caused by the introduction of a bend of about 55° in the modeling. See text for further details.

In docking the DNA, it became apparent that the size or diameter of globular domain II is insufficient to accommodate the entire 30 bp DNA model, even in the presence of the bend. DNA binding would require the participation of domain I. The docking model indicates that domain II can recognize about 14 bp (from about -8 bp to +6 bp of the cleavage site). The additional 16 or more bp on the plus, or right, sequence of the cleavage site extends to the arm of domain I, which contains a high concentration of clusters of intense positive charge (Figure 5). The limited base sequence potentially recognized by domain II is consistent with the observation that endonucleases related only to this domain recognize much shorter DNA substrates (14-20 bp). In addition, the sharp bend in the DNA, which has been experimentally observed in complexes of PI-SceI with DNA, may very well be due to the presence of the elongated domain I which, as the structure indicates, adopts a roughly equivalent bend relative to domain II.
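To give a rough sense of the lengths involved, the axial span of a straight B-form duplex can be estimated from the number of base pairs. The sketch below assumes the textbook rise of about 3.4 Å per base pair, a value that is not stated in the text, and ignores the modeled bend; the constant and function names are hypothetical.

```python
# Approximate axial rise per base pair for idealized B-form DNA.
# This is a textbook value assumed here, not a number from the paper.
RISE_PER_BP_ANGSTROM = 3.4

def bform_axial_length(n_bp, rise=RISE_PER_BP_ANGSTROM):
    """Approximate length in Å spanned by n_bp base pairs of straight B-DNA."""
    if n_bp < 0:
        raise ValueError("number of base pairs must be non-negative")
    return n_bp * rise

# Segments discussed in the docking model.
full_model = bform_axial_length(30)      # the whole 30 bp model
domain_ii_part = bform_axial_length(14)  # about -8 bp to +6 bp
domain_i_part = bform_axial_length(16)   # remaining plus-side sequence
```

Under this assumption the full model spans roughly 100 Å, of which a little under half is assigned to domain II, which is in line with the need for domain I to contact the remaining plus-side sequence.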

The Protein Splicing Catalytic Site
The structure of the PI-SceI intein represents the excised end product of protein splicing. The positions of the key junction amino acid residues (Figure 4A) identified by mutation are entirely consistent with their proposed roles in the reaction pathway of self-splicing (Hirata and Anraku, 1992; Cooper et al., 1993; Chong et al., 1996). Our structural analysis has also revealed the presence of two His residues that occur in domain I, close to both terminal residues (Figure 4A): His-79, which is invariant among inteins (Perler et al., 1997), and His-453, which is conserved but not essential for splicing (Cooper et al., 1993; Chong et al., 1996). Due to their nearly neutral pK_a's, these residues could act as general acids or bases, functions required in a majority of the proposed steps in splicing mechanisms (discussed below). At the former N-terminal and C-terminal intein-extein junctions, Cys-1 and Asn-454 lie in close proximity (2.9 Å between SG and OD2), which would be consistent with Cys-455 acting as a nucleophile to cleave the thioester at Cys-1 and forming the branched intermediate. In addition, His-79 is closely situated to Cys-1 and may act as a proton donor/acceptor to facilitate the N-to-S acyl shift and transesterification reactions (Pietrokovski, 1994). Indeed, the imidazole side chain of His-79 is closely situated to Cys-1, which undergoes the N-to-S acyl shift. The distance between His-79 and Cys-455, the putative nucleophile that initiates transesterification, is unknown, since that residue is absent from the protein. Cleavage of the peptide bond between Asn-454 and Cys-455 is coupled to the cyclization of Asn-454 that yields a C-terminal aminosuccinimide (Chong et al., 1996). In the PI-SceI structure, this position contains Asn because the protein was generated by recombinant methods rather than by protein splicing. Mutagenesis studies have suggested that His-453 assists in the cyclization of Asn-454 (Cooper et al., 1993; Chong et al., 1996). In support of this idea, the PI-SceI crystal structure shows that the imidazole side chain of His-453 lies very close to Asn-454 (Figure 4A). Whether the structure of the excised PI-SceI intein resembles that of the extein-intein precursor is unknown, as conformational changes may occur during the splicing process.

Evolutionary Implications of the Structure
The bipartite domain structure of PI-SceI is likely paralleled by a separation of the protein splicing and endonucleolytic cleavage activities. In the case of the related PI-TliI intein, it has been demonstrated that mutations that abolish one activity have little or no effect on the other (Hodges et al., 1992). Moreover, substantial evidence suggests that the two domains and activities evolved independently. Proteins related to domain II are not always associated with protein splicing inteins and perform a variety of biological functions in different contexts (see Introduction). Similarly, there are two examples of inteins, from the Mycobacterium xenopi gyrA and Porphyra purpurea dnaB genes (Perler et al., 1997), that are only related to domain I.

Based on these observations, we hypothesize that the VMA1 intein is encoded by a composite gene that resulted from the invasion of an endonuclease ORF into a preexisting gene that encoded a protein with protein splicing activity or that later evolved this activity. The endonuclease ORF is likely to have been the mobilizing entity rather than the protein splicing ORF, because endonucleolytic activity is required for intein and intron homing. Furthermore, the fact that the endonuclease ORF is embedded in the middle of the protein splicing ORF provides additional circumstantial evidence that it was the invading entity. Once these genes were fused, we speculate that the entire endonuclease-splicing ORF functioned as a mobile element that inserted into the VMA1 locus. The symbiotic association of the endonuclease ORF with the splicing ORF benefits both entities. The endonuclease ORF is associated with a gene that encodes a polypeptide that safely removes itself and the endonuclease from the vacuolar H⁺-ATPase host protein and prevents any deleterious effects to the host. By allying itself with an endonuclease ORF, the splicing gene is assured of mobility within the same species and eventually to new species as well, perhaps by horizontal transmission. This scenario is analogous to proposed models that explain the association of Group I introns and endonuclease ORFs (Belfort, 1989; Lambowitz, 1989). There, the intron provides a means of removal to the endonuclease ORF by RNA splicing rather than protein splicing, and the endonuclease ORF allows the intron to mobilize to new locations in the genome. Evidence for the original invasion of the intron by the endonuclease ORF comes from phage T4, where it has been shown that the sunY intron-encoded I-TevII endonuclease can cleave a synthetic intron that lacks the endonuclease ORF (Loizos et al., 1994). The fused intron-endonuclease ORF could then move to new genomic locations. By analogy, if the PI-SceI endonuclease ORF invaded a protein splicing gene, a PI-SceI target site would be predicted to occur at the border between domains I and II. However, none is observed, perhaps due to the further evolution of the intein.

Comparison of Endonuclease Structures
With the exception of the similarity with the catalytic charged residue triad, PI-SceI endonuclease is very different from EcoRI, PvuII, EcoRV, and BamHI restriction endonucleases, which together do not show common features. The Type II restriction enzymes function as homodimers, with each subunit interacting with a separate half-site of a palindromic sequence, whereas PI-SceI is a monomeric protein that contacts an extended asymmetric site. This points to a difference in strategy; the use of a palindromic target limits the size of the recognition site and the selectivity, but the dimeric composition of restriction enzymes permits a smaller subunit size (Schleif, 1988). The PI-SceI monomer, on the other hand, is approximately twice as long as the restriction endonucleases but recognizes a significantly longer target. Consequently, the 6 bp binding sites of the four restriction enzymes reside in a U-shaped cleft formed between the two monomers (Aggarwal, 1995), whereas the deduced DNA-binding site of PI-SceI covers a much wider area of the protein surface (Figure 5). Although the three charged residues in the active site of the four restriction endonucleases are similar to those of PI-SceI, the locations of these residues differ between the restriction endonucleases and PI-SceI. They are clustered at a β-hairpin turn in the restriction endonucleases, whereas those of PI-SceI are clustered at the C termini of a pair of helices. Moreover, the charged active site residues in the four endonucleases are close to each other in the sequences, whereas those in PI-SceI are widely separated. Despite these differences, the cleavage mechanism of these endonucleases will likely prove to be very similar. However, understanding how the two asymmetric DNA strands are cleaved by one active site in homing endonucleases is a very interesting challenge.

The three-dimensional structure reported here paves the way for investigations that will elucidate the molecular recognition functions and catalytic activities of PI-SceI. Further experiments (e.g., site-directed mutagenesis) suggested by the structure will enable us to test several predictions that result from the detail that it provides.

Experimental Procedures

Protein Purifications and Crystallization
The procedures for overexpression and purification of recombinant wild-type PI-SceI have been previously described (Gimble and Stephens, 1995). The selenomethionyl (Se-Met) PI-SceI was overexpressed from E. coli DL41 (DE3), a methionine auxotroph strain (Hendrickson et al., 1990). A starter culture of E. coli strain DL41 (DE3) (gift of W. A. Hendrickson) containing the expression plasmid pT7PI-SceI ESARC (Gimble and Stephens, 1995) was grown overnight at 37°C in Luria broth with 100 µg/ml ampicillin. This culture was diluted 1/100 (v/v) into a defined medium (Hendrickson et al., 1990) lacking methionine but containing 50 µg/ml D,L-selenomethionine (Sigma) and grown overnight at 32°C. This culture was then diluted 1/200 (v/v) into the same medium and grown to an OD₆₀₀ of 0.6 at 32°C, at which time isopropyl-1-thio-β-D-galactopyranoside was added to 0.5 mM. Growth was continued at 32°C overnight, and the protein was purified as previously described for the wild-type protein (Gimble and Thorner, 1993). The endonuclease activity of the Se-Met PI-SceI is indistinguishable from that of the wild-type protein.

Both wild-type and Se-Met PI-SceI were crystallized at room temperature using the hanging drop vapor diffusion method over a reservoir solution containing 4% PEG 6K, 10 mM β-mercaptoethanol (βME), 3 mM CdCl₂, 1 mM MgCl₂, and 100 mM Tris (pH 8.5). The 2 µl drops of the protein (8 mg/ml in 5 mM βME and 20 mM Tris [pH 8.0]) were mixed with an equal volume of the reservoir solution. Relative to the wild-type protein, Se-Met PI-SceI crystals were bigger and easier to reproduce and showed diffraction to higher resolution. The crystals belong to space group P2₁ with unit cell parameters a = 59.6 Å, b = 102.9 Å, c = 87.4 Å, and β = 94.3° for the wild-type protein crystal measured with a laboratory area detector, and a = 59.8 Å, b = 102.4 Å, c = 87.1 Å, and β = 94.1° for the Se-Met protein measured using synchrotron data. There are two molecules per asymmetric unit, and the solvent content is about 53%. Crystals for data collection were stabilized in the mother liquor, which contained 20% glycerol, and flash-cooled to -170°C in liquid nitrogen.
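As a rough check on the quoted solvent content, the Matthews coefficient can be estimated from the unit cell. The sketch below is not part of the original analysis. It assumes the approximate M_r of 50,000 quoted in the Introduction, four molecules per cell (two asymmetric units in P2₁, each holding two molecules), and the conventional approximation that protein occupies a fraction 1.23/V_M of the cell volume. The function names are illustrative only.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews(volume, molecules_per_cell, mol_weight_da):
    """Return (V_M in cubic angstroms per dalton, estimated solvent fraction)."""
    v_m = volume / (molecules_per_cell * mol_weight_da)
    solvent = 1.0 - 1.23 / v_m  # conventional approximation, assumed here
    return v_m, solvent

# Se-Met cell parameters from the text; M_r ~50,000 is an approximate value.
volume = monoclinic_cell_volume(59.8, 102.4, 87.1, 94.1)
molecules_per_cell = 2 * 2  # 2 asymmetric units in P2_1, 2 molecules in each
v_m, solvent = matthews(volume, molecules_per_cell, 50_000)
print(f"V = {volume:.0f} A^3, V_M = {v_m:.2f} A^3/Da, solvent = {solvent:.0%}")
```

With these inputs the estimate falls close to the reported value of about 53%; the small difference depends on the exact molecular weight used.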

Data Collection
MAD data were collected from one frozen crystal at the HHMI X4A beam line of the National Synchrotron Light Source (NSLS). The optimal wavelengths for Se data collection at NSLS were determined by measuring a fluorescence scan with a scintillation counter using the frozen PI-SceI crystal from which data were taken. The oscillation data were then collected at the absorption edge (0.9794 Å), peak (0.9790 Å), and remote peak (0.9656 Å) using inverse beam geometry with an oscillation angle of 1.2°. The data were indexed and integrated using the program DENZO and scaled within each wavelength using SCALEPACK (Otwinowski and Minor, 1997) without merging (Table 1). For the refinement, the data of the remote wavelength were processed to the limit of diffraction (2.4 Å) and merged, ignoring the anomalous difference.
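To see how the three wavelengths sit relative to the selenium K absorption edge, they can be converted to photon energies with E = hc/λ. This is a minimal sketch, not part of the original procedure. The constant hc ≈ 12398.42 eV·Å is a standard value, and the nominal Se K-edge energy of about 12.66 keV is a tabulated figure that the text does not give.

```python
# Photon energy E (eV) from wavelength (angstroms): E = h*c / wavelength.
HC_EV_ANGSTROM = 12398.42  # standard value of h*c in eV * angstrom

def wavelength_to_ev(wavelength_angstrom):
    """Convert an X-ray wavelength in angstroms to photon energy in eV."""
    return HC_EV_ANGSTROM / wavelength_angstrom

# Wavelengths as reported in the text.
wavelengths = {"edge": 0.9794, "peak": 0.9790, "remote": 0.9656}
energies = {name: wavelength_to_ev(w) for name, w in wavelengths.items()}

for name, energy in energies.items():
    print(f"{name:>6}: {energy:.0f} eV")
```

The edge and peak energies differ by only a few eV, whereas the remote wavelength lies well above the edge, which is the usual arrangement for a three-wavelength MAD experiment.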

Structure Determination
MAD phasing using reflections from 10 to 2.6 Å was done using the MADSYS suite of programs (Hendrickson, 1991). Scaled and unmerged data were used in phasing and were subjected to local scaling to reduce noise in the Bijvoet signal and in the dispersive signal among the three wavelengths. Scaled data were fitted in MADLSQ to approximate f' and f'' published values.

As there are 8 Met residues in the wild-type PI-SceI, we expect a total of 16 selenium sites in the two Se-Met PI-SceI molecules within the asymmetric unit. The positions of 8 initial selenium atoms were determined from the anomalous difference Patterson map of the peak wavelength with SHELXS (Sheldrick, 1991) and confirmed by the dispersive difference between the edge and remote wavelengths. Six other selenium sites were located in difference Fourier maps. The two remaining sites could not be determined and, after completion of the structure refinement, we noticed that the missing site in either molecule likely corresponds to Met-372, which is located in a segment (residues 369-375) with little or no density in either molecule. This is presumably because the residues in this segment are highly exposed to solvent and adopt several spatially distinct conformations. All the other selenium sites were confirmed in fitting of the sequence.

The 14 total selenium sites were refined by ASLSQ and input into MADFAZ to obtain phases at 2.6 Å resolution (Table 1). The resulting 2.6 Å electron density map showed a clear solvent boundary and several secondary structures and confirmed the presence of two molecules in the asymmetric unit. Density modification and two-fold noncrystallographic averaging using the SOLV and AVER options, respectively, of "dm" (Cowtan, 1994) in the CCP4 suite of programs greatly improved the quality of the map.

Model Building and Refinement
The skeleton generated by BONE (Jones et al., 1991) immediately revealed several long chains of secondary structures. Chain tracing and model building were carried out using the programs O (Jones et al., 1991) and CHAIN (Sack, 1988). Identification and initial fitting of segments of the amino acid sequence were facilitated by the locations of the selenomethionines. At several stages in fitting, improved maps were obtained using the experimental phases combined with the phases calculated from the partial structure by the program SIGMAA (Read, 1986) in the CCP4 package. For the protein refinement, the 2.4 Å data of the remote wavelength were used. The model was subjected to several cycles of molecular dynamics and restrained refinement with X-PLOR (Brünger, 1992) and manual rebuilding. The final refinement statistics are shown in Table 1. The following segments of residues, all located in loops at the protein surface, have no convincing density in either molecule in the asymmetric unit and, therefore, are missing from the structure: residues 93-102 between β9 and β10, 271-279 between β16 and α6, and 369-374 between β21 and β22 (Figure 2 and Figure 3). The coordinates have been deposited in the Protein Data Bank (ID code 1VDE).

Structure Analysis
The correctness of the final model was verified by examining its stereochemistry using the program PROCHECK (Laskowski et al., 1993) and its 3D-1D profile (Luthy et al., 1992). A Ramachandran plot showed 84% of residues in the most favored region for both structures and none in the disallowed regions. The assignment of the elements of the secondary structure was performed using the DSSP algorithm (Kabsch and Sander, 1983) as implemented in PROCHECK. Protein surface charge distribution was calculated and displayed by the program GRASP (Nicholls et al., 1993).

Acknowledgments

Correspondence regarding this paper should be addressed to F. A. Q. We thank C. Ogata of HHMI/Brookhaven NSLS and A. Nickitenko, A. Hodel, and Z. Wang of the F. A. Q. lab for assistance with data collection at the HHMI X4A beam line of NSLS and for helpful discussions; W. Hendrickson and A. DiGabriele for providing strain DL41 (DE3) and protocols; W. Meador for X-ray technical assistance; and J. Wang and E. Golunski for assistance in protein purification. This work was supported by National Institutes of Health grant R29 GM50815 (F. S. G.), funds from the Institute of Biosciences and Technology (F. S. G.), and funds from the Offices of Research and Information Technology of Baylor College of Medicine (F. A. Q.). X. D. is supported by an NIH-NIGMS Pre-Doctoral Training Grant (GM08280) to the Houston Area Molecular Biophysics Program. F. A. Q. is an Investigator of the Howard Hughes Medical Institute.

Received March 12, 1997; revised April 8, 1997.

References