Assignment 6
1. Add the MLH1_Human protein to the Biology Workbench. Predict its secondary structure with GOR4.
>GENPEPT:463989
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVI
VKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFR
GEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQI
TVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGET
VADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNAN
YSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP
QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPL
SKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGD
TTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRI
INLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLL
NTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEE
DGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPL
EGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLY
KVFERC

Legend: alpha helix = H; beta sheet = E; random coil = C
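Once GOR4 returns its per-residue string of H/E/C states, the share of each state can be tallied. The following is a minimal sketch only: the function name `state_composition` and the short example string are made up for illustration and are not the GOR4 prediction for MLH1.

```python
from collections import Counter

# State symbols as given in the legend above.
STATES = {"H": "alpha helix", "E": "beta sheet", "C": "random coil"}

def state_composition(prediction):
    """Return the percentage of residues assigned to each state."""
    prediction = prediction.strip().upper()
    if not prediction:
        raise ValueError("empty prediction string")
    unknown = set(prediction) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    counts = Counter(prediction)
    total = len(prediction)
    return {state: 100.0 * counts[state] / total for state in STATES}

# Illustrative string only, NOT real GOR4 output for MLH1_Human.
example = "CCHHHHHHEEEECCCCHHHH"
for state, pct in state_composition(example).items():
    print(f"{state} ({STATES[state]}): {pct:.1f}%")
```

To use it on the real result, paste the state line from the GOR4 output in place of `example`.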

2. Run a homology search for MLH1_Human against the GenPept Full Release database. Import the MLH1-like proteins of C. elegans, S. cerevisiae, D. melanogaster, R. norvegicus and M. musculus into your workbench. Run CLUSTALW to get a multiple sequence alignment of these six proteins.
Mus musculus                -----------------MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAK
Rattus norvegicus           -----------------MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAK
Human                       -----------------MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAK
Drosophila melanogaster     ---------------MAEYLQPGVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQ
Saccharomyces cerevisiae    --------------------MSLRIKALDASVVNKIAAGEIIISPVNALKEMMENSIDAN
Caenorhabditis elegans      MWHCGYRTRNCDEFSKIEFSLMGLIQRLPQDVVNRMAAGEVLARPCNAIKELVENSLDAG
                                                    *: *   ***::****::  * **:**: **.:** 

Mus musculus                STNIQVVVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEA
Rattus norvegicus           STNIQVIVREGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEA
Human                       STSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEA
Drosophila melanogaster     STHIQVQVKAGGLKLLQIQDNGTGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEA
Saccharomyces cerevisiae    ATMIDILVKEGGIKVLQITDNGSGINKADLPILCERFTTSKLQKFEDLSQIQTYGFRGEA
Caenorhabditis elegans      ATEIMVNMQNGGLKLLQVSDNGKGIEREDFALVCERFATSKLQKFEDLMHMKTYGFRGEA
                            :* * : :: **:*::*: ***.**.: *: ::****:****  ****  : *:******

Mus musculus                LASISHVAHVTITTKTADGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRR
Rattus norvegicus           LASISHVAHVTITTKTADGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRK
Human                       LASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRR
Drosophila melanogaster     LASISHVAHLSIQTKTAKEKCGYKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRR
Saccharomyces cerevisiae    LASISHVARVTVTTKVKEDRCAWRVSYAEGKMLESPKPVAGKDGTTILVEDLFFNIPSRL
Caenorhabditis elegans      LASLSHVAKVNIVSKRADAKCAYQANFLDGKMTADTKPAAGKNGTCITATDLFYNLPTRR
                            ***:****::.: :*  . :*.::..: :**:   .** **::** *   ***:*:  * 

Mus musculus                KALKNPSEEYGKILEVVGRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNA
Rattus norvegicus           KALKNPSEEYGKILEVVGRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNA
Human                       KALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNA
Drosophila melanogaster     QALRSPAEEFQRLSEVLARYAVHNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAA
Saccharomyces cerevisiae    RALRSHNDEYSKILDVVGRYAIHSKDIGFSCKKFGDSNYSLSVKPSYTVQDRIRTVFNKS
Caenorhabditis elegans      NKMTTHGEEAKMVNDTLLRFAIHRPDVSFALRQ--NQAGDFRTKGDGNFRDVVCNLLGRD
                            . : .  :*   : :.: *:::*   :.*: ::  :    . .    .  : :  : .  

Mus musculus                VSRELIEVG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESAA
Rattus norvegicus           VSRELIEVG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESAA
Human                       VSRELIEIG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESTS
Drosophila melanogaster     ISKELLEFS-HRDEVYKFE-AECLITQVNYSAKKCQ----------MLLFINQRLVESTA
Saccharomyces cerevisiae    VASNLITFHISKVEDLNLESVDGKVCNLNFISKKSIS---------LIFFINNRLVTCDL
Caenorhabditis elegans      VADTILPLS-LNSTRLKFT-FTGHISKPIASATAAIAQNRKTSRSFFSVFINGRSVRCDI
                            ::  :: .   .     :      : :     . .           : .*** * * .  

Mus musculus                LRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILQRVQQHIE
Rattus norvegicus           LKKAIEAVYAAYLPKNTHPFLYLILEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIE
Human                       LRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIE
Drosophila melanogaster     LRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDVNVHPTKHEVHFLYQEEIVDSIKQQVE
Saccharomyces cerevisiae    LRRALNSVYSNYLPKGFRPFIYLGIVIDPAAVDVNVHPTKREVRFLSQDEIIEKIANQLH
Caenorhabditis elegans      LKHPIDEVLG--ARQLHAQFCALHLQIDETRIDVNVHPTKNSVIFLEKEEIIEEIRAYFE
                            *: .:: : .    :    *  : : :    :********..* ** ::.*:: :   ..

Mus musculus                SKLLGSNSSRMYFTQTLLPGLAG------PSGEAARPTTGVASSSTSGSGDKVYAYQMVR
Rattus norvegicus           SKLLGSNSSRMYFTQTLLPGLAG------PSGEAVKSTTGIASSSTSGSGDKVHAYQMVR
Human                       SKLLGSNSSRMYFTQTLLPGLAG------PSGEMVKSTTSLTSSSTSGSSDKVYAHQMVR
Drosophila melanogaster     ARLLGSNATRTFYKQLRLPGAP-----------------DLDETQLADKTQRIYPKEMVR
Saccharomyces cerevisiae    AELSAIDTSRTFKASSISTNKPESLIPFNDTIESDRNRKSLRQAQVVENSYTTANSQLRK
Caenorhabditis elegans      KVIGEIFGFEALDVEKPEEEQPD--------IENLVMIPMSQSLKSIEAIRKPDTKPEFK
                              :      .    .      .                    . .              :

Mus musculus                TDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTEGSPERATREDEEMLALPAPAEAAAE
Rattus norvegicus           TDSRDQKLDAFMQPVSRRLPSQPQD--PVPGNRTEGSPEKAMQKDQEISELPAPMEAAAD
Human                       TDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK
Drosophila melanogaster     TDSTEQKLDKFLAPLVK-------------------------------------------
Saccharomyces cerevisiae    AKRQENKLVRIDASQAKITSFLSSS--QQFNFEGSSTKRQLSEPKVTNVSHSQEAEKLTL
Caenorhabditis elegans      SSPSAWKSDKKRVDYMEVRTDAKERKIDEFVTRGGAVGPTTSNDDIFGGSGILKRARTED
                            :.    *                                                     

Mus musculus                SENLERESLMETSDAAQKAAPTSSPGSSRKRHREDSDVEMVENASGKEMTAACYPRRRII
Rattus norvegicus           SASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSDVEMMENDSRKEMTAACYPRRRII
Human                       NQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHREDSDVEMVEDDSRKEMTAACTPRRRII
Drosophila melanogaster     ----------------SDSGVSSSSSQEASRLPEES------------FRVTAAKKSREV
Saccharomyces cerevisiae    NESEQPRDANTINDNDLKDQPKKKQKLGDYKVPSIADDEKNALPISKDGYIRVPKERVNV
Caenorhabditis elegans      STGGEKEPEDLNTDFDDVSMVSLVSTADGRRLNESQD-----LGEDDDVDFEYGKTHREF
                                                          :  .                         .

Mus musculus                NLTSVLSLQEEISERCHETLREILRNHSFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
Rattus norvegicus           NLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
Human                       NLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
Drosophila melanogaster     RLSSVLDMRKRVERQCSVQLRSTLKNLVYVGCVDERR--ALFQHETRLYMCNTRSFSEEL
Saccharomyces cerevisiae    NLTSIKKLREKVDDSIHRELTDIFANLNYVGVVDEERRLAAIQHDLKLFLIDYGSVCYEL
Caenorhabditis elegans      HFESIEVLRKEIIANSSQSLREMFKTSTFVGSINVKQ--VLIQFGTSLYHLDFSTVLREF
                            .: *:  :::.:       * . : .  :** :: .   .  *.   *:  :  ..  *:

Mus musculus                FYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEDDGPKEGLA-----EYIVEF
Rattus norvegicus           FYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWTEEDGPKEGLA-----EYIVEF
Human                       FYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLA-----EYIVEF
Drosophila melanogaster     FYQRMIYEFQNCSEITICPPLPLKELLILSLESRAAGWTPEDEDKAELA-----DGAADI
Saccharomyces cerevisiae    FYQIGLTDFANFGKINLQSTNVSDDIVLYNLLSEFDELN-DDASK---------EKIISK
Caenorhabditis elegans      FYQISVFSFGNYGSYRLDE-EPPAIIEILELLGELSTREPNYAAFEVFANVENRFAAEKL
                            ***  : .* * .   :        : :  * .       :                 . 

Mus musculus                LKKKAEMLADYFSVEIDEEGN--------LIGLPLLIDSYVPPLEGLPIFILRLATEVNW
Rattus norvegicus           LKKKAKMLADYFSVEIDEEGN--------LIGLPLLIDSYVPPLEGLPIFILRLATEVNW
Human                       LKKKAEMLADYFSLEIDEEGN--------LIGLPLLIDNYVPPLEGLPIFILRLATEVNW
Drosophila melanogaster     LLKKAPIMREYFGLRISEDGM--------LESLPSLLHQHRPCVAHLPVYLLRLATEVDW
Saccharomyces cerevisiae    IWDMSSMLNEYYSIELVNDGLDNDLKSVKLKSLPLLLKGYIPSLVKLPFFIYRLGKEVDW
Caenorhabditis elegans      LAEHADLLHDYFAIKLDQLENGR----LHITEIPSLVHYFVPQLEKLPFLIATLVLNVDY
                            : . : :: :*:.:.: :           :  :* *:. . * :  **. :  *  :*::

Mus musculus                DEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWT--VEHIIYKAF
Rattus norvegicus           DEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWT--VEHIIYKAF
Human                       DEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWT--VEHIVYKAL
Drosophila melanogaster     EQETRCFETFCRETARFY--------------AQLDWREGATAVFSRWT--MEHVLFPAF
Saccharomyces cerevisiae    EDEQECLDGILREIALLYIPDMVPKVDTLDASLSEDEKAQFINRKEHISSLLEHVLFPCI
Caenorhabditis elegans      DDEQNTFRTICRAIGDLFTLDTN---------FITLDKKISAFSATPWKTLIKEVLMPLV
                            ::* . :  : :  . ::                              .  ::.::   .

Mus musculus                RSHLLPPKHFTEDGNVLQLANLPDLYKVFERC--
Rattus norvegicus           RSHLLPPKHFTEDGNVLQLANLPDLCKVFERC--
Human                       RSHILPPKHFTEDGNILQLANLPDLYKVFERC--
Drosophila melanogaster     KKYLLPPR---IKDQIYELTNLPTLYKVFERC--
Saccharomyces cerevisiae    KRRFLAPRHILKD--VVEIANLPDLYKVFERC--
Caenorhabditis elegans      KRKFIPPEHFKQAGVIRQLADSHDLYKVFERCGT
                            :  ::.*.       : ::::   * ******  
* - single, fully conserved residue
: - conservation of strong groups
. - conservation of weak groups
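The three symbols can be recomputed from the aligned rows, which is a handy check on how to read the consensus line. The sketch below assumes the strong and weak residue groups as I recall them from the ClustalW documentation; verify them against your version. The names `column_symbol` and `consensus_line` are made up for this example.

```python
# Residue groups, assumed from the ClustalW documentation (check your version).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "          # any gap: no conservation symbol
    if len(residues) == 1:
        return "*"          # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG):
        return ":"          # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."          # all residues fall in one weak group
    return " "

def consensus_line(aligned):
    """Build the consensus line for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_symbol(col) for col in zip(*aligned))

# First ten columns of the second alignment block above.
block = [
    "STNIQVVVKE",  # Mus musculus
    "STNIQVIVRE",  # Rattus norvegicus
    "STSIQVIVKE",  # Human
    "STHIQVQVKA",  # Drosophila melanogaster
    "ATMIDILVKE",  # Saccharomyces cerevisiae
    "ATEIMVNMQN",  # Caenorhabditis elegans
]
print(repr(consensus_line(block)))
```

The printed string can be compared with the first ten symbols under that block in the CLUSTALW output.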


3. Run the BOXSHADE program to get a color-coded plot of the alignment from question 2.
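The idea behind the shading can be imitated in plain text: residues that agree with the most common residue in a column are highlighted when that residue is frequent enough. The sketch below is a rough imitation under that assumption, not BOXSHADE's actual algorithm; the function `shade`, the 0.5 threshold, and the use of upper/lower case instead of colors are all choices made for this example.

```python
from collections import Counter

def shade(aligned, threshold=0.5):
    """Uppercase residues matching a column's majority residue, lowercase the rest."""
    n = len(aligned)
    rows = [[] for _ in aligned]
    for column in zip(*aligned):
        # Count residues in this column, ignoring gaps.
        counts = Counter(r for r in column if r != "-")
        top, top_n = counts.most_common(1)[0] if counts else ("", 0)
        conserved = top_n / n >= threshold
        for row, residue in zip(rows, column):
            if conserved and residue == top:
                row.append(residue.upper())   # "shaded"
            else:
                row.append(residue.lower())   # "unshaded" (gaps stay '-')
    return ["".join(row) for row in rows]

# First five columns of four rows from the second alignment block in question 2.
fragment = ["STNIQ", "STNIQ", "STSIQ", "ATMID"]
for line in shade(fragment):
    print(line)
```

Lowercase letters mark the positions that a color plot would leave unshaded.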



4. Draw a rooted phylogenetic tree for these proteins.

GENPEPT_7595954 Mus musculus MutL homolog 1 protein (MLH1) mRNA, complete cds
GENPEPT_1724118 Rattus norvegicus mismatch repair protein (MLH1) mRNA, complete
GENPEPT_463989   Human DNA mismatch repair protein homolog (hMLH1) mRNA, complete
GENPEPT_3192877 Drosophila melanogaster mutL homolog (Mlh1) gene, complete cds
GENPEPT_460627   Saccharomyces cerevisiae DNA mismatch repair (MLH1) gene, complete
GENPEPT_3880333 Caenorhabditis elegans cosmid T28A8, complete sequence
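For reference, a rooted tree can also be built by hand from an alignment. The sketch below uses UPGMA on simple p-distances because it is one of the simplest methods that yields a rooted tree; this is an assumption for illustration and not necessarily the method the Workbench applies. The function names and the toy sequences are made up, and the output is a Newick string with topology only (no branch lengths).

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(names, aligned):
    """Return a Newick string (topology only) for a rooted UPGMA tree."""
    # cluster id -> (Newick label, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): p_distance(aligned[i], aligned[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)        # closest pair of clusters
        i, j = sorted(pair)
        (label_i, n_i), (label_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Size-weighted average distance from the merged cluster to k.
            d = (n_i * dist.pop(frozenset((i, k)))
                 + n_j * dist.pop(frozenset((j, k)))) / (n_i + n_j)
            dist[frozenset((next_id, k))] = d
        del dist[pair]
        clusters[next_id] = (f"({label_i},{label_j})", n_i + n_j)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"

# Toy input for illustration; replace with the six aligned MLH1 rows.
print(upgma(["A", "B", "C"], ["AAAA", "AAAT", "TTTT"]))
```

Feeding in the full aligned rows from question 2, with the species names as labels, gives a Newick string that most tree viewers can draw.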