Multiple Sequence Alignment of the MLH1 Protein
- To perform a multiple sequence alignment in Biology Workbench
- To draw a phylogenetic tree from the alignment
1. Add the MLH1_HUMAN protein to Biology Workbench. Predict its secondary structure with GOR4.
>MLH1_HUMAN
Sequence MSFVAGVIRR LDETVVNRIA AGEVIQRPAN AIKEMIENCL DAKSTSIQVI
Structure CCCCEEEEEE CCHHHHHHHH HCHHHHHHHH HHHHHHHHHC CCCCCCHHHH
Sequence VKEGGLKLIQ IQDNGTGIRK EDLDIVCERF TTSKLQSFED LASISTYGFR
Structure HHHCCCEEEE ECCCCCCCHH HHHHHHHCCC CCCCCCCCHH HHHHCCCCCC
Sequence GEALASISHV AHVTITTKTA DGKCAYRASY SDGKLKAPPK PCAGNQGTQI
Structure CCHHHHHHHE EEEEEEECCC CCCCEEEECC CCCCCCCCCC CCCCCCCCEE
Sequence TVEDLFYNIA TRRKALKNPS EEYGKILEVV GRYSVHNAGI SFSVKKQGET
Structure EEHHHHHHHH HHHHHHCCCC HHHHHEEEEE ECCCCCCCCE EEEECCCCCE
Sequence VADVRTLPNA STVDNIRSIF GNAVSRELIE IGCEDKTLAF KMNGYISNAN
Structure EEEEEECCCC CCCCCEEEEC CCCCCHHHHH HCCCHHHHHH CCCCCEECCC
Sequence YSVKKCIFLL FINHRLVEST SLRKAIETVY AAYLPKNTHP FLYLSLEISP
Structure CCCCCEEEEE ECCCCHHHHH HHHHHHHHHH HHCCCCCCCC EEEECCCCCC
Sequence QNVDVNVHPT KHEVHFLHEE SILERVQQHI ESKLLGSNSS RMYFTQTLLP
Structure CCCCEEECCC CCHHHHHHHH HHHHHHHHHH HHHHHCCCCC CEEEEEEECC
Sequence GLAGPSGEMV KSTTSLTSSS TSGSSDKVYA HQMVRTDSRE QKLDAFLQPL
Structure CCCCCCCCEE EEEEEEEEEC CCCCCCHHHH HHHHHHHHHH HHHHHHHCCC
Sequence SKPLSSQPQA IVTEDKTDIS SGRARQQDEE MLELPAPAEV AAKNQSLEGD
Structure CCCCCCCCCE EECCCCCCHH HHHHHHHHHH HHHCCCHHHH HHHHHCCCCC
Sequence TTKGTSEMSE KRGPTSSNPR KRHREDSDVE MVEDDSRKEM TAACTPRRRI
Structure CCCCCCHHHC CCCCCCCCCC CCCCCCCCHH HHHHHHHHHH HHHCCCCCEE
Sequence INLTSVLSLQ EEINEQGHEV LREMLHNHSF VGCVNPQWAL AQHQTKLYLL
Structure ECCCCHHHHH HHHHHHHHHH HHHHHCCCCE EEEECCCCHH HHHHHHHHHH
Sequence NTTKLSEELF YQILIYDFAN FGVLRLSEPA PLFDLAMLAL DSPESGWTEE
Structure HCCCCHHHHH HHHHHHCCCC CCEECCCCCC CHHHHHHHHC CCCCCCCCCC
Sequence DGPKEGLAEY IVEFLKKKAE MLADYFSLEI DEEGNLIGLP LLIDNYVPPL
Structure CCCCCCHHHH HHHHHHHHHH HHHHHHHHHH HHCCCCCCCC EEECCCCCCC
Sequence EGLPIFILRL ATEVNWDEEK ECFESLSKEC AMFYSIRKQY ISEESTLSGQ
Structure CCCCHHHHHH HHHHCHHHHH HCCCCCCCCC HHHHHCCCCC CCHHHHCCCC
Sequence QSEVPGSIPN SWKWTVEHIV YKALRSHILP PKHFTEDGNI LQLANLPDLY
Structure CCCCCCCCCC CCCEEEECHH HHHHHCCCCC CCCCCCCCHH HHHHCCCCCE
Sequence KVFERC
Structure EEEEEC
LEGEND:
Alpha Helix = H, Beta Sheet = E, Random Coil = C
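The prediction above can be summarized numerically. The following is a minimal Python sketch, not part of Biology Workbench or GOR4: it takes the first 50 predicted states from the output above and reports the fraction of each state and the runs of consecutive identical states. The function names `state_fractions` and `segments` are made up for this illustration.

```python
from collections import Counter
from itertools import groupby

# First 50 predicted states copied from the GOR4 output above.
# The spaces only group the string into tens and carry no meaning.
structure = "CCCCEEEEEE CCHHHHHHHH HCHHHHHHHH HHHHHHHHHC CCCCCCHHHH"


def state_fractions(pred):
    """Fraction of residues predicted as helix (H), sheet (E) and coil (C)."""
    states = pred.replace(" ", "")
    counts = Counter(states)
    return {s: counts[s] / len(states) for s in "HEC"}


def segments(pred):
    """Runs of identical states as (state, start, end), 1-based and inclusive."""
    runs, pos = [], 1
    for state, group in groupby(pred.replace(" ", "")):
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


print(state_fractions(structure))
for state, start, end in segments(structure):
    print(f"{state} {start}-{end}")
```

For these 50 residues the sketch reports 60% helix, 12% sheet and 28% coil. Pasting the full Structure string in place of the excerpt gives the figures for the whole protein.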
2. Run a homology search for MLH1_HUMAN against the GenPept Full Release
database. Import the MLH1-like proteins of C. elegans, S. cerevisiae, D. melanogaster,
R. norvegicus and M. musculus into your workbench. Run CLUSTALW to produce a multiple
sequence alignment of these six proteins.
Consensus key (see documentation for details)
* - single, fully conserved residue
: - conservation of strong groups
. - conservation of weak groups
(blank) - no consensus
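The key can be turned into a small rule for a single alignment column. The Python sketch below is an illustration, not the CLUSTAL W source code: the strong and weak residue groups are the ones given in the ClustalW documentation, and treating any column that contains a gap as "no consensus" is an assumption made here. The name `consensus_symbol` is hypothetical. The test rows are columns 25-36 of the first block of the CLUSTAL W alignment for step 2.

```python
# Residue groups as listed in the ClustalW documentation; check them against
# the documentation of the version you run before relying on them.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def consensus_symbol(column):
    """Return '*', ':', '.' or ' ' for one column of aligned residues."""
    residues = set(column)
    if "-" in residues:          # assumption: a gap means no consensus
        return " "
    if len(residues) == 1:       # single, fully conserved residue
        return "*"
    if any(residues <= set(group) for group in STRONG):
        return ":"
    if any(residues <= set(group) for group in WEAK):
        return "."
    return " "


# Columns 25-36 of the first alignment block, one string per sequence.
rows = [
    "IKALDASVVNKI",  # GENPEPT_460627
    "IKALDASVVNKI",  # GENPEPT_825572
    "IRRLDETVVNRI",  # GENPEPT_7595954
    "IRRLDETVVNRI",  # GENPEPT_1724118
    "IRRLDETVVNRI",  # MLH1_HUMAN
    "IRKLDEVVVNRI",  # GENPEPT_3192877
    "IQRLPQDVVNRM",  # GENPEPT_3880333
]

# zip(*rows) walks the alignment column by column.
consensus = "".join(consensus_symbol("".join(col)) for col in zip(*rows))
print(repr(consensus))
```

The second column (K, R and Q) falls inside the strong group QHRK and is marked `:`, while the third column (A, R and K) fits no group and is left blank.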
CLUSTAL W (1.81) multiple sequence alignment
GENPEPT_460627 --------------------MSLRIKALDASVVNKIAAGEIIISPVNALKEMMENSIDAN
GENPEPT_825572 --------------------MSLRIKALDASVVNKIAAGEIIISPVNALKEMMENSIDAN
GENPEPT_7595954 -----------------MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAK
GENPEPT_1724118 -----------------MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAK
MLH1_HUMAN -----------------MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAK
GENPEPT_3192877 ---------------MAEYLQPGVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQ
GENPEPT_3880333 MWHCGYRTRNCDEFSKIEFSLMGLIQRLPQDVVNRMAAGEVLARPCNAIKELVENSLDAG
*: * ***::****:: * **:**: **.:**
GENPEPT_460627 ATMIDILVKEGGIKVLQITDNGSGINKADLPILCERFTTSKLQKFEDLSQIQTYGFRGEA
GENPEPT_825572 ATMIDILVKEGGIKVLQITDNGSGINKADLPILCERFTTSKLQKFEDLSQIQTYGFRGEA
GENPEPT_7595954 STNIQVVVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEA
GENPEPT_1724118 STNIQVIVREGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEA
MLH1_HUMAN STSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEA
GENPEPT_3192877 STHIQVQVKAGGLKLLQIQDNGTGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEA
GENPEPT_3880333 ATEIMVNMQNGGLKLLQVSDNGKGIEREDFALVCERFATSKLQKFEDLMHMKTYGFRGEA
:* * : :: **:*::*: ***.**.: *: ::****:**** **** : *:******
GENPEPT_460627 LASISHVARVTVTTKVKEDRCAWRVSYAEGKMLESPKPVAGKDGTTILVEDLFFNIPSRL
GENPEPT_825572 LASISHVARVTVTTKVKEDRCAWRVSYAEGKMLESPKPVAGKDGTTILVEDLFFNIPSRL
GENPEPT_7595954 LASISHVAHVTITTKTADGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRR
GENPEPT_1724118 LASISHVAHVTITTKTADGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRK
MLH1_HUMAN LASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRR
GENPEPT_3192877 LASISHVAHLSIQTKTAKEKCGYKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRR
GENPEPT_3880333 LASLSHVAKVNIVSKRADAKCAYQANFLDGKMTADTKPAAGKNGTCITATDLFYNLPTRR
***:****::.: :* . :*.::..: :**: .** **::** * ***:*: *
GENPEPT_460627 RALRSHNDEYSKILDVVGRYAIHSKDIGFSCKKFGDSNYSLSVKPSYTVQDRIRTVFNKS
GENPEPT_825572 RALRSHNDEYSKILDVVGRYAIHSKDIGFSCKKFGDSNYSLSVKPSYTVQDRIRTVFNKS
GENPEPT_7595954 KALKNPSEEYGKILEVVGRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNA
GENPEPT_1724118 KALKNPSEEYGKILEVVGRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNA
MLH1_HUMAN KALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNA
GENPEPT_3192877 QALRSPAEEFQRLSEVLARYAVHNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAA
GENPEPT_3880333 NKMTTHGEEAKMVNDTLLRFAIHRPDVSFALRQ--NQAGDFRTKGDGNFRDVVCNLLGRD
. : . :* : :.: *:::* :.*: :: : . . . : : : .
GENPEPT_460627 VASNLITFHISKVEDLNLESVDGKVCNLNFISKKSIS---------LIFFINNRLVTCDL
GENPEPT_825572 VASNLITFHISKVEDLNLESVDGKVCNLNFISKKSIS---------PIFFINNRLVTCDL
GENPEPT_7595954 VSRELIEVG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESAA
GENPEPT_1724118 VSRELIEVG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESAA
MLH1_HUMAN VSRELIEIG-CEDKTLAFK-MNGYISNANYSVKKCI----------FLLFINHRLVESTS
GENPEPT_3192877 ISKELLEFS-HRDEVYKFE-AECLITQVNYSAKKCQ----------MLLFINQRLVESTA
GENPEPT_3880333 VADTILPLS-LNSTRLKFT-FTGHISKPIASATAAIAQNRKTSRSFFSVFINGRSVRCDI
:: :: . . : : : . . .*** * * .
GENPEPT_460627 LRRALNSVYSNYLPKGFRPFIYLGIVIDPAAVDVNVHPTKREVRFLSQDEIIEKIANQLH
GENPEPT_825572 LRRALNSVYSNYLPKGNRPFIYLGIVIDPAAVDVNVHPTKREVRFLSQDEIIEKIANQLH
GENPEPT_7595954 LRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILQRVQQHIE
GENPEPT_1724118 LKKAIEAVYAAYLPKNTHPFLYLILEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIE
MLH1_HUMAN LRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIE
GENPEPT_3192877 LRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDVNVHPTKHEVHFLYQEEIVDSIKQQVE
GENPEPT_3880333 LKHPIDEVLG--ARQLHAQFCALHLQIDETRIDVNVHPTKNSVIFLEKEEIIEEIRAYFE
*: .:: : . : * : : : :********..* ** ::.*:: : ..
GENPEPT_460627 AELSAIDTSRTFKASSISTNKPESLIPFNDTIESDRNRKSLRQAQVVENSYTTANSQLRK
GENPEPT_825572 AELSAIDTSRTFKASSISTNKPESLIPFNDTIESDRNRKSLRQAQVVENSYTTANSQLRK
GENPEPT_7595954 SKLLGSNSSRMYFTQTLLPGLAG------PSGEAARPTTGVASSSTSGSGDKVYAYQMVR
GENPEPT_1724118 SKLLGSNSSRMYFTQTLLPGLAG------PSGEAVKSTTGIASSSTSGSGDKVHAYQMVR
MLH1_HUMAN SKLLGSNSSRMYFTQTLLPGLAG------PSGEMVKSTTSLTSSSTSGSSDKVYAHQMVR
GENPEPT_3192877 ARLLGSNATRTFYKQLRLPGAP-----------------DLDETQLADKTQRIYPKEMVR
GENPEPT_3880333 KVIGEIFGFEALDVEKPEEEQPD--------IENLVMIPMSQSLKSIEAIRKPDTKPEFK
: . . . . . :
GENPEPT_460627 AKRQENKLVRIDASQAKITSFLSSS--QQFNFEGSSTKRQLSEPKVTNVSHSQEAEKLTL
GENPEPT_825572 AKRQENKLVRIDASQAKITSFLSSS--QQFNFEGSSTKRQLSEPKVTNVSHSQEAEKLTL
GENPEPT_7595954 TDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTEGSPERATREDEEMLALPAPAEAAAE
GENPEPT_1724118 TDSRDQKLDAFMQPVSRRLPSQPQD--PVPGNRTEGSPEKAMQKDQEISELPAPMEAAAD
MLH1_HUMAN TDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK
GENPEPT_3192877 TDSTEQKLDKFLAPLVK-------------------------------------------
GENPEPT_3880333 SSPSAWKSDKKRVDYMEVRTDAKERKIDEFVTRGGAVGPTTSNDDIFGGSGILKRARTED
:. *
GENPEPT_460627 NESEQPRDANTINDNDLKDQPKKKQKLGDYKVPSIADDEKNALPISKDGYIRVPKERVNV
GENPEPT_825572 NESEQPRDANTINDNDLKDQPKKKQKLGDYKVPSIADDEKNALPISKDGYIRVPKERVNV
GENPEPT_7595954 SENLERESLMETSDAAQKAAPTSSPGSSRKRHREDSDVEMVENASGKEMTAACYPRRRII
GENPEPT_1724118 SASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSDVEMMENDSRKEMTAACYPRRRII
MLH1_HUMAN NQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHREDSDVEMVEDDSRKEMTAACTPRRRII
GENPEPT_3192877 ----------------SDSGVSSSSSQEASRLPEES------------FRVTAAKKSREV
GENPEPT_3880333 STGGEKEPEDLNTDFDDVSMVSLVSTADGRRLNES-----QDLGEDDDVDFEYGKTHREF
: . .
GENPEPT_460627 NLTSIKKLREKVDDSIHRELTDIFANLNYVGVVDEERRLAAIQHDLKLFLIDYGSVCYEL
GENPEPT_825572 NLTSIKKLREKVDDSIHRELTDIFANLNYVGVVDEERRLAAIQHDLKLFLIDYGSVCYEL
GENPEPT_7595954 NLTSVLSLQEEISERCHETLREILRNHSFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
GENPEPT_1724118 NLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
MLH1_HUMAN NLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW--ALAQHQTKLYLLNTTKLSEEL
GENPEPT_3192877 RLSSVLDMRKRVERQCSVQLRSTLKNLVYVGCVDERR--ALFQHETRLYMCNTRSFSEEL
GENPEPT_3880333 HFESIEVLRKEIIANSSQSLREMFKTSTFVGSINVKQ--VLIQFGTSLYHLDFSTVLREF
.: *: :::.: * . : . :** :: . . *. *: : .. *:
GENPEPT_460627 FYQIGLTDFANFGKINLQSTNVSDDIVLYNLLSEFDELN-DDASK---------EKIISK
GENPEPT_825572 FYQIGLTDFANFGKINLQSTNVSDDIVLYNLLSEFDELN-DDASK---------EKIISK
GENPEPT_7595954 FYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEDDGPKEGLA-----EYIVEF
GENPEPT_1724118 FYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWTEEDGPKEGLA-----EYIVEF
MLH1_HUMAN FYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLA-----EYIVEF
GENPEPT_3192877 FYQRMIYEFQNCSEITICPPLPLKELLILSLESRAAGWTPEDEDKAELA-----DGAADI
GENPEPT_3880333 FYQISVFSFGNYGSYRLDE-EPPAIIEILELLGELSTREPNYAAFEVFANVENRFAAEKL
*** : .* * . : : : * . : .
GENPEPT_460627 IWDMSSMLNEYYSIELVNDGLDNDLKSVKLKSLPLLLKGYIPSLVKLPFFIYRLGKEVDW
GENPEPT_825572 IWDMSSMLNEYYSIELVNDGLDNDLKSVKLKSLPLLLKGYIPSLVKLPFFIYRLGKEVDW
GENPEPT_7595954 LKKKAEMLADYFSVEIDEEGN--------LIGLPLLIDSYVPPLEGLPIFILRLATEVNW
GENPEPT_1724118 LKKKAKMLADYFSVEIDEEGN--------LIGLPLLIDSYVPPLEGLPIFILRLATEVNW
MLH1_HUMAN LKKKAEMLADYFSLEIDEEGN--------LIGLPLLIDNYVPPLEGLPIFILRLATEVNW
GENPEPT_3192877 LLKKAPIMREYFGLRISEDGM--------LESLPSLLHQHRPCVAHLPVYLLRLATEVDW
GENPEPT_3880333 LAEHADLLHDYFAIKLDQLENGR----LHITEIPSLVHYFVPQLEKLPFLIATLVLNVDY
: . : :: :*:.:.: : : :* *:. . * : **. : * :*::
GENPEPT_460627 EDEQECLDGILREIALLYIPDMVPKVDTLDASLSEDEKAQFINRKEHISSLLEHVLFPCI
GENPEPT_825572 EDEQECLDGILREIALLYIPDMVPKVDTSDASLSEDEKAQFINRKEHISSLLEHVLFPCI
GENPEPT_7595954 DEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWT--VEHIIYKAF
GENPEPT_1724118 DEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWT--VEHIIYKAF
MLH1_HUMAN DEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWT--VEHIVYKAL
GENPEPT_3192877 EQETRCFETFCRETARFY--------------AQLDWREGATAVFSRWT--MEHVLFPAF
GENPEPT_3880333 DDEQNTFRTICRAIGDLFTLDTN---------FITLDKKISAFSATPWKTLIKEVLMPLV
::* . : : : . :: . ::.:: .
GENPEPT_460627 KRRFLAPRHILKD--VVEIANLPDLYKVFERC--
GENPEPT_825572 KRRFLAPRHILKD--VVEIANLPDLYKVFERC--
GENPEPT_7595954 RSHLLPPKHFTEDGNVLQLANLPDLYKVFERC--
GENPEPT_1724118 RSHLLPPKHFTEDGNVLQLANLPDLCKVFERC--
MLH1_HUMAN RSHILPPKHFTEDGNILQLANLPDLYKVFERC--
GENPEPT_3192877 KKYLLPPR---IKDQIYELTNLPTLYKVFERC--
GENPEPT_3880333 KRKFIPPEHFKQAGVIRQLADSHDLYKVFERCGT
: ::.*. : :::: * ******
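CLUSTAL W prints the alignment in interleaved blocks of 60 columns, so each sequence has to be reassembled before it can be analyzed further. The Python sketch below shows one way to do this under simple assumptions: every sequence line consists of a name and a chunk separated by whitespace, and lines made only of `*`, `:` and `.` are consensus lines. The helper name `parse_clustal` is hypothetical, and the excerpt is the last two blocks of the alignment above for two of the sequences.

```python
# Last two blocks of the alignment above, restricted to two sequences.
excerpt = """\
CLUSTAL W (1.81) multiple sequence alignment
GENPEPT_7595954 DEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWT--VEHIIYKAF
MLH1_HUMAN DEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWT--VEHIVYKAL
::* . : : : . :: . ::.:: .
GENPEPT_7595954 RSHLLPPKHFTEDGNVLQLANLPDLYKVFERC--
MLH1_HUMAN RSHILPPKHFTEDGNILQLANLPDLYKVFERC--
: ::.*. : :::: * ******
"""


def parse_clustal(text):
    """Join the interleaved chunks of each sequence into one aligned string."""
    chunks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:              # header, blank or long consensus line
            continue
        name, chunk = parts
        if set(name) <= set("*:."):      # short consensus line such as ":. *"
            continue
        chunks.setdefault(name, []).append(chunk)
    return {name: "".join(parts) for name, parts in chunks.items()}


aligned = parse_clustal(excerpt)
for name, seq in aligned.items():
    print(name, len(seq), seq.replace("-", "")[-16:])
```

Removing the gap characters from the reassembled MLH1_HUMAN row gives back the C-terminal end of the sequence listed in step 1, which is a quick check that no block was lost or duplicated.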
3. Run the BOXSHADE program to produce a color-coded plot of the
alignment from step 2.
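The idea behind a shaded plot can be imitated in plain text. The Python sketch below is not BOXSHADE and does not reproduce its output: it assumes a simple rule in which a column is shaded when its most common residue is present in at least a given fraction of the sequences, and it shows "shaded" residues in upper case and all others in lower case. Similar (as opposed to identical) residues are ignored, and ties are resolved in favor of the residue seen first. The function name `shade` and the 0.5 threshold are choices made for this illustration. The input is columns 19-26 of the last alignment block for four of the sequences.

```python
from collections import Counter

# Columns 19-26 of the last alignment block for four of the sequences.
rows = {
    "GENPEPT_460627": "IANLPDLY",
    "MLH1_HUMAN": "LANLPDLY",
    "GENPEPT_3192877": "LTNLPTLY",
    "GENPEPT_3880333": "LADSHDLY",
}


def shade(seqs, threshold=0.5):
    """Upper-case residues that match a sufficiently common column residue."""
    shaded = [[] for _ in seqs]
    for column in zip(*seqs):
        counts = Counter(res for res in column if res != "-")
        top, n = counts.most_common(1)[0] if counts else ("", 0)
        conserved = n / len(column) >= threshold
        for out, res in zip(shaded, column):
            out.append(res.upper() if conserved and res == top else res.lower())
    return ["".join(chars) for chars in shaded]


for name, line in zip(rows, shade(list(rows.values()))):
    print(f"{name:<16} {line}")
```

In the printed result the last two columns (L and Y) stay fully upper case because all four sequences agree there, while the residues that deviate from the column majority appear in lower case.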
4. Draw a rooted phylogenetic tree for these proteins.
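To make the step from alignment to rooted tree concrete, the Python sketch below builds a tree with UPGMA, a simple clustering method that always yields a rooted tree. It is used here only as an illustration and is not necessarily the method Biology Workbench applies. The distances are uncorrected p-distances (fraction of differing residues over columns without gaps) computed from the last block of the alignment in step 2, with one of the two identical rows left out. Thirty-odd columns are far too few for a reliable tree, branch lengths are omitted, and the names `p_distance` and `upgma` are hypothetical.

```python
from itertools import combinations

# Last block of the alignment from step 2 (GENPEPT_825572 is identical to
# GENPEPT_460627 in this block and is left out).
rows = {
    "GENPEPT_460627": "KRRFLAPRHILKD--VVEIANLPDLYKVFERC--",
    "GENPEPT_7595954": "RSHLLPPKHFTEDGNVLQLANLPDLYKVFERC--",
    "GENPEPT_1724118": "RSHLLPPKHFTEDGNVLQLANLPDLCKVFERC--",
    "MLH1_HUMAN": "RSHILPPKHFTEDGNILQLANLPDLYKVFERC--",
    "GENPEPT_3192877": "KKYLLPPR---IKDQIYELTNLPTLYKVFERC--",
    "GENPEPT_3880333": "KRKFIPPEHFKQAGVIRQLADSHDLYKVFERCGT",
}


def p_distance(a, b):
    """Fraction of mismatches over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Return a rooted tree (Newick topology, no branch lengths)."""
    sizes = {name: 1 for name in seqs}   # cluster label -> number of leaves
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the new cluster is the size-weighted average.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))]
                + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


tree = upgma(rows)
print(tree)
```

On this excerpt GENPEPT_7595954 and GENPEPT_1724118 are joined first and MLH1_HUMAN is added to them next, since these three rows differ from each other in at most three columns. The printed Newick string can be pasted into any tree viewer that accepts that format.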