1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to
assignment 2.7) sequences.
2. Translate the above two gene sequences to protein sequences.
1 msfvagvirr ldetvvnria ageviqrpan aikemiencl dakstsiqvi vkegglkliq
61 iqdngtgirk edldivcerf ttsklqsfed lasistygfr gealasishv ahvtittkta
121 dgkcayrasy sdgklkappk pcagnqgtqi tvedlfynia trrkalknps eeygkilevv
181 grysvhnagi sfsvkkqget vadvrtlpna stvdnirsif gnavsrelie igcedktlaf
241 kmngyisnan ysvkkcifll finhrlvest slrkaietvy aaylpknthp flylsleisp
301 qnvdvnvhpt khevhflhee silervqqhi eskllgsnss rmyftqtllp glagpsgemv
361 ksttsltsss tsgssdkvya hqmvrtdsre qkldaflqpl skplssqpqa ivtedktdis
421 sgrarqqdee mlelpapaev aaknqslegd ttkgtsemse krgptssnpr krhredsdve
481 mveddsrkem taactprrri inltsvlslq eeineqghev lremlhnhsf vgcvnpqwal
541 aqhqtklyll nttklseelf yqiliydfan fgvlrlsepa plfdlamlal dspesgwtee
601 dgpkeglaey iveflkkkae mladyfslei deegnliglp llidnyvppl eglpifilrl
661 atevnwdeek ecfeslskec amfysirkqy iseestlsgq qsevpgsipn swkwtvehiv
721 ykalrshilp pkhftedgni lqlanlpdly kvferc
MSAIENFDAHTPMMQQYLKLKAQHPEILLFYRMGDFYELFYDDA
KRASQLLDISLTKRGASAGEPIPMAGIPYHAVENYLAKLVNQGESVAICEQIGDPATS
KGPVERKVVRIVTPGTISDEALLQERQDNLLAAIWQDSKGFGYATLDISSGRFRLSEP
ADRETMAAELQRTNPAELLYAEDFAEMSLIEGRRGLRRRPLWEFEIDTARQQLNLQFG
TRDLVGFGVENAPRGLCAAGCLLQYAKDTQRTTLPHIRSITMEREQDSIIMDAATRRN
LEITQNLAGGAENTLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGALQDF
TAELQPVLRQVGDLERILARLALRTARPRDLARMRHAFQQLPELRAQLETVDSAPVQA
LREKMGEFAELRDLLERAIIDTPPVLVRDGGVIASGYNEELDEWRALADGATDYLERL
EVRERERTGLDTLKVGFNAVHGYYIQISRGQSHLAPINYMRRQTLKNAERYIIPELKE
YEDKVLTSKGKALALEKQLYEELFDLLLPHLEALQQSASALAELDVLVNLAERAYTLN
YTCPTFIDKPGIRITEGRHPVVEQVLNEPFIANPLNLSPQRRMLIITGPNMGGKSTYM
RQTALIALMAYIGSYVPAQKVEIGPIDRIFTRVGAADDLASGRSTFMVEMTETANILH
NATEYSLVLMDEIGRGTSTYDGLSLAWACAENLANKIKALTLFATHYFELTQLPEKME
GVANVHLDALEHGDTIAFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELES
ISPNAAATQVDGTQMSLLSVPEETSPAVEALENLDPDSLTPRQALEWIYRLKSLV
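The translation step in item 2 can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and reading frame 1; the function name `translate` and the short test string are hypothetical and are not the actual MLH1 or mutS gene sequences from assignments 2.6 and 2.7.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, stop_at_stop=True):
    """Translate a DNA (or RNA) string in frame 1 to one-letter protein code."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Unknown or ambiguous codons are reported as 'X'.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 18-base example chosen only for illustration.
print(translate("ATGTCGTTCGTGGCAGGG"))
```

For real coding sequences the correct reading frame and start codon must be chosen first; this sketch does not search for them.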
3. Perform a protein sequence homology search for MLH1 in GenBank.
Give the 10 highest-scoring hits.
- Sequences producing significant alignments, followed by Score (bits)
and E value:
(1) gi|13878583|sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEI... 1292 0.0
(2) gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattu... 1289 0.0
(3) gi|4557757|ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ... 1467 0.0
(4) gi|466462|gb|AAA17374.1| (U07418) human homolog of E. coli ... 1466 0.0
(5) gi|604369|gb|AAA85687.1| (U17857) hMLH1 gene product [Homo ... 1453 0.0
(6) gi|12835158|dbj|BAB23172.1| (AK004105) putative [Mus musculus] 753 0.0
(7) gi|13543339|gb|AAH05833.1|AAH05833 (BC005833) Similar to mu... 731 0.0
(8) gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Dro... 615 e-175
(9) gi|3192877|gb|AAC19117.1| (AF068257) mutL homolog [Drosophi... 608 e-173
(10) gi|460627|gb|AAA16835.1| (U07187) Mlh1p [Saccharomyces cere... 471 e-132
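The hits above are listed in the order given in the answer, which is not strictly by bit score. As a small sketch, the ten accessions and scores can be ranked in Python; the accession and score pairs are copied from the list, while the variable names are illustrative only.

```python
# (accession, bit score) pairs copied from the hit list above.
hits = [
    ("Q9JK91", 1292),
    ("NP_112315.1", 1289),
    ("NP_000240.1", 1467),
    ("AAA17374.1", 1466),
    ("AAA85687.1", 1453),
    ("BAB23172.1", 753),
    ("AAH05833.1", 731),
    ("AAF59117.1", 615),
    ("AAC19117.1", 608),
    ("AAA16835.1", 471),
]

# Rank by bit score, highest first.
ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)

for rank, (accession, score) in enumerate(ranked, start=1):
    print(f"{rank:2d}  {accession:<12}  {score}")
```

Ranked this way, the three human entries come first, followed by the mouse and rat MLH1 proteins.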
4. Compare human MLH1 protein with MLH1 in M. musculus, R. norvegicus
and D. melanogaster. Give the pairwise alignment and % of sequence similarity.
Human MLH1 vs. M. musculus MLH1 (hit 1, MLH1_MOUSE):
Score = 1292 bits (3344), Expect = 0.0
Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)
Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ
60
_______ M+FVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKST+IQV+VKEGGLKLIQ
Subjct: 1 MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTNIQVVVKEGGLKLIQ
60
Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA
120
________ IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLASISTYGFRGEALASISHVAHVTITTKTA
Subjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEALASISHVAHVTITTKTA
120
Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV
180
_________ DGKCAYRASYSDGKL+ APPKPCAGNQGT_ ITVEDLFYNI_TRRKALKNPSEEYGKILEVV
Subjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV
180
Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF
240
_________ GRYS+HN+ GISFSVKKQGETV+ DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Subjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF
240
Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP
300
_________ KMNGYISNANYSVKKCIFLLFINHRLVES_+LRKAIETVYAAYLPKNTHPFLYLSLEISP
Subjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALRKAIETVYAAYLPKNTHPFLYLSLEISP
300
Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV
360
_________ QNVDVNVHPTKHEVHFLHEESIL+RVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE
Subjct: 301 QNVDVNVHPTKHEVHFLHEESILQRVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAA
360
Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTD
418
_________ +___________________ DKVYA+ QMVRTDSR+ QKLDAFLQP+S__+_ SQPQ_A_V___
+T+
Subjct: 361 RPTTGVASSSTSGSGDKVYAYQMVRTDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTE
420
Query: 419 ISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHRXX
476
__________S__ RA_ + +DEEML_LPAPAE_ AA+ + ++LE+ +_____ TS+_ ++-PTSS___
_+_ RKRHR
Subjct: 421 GSPERATREDEEMLALPAPAEAAAESENLERESLMETSDAAQKAAPTSSPGSSRKRHRED
480
Query: 477 XXXXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNP
536
________________________ KEMTAAC_PRRRIINLTSVLSLQEEI+ E+_ HE_ LRE + L_NHSFVGCVNP
Sbjct: 481 SDVEMVENASGKEMTAACYPRRRIINLTSVLSLQEEISERCHETLREILRNHSFVGCVNP
540
Query: 537 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
596
_________ QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
Subjct: 541 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
600
Query: 597 WTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIF
656
_________ WTE+DGPKEGLAEYIVEFLKKKAEMLADYFS+EIDEEGNLIGLPLLID +YVPPLEGLPIF
Subjct: 601 WTEDDGPKEGLAEYIVEFLKKKAEMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIF
660
Query: 657 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTV
716
_________ ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI EESTLSGQQS + +PGS____WKWTV
Subjct: 661 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWTV
720
Query: 717 EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
_________ EHI+ YKA_RSH+LPPKHFTEDGN+LQLANLPDLYKVFERC
Subjct: 721 EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC 760
Human MLH1 vs. R. norvegicus MLH1 (hit 2, NP_112315.1):
Score = 1289 bits (3336), Expect = 0.0
Identities = 639/758 (84%), Positives = 684/758 (89%), Gaps = 3/758 (0%)
Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ
60
_______ MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEM ENCLDAKST+IQVIV+ EGGLKLIQ
Subjct: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAKSTNIQVIVREGGLKLIQ
60
Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA
120
________IQDNGTGIRKEDLDIVCERFTTSKLQ+ FEDLA_ISTYGFRGEALASISHVAHVTITTKTA
Subjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEALASISHVAHVTITTKTA
120
Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV
180
_________ DGKCAYRASYSDGKL+ APPKPCAGNQGT_ITVEDLFYNI_ TR+ KALKNPSEEYGKILEVV
Subjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRKKALKNPSEEYGKILEVV
180
Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF
240
_________ GRYS+ HN+ GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Subjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF
240
Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP
300
_________ KMNGYISNANYSVKKCIFLLFINHRLVES_+L+ KAIE_VYAAYLPKNTHPFLYL_LEISP
Subjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALKKAIEAVYAAYLPKNTHPFLYLILEISP
300
Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV
360
_________ QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE _V
Subjct: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAV
360
Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
420
_________K___________________ DKV+ A+ QMVRTDSR+QKLDAF+QP+ S+_L_SQPQ__V__
+ + T+ S
Subjct: 361 KSTTGIASSSTSGSGDKVHAYQMVRTDSRDQKLDAFMQPVSRRLPSQPQDPVPGNRTEGS
420
Query: 421 SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSE-MSEKRGPTS-SNPRKRHRXXXX
478
____________+A_ Q+D+E +_ ELPAP_ E _AA_ +_ SLE_ ++___G_SE_++_ +R_ P+S__+_
RKRH
Subjct: 421 PEKAMQKDQEISELPAPMEAAADSASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSD
480
Query: 479 XXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW
538
____________________ RKEMTAAC PRRRIINLTSVLSLQEEIN + +GHE _LREML_ NH+FVGCVNPQW
Subjct: 481 VEMMENDSRKEMTAACYPRRRIINLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW
540
Query: 539 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWT
598
_________ ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRL EPAPLFD _AMLALDSPESGWT
Subjct: 541 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWT
600
Query: 599 EEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFIL
658
_________ EEDGPKEGLAEYIVEFLKKKA+MLADYFS+EIDEEGNLIGLPLLID+ YVPPLEGLPIFIL
Subjct: 601 EEDGPKEGLAEYIVEFLKKKAKMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIFIL
660
Query: 659 RLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEH
718
_________ RLATEVNWDEE _ECFESLSKECA+ FYSIRKQYI_EES_LSGQQS+ +PGS_WKW___
TVEH
Subjct: 661 RLATEVNWDEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWTVEH
719
Query: 719 IVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
_________ I +YKA_RSH+LPPKHFTEDGN+LQLANLPDL KVFERC
Subjct: 720 IIYKAFRSHLLPPKHFTEDGNVLQLANLPDLCKVFERC 757
Human MLH1 vs. D. melanogaster Mlh1 (hit 8, AAF59117.1):
Score = 615 bits (1586), Expect = e-175
Identities = 335/751 (44%), Positives = 453/751 (59%), Gaps = 94/751 (12%)
Query: 6 GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG
65
_______ GVIR+LDE_ VVNRIAAGE+IQRPANA+KE++EN_ LDA+ST_IQV_VK_GGLKL+QIQDNG
Subjct: 8 GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG
67
Query: 66 TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA
125
________TGIR+ EDL_ IVCERFTTSKL__ FEDL+_ I+T+GFRGEALASISHVAH+ + I_TKTA___KC
Subjct: 68 TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG
127
Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV
185
_________ Y+A+Y+DGKL+ __ PKPCAGNQGT_ I_+EDLFYN+__ RR+ AL+ +P+EE +_ ++_EV+_
RY+V
Subjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV
187
Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY
245
_________ HN__+_ F+++ KQG+_____ +RT____+S___+NIR_I+G_ A+ S+EL+E____D+____F+
Subjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL
247
Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV
305
_________ I+N__ YS_ KKC_ LLFIN_ RLVEST+LR_ ++++ +YA_ YLP+___HPF+Y+SL_+_PQN+DV
Subjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV
307
Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKXXXX
365
_________ NVHPTKHEVHFL+ +E_I++_ + +Q_+E+ +LLGSN++R__++_ Q__LPG____P__+___+
Subjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG--APDLDETQ----
361
Query: 366 XXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR
425
________________________ + + Y__ + MVRTDS_EQKLD_ FL_ PL_K__S_____+___
E__________ A
Sbjct: 362 _________ LADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKSDSGVSSSSSQE___________AS
407
Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRXXXXXXXXXXX
485
___________+_ EE_ +___ A_________________________________________ K+_
R
Subjct: 408 RLPEESFRVTAA------------------------------------------------------------------KKSREVRLSSVLDMR
434
Query: 486 XRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT
545
__________ R_ E____ +___ R__+ NL__ V__+_ E_______HE_____M_+___ SF___________________
Subjct: 435 KRVERQCSVQLRSTLKNLVYVGCVDERRALFQHETRLYMCNTRSF-----------------------------
479
Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE
605
_____________________ SEELFYQ + IY+ F_ N___ +_ +S_P_ PL_+L__+L+ L+ S__
+GWT_EDG K
Sbjct: 480 ---------------------SEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA
529
Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN
665
__________ LA+____+_L_ KKA_ ++_ + YF+L_I_ E+G_ L__LP_ L+__+_ P_+___LP+++LRLATEV+
Subjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD
589
Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR
725
_________ W++ E__ CFE+__+E_ A_ FY+_______________ Q______G+__ ____+WT+EH+
++_ A_+
Subjct: 590 WEQETRCFETFCRETARFYA-------------------------QLDWREGATAGFSRWTMEHVLFPAFK
635
Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
__________+ +LPP____ +____ I_ +L_ NLP_ LYKVFERC
Subjct: 636 KYLLPPPRIKD----QIYELTNLPTLYKVFERC 664
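The percent identity asked for in item 4 comes directly from the "Identities" counts in each alignment header. The sketch below recomputes it for the three alignments; the helper name `identity_percent` is hypothetical, and integer truncation is an assumption chosen because it reproduces the reported values (for example, 651/760 is 85.66%, shown as 85%).

```python
import re

# "Identities" fields copied from the three alignment headers above.
summaries = {
    "M. musculus": "Identities = 651/760 (85%)",
    "R. norvegicus": "Identities = 639/758 (84%)",
    "D. melanogaster": "Identities = 335/751 (44%)",
}


def identity_percent(line):
    """Return (matches, aligned_length, truncated_percent) from a summary line."""
    found = re.search(r"Identities\s*=\s*(\d+)/(\d+)", line)
    if found is None:
        raise ValueError("no Identities field in: " + line)
    matches, length = int(found.group(1)), int(found.group(2))
    # Integer division truncates, which is assumed to match the report.
    return matches, length, matches * 100 // length


for organism, line in summaries.items():
    matches, length, percent = identity_percent(line)
    print(f"{organism:<16} {matches}/{length} = {percent}%")
```

The mouse and rat proteins are about 85% identical to human MLH1 over the full length, while the Drosophila protein is about 44% identical.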
5. Search for the conserved domain (CD) of MLH1. Give the position of the
CD, the name of the CD, and its Pfam ID number.
- Position of CD: residues 147~327, total 179 residues
- Name of CD: DNA_mis_repair, DNA mismatch repair protein. Also known
as the mutL/hexB/PMS1 family.
- Pfam ID: 01119
6. Show a multiple alignment of the MLH1 conserved domain with 5 sequences
from the top of the CD alignment.