1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to assignment 2.7) sequences.
No significant similarity was found.
2. Translate the above two gene sequences to protein sequences.
MLH1 protein sequence.
mutS protein sequence.
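The translation step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and that the input starts in frame at the first codon; the function name `translate` and the short test fragment are hypothetical, chosen only so that the output matches the first six residues (MSFVAG) of the human MLH1 query shown in the alignments for question 4.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA (or RNA) coding sequence, reading frame 1 only."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases (e.g. N) become "X".
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Illustrative fragment only (not taken from the assignment answers).
fragment = "ATGTCGTTCGTGGCAGGGTAA"
print(translate(fragment))
```

Trailing bases that do not fill a complete codon are ignored, and translation stops at the first stop codon unless `to_stop=False` is passed.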
3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest-scoring hits.
ref|NP_000240.1| mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens]
gb|AAA17374.1| (U07418) human homolog of E. coli mutL gene product
gb|AAA85687.1| (U17857) hMLH1 gene product [Homo sapiens]
sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1).
ref|NP_112315.1| mismatch repair protein [Rattus norvegicus]
dbj|BAB23172.1| (AK004105) putative [Mus musculus]
gb|AAH05833.1|AAH05833 (BC005833) Similar to mutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2) [Homo sapiens].
gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster].
gb|AAC19117.1| (AF068257) mutL homolog [Drosophila melanogaster].
ref|NP_192653.1| MLH1 protein [Arabidopsis thaliana]
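Each hit line above follows the pattern database|accession|description, usually ending with the organism in square brackets. The sketch below splits such a line into its parts; it assumes only that layout, and the names `parse_hit`, `HIT_RE`, and `ORG_RE` are hypothetical helpers, not part of any BLAST tool.

```python
import re

# db|accession|rest-of-line, as in the hit list above.
HIT_RE = re.compile(r"^(?P<db>[a-z]+)\|(?P<acc>[^|]+)\|\s*(?P<desc>.*)$")
# Optional trailing "[Organism name]", possibly followed by a period.
ORG_RE = re.compile(r"\[([^\]]+)\]\s*\.?\s*$")


def parse_hit(line):
    """Split one hit line into database tag, accession, description, organism."""
    m = HIT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a db|accession| line: {line!r}")
    desc = m.group("desc").strip()
    org = ORG_RE.search(desc)
    return {
        "db": m.group("db"),
        "accession": m.group("acc"),
        "description": desc,
        # None when the line carries no bracketed organism (e.g. the sp| entry).
        "organism": org.group(1).strip() if org else None,
    }


hits = [
    "ref|NP_000240.1| mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens]",
    "sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1).",
    "gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster].",
]
for hit in hits:
    parsed = parse_hit(hit)
    print(parsed["db"], parsed["accession"], parsed["organism"])
```

Grouping the parsed hits by organism makes it easy to see that the top ten cover human, mouse, rat, fruit fly, and Arabidopsis.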
4. Compare human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster.
Give the pairwise alignments and the % sequence similarity.
M. musculus
sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1)
gb|AAF64514.1|AF250844_1 (AF250844) MutL homolog 1 protein [Mus musculus]
Length = 760
Score = 1329 bits (3440), Expect = 0.0
Identities = 670/760 (88%), Positives = 714/760 (93%), Gaps = 4/760 (0%)
Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
M+FVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKST+IQV+VKEGGLKLIQ
Sbjct: 1 MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTNIQVVVKEGGLKLIQ 60
Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLASISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TRRKALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV 180
Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240
Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
KMNGYISNANYSVKKCIFLLFINHRLVES +LRKAIETVYAAYLPKNTHPFLYLSLEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALRKAIETVYAAYLPKNTHPFLYLSLEISP 300
Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
QNVDVNVHPTKHEVHFLHEESIL+RVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILQRVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAA 360
Query: 361 KSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTD 418
+ TT + SSSTSGS DKVYA+QMVRTDSR+QKLDAFLQP+S + SQPQ A V +T+
Sbjct: 361 RPTTGVASSSTSGSGDKVYAYQMVRTDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTE 420
Query: 419 ISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHRED 476
S RA ++DEEML LPAPAE AA++++LE ++ TS+ ++K PTSS + RKRHRED
Sbjct: 421 GSPERATREDEEMLALPAPAEAAAESENLERESLMETSDAAQKAAPTSSPGSSRKRHRED 480
Query: 477 SDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNP 536
SDVEMVE+ S KEMTAAC PRRRIINLTSVLSLQEEI+E+ HE LRE+L NHSFVGCVNP
Sbjct: 481 SDVEMVENASGKEMTAACYPRRRIINLTSVLSLQEEISERCHETLREILRNHSFVGCVNP 540
Query: 537 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 596
QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
Sbjct: 541 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 600
Query: 597 WTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIF 656
WTE+DGPKEGLAEYIVEFLKKKAEMLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIF
Sbjct: 601 WTEDDGPKEGLAEYIVEFLKKKAEMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIF 660
Query: 657 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTV 716
ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI EESTLSGQQS++PGS WKWTV
Sbjct: 661 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWTV 720
Query: 717 EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
EHI+YKA RSH+LPPKHFTEDGN+LQLANLPDLYKVFERC
Sbjct: 721 EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC 760
R. norvegicus
ref|NP_112315.1| mismatch repair protein [Rattus norvegicus]
sp|P97679|MLH1_RAT DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1)
gb|AAB38506.1| (U80054) mismatch repair protein [Rattus norvegicus]
Length = 757
Score = 1306 bits (3380), Expect = 0.0
Identities = 659/758 (86%), Positives = 707/758 (92%), Gaps = 3/758 (0%)
Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEM ENCLDAKST+IQVIV+EGGLKLIQ
Sbjct: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAKSTNIQVIVREGGLKLIQ 60
Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLA ISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEALASISHVAHVTITTKTA 120
Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TR+KALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRKKALKNPSEEYGKILEVV 180
Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240
Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
KMNGYISNANYSVKKCIFLLFINHRLVES +L+KAIE VYAAYLPKNTHPFLYL LEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALKKAIEAVYAAYLPKNTHPFLYLILEISP 300
Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE V
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAV 360
Query: 361 KSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS 420
KSTT + SSSTSGS DKV+A+QMVRTDSR+QKLDAF+QP+S+ L SQPQ V ++T+ S
Sbjct: 361 KSTTGIASSSTSGSGDKVHAYQMVRTDSRDQKLDAFMQPVSRRLPSQPQDPVPGNRTEGS 420
Query: 421 SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSE-MSEKRGPTS-SNPRKRHREDSD 478
+A Q+D+E+ ELPAP E AA + SLE ++ G SE ++ +R P+S + RKRH EDSD
Sbjct: 421 PEKAMQKDQEISELPAPMEAAADSASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSD 480
Query: 479 VEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW 538
VEM+E+DSRKEMTAAC PRRRIINLTSVLSLQEEIN++GHE LREML NH+FVGCVNPQW
Sbjct: 481 VEMMENDSRKEMTAACYPRRRIINLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW 540
Query: 539 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWT 598
ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRL EPAPLFD AMLALDSPESGWT
Sbjct: 541 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWT 600
Query: 599 EEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFIL 658
EEDGPKEGLAEYIVEFLKKKA+MLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIFIL
Sbjct: 601 EEDGPKEGLAEYIVEFLKKKAKMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIFIL 660
Query: 659 RLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEH 718
RLATEVNWDEE ECFESLSKECA+FYSIRKQYI EES LSGQQS++PGS WKWTVEH
Sbjct: 661 RLATEVNWDEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWTVEH 719
Query: 719 IVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
I+YKA RSH+LPPKHFTEDGN+LQLANLPDL KVFERC
Sbjct: 720 IIYKAFRSHLLPPKHFTEDGNVLQLANLPDLCKVFERC 757
D. melanogaster
gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]
Length = 664
Score = 644 bits (1662), Expect = 0.0
Identities = 345/751 (45%), Positives = 472/751 (61%), Gaps = 94/751 (12%)
Query: 6 GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG 65
GVIR+LDE VVNRIAAGE+IQRPANA+KE++EN LDA+ST IQV VK GGLKL+QIQDNG
Sbjct: 8 GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG 67
Query: 66 TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA 125
TGIR+EDL IVCERFTTSKL FEDL+ I+T+GFRGEALASISHVAH++I TKTA KC
Sbjct: 68 TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG 127
Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV 185
Y+A+Y+DGKL+ PKPCAGNQGT I +EDLFYN+ RR+AL++P+EE+ ++ EV+ RY+V
Sbjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV 187
Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY 245
HN + F+++KQG+ +RT +S +NIR I+G A+S+EL+E D+ F+
Sbjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL 247
Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV 305
I+ NYS KKC LLFIN RLVEST+LR +++++YA YLP+ HPF+Y+SL + PQN+DV
Sbjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV 307
Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTS 365
NVHPTKHEVHFL++E I++ ++Q +E++LLGSN++R ++ Q LPG
Sbjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG-----------APD 356
Query: 366 LTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR 425
L + + + ++Y +MVRTDS EQKLD FL PL K SG +
Sbjct: 357 LDETQLADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKS----------------DSGVSS 400
Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDD 485
+E LP E T++ + R S ++M
Sbjct: 401 SSSQEASRLP-----------------------EESFRVTAAKKSREVRLSSVLDM---- 433
Query: 486 SRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT 545
RK + C+ + LR L N +VGCV+ + AL QH+T
Sbjct: 434 -RKRVERQCSVQ-----------------------LRSTLKNLVYVGCVDERRALFQHET 469
Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE 605
+LY+ NT SEELFYQ +IY+F N + +S P PL +L +L+L+S +GWT EDG K
Sbjct: 470 RLYMCNTRSFSEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA 529
Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN 665
LA+ + L KKA ++ +YF L I E+G L LP L+ + P + LP+++LRLATEV+
Sbjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD 589
Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR 725
W++E CFE+ +E A FY+ Q G+ +WT+EH+++ A +
Sbjct: 590 WEQETRCFETFCRETARFYA--------------QLDWREGATAGFSRWTMEHVLFPAFK 635
Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
++LPP + I +L NLP LYKVFERC
Sbjct: 636 KYLLPPPRIKD--QIYELTNLPTLYKVFERC 664
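The "Identities" and "Gaps" figures in the summary lines can be recomputed from any aligned Query/Sbjct pair. The sketch below is a minimal illustration, not BLAST's own code: `alignment_stats` and `percent` are hypothetical helper names, and the example uses only the last block of the D. melanogaster alignment above (Query 726-756, Sbjct 636-664).

```python
def alignment_stats(query, sbjct):
    """Return (identities, alignment length, gap columns) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    # A column is an identity when both rows carry the same residue.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    # A column is a gap column when either row carries "-".
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, len(query), gaps


def percent(n, total):
    """Percentage as a float; the summary lines above show whole numbers."""
    return 100.0 * n / total


# Last block of the human vs. D. melanogaster alignment.
query = "SHILPPKHFTEDGNILQLANLPDLYKVFERC"
sbjct = "KYLLPPPRIKD--QIYELTNLPTLYKVFERC"

identities, length, gaps = alignment_stats(query, sbjct)
print(identities, length, gaps)

# Whole-alignment identity counts taken from the summary lines above.
for name, n, total in [("mouse", 670, 760), ("rat", 659, 758), ("fly", 345, 751)]:
    print(name, round(percent(n, total), 2))
```

Concatenating all blocks of one alignment and passing them to `alignment_stats` would give the whole-alignment counts. Truncated to whole numbers, the identity percentages computed from the reported counts agree with the 88%, 86%, and 45% shown in the summary lines.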
5. Search for the conserved domain (CD) of MLH1. Give the position of the CD, the name of the CD and the Pfam ID number.
position: 147-325 a.a.
name: DNA_mis_repair, DNA mismatch repair protein. Also known as the mutL/hexB/PMS1 family.
Pfam ID: pfam01119
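Domain positions such as 147-325 are 1-based and inclusive, whereas Python slices are 0-based and end-exclusive. The sketch below shows the conversion; `extract_region` and `region_length` are hypothetical helper names, and the example sequence is only the first 60 residues of the human MLH1 query from question 4, so the domain itself is not sliced here.

```python
def region_length(start, end):
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract_region(seq, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    return seq[start - 1:end]


# Residues 1-60 of the human MLH1 query, copied from the first alignment block.
mlh1_first60 = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ"

print(region_length(147, 325))             # length of the reported CD range
print(extract_region(mlh1_first60, 6, 10))  # residues 6-10 of the query
```

With the full 756-residue protein from question 2, `extract_region(seq, 147, 325)` would return the 179-residue segment that the CD search reports.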
6. Show a multiple alignment of the MLH1 conserved domain with the top 5 sequences from the CD alignment.