Assignment 4


1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to 2.7) sequences.

No significant similarity was found.
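BLAST is a heuristic that approximates Smith-Waterman local alignment and then judges significance with an E-value. The sketch below shows what that underlying local-alignment score looks like for two nucleotide strings. The function name and the match/mismatch/gap values are illustrative assumptions, not BLAST's settings. A raw score on its own does not measure significance, which is why BLAST reports E-values.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            # Local alignment: a cell never drops below 0 (start a new alignment)
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared "ACGT" scores 4 * 2 = 8
```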

2. Translate the two gene sequences above into protein sequences.


    MLH1 protein sequence. mutS protein sequence.
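Translation reads the coding sequence three bases at a time and looks each codon up in the genetic code. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and that the sequence starts at the first base of the reading frame. The names `CODON_TABLE` and `translate` are hypothetical. In practice Biopython's `Seq.translate(to_stop=True)` does the same job.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in T, C, A, G order
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate reading frame 1 of a DNA/RNA string; '*' marks a stop codon."""
    seq = "".join(dna.upper().replace("U", "T").split())  # drop whitespace
    protein = []
    for i in range(0, len(seq) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons (e.g. with N)
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```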

3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest-scoring hits.

ref|NP_000240.1| mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens]

gb|AAA17374.1| (U07418) human homolog of E. coli mutL gene product

gb|AAA85687.1| (U17857) hMLH1 gene product [Homo sapiens]

sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1).

ref|NP_112315.1| mismatch repair protein [Rattus norvegicus]

dbj|BAB23172.1| (AK004105) putative [Mus musculus]

gb|AAH05833.1|AAH05833 (BC005833) Similar to mutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2) [Homo sapiens]

gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]

gb|AAC19117.1| (AF068257) mutL homolog [Drosophila melanogaster]

ref|NP_192653.1| MLH1 protein [Arabidopsis thaliana]
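Each hit above is an NCBI-style definition line: a database tag (`ref`, `gb`, `sp`, `dbj`), an accession, and a free-text description that usually ends with the organism in square brackets. A minimal parsing sketch, assuming that `db|accession|rest` layout. The `Hit` type and `parse_defline` function are hypothetical helpers, and Swiss-Prot (`sp`) lines like the mouse entry carry no bracketed organism.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    db: str
    accession: str
    description: str
    organism: Optional[str]


def parse_defline(line: str) -> Hit:
    """Split 'db|accession|rest' and pull the last [...] group as the organism."""
    db, accession, rest = line.strip().split("|", 2)  # ValueError if fewer than 2 '|'
    organisms = re.findall(r"\[([^\]]+)\]", rest)
    organism = organisms[-1].strip() if organisms else None
    return Hit(db, accession, rest.strip(), organism)


hit = parse_defline("ref|NP_000240.1| mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens]")
print(hit.db, hit.accession, hit.organism)  # ref NP_000240.1 Homo sapiens
```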

4. Compare the human MLH1 protein with MLH1 from M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignments and the percent sequence similarity.

M. musculus

sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1)
gb|AAF64514.1|AF250844_1 (AF250844) MutL homolog 1 protein [Mus musculus]
Length = 760

Score = 1329 bits (3440), Expect = 0.0
Identities = 670/760 (88%), Positives = 714/760 (93%), Gaps = 4/760 (0%)

Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
M+FVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKST+IQV+VKEGGLKLIQ
Sbjct: 1 MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTNIQVVVKEGGLKLIQ 60

Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLASISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TRRKALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
KMNGYISNANYSVKKCIFLLFINHRLVES +LRKAIETVYAAYLPKNTHPFLYLSLEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALRKAIETVYAAYLPKNTHPFLYLSLEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
QNVDVNVHPTKHEVHFLHEESIL+RVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILQRVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAA 360

Query: 361 KSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTD 418
+ TT + SSSTSGS DKVYA+QMVRTDSR+QKLDAFLQP+S + SQPQ A V +T+
Sbjct: 361 RPTTGVASSSTSGSGDKVYAYQMVRTDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTE 420

Query: 419 ISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHRED 476
S RA ++DEEML LPAPAE AA++++LE ++ TS+ ++K PTSS + RKRHRED
Sbjct: 421 GSPERATREDEEMLALPAPAEAAAESENLERESLMETSDAAQKAAPTSSPGSSRKRHRED 480

Query: 477 SDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNP 536
SDVEMVE+ S KEMTAAC PRRRIINLTSVLSLQEEI+E+ HE LRE+L NHSFVGCVNP
Sbjct: 481 SDVEMVENASGKEMTAACYPRRRIINLTSVLSLQEEISERCHETLREILRNHSFVGCVNP 540

Query: 537 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 596
QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
Sbjct: 541 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 600

Query: 597 WTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIF 656
WTE+DGPKEGLAEYIVEFLKKKAEMLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIF
Sbjct: 601 WTEDDGPKEGLAEYIVEFLKKKAEMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIF 660

Query: 657 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTV 716
ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI EESTLSGQQS++PGS WKWTV
Sbjct: 661 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWTV 720

Query: 717 EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
EHI+YKA RSH+LPPKHFTEDGN+LQLANLPDLYKVFERC
Sbjct: 721 EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC 760
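The Identities and Gaps figures in the summary line are counts over the aligned columns, gaps included, divided by the alignment length. The printed percentages appear to be truncated rather than rounded: 714/760 is 93.9% but is shown as 93%, and 659/758 (86.9%) in the rat alignment is shown as 86%. A minimal sketch, assuming two gapped strings of equal length with `-` for gaps. `alignment_stats` is a hypothetical helper. "Positives" is not computed here because it depends on the substitution matrix.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Identity and gap counts for two aligned (gapped) sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    length = len(query)
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        # Integer truncation, assumed to match the percentages printed above
        "pct_identity": 100 * identities // length,
        "pct_gaps": 100 * gaps // length,
    }


print(alignment_stats("MSFV-AG", "MAFVRAG"))
```

Applied to the mouse summary, 100 * 670 // 760 gives the reported 88%.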

R. norvegicus

ref|NP_112315.1| mismatch repair protein [Rattus norvegicus]
sp|P97679|MLH1_RAT DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1)
gb|AAB38506.1| (U80054) mismatch repair protein [Rattus norvegicus]
Length = 757

Score = 1306 bits (3380), Expect = 0.0
Identities = 659/758 (86%), Positives = 707/758 (92%), Gaps = 3/758 (0%)

Query: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEM ENCLDAKST+IQVIV+EGGLKLIQ
Sbjct: 1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAKSTNIQVIVREGGLKLIQ 60

Query: 61 IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLA ISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61 IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TR+KALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRKKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
KMNGYISNANYSVKKCIFLLFINHRLVES +L+KAIE VYAAYLPKNTHPFLYL LEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALKKAIEAVYAAYLPKNTHPFLYLILEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE V
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAV 360

Query: 361 KSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS 420
KSTT + SSSTSGS DKV+A+QMVRTDSR+QKLDAF+QP+S+ L SQPQ V ++T+ S
Sbjct: 361 KSTTGIASSSTSGSGDKVHAYQMVRTDSRDQKLDAFMQPVSRRLPSQPQDPVPGNRTEGS 420

Query: 421 SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSE-MSEKRGPTS-SNPRKRHREDSD 478
+A Q+D+E+ ELPAP E AA + SLE ++ G SE ++ +R P+S + RKRH EDSD
Sbjct: 421 PEKAMQKDQEISELPAPMEAAADSASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSD 480

Query: 479 VEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW 538
VEM+E+DSRKEMTAAC PRRRIINLTSVLSLQEEIN++GHE LREML NH+FVGCVNPQW
Sbjct: 481 VEMMENDSRKEMTAACYPRRRIINLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW 540

Query: 539 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWT 598
ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRL EPAPLFD AMLALDSPESGWT
Sbjct: 541 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWT 600

Query: 599 EEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFIL 658
EEDGPKEGLAEYIVEFLKKKA+MLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIFIL
Sbjct: 601 EEDGPKEGLAEYIVEFLKKKAKMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIFIL 660

Query: 659 RLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEH 718
RLATEVNWDEE ECFESLSKECA+FYSIRKQYI EES LSGQQS++PGS WKWTVEH
Sbjct: 661 RLATEVNWDEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWTVEH 719

Query: 719 IVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
I+YKA RSH+LPPKHFTEDGN+LQLANLPDL KVFERC
Sbjct: 720 IIYKAFRSHLLPPKHFTEDGNVLQLANLPDLCKVFERC 757
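The "Score = N bits (raw)" line gives both the raw alignment score (in parentheses) and its normalized bit score, S' = (λS − ln K) / ln 2. The Karlin-Altschul parameters depend on the scoring system, which this excerpt does not show. The sketch assumes λ = 0.267 and K = 0.041, values commonly used for BLOSUM62 with gap costs 11/1. `bit_score` is a hypothetical helper.

```python
import math

# Assumed Karlin-Altschul parameters (BLOSUM62, gap open 11, gap extend 1)
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: int, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


for raw in (3440, 3380, 1662):  # mouse, rat, fly raw scores from the reports
    print(raw, round(bit_score(raw), 1))
```

Under these assumed parameters the raw scores 3440, 3380 and 1662 give about 1329.7, 1306.6 and 644.8 bits. Dropping the fraction reproduces the reported 1329, 1306 and 644, which suggests the report used these parameters.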


D. melanogaster

gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]
Length = 664

Score = 644 bits (1662), Expect = 0.0
Identities = 345/751 (45%), Positives = 472/751 (61%), Gaps = 94/751 (12%)

Query: 6 GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG 65
GVIR+LDE VVNRIAAGE+IQRPANA+KE++EN LDA+ST IQV VK GGLKL+QIQDNG
Sbjct: 8 GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG 67

Query: 66 TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA 125
TGIR+EDL IVCERFTTSKL FEDL+ I+T+GFRGEALASISHVAH++I TKTA KC
Sbjct: 68 TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG 127

Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV 185
Y+A+Y+DGKL+ PKPCAGNQGT I +EDLFYN+ RR+AL++P+EE+ ++ EV+ RY+V
Sbjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV 187

Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY 245
HN + F+++KQG+ +RT +S +NIR I+G A+S+EL+E D+ F+
Sbjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL 247

Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV 305
I+ NYS KKC LLFIN RLVEST+LR +++++YA YLP+ HPF+Y+SL + PQN+DV
Sbjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV 307

Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTS 365
NVHPTKHEVHFL++E I++ ++Q +E++LLGSN++R ++ Q LPG
Sbjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG-----------APD 356

Query: 366 LTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR 425
L + + + ++Y +MVRTDS EQKLD FL PL K SG +
Sbjct: 357 LDETQLADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKS----------------DSGVSS 400

Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDD 485
+E LP E T++ + R S ++M
Sbjct: 401 SSSQEASRLP-----------------------EESFRVTAAKKSREVRLSSVLDM---- 433

Query: 486 SRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT 545
RK + C+ + LR L N +VGCV+ + AL QH+T
Sbjct: 434 -RKRVERQCSVQ-----------------------LRSTLKNLVYVGCVDERRALFQHET 469

Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE 605
+LY+ NT SEELFYQ +IY+F N + +S P PL +L +L+L+S +GWT EDG K
Sbjct: 470 RLYMCNTRSFSEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA 529

Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN 665
LA+ + L KKA ++ +YF L I E+G L LP L+ + P + LP+++LRLATEV+
Sbjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD 589

Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR 725
W++E CFE+ +E A FY+ Q G+ +WT+EH+++ A +
Sbjct: 590 WEQETRCFETFCRETARFYA--------------QLDWREGATAGFSRWTMEHVLFPAFK 635

Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
++LPP + I +L NLP LYKVFERC
Sbjct: 636 KYLLPPPRIKD--QIYELTNLPTLYKVFERC 664
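To tabulate identity, similarity (Positives) and gap figures for all three species, the summary lines can be pulled out of saved BLAST text output with a regular expression. A minimal sketch, assuming the `Name = n/d (p%)` layout shown in the reports above. Other BLAST versions or output formats may differ, and `parse_summary` is a hypothetical helper.

```python
import re

# Matches e.g. "Identities = 345/751 (45%)"
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+) \((\d+)%\)")


def parse_summary(line: str) -> dict:
    """Map each field name to (count, alignment length, printed percent)."""
    return {name: (int(n), int(d), int(p))
            for name, n, d, p in SUMMARY_FIELD.findall(line)}


fly = "Identities = 345/751 (45%), Positives = 472/751 (61%), Gaps = 94/751 (12%)"
print(parse_summary(fly))
```

Recomputing count/length from the parsed tuple is a quick way to sanity-check the printed percentage.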

 

5. Search for the conserved domain (CD) of MLH1. Give the position of the CD, the name of the CD and its Pfam ID number.

position: residues 147-325

name: DNA_mis_repair (DNA mismatch repair protein), also known as the mutL/hexB/PMS1 family.

Pfam ID: pfam01119 (PF01119 in Pfam's own numbering)
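Domain positions from CD-Search are 1-based and inclusive, while Python slices are 0-based and end-exclusive. Residues 147-325 therefore correspond to `protein[146:325]`, which is 325 − 147 + 1 = 179 residues. A minimal sketch. `extract_region` and the variable `mlh1_protein` in the usage line are hypothetical.

```python
def extract_region(protein: str, start: int, end: int) -> str:
    """Return residues start..end of a protein (1-based, inclusive coordinates)."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("region lies outside the sequence")
    return protein[start - 1:end]


# First 60 residues of human MLH1, taken from the query line of the alignments above
mlh1_n_term = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ"
print(extract_region(mlh1_n_term, 1, 10))  # MSFVAGVIRR
# With the full 756-residue protein: extract_region(mlh1_protein, 147, 325)
```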

6. Show a multiple alignment of the MLH1 conserved domain with the top 5 sequences from the CD alignment.

answer
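Once a multiple alignment is exported in gapped form (equal-length strings, `-` for gaps), its conservation can be summarized column by column. A minimal sketch, assuming that input format. It scores each column by the fraction of sequences that share the most common residue, with gaps never counted as a residue, and builds a consensus that shows `.` where no residue reaches the threshold. The function name and the 0.5 threshold are hypothetical choices, not what CD-Search itself reports.

```python
from collections import Counter


def column_profile(aligned: list[str], threshold: float = 0.5) -> tuple[str, list[float]]:
    """Consensus string and per-column conservation for a gapped alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned)
    consensus, scores = [], []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        residue, count = counts.most_common(1)[0] if counts else ("-", 0)
        fraction = count / n  # gaps count against conservation
        scores.append(fraction)
        consensus.append(residue if fraction >= threshold else ".")
    return "".join(consensus), scores


print(column_profile(["MSFVAG", "MAFVAG", "MSFI-G"]))
```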