Assignment 4

1. Compare the MLH1 sequence (answer to Assignment 2.6) with the mutS sequence (answer to 2.7).

  • Use BLAST 2 Sequences

    Result:
    Sequence 1 lcl|seq_1 Length 2530
    Sequence 2 lcl|seq_2 Length 2562
    No significant similarity was found.
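BLAST 2 Sequences reports a hit only when it finds a high-scoring local alignment between the two inputs. BLAST itself uses a fast heuristic; the exact local alignment it approximates is the Smith-Waterman dynamic program. A minimal Python sketch of that score, where the scoring values (match +2, mismatch -3, linear gap -5) are illustrative assumptions. BLAST uses affine gap costs and judges significance through E-values, not raw scores.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -3, gap: int = -5) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared ACGT core: 4 x 2 = 8
```

Unrelated sequences stay near zero under this scheme, which is the situation the "No significant similarity" result reflects.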

2. Translate the two gene sequences above into protein sequences.

  • Use ORF Finder
  • MLH1

1 msfvagvirr ldetvvnria ageviqrpan aikemiencl dakstsiqvi vkegglkliq
61 iqdngtgirk edldivcerf ttsklqsfed lasistygfr gealasishv ahvtittkta
121 dgkcayrasy sdgklkappk pcagnqgtqi tvedlfynia trrkalknps eeygkilevv
181 grysvhnagi sfsvkkqget vadvrtlpna stvdnirsif gnavsrelie igcedktlaf
241 kmngyisnan ysvkkcifll finhrlvest slrkaietvy aaylpknthp flylsleisp
301 qnvdvnvhpt khevhflhee silervqqhi eskllgsnss rmyftqtllp glagpsgemv
361 ksttsltsss tsgssdkvya hqmvrtdsre qkldaflqpl skplssqpqa ivtedktdis
421 sgrarqqdee mlelpapaev aaknqslegd ttkgtsemse krgptssnpr krhredsdve
481 mveddsrkem taactprrri inltsvlslq eeineqghev lremlhnhsf vgcvnpqwal
541 aqhqtklyll nttklseelf yqiliydfan fgvlrlsepa plfdlamlal dspesgwtee
601 dgpkeglaey iveflkkkae mladyfslei deegnliglp llidnyvppl eglpifilrl
661 atevnwdeek ecfeslskec amfysirkqy iseestlsgq qsevpgsipn swkwtvehiv
721 ykalrshilp pkhftedgni lqlanlpdly kvferc

  • E. coli mutS

MSAIENFDAHTPMMQQYLKLKAQHPEILLFYRMGDFYELFYDDA
KRASQLLDISLTKRGASAGEPIPMAGIPYHAVENYLAKLVNQGESVAICEQIGDPATS
KGPVERKVVRIVTPGTISDEALLQERQDNLLAAIWQDSKGFGYATLDISSGRFRLSEP
ADRETMAAELQRTNPAELLYAEDFAEMSLIEGRRGLRRRPLWEFEIDTARQQLNLQFG
TRDLVGFGVENAPRGLCAAGCLLQYAKDTQRTTLPHIRSITMEREQDSIIMDAATRRN
LEITQNLAGGAENTLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGALQDF
TAELQPVLRQVGDLERILARLALRTARPRDLARMRHAFQQLPELRAQLETVDSAPVQA
LREKMGEFAELRDLLERAIIDTPPVLVRDGGVIASGYNEELDEWRALADGATDYLERL
EVRERERTGLDTLKVGFNAVHGYYIQISRGQSHLAPINYMRRQTLKNAERYIIPELKE
YEDKVLTSKGKALALEKQLYEELFDLLLPHLEALQQSASALAELDVLVNLAERAYTLN
YTCPTFIDKPGIRITEGRHPVVEQVLNEPFIANPLNLSPQRRMLIITGPNMGGKSTYM
RQTALIALMAYIGSYVPAQKVEIGPIDRIFTRVGAADDLASGRSTFMVEMTETANILH
NATEYSLVLMDEIGRGTSTYDGLSLAWACAENLANKIKALTLFATHYFELTQLPEKME
GVANVHLDALEHGDTIAFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELES
ISPNAAATQVDGTQMSLLSVPEETSPAVEALENLDPDSLTPRQALEWIYRLKSLV
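ORF Finder looks for open reading frames (a start codon followed in frame by a stop codon) and translates them. A minimal sketch of the same idea, assuming the standard genetic code, ATG as the only start codon, and that only ORFs closed by a stop codon count. NCBI's ORF Finder has further options (other genetic codes, alternative start codons, minimum ORF length) that this sketch ignores; the function names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(orf: str) -> str:
    """Translate codon by codon until the first stop; unknown codons become X."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(dna: str) -> str:
    """Longest ATG...stop stretch over the three frames of both strands."""
    dna = dna.upper()
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first in-frame ATG
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if i + 3 - start > len(best):
                        best = strand[start:i + 3]
                    start = None  # close it at the stop codon
    return best


orf = longest_orf("CCATGGCTTAAGG")
print(orf, translate(orf))  # ATGGCTTAA MA
```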

3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest-scoring hits.

  • Sequences producing significant alignments, each followed by its score (bits) and E-value:

(1) gi|13878583|sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEI... 1292 0.0
(2) gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattu... 1289 0.0
(3) gi|4557757|ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ... 1467 0.0
(4) gi|466462|gb|AAA17374.1| (U07418) human homolog of E. coli ... 1466 0.0
(5) gi|604369|gb|AAA85687.1| (U17857) hMLH1 gene product [Homo ... 1453 0.0
(6) gi|12835158|dbj|BAB23172.1| (AK004105) putative [Mus musculus] 753 0.0
(7) gi|13543339|gb|AAH05833.1|AAH05833 (BC005833) Similar to mu... 731 0.0
(8) gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Dro... 615 e-175
(9) gi|3192877|gb|AAC19117.1| (AF068257) mutL homolog [Drosophi... 608 e-173
(10) gi|460627|gb|AAA16835.1| (U07187) Mlh1p [Saccharomyces cere... 471 e-132
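Each hit line has a fixed layout: rank, identifier, truncated description, bit score, E-value, where very small E-values are printed without the leading 1 (e-175 means 1e-175). A sketch that parses lines in exactly this layout and ranks them by E-value, then bit score; the regular expression and function names are assumptions made for this listing, not part of any BLAST tool.

```python
import re

# "(rank) identifier description... bit-score E-value"
HIT_LINE = re.compile(r"^\((\d+)\)\s+(\S+)\s+(.*?)\s+(\d+)\s+(\S+)$")


def parse_evalue(text: str) -> float:
    """Restore the implied mantissa in abbreviated E-values such as 'e-175'."""
    return float("1" + text) if text.startswith("e") else float(text)


def parse_hit(line: str) -> dict:
    match = HIT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a hit line: {line!r}")
    rank, ident, desc, score, evalue = match.groups()
    return {"rank": int(rank), "id": ident, "description": desc,
            "bits": int(score), "evalue": parse_evalue(evalue)}


def rank_hits(lines):
    """Sort by E-value (ascending), breaking ties by bit score (descending)."""
    hits = [parse_hit(line) for line in lines if line.strip()]
    return sorted(hits, key=lambda h: (h["evalue"], -h["bits"]))


hit = parse_hit("(8) gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Dro... 615 e-175")
print(hit["bits"], hit["evalue"])  # 615 1e-175
```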

4. Compare the human MLH1 protein with MLH1 from M. musculus, R. norvegicus, and D. melanogaster. Give the pairwise alignments and the percent sequence similarity.

  • Mus musculus

Score = 1292 bits (3344), Expect = 0.0
Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)

Query: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
           M+FVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKST+IQV+VKEGGLKLIQ
Sbjct: 1   MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTNIQVVVKEGGLKLIQ 60

Query: 61  IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
           IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLASISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61  IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
           DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TRRKALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
           GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
           KMNGYISNANYSVKKCIFLLFINHRLVES +LRKAIETVYAAYLPKNTHPFLYLSLEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALRKAIETVYAAYLPKNTHPFLYLSLEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
           QNVDVNVHPTKHEVHFLHEESIL+RVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILQRVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAA 360

Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTD 418
           +              DKVYA+QMVRTDSR+QKLDAFLQP+S  + SQPQ  A V   +T+
Sbjct: 361 RPTTGVASSSTSGSGDKVYAYQMVRTDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTE 420

Query: 419 ISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHRXX 476
            S  RA ++DEEML LPAPAE AA++++LE ++   TS+ ++K  PTSS  + RKRHR
Sbjct: 421 GSPERATREDEEMLALPAPAEAAAESENLERESLMETSDAAQKAAPTSSPGSSRKRHRED 480

Query: 477 XXXXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNP 536
                      KEMTAAC PRRRIINLTSVLSLQEEI+E+ HE LRE+L NHSFVGCVNP
Sbjct: 481 SDVEMVENASGKEMTAACYPRRRIINLTSVLSLQEEISERCHETLREILRNHSFVGCVNP 540

Query: 537 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 596
           QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
Sbjct: 541 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 600

Query: 597 WTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIF 656
           WTE+DGPKEGLAEYIVEFLKKKAEMLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIF
Sbjct: 601 WTEDDGPKEGLAEYIVEFLKKKAEMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIF 660

Query: 657 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTV 716
           ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI EESTLSGQQS++PGS    WKWTV
Sbjct: 661 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWTV 720

Query: 717 EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
           EHI+YKA RSH+LPPKHFTEDGN+LQLANLPDLYKVFERC
Sbjct: 721 EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC 760

  • Rattus norvegicus
Score = 1289 bits (3336), Expect = 0.0
Identities = 639/758 (84%), Positives = 684/758 (89%), Gaps = 3/758 (0%)

Query: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
           MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEM ENCLDAKST+IQVIV+EGGLKLIQ
Sbjct: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAKSTNIQVIVREGGLKLIQ 60

Query: 61  IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
           IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLA ISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61  IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
           DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TR+KALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRKKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
           GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
           KMNGYISNANYSVKKCIFLLFINHRLVES +L+KAIE VYAAYLPKNTHPFLYL LEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALKKAIEAVYAAYLPKNTHPFLYLILEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
           QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE V
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAV 360

Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS 420
           K              DKV+A+QMVRTDSR+QKLDAF+QP+S+ L SQPQ  V  ++T+ S
Sbjct: 361 KSTTGIASSSTSGSGDKVHAYQMVRTDSRDQKLDAFMQPVSRRLPSQPQDPVPGNRTEGS 420

Query: 421 SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSE-MSEKRGPTS-SNPRKRHRXXXX 478
             +A Q+D+E+ ELPAP E AA + SLE ++  G SE ++ +R P+S  + RKRH
Sbjct: 421 PEKAMQKDQEISELPAPMEAAADSASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSD 480

Query: 479 XXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW 538
                   RKEMTAAC PRRRIINLTSVLSLQEEIN++GHE LREML NH+FVGCVNPQW
Sbjct: 481 VEMMENDSRKEMTAACYPRRRIINLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW 540

Query: 539 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWT 598
           ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRL EPAPLFD AMLALDSPESGWT
Sbjct: 541 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWT 600

Query: 599 EEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFIL 658
           EEDGPKEGLAEYIVEFLKKKA+MLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIFIL
Sbjct: 601 EEDGPKEGLAEYIVEFLKKKAKMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIFIL 660

Query: 659 RLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEH 718
           RLATEVNWDEE ECFESLSKECA+FYSIRKQYI EES LSGQQS++PGS    WKWTVEH
Sbjct: 661 RLATEVNWDEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWTVEH 719

Query: 719 IVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
           I+YKA RSH+LPPKHFTEDGN+LQLANLPDL KVFERC
Sbjct: 720 IIYKAFRSHLLPPKHFTEDGNVLQLANLPDLCKVFERC 757

  • Drosophila melanogaster
Score = 615 bits (1586), Expect = e-175
Identities = 335/751 (44%), Positives = 453/751 (59%), Gaps = 94/751 (12%)

Query: 6   GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG 65
           GVIR+LDE VVNRIAAGE+IQRPANA+KE++EN LDA+ST IQV VK GGLKL+QIQDNG
Sbjct: 8   GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG 67

Query: 66  TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA 125
           TGIR+EDL IVCERFTTSKL  FEDL+ I+T+GFRGEALASISHVAH++I TKTA  KC
Sbjct: 68  TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG 127

Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV 185
           Y+A+Y+DGKL+  PKPCAGNQGT I +EDLFYN+  RR+AL++P+EE+ ++ EV+ RY+V
Sbjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV 187

Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY 245
           HN  + F+++KQG+    +RT   +S  +NIR I+G A+S+EL+E    D+   F+
Sbjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL 247

Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV 305
           I+  NYS KKC  LLFIN RLVEST+LR +++++YA YLP+  HPF+Y+SL + PQN+DV
Sbjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV 307

Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKXXXX 365
           NVHPTKHEVHFL++E I++ ++Q +E++LLGSN++R ++ Q  LPG   P  +  +
Sbjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG--APDLDETQ---- 361

Query: 366 XXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR 425
                      ++Y  +MVRTDS EQKLD FL PL K  S    +   E         A
Sbjct: 362 -----LADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKSDSGVSSSSSQE---------AS 407

Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRXXXXXXXXXXX 485
           +  EE   + A                                  K+ R
Sbjct: 408 RLPEESFRVTAA---------------------------------KKSREVRLSSVLDMR 434

Query: 486 XRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT 545
            R E   +   R  + NL  V  + E      HE    M +  SF
Sbjct: 435 KRVERQCSVQLRSTLKNLVYVGCVDERRALFQHETRLYMCNTRSF--------------- 479

Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE 605
                     SEELFYQ +IY+F N   + +S P PL +L +L+L+S  +GWT EDG K
Sbjct: 480 ----------SEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA 529

Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN 665
            LA+   + L KKA ++ +YF L I E+G L  LP L+  + P +  LP+++LRLATEV+
Sbjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD 589

Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR 725
           W++E  CFE+  +E A FY+              Q     G+     +WT+EH+++ A +
Sbjct: 590 WEQETRCFETFCRETARFYA--------------QLDWREGATAGFSRWTMEHVLFPAFK 635

Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
            ++LPP    +   I +L NLP LYKVFERC
Sbjct: 636 KYLLPPPRIKD--QIYELTNLPTLYKVFERC 664
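Each middle line in these alignments can be rebuilt from the query and subject rows: identical residues are copied, pairs with a positive substitution score are shown as +, and everything else (mismatches, gaps, masked X residues) is left blank. A sketch assuming BLOSUM62, the default matrix for protein BLAST; the pair list holds the off-diagonal BLOSUM62 entries with a positive score, and the function names are illustrative.

```python
# Off-diagonal residue pairs that score > 0 in BLOSUM62; BLAST marks them "+".
POSITIVE_PAIRS = {frozenset(pair) for pair in (
    "AS", "RQ", "RK", "ND", "NH", "NS", "DE", "QE", "QK", "EK", "HY",
    "IL", "IM", "IV", "LM", "LV", "MV", "FW", "FY", "ST", "WY",
)}


def midline(query: str, subject: str) -> str:
    """Build a BLAST-style middle line for two equal-length aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    out = []
    for q, s in zip(query, subject):
        if q == s and q not in "-X":          # identical, unmasked residue
            out.append(q)
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            out.append("+")                   # conservative substitution
        else:
            out.append(" ")                   # mismatch, gap, or masked residue
    return "".join(out)


def alignment_stats(query: str, subject: str) -> tuple:
    """Return (identities, positives, gap columns, alignment length)."""
    mid = midline(query, subject)
    identities = sum(c.isalpha() for c in mid)
    positives = identities + mid.count("+")
    gaps = sum(1 for q, s in zip(query, subject) if "-" in (q, s))
    return identities, positives, gaps, len(mid)


q = "DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV"
s = "DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV"
print(midline(q, s))
print(alignment_stats(q, s))  # (57, 58, 0, 60)
```

Summing these per-block counts over a whole alignment gives the Identities, Positives and Gaps lines. The coordinates also allow a quick consistency check: the D. melanogaster hit covers query residues 6 to 756 (751 residues, no gaps) and subject residues 8 to 664 (657 residues), and 657 residues plus 94 gap columns make up the 751-column alignment.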

5. Search for the conserved domain (CD) in MLH1. Give the position of the CD, the name of the CD, and its Pfam ID number.

  • Position of CD: residues 147–327, total 179 residues
  • Name of CD: DNA_mis_repair (DNA mismatch repair protein), also known as the mutL/hexB/PMS1 family
  • Pfam ID: PF01119
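The CD position is given in 1-based, inclusive residue coordinates on the MLH1 protein. A sketch that turns a numbered sequence listing like the one in question 2 into a plain string and cuts out such a region; it assumes the coordinates are 1-based and inclusive, and the function names are illustrative.

```python
import re


def parse_numbered_sequence(text: str) -> str:
    """Drop position numbers and spaces from a GenBank-style sequence listing."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def extract_region(seq: str, start: int, end: int) -> str:
    """Residues start..end, assuming 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}-{end} outside sequence of length {len(seq)}")
    return seq[start - 1:end]  # shift the start to 0-based; the end is already exclusive


listing = """1 msfvagvirr ldetvvnria ageviqrpan aikemiencl dakstsiqvi vkegglkliq
61 iqdngtgirk edldivcerf ttsklqsfed lasistygfr gealasishv ahvtittkta"""
mlh1_start = parse_numbered_sequence(listing)
print(len(mlh1_start), extract_region(mlh1_start, 1, 10))  # 120 MSFVAGVIRR
```

Parsing the complete MLH1 listing the same way and calling extract_region(seq, 147, 327) would cut out the region reported for the CD.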

6. Show a multiple alignment of the MLH1 conserved domain with the top 5 sequences in the CD alignment.

 

 
