Compare human colon cancer gene MLH1 with other genes.
- Use ORF finder to translate DNA sequences to protein sequences in all reading frames.
- Use blastn, blastp, CD search and BLAST 2 Sequences for searching and comparison.
- Deadline: 11/06/2001

1. Compare the MLH1 (answer of assignment 2.6) and mutS (answer of 2.7) sequences.
No significant similarity was found.
2. Translate the above two gene sequences to protein sequences.
MLH1

Frame  From    To  Length
+1       22  2292    2271
-1     1396  1650     255
+2      926  1120     195
+2     1304  1492     189
-2      489   677     189
-2     1353  1523     171
+2     2231  2368     138
-2     1716  1853     138
-2     2358  2483     126
-2        1   119     119
-2     2070  2177     108
+3        3   110     108

mutS

Frame  From    To  Length
+1        1  2561    2562
-1      565   936     372
-2      432   803     372
+2     2243  2548     306
-1     1177  1449     273
-1     1546  1755     210
-2     1029  1235     207
+2      728   922     195
-2     1698  1850     153
-1        1   147     147
+2     1583  1696     114
+2      362   472     111
-2      255   356     102
+3        3   104     102
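
The ORF tables above list frame, start, end and length for each open reading frame. A minimal sketch of how such a six-frame scan could work is given below. It is not the NCBI ORF finder itself: the function names, the ATG-to-stop rule, the `min_len` cutoff and the way minus-strand frames are numbered are all assumptions made for illustration. Lengths include the stop codon, which agrees with the +1 ORF of MLH1 (22..2292, 2271 nt = 757 codons for a 756-residue protein).

```python
# Illustrative six-frame ORF scan (hypothetical helper, not the NCBI tool).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_orfs(dna, min_len=100):
    """Return (frame, start, end, length) tuples, longest first.

    Coordinates are 1-based and inclusive on the input strand.
    An ORF runs from the first ATG in a frame to the next in-frame stop;
    the stop codon is counted in the length (assumption).
    """
    dna = dna.upper()
    n = len(dna)
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            start = None
            for i in range(offset, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3  # exclusive index, stop codon included
                    length = end - start
                    if length >= min_len:
                        if strand == "+":
                            lo, hi = start + 1, end
                        else:
                            # map reverse-complement indices back to the input
                            lo, hi = n - end + 1, n - start
                        orfs.append((f"{strand}{offset + 1}", lo, hi, length))
                    start = None
    return sorted(orfs, key=lambda orf: -orf[3])

# Toy example: one 9-nt ORF (ATG AAA TAG) starting at position 3.
print(find_orfs("CCATGAAATAGGG", min_len=9))
```

Partial ORFs that run off the end of the sequence without a stop codon are ignored in this sketch, so rows such as the one with length 119 would not be reproduced.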

3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest hits.

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

gi|13878583|sp|Q9JK91|MLH1_MOUSE  DNA MISMATCH REPAIR PROTEI...  1292  0.0
gi|13591989|ref|NP_112315.1|  mismatch repair protein [Rattu...  1289  0.0
gi|4557757|ref|NP_000240.1|  mutL homolog 1; mutL (E. coli) ...  1467  0.0
gi|466462|gb|AAA17374.1|  (U07418) human homolog of E. coli ...  1466  0.0
gi|604369|gb|AAA85687.1|  (U17857) hMLH1 gene product [Homo ...  1453  0.0
gi|12835158|dbj|BAB23172.1|  (AK004105) putative [Mus musculus]   753  0.0
gi|13543339|gb|AAH05833.1|AAH05833  (BC005833) Similar to mu...   731  0.0
gi|7304079|gb|AAF59117.1|  (AE003838) Mlh1 gene product [Dro...   615  e-175
gi|3192877|gb|AAC19117.1|  (AF068257) mutL homolog [Drosophi...   608  e-173
gi|460627|gb|AAA16835.1|  (U07187) Mlh1p [Saccharomyces cere...   471  e-132
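
The hit list above is not ordered by bit score, so it can help to parse the summary lines and sort them before picking the highest hits. The sketch below assumes the plain-text layout shown here (identifier, description, score, E value separated by whitespace); `parse_hit` and `HIT_LINE` are hypothetical names, and the shorthand `e-175` is read as `1e-175`.

```python
import re

# identifier, description, bit score, E value (layout assumed from the text above)
HIT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+(\d+)\s+(\S+)\s*$")

def parse_hit(line):
    """Parse one BLAST summary line into (id, description, bits, evalue)."""
    match = HIT_LINE.match(line)
    if match is None:
        return None
    seq_id, desc, bits, evalue = match.groups()
    if evalue.startswith("e"):
        evalue = "1" + evalue  # BLAST prints 1e-175 as e-175
    return seq_id, desc, int(bits), float(evalue)

lines = [
    "gi|13878583|sp|Q9JK91|MLH1_MOUSE  DNA MISMATCH REPAIR PROTEI...  1292  0.0",
    "gi|4557757|ref|NP_000240.1|  mutL homolog 1; mutL (E. coli) ...  1467  0.0",
    "gi|7304079|gb|AAF59117.1|  (AE003838) Mlh1 gene product [Dro...   615  e-175",
]
hits = sorted((parse_hit(l) for l in lines), key=lambda h: -h[2])
for seq_id, desc, bits, evalue in hits:
    print(bits, evalue, seq_id)
```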

4. Compare human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignment and % of sequence similarity.
M. musculus
 gi|7595954|gb|AAF64514.1|AF250844_1 (AF250844) MutL homolog 1 protein [Mus musculus]
          Length = 760

 Score = 1292 bits (3344), Expect = 0.0
 Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)

Query: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
           M+FVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKST+IQV+VKEGGLKLIQ
Sbjct: 1   MAFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTNIQVVVKEGGLKLIQ 60

Query: 61  IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
           IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLASISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61  IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLASISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
           DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TRRKALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRRKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
           GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
           KMNGYISNANYSVKKCIFLLFINHRLVES +LRKAIETVYAAYLPKNTHPFLYLSLEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALRKAIETVYAAYLPKNTHPFLYLSLEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
           QNVDVNVHPTKHEVHFLHEESIL+RVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE  
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILQRVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAA 360

Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQ--AIVTEDKTD 418
           +              DKVYA+QMVRTDSR+QKLDAFLQP+S  + SQPQ  A V   +T+
Sbjct: 361 RPTTGVASSSTSGSGDKVYAYQMVRTDSRDQKLDAFLQPVSSLVPSQPQDPAPVRGARTE 420

Query: 419 ISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSS--NPRKRHRXX 476
            S  RA ++DEEML LPAPAE AA++++LE ++   TS+ ++K  PTSS  + RKRHR  
Sbjct: 421 GSPERATREDEEMLALPAPAEAAAESENLERESLMETSDAAQKAAPTSSPGSSRKRHRED 480

Query: 477 XXXXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNP 536
                      KEMTAAC PRRRIINLTSVLSLQEEI+E+ HE LRE+L NHSFVGCVNP
Sbjct: 481 SDVEMVENASGKEMTAACYPRRRIINLTSVLSLQEEISERCHETLREILRNHSFVGCVNP 540

Query: 537 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 596
           QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG
Sbjct: 541 QWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESG 600

Query: 597 WTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIF 656
           WTE+DGPKEGLAEYIVEFLKKKAEMLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIF
Sbjct: 601 WTEDDGPKEGLAEYIVEFLKKKAEMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIF 660

Query: 657 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTV 716
           ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI EESTLSGQQS++PGS    WKWTV
Sbjct: 661 ILRLATEVNWDEEKECFESLSKECAMFYSIRKQYILEESTLSGQQSDMPGSTSKPWKWTV 720

Query: 717 EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
           EHI+YKA RSH+LPPKHFTEDGN+LQLANLPDLYKVFERC
Sbjct: 721 EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC 760

R. norvegicus
gi|1724118|gb|AAB38506.1| (U80054) mismatch repair protein [Rattus norvegicus]
          Length = 757

 Score = 1289 bits (3336), Expect = 0.0
 Identities = 639/758 (84%), Positives = 684/758 (89%), Gaps = 3/758 (0%)

Query: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQ 60
           MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEM ENCLDAKST+IQVIV+EGGLKLIQ
Sbjct: 1   MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMTENCLDAKSTNIQVIVREGGLKLIQ 60

Query: 61  IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA 120
           IQDNGTGIRKEDLDIVCERFTTSKLQ+FEDLA ISTYGFRGEALASISHVAHVTITTKTA
Sbjct: 61  IQDNGTGIRKEDLDIVCERFTTSKLQTFEDLAMISTYGFRGEALASISHVAHVTITTKTA 120

Query: 121 DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVV 180
           DGKCAYRASYSDGKL+APPKPCAGNQGT ITVEDLFYNI TR+KALKNPSEEYGKILEVV
Sbjct: 121 DGKCAYRASYSDGKLQAPPKPCAGNQGTLITVEDLFYNIITRKKALKNPSEEYGKILEVV 180

Query: 181 GRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF 240
           GRYS+HN+GISFSVKKQGETV+DVRTLPNA+TVDNIRSIFGNAVSRELIE+GCEDKTLAF
Sbjct: 181 GRYSIHNSGISFSVKKQGETVSDVRTLPNATTVDNIRSIFGNAVSRELIEVGCEDKTLAF 240

Query: 241 KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP 300
           KMNGYISNANYSVKKCIFLLFINHRLVES +L+KAIE VYAAYLPKNTHPFLYL LEISP
Sbjct: 241 KMNGYISNANYSVKKCIFLLFINHRLVESAALKKAIEAVYAAYLPKNTHPFLYLILEISP 300

Query: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMV 360
           QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE V
Sbjct: 301 QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEAV 360

Query: 361 KXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS 420
           K              DKV+A+QMVRTDSR+QKLDAF+QP+S+ L SQPQ  V  ++T+ S
Sbjct: 361 KSTTGIASSSTSGSGDKVHAYQMVRTDSRDQKLDAFMQPVSRRLPSQPQDPVPGNRTEGS 420

Query: 421 SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSE-MSEKRGPTS-SNPRKRHRXXXX 478
             +A Q+D+E+ ELPAP E AA + SLE ++  G SE ++ +R P+S  + RKRH     
Sbjct: 421 PEKAMQKDQEISELPAPMEAAADSASLERESVIGASEVVAPQRHPSSPGSSRKRHPEDSD 480

Query: 479 XXXXXXXXRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW 538
                   RKEMTAAC PRRRIINLTSVLSLQEEIN++GHE LREML NH+FVGCVNPQW
Sbjct: 481 VEMMENDSRKEMTAACYPRRRIINLTSVLSLQEEINDRGHETLREMLRNHTFVGCVNPQW 540

Query: 539 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWT 598
           ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRL EPAPLFD AMLALDSPESGWT
Sbjct: 541 ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLPEPAPLFDFAMLALDSPESGWT 600

Query: 599 EEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFIL 658
           EEDGPKEGLAEYIVEFLKKKA+MLADYFS+EIDEEGNLIGLPLLID+YVPPLEGLPIFIL
Sbjct: 601 EEDGPKEGLAEYIVEFLKKKAKMLADYFSVEIDEEGNLIGLPLLIDSYVPPLEGLPIFIL 660

Query: 659 RLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEH 718
           RLATEVNWDEE ECFESLSKECA+FYSIRKQYI EES LSGQQS++PGS    WKWTVEH
Sbjct: 661 RLATEVNWDEE-ECFESLSKECAVFYSIRKQYILEESALSGQQSDMPGSPSKPWKWTVEH 719

Query: 719 IVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC 756
           I+YKA RSH+LPPKHFTEDGN+LQLANLPDL KVFERC
Sbjct: 720 IIYKAFRSHLLPPKHFTEDGNVLQLANLPDLCKVFERC 757

D. melanogaster
>gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]
          Length = 664

 Score =  615 bits (1586), Expect = e-175
 Identities = 335/751 (44%), Positives = 453/751 (59%), Gaps = 94/751 (12%)

Query: 6   GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG 65
           GVIR+LDE VVNRIAAGE+IQRPANA+KE++EN LDA+ST IQV VK GGLKL+QIQDNG
Sbjct: 8   GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG 67

Query: 66  TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA 125
           TGIR+EDL IVCERFTTSKL  FEDL+ I+T+GFRGEALASISHVAH++I TKTA  KC 
Sbjct: 68  TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG 127

Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV 185
           Y+A+Y+DGKL+  PKPCAGNQGT I +EDLFYN+  RR+AL++P+EE+ ++ EV+ RY+V
Sbjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV 187

Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY 245
           HN  + F+++KQG+    +RT   +S  +NIR I+G A+S+EL+E    D+   F+    
Sbjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL 247

Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV 305
           I+  NYS KKC  LLFIN RLVEST+LR +++++YA YLP+  HPF+Y+SL + PQN+DV
Sbjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV 307

Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKXXXX 365
           NVHPTKHEVHFL++E I++ ++Q +E++LLGSN++R ++ Q  LPG   P  +  +    
Sbjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG--APDLDETQ---- 361

Query: 366 XXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR 425
                      ++Y  +MVRTDS EQKLD FL PL K  S    +   E         A 
Sbjct: 362 -----LADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKSDSGVSSSSSQE---------AS 407

Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRXXXXXXXXXXX 485
           +  EE   + A                                  K+ R           
Sbjct: 408 RLPEESFRVTAA---------------------------------KKSREVRLSSVLDMR 434

Query: 486 XRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT 545
            R E   +   R  + NL  V  + E      HE    M +  SF               
Sbjct: 435 KRVERQCSVQLRSTLKNLVYVGCVDERRALFQHETRLYMCNTRSF--------------- 479

Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE 605
                     SEELFYQ +IY+F N   + +S P PL +L +L+L+S  +GWT EDG K 
Sbjct: 480 ----------SEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA 529

Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN 665
            LA+   + L KKA ++ +YF L I E+G L  LP L+  + P +  LP+++LRLATEV+
Sbjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD 589

Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR 725
           W++E  CFE+  +E A FY+              Q     G+     +WT+EH+++ A +
Sbjct: 590 WEQETRCFETFCRETARFYA--------------QLDWREGATAGFSRWTMEHVLFPAFK 635

Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
            ++LPP    +   I +L NLP LYKVFERC
Sbjct: 636 KYLLPPPRIKD--QIYELTNLPTLYKVFERC 664
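
The "Identities" figures in the alignments above are counts of matching columns divided by the alignment length. A small sketch of that calculation follows, using the last block of the M. musculus alignment (query 717-756) as input. The function name is hypothetical, and the integer division is an assumption chosen because it reproduces the reported 651/760 (85%).

```python
def alignment_stats(query, sbjct):
    """Count identities and gap columns in two equal-length aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return identities, gaps, len(query)

# Last block of the human vs. mouse alignment.
query = "EHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC"
sbjct = "EHIIYKAFRSHLLPPKHFTEDGNVLQLANLPDLYKVFERC"

identities, gaps, length = alignment_stats(query, sbjct)
percent = 100 * identities // length  # truncated, as in 651/760 -> 85%
print(f"Identities = {identities}/{length} ({percent}%), Gaps = {gaps}/{length}")
```

Residues masked as X by the low-complexity filter count as mismatches here unless both sequences carry an X at the same column.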

5. Search for the conserved domain (CD) of MLH1. Give the position of the CD, the name of the CD and the Pfam ID number.
answer
6. Show the multiple alignment of the MLH1 conserved domain with 5 sequences from the top of the CD alignment.
                       10        20        30        40        50        60
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus    1 GTTVEVRDLFYNLPVRRKFLKSPKKEFRKILDLLQRYALIHPNVSFSLTKEG--KALLQL 58
query      147 GTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQG--ETVADV 204
1B63_A     144 GTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNG--KIVRQY 201
gi 8039787 159 GTVVRVEQLFENFPARKRFLGRQSAETTLCRSALIDVSLAHHPVEFRFTVDGthKLTLLS 218
gi 8928214 141 GTIVDVTKIFHNFPARKRFLKQEPIETKMCLKVLEEKIITHPEINFEIN-LN--QKLRKI 197
gi 3914081 141 GTEVEVRDLFFNLPVRRKFLKKEDTERRKVLELIKEYALTNPEVEFTLFSEG--RETLKL 198
gi 3914082 141 GTEVEVYDLFFNLPARKKFLRKEDTERRKITELVKEYAITNPQVDFHLFSEG--KETLNL 198

                       70        80        90       100       110       120
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus   59 KTSP--S-SLKERIRSVFGTAVLKNLIPF--EEKDGDFRIEG-FISSPNVSR-SSRDRQF 111
query      205 RTLP--NaSTVDNIRSIFGNAVSRELIEIgcEDKTLAFKMNG-YISNANYSV--KKCIFL 259
1B63_A     202 RAVPegG-QKERRLGAICGTAFLEQALAI--EWQHGDLTLRG-WVADPNHTTpALAEIQY 257
gi 8039787 219 QQTR--K-DRCLETQMLKGDPALFHTIEG--G--DCSFHFHLvLSEPAICRR--ERRGIF 269
gi 8928214 198 YFK---E-SLIDRVQNVYGNVIENNKFRV--LKKEHDNIKIEiFLAPDNFSK-KSKRHIK 250
gi 3914081 199 KKS-----SLKERVEEVFQTKTEELYAER--E--GITLRA---FVSRNQRQG-----KYY 241
gi 3914082 199 KKK-----DLKGRIEEIFESIFEEESSER--E--GIKVRA---FISRNQKRG-----KYY 241

                      130       140       150       160       170       180
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus  112 LFINGRPVEDKLLLKAIREVYATYLPRGRYPVFVLNLELPPELVDVNVHPDKKEVRLLKE 171
query      260 LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHE 319
1B63_A     258 CYVNGRMMRDRLINHAIRQACEDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQS 317
gi 8039787 270 TFVNGRRIFDYGLVQALVLGSEGYFPNGTFPVACLFLTVNSERIDFNIHPAKKEVHLQDY 329
gi 8928214 251 TFVNRRPIDQKDLLEAITNGHSRILSPGNFPICYLFLEINPEYIDFNVHPQKKEVRFYNL 310
gi 3914081 242 VFINKRPIQNKNLKEFLRKVFG------YKTLVVLYAELPPFMVDFNVHPKKKEVNILKE 295
gi 3914082 242 LFVNSRPVYNKNLKEYLKKTFG------YKTIVVLFIDIPPFLVDFNVHPKKKEVKFLKE 295

               ....*...
consensus  172 EEILDLIK 179
query      320 ESILERVQ 327
1B63_A     318 RLVHDFIY 325
gi 8039787 330 AHIRHTLS 337
gi 8928214 311 PFLFKLIS 318
gi 3914081 296 RKFLELVR 303
gi 3914082 296 RKIYELIR 303


So much homework!!!