Compare human colon cancer gene MLH1 with other genes.
- To use ORF Finder to translate a DNA sequence to protein sequence in all reading frames.
- To use the blastn, blastp, CD search and BLAST 2 Sequences programs for searching and comparison.
Deadline: 11/06/2001
1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to assignment 2.7) sequences.
No significant similarity was found.
2. Translate the above two gene sequences to protein sequences.
MLH1
Frame   from..to      Length
+1      22..2292      2271
-1      1396..1650    255
+2      926..1120     195
+2      1304..1492    189
-2      489..677      189
-2      1353..1523    171
+2      2231..2368    138
-2      1716..1853    138
-2      2358..2483    126
-2      1..119        119
-2      2070..2177    108
+3      3..110        108
mutS
Frame   from..to      Length
+1      <1..2561      2562
-1      565..936      372
-2      432..803      372
+2      2243..2548    306
-1      1177..1449    273
-1      1546..1755    210
-2      1029..1235    207
+2      728..922      195
-2      1698..1850    153
-1      1..147        147
+2      1583..1696    114
+2      362..472      111
-2      <255..356     102
+3      3..104        102
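The six-frame search behind the two tables above can be sketched in a few lines of Python. This is a minimal illustration, not the ORF Finder implementation: the function names `find_orfs` and `reverse_complement` are hypothetical, only ATG is treated as a start codon, partial ORFs (the entries marked `<` above) are not reported, and the numbering of the minus-strand frames is an assumption that may differ from the tool's. Coordinates are 1-based on the input strand and the length includes the stop codon, so Length = to - from + 1 (for example 2292 - 22 + 1 = 2271 for the longest MLH1 ORF).

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 100) -> List[Tuple[str, int, int, int]]:
    """Return (frame, from, to, length) for complete ATG..stop ORFs in six frames.

    Coordinates are 1-based, inclusive, and always given on the input strand.
    Frame labels for the minus strand are an assumption of this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None  # 0-based index of the ATG that opened the current ORF
            for i in range(offset, n - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # half-open end; the stop codon is included
                    if end - start >= min_len:
                        if sign == "+":
                            lo, hi = start + 1, end
                        else:
                            # Map reverse-complement positions back to the input strand.
                            lo, hi = n - end + 1, n - start
                        orfs.append((f"{sign}{offset + 1}", lo, hi, hi - lo + 1))
                    start = None
    # Longest ORF first, as in the tables above.
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


if __name__ == "__main__":
    # Toy sequence with a single ORF: ATG AAA TTT TAG at positions 3..14.
    print(find_orfs("CCATGAAATTTTAGGG", min_len=9))
```

The default `min_len=100` mirrors the fact that the shortest ORFs listed above are about 100 nt long; the cutoff actually used for the assignment is not stated.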
3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest hits.
                                                               Score    E
Sequences producing significant alignments:                    (bits)  Value
gi|13878583|sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEI...  1292   0.0
gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattu...  1289   0.0
gi|4557757|ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ...  1467   0.0
gi|466462|gb|AAA17374.1| (U07418) human homolog of E. coli ...  1466   0.0
gi|604369|gb|AAA85687.1| (U17857) hMLH1 gene product [Homo ...  1453   0.0
gi|12835158|dbj|BAB23172.1| (AK004105) putative [Mus musculus]   753   0.0
gi|13543339|gb|AAH05833.1|AAH05833 (BC005833) Similar to mu...   731   0.0
gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Dro...   615   e-175
gi|3192877|gb|AAC19117.1| (AF068257) mutL homolog [Drosophi...   608   e-173
gi|460627|gb|AAA16835.1| (U07187) Mlh1p [Saccharomyces cere...   471   e-132
4. Compare the human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignment and % sequence similarity.
D. melanogaster
>gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]
Length = 664
Score = 615 bits (1586), Expect = e-175
Identities = 335/751 (44%), Positives = 453/751 (59%), Gaps = 94/751 (12%)
Query: 6 GVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNG 65
GVIR+LDE VVNRIAAGE+IQRPANA+KE++EN LDA+ST IQV VK GGLKL+QIQDNG
Sbjct: 8 GVIRKLDEVVVNRIAAGEIIQRPANALKELLENSLDAQSTHIQVQVKAGGLKLLQIQDNG 67
Query: 66 TGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCA 125
TGIR+EDL IVCERFTTSKL FEDL+ I+T+GFRGEALASISHVAH++I TKTA KC
Sbjct: 68 TGIRREDLAIVCERFTTSKLTRFEDLSQIATFGFRGEALASISHVAHLSIQTKTAKEKCG 127
Query: 126 YRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSV 185
Y+A+Y+DGKL+ PKPCAGNQGT I +EDLFYN+ RR+AL++P+EE+ ++ EV+ RY+V
Sbjct: 128 YKATYADGKLQGQPKPCAGNQGTIICIEDLFYNMPQRRQALRSPAEEFQRLSEVLARYAV 187
Query: 186 HNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGY 245
HN + F+++KQG+ +RT +S +NIR I+G A+S+EL+E D+ F+
Sbjct: 188 HNPRVGFTLRKQGDAQPALRTPVASSRSENIRIIYGAAISKELLEFSHRDEVYKFEAECL 247
Query: 246 ISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDV 305
I+ NYS KKC LLFIN RLVEST+LR +++++YA YLP+ HPF+Y+SL + PQN+DV
Sbjct: 248 ITQVNYSAKKCQMLLFINQRLVESTALRTSVDSIYATYLPRGHHPFVYMSLTLPPQNLDV 307
Query: 306 NVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKXXXX 365
NVHPTKHEVHFL++E I++ ++Q +E++LLGSN++R ++ Q LPG P + +
Sbjct: 308 NVHPTKHEVHFLYQEEIVDSIKQQVEARLLGSNATRTFYKQLRLPG--APDLDETQ---- 361
Query: 366 XXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRAR 425
++Y +MVRTDS EQKLD FL PL K S + E A
Sbjct: 362 -----LADKTQRIYPKEMVRTDSTEQKLDKFLAPLVKSDSGVSSSSSQE---------AS 407
Query: 426 QQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRXXXXXXXXXXX 485
+ EE + A K+ R
Sbjct: 408 RLPEESFRVTAA---------------------------------KKSREVRLSSVLDMR 434
Query: 486 XRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQT 545
R E + R + NL V + E HE M + SF
Sbjct: 435 KRVERQCSVQLRSTLKNLVYVGCVDERRALFQHETRLYMCNTRSF--------------- 479
Query: 546 KLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKE 605
SEELFYQ +IY+F N + +S P PL +L +L+L+S +GWT EDG K
Sbjct: 480 ----------SEELFYQRMIYEFQNCSEITISPPLPLKELLILSLESEAAGWTPEDGDKA 529
Query: 606 GLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVN 665
LA+ + L KKA ++ +YF L I E+G L LP L+ + P + LP+++LRLATEV+
Sbjct: 530 ELADGAADILLKKAPIMREYFGLRISEDGMLESLPSLLHQHRPCVAHLPVYLLRLATEVD 589
Query: 666 WDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALR 725
W++E CFE+ +E A FY+ Q G+ +WT+EH+++ A +
Sbjct: 590 WEQETRCFETFCRETARFYA--------------QLDWREGATAGFSRWTMEHVLFPAFK 635
Query: 726 SHILPPKHFTEDGNILQLANLPDLYKVFERC 756
++LPP + I +L NLP LYKVFERC
Sbjct: 636 KYLLPPPRIKD--QIYELTNLPTLYKVFERC 664
5. Search for the conserved domain (CD) of MLH1. Give the position of the CD, the name of the CD and the Pfam ID number.
6. Show the multiple alignment of the MLH1 conserved domain with 5 sequences from the top of the CD alignment.
                       10        20        30        40        50        60
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus    1 GTTVEVRDLFYNLPVRRKFLKSPKKEFRKILDLLQRYALIHPNVSFSLTKEG--KALLQL 58
query      147 GTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQG--ETVADV 204
1B63_A     144 GTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNG--KIVRQY 201
gi 8039787 159 GTVVRVEQLFENFPARKRFLGRQSAETTLCRSALIDVSLAHHPVEFRFTVDGthKLTLLS 218
gi 8928214 141 GTIVDVTKIFHNFPARKRFLKQEPIETKMCLKVLEEKIITHPEINFEIN-LN--QKLRKI 197
gi 3914081 141 GTEVEVRDLFFNLPVRRKFLKKEDTERRKVLELIKEYALTNPEVEFTLFSEG--RETLKL 198
gi 3914082 141 GTEVEVYDLFFNLPARKKFLRKEDTERRKITELVKEYAITNPQVDFHLFSEG--KETLNL 198

                       70        80        90       100       110       120
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus   59 KTSP--S-SLKERIRSVFGTAVLKNLIPF--EEKDGDFRIEG-FISSPNVSR-SSRDRQF 111
query      205 RTLP--NaSTVDNIRSIFGNAVSRELIEIgcEDKTLAFKMNG-YISNANYSV--KKCIFL 259
1B63_A     202 RAVPegG-QKERRLGAICGTAFLEQALAI--EWQHGDLTLRG-WVADPNHTTpALAEIQY 257
gi 8039787 219 QQTR--K-DRCLETQMLKGDPALFHTIEG--G--DCSFHFHLvLSEPAICRR--ERRGIF 269
gi 8928214 198 YFK---E-SLIDRVQNVYGNVIENNKFRV--LKKEHDNIKIEiFLAPDNFSK-KSKRHIK 250
gi 3914081 199 KKS-----SLKERVEEVFQTKTEELYAER--E--GITLRA---FVSRNQRQG-----KYY 241
gi 3914082 199 KKK-----DLKGRIEEIFESIFEEESSER--E--GIKVRA---FISRNQKRG-----KYY 241

                      130       140       150       160       170       180
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus  112 LFINGRPVEDKLLLKAIREVYATYLPRGRYPVFVLNLELPPELVDVNVHPDKKEVRLLKE 171
query      260 LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHE 319
1B63_A     258 CYVNGRMMRDRLINHAIRQACEDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQS 317
gi 8039787 270 TFVNGRRIFDYGLVQALVLGSEGYFPNGTFPVACLFLTVNSERIDFNIHPAKKEVHLQDY 329
gi 8928214 251 TFVNRRPIDQKDLLEAITNGHSRILSPGNFPICYLFLEINPEYIDFNVHPQKKEVRFYNL 310
gi 3914081 242 VFINKRPIQNKNLKEFLRKVFG------YKTLVVLYAELPPFMVDFNVHPKKKEVNILKE 295
gi 3914082 242 LFVNSRPVYNKNLKEYLRKTFG------YKTIVVLFIDIPPFLVDFNVHPKKKEVKFLKE 295

               ....*...
consensus  172 EEILDLIK 179
query      320 ESILERVQ 327
1B63_A     318 RLVHDFIY 325
gi 8039787 330 AHIRHTLS 337
gi 8928214 311 PFLFKLIS 318
gi 3914081 296 RKFLELVR 303
gi 3914082 296 RKIYELIR 303
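One rough way to read the multiple alignment above is to count, for each row, how many columns carry the same residue as the consensus row. The Python sketch below does this for the last alignment block only (consensus positions 172-179); the function name `matches_to_consensus` is hypothetical, and comparing case-insensitively and skipping gap columns of the consensus are assumptions of this sketch, not something CD-Search reports.

```python
from typing import Dict


def matches_to_consensus(consensus: str, row: str) -> int:
    """Count columns where an aligned row carries the consensus residue."""
    if len(consensus) != len(row):
        raise ValueError("aligned rows must have the same length")
    return sum(
        1
        for c, r in zip(consensus.upper(), row.upper())
        if c != "-" and c == r
    )


# Last block of the CD alignment above (consensus positions 172-179).
CONSENSUS = "EEILDLIK"
ROWS: Dict[str, str] = {
    "query": "ESILERVQ",
    "1B63_A": "RLVHDFIY",
    "gi 8039787": "AHIRHTLS",
    "gi 8928214": "PFLFKLIS",
    "gi 3914081": "RKFLELVR",
    "gi 3914082": "RKIYELIR",
}

if __name__ == "__main__":
    for name, row in ROWS.items():
        print(f"{name:<10} {matches_to_consensus(CONSENSUS, row)}/{len(CONSENSUS)}")
```

For this block the human MLH1 query matches the consensus in 3 of 8 columns; the same function can be applied to the longer 60-column blocks by pasting their rows in place of `ROWS`.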