Assignment 3

Search for genes associated with colon cancer in the human genome.
BCL10, DRCC1, MIC1, MLH1, MSH2, PTPN12, RAD54L, SDCCAG1, SDCCAG10, SDCCAG16, SDCCAG28, SDCCAG3, SDCCAG31, SDCCAG33, SDCCAG43, SDCCAG8, SRC, STUB1, TGFBR2, TNFSE15.
MLH1 was identified as a locus frequently mutated in hereditary nonpolyposis colon cancer (HNPCC). When cloned, it was discovered to be a human homolog of the E. coli mismatch repair gene mutS.
Deadline: 10/30/2001

1. How many hits do you get if you search for genes associated with colon cancer in the human genome?
 About 17.

2. How many loci do you find if you search LocusLink for human in GenBank?
 About 21.

3. Give the locus ID and position of MLH1.
 ID: 4292.
 Position: 3p21.3.

4. Find the %ID of nucleotide sequence for its possible orthologs in mouse.
 Hs.57301 to Mm.4438 is about 90.5%.
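
The %ID figure above is taken from the database. As a rough illustration of what such a number means, here is a minimal Python sketch that computes percent identity from two already-aligned sequences. The function name `percent_identity` and the choice to skip gapped columns are assumptions made for this example; the database's own alignment and counting method is not stated in this assignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns where both sequences agree.

    Assumes the two strings are already aligned (equal length, gaps as '-').
    Skipping gapped columns is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment, not real MLH1 data: 4 matches over 5 ungapped columns.
    print(percent_identity("ACGT-A", "ACGAGA"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give slightly different percentages, so a reported value is only comparable when the convention is known.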

5. Find the total number of mutations of MLH1 reported in the Human Gene Mutation Database.
 About 147.

6. Give the DNA sequence of MLH1.

CTTGGCTCTTCTGGCGCCAAAATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGACGAG
ACAGTGGTGAACCGCATCGCGGCGGGGGAAGTTATCCAGCGGCCAGCTAATGCTATCAAA
GAGATGATTGAGAACTGTTTAGATGCAAAATCCACAAGTATTCAAGTGATTGTTAAAGAG
GGAGGCCTGAAGTTGATTCAGATCCAAGACAATGGCACCGGGATCAGGAAAGAAGATCTG
GATATTGTATGTGAAAGGTTCACTACTAGTAAACTGCAGTCCTTTGAGGATTTAGCCAGT
ATTTCTACCTATGGCTTTCGAGGTGAGGCTTTGGCCAGCATAAGCCATGTGGCTCATGTT
ACTATTACAACGAAAACAGCTGATGGAAAGTGTGCATACAGAGCAAGTTACTCAGATGGA
AAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGGGACCCAGATCACGGTGGAG
GACCTTTTTTACAACATAGCCACGAGGAGAAAAGCTTTAAAAAATCCAAGTGAAGAATAT
GGGAAAATTTTGGAAGTTGTTGGCAGGTATTCAGTACACAATGCAGGCATTAGTTTCTCA
GTTAAAAAACAAGGAGAGACAGTAGCTGATGTTAGGACACTACCCAATGCCTCAACCGTG
GACAATATTCGCTCCATCTTTGGAAATGCTGTTAGTCGAGAACTGATAGAAATTGGATGT
GAGGATAAAACCCTAGCCTTCAAAATGAATGGTTACATATCCAATGCAAACTACTCAGTG
AAGAAGTGCATCTTCTTACTCTTCATCAACCATCGTCTGGTAGAATCAACTTCCTTGAGA
AAAGCCATAGAAACAGTGTATGCAGCCTATTTGCCCAAAAACACACACCCATTCCTGTAC
CTCAGTTTAGAAATCAGTCCCCAGAATGTGGATGTTAATGTGCACCCCACAAAGCATGAA
GTTCACTTCCTGCACGAGGAGAGCATCCTGGAGCGGGTGCAGCAGCACATCGAGAGCAAG
CTCCTGGGCTCCAATTCCTCCAGGATGTACTTCACCCAGACTTTGCTACCAGGACTTGCT
GGCCCCTCTGGGGAGATGGTTAAATCCACAACAAGTCTGACCTCGTCTTCTACTTCTGGA
AGTAGTGATAAGGTCTATGCCCACCAGATGGTTCGTACAGATTCCCGGGAACAGAAGCTT
GATGCATTTCTGCAGCCTCTGAGCAAACCCCTGTCCAGTCAGCCCCAGGCCATTGTCACA
GAGGATAAGACAGATATTTCTAGTGGCAGGGCTAGGCAGCAAGATGAGGAGATGCTTGAA
CTCCCAGCCCCTGCTGAAGTGGCTGCCAAAAATCAGAGCTTGGAGGGGGATACAACAAAG
GGGACTTCAGAAATGTCAGAGAAGAGAGGACCTACTTCCAGCAACCCCAGAAAGAGACAT
CGGGAAGATTCTGATGTGGAAATGGTGGAAGATGATTCCCGAAAGGAAATGACTGCAGCT
TGTACCCCCCGGAGAAGGATCATTAACCTCACTAGTGTTTTGAGTCTCCAGGAAGAAATT
AATGAGCAGGGACATGAGGTTCTCCGGGAGATGTTGCATAACCACTCCTTCGTGGGCTGT
GTGAATCCTCAGTGGGCCTTGGCACAGCATCAAACCAAGTTATACCTTCTCAACACCACC
AAGCTTAGTGAAGAACTGTTCTACCAGATACTCATTTATGATTTTGCCAATTTTGGTGTT
CTCAGGTTATCGGAGCCAGCACCGCTCTTTGACCTTGCCATGCTTGCCTTAGATAGTCCA
GAGAGTGGCTGGACAGAGGAAGATGGTCCCAAAGAAGGACTTGCTGAATACATTGTTGAG
TTTCTGAAGAAGAAGGCTGAGATGCTTGCAGACTATTTCTCTTTGGAAATTGATGAGGAA
GGGAACCTGATTGGATTACCCCTTCTGATTGACAACTATGTGCCCCCTTTGGAGGGACTG
CCTATCTTCATTCTTCGACTAGCCACTGAGGTGAATTGGGACGAAGAAAAGGAATGTTTT
GAAAGCCTCAGTAAAGAATGCGCTATGTTCTATTCCATCCGGAAGCAGTACATATCTGAG
GAGTCGACCCTCTCAGGCCAGCAGAGTGAAGTGCCTGGCTCCATTCCAAACTCCTGGAAG
TGGACTGTGGAACACATTGTCTATAAAGCCTTGCGCTCACACATTCTGCCTCCTAAACAT
TTCACAGAAGATGGAAATATCCTGCAGCTTGCTAACCTGCCTGATCTATACAAAGTCTTT
GAGAGGTGTTAAATATGGTTATTTATGCACTGTGGGATGTGTTCTTCTTTCTCTGTATTC
CGATACAAAGTGTTGTATCAAAGTGTGATATACAAAGTGTACCAACATAAGTGTTGGTAG
CACTTAAGACTTATACTTGCCTTCTGATAGTATTCCTTTATACACAGTGGATTGATTATA
AATAAATAGATGTGTCTTAACATA
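
A sequence pasted as 60-column lines like the one above is easy to check programmatically. The following Python sketch joins the lines, validates the alphabet, and reports length and GC content. The helper names `clean_sequence` and `gc_content` are hypothetical, chosen for this example only.

```python
def clean_sequence(text):
    """Join a multi-line sequence into one upper-case string.

    Whitespace, digits and punctuation are dropped; any letter other
    than A, C, G, T or N is treated as an error.
    """
    seq = "".join(ch for ch in text if ch.isalpha()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError("unexpected symbols: " + "".join(sorted(unexpected)))
    return seq


def gc_content(seq):
    """Percentage of G and C bases in a cleaned sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Toy input standing in for the pasted lines.
    demo = clean_sequence("acgt\nAC GT 12\n")
    print(len(demo), gc_content(demo))
```

To apply it to the sequence above, paste the lines into a string (or read them from a file) and pass that string to `clean_sequence`.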

7. Give the DNA sequence of E. coli mismatch repair gene mutS.
> EG10625 mutS (2855116..2857677) E. coli

ATGAGTGCAA TAGAAAATTT CGACGCCCAT ACGCCCATGA TGCAGCAGTA TCTCAGGCTG
AAAGCCCAGC ATCCCGAGAT CCTGCTGTTT TACCGGATGG GTGATTTTTA TGAACTGTTT
TATGACGACG CAAAACGCGC GTCGCAACTG CTGGATATTT CACTGACCAA ACGCGGTGCT
TCGGCGGGAG AGCCGATCCC GATGGCGGGG ATTCCCTACC ATGCGGTGGA AAACTATCTC
GCCAAACTGG TGAATCAGGG AGAGTCCGTT GCCATCTGCG AACAAATTGG CGATCCGGCG
ACCAGCAAAG GTCCGGTTGA GCGCAAAGTT GTGCGTATCG TTACGCCAGG CACCATCAGC
GATGAAGCCC TGTTGCAGGA GCGTCAGGAC AACCTGCTGG CGGCTATCTG GCAGGACAGC
AAAGGTTTCG GCTACGCGAC GCTGGATATC AGTTCCGGGC GTTTTCGCCT GAGCGAACCG
GCTGACCGCG AAACGATGGC GGCAGAACTG CAACGCACTA ATCCTGCGGA ACTGCTGTAT
GCAGAAGATT TTGCTGAAAT GTCGTTAATT GAAGGCCGTC GCGGCCTGCG CCGTCGCCCG

CTGTGGGAGT TTGAAATCGA CACCGCGCGC CAGCAGTTGA ATCTGCAATT TGGGACCCGC
GATCTGGTCG GTTTTGGCGT CGAGAACGCG CCGCGCGGAC TTTGTGCTGC CGGTTGTCTG
TTGCAGTATG CGAAAGATAC CCAACGTACG ACTCTGCCGC ATATTCGTTC CATCACCATG
GAACGTGAGC AGGACAGCAT CATTATGGAT GCCGCGACGC GTCGTAATCT GGAAATCACC
CAGAACCTGG CGGGTGGTGC GGAAAATACG CTGGCTTCTG TGCTCGACTG CACCGTCACG
CCGATGGGCA GCCGTATGCT GAAACGCTGG CTGCATATGC CAGTGCGCGA TACCCGCGTG
TTGCTTGAGC GCCAGCAAAC TATTGGCGCA TTGCAGGATT TCACCGCCGG GCTACAGCCG
GTACTGCGTC AGGTCGGCGA CCTGGAACGT ATTCTGGCAC GTCTGGCTTT ACGAACTGCT
CGCCCACGCG ATCTGGCCCG TATGCGCCAC GCTTTCCAGC AACTGCCGGA GCTGCGTGCG
CAGTTAGAAA CTGTCGATAG TGCACCGGTA CAGGCGCTAC GTGAGAAGAT GGGCGAGTTT

GCCGAGCTGC GCGATCTGCT GGAGCGAGCA ATCATCGACA CACCGCCGGT GCTGGTACGC
GACGGTGGTG TTATCGCATC GGGCTATAAC GAAGAGCTGG ATGAGTGGCG CGCGCTGGCT
GACGGCGCGA CCGATTATCT GGAGCGTCTG GAAGTCCGCG AGCGTGAACG TACCGGCCTG
GACACGCTGA AAGTTGGCTT TAATGCGGTG CACGGCTACT ACATTCAAAT CAGCCGTGGG
CAAAGCCATC TGGCACCCAT CAACTACATG CGTCGCCAGA CGCTGAAAAA CGCCGAGCGC
TACATCATTC CAGAGCTAAA AGAGTACGAA GATAAAGTTC TCACCTCAAA AGGCAAAGCA
CTGGCACTGG AAAAACAGCT TTATGAAGAG CTGTTCGACC TGCTGTTGCC GCATCTGGAA
GCGTTGCAAC AGAGCGCGAG CGCGCTGGCG GAACTCGACG TGCTGGTTAA CCTGGCGGAA
CGGGCCTATA CCCTGAACTA CACCTGCCCG ACCTTCATTG ATAAACCGGG CATTCGCATT
ACCGAAGGTC GCCATCCGGT AGTTGAACAA GTACTGAATG AGCCATTTAT CGCCAACCCG

CTGAATCTGT CGCCGCAGCG CCGCATGTTG ATCATCACCG GTCCGAACAT GGGCGGTAAA
AGTACCTATA TGCGCCAGAC CGCACTGATT GCGCTGATGG CCTACATCGG CAGCTATGTA
CCGGCACAAA AAGTCGAGAT TGGACCTATC GATCGCATCT TTACCCGCGT AGGCGCGGCA
GATGACCTGG CGTCCGGGCG CTCAACCTTT ATGGTGGAGA TGACTGAAAC CGCCAATATT
TTACATAACG CCACCGAATA CAGTCTGGTG TTAATGGATG AGATCGGGCG TGGAACGTCC
ACCTACGATG GTCTGTCGCT GGCGTGGGCG TGCGCGGAAA ATCTGGCGAA TAAGATTAAG
GCATTGACGT TATTTGCTAC CCACTATTTC GAGCTGACCC AGTTACCGGA GAAAATGGAA
GGCGTCGCTA ACGTGCATCT CGATGCACTG GAGCACGGCG ACACCATTGC CTTTATGCAC
AGCGTGCAGG ATGGCGCGGC GAGCAAAAGC TACGGCCTGG CGGTTGCAGC TCTGGCAGGC
GTGCCAAAAG AGGTTATTAA GCGCGCACGG CAAAAGCTGC GTGAGCTGGA AAGCATTTCG

CCGAACGCCG CCGCTACGCA AGTGGATGGT ACGCAAATGT CTTTGCTGTC AGTACCAGAA
GAAACTTCGC CTGCGGTCGA AGCTCTGGAA AATCTTGATC CGGATTCACT CACCCCGCGT
CAGGCGCTGG AGTGGATTTA TCGCTTGAAG AGCCTGGTGT AA
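
The mutS record above uses a FASTA-style header with coordinates and sequence lines split into blocks of ten bases. The Python sketch below shows one way to read such a record, compare its length with the span implied by the coordinates (2857677 - 2855116 + 1 = 2562 bases), and check that it looks like a complete open reading frame. All function names here are hypothetical, and the ORF test assumes the standard stop codons TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code


def read_blocked_fasta(text):
    """Return (header, sequence) from a single FASTA-style record.

    Blank lines and the spaces between ten-base blocks are removed.
    """
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
        else:
            parts.append(line.replace(" ", "").upper())
    return header, "".join(parts)


def span_length(start, end):
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1


def looks_like_complete_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, has a length
    divisible by three, and contains no in-frame stop before the end."""
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    if not seq.startswith("ATG") or seq[-3:] not in STOP_CODONS:
        return False
    internal = (seq[i:i + 3] for i in range(0, len(seq) - 3, 3))
    return all(codon not in STOP_CODONS for codon in internal)


if __name__ == "__main__":
    # Toy record, not the real mutS sequence.
    header, seq = read_blocked_fasta("> demo (1..9)\n\nATGAAA TAA\n")
    print(header, len(seq) == span_length(1, 9), looks_like_complete_orf(seq))
```

Comparing `len(seq)` with `span_length(2855116, 2857677)` for the real record is a quick way to detect lines lost or duplicated during copy and paste.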