Assignment 3
Search for genes associated with colon cancer in the human genome.
MLH1 was identified as a locus frequently mutated in hereditary nonpolyposis colon cancer (HNPCC). When it was cloned, it turned out to be a human homolog of the E. coli mismatch repair gene mutL.
1. How many hits do you get when you search for genes associated with colon cancer in the human genome?

  • 17 hits
  • Path: NCBI --> Human Map Viewer --> Search for colon cancer

2. How many human loci do you find when you search LocusLink (at NCBI) for colon cancer?

  • 21 loci
  • Path: NCBI --> Human Genome Resources --> Search LocusLink for colon cancer
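
LocusLink has since been superseded by NCBI's Gene database, so the counts in questions 1 and 2 (17 hits, 21 loci) are a snapshot, and a fresh search will return different numbers. A comparable count can be requested through NCBI's E-utilities ESearch service. The following is a minimal, standard-library-only sketch; the search term and the choice of `db="gene"` are assumptions for illustration, not the queries that produced the numbers above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="gene"):
    """Build an ESearch URL that asks for a JSON reply."""
    params = {"db": db, "term": term, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def hit_count(reply_text):
    """Read the total number of matching records from an ESearch JSON reply."""
    reply = json.loads(reply_text)
    return int(reply["esearchresult"]["count"])


def count_hits(term, db="gene"):
    """Run the search against NCBI (requires network access)."""
    with urlopen(esearch_url(term, db)) as response:
        return hit_count(response.read().decode("utf-8"))
```

For example, `count_hits('colon cancer AND "Homo sapiens"[orgn]')` (a hypothetical query) returns the current number of matching human Gene records. Keeping the URL building and the reply parsing separate from the network call makes both testable offline. NCBI asks scripted clients to keep their request rate low.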

3. Give the locus ID and position of MLH1.

  • Locus ID: 4292
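
LocusLink locus IDs were carried over as NCBI Gene IDs, so locus 4292 can still be looked up with the E-utilities ESummary service, whose JSON summary of a gene includes its chromosome and cytogenetic map location. This sketch assumes the reply is laid out as `result`, then the gene ID as a key, then `name`, `chromosome` and `maplocation` fields; check that against a live response before relying on it. The sample reply is hand-written for testing, not captured from NCBI.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESummary endpoint.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def gene_location(reply_text, gene_id):
    """Return (symbol, chromosome, map location) for one Gene ID.

    Assumes the ESummary JSON layout result -> "<id>" -> name/chromosome/maplocation.
    """
    record = json.loads(reply_text)["result"][str(gene_id)]
    return record["name"], record["chromosome"], record["maplocation"]


def fetch_gene_location(gene_id):
    """Query ESummary for a Gene ID (requires network access)."""
    params = {"db": "gene", "id": gene_id, "retmode": "json"}
    with urlopen(ESUMMARY_URL + "?" + urlencode(params)) as response:
        return gene_location(response.read().decode("utf-8"), gene_id)


# Hand-written sample in the assumed reply shape, not a captured NCBI response.
SAMPLE_REPLY = json.dumps({
    "result": {
        "uids": ["4292"],
        "4292": {"name": "MLH1", "chromosome": "3", "maplocation": "3p22.2"},
    }
})

print(gene_location(SAMPLE_REPLY, 4292))  # ('MLH1', '3', '3p22.2')
```

Calling `fetch_gene_location(4292)` performs the same lookup live. Current band assignments can differ from the position LocusLink displayed when this assignment was written.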

4. Find the percent nucleotide sequence identity (%ID) between MLH1 and its putative ortholog in mouse.

  • 90.5%
  • Path: NCBI --> Human Genome Resources --> Search LocusLink for colon cancer --> Click the HomoloGene link for 4292 (MLH1) --> Click "more" for Hs.57301 --> Calculated Orthologs, Human vs. Mouse
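
A %ID figure like the 90.5% above is, roughly, the share of aligned positions at which the two sequences carry the same base. The following is a minimal sketch of that calculation for two sequences that have already been aligned, with gaps written as `-`. How HomoloGene handles gaps and unaligned ends is not stated here, so this function is an assumption and is not guaranteed to reproduce the 90.5% value.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two sequences that are already aligned.

    Columns where both sequences have a gap are ignored; a column counts as
    identical only when both bases match and are not gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no usable columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


print(percent_identity("AC-GT", "ACTGT"))  # 80.0
```

Counting a gap opposite a base as a mismatch is one common convention; other tools divide by the shorter sequence length or by ungapped columns only, which changes the result.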

5. Find the total number of MLH1 mutations reported in the Human Gene Mutation Database (HGMD).

  • 147
  • Path: NCBI --> Human Genome Resources --> Search LocusLink for colon cancer --> Click 4292 --> Click HGMD (Human Gene Mutation Database)

6. Give the DNA sequence of MLH1.

  • Path: NCBI --> Human Genome Resources --> Search LocusLink for colon cancer --> Click 4292 --> GenBank Sequences --> Click Nucleotide BC006850

  • Sequence (BC006850):
    1 ggcacttccg ttgagcatct agacgtttcc ttggctcttc tggcgccaaa atgtcgttcg
    61 tggcaggggt tattcggcgg ctggacgaga cagtggtgaa ccgcatcgcg gcgggggaag
    121 ttatccagcg gccagctaat gctatcaaag agatgattga gaactgttta gatgcaaaat
    181 ccacaagtat tcaagtgatt gttaaagagg gaggcctgaa gttgattcag atccaagaca
    241 atggcaccgg gatcaggaaa gaagatctgg atattgtatg tgaaaggttc actactagta
    301 aactgcagtc ctttgaggat ttagccagta tttctaccta tggctttcga ggtgaggctt
    361 tggccagcat aagccatgtg gctcatgtta ctattacaac gaaaacagct gatggaaagt
    421 gtgcatacag agcaagttac tcagatggaa aactgaaagc ccctcctaaa ccatgtgctg
    481 gcaatcaagg gacccagatc acggtggagg acctttttta caacatagcc acgaggagaa
    541 aagctttaaa aaatccaagt gaagaatatg ggaaaatttt ggaagttgtt ggcaggtatt
    601 cagtacacaa tgcaggcatt agtttctcag ttaaaaaaca aggagagaca gtagctgatg
    661 ttaggacact acccaatgcc tcaaccgtgg acaatattcg ctccatcttt ggaaatgctg
    721 ttagtcgaga actgatagaa attggatgtg aggataaaac cctagccttc aaaatgaatg
    781 gttacatatc caatgcaaac tactcagtga agaagtgcat cttcttactc ttcatcaacc
    841 atcgtctggt agaatcaact tccttgagaa aagccataga aacagtgtat gcagcctatt
    901 tgcccaaaaa cacacaccca ttcctgtacc tcagtttaga aatcagtccc cagaatgtgg
    961 atgttaatgt gcaccccaca aagcatgaag ttcacttcct gcacgaggag agcatcctgg
    1021 agcgggtgca gcagcacatc gagagcaagc tcctgggctc caattcctcc aggatgtact
    1081 tcacccagac tttgctacca ggacttgctg gcccctctgg ggagatggtt aaatccacaa
    1141 caagtctgac ctcgtcttct acttctggaa gtagtgataa ggtctatgcc caccagatgg
    1201 ttcgtacaga ttcccgggaa cagaagcttg atgcatttct gcagcctctg agcaaacccc
    1261 tgtccagtca gccccaggcc attgtcacag aggataagac agatatttct agtggcaggg
    1321 ctaggcagca agatgaggag atgcttgaac tcccagcccc tgctgaagtg gctgccaaaa
    1381 atcagagctt ggagggggat acaacaaagg ggacttcaga aatgtcagag aagagaggac
    1441 ctacttccag caaccccaga aagagacatc gggaagattc tgatgtggaa atggtggaag
    1501 atgattcccg aaaggaaatg actgcagctt gtaccccccg gagaaggatc attaacctca
    1561 ctagtgtttt gagtctccag gaagaaatta atgagcaggg acatgaggtt ctccgggaga
    1621 tgttgcataa ccactccttc gtgggctgtg tgaatcctca gtgggccttg gcacagcatc
    1681 aaaccaagtt ataccttctc aacaccacca agcttagtga agaactgttc taccagatac
    1741 tcatttatga ttttgccaat tttggtgttc tcaggttatc ggagccagca ccgctctttg
    1801 accttgccat gcttgcctta gatagtccag agagtggctg gacagaggaa gatggtccca
    1861 aagaaggact tgctgaatac attgttgagt ttctgaagaa gaaggctgag atgcttgcag
    1921 actatttctc tttggaaatt gatgaggaag ggaacctgat tggattaccc cttctgattg
    1981 acaactatgt gccccctttg gagggactgc ctatcttcat tcttcgacta gccactgagg
    2041 tgaattggga cgaagaaaag gaatgttttg aaagcctcag taaagaatgc gctatgttct
    2101 attccatccg gaagcagtac atatctgagg agtcgaccct ctcaggccag cagagtgaag
    2161 tgcctggctc cattccaaac tcctggaagt ggactgtgga acacattgtc tataaagcct
    2221 tgcgctcaca cattctgcct cctaaacatt tcacagaaga tggaaatatc ctgcagcttg
    2281 ctaacctgcc tgatctatac aaagtctttg agaggtgtta aatatggtta tttatgcact
    2341 gtgggatgtg ttcttctttc tctgtattcc gatacaaagt gttgtatcaa agtgtgatat
    2401 acaaagtgta ccaacataag tgttggtagc acttaagact tatacttgcc ttctgatagt
    2461 attcctttat acacagtgga ttgattataa ataaatagat gtgtcttaac ataaaaaaaa
    2521 aaaaaaaaaa
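
The sequence above uses GenBank's numbered layout, and BC006850 is a cDNA clone, so the MLH1 coding region sits inside it between untranslated flanks. The following is a minimal, standard-library-only sketch that strips the numbering, takes the open reading frame opened by the first ATG, and translates it with the standard genetic code. The first-ATG rule is a simplifying assumption: it works when the 5' untranslated region has no upstream ATG, and a longest-ORF search is the safer general choice. The function names are hypothetical, written for this sketch.

```python
import re

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ... GGG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_origin(text):
    """Join GenBank-style numbered sequence lines into one uppercase string.

    Pass only the sequence lines: any stray A/C/G/T/N letters in labels
    would be kept.
    """
    return re.sub(r"[^ACGTN]", "", text.upper())


def first_orf(seq):
    """Return (start, cds) for the reading frame opened by the first ATG.

    start is 0-based. cds runs through the first in-frame stop codon, or up to
    the last complete codon if no stop is found. Returns None if there is no ATG.
    """
    start = seq.find("ATG")
    if start < 0:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, seq[start:i + 3]
    end = start + (len(seq) - start) // 3 * 3
    return start, seq[start:end]


def translate(cds):
    """Translate codon by codon ('X' for ambiguous codons), dropping a final stop."""
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                      for i in range(0, len(cds) - 2, 3))
    return protein.rstrip("*")


# Tiny made-up fragment in the same numbered layout.
seq = parse_origin("      1 ccatgaaatt tgggtaacc")
start, cds = first_orf(seq)
print(start, cds, translate(cds))  # 2 ATGAAATTTGGGTAA MKFG
```

Applied to the 43 numbered lines above, `first_orf` should report index 50 (position 51, `atgtcgttcg`), and the translation should begin MSFVAGVIRR, the N-terminus of the 756-residue MLH1 protein.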

7. Give the DNA sequence of the E. coli mismatch repair gene mutS.

  • Path: NCBI --> Search Genomes for E. coli mutS --> page 6, entry 111: AJ006210 --> click the mutS gene link
  • Sequence:
    1 atgagtgcaa tagaaaattt cgacgcccat acgcccatga tgcagcagta tctcaagctg
    61 aaagcccagc atcccgagat cctgctgttt taccggatgg gtgattttta tgaactgttt
    121 tatgacgacg caaaacgcgc gtcgcaactg ctggatattt cactgaccaa acgcggtgct
    181 tcggcgggag agccgatccc gatggcgggg attccctacc atgcggtgga aaactacctc
    241 gccaaactgg tgaatcaggg cgagtccgtt gccatctgcg aacaaattgg cgatccggcg
    301 accagcaaag gtccggttga gcgcaaagtt gtgcgtatcg ttacgccagg caccatcagc
    361 gatgaagccc tgttgcagga gcgtcaggac aacctgctgg cggctatctg gcaggacagc
    421 aaaggtttcg gctacgcgac gctggatatc agttccggtc gttttcgcct gagcgaaccg
    481 gctgaccggg aaacgatggc ggcagaactg caacgcacta atcctgcgga actgctgtat
    541 gcagaagatt ttgctgaaat gtcgttaatt gaaggccgtc gcggcctgcg ccgtcgcccg
    601 ctgtgggagt ttgaaatcga caccgcgcgc cagcagttga atctgcaatt tgggacccgc
    661 gatctggtcg gttttggcgt cgagaacgcg ccgcgcggac tttgtgctgc cggttgtctg
    721 ttgcagtatg cgaaagatac ccaacgtacg actctgccgc atattcgttc catcaccatg
    781 gaacgtgagc aggacagcat cattatggat gccgcgacgc gtcgtaatct ggaaatcacc
    841 cagaacctgg cgggtggtgc ggaaaatacg ctggcttctg tgctcgactg caccgtcacg
    901 ccgatgggca gccgtatgct gaaacgctgg ctgcatatgc cagtgcgcga tacccgcgtg
    961 ttgcttgagc gccagcaaac tattggcgca ttgcaggatt tcaccgccga gttgcagccg
    1021 gtactacgtc aggtcggcga cctggaacgt attctggcgc gtctggcgtt gcgtaccgct
    1081 cgcccacgcg atctggcccg tatgcgtcac gctttccagc aactgccgga gctgcgtgcg
    1141 cagttagaaa ctgttgatag tgcaccagta caggcgctac gtgagaagat gggcgagttt
    1201 gccgagctgc gcgatctgct ggagcgagca atcatcgaca caccgccggt gctggtacgc
    1261 gacggtggtg ttatcgcatc aggctataac gaagagctgg atgagtggcg cgcgctggct
    1321 gacggcgcga ccgattatct ggagcgtctg gaagtccgcg agcgtgaacg taccggcctg
    1381 gacacgctaa aagttggctt taatgcggtg cacggctact acattcaaat cagccgtggg
    1441 caaagccatc tggcacctat caactatatg cgtcgccaga cgctgaaaaa cgccgagcgc
    1501 tacatcattc cagagctaaa agagtacgaa gataaagtcc tcacttcaaa aggcaaagca
    1561 ctggctctgg aaaaacagct ttatgaagag ctgttcgacc tgctgttgcc gcatctggaa
    1621 gcgttgcaac agagcgcgag cgcgctggcg gaactcgacg tgctggtgaa cctggcggaa
    1681 cgggcctata ccctgaacta cacctgcccg accttcattg ataaaccggg cattcgcatt
    1741 accgaaggcc gccatccggt ggttgaacag gtgctgaacg agccatttat cgccaacccg
    1801 ctgaatctgt caccgcagcg ccggatgttg attattaccg gtccgaacat gggcggtaaa
    1861 agtacctata tgcgccagac cgcactgatt gcgctgatgg cctatatcgg cagctacgta
    1921 ccggcgcaaa aagtcgagat tggcccgatt gaccgtatct ttacccgcgt aggggcagcg
    1981 gatgatctgg cttccgggcg ttcaaccttt atggtggaga tgaccgaaac cgctaatatt
    2041 ctgcataacg ccaccgagta cagtctggtg ctgatggatg agattgggcg cggaacgtcc
    2101 acttacgatg gtctgtcgct ggcgtgggcg tgcgcggaaa atctggcgaa taagattaag
    2161 gcgttgacgc tgtttgccac ccactatttc gagctgaccc agttaccgga gaaaatggaa
    2221 ggcgtcgcca acgtgcatct cgatgcactg gagcacggcg acaccattgc ctttatgcat
    2281 agcgtgcagg atggcgcggc gagcaaaagc tacggcctgg cggttgcagc tctggccggc
    2341 gtgccaaaag aggttattaa gcgcgcacgg caaaaactgc gtgagctgga aagcatttcg
    2401 ccgaacgccg ccgctacgca agtggatggt acgcaaatgt ctttgctgtc cgtaccggaa
    2461 gaaacctcgc ctgcagtcga ggcactggaa aacctcgatc cggattcact caccccgcgt
    2521 caggcgctgg aatggattta tcgcctgaag agtctggtgt aa
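
Unlike the MLH1 cDNA, the mutS entry above is the coding sequence alone: it starts with ATG and ends with TAA. A quick way to confirm that a pasted sequence is a complete, uninterrupted coding sequence is to check its length, its first and last codons, and look for earlier in-frame stop codons. The following is a minimal sketch of such a check; the helper names are hypothetical, and only the standard stop codons TAA, TAG and TGA are assumed.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_sequence(text):
    """Drop position numbers, spaces and line breaks; return uppercase bases."""
    return re.sub(r"[^ACGTN]", "", text.upper())


def cds_problems(seq):
    """Return a list of reasons seq is not a complete, uninterrupted CDS.

    An empty list means: the length is a multiple of 3, it starts with ATG,
    it ends with a stop codon, and there is no stop codon earlier in frame.
    """
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if seq[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # Scan in frame for stop codons that begin before the final three bases.
    early = [i for i in range(0, len(seq) - 3, 3) if seq[i:i + 3] in STOP_CODONS]
    if early:
        problems.append(f"in-frame stop codon(s) at 0-based offsets {early}")
    return problems


print(cds_problems(clean_sequence("1 atgaaatttt aa")))  # []
```

For the 2,562 nt listed above (854 codons: 853 amino acids plus the stop), `cds_problems(clean_sequence(...))` should return an empty list.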
