Assignment 6
1. Add the MLH1_Human protein to the Biology Workbench and predict its secondary structure with GOR4.
>GENPEPT:463989
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVI
VKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFR
GEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQI
TVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGET
VADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNAN
YSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP
QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPL
SKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGD
TTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRI
INLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLL
NTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEE
DGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPL
EGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLY
KVFERC
Alpha helix = H, beta sheet = E, random coil = C
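To summarize a GOR4 result, one option is to count how often each of the three states occurs. The sketch below assumes the prediction is available as a plain string of H, E and C characters, one per residue. The function name `ss_composition` and the example string are hypothetical and are not the actual GOR4 output for MLH1.

```python
from collections import Counter

def ss_composition(prediction: str) -> dict:
    """Return the percentage of H, E and C states in a prediction string."""
    # Keep only recognised state letters; spaces and line breaks are skipped.
    states = [s for s in prediction.upper() if s in "HEC"]
    if not states:
        raise ValueError("no H/E/C states found in the prediction")
    counts = Counter(states)
    return {s: 100.0 * counts[s] / len(states) for s in "HEC"}

# Hypothetical 20-residue prediction, used only to show the call.
example = "CCCHHHHHHHHCCEEEEECC"
for state, percent in ss_composition(example).items():
    print(f"{state}: {percent:.1f}%")
```

Pasting the real state string from the GOR4 result page in place of `example` would give the helix, sheet and coil content of MLH1.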
2. Run a homology search for MLH1_Human against the GenPept Full Release database. Import the MLH1-like proteins of C. elegans, S. cerevisiae, D. melanogaster, R. norvegicus and M. musculus into your workbench. Run CLUSTALW to produce a multiple sequence alignment of these six proteins.
Mus musculus -----------------MAFVAGVI R RL DETVVN RI AAGE VI QRP ANA I KE M IEN C L DA K
Rattus norvegicus -----------------MSFVAGVI R RL DETVVN RI AAGE VI QRP ANA I KE M TEN C L DA K
Human -----------------MSFVAGVI R RL DETVVN RI AAGE VI QRP ANA I KE M IEN C L DA K
Drosophila melanogaster ---------------MAEYLQPGVI R KL DEVVVN RI AAGE II QRP ANA L KE L LEN S L DA Q
Saccharomyces cerevisiae --------------------MSLRI K AL DASVVN KI AAGE II ISP VNA L KE M MEN S I DA N
Caenorhabditis elegans MWHCGYRTRNCDEFSKIEFSLMGLI Q RL PQDVVN RM AAGE VL ARP CNA I KE L VEN S L DA G
* : * *** :: **** :: * ** : ** : ** . : **
Mus musculus S T NI QV VVK EGG L K LI Q I QDNG T GI R K ED L DIV CERF T TSKL QTFEDL ASI ST Y GFRGEA
Rattus norvegicus S T NI QV IVR EGG L K LI Q I QDNG T GI R K ED L DIV CERF T TSKL QTFEDL AMI ST Y GFRGEA
Human S T SI QV IVK EGG L K LI Q I QDNG T GI R K ED L DIV CERF T TSKL QSFEDL ASI ST Y GFRGEA
Drosophila melanogaster S T HI QV QVK AGG L K LL Q I QDNG T GI R R ED L AIV CERF T TSKL TRFEDL SQI AT F GFRGEA
Saccharomyces cerevisiae A T MI DI LVK EGG I K VL Q I TDNG S GI N K AD L PIL CERF T TSKL QKFEDL SQI QT Y GFRGEA
Caenorhabditis elegans A T EI MV NMQ NGG L K LL Q V SDNG K GI E R ED F ALV CERF A TSKL QKFEDL MHM KT Y GFRGEA
: * * : :: ** : * :: * : *** . ** . : * : :: **** : **** **** : * : ******
Mus musculus LAS I SHVA HV T I TT K TAD GK C A YR AS Y SD GK L QAPP KP CAG NQ GT LI TVEDLF Y N I ITR R
Rattus norvegicus LAS I SHVA HV T I TT K TAD GK C A YR AS Y SD GK L QAPP KP CAG NQ GT LI TVEDLF Y N I ITR K
Human LAS I SHVA HV T I TT K TAD GK C A YR AS Y SD GK L KAPP KP CAG NQ GT QI TVEDLF Y N I ATR R
Drosophila melanogaster LAS I SHVA HL S I QT K TAK EK C G YK AT Y AD GK L QGQP KP CAG NQ GT II CIEDLF Y N M PQR R
Saccharomyces cerevisiae LAS I SHVA RV T V TT K VKE DR C A WR VS Y AE GK M LESP KP VAG KD GT TI LVEDLF F N I PSR L
Caenorhabditis elegans LAS L SHVA KV N I VS K RAD AK C A YQ AN F LD GK M TADT KP AAG KN GT CI TATDLF Y N L PTR R
*** : **** :: . : : * . : * . :: .. : : ** : . ** ** :: ** * *** : * : *
Mus musculus K AL KN PSE E YGKI LE V V GR YSI H NSGI S F S VKK QGE TVSDV RT LPNAT TVD NI RSI FG NA
Rattus norvegicus K AL KN PSE E YGKI LE V V GR YSI H NSGI S F S VKK QGE TVSDV RT LPNAT TVD NI RSI FG NA
Human K AL KN PSE E YGKI LE V V GR YSV H NAGI S F S VKK QGE TVADV RT LPNAS TVD NI RSI FG NA
Drosophila melanogaster Q AL RS PAE E FQRL SE V L AR YAV H NPRV G F T LRK QGD AQPAL RT PVASS RSE NI RII YG AA
Saccharomyces cerevisiae R AL RS HND E YSKI LD V V GR YAI H SKDI G F S CKK FGD SNYSL SV KPSYT VQD RI RTV FN KS
Caenorhabditis elegans N KM TT HGE E AKMV ND T L LR FAI H RPDV S F A LRQ --N QAGDF RT KGDGN FRD VV CNL LG RD
. : . : * : : . : * ::: * : . * : :: : . . . : : : .
Mus musculus VS RELI EV G-CE DKTLAF K-MNGYI SN ANYSVK KC I----------F LL FIN HR LV ES AA
Rattus norvegicus VS RELI EV G-CE DKTLAF K-MNGYI SN ANYSVK KC I----------F LL FIN HR LV ES AA
Human VS RELI EI G-CE DKTLAF K-MNGYI SN ANYSVK KC I----------F LL FIN HR LV ES TS
Drosophila melanogaster IS KELL EF S-HR DEVYKF E-AECLI TQ VNYSAK KC Q----------M LL FIN QR LV ES TA
Saccharomyces cerevisiae VA SNLI TF HISK VEDLNL ESVDGKV CN LNFISK KS IS---------L IF FIN NR LV TC DL
Caenorhabditis elegans VA DTIL PL S-LN STRLKF T-FTGHI SK PIASAT AA IAQNRKTSRSFF SV FIN GR SV RC DI
:: :: . . : : : . . : . *** * * .
Mus musculus L R KA IE TV YA AYLPK NTHPF LYL SL EI SPQNV DVNVHPTK HE V HFL HEE S I LQ RV QQHIE
Rattus norvegicus L K KA IE AV YA AYLPK NTHPF LYL IL EI SPQNV DVNVHPTK HE V HFL HEE S I LE RV QQHIE
Human L R KA IE TV YA AYLPK NTHPF LYL SL EI SPQNV DVNVHPTK HE V HFL HEE S I LE RV QQHIE
Drosophila melanogaster L R TS VD SI YA TYLPR GHHPF VYM SL TL PPQNL DVNVHPTK HE V HFL YQE E I VD SI KQQVE
Saccharomyces cerevisiae L R RA LN SV YS NYLPK GFRPF IYL GI VI DPAAV DVNVHPTK RE V RFL SQD E I IE KI ANQLH
Caenorhabditis elegans L K HP ID EV LG --ARQ LHAQF CAL HL QI DETRI DVNVHPTK NS V IFL EKE E I IE EI RAYFE
* : . :: : . : * : : : : ******** .. * ** :: . * :: : ..
Mus musculus SKL LGSNSSR MYFTQ TLLPGLA G------PSGEAARPTTGVAS SS TSGSGDKVYAYQMVR
Rattus norvegicus SKL LGSNSSR MYFTQ TLLPGLA G------PSGEAVKSTTGIAS SS TSGSGDKVHAYQMVR
Human SKL LGSNSSR MYFTQ TLLPGLA G------PSGEMVKSTTSLTS SS TSGSSDKVYAHQMVR
Drosophila melanogaster ARL LGSNATR TFYKQ LRLPGAP -----------------DLDE TQ LADKTQRIYPKEMVR
Saccharomyces cerevisiae AEL SAIDTSR TFKAS SISTNKP ESLIPFNDTIESDRNRKSLRQ AQ VVENSYTTANSQLRK
Caenorhabditis elegans KVI GEIFGFE ALDVE KPEEEQP D--------IENLVMIPMSQS LK SIEAIRKPDTKPEFK
: . . . . . :
Mus musculus T D SRDQK LDAFLQPVSSLVPSQPQDPAPVRGARTEGSPERATREDEEMLALPAPAEAAAE
Rattus norvegicus T D SRDQK LDAFMQPVSRRLPSQPQD--PVPGNRTEGSPEKAMQKDQEISELPAPMEAAAD
Human T D SREQK LDAFLQPLSKPLSSQPQ--AIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK
Drosophila melanogaster T D STEQK LDKFLAPLVK-------------------------------------------
Saccharomyces cerevisiae A K RQENK LVRIDASQAKITSFLSSS--QQFNFEGSSTKRQLSEPKVTNVSHSQEAEKLTL
Caenorhabditis elegans S S PSAWK SDKKRVDYMEVRTDAKERKIDEFVTRGGAVGPTTSNDDIFGGSGILKRARTED
: . *
Mus musculus SENLERESLMETSDAAQKAAPTSSPGSSRKR HRE DSDVEMVENASGKEMTAACYPRRRII
Rattus norvegicus SASLERESVIGASEVVAPQRHPSSPGSSRKR HPE DSDVEMMENDSRKEMTAACYPRRRII
Human NQSLEGDTTKGTSEMSEKRGPTSS--NPRKR HRE DSDVEMVEDDSRKEMTAACTPRRRII
Drosophila melanogaster ----------------SDSGVSSSSSQEASR LPE ES------------FRVTAAKKSREV
Saccharomyces cerevisiae NESEQPRDANTINDNDLKDQPKKKQKLGDYK VPS IADDEKNALPISKDGYIRVPKERVNV
Caenorhabditis elegans STGGEKEPEDLNTDFDDVSMVSLVSTADGRR LNE SQD-----LGEDDDVDFEYGKTHREF
: . .
Mus musculus N L TS V LSLQE E I SERCHETL RE IL RN HSF VG CVN PQ W--A LAQ H QTKL Y LLN TTKL SEE L
Rattus norvegicus N L TS V LSLQE E I NDRGHETL RE ML RN HTF VG CVN PQ W--A LAQ H QTKL Y LLN TTKL SEE L
Human N L TS V LSLQE E I NEQGHEVL RE ML HN HSF VG CVN PQ W--A LAQ H QTKL Y LLN TTKL SEE L
Drosophila melanogaster R L SS V LDMRK R V ERQCSVQL RS TL KN LVY VG CVD ER R--A LFQ H ETRL Y MCN TRSF SEE L
Saccharomyces cerevisiae N L TS I KKLRE K V DDSIHREL TD IF AN LNY VG VVD EE RRLA AIQ H DLKL F LID YGSV CYE L
Caenorhabditis elegans H F ES I EVLRK E I IANSSQSL RE MF KT STF VG SIN VK Q--V LIQ F GTSL Y HLD FSTV LRE F
. : * : ::: . : * . : . : ** :: . . * . * : : .. * :
Mus musculus FYQ ILI YD F AN FG VLRL SEPAPLFDL AM LAL DS PESGWTED DGPKEGLA-----EYIVE F
Rattus norvegicus FYQ ILI YD F AN FG VLRL PEPAPLFDF AM LAL DS PESGWTEE DGPKEGLA-----EYIVE F
Human FYQ ILI YD F AN FG VLRL SEPAPLFDL AM LAL DS PESGWTEE DGPKEGLA-----EYIVE F
Drosophila melanogaster FYQ RMI YE F QN CS EITI CPPLPLKEL LI LSL ES RAAGWTPE DEDKAELA-----DGAAD I
Saccharomyces cerevisiae FYQ IGL TD F AN FG KINL QSTNVSDDI VL YNL LS EFDELN-D DASK---------EKIIS K
Caenorhabditis elegans FYQ ISV FS F GN YG SYRL DE-EPPAII EI LEL LG ELSTREPN YAAFEVFANVENRFAAEK L
*** : . * * . : : : * . : .
Mus musculus L KK KA EML AD Y F S V E I DE EGN--------L IGL P LL I D SY VP PL EGLP I FI LRL ATE V NW
Rattus norvegicus L KK KA KML AD Y F S V E I DE EGN--------L IGL P LL I D SY VP PL EGLP I FI LRL ATE V NW
Human L KK KA EML AD Y F S L E I DE EGN--------L IGL P LL I D NY VP PL EGLP I FI LRL ATE V NW
Drosophila melanogaster L LK KA PIM RE Y F G L R I SE DGM--------L ESL P SL L H QH RP CV AHLP V YL LRL ATE V DW
Saccharomyces cerevisiae I WD MS SML NE Y Y S I E L VN DGLDNDLKSVKL KSL P LL L K GY IP SL VKLP F FI YRL GKE V DW
Caenorhabditis elegans L AE HA DLL HD Y F A I K L DQ LENGR----LHI TEI P SL V H YF VP QL EKLP F LI ATL VLN V DY
: . : :: : * : . : . : : : : * * : . . * : ** . : * : * ::
Mus musculus DE E KE CF ESL SK ECA MFY SIRKQYILEESTLSGQQSDMPGSTSKPWKWT --VE H II YKAF
Rattus norvegicus DE E -E CF ESL SK ECA VFY SIRKQYILEESALSGQQSDMPGSPSKPWKWT --VE H II YKAF
Human DE E KE CF ESL SK ECA MFY SIRKQYISEESTLSGQQSEVPGSIPNSWKWT --VE H IV YKAL
Drosophila melanogaster EQ E TR CF ETF CR ETA RFY --------------AQLDWREGATAVFSRWT --ME H VL FPAF
Saccharomyces cerevisiae ED E QE CL DGI LR EIA LLY IPDMVPKVDTLDASLSEDEKAQFINRKEHIS SLLE H VL FPCI
Caenorhabditis elegans DD E QN TF RTI CR AIG DLF TLDTN---------FITLDKKISAFSATPWK TLIK E VL MPLV
:: * . : : : . :: . :: . :: .
Mus musculus R SHLL P P K HFTEDGNV LQLAN LPDL YKVFERC --
Rattus norvegicus R SHLL P P K HFTEDGNV LQLAN LPDL CKVFERC --
Human R SHIL P P K HFTEDGNI LQLAN LPDL YKVFERC --
Drosophila melanogaster K KYLL P P R ---IKDQI YELTN LPTL YKVFERC --
Saccharomyces cerevisiae K RRFL A P R HILKD--V VEIAN LPDL YKVFERC --
Caenorhabditis elegans K RKFI P P E HFKQAGVI RQLAD SHDL YKVFERC GT
: :: . * . : :::: * ******
* - single, fully conserved residue
: - conservation of strong groups
. - conservation of weak groups
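The three symbols in the legend can be derived column by column. The sketch below is a minimal illustration, assuming the strong and weak residue groups documented for CLUSTALW's consensus line; the function names `column_symbol` and `consensus_line` are made up for this example. The input rows are the first 20 columns of the alignment block that starts with LASISHVA, with the extraction spaces removed.

```python
# Residue groups assumed from the CLUSTALW documentation.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "          # a gap in the column means no symbol
    if len(residues) == 1:
        return "*"          # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG):
        return ":"          # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."          # all residues fall in one weak group
    return " "

def consensus_line(rows) -> str:
    """Build the consensus line for equal-length aligned rows."""
    return "".join(column_symbol("".join(col)) for col in zip(*rows))

mammal = "LASISHVAHVTITTKTADGK"      # mouse, rat and human agree here
rows = [mammal, mammal, mammal,
        "LASISHVAHLSIQTKTAKEK",      # Drosophila melanogaster
        "LASISHVARVTVTTKVKEDR",      # Saccharomyces cerevisiae
        "LASLSHVAKVNIVSKRADAK"]      # Caenorhabditis elegans
print(repr(consensus_line(rows)))
```

The symbols produced for these 20 columns appear in the same order as the first symbols of the consensus line printed under that block.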
3. Run the BOXSHADE program to produce a color-coded plot of the alignment from question 2.
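The idea behind a shaded plot can be imitated in plain text. The sketch below is not BOXSHADE's own algorithm. It assumes a simple rule: a column is shaded when at least a given fraction of the sequences share the most common residue (0.5 here, an assumed setting). Matching residues are printed in upper case and all others in lower case. The name `shade_alignment` is hypothetical, and the rows are the same 20-column excerpt of the alignment from question 2.

```python
from collections import Counter

def shade_alignment(rows, threshold=0.5):
    """Upper-case residues that match a sufficiently common column residue."""
    shaded = [[] for _ in rows]
    for col in zip(*rows):
        counts = Counter(c for c in col if c != "-")
        # Most common non-gap residue in this column, if there is one.
        top, n = counts.most_common(1)[0] if counts else ("", 0)
        conserved = n > 0 and n / len(col) >= threshold
        for i, c in enumerate(col):
            shaded[i].append(c.upper() if conserved and c == top else c.lower())
    return ["".join(row) for row in shaded]

mammal = "LASISHVAHVTITTKTADGK"      # mouse, rat and human agree here
rows = [mammal, mammal, mammal,
        "LASISHVAHLSIQTKTAKEK",      # Drosophila melanogaster
        "LASISHVARVTVTTKVKEDR",      # Saccharomyces cerevisiae
        "LASLSHVAKVNIVSKRADAK"]      # Caenorhabditis elegans
for line in shade_alignment(rows):
    print(line)
```

Lower-case letters mark the positions where a sequence departs from the majority, which is roughly what the unshaded residues show in the color plot.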
4. Draw a rooted phylogenetic tree for these proteins.
GENPEPT_7595954 Mus musculus MutL homolog 1 protein (MLH1) mRNA, complete cds
GENPEPT_1724118 Rattus norvegicus mismatch repair protein (MLH1) mRNA, complete
GENPEPT_463989 Human DNA mismatch repair protein homolog (hMLH1) mRNA, complete
GENPEPT_3192877 Drosophila melanogaster mutL homolog (Mlh1) gene, complete cds
GENPEPT_460627 Saccharomyces cerevisiae DNA mismatch repair (MLH1) gene, complete
GENPEPT_3880333 Caenorhabditis elegans cosmid T28A8, complete sequence
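A rooted tree can also be built by hand from an alignment. The sketch below uses UPGMA on p-distances (the fraction of differing positions), which is a swapped-in textbook method and not necessarily the one the Workbench uses. The names `p_distance` and `upgma` and the four toy sequences s1 to s4 are hypothetical; they are not taken from the MLH1 alignment. The tree is returned in Newick format without branch lengths.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(names, seqs) -> str:
    """Cluster aligned sequences with UPGMA and return a rooted Newick tree."""
    clusters = {name: 1 for name in names}      # label -> number of leaves
    dist = {frozenset((a, b)): p_distance(sa, sb)
            for (a, sa), (b, sb) in combinations(zip(names, seqs), 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        del dist[pair]
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:
            # Size-weighted average of the two old distances to cluster c.
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters)) + ";"

# Hypothetical toy alignment, used only to show the call.
names = ["s1", "s2", "s3", "s4"]
seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEYGHLKV", "MCDWYGPLRV"]
print(upgma(names, seqs))
```

Running the same function on the six full-length aligned sequences from question 2 would give a tree to compare with the one drawn by the Workbench. Short excerpts of the alignment are too noisy for this purpose.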