Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution

Background

How does an evolutionary biologist decide how closely related two different species are? The simplest way is to compare physical features of the species (as we did with the wild and domestic canines). We generally expect that brothers and sisters will look more similar to each other than two cousins might. If you make a family tree, you find that brothers and sisters share a common parent, but you must look harder at the tree to find which ancestor the two cousins share. Cousins do not share the same parents; rather, they share some of the same grandparents. In other words, the common ancestor of two brothers is more recent (their parents) than the common ancestor of two cousins (their grandparents), and in an evolutionary sense, this is why we say that two brothers are more closely related than two cousins.

Similarly, evolutionary biologists might compare salamanders and frogs, and salamanders and fish. More physical features are shared between frogs and salamanders than between frogs and fish, and an evolutionary biologist might use this information to infer that frogs and salamanders had a more recent common ancestor than did frogs and fish.

But no process is without problems. Two very similar-looking people are not necessarily related, and two species that have similar features may not be closely related either. Comparing morphology can also be difficult if it is hard to find enough morphological characteristics to compare. Remember the problems with the canids! Imagine that you were responsible for determining which two of three salamander species were most closely related. What physical features would you compare? When you ran out of physical features, is there anything else you could compare? Many biologists turn next to comparing genes and proteins.
Genes and proteins are not necessarily better than morphological features, except in the sense that differences in morphology can be a result of environmental conditions rather than genetics, whereas differences in genes are definitely genetic. Also, there are sometimes more molecules to compare than physical features.

Three exercises follow in which you will use protein databases that are in the public domain. You will be able to investigate gene products (which are proteins) and evaluate evolutionary relationships. The protein you will work with is the hemoglobin beta chain. You will obtain your data from public online databases that contain the amino-acid sequences of proteins coded for by many different organisms.

Hemoglobin, the molecule that carries oxygen in our bloodstream, is composed of four subunits. In adult hemoglobin, two of these subunits are identical and coded for by the alpha-hemoglobin gene, located on chromosome 16. The other two are identical and coded for by the beta-hemoglobin gene, found on chromosome 11. We will use the protein sequences of the beta hemoglobin as a set of traits to compare among species.

Part I “Are Bats Birds?”
In Parts I and II, there are no hypotheses, so you are not collecting data to test one. You are learning a new skill: mining the molecular information that is available to everyone on the internet. In Part III, you will use your skills to test a hypothesis. Because Part III does have a hypothesis, you will collect data to determine its validity, and you will need to answer the Experimental Design Questions. We’ll discuss these in class.

Procedure Part I

To be completed as a team. Answer the questions in the spaces provided on the Data Tables.

Table 1: Morphological comparison of birds, bats, and other mammals

Feature                        Birds     Bats      Other mammals
Presence of hair               ____      ____      ____
Presence of feathers           ____      ____      ____
Presence of mammary glands     ____      ____      ____
Presence of wings              ____      ____      ____
Homeothermy                    ____      ____      ____
Four-chambered hearts          ____      ____      ____

1a. What morphological features do bats share with mammals?
1b. Based on morphology, are bats more similar to birds or mammals?

Use your internet access to complete Part I.

2. Generate a distance matrix for the beta-hemoglobin chain for two bird species, two bat species, and two non-bat mammal species in a word processing document. Follow the steps below to do this.

Step 1. Begin by going to www.uniprot.org

Step 2. In the Search In dialogue box, use the down arrow to select Protein Knowledgebase (UniProt). In the Query box, type “hemoglobin beta”. Click “Search”.

Step 3. This will take you to the beginning of the database: http://www.uniprot.org/uniprot/?query=hemoglobin&sort=score

Step 4. Use the right-hand scroll bar to scroll through the names of the many entries. Find one for a bird, a bat, or some other mammal. When you find one, check that it is the hemoglobin beta chain (preferably one without a number after it) and not the alpha, gamma, or other hemoglobin subunit. If the sequence is for the beta chain and it is for an appropriate species, click on it and the computer will retrieve the sequence.

Step 5. Once you have selected your organisms, hit Retrieve at the bottom of the page.

Step 6. On the next page, you will see the UniProt Identifiers.

Step 7. Hit Align at the bottom of the page. Scroll down to see what was found in the UniProt database.

a. You will see Entry results. This is a list of the organisms whose hemoglobin beta amino-acid sequences you wish to compare. The Accession number is the UniProt Identifier.

b. Next, scroll down to ClustalW results. This is an alignment of the hemoglobin beta sequences of the organisms you wish to compare. The UniProt Identifiers are listed on the left, and the matched sequences are listed on the right. Each letter represents one amino acid. Please see Appendix A for the list. Now, go back up to Entry results.

c. Scrolling down a bit, you will see a box in which you may enter additional sequences. Please note that FASTA is the accepted format for this program. The program automatically places your requested sequences into FASTA format. There is a manual way to do this as well; I will show you how if you would like.

d. Scroll down to Amino acid properties. You can select one or more properties of the amino acids, and different colors will show up on the ClustalW results.

e. Scroll down to Sequence annotation (Features) and you will see other characteristics of the protein that you can compare (e.g. location of Active Sites, a Helix turn, a Metal binding site, etc.).
Step 8. Hit Start Jalview. Using the various menus at the top, you can see some of the similarities between the different protein molecules.
Step 8. Hit UniProtKB (#) WHERE???

Step 9. Hit Align at the bottom of the page.

P02070 (HBB_BOVIN)  MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPKVK  60
P02075              MLTAEEKAAVTGFWGKVKVDEVGAEALGRLLVVYPWTQRFFEHFGDLSNADAVMNNPKVK  60

10. Check out the ClustalW results and start Jalview.

11. BLAST search: http://www.uniprot.org/blast/
Add ONLY the protein sequence to the box, for example:

MLTAEEKAAVTGFWGKVKVDEVGAEALGRLLVVYPWTQRFFEHFGDLSNADAVMNNPKVKAHGKKVLDSFSNGMKHLDDLKGTFAQLSELHCDKLHVDPENFRLLGNVLVVVLARHHGNEFTPVLQADFQKVVAGVANALAHKYH

MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPKVKAHGKKVLDSFSNGMKHLDDLKGTFAALSELHCDKLHVDPENFKLLGNVLVVVLARNFGKEFTPVLQADFQKVVAGVANALAHRYH

Hit Retrieve = identifiers in box.
Hit Blast = empty box. Add only the protein sequence letters.
Hit Blast at right: http://www.uniprot.org/blast/uniprot/QBT2
Step 7. The next screen contains lots of information. The protein sequence is near the bottom of the information sheet in the “Sequence information” section (see Figure E for example). Using the right-hand scroll bar, find the amino-acid sequence. The amino acids are indicated with their single-letter symbols (see Figure F for their full names; found on page 6) and every 10th amino acid is marked with its position.
Step 8. Here is a sample of the species information for the goldfish beta-hemoglobin sequence (Figure E).

Figure E. P02140-1 [UniParc]. Length: 147. Mass (Da): 16,210. Last modified July 21, 1986. Version 1. Checksum: 32F6EA73A1D52497

        10         20         30         40         50         60
VEWTDAERSA IIGLWGKLNP DELGPQALAR CLIVYPWTQR YFATFGNLSS PAAIMGNPKV
        70         80         90        100        110        120
AAHGRTVMGG LERAIKNMDN IKATYAPLSV MHSEKLHVDP DNFRLLADCI TVCAAMKFGP
       130        140
SGFNADVQEA WQKFLSVVVS ALCRQYH
Step 9. Above the sequence, click on “FASTA”. This will provide you with the condensed sequence for that species, along with the species identification:

>sp|P02140|HBB_CARAU Hemoglobin subunit beta OS=Carassius auratus GN=hbb PE=1 SV=1
VEWTDAERSAIIGLWGKLNPDELGPQALARCLIVYPWTQRYFATFGNLSSPAAIMGNPKV
AAHGRTVMGGLERAIKNMDNIKATYAPLSVMHSEKLHVDPDNFRLLADCITVCAAMKFGP
SGFNADVQEAWQKFLSVVVSALCRQYH

Step 10. Use your mouse to select and copy the information. Start a new Word document, and paste the information into that Word document. Name the document and SAVE!

Step 11. Repeat the above steps until your Word document contains the FASTA-formatted sequences for six different species: two bird species, two bat species, and two non-bat mammals. Write the names of the species you have chosen into Table 2.

Step 12. Save your Word document but do not close it.

Step 13. To align the sequences and determine how similar they are, go to an internet alignment program, e.g. “LALIGN” at http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=lalign (Figure G). There are other alignment programs, e.g. http://www.ch.embnet.org/software/LALIGN_form.html (change the default to global). Go ahead and play with them.
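For anyone who would rather handle the FASTA text with a short script than by copy and paste, the following is a minimal Python sketch, not part of the lab procedure. It assumes the record has the same shape as the goldfish example in Step 9: one header line starting with ">" followed by sequence lines. The function name `parse_fasta` is a hypothetical helper written for this illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without spaces
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The goldfish record shown in Step 9 of this handout.
GOLDFISH_FASTA = """>sp|P02140|HBB_CARAU Hemoglobin subunit beta OS=Carassius auratus GN=hbb PE=1 SV=1
VEWTDAERSAIIGLWGKLNPDELGPQALARCLIVYPWTQRYFATFGNLSSPAAIMGNPKV
AAHGRTVMGGLERAIKNMDNIKATYAPLSVMHSEKLHVDPDNFRLLADCITVCAAMKFGP
SGFNADVQEAWQKFLSVVVSALCRQYH
"""

header, sequence = parse_fasta(GOLDFISH_FASTA)[0]
# The first word of the header is "database|accession|entry name".
database, accession, entry_name = header.split()[0].split("|")
print(accession, entry_name, len(sequence))
```

The sequence string returned here is the part that Step 14 asks you to paste into the alignment tool, without the identification line.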
Step 14. Select, copy and paste one sequence from your Word document into the first sequence box and another sequence into the second sequence box as shown in Figure G. It is best to copy only the sequence and not any of the identification information. Keep track of what you entered.
Figure G. LALIGN: Sequence Alignment Tool, with two sequences to compare
Step 15. Click on “Align Sequence”. Be patient.
Step 16. The computer will return a set of information including the “percent identity in the 146 aa overlap” (Figure H). Record that piece of information in the Table 3 grid. This value is the percent of aligned positions at which the two sequences have the same amino acid. If all the amino acids were the same, the value would be 100%. Not only does LALIGN give you the percent identity, it also shows you the actual alignment of the two sequences. Identical amino acids are marked with two dots between them (:). If there is one dot (.), the change in amino acid is conservative (both amino acids have similar properties and charge), and if there are no dots, the two amino acids have different biochemical properties.

Step 17. A distance matrix is a table that shows all the pairwise comparisons between species. Here the entries are percent identities, so a higher value means two sequences are more similar. Continue to make all pairwise comparisons until Table 3 is filled. For each comparison, use the percent identity for the overlap of all 146 amino acids.
Figure H. LALIGN: Sample Alignment Analysis Results
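To make the percent identity value concrete, here is a small Python sketch of the arithmetic. It is an illustration under stated assumptions, not LALIGN's own code: it assumes the two sequences are already aligned and of equal length (gaps, if any, written as "-"), and it divides the number of identical positions by the number of aligned columns. The function names are hypothetical. The two inputs are the first 60 residues of P02070 and P02075 from the sample alignment earlier in this handout.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns where both sequences have the same residue.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def differing_positions(aligned_a, aligned_b):
    """1-based positions at which the two aligned sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(aligned_a, aligned_b)) if a != b]


# First 60 residues of the two entries in the handout's sample alignment.
P02070_FIRST_60 = "MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPKVK"
P02075_FIRST_60 = "MLTAEEKAAVTGFWGKVKVDEVGAEALGRLLVVYPWTQRFFEHFGDLSNADAVMNNPKVK"

identity = percent_identity(P02070_FIRST_60, P02075_FIRST_60)
differing = differing_positions(P02070_FIRST_60, P02075_FIRST_60)
print(f"{identity:.1f}% identical; differences at positions {differing}")
```

Because this fragment covers only 60 residues, its value will generally differ from the figure LALIGN reports for the full overlap; record the LALIGN value in Table 3.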
Step 18. Use Table 3 to answer the six questions which follow.
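Table 3: The distance matrix for Part I

            Bat 1     Bat 2     Bird 1    Bird 2    Mammal 1  Mammal 2
Bat 1       100%      ____      ____      ____      ____      ____
Bat 2       ____      100%      ____      ____      ____      ____
Bird 1      ____      ____      100%      ____      ____      ____
Bird 2      ____      ____      ____      100%      ____      ____
Mammal 1    ____      ____      ____      ____      100%      ____
Mammal 2    ____      ____      ____      ____      ____      100%

Shaded boxes are simply repeat data.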
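As a bookkeeping aid for filling Table 3, the sketch below lists every pairwise comparison that six species require and lays the recorded values out as a symmetric matrix. It is only an illustration: the function names and the `recorded` dictionary are hypothetical, and no percent identity values are supplied here, since those must come from your own LALIGN runs.

```python
from itertools import combinations

SPECIES = ["Bat 1", "Bat 2", "Bird 1", "Bird 2", "Mammal 1", "Mammal 2"]


def comparisons_needed(species):
    """Every unordered pair of species, i.e. one alignment run each."""
    return list(combinations(species, 2))


def build_matrix(species, recorded):
    """Build a symmetric matrix from recorded pairwise values.

    `recorded` maps frozenset({a, b}) to a percent identity. Missing
    pairs are left as None; each species against itself is 100%.
    """
    matrix = {}
    for row in species:
        matrix[row] = {}
        for col in species:
            if row == col:
                matrix[row][col] = 100.0
            else:
                matrix[row][col] = recorded.get(frozenset((row, col)))
    return matrix


def format_matrix(species, matrix):
    """Render the matrix as plain text, with blanks for unfilled cells."""
    width = max(len(name) for name in species) + 2
    lines = ["".ljust(width) + "".join(name.ljust(width) for name in species)]
    for row in species:
        cells = []
        for col in species:
            value = matrix[row][col]
            text = "____" if value is None else f"{value:.1f}%"
            cells.append(text.ljust(width))
        lines.append(row.ljust(width) + "".join(cells))
    return "\n".join(lines)


pairs = comparisons_needed(SPECIES)
matrix = build_matrix(SPECIES, recorded={})  # fill in as you run alignments
print(len(pairs), "pairwise comparisons to run")
print(format_matrix(SPECIES, matrix))
```

Storing each result under an unordered pair is what makes the two halves of the table mirror each other, which is why the repeated boxes in Table 3 carry no new information.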
Figure A. Amino Acid Symbols
Procedure Part II “Reptiles with Feathers?”

You may complete this part as a team.

Some phylogenetic systematists (scientists who work to make the classification of organisms match their evolutionary history) complain that the vertebrate class Reptilia is improper because it should include birds. In technical terms, the vertebrate class Reptilia is paraphyletic because it contains some but not all of the species that arose from the most recent common ancestor of this group. Just how similar are reptiles and birds in terms of the beta-hemoglobin chain? Should birds be considered a type of reptile? Evaluate this question using a BLAST (Basic Local Alignment Search Tool) search. A BLAST search takes a particular sequence and then locates the most similar sequences in the entire database. It returns a list of sequences, with the first being the most similar to the one entered and the last being the least similar.

Step 1. Repeat the steps in Procedure Part I.

Step 2. Hit BLAST on the upper menu.

Step 3. Add the FASTA-formatted amino-acid sequence of a specific organism to the response box.

Step 4. Click BLAST on the right side of the box. If you scroll down, you will see information about the % identity of the particular protein in other organisms.

Table 6: Results of a BLAST search on the crocodile beta-hemoglobin sequence

Similarity                                    Species name & name of protein
First most similar (do not use crocodile)     ______________________________
Second most similar                           ______________________________
Third most similar                            ______________________________
Fourth most similar                           ______________________________
Fifth most similar                            ______________________________
Sixth most similar                            ______________________________
Seventh most similar                          ______________________________
Eighth most similar                           ______________________________
Ninth most similar                            ______________________________
Tenth most similar                            ______________________________

1. Were any of those species birds?

2. One unusual reptile is the tuatara, whose scientific name is Sphenodon punctatus. How similar is the tuatara to the crocodile?

3. Does the tuatara appear in your list of ten? If not, how far down the BLAST search list does it occur (fifteenth, twentieth)?

4. Most importantly, which species are more similar to the crocodile: birds or other reptiles?

5. Do the molecular data suggest that Reptilia is paraphyletic or monophyletic? Explain.
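The idea of a ranked hit list can be illustrated with a toy Python sketch. This is not BLAST: it swaps in the standard library's `difflib.SequenceMatcher` ratio as a crude similarity score, which is an assumption made only to keep the example self-contained, and its numbers are not comparable to BLAST's % identity. The "database" is just the two other sequences that appear in this handout (P02070 and the goldfish P02140), with the P02075 sequence as the query; the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Query: the P02075 sequence given in the BLAST notes of this handout.
QUERY = (
    "MLTAEEKAAVTGFWGKVKVDEVGAEALGRLLVVYPWTQRFFEHFGDLSNADAVMNNPKVK"
    "AHGKKVLDSFSNGMKHLDDLKGTFAQLSELHCDKLHVDPENFRLLGNVLVVVLARHHGNE"
    "FTPVLQADFQKVVAGVANALAHKYH"
)

# Toy "database": the other two sequences printed in this handout.
DATABASE = {
    "P02070 (HBB_BOVIN)": (
        "MLTAEEKAAVTAFWGKVKVDEVGGEALGRLLVVYPWTQRFFESFGDLSTADAVMNNPKVK"
        "AHGKKVLDSFSNGMKHLDDLKGTFAALSELHCDKLHVDPENFKLLGNVLVVVLARNFGKE"
        "FTPVLQADFQKVVAGVANALAHRYH"
    ),
    "P02140 (HBB_CARAU, goldfish)": (
        "VEWTDAERSAIIGLWGKLNPDELGPQALARCLIVYPWTQRYFATFGNLSSPAAIMGNPKV"
        "AAHGRTVMGGLERAIKNMDNIKATYAPLSVMHSEKLHVDPDNFRLLADCITVCAAMKFGP"
        "SGFNADVQEAWQKFLSVVVSALCRQYH"
    ),
}


def crude_similarity(seq_a, seq_b):
    """Stand-in similarity score (0 to 100); NOT a BLAST score."""
    return 100.0 * SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def rank_hits(query, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, crude_similarity(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


ranked = rank_hits(QUERY, DATABASE)
for position, (name, score) in enumerate(ranked, start=1):
    print(position, name, f"{score:.1f}")
```

The ordering, not the exact score, is the point: the sequence at the top of the list is the one most like the query, which is how the rows of Table 6 should be read.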
Part III

To be completed individually.

Purpose: To determine the relative phylogenetic proximity of four canids: grey wolf, domestic dog, red fox, and jackal.

Hypothesis: The beta-hemoglobin protein sequences of the four canid species suggest a phylogenetic relationship among them. Using the tools from Parts I and II, suggest what you think are the evolutionary relationships among the four canids. Below is a suggestion as to how you can develop a hypothesis.

[Diagram: four positions labeled A, B, C, and D]
Materials:

Procedure: See Parts I and II above.

Experimental Design Questions
1. Control/
2. DV/IV?
3. Extraneous Factors?
4. Repeat Data?
5. What will be measured?

Data: Develop data charts.

Analysis: What do your data show about the relationships among the four animals in question? Explain.

Conclusion: Do your data support or not support your hypothesis? Cite specific reference to data.

Error Analysis: