FEBRUARY 2018 VOL 4 ISSUE 2
“Mars is the only place in the solar system where it's possible for life to become multi-planetarian.” - Elon Musk
Benchmark databases for multiple sequence alignment: An overview
Multiple sequence alignment
Contents
February 2018
Topics
Editorial
Sequence Analysis: Benchmark databases for multiple sequence alignment: An overview
FOUNDER: TARIQ ABDULLAH
EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, NABAJIT DAS
REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE: +91. 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) Trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India.
Bioinformatics- A broad future ahead: Editorial
EDITORIAL
It has been a wonderful time since BiR came into existence. As we enter a new year, BiR looks forward to further development and achievements, and to providing the best knowledge regarding bioinformatics. In the past two years, BiR has come a long way, from a few readers to several thousand.
Muniba Faiza
Founding Editor
Every mail of compliment and appreciation we receive feels like an achievement for us. Bioinformatics has a great future ahead of it, with a better understanding and more precise methodologies for both dry and wet lab experiments. In the last two years, BiR has advanced in many respects. We have launched an Android app that helps our readers stay connected with the latest updates, our articles have started to appear in Google Scholar, and we receive many cherished emails and collaboration proposals.

BiR is trying to broaden its horizons by covering different domains of bioinformatics. Since bioinformatics is multidisciplinary, the BiR team has so far tried to cover almost every aspect of it, including big data, sequence analysis, structural bioinformatics, data mining, tools, software, biostatistics, and so on. This year, BiR is focused on providing rich content to our readers and helping them understand the concepts of bioinformatics more easily. The BiR team is trying to reach out to students, to encourage them in their careers in bioinformatics, and to researchers currently working in this area. The last internship at BiR was a great success, and we got an amazing response from our interns. We are looking forward to presenting our work at the school and college level to introduce bioinformatics to young minds who are fascinated by technology.

We have a long road ahead, which we cannot travel without the support of our readers, subscribers, and contributors. We wholeheartedly thank our readers for their support and suggestions, and wish them a very happy and prosperous new year with new hopes and great achievements. We would like to hear your thoughts and feedback about BiR, and what other kinds of articles you would like to read.

Letters and responses: info@bioinformaticsreview.com
Please write to us at info@bioinformaticsreview.com
SEQUENCE ANALYSIS
Benchmark databases for multiple sequence alignment: An overview
Multiple sequence alignment (MSA) is a crucial step in most molecular analyses and evolutionary studies. Many MSA programs have been developed so far, based on different approaches, which attempt to provide an optimal alignment with high accuracy. The basic algorithms employed in MSA programs include progressive [1], iterative [2], and consistency-based [3] algorithms. Some programs, such as M-Coffee [4] and PCMA [5], incorporate several other methods into the process of creating an optimal alignment.
MSA programs outperform one another in different aspects and at different accuracy levels. The accuracy and efficiency of these programs are assessed using benchmark databases. These benchmark databases are either manually created or generated semi-automatically, and are built on protein structure alignments. Since multiple structure alignment is complex, pairwise structure alignment is preferred. The alignments created by MSA programs are compared to the reference alignment sets provided by these benchmark databases.

There are various benchmark databases available, among which BAliBASE (Benchmark Alignment dataBASE) [6-8] is the most widely used. BAliBASE is created by combining automated and manual methods and provides a variety of reference alignment sets covering cases such as repeats, circular permutations, sequences with highly divergent orphans, N/C-terminal extensions, and so on. HOMSTRAD (Homologous Structure Alignment Database) [9-11] is another database of protein structure alignments that is frequently used as a benchmark, although it was not created for this purpose. Several other benchmarks have been developed in the last decade, including OXBench [12], PREFAB (Protein reference alignment benchmark) [13], SABmark (Sequence alignment benchmark) [14], and IRMBASE (Implanted rose motifs base) [15].

Most of the reference alignments in these benchmark databases are global alignments, and the benchmarks measure sensitivity (i.e., the number of correctly aligned positions) rather than specificity. The IRMBASE benchmark comprises simulated conserved motifs, produced with insertions, deletions, and substitutions using a program called ROSE [16]. Because the evolution of these simulated sequences is known, the correct multiple alignment is known as well, which makes it possible to assess the ability of MSA programs to detect isolated motifs within the sequences [15].

MSA programs are evaluated using scores such as the Sum-of-Pairs (SP) score, column score, maximum likelihood, minimum entropy, consensus, and star, calculated against the reference alignments provided by the benchmark databases. The SP score is the most widely used evaluation function.

The evaluation functions of MSA programs will be discussed in detail in an upcoming article. For further reading, kindly refer to the references given below. For any other query, write to muniba@bioinformaticsreview.com.
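As a rough illustration of the SP and column scores named above, the following Python sketch compares a test alignment with a reference alignment. It assumes the common benchmark-style definitions: the SP score is taken as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the column score as the fraction of reference columns reproduced exactly. The function names and the toy sequences are hypothetical and are not taken from any benchmark database or scoring tool.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def indexed_columns(alignment):
    """Return each column as a tuple of residue indices (None for a gap).

    `alignment` is a list of equal-length gapped strings, one per sequence,
    with the sequences in the same order in every alignment being compared.
    """
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        key = []
        for seq_idx, char in enumerate(column):
            if char == GAP:
                key.append(None)
            else:
                key.append(counters[seq_idx])
                counters[seq_idx] += 1
        columns.append(tuple(key))
    return columns


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column."""
    pairs = set()
    for column in indexed_columns(alignment):
        present = [(seq_idx, res) for seq_idx, res in enumerate(column)
                   if res is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = set(indexed_columns(reference))
    return len(ref_cols & set(indexed_columns(test))) / len(ref_cols)


# Toy example (made-up sequences): the second row is shifted in `predicted`.
reference = ["ACG-T", "A-GCT", "ACGCT"]
predicted = ["ACG-T", "AG-CT", "ACGCT"]

print(f"SP score:     {sp_score(predicted, reference):.3f}")      # 0.818
print(f"Column score: {column_score(predicted, reference):.3f}")  # 0.600
```

In this toy case, 9 of the 11 reference residue pairs and 3 of the 5 reference columns are recovered. Both scores only count what the test alignment gets right relative to the reference, which matches the sensitivity-oriented view of benchmarking described in the article.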
References
1. Fitch, W. M., & Yasunobu, K. T. (1975). Phylogenies from amino acid sequences aligned with gaps: The problem of gap weighting. Journal of Molecular Evolution, 5(1), 1–24. https://doi.org/10.1007/BF01732010
2. Berger, M. P., & Munson, P. J. (1991). A novel randomized iterative strategy for aligning multiple protein sequences. Bioinformatics, 7(4), 479–484. https://doi.org/10.1093/bioinformatics/7.4.479
3. Gotoh, O. (1990). Consistency of optimal sequence alignments. Bulletin of Mathematical Biology, 52(4), 509–525. https://doi.org/10.1016/S0092-8240(05)80359-3
4. Wallace, I. M., O’Sullivan, O., Higgins, D. G., & Notredame, C. (2006). M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research, 34(6), 1692–1699. https://doi.org/10.1093/nar/gkl091
5. Pei, J., Sadreyev, R., & Grishin, N. V. (2003). PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics, 19(3), 427–428. https://doi.org/10.1093/bioinformatics/btg008
6. Thompson, J., Plewniak, F., & Poch, O. (1999). BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics, 15(1), 87–88. https://doi.org/10.1093/bioinformatics/15.1.87
7. Bahr, A., Thompson, J. D., Thierry, J.-C., & Poch, O. (2001). BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Research, 29(1), 323–326. https://doi.org/10.1093/nar/29.1.323
8. Thompson, J. D., Koehl, P., Ripp, R., & Poch, O. (2005). BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark. Proteins: Structure, Function, and Bioinformatics, 61(1), 127–136. https://doi.org/10.1002/prot.20527
9. Mizuguchi, K., Deane, C. M., Blundell, T. L., & Overington, J. P. (1998). HOMSTRAD: A database of protein structure alignments for homologous families. Protein Science, 7(11), 2469–2471. https://doi.org/10.1002/pro.5560071126
10. De Bakker, P. I. W., Bateman, A., Burke, D. F., Miguel, R. N., Mizuguchi, K., Shi, J., … Blundell, T. L. (2001). HOMSTRAD: Adding sequence information to structure-based alignments of homologous protein families. Bioinformatics, 17(8), 748–749. https://doi.org/10.1093/bioinformatics/17.8.748
11. Stebbings, L. A., & Mizuguchi, K. (2004). HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database. Nucleic Acids Research, 32(Database issue), D203–D207.
12. Raghava, G., & Searle, S. (2003). OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC.
13. Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340
14. Wallace, I. M., Blackshields, G., & Higgins, D. G. (2005). Multiple sequence alignments. Current Opinion in Structural Biology, 15(3), 261–266. https://doi.org/10.1016/J.SBI.2005.04.002
15. Subramanian, A. R., Weyer-Menkhoff, J., Kaufmann, M., & Morgenstern, B. (2005). DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics, 6(1), 66. https://doi.org/10.1186/1471-2105-6-66
16. Stoye, J., Evers, D., & Meyer, F. (1998). Rose: generating sequence families. Bioinformatics, 14(2), 157–163. https://doi.org/10.1093/bioinformatics/14.2.157