DECEMBER 2017 VOL 3 ISSUE 12
“Science is what you know, philosophy is what you don't know.” - Bertrand Russell
ab-initio prediction of protein structure: An introduction
Protein structure prediction
Contents
December 2017
Topics
Editorial
Structural Bioinformatics | ab-initio prediction of protein structure: An introduction
FOUNDER
TARIQ ABDULLAH
EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, NABAJIT DAS
REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include your contact details in your message.
BACK ISSUES
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for delivery within India and $11 for international delivery, subject to availability. Pre-payment is required.
CONTACT
PHONE +91. 991 1942-428 / 852 7572-667
MAIL
Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025
STAFF ADDRESS
To contact any Bioinformatics Review staff member, simply format the address as firstname@bioinformaticsreview.com
PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) Trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India.
EDITORIAL
Bioinformatics Review (BiR): Bridging Between The Two Worlds
Informatics and biology are two sciences that could hardly be more different from each other: biology runs on the core concept of variation, informatics on strict reasoning. Yet the two have combined in the most natural way under the realm of “Bioinformatics”. For a biologist today, it is difficult to imagine a world without biological databases, or without a discipline to decipher the huge enigma they bring. The Bioinformatics Review (BiR) journal is a platform for discovering the latest happenings in this melting pot of two very different fields.
Dr. Roopam Sharma
Honorary Editor
The era of “omics” kick-started with the completion of the Human Genome Project (HGP) in 2003. Since then, technological advances, especially next-generation sequencing (NGS), have been generating mind-boggling amounts of data for the knowledge banks. Recent innovations such as single-cell transcriptomics and the metagenomics of highly unusual habitats show how technological progress translates directly into breakthroughs in the biological sciences. Among the many areas of biology that have benefited from these advances is pathology. In fact, deciphering the molecular and genetic basis of human disease was the guiding force behind the Human Genome Project. Bioinformatics has led to an impressive increase in the recognition of possible pathogenic factors in varied systems, so much so that new techniques are being devised to test these factors faster in the wet lab. Computationally, the smaller but ever-changing genomes and transcriptomes of pathogens make them well-suited candidates for testing many hypotheses in bioinformatics studies. Effector bioinformatics involves building custom pipelines for distinct species based on the characteristics of their effectors and the size of the genome involved. These pipelines can be based on homology, on feature extraction, or on both; for example, the discovery of the RXLR motif in oomycete effectors allowed many more effectors to be identified.
This collaboration of the two sciences in plant pathology has led to the development of many general-use platforms, such as the Broad Fungal Genome Initiative, EuPathDB and PhytoPath, but there is still a strong need for specialized resources, like PHI-base, for specific areas such as effector biology. The use of machine-learning techniques such as artificial neural networks (which are themselves inspired by biological neural networks) shows how the two branches are so distinct yet so intertwined. All in all, it is a brave new world, where artificial communication is not only stimulating in itself but also helping us understand the communication between host and pathogen that goes on within the realm of life. In this issue, BiR focuses on reviews of some of the most basic techniques used in computational biology and their applications in various biological studies. We look forward to continued support from our readers and contributors. For suggestions and feedback, do write to us at info@bioinformaticsreview.com
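The “feature extraction” route mentioned in the editorial can be sketched as a simple motif scan. The minimal Python example below looks for the RXLR motif (arginine, any residue, leucine, arginine) and keeps only hits that start inside an N-terminal window. The window of residues 30 to 60, the function names `find_rxlr` and `candidate_effectors`, and the toy sequences are assumptions made for illustration only; published effector-prediction pipelines combine such a scan with other criteria, such as the presence of a signal peptide, so treat this as a sketch rather than a working predictor.

```python
import re

# RXLR: arginine, any residue, leucine, arginine. The lookahead makes every
# (possibly overlapping) occurrence show up as its own match.
RXLR = re.compile(r"(?=(R.LR))")

# Hypothetical search window (1-based, inclusive) for the motif's first residue.
WINDOW = (30, 60)


def find_rxlr(sequence, window=WINDOW):
    """Return 1-based start positions of RXLR motifs that begin inside `window`."""
    start, end = window
    hits = []
    for match in RXLR.finditer(sequence.upper()):
        position = match.start() + 1  # 0-based string index -> 1-based residue number
        if start <= position <= end:
            hits.append(position)
    return hits


def candidate_effectors(proteins, window=WINDOW):
    """Keep only proteins (name -> sequence) with at least one RXLR hit in the window."""
    candidates = {}
    for name, seq in proteins.items():
        hits = find_rxlr(seq, window)
        if hits:
            candidates[name] = hits
    return candidates


if __name__ == "__main__":
    # Toy sequences (assumptions): one motif inside the window, one too close to the N-terminus.
    toy = {
        "toy_with_motif": "M" + "A" * 39 + "RSLR" + "A" * 20,
        "toy_motif_too_early": "MRALR" + "A" * 60,
    }
    print(candidate_effectors(toy))
```

Restricting hits to a window is the design choice that turns a plain pattern match into a (very crude) positional feature; changing `window` is how such a sketch would be tuned to a specific organism.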
STRUCTURAL BIOINFORMATICS
ab-initio prediction of protein structure: An introduction
We hear the term ab-initio a lot in bioinformatics, and it can be difficult for newcomers to the field to understand. Today, we will discuss in detail what ab-initio means and which methods use it. First, let's get familiar with the literal meaning of the term: ab initio is Latin for 'from the beginning', i.e. from scratch. In bioinformatics, the term is used in the context of protein structure prediction, where it is quite useful: ab-initio modeling is one of the methods for predicting the structure of a protein whose structure is not available in the Protein Data Bank (PDB) [1].
There are basically three methods to predict a protein's structure: a) homology modeling, b) ab-initio modeling, and c) threading. Homology modeling is applied when there is sufficient similarity between the query protein (whose structure is to be predicted) and a template (whose structure has already been determined); it aims to find a template that is evolutionarily related to the query sequence. When the similarity between the two is low, the ab-initio method is applied instead. Threading is somewhat similar to homology modeling in that it predicts the structure by recognizing the fold of a template, and it aims to detect both evolutionarily related proteins and analogous folds; both are therefore called template-based methods. Homology modeling and threading can predict protein structures with high-resolution folds from the templates they find, but they share a limitation: a structure with the native topology of the query sequence must already have been solved, so new folds cannot be predicted with these two approaches.
The ab-initio method is therefore preferred when the query protein sequence has very little or no similarity to proteins of known structure. It is the most difficult [2,3] but also the most general approach: folding of the query protein starts from a random conformation. The ab-initio method is based on the thermodynamic hypothesis proposed by Anfinsen [4], according to which the native structure corresponds to the global free energy minimum under a given set of conditions. Several ab-initio structure prediction approaches are available, such as ROSETTA [5], TOUCHSTONE-II [6], and the widely used I-TASSER [7,8]. These approaches are based on Monte Carlo algorithms [9,10], and I-TASSER has been found to outperform ROSETTA and TOUCHSTONE-II at a far lower CPU cost [11]. Ab-initio modeling is also termed de novo modeling [12], physics-based modeling [13], or free modeling [14].
The basic ab-initio protocol starts from the primary amino-acid sequence, searches through different conformations, and arrives at a prediction of the native fold. Once the fold has been predicted, model assessment is performed to verify the quality of the predicted structure.
ROSETTA and I-TASSER follow enhanced protocols for ab-initio prediction. ROSETTA begins by identifying short fragments (3-mers and 9-mers) in structure databases that are consistent with the local sequence preferences of the query. These fragments are then assembled into models with protein-like global properties, and the models are assessed with a scoring function over the resulting decoy population [5]. The I-TASSER protocol combines threading with the ab-initio method [6,7]. I-TASSER is based on secondary-structure-enhanced Profile-Profile threading Alignment (PPA) [15] and an iterative implementation of the Threading ASSEmbly Refinement (TASSER) program [16]. Details of the I-TASSER program are given in [7].
We will discuss other protein structure prediction methods in detail in upcoming articles.
References
1. Protein Data Bank (PDB). www.rcsb.org
2. Lu, L., Lu, H., & Skolnick, J. (2002). MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins: Structure, Function, and Bioinformatics, 49(3), 350-364.
3. Floudas, C. A., Fung, H. K., McAllister, S. R., Mönnigmann, M., & Rajgaria, R. (2006). Advances in protein structure prediction and de novo protein design: A review. Chemical Engineering Science, 61(3), 966-988.
4. Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science, 181(4096), 223-230.
5. Simons, K. T., Bonneau, R., Ruczinski, I., & Baker, D. (1999). Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins: Structure, Function, and Bioinformatics, 37(S3), 171-176.
6. Zhang, Y., Kolinski, A., & Skolnick, J. (2003). TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophysical Journal, 85(2), 1145-1164.
7. Wu, S., Skolnick, J., & Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology, 5(1), 17.
8. Zhang, Y. (2008). Progress and challenges in protein structure prediction. Current Opinion in Structural Biology, 18(3), 342-348.
9. Simons, K. T., Kooperberg, C., Huang, E., & Baker, D. (1997). Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology, 268(1), 209-225.
10. Hansmann, U. H., & Okamoto, Y. (1999). New Monte Carlo algorithms for protein folding. Current Opinion in Structural Biology, 9(2), 177-183.
11. Wu, S., Skolnick, J., & Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology, 5(1), 17.
12. Bradley, P., Misura, K. M., & Baker, D. (2005). Toward high-resolution de novo structure prediction for small proteins. Science, 309(5742), 1868-1871.
13. Ołdziej, S., Czaplewski, C., Liwo, A., Chinchio, M., Nanias, M., Vila, J. A., ... & Schafroth, H. D. (2005). Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests. Proceedings of the National Academy of Sciences of the United States of America, 102(21), 7547-7552.
14. Jauch, R., Yeo, H. C., Kolatkar, P. R., & Clarke, N. D. (2007). Assessment of CASP7 structure predictions for template free targets. Proteins: Structure, Function, and Bioinformatics, 69(S8), 57-67.
15. Wu, S., & Zhang, Y. (2007). LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Research, 35(10), 3375-3382.
16. Zhang, Y., & Skolnick, J. (2004). Automated structure prediction of weakly homologous proteins on a genomic scale. Proceedings of the National Academy of Sciences of the United States of America, 101(20), 7594-7599.
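As a supplement to the article above, the Monte Carlo search that ROSETTA, TOUCHSTONE-II and I-TASSER build on [9,10] can be illustrated on a much simpler problem. The Python sketch below is not how any of those programs works: it swaps in the classic two-dimensional HP lattice model, in which each residue is either hydrophobic (H) or polar (P), a conformation is a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that touch without being chain neighbours. The function names, move set, cooling schedule, and example sequence are all assumptions. What it does show is the idea behind Anfinsen's hypothesis as used in ab-initio prediction: starting from an extended chain, random conformational changes are accepted or rejected with the Metropolis criterion, so the search drifts towards the lowest-energy conformation it can find.

```python
import math
import random

# Unit steps on a 2D square lattice (toy HP model, not an all-atom model).
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coordinates(moves):
    """Convert absolute lattice moves into residue coordinates (residue 0 at the origin)."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = STEPS[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def is_self_avoiding(coords):
    """A valid conformation never places two residues on the same lattice site."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence, coords):
    """Toy energy: -1 for each pair of H residues that are lattice neighbours
    but not consecutive along the chain (each pair counted once)."""
    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = site_to_index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def fold(sequence, n_steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with simulated annealing from an extended chain.
    Returns the lowest-energy conformation visited and its energy."""
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    rng = random.Random(seed)
    moves = ["R"] * (len(sequence) - 1)  # fully extended starting chain
    energy = hp_energy(sequence, coordinates(moves))
    best_moves, best_energy = list(moves), energy
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Trial move: change one bond direction, which rigidly shifts the rest of the chain.
        trial = list(moves)
        k = rng.randrange(len(trial))
        trial[k] = rng.choice([d for d in STEPS if d != trial[k]])
        trial_coords = coordinates(trial)
        if not is_self_avoiding(trial_coords):
            continue  # clashing conformation: reject outright
        trial_energy = hp_energy(sequence, trial_coords)
        delta = trial_energy - energy
        # Metropolis criterion: accept downhill moves; accept uphill moves
        # with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            moves, energy = trial, trial_energy
            if energy < best_energy:
                best_moves, best_energy = list(moves), energy
    return best_moves, best_energy


if __name__ == "__main__":
    seq = "HPHPPHHPHPPHPHHPPHPH"  # example sequence (assumption)
    best, e = fold(seq)
    print("".join(best), e)
```

Because the search is stochastic, different seeds can return different conformations. Real ab-initio programs replace the lattice with detailed backbone geometry, the HP score with knowledge-based or physics-based energy functions, and single-bond changes with moves such as ROSETTA's fragment insertions.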
Subscribe to the Bioinformatics Review newsletter to get the latest posts in your mailbox and never miss any of your favorite topics. Log on to https://www.bioinformaticsreview.com