NOVEMBER 2017 VOL 3 ISSUE 11
“The whole of science is nothing more than a refinement of everyday thinking.” - Albert Einstein
How to perform protein structure modeling using the I-TASSER stand-alone tool?
Protein structure modeling tutorial
Public Service Ad sponsored by IQLBioinformatics
Contents
November 2017
Topics
Editorial
Tutorial: How to perform protein structure modeling using the I-TASSER stand-alone tool?
FOUNDER: TARIQ ABDULLAH
EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, NABAJIT DAS
REPRINTS AND PERMISSIONS: You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include your contact details in your message.
BACK ISSUES: Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for delivery in India and $11 for international delivery, subject to availability. Pre-payment is required.
CONTACT
PHONE: +91 991 1942-428 / 852 7572-667
MAIL: Editorial, 101 FF Main Road, Zakir Nagar, Okhla, New Delhi, IN 110025
STAFF ADDRESS: To contact any member of the Bioinformatics Review staff, simply format the address as firstname@bioinformaticsreview.com.
PUBLICATION INFORMATION: Volume 1, Number 1. Bioinformatics Review™ is published monthly for one year (12 issues) by the Social and Educational Welfare Association (SEWA) trust (Registered under Trust Act 1882). Copyright 2015 SEWA Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and is used under license by SEWA trust. Published in India.
EDITORIAL
Bioinformatics Review (BiR): Bridging Between The Two Worlds
Informatics and biology are two sciences that are as different from each other as possible: one runs on the core concept of variation and the other on strict reasoning. Still, the two have combined in the most natural way under the realm of “Bioinformatics”. For a biologist today, it is difficult to imagine a world without biological databases, and without a discipline to decipher the huge enigma they bring. The Bioinformatics Review (BiR) journal is a platform to discover the latest happenings in this melting pot of two varied fields.
Dr. Roopam Sharma
Honorary Editor
The era of “omics” kick-started with the completion of the Human Genome Project (HGP) in 2003. Since then, technological advances, especially next-generation sequencing (NGS), have been generating mind-boggling amounts of data for the knowledge banks. Recent developments such as single-cell transcriptomics and metagenomics of the most unusual habitats show how technological advances translate directly into breakthroughs in the biological sciences. Among the areas of biology that have benefited from these advances is pathology. In fact, deciphering the molecular and genetic basis of human diseases was the guiding force behind the Human Genome Project. Bioinformatics has led to an impressive increase in the identification of possible pathogenic factors in varied systems, so much so that new techniques are being devised to test these factors in the wet lab more quickly. Computationally, the smaller but ever-changing genomes and transcriptomes of pathogens make them well-suited candidates for testing many hypotheses in bioinformatics studies. Effector bioinformatics involves building custom pipelines for distinct species based on the characteristics of the effectors and the size of the genome involved. These pipelines can be based on homology, feature extraction, or both; e.g., the discovery of RXLR motifs in oomycete effectors allowed many more effectors to be identified.

This collaboration between the two sciences in plant pathology has led to the development of many general-use platforms such as the Broad Fungal Genome Initiative, EuPathDB and PhytoPath, but there is still a great need for specialized resources, like PHI-base, for specific areas such as effector biology. The use of machine-learning techniques such as artificial neural networks (which are themselves inspired by biological neural networks) shows how the two branches are so distinct yet so intertwined. All in all, it is a brave new world where artificial communication is not only stimulating but also helping us understand the communication between host and pathogen within the realm of life. In this issue, BiR focuses on reviews related to some of the very basic techniques used in computational biology and their applications in various biological studies. We look forward to the continued support of our readers and contributors. For suggestions and feedback, do write to us at info@bioinformaticsreview.com.

Letters and responses: info@bioinformaticsreview.com
TUTORIAL
How to perform protein structure modeling using the I-TASSER standalone tool?
I-TASSER (Iterative Threading ASSEmbly Refinement) is a well-known tool for ab initio structure modeling of proteins [1]. It uses secondary-structure-enhanced profile-profile threading alignment (PPA) [2] and iterative structure-assembly simulations based on the threading assembly refinement (TASSER) program [3]. I-TASSER is used for ab initio prediction when the sequence similarity of the query protein to proteins of known structure is low (≤30%). Most users access it through the I-TASSER server [4], which requires registration with a valid institutional e-mail address. In this article, we will learn how to predict a protein structure using the I-TASSER standalone version.
This article was written at the request of our readers, so we will not go into the details of the algorithm used by I-TASSER (if you wish to know more about it, drop me an email). The following sections explain how to download and install the package, prepare the input, and submit a query protein on a Linux platform. So, let's get started!

Getting started
It is good practice to update and upgrade your Ubuntu system first. Open the terminal by pressing Ctrl+Alt+T together and type the following commands:
$ sudo apt-get update
$ sudo apt-get upgrade
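The later steps rely on tar with bzip2 support (to unpack the .tar.bz2 archive) and on Perl (to run the .pl scripts). As a minimal sketch, assuming a POSIX shell such as Ubuntu's sh or bash, the hypothetical helper check_tools below reports whichever of these commands is missing. It is our own illustration, not part of the I-TASSER package.

```shell
#!/bin/sh
# check_tools CMD...
# Hypothetical helper (not part of I-TASSER): prints every command that is
# not found on PATH and returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# tar and bzip2 unpack the .tar.bz2 archive; perl runs the .pl scripts.
if check_tools tar bzip2 perl; then
  echo "All required tools were found."
else
  echo "Install the missing tools (e.g. with sudo apt-get install) and retry." >&2
fi
```

command -v is used instead of which because it is specified by POSIX and is built into the shell.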
Downloading the suite package
To download the suite package, you must register on the I-TASSER website and request a password for non-commercial use of the software. After receiving the password, you can log in and download the latest version of the package.

Installation
Open the terminal, enter the directory where you downloaded the package (say, Downloads), and unpack it with the following commands:

$ cd Downloads
$ tar -xvjf I-TASSER5.1.tar.bz2

This creates a new folder named I-TASSER5.1 in the Downloads directory. Inside this folder you will find a Perl script named 'download_lib.pl'. Run this script from the same directory to download the required libraries; it will take a while to finish.

$ cd I-TASSER5.1
$ ./download_lib.pl -P true -B true -N true

After all the required libraries have been downloaded, a new folder named 'libdir' is created inside the I-TASSER5.1 directory. Now you need to prepare your input file as explained in the following section.

Preparing the input
1. Create a directory, say example, in the I-TASSER5.1 folder. It will hold the query protein sequence and the output files.
2. Save the query protein sequence in this directory as 'seq.fasta'. The sequence must be in FASTA format and must not contain more than 1500 residues.
3. Save the same sequence file in the I-TASSER5.1 folder as well.

Submitting the job
Now you can submit your query sequence for structure prediction using the runI-TASSER.pl script in the I-TASSERmod folder. Enter this folder and run the script:

$ cd /home/username/Downloads/I-TASSER5.1/I-TASSERmod
$ sudo ./runI-TASSER.pl -libdir /home/username/Downloads/I-TASSER5.1/libdir -seqname protein -datadir /home/username/Downloads/I-TASSER5.1/example

Additional arguments can be appended to this command, for example:

-GO true -EC true -LBS true

These and the other available arguments are described in a file present in the I-TASSER5.1 folder.

After you press Enter, your job is submitted. I-TASSER runs many simulations on the protein, so a single job can take days to finish; in my case, it took 7 days. Once the job has finished, you will find a PDB file for the query protein, which you can analyze with a molecular viewer such as PyMOL [5].
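Because a single run can take days, it may be worth checking seq.fasta against the two requirements in step 2 (FASTA format, at most 1500 residues) before submitting. The POSIX shell function below is a minimal sketch of such a check under our own assumptions: check_query_fasta is a hypothetical helper, not part of I-TASSER, and it only tests for a single FASTA record and the residue count, not for valid amino-acid letters.

```shell
#!/bin/sh
# check_query_fasta FILE [MAX_LEN]
# Hypothetical pre-submission check (not part of I-TASSER): verifies that FILE
# holds exactly one FASTA record whose sequence has 1..MAX_LEN residues.
# MAX_LEN defaults to 1500, the limit stated in step 2.
# Plain global variables are used to keep the function POSIX-portable.
check_query_fasta() {
  fasta=$1
  max_len=${2:-1500}

  if [ ! -s "$fasta" ]; then
    echo "error: $fasta is missing or empty" >&2
    return 1
  fi

  # The first line must be a FASTA header, i.e. start with '>'.
  case $(head -n 1 "$fasta") in
    '>'*) ;;
    *) echo "error: first line of $fasta is not a FASTA header" >&2
       return 1 ;;
  esac

  # A single-sequence file contains exactly one header line.
  n_headers=$(grep -c '^>' "$fasta")
  if [ "$n_headers" -ne 1 ]; then
    echo "error: expected 1 FASTA header, found $n_headers" >&2
    return 1
  fi

  # Residue count: drop the header, delete all whitespace (including the
  # carriage returns of Windows line endings), count the remaining characters.
  n_res=$(grep -v '^>' "$fasta" | tr -d '[:space:]' | wc -c | tr -d ' ')
  if [ "$n_res" -eq 0 ] || [ "$n_res" -gt "$max_len" ]; then
    echo "error: sequence length $n_res is outside 1..$max_len" >&2
    return 1
  fi

  echo "ok: $fasta has $n_res residues"
}
```

For the directory layout used in this tutorial, you would define the function in your shell (for example by sourcing the file) and run check_query_fasta /home/username/Downloads/I-TASSER5.1/example/seq.fasta; it prints an "ok" line, or an error message together with a non-zero exit status.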
In the runI-TASSER.pl command, -seqname is the name of your query protein (here, protein); -libdir is the full path to the folder of libraries downloaded earlier (libdir); and -datadir is the full path to the folder where you saved your query sequence (seq.fasta). Other options let you predict, for example, gene ontology terms, EC numbers, and ligand-binding sites.

For any query, you can comment below, or write to me at tariq@bioinformaticsreview.com.

References
1. Roy, A., Kucukural, A., & Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, 5(4), 725-738.
2. Wu, S., & Zhang, Y. (2007). LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Research, 35(10), 3375-3382.
3. Zhang, Y., & Skolnick, J. (2004). Automated structure prediction of weakly homologous proteins on a genomic scale. Proceedings of the National Academy of Sciences of the United States of America, 101(20), 7594-7599.
4. Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9(1), 40.
5. DeLano, W. L. (2002). The PyMOL molecular graphics system. http://pymol.org
Subscribe to the Bioinformatics Review newsletter to get the latest posts in your mailbox and never miss out on any of your favorite topics. Log on to https://www.bioinformaticsreview.com