MAY 2019 VOL 5 ISSUE 5
“Science is beautiful when it makes simple explanations of phenomena or connections between different observations. Examples include the double helix in biology and the fundamental equations of physics.” - Stephen Hawking
How to perform blind docking using AutoDock Vina?
A workshop on Genome Editing Tools and Techniques
Public Service Ad sponsored by IQLBioinformatics
Contents
May 2019
Topics
Editorial
03 Tools: How to perform blind docking using AutoDock Vina?
04 Bioinformatics News: A workshop on Genome Editing Tools and Techniques
05 Bioinformatics Programming: How to read fasta sequences from a file using PHP?
FOUNDER: TARIQ ABDULLAH
EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, PRAKASH JHA, NABAJIT DAS
REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include your contact details in your message.
BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.
CONTACT
PHONE +91. 991 1942-428 / 852 7572-667
MAIL Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025
STAFF ADDRESS To contact any Bioinformatics Review staff member, simply format the address as firstname@bioinformaticsreview.com
PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) Trust (Registered under Trust Act 1882). Copyright 2015 SEWA Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India
"What is the scope of bioinformatics?" Do we really need to ask this?
EDITORIAL
"What is the scope of bioinformatics?" This is one of the questions most frequently asked by students and scholars. The real question is: do we really need to ask it? Bioinformatics is an interdisciplinary field that draws on computer science, chemistry, mathematics, physics, and many other disciplines.
Muniba Faiza
Founding Editor
Bioinformatics is a fast-emerging field that has grown immensely in the last decade. More than a thousand databases and many bioinformatics tools and software packages are available, and they are frequently used to extract new information or to develop it for further use. We can easily visualize biological data, simplify tedious tasks, develop advanced methods, study the phylogeny of organisms, solve essential problems, and so on. By applying programming skills to bioinformatics, new innovations can be made easily. There are, however, a few limitations in bioinformatics, such as a lack of data connectivity, redundant data, and the difficulty of accurately predicting protein-protein interactions. These limitations can be overcome by developing and applying new methods and techniques. For example, a few years ago bioNerDS [1], a recognition system for database and software names, was developed; it can identify mentions of these named entities in the literature. This helps in exploring the bioinformatics resources mentioned across the literature from a single platform. Exploration is another aspect that helps in answering the question, because exploration leads to learning, which in turn leads to innovation. Everything that exists now was once non-existent; curiosity and exploration are what made it real. It is not about the scope, it's about the possibilities and discovering new opportunities.
Letters and responses: info@bioinformaticsreview.com
It is better to find solutions to the existing problems and to overcome the limitations, either by utilizing the available resources or by inventing new ones. That's what science and research are all about. Please write to us at info@bioinformaticsreview.com. With best wishes!
Bioinformatics Review (BiR)
TOOLS
How to perform blind docking using AutoDock Vina?
Image Credit: Stock photos
Blind docking is done when the catalytic/binding residues of a protein are unknown, and hence the binding pocket is unknown. In a previous article, we showed how to perform site-specific docking using AutoDock Vina, where we bound a ligand in the catalytic pocket of a protein. This article covers blind docking using AutoDock Vina.
We are docking the same protein, Human Serum Albumin (HSA), with the ligand sodium octanoate (SO). Since HSA is already complexed with 3-carboxy-4-methyl-5-propylfuranpropanoic acid (CMPF) in this structure, the CMPF must be removed first, leaving only the protein.
As mentioned previously, we need the following files prepared for docking with AutoDock Vina:
1. .pdbqt files of the protein and the ligand
2. Configuration file
3. Grid file

Preparation of PDB file before docking
The structure we are using is a crystal structure complexed with a ligand. To find the binding position of our ligand, we need to empty all the binding pockets by removing the bound ligand, which can be done by deleting all HETATM records from the PDB file. If we dock our ligand without removing the already complexed ligand, we will not get correct results. The ligand can also be removed easily by visualizing the protein in PyMol.
1. Download a protein crystal structure from the PDB. We are using Human Serum Albumin complexed with 3-carboxy-4-methyl-5-propylfuranpropanoic acid (CMPF) (PDB ID: 2BXA).
2. Open the PDB file and remove the HETATM records.
3. After removing the HETATM records, keep only one of the four chains (here, Chain A) and remove the remaining three chains.
4. Now save the file as “protein.pdb”.
The chains are removed from the protein structure just to avoid complexity. But remember to read about the structure of your protein to know which chains are actually involved in its function.
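Steps 2 and 3 can also be done with a short script instead of editing the file by hand. The Python sketch below is not part of the original workflow; it assumes a standard fixed-column PDB file (record name in columns 1-6, alternate location in column 17, chain ID in column 22). `clean_pdb` is a hypothetical helper name, and keeping only the first alternate location (blank or "A") is an extra assumption the article does not mention.

```python
def clean_pdb(in_path, out_path, chain="A"):
    """Copy only the ATOM records of one chain from in_path to out_path.

    HETATM records (bound ligands such as CMPF, waters, ions) and all
    header lines are dropped. Returns the number of ATOM lines written.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.startswith("ATOM  "):
                continue  # skips HETATM, HEADER, REMARK, etc.
            if line[21:22] != chain:
                continue  # column 22: chain identifier
            if line[16:17] not in (" ", "A"):
                continue  # column 17: keep only the first alternate location
            dst.write(line)
            kept += 1
        dst.write("END\n")
    return kept
```

For the article's structure this would be called as `clean_pdb("2BXA.pdb", "protein.pdb", chain="A")`; the result can then be opened in PyMol to check that only Chain A remains.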
Now that our protein structure is prepared, we will prepare the ligand that we want to dock with the protein.

Preparation of ligand before docking
5. Open PubChem (www.pubchem.ncbi.nlm.nih.gov) and search for the compound. We are using “sodium octanoate” as the ligand. The structure can also be downloaded from the ZINC database.
6. Click on sodium octanoate, look under the “3D Structure” section, and click “Download”; you will see four different formats available for download. We will download the .SDF format.
7. Since we need both the protein and the ligand in .pdb format, we have to convert the .SDF file to .pdb. We will use PyMol for this purpose; avoid online converters, because they may corrupt your ligand file.
8. Open PyMol and open the downloaded ligand. Click “File” --> “Save Molecule” --> select the molecule --> click “OK”. You can save it to your desired folder. We will name the ligand “SO.pdb” to avoid any confusion.

Now we have PDB files of the protein and the ligand. To perform docking, we need to prepare .pdbqt files from the .pdb files of the protein and the ligand, because AutoDock Vina requires the .pdbqt file format.

Preparation of .pdbqt files
First, we will prepare a .pdbqt file of the ligand.
1. Open AutoDock Tools --> click “Ligand” --> click “Input” --> click “Open”. It will ask you to select your ligand; go to the folder where the ligand's .pdb file was saved and click “SO.pdb”.
2. Click “Ligand” --> click “Torsion Tree” --> click “Detect Root”. This detects the root of the ligand's torsion tree, from which its rotatable bonds are defined.
3. Click “Ligand” --> click “Output” --> click “Save as PDBQT”. We could rename the ligand, but we will keep the same name, call it “SO.pdbqt”, and save it in the same folder.
We have prepared the .pdbqt file of the ligand; now we will prepare the protein file.
4. In AutoDock Tools, click “File” --> click “Read Molecule” --> select protein.pdb.
5. Delete water molecules from the protein, as they can form unnecessary bonds with the ligand. Click “Edit” --> click “Delete Water”.
6. Add polar hydrogens so that no polar group or atom of the protein is left incomplete. Click “Edit” --> click “Add Hydrogens” --> click “Polar only”.
7. Save this file as .pdbqt: click “Grid” --> click “Macromolecule” --> click “Choose” --> select “protein.pdb” --> click “OK”. It will ask for a folder; save the file as “protein.pdbqt” in the same folder where the ligand's .pdbqt file was saved.
Again, since this is a tutorial on blind docking, there is no need to define binding residues.

Defining Grid Box for docking
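In the steps that follow, the box is set by hand in AutoDock Tools. Because a blind-docking box only has to enclose every receptor atom, its centre and edge lengths can also be computed from the coordinates. The Python sketch below is an alternative under stated assumptions, not the author's procedure: `read_coords`, `blind_box`, and `write_conf` are hypothetical helper names, and the 5 Å padding is an arbitrary choice. Sizes are returned in Ångströms, which is the unit Vina expects for `size_x`, `size_y`, and `size_z` (unlike `npts` in the AutoDock Tools grid file, which counts grid points). `write_conf` reuses the keys and file names of this article's conf.txt.

```python
def read_coords(path):
    """Return (x, y, z) tuples for every ATOM/HETATM line of a PDB or PDBQT file."""
    coords = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(("ATOM  ", "HETATM")):
                # Fixed columns 31-38, 39-46 and 47-54 hold x, y, z in Angstrom.
                coords.append((float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54])))
    return coords


def blind_box(coords, padding=5.0):
    """Centre and edge lengths (Angstrom) of a box enclosing all atoms plus padding."""
    if not coords:
        raise ValueError("no atom coordinates found")
    lows = [min(c[i] for c in coords) for i in range(3)]
    highs = [max(c[i] for c in coords) for i in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    size = [(hi - lo) + 2 * padding for lo, hi in zip(lows, highs)]
    return center, size


def write_conf(path, center, size, receptor="protein.pdbqt", ligand="SO.pdbqt",
               out="vina_outSO.pdbqt", log="logSO.txt", exhaustiveness=8):
    """Write a Vina configuration file with the same keys as the article's conf.txt."""
    lines = [f"receptor= {receptor}", f"ligand= {ligand}", ""]
    lines += [f"center_{axis}= {value:.3f}" for axis, value in zip("xyz", center)]
    lines.append("")
    lines += [f"size_{axis}= {value:.1f}" for axis, value in zip("xyz", size)]
    lines += ["", f"out= {out}", f"log= {log}", "", f"exhaustiveness= {exhaustiveness}"]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```

Used as `center, size = blind_box(read_coords("protein.pdbqt"))` followed by `write_conf("conf.txt", center, size)`. A box that covers a whole protein is a large search space, so raising `exhaustiveness` above 8 may be worth trying for blind docking.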
In blind docking, we do not need to define a specific site in the protein for the ligand to bind, because we do not know the binding site; instead, we enclose the whole protein in the grid box. Make sure the whole protein fits inside the grid box.
1. Click “Grid” --> click “Grid Box”. You will see a small window showing the x, y, and z coordinates.
2. Now adjust the grid box by scrolling the three coordinates so that it covers the whole protein.
3. After adjusting the grid box, click “File” --> click “Output Grid Dimension File” --> save this file as grid.txt in the same folder.
4. Click “File” --> click “Close saving current”.
5. Now close AutoDock Tools.

You will get the grid file as follows:
grid.txt
spacing 0.375
npts 66 56 54
center 4.402 -8.060 8.874

Preparation of Configuration file
AutoDock Vina requires an input configuration file that contains all the parameters used in the docking, including the names of the protein and ligand files. The configuration is as follows:
conf.txt
receptor= protein.pdbqt
ligand= SO.pdbqt

center_x= 0.430
center_y= 6.575
center_z= -0.235

size_x= 72
size_y= 94
size_z= 72

out= vina_outSO.pdbqt
log= logSO.txt

exhaustiveness= 8

From the “grid.txt” file, we have written the center_x, y, and z coordinates and the size_x, y, and z of the grid box. Save this file as “conf.txt”.

Perform Docking
Put all of the following in the same folder (i.e., dock):
1. protein.pdbqt
2. SO.pdbqt
3. conf.txt
4. All the MGLTools, AutoDock Tools, Python.exe (for Linux) and AutoDock Vina setup files.
Please make sure that you have named the files properly and kept all the setup files in the same folder; otherwise, you may get errors while running the docking.

Linux
1. Open the terminal and go to the “dock” folder.
2. Type the following command:
./vina --config conf.txt --log logSO.txt
3. Press “Enter”.

Windows
1. Open the command prompt and go to the folder where all the docking files are placed.
2. Type the following command:
vina --config conf.txt --log logSO.txt

Vina Output
After successful docking, you will get a log file, which in this case is named “logSO.txt”.
The log file will look like this:

This file lists all the poses generated by AutoDock Vina along with their binding affinities and RMSD values. In the Vina output log file, the first pose is considered the best because it has the lowest (most negative) binding affinity score, i.e., the strongest predicted binding; its RMSD values are 0.000 because the RMSD of every pose is measured relative to this best pose. However, you can choose the appropriate pose (from vina_outSO.pdbqt) and visualize it in the PyMol viewer.

Please share if you like this article! If you have any queries, feel free to contact me at muniba@bioinformaticsreview.com.

References
1. Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31(2), 455-461.
BIOINFORMATICS NEWS
A workshop on Genome Editing Tools and Techniques
Image Credit: Stock Photos
The Department of Biotechnology (DBT), India, is organizing a 5-day workshop on "CRISPR/Cas mediated genome editing in plants: Applications, tools, and experimental design" to address the strategies and methods of genome modification, which are essential tools in research and development.
The workshop, on "Genome editing in Plants", will be held at the Biotech Auditorium, University of Delhi South Campus, Dhaula Kuan, New Delhi 110 021, from May 27th to 31st, 2019. Applications are invited from Ph.D. students, post-docs, and young faculty (professors/scientists).
This 5-day workshop is being organized along with the "Indo-U.S. Science & Technology Forum (IUSSTF)" under the program "Indo-U.S. Genome Engineering/Editing Technology Initiative (GETin) [DBT-IUSSTF Indo-US GETin 2019]".
BIOINFORMATICS PROGRAMMING
How to read fasta sequences from a file using PHP?
Image Credit: Stock Photos
Here is a simple function in PHP to read fasta sequences from a file.
Your multifasta input file is "input.fasta". The function reads the file line by line and treats every odd-numbered line as a header and every even-numbered line as the sequence belonging to it, so each sequence must be written on a single line. The complete script goes like this:

<?php
$filename = "input.fasta";

//Define function
function read_fastas($filename){
    $sequences = array();
    if (filesize($filename) == 0) { //check if file is empty or not
        echo "Input file is empty!";
        return $sequences;
    }
    $fh = fopen($filename, 'r');
    $i = 0;
    while (($line = fgets($fh)) !== false) {
        $i++;
        if ($i % 2 == 1) { //odd-numbered lines are headers
            $sequence['header'] = trim($line);
        } else { //even-numbered lines are sequences
            $sequence['sequence'] = trim($line);
            array_push($sequences, $sequence);
        }
    }
    fclose($fh);
    return $sequences;
}

//Call the function
$sequences = read_fastas($filename);
//do something with your fasta
?>
Subscribe to Bioinformatics Review newsletter to get the latest post in your mailbox and never miss out on any of your favorite topics. Log on to https://www.bioinformaticsreview.com