OCTOBER 2018 VOL 4 ISSUE 10
“There is no law except the law that there is no law.” - John Archibald Wheeler
McPAS-TCR: A database of TCR sequences associated with pathology and antigens
Biotite: A bioinformatics framework for sequence and structure data analysis
Public Service Ad sponsored by IQLBioinformatics
Contents
October 2018
Topics
Editorial
Software - Biotite: A bioinformatics framework for sequence and structure data analysis
Databases - McPAS-TCR: A database of TCR sequences associated with pathology and antigens
Tutorial - How to install Raccoon plugin on Ubuntu for virtual screening using AutoDock?
Docking - Raccoon2: A GUI facilitating virtual screenings with AutoDock and AutoDock Vina
FOUNDER
Tariq Abdullah
EDITORIAL
Executive Editor: Tariq Abdullah
Founding Editor: Muniba Faiza
Section Editors: Fozail Ahmad, Altaf Abdul Kalam, Manish Kumar Mishra, Sanjay Kumar, Nabajit Das
REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include your contact details in your message.
BACK ISSUES
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for delivery in India and $11 for international delivery, subject to availability. Pre-payment is required.
CONTACT
PHONE +91 991 1942-428 / 852 7572-667
MAIL Editorial: 101 FF Main Road, Zakir Nagar, Okhla, New Delhi, IN 110025
STAFF ADDRESS
To contact any Bioinformatics Review staff member, simply format the address as firstname@bioinformaticsreview.com
PUBLICATION INFORMATION
Volume 1, Number 1. Bioinformatics Review™ is published monthly for one year (12 issues) by the Social and Educational Welfare Association (SEWA) Trust (registered under the Trust Act 1882). Copyright 2015 SEWA Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and is used under license by SEWA Trust. Published in India.
EDITORIAL
Editorial: Need to reformulate the bioinformatics curricula at undergraduate and postgraduate level
Dr. Prashant Pant, Editor-in-Chief
Muniba Faiza, Founding Editor
Bioinformatics is an interdisciplinary field that combines computer science and biological science, and it requires knowledge of both of these broad disciplines. It also draws on biophysics, statistics, mathematics, and chemistry to provide software and tools that help in understanding biological data. Computer science and biological science are very different disciplines in their own right. Most researchers specialize either in computer science with some knowledge of biology, or in biological science with some knowledge of computational biology. There are not enough scientists who are fully trained in bioinformatics, which makes it difficult for both academia and industry to find suitable scientists, researchers, scholars, and graduates. As mentioned in one of our previous articles, the field needs both bioinformaticians and bioinformaticists. The foundation for becoming a bioinformatician or a bioinformaticist is laid at the undergraduate and postgraduate levels. This brings us to a question: "What should the bioinformatics curricula at the undergraduate and postgraduate levels be in order to produce well-trained bioinformatics professionals?" This is considered one of the current challenges in the field of bioinformatics [1]. At the beginning, students with knowledge of both biology and computer science find it easier to grasp the concepts of bioinformatics, whereas students with no background in either discipline find it harder to work in the field.
In most universities, students are given an introduction to bioinformatics databases and tools and an introduction to computer science. In addition, the courses must cover the basic concepts, methods, and underlying ideas of bioinformatics, such as the basic concepts of evolution. Further, instead of just adding a few isolated topics from biology and computer science, the curricula must present these topics in relation to real-world scenarios and the problems researchers are trying to solve. The syllabi should also include more programming-related courses, such as data management, algorithms, and programming languages, chosen according to the interests of the students. There should be more training courses on specific topics to help students understand the theoretical concepts, the experimentation involved, and the available software, with its pros and cons. Workshops should be organized so that students can learn about the basic and the biggest challenges in bioinformatics, the resources available to address them, and the need for new techniques and methodologies that either use existing in-silico resources or develop new ones. Bioinformatics provides various resources and tools to study and analyze biological data and helps answer many important questions about these data. Students trying to build a career in this field must be offered advanced study that focuses on the main objectives of the field. This is the time to broaden their horizons with well-developed concepts related to the real world and to show how these concepts are applied to solve existing problems.
Write to us at info@bioinformaticsreview.com
SOFTWARE
Biotite: A bioinformatics framework for sequence and structure data analysis
Sequence and structural data in bioinformatics are ever-increasing, and so is the demand for their analysis. While bioinformaticians analyze these data with their keen knowledge and reach important conclusions, bioinformaticists provide enhanced and advanced tools and software for the analysis.
Several computational biology frameworks are available for analyzing structural data from molecular dynamics simulations, such as MDAnalysis [1] and MDTraj [2]. A new framework, Biotite, has been introduced: a Python package for representing and analyzing sequence and structure data [3].
The package is open source and freely available on GitHub (https://github.com/biotite-dev/biotite). It is simple to use, especially for programming beginners, and computationally efficient because it is implemented with NumPy and Cython. Biotite consists of four subpackages: sequence, structure, database, and application. The sequence and structure subpackages are used to analyze sequence and structure data, respectively; database downloads files from databases such as the RCSB PDB; and application provides interfaces to external software [3].
The sequence subpackage encodes each symbol of a sequence as an integer symbol code, and these codes are stored in a NumPy ndarray inside the sequence object. Nucleotide and protein sequences can be read from and written to FASTA files. In addition, sequences can be aligned globally [4] and locally [5] using dynamic programming, and the alignments can be easily visualized according to their percentage similarity.
The structure subpackage uses an AtomArrayStack to represent multi-model three-dimensional structures of proteins; it holds an (m×n×3) coordinate ndarray, where m is the number of models and n is the number of atoms. The subpackage can easily parse files in the MMTF format [6], load molecular dynamics trajectory files, and measure distances, angles, and dihedrals between atoms. Users can also superimpose structures, calculate the RMSD and RMSF, and assign secondary structure. Biotite is an efficient framework for bioinformatics analyses such as downloading files and reading, writing, and modifying structure files.
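To make the dynamic-programming idea behind global and local alignment concrete, here is a minimal plain-Python sketch. It is not Biotite's implementation or API: it computes only the optimal score (no traceback), and the match/mismatch values and the linear gap penalty are arbitrary assumptions chosen for illustration, whereas Biotite works on its encoded sequences with substitution matrices. The function name align_score is hypothetical.

```python
def align_score(seq1, seq2, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of seq1 and seq2.

    local=False gives a global (Needleman-Wunsch style) score,
    local=True a local (Smith-Waterman style) score.
    The scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(seq1), len(seq2)
    # score[i][j]: best score for aligning the prefixes seq1[:i] and seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment penalizes leading gaps in either sequence
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align the two symbols
                       score[i - 1][j] + gap,       # gap in seq2
                       score[i][j - 1] + gap)       # gap in seq1
            if local:
                # Local alignment restarts at zero instead of going negative
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[n][m]


print(align_score("GATTACA", "GATTACA"))            # 7
print(align_score("ACGT", "TTACGTTT", local=True))  # 4
```

The only difference between the two modes is the boundary initialization and the clamp at zero; adding a traceback over the same matrix would recover the aligned sequences themselves.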
References
1. Michaud-Agrawal, N., Denning, E. J., Woolf, T. B., & Beckstein, O. (2011). MDAnalysis: a toolkit for the analysis of molecular dynamics simulations. Journal of Computational Chemistry, 32(10), 2319-2327.
2. McGibbon, R. T., Beauchamp, K. A., Harrigan, M. P., Klein, C., Swails, J. M., Hernández, C. X., ... & Pande, V. S. (2015). MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophysical Journal, 109(8), 1528-1532.
3. Kunzmann, P., & Hamacher, K. (2018). Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics, 19(1), 346.
4. Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195-197.
5. Gotoh, O. (1982). An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3), 705-708.
6. Bradley, A. R., Rose, A. S., Pavelka, A., Valasatava, Y., Duarte, J. M., Prlić, A., & Rose, P. W. (2017). MMTF - An efficient file format for the transmission, visualization, and analysis of macromolecular structures. PLoS Computational Biology, 13(6), e1005575.
SYSTEMS BIOLOGY
McPAS-TCR: A database of TCR sequences associated with pathology and antigens
T-cells, also known as T-lymphocytes, play an important role in the cell-mediated immune system. T-cells carry a T-cell receptor (TCR) on their surface, which recognizes foreign and self-derived antigens. A TCR is composed of two chains, namely TCRα and TCRβ, which are produced by a random process of DNA rearrangement [1,2]. The resulting TCR repertoires are highly diverse, which ensures an effective immune system; their diversity is so high that it cannot be fully captured even in a blood sample [3]. Recent studies have mapped TCR repertoires using high-throughput sequencing [4-6], and most recently, a manually curated database of TCR sequences associated with pathology and antigens has been developed [7]. Tickotsky et al. [7] linked each TCR to the antigen it binds and recognizes, and to the associated pathological condition, such as cancer, infections, and other diseases in humans and mice. The database currently contains more than 5,000 pathology-associated TCR sequences collected from 118 published studies. A user-friendly web-based search platform allows users to search the annotated TCR sequences using different criteria (https://rstudio.github.io/shinydashboard/index.html) (Fig. 1). The complete database is also available for download.
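Because the complete database can be downloaded, a search by criteria can also be scripted offline. The Python sketch below assumes the download is a CSV table with a header row; the column names used here (CDR3_beta, Category, Species) and the rows are made-up placeholders, not the real McPAS-TCR schema or entries, and the helper filter_records is hypothetical. Check the header of the actual file before adapting it.

```python
import csv
import io


def filter_records(handle, criteria):
    """Return the rows of a CSV table whose columns match every criterion.

    criteria maps a column name to the exact value required in that column.
    """
    reader = csv.DictReader(handle)
    return [row for row in reader
            if all(row.get(column) == value for column, value in criteria.items())]


# Hypothetical columns and made-up rows, standing in for the downloaded file
toy_table = io.StringIO(
    "CDR3_beta,Category,Species\n"
    "CASSTOYAF,Cancer,Human\n"
    "CASSTOYBF,Autoimmune,Mouse\n"
    "CASSTOYCF,Cancer,Mouse\n"
)
hits = filter_records(toy_table, {"Category": "Cancer", "Species": "Mouse"})
print([row["CDR3_beta"] for row in hits])  # ['CASSTOYCF']
```

With a real download, open the file with open(path, newline="") and pass that handle instead of toy_table; the criteria dictionary then uses whatever column names the file actually contains.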
Fig. 1. Homepage of the McPAS-TCR web-based search tool [7].
Each TCR sequence is annotated with the following fields:
TCRα and TCRβ chains
Sequence category:
   1. Pathogens: bacteria, viruses, and parasites,
   2. Autoimmune: sequences identified in tissues/T-cells from humans and mice with an autoimmune condition,
   3. Cancer: sequences identified in malignant tissues/T-cells of human origin, or in mouse models of malignancies,
   4. Allergy: sequences identified in allergic reactions to various allergens, and
   5. Other: sequences not classified in any of the above categories.
Pathology
Additional details
Antigen identification method
Next-generation sequencing (NGS)
Antigen protein
Epitope protein
Major histocompatibility complex (MHC)
Tissue
Type of T-cell
PubMed ID
This manually curated database can easily provide epitope-specific TCR sequences and can be used for the comparative analysis of sequences recognizing the same epitope and of disease-associated TCR sequences, along with links to the previous studies.
References
1. Davis, M. M., & Bjorkman, P. J. (1988). T-cell antigen receptor genes and T-cell recognition. Nature, 334(6181), 395.
2. Bassing, C. H., Swat, W., & Alt, F. W. (2002). The mechanism and regulation of chromosomal V(D)J recombination. Cell, 109(2), S45-S55.
3. Laydon, D. J., Bangham, C. R., & Asquith, B. (2015). Estimating T-cell repertoire diversity: limitations of classical estimators and a new approach. Philosophical Transactions of the Royal Society B, 370(1675), 20140291.
4. Calis, J. J., & Rosenberg, B. R. (2014). Characterizing immune repertoires by high throughput sequencing: strategies and applications. Trends in Immunology, 35(12), 581-590.
5. Friedensohn, S., Khan, T. A., & Reddy, S. T. (2017). Advanced methodologies in high-throughput sequencing of immune repertoires. Trends in Biotechnology, 35(3), 203-214.
6. Heather, J. M., Ismail, M., Oakes, T., & Chain, B. (2017). High-throughput sequencing of the T-cell receptor repertoire: pitfalls and opportunities. Briefings in Bioinformatics.
7. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E., & Friedman, N. (2017). McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics, 33(18), 2924-2929.
TUTORIAL
How to install Raccoon plugin on Ubuntu for virtual screening using AutoDock?
As mentioned in our previous articles, AutoDock Vina [1] is a very useful bioinformatics tool for molecular docking and provides various options for site-specific docking and blind docking. However, virtual screening with AutoDock has been difficult to perform. Recently, a plugin known as 'Raccoon' [2] has been developed for AutoDock to serve this purpose.
Users don't have to prepare PDBQT files on their own; they can easily load the ligand and protein files into AutoDock using the Raccoon GUI and perform docking. Raccoon also offers other features such as ligand filtering, generation of AutoDock configuration files, automated generation of virtual screening scripts, support for multiple receptor conformations, and automated processing of ligand libraries.
The Raccoon plugin provides a graphical user interface (GUI) for performing virtual screening with AutoDock.
Installing this plugin is very easy if you have already installed AutoDock on your system; if not, please follow this tutorial first.
How to install Raccoon plugin on Ubuntu?
1. Download the tar file of Raccoon (http://autodock.scripps.edu/resources/raccoon).
2. Save it in a directory, let's say, Downloads.
3. Change to the Downloads directory and untar the file:
$ cd Downloads
$ tar xvzf raccoon-1.0b.tar.gz
You will notice that a new 'raccoon.py' file has been added to the same directory.
4. Open a terminal (Ctrl+Alt+T), change to the directory where you extracted the tar file, and type the following commands:
$ cd Downloads
$ pythonsh raccoon.py
A GUI will be displayed as shown in Fig. 1.
Fig. 1. The GUI of Raccoon [2].
Now you have successfully installed the Raccoon plugin on your system.
In most cases, the pythonsh command is not recognized. In that case, run pythonsh with its full path inside the installed MGLTools directory (let's say, MGLTools is a subdirectory of Downloads) by typing the following commands:
$ cd /home/user/Downloads
$ /home/user/Downloads/mgltools_x86_64Linux2_1.5.6/bin/pythonsh raccoon.py
If you don't want to type the full path every time you execute the Raccoon script, add an alias to your .bashrc file. Open the file with the following command:
$ gedit ~/.bashrc
Go to the end of this file and type the following line:
alias raccoon='/home/user/Downloads/mgltools_x86_64Linux2_1.5.6/bin/pythonsh /home/user/Downloads/raccoon.py'
Save the file, go back to the terminal, and type:
$ source ~/.bashrc
After this, typing raccoon in a terminal starts the plugin.
For the C shell, use the following command (add it to ~/.cshrc to make it permanent):
$ alias raccoon '/home/user/Downloads/mgltools_x86_64Linux2_1.5.6/bin/pythonsh /home/user/Downloads/raccoon.py'
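Editing ~/.bashrc by hand works, but repeating the steps leaves duplicate alias lines. As a hedged alternative, the Python sketch below formats the alias line for bash or csh and appends it only when it is not already present. The two paths are the example paths from this tutorial and must be adjusted to your installation; the helpers alias_line and add_alias are hypothetical and are not part of MGLTools or Raccoon. The sketch assumes the command contains no single quotes.

```python
import os
import tempfile
from pathlib import Path

# Example paths from the tutorial above; change them to match your system
PYTHONSH = "/home/user/Downloads/mgltools_x86_64Linux2_1.5.6/bin/pythonsh"
RACCOON = "/home/user/Downloads/raccoon.py"


def alias_line(name, command, shell="bash"):
    """Format an alias definition (assumes command has no single quotes)."""
    if shell == "bash":
        return f"alias {name}='{command}'"  # syntax for ~/.bashrc
    if shell == "csh":
        return f"alias {name} '{command}'"  # syntax for ~/.cshrc
    raise ValueError(f"unsupported shell: {shell}")


def add_alias(rc_file, line):
    """Append line to rc_file unless an identical line exists; return True if written."""
    path = Path(rc_file)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    with path.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # keep the alias on its own line
        fh.write(line + "\n")
    return True


# Demonstration on a throwaway file instead of the real ~/.bashrc
line = alias_line("raccoon", f"{PYTHONSH} {RACCOON}")
with tempfile.TemporaryDirectory() as tmp:
    rc = os.path.join(tmp, ".bashrc")
    print(add_alias(rc, line))  # True: alias appended
    print(add_alias(rc, line))  # False: already present
```

To apply it for real, call add_alias(os.path.expanduser("~/.bashrc"), line) and then run source ~/.bashrc as described above.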
If you have any queries, email muniba@bioinformaticsreview.com.
References
1. Trott, O., & Olson, A. J. (2010). AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31(2), 455-461.
2. Forli, S., Huey, R., Pique, M. E., Sanner, M. F., Goodsell, D. S., & Olson, A. J. (2016). Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nature Protocols, 11(5), 905-919.
CRISPR-ERA looks up all targetable sites for each target gene, searching for the pattern N20NGG (N = any nucleotide). It then calculates an E-score and an S-score:
1. The E-score is the efficacy score, based on sequence features such as GC content (%GC), the presence of poly-thymidine, and location information.
2. The S-score is the specificity score, based on genome-wide off-target binding sites. For each sgRNA design, genome-wide sequences that contain an adjacent NRG (R = A or G) protospacer adjacent motif (PAM) site and zero, one, two, or three mismatches complementary to the sgRNA are computed using Bowtie; these are regarded as off-target binding sites. The penalty score for NAG off-target sites is smaller than that for NGG off-target sites.
The sgRNAs are finally ranked by the sum of the E-score and the S-score, and the results are presented according to these two scores.
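The first step of the CRISPR-ERA procedure described above, finding every N20NGG site, can be sketched in Python with a regular expression. This is only an illustration under stated assumptions: it scans only the strand it is given (the reverse complement would need a second pass), it reports GC content as one example of the sequence features behind the E-score, and it does not reproduce CRISPR-ERA's actual E-score and S-score formulas or its Bowtie off-target search. The function name find_sites is hypothetical.

```python
import re

# 20 nt protospacer followed by an NGG PAM; the zero-width lookahead
# lets overlapping sites be reported as well
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_sites(dna):
    """Return (start, protospacer, pam, gc_percent) for each N20NGG site."""
    dna = dna.upper()
    sites = []
    for match in SITE.finditer(dna):
        protospacer, pam = match.group(1), match.group(2)
        gc_percent = 100 * (protospacer.count("G") + protospacer.count("C")) / 20
        sites.append((match.start(), protospacer, pam, gc_percent))
    return sites


for site in find_sites("A" * 20 + "TGG"):
    print(site)  # (0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG', 0.0)
```

Using a lookahead rather than a plain pattern matters here: adjacent candidate sites often overlap, and a consuming regex would skip them.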
DOCKING
Raccoon2: A GUI facilitating virtual screenings with AutoDock and AutoDock Vina
In the previous article, a new plugin called 'Raccoon' was described, which helps in preparing virtual screenings with AutoDock. It provides a simple graphical user interface (GUI) in which you can easily load ligand and protein files into AutoDock and perform virtual screening. An advanced version of Raccoon, called 'Raccoon2', has been introduced by Forli et al. (2016) [1] as a GUI to prepare and analyze AutoDock and AutoDock Vina virtual screenings. Raccoon2 incorporates several new features, such as an automatic server connection manager, loading of multiple receptor structures and flexible residues, a GUI for docking setup and the grid box, result filtering by energy, ligand efficiency, and interactions, easy management of remote jobs, and automatic installation of docking services (for AutoDock Vina only).
How to install Raccoon2?
Raccoon2 is included in MGLTools, and the installer can be downloaded from the MGLTools download page. It is available for Windows, Linux, and Mac OS. Once the installer is downloaded (let's say, to the Downloads directory), open the terminal and type the following commands:
$ cd Downloads
$ chmod +x mgltools_Linuxx86_64_1.5.7rc1_install
$ ./mgltools_Linuxx86_64_1.5.7rc1_install
If it shows the error "Permission denied", then try:
$ sudo ./mgltools_Linuxx86_64_1.5.7rc1_install
Accept the terms of the software, choose the installation folder, and click 'Finish' once the installation is complete. You will see that a new directory named 'MGLTools-1.5.7rc1' has been created in the Downloads directory. Now, to run it from the terminal, create an alias and add it to your .bashrc file. Open the file with the following command:
$ gedit ~/.bashrc
Go to the end of this file and type the following line:
alias adtrc='/home/user/Downloads/MGLTools-1.5.7rc1/bin/adt'
Save the file, go back to the terminal, and type:
$ source ~/.bashrc
Try running adtrc from your terminal; it should now work.
References
1. Forli, S., Huey, R., Pique, M. E., Sanner, M. F., Goodsell, D. S., & Olson, A. J. (2016). Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nature Protocols, 11(5), 905-919.
Subscribe to the Bioinformatics Review newsletter to get the latest posts in your mailbox and never miss any of your favorite topics. Log on to https://www.bioinformaticsreview.com