BIOINFORMATICS REVIEW- JUNE 2018


JUNE 2018 VOL 4 ISSUE 6

“It is strange that only extraordinary men make the discoveries, which later appear so easy and simple.” - Georg C. Lichtenberg

A new high-level Python interface for MD simulation using GROMACS

Linux 'sed' command in Perl programming




Contents

June 2018

Topics

Editorial: Bioinformatics- A broad future ahead

Bioinformatics Programming: Linux 'sed' command in Perl programming

Software: A new high-level Python interface for MD simulation using GROMACS


FOUNDER: TARIQ ABDULLAH

EDITORIAL

EXECUTIVE EDITOR: TARIQ ABDULLAH

FOUNDING EDITOR: MUNIBA FAIZA

SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, NABAJIT DAS

REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE +91. 991 1942-428 / 852 7572-667
MAIL Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA trust. Published in India.


Bioinformatics- A broad future ahead: Editorial

EDITORIAL

It has been a wonderful time since BiR came into existence. As we enter a new year, BiR looks forward to further development and achievements, and to providing the best knowledge regarding bioinformatics. In the past two years, BiR has come a long way, from a few readers to several thousand.

Muniba Faiza

Founding Editor

Every mail of compliment and appreciation we get feels like an achievement for us. Bioinformatics has a great future ahead of it, with better understanding and more precise methodologies for both dry and wet lab experimentation. In the last two years, BiR has advanced in many aspects. We have come up with an Android app which helps our readers stay connected with the latest updates, our articles have started to appear in Google Scholar, and we receive many encouraging emails and collaboration proposals.

BiR is trying to broaden its horizons by covering different domains of bioinformatics. Since bioinformatics is multidisciplinary, the team of BiR has so far tried to cover almost every aspect of it, including big data, sequence analysis, structural bioinformatics, data mining, tools, software, biostatistics, and so on. This year BiR is more focused on providing rich content to our readers and helping them understand the concepts of bioinformatics more easily. The team of BiR is trying to reach out to students, to encourage them in their careers in bioinformatics, and to researchers currently working in the same area. The last internship at BiR was a great success and we got an amazing response from our interns. We are looking forward to presenting our work at school and college level to introduce the field to young minds who are fascinated by technology.

We have a long road ahead of us, which cannot be travelled without the support of our readers, subscribers, and contributors. We wholeheartedly thank our readers for their support and suggestions and wish them a very happy and prosperous new year with new hopes and great achievements. We would like to hear your thoughts and feedback about BiR, and what other kinds of articles you would like to read.

Please write to us at info@bioinformaticsreview.com

Letters and responses: info@bioinformaticsreview.com


BIOINFORMATICS PROGRAMMING

Linux 'sed' command in Perl programming


When it comes to processing large data files, writing programs can be difficult for beginners, especially when one script has to execute another. For such tasks, the Linux operating system offers several tools that are of great help in bioinformatics programming, such as 'awk' and 'sed' one-liners.

The sed command is a stream editor which parses and performs basic text transformations on a file or an input stream from a pipeline. sed allows restricting a command to certain lines or characters. It reads text line-by-line, removes the trailing newline, and stores each line in a data buffer known as the pattern space, on which the specified commands are executed. It is an amazing utility, but the documentation is a bit difficult.

Syntax:

sed OPTIONS... [SCRIPT] [INPUTFILE]

In bioinformatics, we sometimes need to edit large fasta files, which is difficult and tedious when done manually. The replacement and deletion commands of sed are the most widely used, and they can edit more than five thousand sequences in a few seconds.

sed substitution command:

sed 's/regexp/replacement/g' input > output

or, to modify the input file directly instead of writing the output to another file, the -i option is used:

sed -i 's/regexp/replacement/g' input
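As a minimal sketch of the two forms above, the following creates a small throwaway fasta file and applies the same global substitution, first into a new file and then in place. The file names, sequences, and the temporary directory are made up for illustration. The `-i.bak` spelling, which keeps a backup copy of the original, is an assumption chosen here because a bare `-i` is handled differently by GNU and BSD versions of sed.

```shell
# Hypothetical working directory and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '>seq1 Homo_sapiens\nATGCATGC\n>seq2 Homo_sapiens\nGGCCTTAA\n' > input.fa

# Write the edited text to a new file; input.fa itself is left untouched.
sed 's/Homo_sapiens/human/g' input.fa > output.fa

# Edit input.fa in place, keeping the original as input.fa.bak.
sed -i.bak 's/Homo_sapiens/human/g' input.fa

# Show the first edited header.
first_header=$(head -n 1 output.fa)
printf '%s\n' "$first_header"
```

After both commands, input.fa and output.fa should hold the same edited text, while input.fa.bak preserves the original headers.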

It has a vast range of applications in bioinformatics programming. For example, sed can:

- edit or modify the fasta headers in a file consisting of a large number of sequences,

- search for a particular expression present within the sequences,

- delete text within lines or, in terms of bioinformatics, delete a specific number of residues from specific lines,

- handle large data files, and so on.
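The applications listed above can be sketched as sed one-liners. The header layout (`>sp|ACCESSION|NAME`), the motif `PKP`, and the sequences are hypothetical examples and not taken from any real data set.

```shell
# Hypothetical working directory and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '>sp|P12345|PROT1\nMKTAYIAKQR\n>sp|P67890|PROT2\nMSTNPKPQRK\n' > proteins.fa

# 1. Edit fasta headers: reduce '>sp|ACCESSION|NAME' to '>ACCESSION'.
#    The address /^>/ restricts the substitution to header lines.
headers_fixed=$(sed '/^>/s/^>sp|\([^|]*\)|.*/>\1/' proteins.fa)

# 2. Search: with -n, print only the lines that contain the motif 'PKP'.
motif_hits=$(sed -n '/PKP/p' proteins.fa)

# 3. Delete the first three residues from line 2 only.
#    The address '2' restricts the substitution to that line.
trimmed=$(sed '2s/^...//' proteins.fa)

printf '%s\n' "$headers_fixed"
```

Each command reads proteins.fa and leaves it unchanged; redirect the output to a new file to keep the result.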

In Perl, a sed command can be executed using exec(), system(), qx//, or backticks (``), depending on the needs of the program. sed and Perl are similar in that they use many of the same regular-expression metacharacters.

If you want to insert five blank spaces at the beginning of each line, the following can be used:

$ sed 's/^/     /'

Here, '^' matches the start of a line, in both Perl and sed.

Further reading:
https://www.computerhope.com/unix/used.htm
http://www.grymoire.com/Unix/Sed.html#uh-47
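The five-space indentation command from this article can be tried on a couple of sample lines as follows. The sample text is invented, and the second command, which uses the end-of-line anchor '$', is an addition for comparison rather than something the article covers.

```shell
# Two made-up sample lines, for illustration only.
sample=$(printf 'ATGC\nGGCC\n')

# '^' matches the empty position at the start of each line, so the
# replacement text (five spaces) is inserted there.
indented=$(printf '%s\n' "$sample" | sed 's/^/     /')
printf '%s\n' "$indented"

# '$' is the corresponding end-of-line anchor (not used in the article);
# here it appends a '*' to every line.
starred=$(printf '%s\n' "$sample" | sed 's/$/*/')
printf '%s\n' "$starred"
```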



SOFTWARE

A new high-level Python interface for MD simulation using GROMACS

The roots of molecular simulation can be traced back to physics, where it was applied to simplified hard-sphere systems [1]. The field has gained a lot of interest since then, and simulations have been applied to fold small proteins at the multi-microsecond scale [2-4], to predict functional properties of receptors and capture the intermediate transitions of a complex [5], and to study the movement and behavior of a ligand in a binding pocket and predict interactions between receptors and ligands [6,7].

GROMACS is one of the most widely used software packages for molecular dynamics (MD) simulations of complex proteins [8].

GROMACS offers a set of commands which can be easily executed for MD simulation of a protein, or of a protein in complex with a ligand, for applications ranging from protein folding kinetics to computational drug design and the refinement of molecular structures. Recently, Irrgang et al. [9] have proposed an API for GROMACS called "gmxapi", implemented in pure Python together with a C++ extension. This API allows users to construct computational task graphs, permitting parallel optimizations and the mixing of MD simulation with machine-learning operations using other software packages such as TensorFlow [10]. The API provides a native interface to the GROMACS MD engine [11]. Users can drive MD simulations via high-level procedural commands or an object-oriented interface, or can employ their own extension code.

Restrained-ensemble simulations compute population properties from a set of MD simulation data, then compare these computed properties with experimental residue-residue distance distributions measured via double electron-electron resonance (DEER) spectroscopy. The algorithm then calculates a distance histogram from the simulated ensemble and derives from it a distance-dependent biasing force for the simulations, which are run for an interval of time (Δt) before the process is repeated [9].

gmxapi enables custom plugins for user-defined forces, allows custom potential functions, retains the optimized performance of GROMACS, and allows users to build and execute computational graphs.

References

1. Alder, B. and Wainwright, T. (1957) Phase transition for a hard sphere system. J. Chem. Phys., 27, 1208–1209.

2. van der Spoel, D. and van Maaren, P.J. (2006) The origin of layer structure artifacts in simulations of liquid water. J. Chem. Theor. Comput., 2, 1–11.

3. Lindorff-Larsen, K. et al. (2011) How fast-folding proteins fold. Science, 334, 517–520.

4. Bowman, G. et al. (2011) Atomistic folding simulations of the five-helix bundle protein λ6–85. J. Am. Chem. Soc., 133, 664–667.

5. Nury, H. et al. (2010) One-microsecond molecular dynamics simulation of channel gating in a nicotinic receptor homologue. Proc. Natl Acad. Sci. USA, 107, 6275–6280.

6. Chong, L. et al. (1999) Molecular dynamics and free-energy calculations applied to affinity maturation in antibody 48g7. Proc. Natl Acad. Sci. USA, 96, 14330–14335.

7. Huang, D. and Caflisch, A. (2011) The free energy landscape of small molecule unbinding. PLoS Comput. Biol., 7, e1002002.

8. Hess, B. et al. (2008) Gromacs 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theor. Comput., 4, 435–447.

9. Irrgang, M.E., Hays, J.M. and Kasson, P.M. (2018) gmxapi: a high-level interface for advanced control and extension of molecular dynamics simulations. Bioinformatics, bty484.

10. https://www.tensorflow.org/

11. Pronk, S. et al. (2013) GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics, 29, 845–854.



