JULY 2019 VOL 5 ISSUE 7
“The art and science of asking questions is the source of all knowledge.” - Thomas Berger
A perl script to convert multiline FASTA sequences into a single line
Installing MODELLER on Linux/Ubuntu
Contents
July 2019
Topics
Editorial
Tools: Installing MODELLER on Linux/Ubuntu
Bioinformatics Programming: A Perl script to convert multiline FASTA sequences into a single line
FOUNDER
TARIQ ABDULLAH

EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD ALTAF ABDUL KALAM MANISH KUMAR MISHRA SANJAY KUMAR PRAKASH JHA NABAJIT DAS

REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE: +91. 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) Trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India.
Bioinformatics and programming languages: what do you need to know!
EDITORIAL
Several questions come to mind when someone is about to enter the field of bioinformatics, and the foremost is: "Do I need to learn computer languages to pursue my career in bioinformatics?" The answer is a bit tricky: it can be both "yes" and "no". This article describes the conditions under which you need to learn programming languages in bioinformatics.
Muniba Faiza
Founding Editor
As a bioinformatics analyst, you just need to know about different software or tools, how to execute them via a pipeline or bash scripting, and how to filter your results, analyze them, and extract useful information. Generally, the software packages used in bioinformatics come with a manual explaining their usage and features, so there is no need to learn any specific language for them. Users can easily follow the tutorial and get results, but of course they must have a basic understanding of what they are doing and why. Some analyses, however, such as component analysis using MATLAB (The MathWorks, Inc., Natick, Massachusetts, United States), may require knowledge of a specific language, for example, R.

On the other hand, if you are in an algorithm-driven lab that focuses on developing software, pipelines, and so on, then you definitely need to know programming languages. First, you should understand algorithm development, and then learn a few programming languages to implement it. New software usually needs to be developed when no simple, easy-to-use solution is available for a certain problem. In the field of bioinformatics, some commonly used computer languages include Python, R, MySQL, PHP, and Perl. It's always better to also know more advanced languages such as Java.

In conclusion, it is better to learn some programming languages to pursue your career in bioinformatics, but if you want to be an expert bioinformatics analyst, then not knowing these languages would not be a problem. Share your thoughts at info@bioinformaticsreview.com! With best wishes!

Letters and responses: info@bioinformaticsreview.com
Bioinformatics Review (BiR)
TOOLS
Installing MODELLER on Linux/Ubuntu
MODELLER is a software package which is used to predict protein three-dimensional structures [1,2]. MODELLER can be used for other tasks such as modeling of loops in proteins, multiple alignments of structures, comparison of protein structures, and many others. In this article, I will explain how to install MODELLER on Linux/Ubuntu.
1. Download
Let's update and upgrade the system first. Open the terminal (Ctrl+Alt+T) and type the following commands:

$ sudo apt-get update
$ sudo apt-get upgrade

Now start downloading the software in a directory, say, Downloads, by typing the following command:

$ sudo wget https://salilab.org/modeller/9.22/modeller-9.22.tar.gz

2. Install
Extract the tar package:

$ tar xvzf modeller-9.22.tar.gz

It will create a new directory in Downloads, named modeller-9.22. cd to that directory and start installing.

$ cd modeller-9.22
$ ./install

The installer will ask a few questions; answer them, and then you are finally done! You can run MODELLER either by:

   1. using Python (assuming Python 2.3 - 3.7 is already installed on your system)
      $ python modeller_script.py

   2. using the mod9.22 script
      $ mod9.22 modeller_script.py

That's all for the installation of MODELLER. If you have any query, please email at info@bioinformaticsreview.com.
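As a quick sanity check after installation, the following minimal Python sketch reports whether a package named `modeller` is visible to the interpreter you plan to use. The helper name `modeller_available` is hypothetical and not part of MODELLER. A negative result may only mean that the package is not on this interpreter's search path, not that the installation failed.

```python
import importlib.util


def modeller_available() -> bool:
    """Return True if a top-level 'modeller' package is visible to this interpreter.

    find_spec only locates the package; it does not import or run it.
    """
    return importlib.util.find_spec("modeller") is not None


if __name__ == "__main__":
    if modeller_available():
        print("MODELLER Python package found.")
    else:
        # Assumption: the package may still be installed but not on the search path.
        print("MODELLER Python package not found by this interpreter.")
```

If the package is not found, running scripts through the mod9.22 command described above is the alternative the article gives.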
Bioinformatics Review | 6
References
1. Webb, B., & Šali, A. (2014). Comparative protein structure modeling using MODELLER. Current Protocols in Bioinformatics, 47(1), 5-6.
2. Martí-Renom, M. A., Stuart, A. C., Fiser, A., Sánchez, R., Melo, F., & Šali, A. (2000). Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 29(1), 291-325.
BIOINFORMATICS PROGRAMMING
A Perl script to convert multiline FASTA sequences into a single line
Different software tools require different kinds of input, which matters especially when you are developing a pipeline or want to process multiple large files. If you are dealing with a big FASTA file consisting of thousands of sequences split into a particular number of residues per line, and you want each sequence on a single line, then you can use this simple Perl program. There are two ways to supply your multiline FASTA file: either define the filename in your Perl script or pass it on the command line.
1. Define input file within the script
The multifasta input file is "input.fasta".

#!/usr/bin/perl
use strict;
use warnings;

my $input_fasta = "input.fasta";
open(IN, "<", $input_fasta) || die("Can't open $input_fasta: $!");

# Print the first header line unchanged.
my $line = <IN>;
print $line;

while ($line = <IN>) {
    chomp $line;
    if ($line =~ m/^>/) {
        # A new header ends the previous sequence line.
        print "\n", $line, "\n";
    } else {
        # Sequence lines are joined without line breaks.
        print $line;
    }
}
print "\n";
close(IN);
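2. As a command-line argument

#!/usr/bin/perl
use strict;
use warnings;

# The FASTA file name is taken from the first command-line argument.
my $input_fasta = $ARGV[0];
die("Usage: $0 <input.fasta>\n") unless defined $input_fasta;
open(IN, "<", $input_fasta) || die("Can't open $input_fasta: $!");

# Print the first header line unchanged.
my $line = <IN>;
print $line;

while ($line = <IN>) {
    chomp $line;
    if ($line =~ m/^>/) {
        # A new header ends the previous sequence line.
        print "\n", $line, "\n";
    } else {
        # Sequence lines are joined without line breaks.
        print $line;
    }
}
print "\n";
close(IN);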
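For readers who prefer Python, the same idea can be sketched as below. This is not the author's script. It is a minimal illustration, and the function name `linearize_fasta` is a made-up name for this example. Unlike the Perl version, it ignores any lines that appear before the first header and skips blank lines.

```python
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record as a header line followed by one sequence line."""
    sequence_parts = []
    seen_header = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                yield "".join(sequence_parts)
            sequence_parts = []
            seen_header = True
            yield line
        elif line and seen_header:
            # Sequence lines are collected and joined later.
            sequence_parts.append(line)
    # Emit the sequence of the last record.
    if seen_header:
        yield "".join(sequence_parts)


# Small in-memory example; a real file object can be passed the same way.
example = [">seq1\n", "ACGT\n", "TTGA\n", ">seq2\n", "GGCC\n"]
print("\n".join(linearize_fasta(example)))
```

For the example above, the two wrapped lines of seq1 are joined into `ACGTTTGA`, and each record becomes exactly two lines. Because the function accepts any iterable of lines, an open file handle can be passed directly, so a large file does not need to be read into memory at once.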