MARCH 2019 VOL 5 ISSUE 3
“Somewhere, something incredible is waiting to be known.” - Carl Sagan
How to read fasta sequences as hash using perl?
Installing Roary and Prokka on Ubuntu
Contents
March 2019

Topics
Editorial
03 Software: Installing Roary and Prokka on Ubuntu
04 Bioinformatics Programming: How to read fasta sequences as hash using perl?
05 Phylogenetics: How to calculate dN, dS, and dN/dS ratio on a set of genes using MEGA?
FOUNDER
TARIQ ABDULLAH

EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, PRAKASH JHA, NABAJIT DAS

REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUES
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE: +91. 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1. Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) Trust (registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India.
EDITORIAL
BiR: Taking it to the next level
Muniba Faiza, Founding Editor

As we dive into a new year, BiR desires to take a step forward towards new developments and achievements. BiR has achieved a lot since its inception, above all in the form of our readers, who have been a wonderful motivation for us.

Bioinformatics has become a broad field now, covering important aspects of our lives such as drug design. In the last three to four years, BiR has not only tried but succeeded in covering almost every domain of bioinformatics, including sequence analysis, structural bioinformatics, docking, phylogeny, evolution, tools, software, and so on. This coming year, BiR will focus on several other facets of bioinformatics covering a wide range of domains, including cheminformatics, more articles on bioinformatics programming, big data, and more tutorials on new software and tools.

We have received several suggestions and much appreciation from our readers all over the world, including some interesting topics to write more articles about. We are currently working on the suggested topics, and the resulting articles will soon be made accessible to all. Besides, BiR would like to welcome new authors who are interested in bioinformatics and in sharing their knowledge worldwide.

Last year BiR commenced the annual listing of top Indian bioinformaticians, acknowledging our respected scientists and researchers working in the field. This year BiR will try to arrange talks and conferences with bioinformaticians. Besides, BiR hopes to introduce new projects and internships for young researchers working in the same field.

We have many miles to go, which is not possible without the support of our readers, subscribers, and contributors. We are wholeheartedly thankful to all of you and wish you a very prosperous and happy new year with great achievements ahead. Keep sharing and spreading knowledge.

Please share your thoughts and suggestions at info@bioinformaticsreview.com. With best wishes!
Bioinformatics Review (BiR)

Letters and responses: info@bioinformaticsreview.com
SOFTWARE
Installing Roary and Prokka on Ubuntu
Image Credit: Stock photos
In the last article on Bioinformatics Review, the use of Roary [1] and Prokka [2] to create a pangenome from isolated genome sequences was explained. This article covers installing both packages on Ubuntu.

To install Roary and Prokka, you first need to install some dependencies such as ncbi-blast+, cd-hit, and so on. These are explained in the following sections. So, let's start.

1. Installing dependencies

Open the terminal (Ctrl+Alt+T) and type the following commands:

$ sudo apt-get update
$ sudo apt-get upgrade

It's better to upgrade the system before installing new software. After that, start installing the dependencies. You can install all of them at once:

$ sudo apt-get install bedtools ncbi-blast+ mcl cd-hit mafft prank fasttree parallel

Next, install the required Perl modules from the CPAN shell:

$ sudo perl -MCPAN -e shell
> install Array::Utils

Similarly, install the following modules:

Bio::Perl Exception::Class File::Basename File::Copy File::Find::Rule File::Grep File::Path File::Slurper File::Spec File::Temp File::Which FindBin Getopt::Long Graph Graph::Writer::Dot List::Util Log::Log4perl Moose Moose::Role Text::CSV PerlIO::utf8_strict Devel::OverloadInfo Digest::MD5::File

Alternatively, install all of the modules in one step with cpanm:

$ sudo cpanm Array::Utils Bio::Perl Exception::Class File::Basename File::Copy File::Find::Rule File::Grep File::Path File::Slurper File::Spec File::Temp File::Which FindBin Getopt::Long Graph Graph::Writer::Dot List::Util Log::Log4perl Moose Moose::Role Text::CSV PerlIO::utf8_strict Devel::OverloadInfo Digest::MD5::File
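
Before moving on, it can help to confirm that the dependencies are actually reachable from the shell. The following is a minimal sketch and not part of the original instructions: check_tools is a hypothetical helper, and the executable names passed to it (for example blastn for the ncbi-blast+ package) are assumptions about what the Ubuntu packages install, so adjust them if your system differs.

```shell
#!/bin/sh
# check_tools: report which of the given executables are missing from PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The executable names below are assumptions, not taken from the article.
check_tools bedtools blastn mcl cd-hit mafft prank parallel \
    || echo "Install the missing tools before continuing."
```

A Perl module can be checked in a similar spirit with, for example, perl -MArray::Utils -e 1, which exits silently when the module loads.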
2. Installing Roary from source

You can install Roary from the source. First, download the latest software from here (https://github.com/sanger-pathogens/Roary/tarball/master) and then go into the directory where you have downloaded it (let's say, Downloads).

$ cd Downloads
$ tar xvzf sanger-pathogens-Roary-xxxx.tar.gz
$ cd sanger-pathogens-Roary-db

Now, open your bashrc file:

$ gedit ~/.bashrc

Add the following lines at the end of the file (replace /home/user with your own home directory). Note that the existing values of PATH and PERL5LIB are kept and the Roary directories are appended to them:

export PATH=$PATH:/home/user/Downloads/sanger-pathogens-Roary-db/bin
export PERL5LIB=$PERL5LIB:/home/user/Downloads/sanger-pathogens-Roary-db/lib

Save the file and open a new terminal (or run "source ~/.bashrc") so that the changes take effect. Then check whether Roary has been installed or not:

$ roary -w

It will print the version.

3. Installing Prokka

Open the terminal and type the following commands:

$ sudo apt-get install libdatetime-perl libxml-simple-perl libdigest-md5-perl git default-jre bioperl
$ sudo cpan Bio::Perl
$ git clone https://github.com/tseemann/prokka.git $HOME/prokka
$ $HOME/prokka/bin/prokka --setupdb

More about the further usage of Prokka can be found on Bioinformatics Review.

References

1. Page, A. J., Cummins, C. A., Hunt, M., Wong, V. K., Reuter, S., Holden, M. T., ... & Parkhill, J. (2015). Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 31(22), 3691-3693.
2. Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068-2069.
BIOINFORMATICS PROGRAMMING
How to read fasta sequences as hash using Perl? Image Credit: Stock Photos
This is a simple Perl script to read a multifasta file as a hash.

Suppose your multifasta file is "input.fasta", which you want to read as a hash.

#! /usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";
my %sequences;
# declared outside the loop so that it survives from the header line
# to the sequence lines that follow it
my $id = '';

open( FH, '<', $infile ) or die $!;
while( my $line = <FH> ){
    chomp $line;
    if ( $line =~ /^(>.*)$/ ){
        # header line: the whole line, including '>', becomes the key
        $id = $1;
    }
    elsif ( $line !~ /^\s*$/ ){
        # non-empty sequence line: append it to the current record
        $sequences{$id} .= $line;
    }
}
close (FH);
exit;

If you want to write a subroutine for reading a fasta file, then you can do it like this:

#! /usr/bin/perl
use warnings;
use strict;

my $infile = "input.fasta";

# call the subroutine; it returns a hash reference, which is dereferenced here
my %seqs = %{ read_fasta_as_hash($infile) };

# your code goes here

sub read_fasta_as_hash{
    my $inputfile = shift;
    my $id = '';
    my %sequences;
    open( INFILE, '<', $inputfile ) or die $!;
    while( my $line = <INFILE> ){
        chomp $line;
        if ( $line =~ /^(>.*)$/ ){
            $id = $1;
        }
        elsif ( $line !~ /^\s*$/ ){
            $sequences{$id} .= $line;
        }
    }
    close (INFILE);
    # return a reference so that the caller can dereference it with %{ ... }
    return \%sequences;
}
exit;
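
To sanity-check what the hash should contain, the same file can be inspected from the shell. The following is a small sketch in shell rather than Perl, given only as an independent cross-check: the sample records are hypothetical, and it assumes that every header in the file is unique, since duplicate headers would share one hash key.

```shell
#!/bin/sh
# Hypothetical sample file; any multifasta file would do.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1
ATGC
GGTA
>seq2
TTAA
EOF

# Number of header lines: the hash should hold this many keys.
n_headers=$(grep -c '^>' "$fasta")
echo "headers: $n_headers"

# Total sequence length per record: each hash value should have this length.
lengths=$(awk '/^>/ { id = $0; next }
               NF   { len[id] += length($0) }
               END  { for (i in len) print i, len[i] }' "$fasta" | sort)
echo "$lengths"
```

If the number of keys in the Perl hash differs from the header count, the usual causes are duplicate headers or a file that does not start with a header line.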
PHYLOGENETICS
How to calculate dN, dS, and dN/dS ratio on a set of genes using MEGA? Image Credit: Wikimedia Commons
If you want to get a quick idea about the non-synonymous vs synonymous (dN/dS) substitutions, you can easily use MEGA software [1]. Although HyPhy/Datamonkey provides the best results for selection pressure analyses, MEGA also uses the HyPhy program [2] to calculate the dN/dS substitution rates. Here is how you can do it.

You will need a FASTA file containing the codon (nucleotide coding) sequences of your genes. If you have protein sequences, first convert them into the corresponding nucleotide codon sequences.
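
Since the analysis works codon by codon, it can be worth checking that every sequence length is a multiple of three before loading the file into MEGA. The following is a minimal sketch under that assumption, not a step from the MEGA workflow: check_codon_fasta is a hypothetical helper and the demo records are made up for illustration.

```shell
#!/bin/sh
# check_codon_fasta FILE: print the header and length of every record whose
# total sequence length is not a multiple of three.
check_codon_fasta() {
    awk '
        function report() { if (id != "" && len % 3 != 0) print id, len }
        /^>/ { report(); id = $0; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$1"
}

# Hypothetical demo input: geneA has 9 bases, geneB has 4.
demo=$(mktemp)
printf '>geneA\nATGAAA\nTAA\n>geneB\nATGA\n' > "$demo"
check_codon_fasta "$demo"    # prints: >geneB 4
```

An empty output means that all records have lengths divisible by three; it does not prove that the sequences are in the correct reading frame.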
i) Open MEGA --> Align --> Edit/Build Alignment --> Retrieve sequences from a file. (If you already have an alignment file, then skip this step.)

ii) Edit --> Select all --> Align by ClustalW/Muscle.

iii) Save the session and export the alignment in MEGA format (.meg).

iv) Minimize the alignment window. Go to the main window and click on "Selection" --> "Estimate selection for each codon (HyPhy)".

v) It will prompt you to select a .meg file; select it.

vi) It will ask for analysis preferences as shown in Fig. 1. Choose according to your requirements and click "Compute".

Fig. 1 Analysis preference window for dN/dS substitution calculation using MEGA7.

vii) Later, it will ask for the output format and the output directory where you want to save the results. Click "Ok". Your job will be finished after a few minutes, depending upon the number and length of sequences.

References

1. Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular biology and evolution, 33(7), 1870-1874.
2. Pond, S. L. K., & Muse, S. V. (2005). HyPhy: hypothesis testing using phylogenies. In Statistical methods in molecular evolution (pp. 125-181). Springer, New York, NY.
Subscribe to Bioinformatics Review newsletter to get the latest post in your mailbox and never miss out on any of your favorite topics. Log on to https://www.bioinformaticsreview.com