April 2019 VOL 5 ISSUE 4
“The most beautiful experience we can have is the mysterious. It is the fundamental emotion that stands at the cradle of true art and true science.” - Albert Einstein
How to extract fasta sequences from a multi-fasta file based on matching headers in a separate file?
"What is the scope of bioinformatics?" Do we really need to ask this?
Contents
April 2019
Topics Editorial....
03 Bioinformatics Programming How to extract fasta sequences from a multifasta file based on matching headers in a separate file? 06
05
FOUNDER
TARIQ ABDULLAH

EDITORIAL
EXECUTIVE EDITOR TARIQ ABDULLAH
FOUNDING EDITOR MUNIBA FAIZA
SECTION EDITORS FOZAIL AHMAD ALTAF ABDUL KALAM MANISH KUMAR MISHRA SANJAY KUMAR PRAKASH JHA NABAJIT DAS
REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE +91. 991 1942-428 / 852 7572-667
MAIL Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA trust. Published in India.
"What is the scope of bioinformatics?" Do we really need to ask this?
EDITORIAL
"What is the scope of bioinformatics?" This is a question frequently asked by students and scholars. The real question is: do we really need to ask this? Bioinformatics is an interdisciplinary field that draws on computer science, chemistry, mathematics, physics, and many other disciplines.
Muniba Faiza
Founding Editor
Bioinformatics is a fast-emerging field that has grown immensely in the last decade. More than a thousand databases and many bioinformatics tools and software packages are available, and they are frequently used to extract or develop new information for further use. We can visualize biological data easily, ease tedious tasks, develop advanced methods, study the phylogeny of organisms, solve essential problems, and so on. With programming skills applied to bioinformatics, new innovations come more easily. There are, however, a few limitations in bioinformatics, such as a lack of data connectivity, redundant data, and the difficulty of accurately predicting protein-protein interactions. These limitations can be overcome by developing and applying new methods and techniques. For example, a few years ago, bioNerDS [1], a recognition system for database and software names, was developed; it is capable of identifying mentions of these named entities in the literature. This helps in exploring various things in bioinformatics on a single platform.

Exploration is another aspect that helps answer the question, as exploration leads to learning, which in turn leads to innovation. The things that exist now were once non-existent; curiosity and exploration are the reasons they became realities. It is not about the scope; it is about the possibilities and discovering new opportunities. It is better to find solutions to existing problems and overcome the limitations, either by utilizing the available resources or by inventing new ones. That's what science and research are all about.

Please write to us at info@bioinformaticsreview.com.

With best wishes!

Letters and responses: info@bioinformaticsreview.com
Bioinformatics Review (BiR)
BIOINFORMATICS PROGRAMMING
How to extract fasta sequences from a multifasta file based on matching headers in a separate file?
This is a simple Perl script to extract FASTA sequences from a large FASTA file depending on the matching FASTA headers present in another file.
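For example, suppose your FASTA sequences are in a file named "input.fa" and the headers are in another file called "headers.txt".

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $headerfile = 'headers.txt';
    my $input      = 'input.fa';
    my $output     = 'extracted.fa';    # output name assumed here; must differ from $input

    # Read the wanted headers, splitting lines on whitespace.
    open( my $header_fh, '<', $headerfile ) or die "Cannot open $headerfile: $!";
    my @headers = map { split ' ', $_ } <$header_fh>;
    close $header_fh;

    # Index every sequence by its header (the first word after '>').
    my %sequences;
    my $current;
    open( my $input_fh, '<', $input ) or die "Cannot open $input: $!";
    while ( my $line = <$input_fh> ) {
        chomp $line;
        if ( $line =~ /^>\s*(\S+)/ ) {
            $current = $1;
            $sequences{$current} = '';
        }
        elsif ( defined $current and length $line ) {
            $sequences{$current} .= "$line\n";
        }
    }
    close $input_fh;

    # Write the matching records in the order the headers were listed.
    open( my $seqs_fh, '>', $output ) or die "Cannot open $output: $!";
    foreach my $header (@headers) {
        $header =~ s/^>//;    # accept headers listed with or without '>'
        if ( exists $sequences{$header} ) {
            print {$seqs_fh} ">$header\n", $sequences{$header};
        }
    }
    close $seqs_fh;
    exit;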
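The same extraction can also be sketched as two small subroutines that work on any filehandle, which makes the logic easy to try without touching real files. This is only a minimal sketch under stated assumptions: the helper names index_fasta and extract_records are hypothetical, the in-memory strings stand in for "input.fa" and "headers.txt", and headers are assumed to be listed without the leading ">" and matched against the first word of each FASTA header line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle and return a
# hash reference mapping each ID (first word after '>') to its record text.
sub index_fasta {
    my ($fh) = @_;
    my ( %records, $id );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^>\s*(\S+)/ ) {
            $id = $1;
            $records{$id} = $line;     # keep the original header line
        }
        elsif ( defined $id ) {
            $records{$id} .= $line;    # append sequence lines unchanged
        }
    }
    return \%records;
}

# Hypothetical helper: return the records for the wanted IDs, in the order
# given, silently skipping IDs that are not present in the index.
sub extract_records {
    my ( $records, @wanted ) = @_;
    return join '', map { $records->{$_} } grep { exists $records->{$_} } @wanted;
}

# Tiny in-memory example standing in for input.fa and headers.txt.
my $fasta  = ">seq1 first\nACGT\nGGCC\n>seq2\nTTAA\n>seq3\nCCGG\n";
my $wanted = "seq3\nseq1\nmissing_id\n";

open( my $fasta_fh, '<', \$fasta ) or die "Cannot open in-memory FASTA: $!";
my $records = index_fasta($fasta_fh);
close $fasta_fh;

my @ids = split ' ', $wanted;
print extract_records( $records, @ids );
```

With the sample data above, the script prints the seq3 record followed by the seq1 record, and the ID that has no match is skipped. To use real files, open "input.fa" and pass that filehandle to index_fasta, and read the IDs from "headers.txt" instead of the $wanted string.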