July 2015 VOL 1 ISSUE 7
“The art and science of asking questions is the source of all knowledge.” - Thomas Berger
GenVisR: A tool for genomic visualization
Deciding the right journal for your paper: 5 things to look for
Contents
July 2016

Topics
Editorial
03 Featured: Deciding the right journal for your paper: 5 things to look for
04 Software: GenVisR: A tool for genomic visualization
05 Tools: miRNAs and their Target Prediction Tools: An Overview
06 Featured: BiR: Impact report – July 2016 & More
EDITOR
Dr. PRASHANT PANT

FOUNDER
TARIQ ABDULLAH

EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, PRAKASH JHA, NABAJIT DAS

REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE: +91. 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any Bioinformatics Review staff member, simply format the address as firstname@bioinformaticsreview.com

PUBLICATION INFORMATION
Volume 1, Number 1. Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA trust. Published in India.
EDITORIAL
Editorial: Way to go for Bioinformatics, time for reevaluation of concepts (of education)!!
Dr. Prashant Pant Editor-in-Chief
Although Bioinformatics is a synthesis of broader disciplines, it is still a new tool around the corner, and a number of books have surfaced very recently to help tackle the subject in a more methodical way. These books form the foundation of the subject and are therefore referred to by teachers, students, and professionals alike. Some of them are "almost there" as a standard textbook for the novice, the quintessential "beginner". In this month's issue, I take the opportunity of this editorial to discuss some of these books, not by reviewing them per se but by trying to classify them on their merits, along the lines of their popularity among the masses, and by peeking into the reasons for that popularity. The books were selected on the basis of what is available in the Indian market and what is recommended by peers. They are (in order of popularity): Bioinformatics and Functional Genomics by Jonathan Pevsner (Wiley-Blackwell, first published November 2003); Bioinformatics: Principles and Applications by Ghosh and Mallick (Oxford University Press, 2008); Biological Sequence Analysis by Durbin et al. (Cambridge University Press); Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins by Andreas D. Baxevanis and B.F. Francis Ouellette (Wiley-India Edition, 2006); and finally Bioinformatics by D. Srinivas Rao (Biotech Pharma Publications).

Among these, the book by Jonathan Pevsner is one of the best and is popular in Indian universities for its finesse, quality of content, and in-depth coverage of the subject. The entire book is divided into three main sections: Sequence Analysis, Functional Genomics, and Genomics. It discusses the major concepts with decent coverage and helps the reader develop an understanding of the subject matter and its conceptual intricacies. The book by Baxevanis and Ouellette complements Pevsner very well by offering a practical guide to sequence data analysis and phylogeny inference using molecular data. All in all, these two books very much complete the learning loop for an undergraduate course syllabus.

However, these books have been surpassed in sales and popularity among Indian students, who find the book by Ghosh and Mallick, second on the list above, a better resource for understanding Bioinformatics. The book is undoubtedly lucid, simple, straightforward, and to the point, and it is more of a textbook in its flavor and in the approach with which it tackles the subject. Its popularity among graduating students of different tiers and among academicians needs to be understood from various aspects, viz. the amount of peer pressure on Indian university students to perform rather than to learn and have fun with the subject matter. The book is an excellent compendium of the Bioinformatics tools and techniques made available to us by various databases and other resource centers, and it caters to the needs of graduate students. However, instead of motivating a student to take up the subject as a career option, it is reduced to the shortest possible route to clear an examination and get on with life. I have been teaching undergraduates for 4 years now, and this is the general trend and belief of the masses.

Introspecting on the causes, it occurs to me that the root cause is neither the book (man's best friend) nor a specific subject. It is the entire mashed-up concept of education that is making the masses perceive education as a bitter pill. It is not about Indian versus Western education. It is about what we perceive education to be. Parents and parents-to-be, teachers and teachers-to-be, students, researchers, educators, and policymakers all need to revisit the basic questions: What is the aim and scope of education? What is the need for higher education? Why do we need to go beyond a basic level of education, and if at all we need to, to whom should we administer a particular level of education beyond the basics? If we do not ask these questions and introspect on them now, we run the risk of having highly educated (?) and yet good-for-nothing people all over the place, ruining the sacrosanct meaning of being educated and learned.

Gladly (or sadly), times are changing so rapidly in the Indian setup of education, at school and at higher levels, that one cannot predict the future of the coming generations. The emphasis is more on skill development and career-oriented courses. That might be excellent for the economy according to the projections of economists and market analysts, but what about the satisfaction of gaining knowledge and subsequently becoming less anxious, more contented, more subtle? Can't we strike a balance, if nothing else? From where and how will that balance be obtained? These are questions and points to ponder over and over again…

Letters and responses: info@bioinformaticsreview.com
Write us back at prashant@bioinformaticsreview.com
FEATURED
Deciding the right journal for your paper: 5 things to look for
“Are you wondering whether you should publish your paper with the journal that just sent you a call for papers? Or have you ever wondered what to look for before choosing the right journal for your research? Well, here is the definitive list of five things you need to be wary of.”
Considering the amount of effort and hard work that goes into writing a research paper, it is critical to choose the right journal to reach the right scientific audience. It is particularly important not to waste your valuable work by falling prey to predatory journals. So we came up with this short guide to make it easier for you to decide where to publish without running into problems or getting duped.
1. First things first, OA or Not?
Some journals are purely Open Access (OA). Every paper published in such journals is available directly via the internet without any fee or subscription.
Open access journals are good to go. But making science available to the public for free comes at a cost: these journals charge the authors publication fees. This puts them out of reach for independent researchers and authors without much financial backing. To overcome this, some journals are partly open access and let the authors choose whether to make their paper open access by paying an Open Access Charge. So decide carefully whether you want to go open access. If so, you can search for the journal of your interest on DOAJ (Directory of Open Access Journals).
2. Beware of predatory journals
Predatory journals are 'fake' journals that are fraudulently set up to earn some easy cash. They attract young researchers by offering acceptance rates close to 100% and a review process that is nominal or absent. Typically, you can check whether a journal is predatory by consulting published lists of suspected predatory publishers (popularly known as Beall's List). Such a list is not exhaustive, but it may come in handy when deciding. You can also identify a predatory journal by general signs such as the lack of a contact or office address on the website (or only an email address as contact information), no phone numbers for the editors, missing details of the editorial board, etc.

3. Does your journal participate in archiving programs?
What if the website goes down tomorrow? Or maybe the journal goes bankrupt? What will happen to the valuable research that was published in it? The good news is that most journals participate in an archiving program where they deposit data for permanent storage, i.e. even if the journal shuts down, your paper will not cease to exist. Each article is also given a DOI (digital object identifier), which is unique and points to the same article, forever. Okay, so what if the data center gets nuked? Or what if the data center where the content is stored suddenly catches fire? To prevent such scenarios, a large number of copies of the same content are kept at different locations around the world. This is achieved when a journal participates in the LOCKSS (Lots of Copies Keep Stuff Safe) program. Some journals also deposit their published articles in NCBI's PubMed Central. So the next time you choose a journal, better make sure it participates in LOCKSS.

4. Know the scope of the journal you are publishing in
The basic aim and essence of publishing research is to reach the scientific fraternity, telling them about the significant work that you have done and what it could be used for (although there are people who publish merely to gain credits). So reach the right audience by selecting a journal whose readership and editorial board comprise people in a related field. The aims and scope of the journal should match your work. For example, the Journal of Theoretical Biology is aimed at theoretical studies and will not accept your wet lab research work. Furthermore, only the areas outlined on the journal's homepage are the ones that will be accepted. Articles falling out of scope are usually rejected, and precious time is lost.

5. Impact (Not Impact Factor)
Ask yourself whether the contents of the journal are accessible via search. Make sure you are able to find the articles published in your selected journal via Google Scholar, PubMed or PubMed Central. You can also consider the Impact Factor (by Thomson Reuters) of a journal while deciding, but try not to overemphasize it. The IF reflects citations to other people's work in the journal and does not by itself add weight to your research. A journal with a good impact factor is likely to have a larger readership and thus more citations. You should target a journal that not only publishes your research but also takes strides in publicizing it. What is the point of publishing in a journal that does not really publicize your research?

Final Words
It is wise to use caution while selecting a journal. It will save you time, money and effort that may otherwise go in vain. Checking these five points may be time-consuming, but in the end, it pays to be cautious.
Liked this article? Why not share the URL of this article with a friend and save their day?
SOFTWARE
GenVisR: A tool for genomic visualization
“This article introduces an R package to visualize genomic datasets. It is a suite of tools which makes the visualization and interpretation of genomic regions easier for researchers.”
The ever-increasing progress of sequencing techniques has produced a massive amount of genomic data [1]. This has led to an exponential growth of genomic datasets, which provide a huge amount of information to scientists. To identify patterns and investigate biological information, it is necessary to visualize genomes, but it is quite difficult to develop such tools.
GenVisR is a Bioconductor R package that provides a flexible, user-friendly suite of tools for easy visualization of genomic data. It allows users to visualize and interpret genomic data for multiple species in three categories: variants, copy number alterations, and data quality [2]. GenVisR is a compilation of various functions and tools developed for the easy visualization of genomic data.

1. Visualization of Variants
GenVisR provides several functions to analyze small variants within a genome, which need to be studied when investigating the genetic basis of a disease. The functions available in GenVisR to visualize small variants are:

a. lolliplot
It plots the mutations in a gene and gives precise control over the visualization options; for example, to visualize the protein domains, a user can opt for Ensembl annotation databases (Fig.1).

Fig.1 Output from lolliplot for selected TCGA breast cancer samples (Cancer Genome Atlas Network, 2012) shows two mutational hotspots in PIK3CA within the accessory and catalytic kinase domains [2].

b. waterfall
It tracks variant recurrence across multiple genes, displays all the mutations, and differentiates between variant types. The results are displayed by arranging the samples hierarchically such that the most frequently mutated genes are ranked first, and so on (Fig.2).

Fig.2 Output from waterfall showing mutations for five genes across 50 selected TCGA breast cancer samples with mutation type indicated by color in the grid and per sample/gene mutation rates indicated in the top and left sidebars [2].

c. TvTi
It is useful for finding the rates of transition and transversion mutations that have occurred in a set of genes.

2. Visualization of alterations in Copy Number
Copy number alterations within the genome are identified in various diseases [3]. GenVisR provides various functions to easily visualize copy number alterations.

a. genCov
It displays the amplifications and deletions within the genomic region of interest (Fig.3).

Fig.3 Output from genCov displaying coverage (bottom plots) showing focal deletions in sample A (last exon) and B (second intron) within a gene of interest. GC content (top plot) is encoded via a range of colors for each exon [2].

b. cnView
It allows the user to plot copy numbers in a broader view and shows an ideogram for an individual sample at the chromosome level.

c. cnSpec
It displays copy numbers on a larger scale than cnView. It shows a heat map arranged in a grid indexed by chromosomes and samples.

d. cnFreq
It displays the frequency of samples within the genomic dataset that have gained or lost copies at specific gene loci.

e. lohSpec
Loss of Heterozygosity (LOH) is important for studying genomic diseases. The function lohSpec displays all the LOH regions within the genomic dataset (Fig.4).

Fig.4 Output from lohSpec for HCC1395 (Griffith et al., 2015), HCC38 and HCC1143 (Daemen et al., 2013) breast cancer cell lines shows LOH events, across all chromosomes, shaded as dark blue.

3. Visualization of Data Quality
The quality assessment of sequencing data is of utmost importance for the proper interpretation of variants within the genome. GenVisR provides a few functions for the quality assessment of the data.

a. covBars
It displays the sequencing coverage achieved for the targeted bases (Fig.5).

Fig.5 Output from covBars shows cumulative coverage for 10 samples indicating that for each sample, at least 75% of targeted regions were covered at 35x depth [2].

b. compIdent
It helps to identify mix-ups among samples that are thought to originate from the same genome (Fig.6).

Fig.6 Output from compIdent for the HCC1395 breast cancer cell line (tumor and normal) shows variant coverage (bottom plot) and SNP allele fraction (main plot) indicating highly related samples [2].

Since GenVisR is an R package, a short R script is enough to run a particular function. By default it accepts a file format known as MAF (Mutation Annotation Format), which was first developed for The Cancer Genome Atlas project (Cancer Genome Atlas Research Network, 2008). For example, as illustrated by Z.L. Skidmore et al. (2016), Fig.2 was created by running the following script on a standard MAF file containing variant data and choosing which genes to plot [2]:

    # maf_file: the MAF data, loaded beforehand
    genes <- c("PIK3CA", "TP53", "USH2", "MLL3", "BRCA1")
    GenVisR::waterfall(x = maf_file, plotGenes = genes)
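To make the ideas behind waterfall and TvTi concrete, here is a minimal Python sketch of the underlying bookkeeping. It is not GenVisR code and does not reproduce its plots. It assumes a tab-separated MAF-like table with the standard MAF column names Hugo_Symbol, Tumor_Sample_Barcode, Reference_Allele and Tumor_Seq_Allele2; the six rows of data and the function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up MAF-like records; real MAF files carry many more columns.
HEADER = ("Hugo_Symbol", "Tumor_Sample_Barcode", "Variant_Classification",
          "Reference_Allele", "Tumor_Seq_Allele2")
MAF_ROWS = [
    ("PIK3CA", "S1", "Missense_Mutation", "A", "G"),
    ("PIK3CA", "S2", "Missense_Mutation", "G", "A"),
    ("TP53", "S1", "Nonsense_Mutation", "C", "T"),
    ("TP53", "S3", "Missense_Mutation", "C", "A"),
    ("PIK3CA", "S3", "Missense_Mutation", "A", "T"),
    ("BRCA1", "S2", "Frame_Shift_Del", "AG", "-"),
]
MAF_TEXT = "\n".join("\t".join(fields) for fields in [HEADER, *MAF_ROWS])

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def read_maf(text):
    """Parse tab-separated MAF-like text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def gene_recurrence(rows, genes):
    """Count mutated samples per gene, most recurrently mutated gene first."""
    samples = defaultdict(set)
    for row in rows:
        if row["Hugo_Symbol"] in genes:
            samples[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    counts = ((gene, len(ids)) for gene, ids in samples.items())
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def transition_transversion(rows):
    """Return (transitions, transversions), skipping anything that is not a SNV."""
    ti = tv = 0
    for row in rows:
        ref, alt = row["Reference_Allele"], row["Tumor_Seq_Allele2"]
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti, tv


rows = read_maf(MAF_TEXT)
print(gene_recurrence(rows, {"PIK3CA", "TP53", "BRCA1"}))
print(transition_transversion(rows))
```

The recurrence ordering corresponds to the gene ranking that a waterfall-style plot uses, and the two counters correspond to the quantities a transition/transversion plot summarizes.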
References:
1. Kodama,Y. et al. (2012) The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res., 40, D54–D56.
2. Zachary L. Skidmore, Alex H. Wagner, Robert Lesurf, Katie M. Campbell, Jason Kunisaki, Obi L. Griffith and Malachi Griffith. Bioinformatics, 2016, 1–3. doi: 10.1093/bioinformatics/btw325.
3. Beroukhim,R. et al. (2010) The landscape of somatic copy-number alteration across human cancers. Nature, 463, 899–905.
TOOLS
miRNAs and their Target Prediction Tools: An Overview
“miRNAs regulate their target mRNAs to make small adjustments to the levels of the corresponding proteins; consequently, dysregulation of miRNA function can result in human disease [1].”
miRNAs are small endogenous non-coding RNAs about 22 nucleotides long. They are expressed from long transcripts in animals, viruses, single-celled eukaryotes and plants [1]. miRNAs are implicated in various human diseases, most notably many types of cancer such as colon cancer [2], breast cancer [3], prostate cancer [4], lung cancer [5] and so on. miRNAs regulate their target mRNAs to make small adjustments to the levels of the corresponding proteins; consequently, dysregulation of miRNA function can result in human disease [1].

How are miRNAs produced?
miRNAs are initially produced in the nucleus as long primary transcripts (pri-miRNA), transcribed with the help of RNA polymerase II. They originate either from their own non-coding gene or from the intronic region of protein-coding genes [1]. The pri-miRNA thus formed folds into a hairpin. This hairpin is then processed by two members of the RNase III family of enzymes, Drosha and Dicer, which both play an important role in the formation of miRNA.
Drosha binds to DGCR8 to form a microprocessor complex in the nucleus, which cleaves the pri-miRNA into a fragment of ~70 nucleotides known as the miRNA precursor (pre-miRNA) [1]. The pre-miRNA is then transported to the cytoplasm by exportin-5. Dicer processes the pre-miRNA to produce the mature, double-stranded miRNA, which is ~20 nucleotides in length [1].

miRNA Target Prediction:
miRNA binds to the target mRNA through complementary base pairing, which may be complete or incomplete. It has been observed that miRNA generally binds to the 3'-UTR of the target mRNA with either of two binding patterns [6]. The two classes of binding patterns are:
The first class involves Watson-Crick complementarity between the target site and the 5'-end of the miRNA, which is known as the "seed region". miRNAs are able to suppress their targets with the help of seed regions [1].
The second class involves imperfect base pairing between the target site and the 5'-end of the miRNA.
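The first binding class (perfect Watson-Crick pairing at the seed region) can be illustrated with a short Python sketch. This is only an illustration, not the method of any tool discussed in this article. It assumes the seed is nucleotides 2 to 8 from the 5'-end of the miRNA (a common convention, but an assumption here), and both the miRNA and the 3'-UTR sequences are illustrative. The function names are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr, start=2, end=8):
    """Find perfect seed matches in a 3'-UTR.

    The seed is taken as miRNA positions start..end (1-based, inclusive);
    this window is an assumption, not something fixed by the article.
    Returns the seed, the site it pairs with, and 0-based UTR positions.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)  # what the seed pairs with on the mRNA
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return seed, site, positions


# Illustrative sequences (5'->3'); the UTR is made up.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAGCUACCUCAUUUGGCUACCUCAA"
print(seed_sites(mirna, utr))
```

A real predictor also has to handle the second binding class, where pairing at the 5'-end is imperfect, which is why exact string search alone is not sufficient.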
A single miRNA may bind to multiple target sites in a transcript, and several miRNAs can bind to the same transcript. Because the binding sites are short, statistical techniques are used to assess whether a predicted match is significant [7].

miRNA Target Prediction Tools:
As miRNAs are involved in various human diseases, there is a need to predict their targets so that the effects of miRNA dysregulation can be understood and controlled. Some of the most widely used miRNA target prediction programs are:
miRanda: It identifies miRNA targets in two steps. First, the miRNA is aligned against the 3'-UTR of the sequence; then, in the second step, the thermodynamic stability of the complex is calculated for the highest-scoring alignment and reported [8].
miRBase: It identifies miRNA binding sites by applying the miRanda algorithm [9]. It uses dynamic programming to identify the most complementary sites.
DIANA-microT 3.0: It is a scoring-based algorithm which scores the binding of the miRNA to the target sites of the transcript and then calculates the precision for each interaction [10].
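The dynamic-programming search for the most complementary site mentioned above can be sketched in Python as a local alignment that rewards base pairing instead of identity. This is a simplified illustration and not the actual miRanda implementation: the scores for pairs, mismatches and gaps are arbitrary assumptions, the function name is hypothetical, and the second step (thermodynamic stability of the duplex) is not modeled at all.

```python
# Assumed scores: Watson-Crick pairs score highest, G:U wobble pairs lower.
PAIR_SCORE = {
    ("A", "U"): 5, ("U", "A"): 5, ("G", "C"): 5, ("C", "G"): 5,
    ("G", "U"): 2, ("U", "G"): 2,
}
MISMATCH = -3  # assumed penalty for a non-pairing position
GAP = -4       # assumed linear gap penalty


def best_complementary_site(mirna, utr):
    """Local alignment of a miRNA against a UTR, scoring complementarity.

    Both sequences are given 5'->3'. The miRNA is reversed so that it runs
    antiparallel to the UTR, as it would in a duplex. Returns the best score
    and the 0-based exclusive end position of the site in the UTR.
    """
    query = mirna[::-1]
    rows, cols = len(query) + 1, len(utr) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = PAIR_SCORE.get((query[i - 1], utr[j - 1]), MISMATCH)
            score[i][j] = max(
                0,                           # start a new local alignment
                score[i - 1][j - 1] + pair,  # pair or mismatch
                score[i - 1][j] + GAP,       # gap in the UTR
                score[i][j - 1] + GAP,       # gap in the miRNA
            )
            if score[i][j] > best:
                best, best_end = score[i][j], j
    return best, best_end


# Made-up example: the UTR contains a perfect complement of the 8-nt miRNA.
score, end = best_complementary_site("UGAGGUAG", "GGCUACCUCAGG")
print(score, end)
```

In a two-step scheme like the one described for miRanda, only the highest-scoring sites from such an alignment would be passed on to the stability calculation.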
This article gives an overview of the miRNAs and their targets. I will try to give some more information about the target prediction and miRNA function annotation in my upcoming articles.
References:
1. Bing Liu, Jiuyong Li, and Murray J. Cairns. Identifying miRNAs, targets and functions. Briefings in Bioinformatics. page 119; doi:10.1093/bib/bbs075.
2. Akao Y, Nakagawa Y, Naoe T. MicroRNA-143 and -145 in colon cancer. DNA Cell Biol. 2007;26(5):311–20.
3. Iorio MV, Ferracin M, Liu C-G, et al. MicroRNA gene expression deregulation in human breast cancer. Cancer Res 2005;65(16):7065–70.
4. Porkka KP, Pfeiffer MJ, Waltering KK, et al. MicroRNA expression profiling in prostate cancer. Cancer Res 2007;67(13):6130–5.
5. Yanaihara N. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 2006;9(3):189–98.
6. Rajewsky N. microRNA target predictions in animals. Nat Genet 2006;38:S8–13.
7. Karlin S, Altschul SF. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc Natl Acad Sci USA 1990;87(6):2264–8.
8. John,B. et al. (2004) Human microRNA targets. PLoS Biol., 2, e363.
9. Griffiths-Jones,S. et al. (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res., 36, D154–D158.
10. Maragkakis,M. et al. (2009) Accurate microRNA target prediction correlates with protein repression levels. BMC Bioinformatics, 10, 295.
FEATURED
BiR: Impact report - July 2016 & More
As we improve the quality of our articles and near our first collective birthday, Bioinformatics Review has expanded its reach, opening new avenues. The progress is summarised below:
We have partnered with London Business Conference Group to be an official media partner for Discovery Informatics and Analytics Summit 2016.
Our scientific articles are starting to appear in Google Scholar.
Bioinformatics Review was visited 143,801 times in July 2016.
As for DIAS Summit 2016, we are giving away 15% discount coupons to our readers. To claim a coupon, please visit our Facebook page. Do share this page with your colleagues and refer them for a coupon.
Subscribe to Bioinformatics Review newsletter to get the latest post in your mailbox and never miss out on any of your favorite topics. Log on to https://www.bioinformaticsreview.com