BIOINFORMATICS REVIEW- OCTOBER 2016


Oct 2016 VOL 2 ISSUE 10

“Science has not yet taught us if madness is or is not the sublimity of the intelligence.” - Edgar Allan Poe

Methods to detect the effects of alternative splicing and transcription on proteins

Conventionally unconventional: Anecdote of small RNAs discoveries




Contents

October 2016

Topics

Editorial
Genomics: Conventionally unconventional: Anecdote of small RNAs discoveries
Software: Methods to detect the effects of alternative splicing and transcription on proteins
Algorithms: Common features used to develop miRNA target prediction tools


EDITOR
Dr. PRASHANT PANT

FOUNDER
TARIQ ABDULLAH

EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD ALTAF ABDUL KALAM MANISH KUMAR MISHRA SANJAY KUMAR PRAKASH JHA NABAJIT DAS

REPRINTS AND PERMISSIONS
You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include contact details in your message.

BACK ISSUE
Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for India delivery and $11 for international delivery, subject to availability. Pre-payment is required.

CONTACT
PHONE: +91. 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road Zakir Nagar, Okhla New Delhi IN 110025

STAFF ADDRESS
To contact any of the Bioinformatics Review staff members, simply format the address as firstname@bioinformaticsreview.com


PUBLICATION INFORMATION
Volume 1, Number 1, Bioinformatics Review™ is published quarterly for one year (4 issues) by Social and Educational Welfare Association (SEWA) Trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA Trust. Published in India.


EDITORIAL: Welcoming BiR in its 2nd year

EDITORIAL

Bioinformatics, one of the most promising fields in terms of future prospects, lacks one thing: a news source. Although many journals publish a large volume of quality research on topics such as genome analysis, algorithms, and sequence analysis, this work barely gets any notice in the popular press.

Dr. Prashant Pant

Editor

One reason behind this rather disturbing trend is that very few people can successfully read a research paper and turn it into a news story. Moreover, the bioinformatics community has not yet been introduced to research reporting. These factors are common to every relatively new (and rising) discipline such as bioinformatics. Although there are a number of science reporting websites and portals, very few accept entries from their audience, which is expected to have expertise in one field or another. Bioinformatics Review has been conceptualized to address all these concerns. We will provide insight into bioinformatics, both as an industry and as a research discipline. We will post new developments and the latest research in bioinformatics. We will also accept entries from our audience and, if possible, reward them. To create an ecosystem of bioinformatics research reporting, we will engage all kinds of people involved in bioinformatics: students, professors, instructors and industries. We will also provide a free job listing service for anyone who can benefit from it.

Letters and responses: info@bioinformaticsreview.com


GENOMICS

Conventionally unconventional: Anecdote of small RNAs discoveries

Image Credit: Stock Photos


The past decade has witnessed an incredible increase in the number of known small RNAs. As the name indicates, small RNAs are short RNA transcripts, approximately 21-24 nucleotides in length [1-8]. These transcripts regulate various biological processes, ranging from responses to biotic and abiotic stress to the determination of tissue specificity [18]. Non-coding RNAs are classified by their biogenesis pathway and mode of function.

Presently, two classes of non-coding RNAs are well characterized: micro RNA (miRNA) and small interfering RNA (siRNA). siRNAs are further categorized as trans-acting siRNAs (ta-siRNAs), natural antisense siRNAs (nat-siRNAs), repeat-associated siRNAs (ra-siRNAs) or heterochromatic siRNAs (hc-siRNAs), and long siRNAs (l-siRNAs) [8, 9]. Within the scope of this journal, I have attempted to consolidate the storyline of small RNA discovery. However, for detailed insight into the experimental setups, materials used, initial findings, and challenges faced, readers are encouraged to go through the original research work.

The story of small RNAs started in 1990, when Napoli et al. reported the occurrence of variegated or white flowers upon exogenous supplementation of the chalcone synthase gene in petunia petals [10]. Chalcone synthase is the key enzyme that catalyzes a pathway (the phenylpropanoid pathway) which leads to the production of anthocyanin in petunia petals. In their experiments, which were originally aimed at identifying the rate-limiting step of the pathway, the authors observed that transgenic plants (carrying a chalcone synthase overexpression cassette) did not show more intense petal color (purple in the wild type) but instead appeared white or variegated.

The molecular basis of this unconventional result was deciphered by Fire et al. in 1998 [11]. Fire and Mello were working on Caenorhabditis elegans to identify the regulatory activity of unc-22-encoding mRNA on ‘muscle twitching’. The unc-22 gene in C. elegans codes for a myofilament, and low levels of unc-22 protein cause a characteristic muscle twitching phenotype. From multiple experiments, the authors were able to report that an exogenous supply of double-stranded RNA (dsRNA) can lead to silencing of the endogenous gene. This unusual silencing was termed RNA interference (RNAi). Today RNAi is a potent tool widely used in functional genomics and biotechnology for targeted gene silencing. It should be noted that siRNAs (small interfering RNAs) are the effector molecules of RNAi.

With advances in time and technology, it was realized that several other categories of small RNAs cause homology-dependent silencing of genes. Micro RNA (miRNA) is one such class. miRNA was first discovered by Lee et al. in 1993 [12]. The first miRNA discovered was lin-4, identified in C. elegans. This small RNA inhibits lin-14 mRNA. The second miRNA discovered was let-7, again in C. elegans [12]. Both of these miRNAs regulate developmental transitions. After these discoveries, several small RNAs were identified from different domains of life using direct cloning methods and characterized to play significant roles [1-7]. Today, researchers apply next-generation sequencing technologies to perform wide-scale sequencing of these small RNAs [1-7].

In the past half-decade, bioinformatics-based discovery of small RNAs, particularly miRNAs, has gained immense popularity. Researchers prefer this method for its high throughput, speed, and low cost. miRNAs are evolutionarily conserved among related species and hence share sequence and structural homologies. These conserved properties (conservation across species, the secondary structure of the precursor RNA, and the miRNA/miRNA* position in the precursor) can be readily picked up by a computational tool. However, identification of novel (so far unknown) small RNAs is still a computational challenge. One drawback of computational discovery is that many non-small RNA sequences are identified along with small RNAs; another is that the method cannot be applied to organisms whose genome has not been sequenced. miRBase, a central repository of miRNAs, holds a record of an astonishing 28,645 miRNAs (miRBase v21, http://www.mirbase.org/).

It is now accepted beyond doubt that the expression of many genes is governed by small RNA species. Hence, it is sensible to understand the conventionally unconventional mechanisms of gene regulation pertaining to plant growth, development, and stress tolerance, and to use them for improving useful agronomic traits in crops.

Authors: Ankur Bhardwaj, Ritu Pandey, G. Joshi

References

1. Bhardwaj AR, Joshi G, Kukreja B, Malik V, Arora P, Pandey R, Shukla RN, Bankar KG, Katiyar-Agarwal S, Goel S, Jagannath A, Kumar A and Agarwal M. Global insights into high temperature and drought stress regulated genes by RNA-Seq in economically important oilseed crop Brassica juncea. (2015) BMC Plant Biology 15(9), DOI 10.1186/s12870-014-0405-1.

2. Bhardwaj AR, Joshi G, Pandey R, Goel S, Jagannath A, Kumar A, Katiyar-Agarwal S and Agarwal M. A genome-wide perspective of miRNAome in response to high temperature, salinity and drought stresses in Brassica juncea (Czern) L. (2014) PLoS ONE 9(3): e92456.

3. Bhardwaj AR, Pandey R, Agarwal M, Katiyar-Agarwal S. Northern Blot Analysis for Expression Profiling of mRNAs and Small RNAs. (2012) In: Jin H. and Gassmann W. (Eds.) RNA Abundance Analysis: Methods and Protocols, Methods in Molecular Biology, vol. 883, DOI 10.1007/978-1-61779-839-9_2, Springer Science+Business Media New York.

4. Kohli D, Joshi D, Deokar AA, Bhardwaj AR, Agarwal M, Katiyar-Agarwal S, Srinivasan R, Jain PK. Identification and characterization of wilt and salt stress-responsive microRNAs from chickpea by high-throughput sequencing. (2014) PLoS ONE 9(10): e108851. ISSN: 1932-6203.

5. Katiyar-Agarwal S, Jin H. Role of small RNAs in host-microbe interactions. (2010) Annual Review of Phytopathology 48: 225-246.

6. Lakhotia N, Joshi G, Bhardwaj AR, Katiyar-Agarwal S, Agarwal M and Kumar A. Identification and characterization of miRNAome in root, stem, leaf and tuber developmental stages of potato (Solanum tuberosum) by high-throughput sequencing. (2014) BMC Plant Biology 14: 6.

7. Pandey R, Joshi G, Bhardwaj AR, Agarwal M, Katiyar-Agarwal S. A comprehensive study on identification and expression profiling of microRNAs in Triticum aestivum during abiotic stress and development. (2014) PLoS ONE 9(4): e95800.

8. Sunkar R, Chinnusamy V, Zhu J, Zhu J-K. Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. (2007) Trends in Plant Science 12: 301-309.

9. Lelandais-Brière C, Sorin C, Declerck M, Benslimane A, Crespi M, Hartmann C. Small RNA diversity in plants and its impact in development. (2010) Current Genomics 11: 14.

10. Napoli C, Lemieux C, Jorgensen R. Introduction of a chimeric chalcone synthase gene into petunia results in reversible cosuppression of homologous genes in trans. (1990) The Plant Cell Online 2: 279-289.

11. Fire A, Xu SQ, Montgomery MK, Kostas SA, Driver SE and Mello CC. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. (1998) Nature 391: 806-811.

12. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. (1993) Cell 75: 843-854.



SOFTWARE

Methods to detect the effects of alternative splicing and transcription on proteins

Image Credit: Google Images


Alternative splicing and transcription are among the most familiar biological processes. Alternative splicing is a process by which various forms of mRNA are generated from the same gene. A gene consists of various exons and introns, and the exons can be joined together in different ways [1]. The same gene therefore yields different forms of mRNA, known as "transcript variants", "splice variants" or "isoforms", which lead to the production of different kinds of proteins (Fig. 1) [1].

With the help of web tools, conserved regions in the proteins encoded by different splice variants can be easily identified [8], but mapping these regions back onto the gene is tedious and error-prone [2].

Fig. 1 Alternative splicing [1].

The proteins produced after alternative splicing are affected in different ways, because the transcript variants encode proteins with different amino acid sequences and hence different functions [2]. BLOCKS [3], TMHMM [4] and InterPro [5] are the most commonly used databases for detecting protein annotations in human and mouse proteins [6,7].
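The mapping step mentioned above, going from a region of a protein back to positions on the gene, can be sketched in a few lines. This is only an illustration of the coordinate arithmetic, not how ProtAnnot or any of the cited tools implement it. The function name `protein_region_to_genome`, the 0-based half-open interval convention, and the restriction to a plus-strand transcript whose coding exons are already known are all assumptions made for the example.

```python
def protein_region_to_genome(cds_exons, aa_start, aa_end):
    """Map a protein region to the genomic intervals that encode it.

    cds_exons: coding parts of the exons as (start, end) genomic intervals,
               0-based half-open, in transcription order (plus strand assumed).
    aa_start, aa_end: first and last residue of the region, 1-based inclusive.
    """
    nt_start = (aa_start - 1) * 3   # first coding nucleotide of the region
    nt_end = aa_end * 3             # one past the last coding nucleotide
    intervals = []
    offset = 0                      # coding nucleotides consumed by earlier exons
    for ex_start, ex_end in cds_exons:
        ex_len = ex_end - ex_start
        # Overlap between the region and this exon, in coding coordinates
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + ex_len)
        if lo < hi:
            intervals.append((ex_start + lo - offset, ex_start + hi - offset))
        offset += ex_len
    return intervals


# Hypothetical transcript with three coding exons
cds = [(100, 130), (200, 260), (300, 330)]
print(protein_region_to_genome(cds, 8, 14))  # [(121, 130), (200, 212)]
```

A protein region that spans an exon boundary comes back as more than one genomic interval, which is exactly the situation that makes mapping by hand tedious when several splice variants have to be compared.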

To address these problems, Mall et al. (2016) developed new software, "ProtAnnot", as a plug-in for the IGB (Integrated Genome Browser) [2]. IGB is a user-friendly genome browser that helps users analyze genomic data and RNA-seq data [9]. ProtAnnot provides deep insight into how transcription and alternative splicing affect a protein and its function [2].



ProtAnnot provides a fast and efficient way to visualize the impact of alternative transcription on proteins. It displays transcript structures as linked blocks, in which the thicker part of a block represents the translated region [2].

Fig. 2 ProtAnnot visualization of Arabidopsis thaliana gene AT4G36690 encoding splicing regulator U2AF65 [2].

Advantages of ProtAnnot:

1. It uses a color scheme to show the frame of translation. Comparing exon colors between transcripts helps the user easily determine whether they encode the same protein [2].

2. It provides an exon summary that helps the user easily identify different regions, such as sequences that are included due to alternative splicing, promoters, or 3'-end processing [2,10].

3. It displays protein annotations next to their corresponding transcripts, which helps the user identify how different regions of a gene may encode different functions (Fig. 2) [2], thereby linking the function of the alternatively transcribed protein to the respective gene.

4. It allows saving the search results for later use.

References

1. https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/alternative_splicing.html

2. Tarun Mall, John Eckstein, David Norris, Hiral Vora, Nowlan H. Freese and Ann E. Loraine. ProtAnnot: an app for Integrated Genome Browser to display how alternative splicing and transcription affect proteins. Bioinformatics, 32(16), 2016, 2499–2501. doi: 10.1093/bioinformatics/btw068

3. Shmuel Pietrokovski, Jorja G. Henikoff and Steven Henikoff. The Blocks Database - A System for Protein Classification. Nucl. Acids Res. (1996) 24(1): 197-200. doi: 10.1093/nar/24.1.197

4. http://www.cbs.dtu.dk/services/TMHMM/

5. Alex Mitchell, Hsin-Yu Chang, Louise Daugherty, Matthew Fraser, Sarah Hunter, Rodrigo Lopez, Craig McAnulla, Conor McMenamin, Gift Nuka, Sebastien Pesseat, Amaia Sangrador-Vegas, Maxim Scheremetjew, Claudia Rato, Siew-Yit Yong, Alex Bateman, Marco Punta, Teresa K. Attwood, Christian J.A. Sigrist, Nicole Redaschi, Catherine Rivoire, Ioannis Xenarios, Daniel Kahn, Dominique Guyot, Peer Bork, Ivica Letunic, Julian Gough, Matt Oates, Daniel Haft, Hongzhan Huang, Darren A. Natale, Cathy H. Wu, Christine Orengo, Ian Sillitoe, Huaiyu Mi, Paul D. Thomas and Robert D. Finn (2015). The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research, Jan 2015; doi: 10.1093/nar/gku1243

6. Cline, M.S. et al. (2004) The effects of alternative splicing on transmembrane proteins in the mouse genome. Pac. Symp. Biocomput., 17–28.

7. Loraine, A.E. et al. (2013) RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant Physiol., 162, 1092–1109.

8. Rodriguez, J.M. et al. (2015) APPRIS WebServer and WebServices. Nucleic Acids Res., 43, W455–W459.

9. Nicol, J.W. et al. (2009) The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics, 25, 2730–2731.

10. English, A.C. et al. (2010) Prevalence of alternative splicing choices in Arabidopsis thaliana. BMC Plant Biol., 10, 102.



ALGORITHMS

Common features used to develop miRNA target prediction tools

Image Credit: Google Images


In previous articles, we discussed miRNAs, their formation, and their functions (miRNA targets and their functions) and gave an overview of some of the available miRNA target prediction tools (miRNAs and their Target Prediction Tools: An Overview). In this article, I have tried to give basic information about the common features on which most miRNA target prediction algorithms and tools are built. miRNAs are short endogenous RNAs, ~22 nucleotides long, that originate from non-coding RNAs [1]. miRNAs are expressed from long transcripts produced in viruses, plants, animals and single-celled eukaryotes [2].

miRNA targets are complementary sequences in mRNA, usually located in the 3’-UTR. Binding of a miRNA to its target inhibits translation or induces degradation of the target, preventing protein synthesis. Computational methods are used to identify how miRNAs specifically target mRNAs. Most miRNA target prediction tools are built on a few common features, which are described below [3]:

1. Seed match

The region spanning nucleotides 2 to 8 of the miRNA, counted from the 5’-end toward the 3’-end, is called the ‘seed sequence’ [4]. In most tools, a seed match is a Watson-Crick pairing between the miRNA seed sequence and the mRNA, and a perfect seed match contains no gaps [3]. Several kinds of seed match are used, depending on the algorithm; the most common are as follows [5-7]:

   1. 6-mer: a perfect seed match for six nucleotides between the miRNA seed and the mRNA.
   2. 7-mer-m8: a perfect seed match for nucleotides 2-8 of the miRNA seed sequence.
   3. 7-mer-A1: a perfect seed match for nucleotides 2-7 of the miRNA seed sequence, in addition to an A across from the first nucleotide of the miRNA.
   4. 8-mer: a perfect seed match for nucleotides 2-8 of the miRNA seed sequence, in addition to an A across from the first nucleotide of the miRNA.

2. Conservation

Conservation is the maintenance of a sequence across species. This feature analyzes regions such as the miRNA, the 3’-UTR and the 5’-UTR [3]. The seed region has been found to be more conserved than other regions [5]. In a small fraction of miRNA-target interactions, conserved pairing compensates for a mismatch in the seed; such sites are known as ‘3’-compensatory sites’ [8]. Conservation analysis helps to predict whether a predicted miRNA target is functional or not [3].

3. Free energy

This is the Gibbs free energy, which is calculated to predict the stability of a structure. It is expressed as the change in free energy (ΔG). When a miRNA binds to a target mRNA and the result is a stable structure, that mRNA is considered a likely target of the miRNA. A more negative ΔG means the resulting structure has less energy available to react further and is therefore more stable. The hybridization of a miRNA with its target mRNA provides information about regions of high and low free energy, and ΔG predicts the strength of binding between the miRNA and its target mRNA [9].

4. Site accessibility

Site accessibility describes how easily a miRNA can locate its target mRNA and hybridize with it. The hybridization of a miRNA with its target mRNA involves two steps:

   1. The miRNA binds to a short accessible region of the mRNA.
   2. The mRNA secondary structure then unfolds as the miRNA completes binding to the target [10].

Hence, to find the most probable target of a miRNA, the amount of energy required to make a site accessible is evaluated.

A few other features are also used in the algorithms of most target prediction tools:

1. Target-site abundance: the number of target sites occurring in the 3’-UTR [11].
2. Local AU content: the concentration of A and U nucleotides flanking the corresponding seed region [6,10].
3. GU wobble seed match: accounts for a G pairing with a U instead of a C within the seed match [12].
4. 3’-compensatory pairing: base pairing between the target and miRNA nucleotides 12-17.
5. Seed pairing stability: the free energy change calculated for a predicted duplex [11].
6. Position contribution: the position of a target sequence within the mRNA [13].

These are the common features used to develop miRNA target prediction algorithms and tools. In my upcoming article, I will try to explain some new features which have been developed recently to predict miRNA targets more efficiently.
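As a rough illustration of the seed match classes described in this article (6-mer, 7-mer-A1, 7-mer-m8 and 8-mer), the sketch below classifies a single 8-nucleotide window of an mRNA against a miRNA. It is a minimal example under stated assumptions, not the algorithm of any particular prediction tool: the function names are made up for the example, only Watson-Crick pairs are accepted (no GU wobble), and the 6-mer is taken to mean pairing with miRNA nucleotides 2-7. The miRNA sequence in the usage line is only an illustrative input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (Watson-Crick only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_site(mirna, site):
    """Classify an 8-nt mRNA window against a miRNA; both are written 5'->3'.

    Because the two strands pair antiparallel, site[0] lies across from
    miRNA nucleotide 8 and site[7] lies across from miRNA nucleotide 1.
    Returns '8mer', '7mer-m8', '7mer-A1', '6mer', or None if there is no match.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    if len(mirna) < 8 or len(site) != 8:
        raise ValueError("need a miRNA of at least 8 nt and an 8-nt site")

    # Core requirement: perfect pairing with miRNA nucleotides 2-7
    if site[1:7] != reverse_complement(mirna[1:7]):
        return None

    pairs_nt8 = site[0] == COMPLEMENT[mirna[7]]   # extra pair at nucleotide 8
    a_across_nt1 = site[7] == "A"                 # A across from nucleotide 1

    if pairs_nt8 and a_across_nt1:
        return "8mer"
    if pairs_nt8:
        return "7mer-m8"
    if a_across_nt1:
        return "7mer-A1"
    return "6mer"


print(classify_seed_site("UGAGGUAGUAGGUUGUAUAGUU", "CUACCUCA"))  # 8mer
```

To scan a whole 3’-UTR, the same function could be applied to every 8-nucleotide window of the sequence; counting the hits would then give a simple version of the target-site abundance feature.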

References

1. Bartel D. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004;116:281–97.

2. Bing Liu, Jiuyong Li, and Murray J. Cairns. Identifying miRNAs, targets and functions. Briefings in Bioinformatics. page 1-19; doi: 10.1093/bib/bbs075.

3. Sarah M. Peterson, Jeffrey A. Thompson, Melanie L. Ufkin, Pradeep Sathyanarayana, Lucy Liaw, and Clare Bates Congdon. Common features of miRNA prediction tools. Frontiers in Genetics. doi: 10.3389/fgene.2014.00023.

4. Lewis, B. P., Burge, C. B., and Bartel, D. P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20. doi: 10.1016/j.cell.2004.12.035.

5. Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., and Burge, C. B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787–798. doi: 10.1016/S0092-8674(03)01018-3

6. Brennecke, J., Stark, A., Russell, R. B., and Cohen, S. M. (2005). Principles of microRNA-target recognition. PLoS Biol. 3:e85. doi: 10.1371/journal.pbio.0030085.

7. Krek, A., Grun, D., Poy, M. N., Wolf, R., Rosenberg, L., Epstein, E. J., et al. (2005). Combinatorial microRNA target predictions. Nat. Genet. 37, 495–500. doi: 10.1038/ng1536.

8. Friedman, R. C., Farh, K. K., Burge, C. B., and Bartel, D. P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92–105. doi: 10.1101/gr.082701.108.

9. Yue, D., Liu, H., and Huang, Y. (2009). Survey of computational algorithms for MicroRNA target prediction. Curr. Genomics 10, 478–492. doi: 10.2174/138920209789208219.

10. Long, D., Lee, R., Williams, P., Chan, C. Y., Ambros, V., and Ding, Y. (2007). Potent effect of target structure on microRNA function. Nat. Struct. Mol. Biol. 14, 287–294. doi: 10.1038/nsmb1226

11. Garcia, D. M., Baek, D., Shin, C., Bell, G. W., Grimson, A., and Bartel, D. P. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat. Struct. Mol. Biol. 18, 1139–1146. doi: 10.1038/nsmb.2115

12. Doench, J. G., and Sharp, P. A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev. 18, 504–511. doi: 10.1101/gad.1184404

13. Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91–105. doi: 10.1016/j.molcel.2007.06.017



