JULY 2017 VOL 3 ISSUE 7
“Scientists have become the bearers of the torch of discovery in our quest for knowledge.” -
Stephen Hawking
Role of Information Theory, Chaos Theory, and Linear Algebra and Statistics in the development of alignment-free sequence analysis
Structural analysis of FOXL2 gene and its role in kidney failure
Public Service Ad sponsored by IQLBioinformatics
Contents
July 2017
Topics
Editorial .... 05
Algorithms
Role of Information Theory, Chaos Theory, and Linear Algebra and Statistics in the development of alignment-free sequence analysis .... 07
Structural Bioinformatics
Structural analysis of FOXL2 gene and its role in kidney failure .... 16
Cloud Computing
SparkBLAST: Introduction .... 13
FOUNDER: TARIQ ABDULLAH
EDITORIAL
EXECUTIVE EDITOR: TARIQ ABDULLAH
FOUNDING EDITOR: MUNIBA FAIZA
SECTION EDITORS: FOZAIL AHMAD, ALTAF ABDUL KALAM, MANISH KUMAR MISHRA, SANJAY KUMAR, NABAJIT DAS
REPRINTS AND PERMISSIONS: You must have permission before reproducing any material from Bioinformatics Review. Send e-mail requests to info@bioinformaticsreview.com. Please include your contact details in your message.
BACK ISSUES: Bioinformatics Review back issues can be downloaded in digital format from bioinformaticsreview.com at $5 per issue. Back issues in print format cost $2 for delivery in India and $11 for international delivery, subject to availability. Pre-payment is required.
CONTACT
PHONE: +91 991 1942-428 / 852 7572-667
MAIL: Editorial: 101 FF Main Road, Zakir Nagar, Okhla, New Delhi IN 110025
STAFF ADDRESS: To contact any Bioinformatics Review staff member, simply format the address as firstname@bioinformaticsreview.com
PUBLICATION INFORMATION: Volume 1, Number 1, Bioinformatics Review™ is published monthly for one year (12 issues) by Social and Educational Welfare Association (SEWA) trust (Registered under Trust Act 1882). Copyright 2015 Sewa Trust. All rights reserved. Bioinformatics Review is a trademark of Idea Quotient Labs and used under license by SEWA trust. Published in India.
EDITORIAL
Bioinformatics Review (BiR): Bridging Between The Two Worlds
Informatics and biology are two sciences that are as different from each other as possible: one runs on the core concept of variation and the other on strict reasoning. Yet the two have combined in a most natural way under the realm of “bioinformatics”. For a biologist today, it is difficult to imagine a world without biological databases, or without a discipline to decipher the huge enigma they present. The Bioinformatics Review (BiR) journal is a platform to discover the latest happenings in this melting pot of two varied fields.
Dr. Roopam Sharma
Honorary Editor
The era of “omics” kick-started with the completion of the Human Genome Project (HGP) in 2003. Since then, a number of technological advances, especially next-generation sequencing (NGS), have been generating mind-boggling amounts of data for the knowledge banks. Recent innovations, such as single-cell transcriptomics and the metagenomics of the most unusual habitats, show how technological advances translate directly into breakthroughs in the biological sciences. Among the various areas of biology that have benefited from these advances is pathology. In fact, deciphering the molecular and genetic basis of human diseases was the guiding force behind the Human Genome Project. Bioinformatics has led to an impressive increase in the identification of possible pathogenic factors across diverse systems, so much so that new techniques are being devised to speed up the testing of these factors in the wet lab. From a computational point of view, the smaller but ever-changing genomes and transcriptomes of these pathogens make them well suited for testing many hypotheses in bioinformatics studies. Effector bioinformatics involves building custom pipelines for distinct species based on the characteristics of the effectors and the size of the genome involved. These pipelines can be based on homology, feature extraction, or both; for example, the discovery of RXLR motifs in oomycete effectors allowed many more effectors to be identified. This collaboration between the two sciences in plant pathology has led to the development of many general-use platforms, such as the Broad Fungal Genome Initiative, EuPathDB, and PhytoPath, but there is still a strong need for dedicated resources, such as PHI-base, for specific areas like effector biology. The use of machine-learning techniques, such as the artificial neural network approach (which is itself modelled on biological neural networks), shows how the two branches are so distinct yet so intertwined. All in all, it is a brave new world in which artificial communication is not only stimulating but also helping us understand the communication between host and pathogen going on within the realm of life. In this issue, BiR focuses on reviews of some of the very basic techniques used in computational biology and their applications in various biological studies. We look forward to continued support from our readers and contributors. For suggestions and feedback, do write to us at info@bioinformaticsreview.com
PHYLOGENETICS
Role of Information Theory, Chaos Theory, and Linear Algebra and Statistics in the development of alignment-free sequence analysis Image Credit: Google Images
“The limitations that led to the development of algorithms for alignment-free sequence analysis are: 1) alignment handles sequence divergence incompletely, because it assumes that contiguity is conserved between homologous segments; 2) searching large databases becomes infeasible, because the computational load grows as a power function of sequence length; and 3) heuristic solutions make it harder to assess the statistical relevance of the resulting scores, which compromises the establishment of confidence intervals for homology.”
Sequence alignment is customarily used not only to find similar regions between a pair of sequences but also to study the structural, functional, and evolutionary relationships between organisms. Many scoring schemes and tools have been developed for aligning pairs of sequences, separately for nucleotide and amino acid sequences; the BLOSUM and PAM substitution matrices for amino acid sequences [1] are two well-known examples. Pairwise alignment compares two sequences, whereas multiple sequence alignment is used to align three or more sequences and is considered the first step in phylogenetic studies. Progressive alignment is the basis of various multiple alignment tools. Benchmark datasets are used to validate alignments. One resource that has been used extensively for this purpose is BAliBASE [2], a database of refined, high-quality, documented multiple sequence alignments that helps to identify the strong and weak points of the numerous alignment programs; IRMBase [3], SABmark [4], and OXBench [5] are other examples. The limitations that led to the development of algorithms for alignment-free sequence analysis are: 1) alignment handles sequence divergence incompletely, because it assumes that contiguity is conserved between homologous segments; 2) searching large databases becomes infeasible, because the computational load grows as a power function of sequence length; and 3) heuristic solutions make it harder to assess the statistical relevance of the resulting scores, which compromises the establishment of confidence intervals for homology. Various methods drawn from physics and mathematical modeling, such as information theory, chaos theory, linear algebra, and statistics, have been used to overcome these limitations [5,6,7,8,9,10,11]. As sequence data are increasing exponentially, it is infeasible to use alignment-based methods, particularly for distantly related sequences. Applications of alignment-free methods in fields such as phylogenomics, NGS, epigenomics, and SNP discovery are discussed briefly in this article. Pratas et al. (2015) [12] described an alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. Chan and Ragan (2013) discussed k-mer-based methods [13]. Gardner and Hall (2013) [14] described kSNP v2, a software package for alignment-free SNP discovery and phylogenetics. MAUVE, Cinteny, Apollo, and MizBee are a few examples of tools used for visualization [15].
Some other well-known websites and software tools developed for alignment-free methods are kmacs [16], Spaced words [17], and rush [18].
Linear Algebra & Statistics
Linear algebra techniques have been used so that sequences can be treated as objects, i.e., vectors [10]. By scaling up the vectors, N sequence combinations can be derived. Euclidean distance has been used to measure the difference between sequences with complete independence from the contiguity of conserved segments, and the same approach has been used to calculate the correlation among sequences. To account for covariance, another metric, the Mahalanobis distance, is used; it measures the distance between a point and a distribution while taking into account that the frequencies of words sharing a prefix or suffix are correlated. The statistical significance of a sequence comparison is assessed with the chi-square test [10]. Many methods are based on k-mer (word) frequency; the feature frequency profile (FFP), composition vector (CV), and return time distribution (RTD) are a few examples [18,19,20,21]. The FFP method works by counting each possible k-mer in a sequence [18,19]. A k-mer is a word of k consecutive units of information, in this case nucleotides or amino acids, present in the sequence. The count of each k-mer in a sequence is then normalized by dividing it by the total count of all k-mers in that sequence, which converts each sequence into its feature frequency profile. The Jensen-Shannon divergence (JSD), also called the information radius (IRad) or total divergence to the average, measures the similarity between two probability distributions [22]. It is defined as:
$$\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D(P \,\|\, M) + \tfrac{1}{2} D(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q)$$
where $D$ is the Kullback-Leibler divergence. This divergence is used to calculate the pairwise distance between two sequences, and the resulting distance matrix can then be used to construct a phylogenetic tree with clustering algorithms. In the composition vector method, the frequency of appearance of each possible k-mer in the sequence is calculated. A Markov model is used to reduce the influence of random neutral mutations and so highlight the role of selective evolution. The composition vector (CV) of a given sequence is then formed by normalizing the frequencies and
putting them in a fixed order. The pairwise distance between the CVs of the given sequences is then computed using the cosine distance. Cosine similarity measures the cosine of the angle between two non-zero vectors of an inner product space; the corresponding angular distance is:
$$d = \frac{\cos^{-1}(\text{similarity})}{\pi}, \qquad \text{similarity} = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$$
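The FFP, JSD, and cosine-distance steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited papers: it assumes uppercase DNA over A, C, G, T, counts overlapping k-mers, and reuses the FFP vector as a stand-in for the composition vector, i.e., it omits the CV method's Markov-model correction. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA sequences over A, C, G, T only


def ffp(seq, k):
    """Feature frequency profile: the count of every possible k-mer divided by
    the total k-mer count, listed in a fixed (lexicographic) order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    words = ("".join(w) for w in product(ALPHABET, repeat=k))
    return [counts[w] / total for w in words]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in bits; terms with p_i = 0 add nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: 1/2 D(P||M) + 1/2 D(Q||M), with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def angular_distance(u, v):
    """cos^-1(cosine similarity) / pi for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    similarity = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.acos(similarity) / math.pi


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    seqs = {"s1": "ACGTACGTGACG", "s2": "ACGTTCGTGACC", "s3": "TTTTGGGGCCCC"}
    profiles = {name: ffp(s, 2) for name, s in seqs.items()}
    names = sorted(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b,
                  round(jsd(profiles[a], profiles[b]), 3),
                  round(angular_distance(profiles[a], profiles[b]), 3))
```

Each printed pair fills one cell of a distance matrix; in the workflow described in the text, that matrix would then be passed to a clustering algorithm to build the tree.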
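The angular distance $\cos^{-1}(\text{similarity})/\pi$ lies between 0 and 1 for vectors whose elements may be positive or negative; when all vector elements are non-negative, as with frequency profiles, the angle cannot exceed $\pi/2$, and the distance is often rescaled as $2\cos^{-1}(\text{similarity})/\pi$. The resulting matrix is used to construct a tree using clustering algorithms [23,24]. Instead of counting k-mers as the previously discussed methods do, the RTD method computes the return time of each k-mer, i.e., how far along the sequence it takes for the k-mer to reappear. These return times are summarized using two statistical parameters, the mean and the standard deviation. The pairwise distance is calculated using the Euclidean distance, and the last step is the construction of a tree using clustering algorithms.
Information Theory
Information is a vital part of all forms of communication and is measured in bits [5]. The equation devised to quantify the transmission of data over a channel has been used to calculate the probabilities of different outcomes in sequence comparison. The entropy of the distribution of words of length $L$ in a sequence $X$ is:
$$H(W_{X,L}) = -\sum_{i=1}^{K} p_{X,L,i} \log_2 p_{X,L,i}$$
where $p_{X,L,i}$ is the frequency of the $i$-th of the $K$ possible words of length $L$ in $X$ (for DNA, $K = 4^L$).
In simple terms, the longer the shortest description needed to reproduce a sequence, the more complex the sequence is considered to be. Kolmogorov complexity, defined as the length of the shortest program that reconstructs a given sequence, is based on this idea [9].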
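The word entropy $H(W_{X,L})$ and the RTD features described in this section can both be computed directly from k-mer positions. The sketch below is illustrative only: it assumes DNA over A, C, G, T, measures a return time as the difference between the start positions of successive occurrences of the same k-mer, and assigns a mean and standard deviation of zero to k-mers that occur fewer than twice. These conventions and the function names are assumptions, not the exact definitions used in the RTD papers cited above.

```python
import math
import statistics
from collections import Counter, defaultdict
from itertools import product


def word_entropy(seq, L):
    """Shannon entropy H(W_{X,L}) in bits of the length-L word distribution of seq.
    Words that never occur have p = 0 and contribute nothing, so summing over
    the observed words is equivalent to summing over all K possible words."""
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rtd_vector(seq, k, alphabet="ACGT"):
    """For every possible k-mer (fixed lexicographic order), the mean and
    population standard deviation of its return times in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    features = []
    for word in ("".join(t) for t in product(alphabet, repeat=k)):
        pos = positions.get(word, [])
        gaps = [b - a for a, b in zip(pos, pos[1:])]  # return times
        if gaps:
            features.extend([statistics.mean(gaps), statistics.pstdev(gaps)])
        else:  # assumption: k-mers seen fewer than twice contribute zeros
            features.extend([0.0, 0.0])
    return features


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    print(word_entropy("ACGTACGT", 1))  # 2.0: four equally frequent bases
    a = rtd_vector("ACGTACGTGACG", 1)
    b = rtd_vector("ACGTTCGTGACC", 1)
    print(euclidean(a, b))
```

A sequence in which all four bases are equally frequent reaches the maximum single-base entropy of 2 bits, while a homopolymer has entropy 0.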
Global and local characterization of DNA, RNA, and proteins, from the estimation of genome entropy to motif and region classification, are some of the existing applications of information theory in building alignment-free methods. Some of the information-theoretic principles that have been used in developing software are described below.
a) Base-base correlation (BBC): converts the genome sequence into a 16-dimensional numeric vector using the following equation:
$$T_{ij}(K) = \sum_{l=1}^{K} P_{ij}(l) \log_2 \frac{P_{ij}(l)}{P_i P_j}$$
where $P_i$ and $P_j$ denote the probabilities of bases $i$ and $j$ in the genome, $P_{ij}(l)$ denotes the probability of bases $i$ and $j$ occurring at a distance $l$ in the genome, and the parameter $K$ indicates the maximum distance between bases $i$ and $j$. The variation in the values of the 16 parameters reflects variation in genome content and length [25,26,27].
b) Information correlation (IC) and partial information correlation (PIC): this method employs the base correlation property of DNA sequences. IC and PIC are calculated using the following formulas:
$$IC_l = -2 \sum_{i} P_i \log_2 P_i + \sum_{i,j} P_{ij}(l) \log_2 P_{ij}(l)$$
$$PIC_{ij}(l) = \left[ P_{ij}(l) - P_i P_j \right]^2$$
and the feature vector $V$ is built from the $IC_l$ and $PIC_{ij}(l)$ values for $l \in \{l_0, l_0+1, \ldots, l_0+n\}$.
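Both measures start from the single-base probabilities $P_i$ and the pair probabilities $P_{ij}(l)$. The minimal Python sketch below computes the 16-dimensional $T_{ij}(K)$ vector of item a) and the $IC_l$ and $PIC_{ij}(l)$ values of item b) as written above. It assumes an uppercase A/C/G/T sequence, estimates $P_{ij}(l)$ from all position pairs $(t, t + l)$, and skips zero-probability terms; it is an illustration, not the published implementation, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumption: uppercase DNA without ambiguity codes


def base_probs(seq):
    """P_i: relative frequency of each base in the sequence."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in BASES}


def pair_probs(seq, l):
    """P_ij(l): probability that base i at position t is followed by base j at t + l."""
    pairs = Counter((seq[t], seq[t + l]) for t in range(len(seq) - l))
    total = sum(pairs.values())
    return {(i, j): pairs[(i, j)] / total for i, j in product(BASES, repeat=2)}


def bbc_vector(seq, K):
    """16-dimensional base-base correlation vector T_ij(K), summed over l = 1..K."""
    p = base_probs(seq)
    per_distance = [pair_probs(seq, l) for l in range(1, K + 1)]
    vector = []
    for i, j in product(BASES, repeat=2):
        t_ij = 0.0
        for pij in per_distance:
            if pij[(i, j)] > 0:  # P_ij(l) > 0 implies P_i, P_j > 0: no division by zero
                t_ij += pij[(i, j)] * math.log2(pij[(i, j)] / (p[i] * p[j]))
        vector.append(t_ij)
    return vector


def information_correlation(seq, l):
    """IC_l = -2 * sum_i P_i log2 P_i + sum_ij P_ij(l) log2 P_ij(l)."""
    p = base_probs(seq)
    pij = pair_probs(seq, l)
    single = sum(v * math.log2(v) for v in p.values() if v > 0)
    pair = sum(v * math.log2(v) for v in pij.values() if v > 0)
    return -2 * single + pair


def partial_information_correlation(seq, l):
    """PIC_ij(l) = (P_ij(l) - P_i P_j)^2 for the 16 ordered base pairs."""
    p = base_probs(seq)
    pij = pair_probs(seq, l)
    return [(pij[(i, j)] - p[i] * p[j]) ** 2 for i, j in product(BASES, repeat=2)]
```

Calling `bbc_vector(genome, K)` for each genome yields one 16-dimensional vector per sequence; Euclidean distances between such vectors can then be clustered into a tree, as the text describes.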
The final vector obtained covers this range of distances between bases. Euclidean distance is used to calculate the pairwise distance between sequences, and the distance matrix is used to construct a tree with clustering algorithms [28].
c) Context modeling compression: this method, described by Pinho et al. (2013) [29], is used in DNA sequence analysis. In this method, the next-symbol predictions of one or more statistical models are combined to yield a prediction based on previously recorded events. The algorithmic information derived from each symbol prediction can be used to compute algorithmic information profiles in time proportional to the length of the sequence.
Universal Sequence Map (Chaos Theory)
The use of iterative maps or iterated functions to represent DNA is closely related to fractals, one of the principles of chaos theory [30, 31]. The coordinates of each unit of a nucleotide or amino acid sequence define trajectories in a continuous space and encode both the unit's identity and its context. Mathematically, the chaos game is described by an iterated function system (IFS). An IFS is a set of pairs of linear equations, each pair of the form:
$$x_{n+1} = a x_n + b y_n + e, \qquad y_{n+1} = c x_n + d y_n + f$$
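For DNA, the classic chaos game representation (CGR) is a special case of this IFS with $a = d = 1/2$, $b = c = 0$, and $(e, f)$ equal to half the coordinates of the corner assigned to the current base, so each step moves the point halfway toward that corner. The sketch below assumes one commonly used corner assignment (A at (0, 0), C at (0, 1), G at (1, 1), T at (1, 0)) and a start at the centre of the unit square; the function name is hypothetical.

```python
# Assumed corner assignment for the unit square (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(seq, corners=CORNERS):
    """Return the CGR point reached after each base of seq.

    Each step applies x_{n+1} = a x_n + b y_n + e, y_{n+1} = c x_n + d y_n + f
    with a = d = 0.5, b = c = 0, e = 0.5 * corner_x, f = 0.5 * corner_y,
    i.e. the point moves halfway toward the corner of the current base.
    """
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq:
        cx, cy = corners[base]
        x, y = 0.5 * x + 0.5 * cx, 0.5 * y + 0.5 * cy
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(chaos_game("ACG"))  # [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125)]
```

Because each step halves the distance to a corner, sequences that share their last n bases end up in the same sub-square of side $2^{-n}$, which is how a single point encodes both a unit's identity and its recent context.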
Each pair of IFS equations gives the formula for computing the new values of the x and y coordinates. Iterative maps of this kind were introduced for DNA by H. J. Jeffrey [6]. This representation defines a unit square in which each corner corresponds to one of the four possible nucleotides [6,7]. Because this representation does not scale with the number of possible unique units and cannot represent succession schemes, Markov models have been used to identify discrete spaces that represent sequences as cross-tabulated conditional probabilities, i.e., Markov transition tables [7]. Bayesian theory has been used to measure homology and to align sequences. The use of iterative maps has therefore been found to be both essential and effective, not only for representing sequences but also for identifying scale-independent stochastic models of the succession schemes [8]. Web resources such as the USM page on GitHub [32] demonstrate how to encode and compare arbitrary symbolic sequences, and MapReduce is also being used for the same purpose [33]. The MapReduce pattern is widely used here because map functions naturally distribute the processing of vectorized components, while reduce functions aggregate the intermediate results.
Conclusion
Multiple sequence alignment is heuristic in nature, and the approach handles sequence divergence incompletely because it assumes that contiguity is conserved between homologous segments. As the computational load escalates, searching large databases becomes increasingly infeasible, and the heuristic solutions used to cope with this make the statistical significance of scores harder to assess, which compromises the establishment of confidence intervals for homology. Alignment-based methods require substitution or evolutionary models and are expensive because they rely on dynamic programming to find the alignment with the optimal score. Alignment-free methods, on the other hand, do not assume contiguity of homologous regions. They are computationally inexpensive, although they can be memory intensive, and less dependent on substitution or evolutionary models. Unlike alignment-based methods, they are less sensitive to stochastic sequence variation, recombination, horizontal gene transfer, etc., and they are time efficient. Indexing word counts or positions in fractal space are the alternatives to the dynamic programming used in alignment-based methods. An algorithm that meets the needs of alignment-free sequence analysis would be a good way to overcome the limitations of the programs already in use.
Discussion and future developments
Awareness of the existence and importance of alignment-free methods would help the scientific community develop more efficient tools that overcome the limitations of alignment-based methods. The current trend is to use MapReduce to analyze sequence data. With proper understanding and implementation, alignment-free methods could be used more efficiently in metagenomics, phylogeny reconstruction, and protein classification, and ultimately in decoding sequence information to help study disease patterns. The need for accurate alignments that do not compromise specificity and sensitivity is increasing, and with it the demand for new algorithms.
References
1. Edgar, R. C., & Batzoglou, S. Multiple sequence alignment.
2. Thompson, J. D., Plewniak, F., & Poch, O. (1999). BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs.
3. Subramanian, A. R., Weyer-Menkhoff, J., Kaufmann, M., & Morgenstern, B. (2005). BMC Bioinformatics, 6, 66. DOI: 10.1186/1471-2105-6-66
4. Van Walle, I., Lasters, I., & Wyns, L. (2005). Bioinformatics, 21(7), 1267–1268. DOI: https://doi.org/10.1093/bioinformatics/bth493
5. Raghava, G. P. S., Searle, S. M. J., Audley, P. C., Barber, J. D., & Barton, G. J. (2003). OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics, 4, 47.
6. Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.
7. Jeffrey, H. J. (1990). Chaos game representation of gene structure. Nucleic Acids Research, 18, 2163–2170.
8. Goldman, N. (1993). Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representations of DNA sequences. Nucleic Acids Research, 21, 2487–2491.
9. Almeida, J. S., & Vinga, S. (2002). Universal Sequence Maps of arbitrary discrete sequences. BMC Bioinformatics, 3, 6. doi:10.1186/1471-2105-3-6
10. Grünwald, P., & Vitányi, P. (2005). Shannon information and Kolmogorov complexity.
11. Strang, G. (1988). Linear Algebra and Its Applications. Thomson, London.
12. Schott, J. R. (1997). Matrix Analysis for Statistics. Wiley, New York.
13. Pratas, D., Silva, R. M., Pinho, A. J., & Ferreira, P. J. S. G. (2015). An alignment-free method to find and visualise rearrangements between pairs of DNA sequences. Scientific Reports, 5, 10203.
14. Chan, C. X., & Ragan, M. A. (2013). Next-generation phylogenomics. Biology Direct, 8, 3.
15. Gardner, S. N., & Hall, B. G. (2013). When whole-genome alignments just won’t work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes. PLoS ONE, 8(12), e81760.
16. Leimeister, C.-A., & Morgenstern, B. (2014). kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics, 30(14), 2000–2008.
17. Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., & Morgenstern, B. Fast alignment-free sequence comparison using spaced-word frequencies.
18. Haubold, B., Krause, L., Horn, T., & Pfaffelhuber, P. (2013). An alignment-free test for recombination. Bioinformatics, 29(24), 3121–3127. doi:10.1093/bioinformatics/btt550
19. Sims, G. E., Jun, S.-R., Wu, G. A., & Kim, S.-H. (2009). Whole-genome phylogeny of mammals: evolutionary information in genic and nongenic regions. Proceedings of the National Academy of Sciences of the United States of America, 106(40), 17077–17082.
20. Sims, G. E., & Kim, S.-H. (2011). Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs). Proceedings of the National Academy of Sciences of the United States of America, 108(20), 8329–8334.
21. Giller, G. L. (2012). The statistical properties of random bitstreams and the sampling distribution of cosine similarity. Giller Investments Research Notes (20121024/1).
22. Kolekar, P., Kale, M., & Kulkarni-Kale, U. Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.
23. http://www.abarimpublications.com/ChaosTheoryIntroduction.html#.WJbzMlN97IU
24. Apostolico, A., Denas, O., & Dress, A. (2010, September). Efficient tools for comparative substring analysis. Journal of Biotechnology, 149(3), 120–126. doi:10.1016/j.jbiotec.2010.05.006
25. Apostolico, A., & Denas, O. (2008, March). Fast algorithms for computing sequence distances by exhaustive substring composition. Algorithms for Molecular Biology.
26. Cheng, J., Zeng, X., Ren, G., & Liu, Z. (2013). CGAP: a new comprehensive platform for the comparative analysis of chloroplast genomes. BMC Bioinformatics, 14, 95.
27. Liu, Z.-H., & Sun, X. Coronavirus phylogeny based on base-base correlation.
28. Liu, Z., Meng, J., & Sun, X. A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping.
29. Gao, Y., & Luo, L. Genome-based phylogeny of dsDNA viruses by a novel alignment-free method.
30. Pinho, A. J., Garcia, S. P., Pratas, D., & Ferreira, P. J. S. G. (2013). DNA sequences at a glance. PLoS ONE, 8(11), e79922.
31. http://fractalfoundation.org/resources/what-is-chaos-theory/
32. http://usm.github.com/
33. Almeida, J. S., Grüneberg, A., Maass, W., & Vinga, S. (2012). Fractal MapReduce decomposition of sequence alignment. Algorithms for Molecular Biology, 7, 1
CLOUD COMPUTING
SparkBLAST: Introduction
Image Credit: Google Images
“Cloud computing has emerged as a powerful tool to process these dynamic tasks over the last decade. Recently, a framework called Apache Spark emerged as a promising framework to implement highly scalable parallel applications [5,6].”
The basic local alignment search tool (BLAST) [1,2] is known for its speed and the quality of its results, and it is often a primary step in sequence analysis. The ever-increasing demand for processing huge amounts of genomic data has led to the development of new scalable and highly efficient computational tools and algorithms. For example, MapReduce is the most widely accepted framework of this kind: it supports design patterns representing general, reusable solutions to problems including biological sequence assembly [3], and it handles large datasets efficiently on hundreds to thousands of processing nodes [4]. However, the implementation frameworks of MapReduce (such as Hadoop) limit its capability to process smaller data. Cloud computing has emerged as a powerful tool to process these dynamic tasks over the last decade. Recently, Apache Spark emerged as a promising framework to implement highly scalable parallel applications [5,6]. SparkBLAST, a new parallelization of BLAST developed by de Castro et al. (2017), employs cloud computing and Apache Spark as the coordination framework [7]. SparkBLAST reduces the number of local input/output operations, resulting in highly efficient and superior performance [7].
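According to de Castro et al. [7], SparkBLAST replicates the target database on every node, splits the query file into evenly distributed fragments, runs BLAST on each fragment, and finally merges the per-task outputs into one file. The plain-Python sketch below mimics only the splitting and merging parts of that data flow on FASTA text. It does not use Spark or BLAST, the round-robin split is an assumption (SparkBLAST's own partitioning is done through Spark), and all function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence) of one FASTA entry


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_queries(records: List[Record], n_fragments: int) -> List[List[Record]]:
    """Pre-processing: deal query records round-robin into n fragments,
    so fragment sizes differ by at most one record."""
    fragments: List[List[Record]] = [[] for _ in range(n_fragments)]
    for index, record in enumerate(records):
        fragments[index % n_fragments].append(record)
    return fragments


def merge_outputs(task_outputs: List[str]) -> str:
    """Post-processing: concatenate the non-empty lines of every task's output."""
    return "\n".join(line for out in task_outputs for line in out.splitlines() if line)


if __name__ == "__main__":
    queries = read_fasta(">q1\nACGT\n>q2\nGGTT\n>q3\nTTAA\n")
    for node, fragment in enumerate(split_queries(queries, 2)):
        print(node, [header for header, _ in fragment])
```

In a real deployment, each fragment would be handed to a node that already holds a full copy of the target database, which is why only the queries, not the database, need to be partitioned.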
Working: SparkBLAST requires two input files: a) a target database consisting of bacterial genomic sequences, and b) a query file consisting of a set of query genome sequences to be compared with the sequences in the target database. As soon as SparkBLAST takes the input, it replicates the entire target database on every computing node; it then splits the query file into fragments, which are evenly distributed across the nodes. Each node thus has a local deployment of the BLAST application, a copy of the target database, and a fragment of the query sequences [7]. After that, Spark's scheduler partitions the whole computation into tasks, which are assigned to the computing nodes according to data locality [8]. For each task, the replicated target database and the query fragment are loaded into memory. SparkBLAST then runs NCBI BLAST2, installed locally on each node, through Spark's pipe mechanism to execute multiple parallel, distributed tasks in the cluster. SparkBLAST uses YARN [9] as the resource manager because it can be used uniformly by both Spark and Hadoop. The processing of SparkBLAST is divided into three stages [7]:
1. Pre-processing: In this first stage, the query file is evenly partitioned into splits, which are then distributed among the computing nodes.
2. Main processing: After the data are transferred to each processing node, tasks are scheduled on each computing node according to data locality. At this stage, NCBI BLAST2 has been installed locally on each node. When a computing core finishes a task, it moves on to the next task assigned to it, looping until all tasks of the job have been executed.
3. Post-processing: During the second stage, each individual task produces small output files, which are merged into a single output file in this stage.
Several other tools, such as CloudBLAST [10], BioDoop [11], and Crossbow [12], combine cloud and Hadoop technologies. SparkBLAST was found to outperform CloudBLAST, another cloud-based platform [7,10]. For further details about SparkBLAST, see de Castro et al. [7].
References
1. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410.
2. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421.
3. O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781.
4. Senger, H., Gil-Costa, V., Arantes, L., Marcondes, C. A., Marín, M., Sato, L. M., & Silva, F. A. (2016). BSP cost and scalability analysis for MapReduce operations. Concurrency and Computation: Practice and Experience, 28(8), 2503-2527.
5. Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. (2010). Spark: Cluster computing with working sets. HotCloud, 10(10-10), 95.
6. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., ... & Stoica, I. (2012, April). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (pp. 2-2). USENIX Association.
7. de Castro, M. R., dos Santos Tostes, C., Dávila, A. M., Senger, H., & da Silva, F. A. (2017). SparkBLAST: scalable BLAST processing using in-memory operations. BMC Bioinformatics, 18(1), 318.
8. Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., & Stoica, I. (2010, April). Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European conference on Computer systems (pp. 265-278). ACM.
9. Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., ... & Saha, B. (2013, October). Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing (p. 5). ACM.
10. Matsunaga, A., Tsugawa, M., & Fortes, J. (2008, December). CloudBLAST: Combining MapReduce and virtualization on distributed resources for bioinformatics applications. In eScience, 2008. eScience'08. IEEE Fourth International Conference on (pp. 222-229). IEEE.
11. Leo, S., Santoni, F., & Zanetti, G. (2009, September). Biodoop: bioinformatics on Hadoop. In Parallel Processing Workshops, 2009. ICPPW'09. International Conference on (pp. 415-422). IEEE.
12. Langmead, B., Hansen, K. D., & Leek, J. T. (2010). Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biology, 11(8), R83.
STRUCTURAL BIOINFORMATICS
Structural analysis of FOXL2 gene and its role in kidney failure Image Credit: Stock Photos
“RNA interference was discovered by Andrew Fire and Craig C. Mello in 1998, and they jointly received the 2006 Nobel Prize in Physiology or Medicine for this discovery [6]. RNA interference (RNAi) is a post-transcriptional gene silencing (PTGS) mechanism in eukaryotes that plays an important role in various stages of life and under various growth conditions.”
The FOXL2 gene codes for a protein belonging to the forkhead/winged-helix family of transcription factors. The FOXL2 gene is located on chromosome 3 and is a single-exon gene of 2.7 kb encoding a protein of 376 amino acids [1]. The protein has a 110-amino-acid DNA-binding domain and a polyalanine tract. This polyalanine tract is strictly restricted to 14 residues, possibly because transcription factor activity is optimal at that length. In addition, the FOXL2 protein has been observed to be highly conserved across human, goat, mouse, and pufferfish [2], with the C-terminal end more conserved than the N-terminal end [3].
Structure of the FOXL2 protein
The FOXL2 protein has a characteristic DNA-binding domain called the forkhead domain (FHD) [1], which is conserved among the other FOX proteins. The FHD has a helix-turn-helix motif with three α-helices (H1, H2, and H3) and two large loops, or “wings” (W1 and W2) [4]. These are the general components of the FHD of all FOX proteins. In the case of the FOXL2 protein, there is an additional α-helix, H4, between H2 and H3.
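The 14-residue polyalanine tract lends itself to a simple sequence check. The sketch below is a simplification, not a clinical or published method: it treats the tract as the longest uninterrupted run of alanines (one-letter code A) in a protein sequence and flags runs longer than 14 residues as possible expansions. The toy sequences in the usage lines are hypothetical, not the real FOXL2 sequence, and the function names are illustrative.

```python
import re


def longest_alanine_run(protein: str) -> int:
    """Length of the longest run of consecutive alanines (one-letter code A)."""
    runs = re.findall(r"A+", protein.upper())
    return max((len(run) for run in runs), default=0)


def is_polyalanine_expansion(protein: str, normal_length: int = 14) -> bool:
    """True if the longest alanine run is longer than the expected tract length."""
    return longest_alanine_run(protein) > normal_length


if __name__ == "__main__":
    # Hypothetical toy sequences, not real FOXL2 sequence data.
    normal = "MSTG" + "A" * 14 + "GSPR"
    expanded = "MSTG" + "A" * 24 + "GSPR"
    print(longest_alanine_run(normal), is_polyalanine_expansion(normal))
    print(longest_alanine_run(expanded), is_polyalanine_expansion(expanded))
```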
Location of the FOXL2 protein
The FOXL2 protein is found in the developing eyelids and in the fetal and adult ovaries of human, goat, and mouse. It is localized in the nucleus, consistent with its role as a transcription factor. In the eyelids, it is present in the primordial mesenchyme. Additionally, and uniquely in mice, the FOXL2 gene is also expressed in Rathke’s pouch of the pituitary [5] and is thought to play a role in the organogenesis of the pituitary gland. In the ovaries, FOXL2 is expressed even before folliculogenesis and remains present into the adult stages. It is localized in the somatic compartment, with strong expression in the follicular cells and diffuse expression in the stromal cells [6]. During early ovarian development, FOXL2 is responsible for the determination and differentiation of the somatic cells, and in later stages it is involved in maintaining ovarian function.
Consequences of FOXL2 mutations
Mutations in the FOXL2 gene cause blepharophimosis-ptosis-epicanthus inversus syndrome (BPES). There are two types of BPES: Type I is characterized by eyelid malformation together with premature ovarian failure (POF), whereas Type II is characterized by the eyelid malformation alone [7]. Expansions of the polyalanine tract, which account for about 30% of FOXL2 mutations, lead to BPES [8]. In BPES, the mutation causes atrophy or hypertrophy of the superior levator muscles [9] and other periocular muscles, which in turn leads to strabismus, a common observation in patients with BPES [10]. These muscles arise from the primordial mesenchyme and the surrounding regions [11].
Sexually dimorphic expression of FOXL2
In females, the Y chromosome (which carries the SRY gene) is absent, and under this condition FOXL2 is expressed, leading to the development of granulosa cells [12]. In chicken, FOXL2 is expressed in ZW female gonads during days 5 to 8, but not in ZZ male gonads at the same stages. In turtles, where the sex of the organism is determined by the environmental temperature, FOXL2 expression is higher at female-promoting temperatures than at male-promoting temperatures [13]. FOXL2 expression is therefore clearly sexually dimorphic.
FOXL2 targets and interactions
FOXL2 is involved in various cellular pathways, such as apoptosis, inflammation, and antioxidant defense, and also in the production of progesterone and estrogen; these roles are discussed in detail in the following sections.
Functions of FOXL2
Role in cholesterol metabolism
FOXL2 is involved in the expression of the gonadotropin-releasing hormone (GnRH) receptor. The GnRH receptor activating sequence (GRAS) has binding sites for three transcription factors, namely FOXL2, Smad3, and AP1 [14]. In addition, FOXL2 has been shown to interact directly with the promoter of αGSU (the glycoprotein hormone α subunit shared by luteinizing hormone, follicle-stimulating hormone, and thyroid-stimulating hormone), activating its expression [15]. FOXL2 also interacts directly with the steroidogenic acute regulatory (StAR) gene through its alanine-rich C-terminal region, repressing its transcriptional activity [16]. The StAR protein enables the translocation of cholesterol into the mitochondria, where pregnenolone, the precursor of the steroid hormones, is synthesized. FOXL2 also contributes to steroid hormone synthesis by upregulating CYP19, the aromatase that converts androgens to estrogens in granulosa cells; CYP19 and FOXL2 are co-expressed in goat ovaries [17]. Other factors involved in steroid biosynthesis, such as peroxisome proliferator-activated receptor gamma coactivator 1 alpha (PPARGC1A) and NR5A2, are also modulated by FOXL2.
Role in anti-oxidation
FOXL2 is crucial for regulating oxidative stress in the ovaries, where the generation of reactive oxygen species (ROS) is high during ovulation [18-20]. PPARGC1A is involved not only in cholesterol metabolism but also in the detoxification of ROS. The expression of other components of ROS detoxification, namely immediate early response 3 (IER3) and manganese superoxide dismutase (MnSOD), is also upregulated by FOXL2 overexpression [21]. FOXL2 therefore plays an important role in the detoxification of ROS. In addition, upregulation of FOXL2 causes cells to pause in the G1 phase, during which DNA damage induced by oxidative stress is repaired [18-20].
Role in apoptosis
FOXL2 overexpression has been shown to induce the expression of BCL2-related protein A1 (BCL2A1) and tumor necrosis factor α-induced protein 3 (TNFAIP3) [21]. FOXL2 is also known to induce apoptosis using the DEAD-box RNA helicase DP103 as a coactivator [22, 23].
Role in inflammation
FOXL2 is involved in the regulation of PTGS2/COX2. COX2 is one of the two cyclooxygenase isoforms involved in the synthesis of prostaglandins: it catalyzes the conversion of arachidonic acid to PGH2, which is in turn converted to other prostaglandins by specific synthases [24].
The involvement of FOXL2 in the pathways described above suggests that the fate of ovarian cells depends on FOXL2 interacting with the right set of proteins. FOXL2 interacts with DP103 (also called DDX20 and Gemin-3), steroidogenic factor 1 (SF1/NR5A1), estrogen receptors α and β (ESR1 and ESR2), and Smad3 [25, 26, 14, 27, 22, 23, 28-31].
When FOXL2 binds to SF1, it upregulates CYP19 aromatase expression and inhibits the CYP17 expression that is driven by unbound SF1. FOXL2 therefore plays a major role in balancing the levels of androgens (via CYP17) and estrogens (via CYP19). Mutations in SF1 have been observed to cause male-to-female sex reversal in humans [32], whereas an 11.7-kb deletion affecting FOXL2 leads to female-to-male sex reversal in goats [33]. The interaction of FOXL2 with SF1 also upregulates Mc2r [34], which codes for an ACTH receptor required for steroidogenesis and adrenal gland development [1]. Upon interaction with ERα, FOXL2 prevents the binding of ERα to the PTGS2 promoter. Little is known about the impact of the FOXL2-ERα interaction on other targets of ERα [35].
FOXL2: a common observation in ovarian and non-ovarian tumors
As discussed earlier, FOXL2 is generally found in the normal ovarian stroma. It is also found in ovarian neoplasms and in non-ovarian neoplasms such as pancreatic mucinous cystic neoplasms (PMCs), hepatobiliary cystadenomas (HBCs), and mixed epithelial and stromal tumor of the kidney (MEST). When PMC, HBC, and MEST samples were tested for nuclear FOXL2, all PMCs and HBCs and 90% of MESTs were positive [36]. However, nuclear reactivity does not necessarily mean that a FOXL2 mutation has led to the tumor, since FOXL2 has also been found in sex cord-stromal tumors that lacked such a mutation [37].
Sex hormones: the cause of the neoplasm
Detailed analysis showed that these neoplasms are characterized by ovarian-type stroma and that they arise in patients with a history of hormonal therapy, obesity, or heavy alcohol use. In obesity, the insulin concentration is high, which increases androgen synthesis by the ovaries and adrenal glands. The elevated androgen precursors are then converted to estrogen by the abundant aromatase of adipose tissue. Obesity also suppresses the production of sex hormone-binding globulin, which keeps free estrogen levels high [38]. Heavy alcohol intake has been linked to increased levels of estrogen and androgen, which is why men who consume alcohol excessively can develop gynecomastia, altered body hair patterns, and testicular atrophy [39]. The third associated factor is hormone therapy, that is, the administration of exogenous hormones, which has been associated with cancers of the breast and endometrium and other estrogen-dependent cancers. The common factor in all three cases is therefore an increase in hormone levels. One of the major observations regarding FOXL2 is that it serves as a good immunohistochemical marker for identifying ovarian-type stroma in PMC, HBC, and MEST. However, the role it plays in these non-ovarian tumors has yet to be established [40].
FOXL2 in renal failure
In this article, we concentrate on the effects of FOXL2 on renal failure and therefore discuss the mixed epithelial and stromal tumor of the kidney (MEST). These tumors are rare, and only isolated case reports exist. They are also known by the following names: leiomyomatous renal hamartoma, congenital mesoblastic nephroma in an adult, cystic hamartoma of the renal pelvis, solitary multilocular cyst of the kidney, multilocular renal cyst with Müllerian-like stroma, and adult metanephric stromal tumor [16]. MEST is more common in women and occurs in patients aged 19 to 78 years. It presents with a conspicuous abdominal or flank mass, flank pain, and/or hematuria. The tumor is solid and cystic, tan to yellow, well defined, and 2-24 cm in size [41]. It occurs in the renal hilum but does not infiltrate the renal parenchyma. In rare cases, necrosis and calcification may be observed [42].
Cellular composition
The tumor is biphasic, i.e., it has two types of cells: mesenchymal and epithelial. The mesenchymal component consists of spindle cells with varying degrees of differentiation, ranging from smooth muscle to fibroblastic to myofibroblastic cells, interspersed with collagen; it resembles cystic nephroma or fibromatosis [29]. The epithelial component is interspersed among the mesenchymal elements and varies from round, regular tubules to more complex tubulopapillary structures, with or without cystic dilatation. These are lined by cuboidal to flattened (hobnail) epithelium. Metastases are uncommon in MEST, and until recently malignant MESTs had not been described. Malignancy can arise in either component. Malignant MESTs are characterized by increased cellularity, cytologic atypia, round to ovoid vacuolated nuclei with prominent nucleoli, and a high mitotic rate. Rhabdoid, rhabdomyosarcomatous, and chondrosarcomatous components are sometimes observed [43, 44, 45].
Tumor markers
The epithelial components are positive for both low- and high-molecular-weight cytokeratin and for Ulex europaeus [42, 46]. The mesenchymal components are positive for estrogen and progesterone receptors in most cases [47]. However, the cells are negative for the receptor expression itself.
Common cause of MEST
An increase in estrogen and progesterone levels has been observed in MEST cases, so the rise in these steroid hormones is thought to be a cause of MEST [48]. As mentioned above, the FOXL2 gene plays an essential role in the production of estrogen and progesterone. A single case has been reported with a t(1;9) translocation as a cytogenetic alteration in MEST [49].
Differentiating MEST from other renal tumors
Unlike multicystic renal cell carcinoma, MEST lacks aggregates of clear cells [50]. Renal synovial sarcoma and MEST are very similar, except that MEST shows subepithelial condensation of stroma (ovarian-type stroma) [51, 52, 53]. Neither the characteristic ovarian-type stroma of MEST nor its cystic and tubular hobnail epithelium is seen in rhabdoid tumor [25]. Estrogen receptor positivity is a distinctive feature of MEST and is used as a major immunohistochemical marker to distinguish it from other tumors [50].
Materials and methods
The FOXL2 protein sequence was downloaded from UniProt (entry P58012) and searched with BLAST to identify homologs (Supplementary Table 1). The FOXL2 protein (accession number AAY21822.1) was then modeled using PHYRE2 and I-TASSER. PHYRE2 predicts the structure by homology (template-based) modeling, whereas I-TASSER combines threading-based template identification with ab initio modeling of the regions that lack a template. The quality of the resulting models was then assessed with PROSESS and RAMPAGE, which evaluate the structural parameters of the predicted models.
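The article does not state which database or output settings were used for the BLAST search. As a minimal sketch, assuming BLAST+ `blastp` was run with a custom tabular format such as `-outfmt "6 sseqid pident evalue stitle"` (an assumed choice, not taken from the article), the hits can be split into FOXL2 and non-FOXL2 entries by searching the subject titles. The function names are hypothetical helpers, not part of BLAST+ or any library.

```python
from collections import Counter

# Assumed column order; it must match the -outfmt string used when BLAST+ was run, e.g.
#   blastp -query P58012.fasta -db <database> -evalue 1e-5 \
#          -outfmt "6 sseqid pident evalue stitle" -out hits.tsv
FIELDS = ("sseqid", "pident", "evalue", "stitle")


def parse_hits(tsv_text):
    """Parse tab-separated BLAST+ output into dicts (hypothetical helper)."""
    hits = []
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as produced by -outfmt 7)
        # stitle is the last column and may contain spaces, so split at most 3 times
        values = line.split("\t", len(FIELDS) - 1)
        hit = dict(zip(FIELDS, values))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits


def split_by_keyword(hits, keyword="FOXL2"):
    """Separate hits whose title mentions `keyword` from all other hits."""
    counts = Counter()
    others = []
    for hit in hits:
        if keyword.lower() in hit["stitle"].lower():
            counts["match"] += 1
        else:
            counts["other"] += 1
            others.append(hit["stitle"])
    return counts, others
```

A title keyword match is only a first pass: hits annotated as "unnamed protein" would still need manual inspection before being assigned to a protein family.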
Results
BLAST results
According to the BLAST hits, almost all of the retrieved proteins are FOXL2 from other organisms (Table 1). Only three hits are not FOXL2: human chorionic gonadotropin (hCG) and two unnamed proteins from Oncorhynchus mykiss.
Structure modeling
Since no crystal structure of the FOXL2 protein is available, I predicted its structure using in silico tools and validated the models with PROSESS and RAMPAGE, as described below.
I-TASSER model
I-TASSER predicted five different models, and the model with the highest C-score was selected. Figure 1 shows the selected model [40, 54, 34].
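The selected I-TASSER model is the one with the highest confidence score (C-score). The server reports this score directly; the sketch below only illustrates what it rewards. It follows our reading of the definition in the I-TASSER publications, C-score = ln((M/M_tot) × (1/⟨RMSD⟩) × Z-term), where M is the number of decoys in the model's cluster, M_tot the total number of decoys, ⟨RMSD⟩ the mean RMSD of the cluster decoys to the centroid, and the Z-term summarizes the normalized threading Z-scores Z(i)/Z0(i). Taking the Z-term as the mean of those ratios is an assumption here, and the class and field names are illustrative, not I-TASSER's own.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ITasserModel:
    """Inputs for the C-score of one model (illustrative names, not I-TASSER's)."""
    name: str
    cluster_size: int   # M: decoys in the cluster this model represents
    total_decoys: int   # M_tot: all decoys generated in the simulation
    mean_rmsd: float    # <RMSD>: mean RMSD (Å) of cluster decoys to the centroid
    z_ratios: List[float] = field(default_factory=list)  # Z(i)/Z0(i) per threading program


def c_score(model: ITasserModel) -> float:
    """ln((M / M_tot) * (1 / <RMSD>) * mean Z ratio); the mean is an assumed aggregation."""
    if not (0 < model.cluster_size <= model.total_decoys):
        raise ValueError("cluster_size must be in (0, total_decoys]")
    if model.mean_rmsd <= 0 or not model.z_ratios:
        raise ValueError("mean_rmsd must be positive and z_ratios non-empty")
    mean_z = sum(model.z_ratios) / len(model.z_ratios)
    if mean_z <= 0:
        raise ValueError("mean Z-score ratio must be positive")
    return math.log((model.cluster_size / model.total_decoys) / model.mean_rmsd * mean_z)


def select_best(models: List[ITasserModel]) -> ITasserModel:
    """Pick the model with the highest C-score."""
    return max(models, key=c_score)
```

A large, tight cluster (high M, low ⟨RMSD⟩) raises the score, which is why the top-ranked model usually corresponds to the most populated cluster of the simulation.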
Fig. 1 Structure of FOXL2 predicted by I-TASSER.
PROSESS result for the predicted model
When the I-TASSER model was assessed with PROSESS, it showed very poor quality. The non-covalent bond quality, torsion angle quality, and covalent bond quality had scores of 2.5, 0.5, and 6.5, respectively. Non-covalent bond quality and torsion angle quality were the poorest of the three, because most of the parameters contributing to them showed very high standard deviations. The flexibility of the predicted model was also poor. As a result, the overall quality of the model was only 2.5 on a scale of 0 to 10.
RAMPAGE result for the predicted model
The Ramachandran plot for the predicted model was generated using RAMPAGE [55] and is shown in Figure 2. Only 58% of the residues lie in the favored region, against an expected 98%. Similarly, only about 2% of residues are expected in the allowed region, whereas this model has 29.7%. The remaining 12.3% of residues are in the outlier region.
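RAMPAGE reports these percentages directly; the arithmetic behind them is a simple tally. This sketch assumes each residue has already been given a region label (for example copied from RAMPAGE's per-residue listing) and uses the 98% favored and 2% allowed reference values quoted above. It does not reproduce RAMPAGE's own region boundaries, which are based on the reference data of Lovell et al. [55]; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable

REGIONS = ("favored", "allowed", "outlier")
# Reference values quoted in the text for a well-built model
EXPECTED = {"favored": 98.0, "allowed": 2.0}


def region_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Convert per-residue region labels into percentages of all labeled residues."""
    counts = Counter(labels)
    unknown = set(counts) - set(REGIONS)
    if unknown:
        raise ValueError(f"unexpected region labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues given")
    return {region: 100.0 * counts[region] / total for region in REGIONS}


def deviation_from_expected(pct: Dict[str, float]) -> Dict[str, float]:
    """Observed minus expected percentage for the favored and allowed regions."""
    return {region: pct[region] - EXPECTED[region] for region in EXPECTED}
```

For the I-TASSER model, 58% favored corresponds to a shortfall of 40 percentage points against the expected 98%.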
Fig. 2 RAMPAGE result for the I-TASSER predicted model.
PHYRE2 model
The model predicted by PHYRE2 is shown in Figure 3 [56]. The structure was predicted using interleukin enhancer-binding factor 1 as the template. 25% of the sequence was modeled with 100% confidence, and the remaining regions were classified as ‘disordered’.
Fig. 3 The structure of FOXL2 predicted by PHYRE2.
PROSESS result for the model
The model predicted by PHYRE2 was of moderate quality, better than the I-TASSER model. The non-covalent bond quality, torsion angle quality, and covalent bond quality had scores of 5.5, 4.5, and 7.5, respectively. As with the I-TASSER model, non-covalent bond quality and torsion angle quality had the lowest scores of the three, although only a few of their parameters showed moderate standard deviations. The flexibility of the structure was within the acceptable range. However, the overall quality score was still only 4.5, because these parameters did not reach satisfactory levels.
RAMPAGE result for the model
The Ramachandran plot was generated using RAMPAGE [55]; Figure 4 shows the plot for the PHYRE2 model. In this case, 90.6% of residues are in the favored region, 8.3% are in the allowed region, and only 1% are in the outlier region.
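The comparison between the two models can be made explicit metric by metric. This small sketch tabulates only the PROSESS and RAMPAGE values reported in this article (no new data) and names the better model for each metric. The allowed-region percentage is left out because a higher value there is not simply better or worse; the dictionary keys and the function name are illustrative.

```python
# Scores reported in this article for the two FOXL2 models.
# PROSESS scores are on a 0-10 scale; RAMPAGE values are % of residues.
REPORTED = {
    "I-TASSER": {"non-covalent": 2.5, "torsion": 0.5, "covalent": 6.5,
                 "overall": 2.5, "favored": 58.0, "outlier": 12.3},
    "PHYRE2": {"non-covalent": 5.5, "torsion": 4.5, "covalent": 7.5,
               "overall": 4.5, "favored": 90.6, "outlier": 1.0},
}

# Direction of improvement for each metric (True: higher is better).
HIGHER_IS_BETTER = {
    "non-covalent": True, "torsion": True, "covalent": True,
    "overall": True, "favored": True, "outlier": False,
}


def compare(model_a, model_b, table=REPORTED):
    """Return, for each metric, the name of the better model (or 'tie')."""
    winners = {}
    for metric, higher_better in HIGHER_IS_BETTER.items():
        a, b = table[model_a][metric], table[model_b][metric]
        if a == b:
            winners[metric] = "tie"
        elif (a > b) == higher_better:
            winners[metric] = model_a
        else:
            winners[metric] = model_b
    return winners
```

Every metric favors the PHYRE2 model, yet its overall PROSESS score of 4.5 out of 10 still indicates a model of limited reliability.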
Fig. 4 RAMPAGE result for the PHYRE2 predicted model.
Conclusion
In conclusion, the structures predicted by both programs were of moderate to poor quality. Although the PHYRE2 structure showed better characteristics and scores than the I-TASSER structure, it is still not accurate enough. The structure of FOXL2 therefore needs to be determined experimentally, for example by NMR or X-ray crystallography, for a better understanding of how the protein functions.
Discussion
FOXL2 has a wide variety of biological roles, but its role in kidney failure is not yet fully understood. The only common link between FOXL2 and MEST is the sex hormones estrogen and progesterone. This may indicate that FOXL2 is related to the increased concentration of these hormones, which would lead to MEST and in turn to renal failure. However, this conclusion is derived from reviewing the literature, and detailed experimental studies are required to establish the exact mechanism.
Another obscure area is the structure of FOXL2, since few attempts have been made to predict the structure of the protein. The in silico methods used in this study did not yield reliable structures and resulted in low-quality models. Experimental techniques are therefore needed to determine the accurate structure of FOXL2. Once the structure of the protein is known, it will give a better understanding of the protein and open a new perspective on how it functions in so many different pathways.
References
1. Cocquet et al. (2002) Evolution and expression of FOXL2 gene. J Med Genet, 39(12), 916–922, doi: 10.1136/jmg.39.12.916.
2. Cocquet et al. (2003) Structure, evolution and expression of FOXL2 transcriptional unit. Cytogenet Genome Res, 101(3-4), 206–211, doi: 10.1159/000074338.
3. Baron et al. (2005) Foxl2 gene and the development of the ovary: a story about goat, mouse, fish and woman. Reprod. Nutr. Dev., 45, 377–382, doi: 10.1051/rnd:2005028.
4. Kaestner, K.H. et al. (2000) Unified nomenclature for the winged helix/forkhead transcription factors. Genes Dev., 14, 142–146, doi: 10.1101/gad.14.2.142.
5. Treier, M. (1998) Multistep signaling requirements for pituitary organogenesis in vivo. Genes Dev., 12(11), 1691–1704, doi: 10.1101/gad.12.11.1691.
6. Pannetier, M et al. (2003) Expression studies of the PIS-regulated genes suggest different mechanisms of sex determination within mammals. Cytogenet Genome Res, 101(3-4), 199–205, doi: 10.1159/000074337.
7. Zlotogora J. (1983) The blepharophimosis, ptosis, and epicanthus inversus syndrome: delineation of two types. Am J Hum Genet, 35(5), 1020–1027.
8. De Baere E et al. (2003) FOXL2 and BPES: mutational hotspots, phenotypic variability, and revision of the genotype-phenotype correlation. Am J Hum Genet, 72(2), 478–487, doi: 10.1086/346118.
9. Dollfus H et al. (2003) Sporadic and familial blepharophimosis-ptosis-epicanthus inversus syndrome: FOXL2 mutation screen and MRI study of the superior levator eyelid muscle. Clin Genet, 63(2), 117–120.
10. Barishak, Y. R. (1992) Embryology of the eye and its adnexae. Dev Ophthalmol, 24, 1–142.
11. Oley, C et al. (1988) Blepharophimosis, ptosis, epicanthus inversus syndrome (BPES syndrome). J Med Genet, 25(1), 47–51.
12. Hersmus, R et al. (2008) FOXL2 and SOX9 as parameters of female and male gonadal differentiation in patients with various forms of disorders of sex development (DSD). J. Pathol., 215, 31–38, doi: 10.1002/path.2335.
13. Loffler, K. A et al. (2003) Etiology of ovarian failure in blepharophimosis-ptosis-epicanthus inversus syndrome (BPES): FOXL2 is a conserved, early-acting gene in vertebrate ovarian development. Endocrinology, 144(7), 3237–3243, doi: 10.1210/en.2002-0095.
14. Ellsworth, B. S et al. (2003) The gonadotropin releasing hormone (GnRH) receptor activating sequence (GRAS) is a composite regulatory element that interacts with multiple classes of transcription factors including Smads, AP-1 and a forkhead DNA binding protein. Mol. Cell. Endocrinol., 206(1-2), 93–111.
15. Ellsworth, B.S et al. (2006) FoxL2 in the Pituitary: Molecular, Genetic, and Developmental Analysis. Mol. Endocrinol., 20(11), 2796–2805, doi: 10.1210/me.2005-0303.
16. Pisarska, M.D et al. (2004) Forkhead L2 is expressed in the ovary and represses the promoter activity of the Steroidogenic acute regulatory gene. Endocrinology, 145(7), 3424–3433, doi: 10.1210/en.2003-1141.
17. Pannetier, M et al. (2006) FoxL2 activates P450 aromatase gene transcription: towards a better characterization of the early steps of mammalian ovarian development. J. Mol. Endocrinol., 36(3), 399–413, doi: 10.1677/jme.1.01947.
18. Benayoun, B.A. (2009) Positive and negative feedback regulates the transcription factor FOXL2 in response to cell stress: evidence for a regulatory imbalance induced by disease-causing mutations. Hum. Mol. Genet., 18, 632–644, doi: 10.1093/hmg/ddn389.
19. Benayoun, B.A et al. (2009) FOXL2: at the crossroads of female sex determination and ovarian function. Adv Exp Med Biol., 665, 207–226.
20. Benayoun, B.A et al. (2009) The forkhead factor FOXL2: a novel tumor suppressor? Biochim. Biophys. Acta, 1805, 1–5, doi: 10.1016/j.bbcan.2009.09.002.
21. Batista, F et al. (2007) Potential targets of FOXL2, a transcription factor involved in craniofacial and follicular development, identified by transcriptomics. Proc. Natl. Acad. Sci. USA, 104, 3330–3335, doi: 10.1073/pnas.0611326104.
22. Lee, K et al. (2005) Transcriptional factor FOXL2 interacts with DP103 and induces apoptosis. Biochem. Biophys. Res. Commun., 336(3), 876–881, doi: 10.1016/j.bbrc.2005.08.184.
23. Lee, M. B et al. (2005) The DEAD-box protein DP103 (Ddx20 or Gemin-3) represses orphan nuclear receptor activity via SUMO modification. Mol. Cell. Biol., 25(5), 1879–1890, doi: 10.1128/MCB.25.5.1879-1890.2005.
24. Smith, W.L., Dewitt, D.L. (1996) Prostaglandin endoperoxide H synthases-1 and -2. Adv. , 62, 167–215.
25. Blount, A.L et al. (2001) FoxL2 and Smad3 coordinately regulate follistatin gene transcription. J. Biol. Chem., 284(12), 7631–7645, doi: 10.1074/jbc.M806676200.
26. Corpuz, P.S et al. (2010) FoxL2 is required for activin induction of the mouse and human follicle-stimulating hormone β-subunit. Mol. Endocrinol., 24(5), 1037–1051, doi: 10.1210/me.2009-0425.
27. Kim, S.Y et al. (2009) Foxl2, a forkhead transcription factor, modulates nonclassical activity of the estrogen receptor-alpha. Endocrinology, 150(11), 5085–5093, doi: 10.1210/en.2009-0313.
28. Park, M et al. (2010) FOXL2 interacts with steroidogenic factor-1 (SF-1) and represses SF-1-induced CYP17 transcription in granulosa cells. Mol. Endocrinol., 24(5), 1024–1036, doi: 10.1210/me.2009-0375.
29. Turbiner, J et al. (2007) Cystic nephroma and mixed epithelial and stromal tumor of kidney: a detailed clinicopathologic analysis of 34 cases and proposal for renal epithelial and stromal tumor (REST) as a unifying term. Am J Surg Pathol., 31(4), 489–500, doi: 10.1097/PAS.0b013e31802bdd56.
30. Wang, D.S et al. (2009) Foxl2 up-regulates aromatase gene transcription in a female-specific manner by binding to the promoter as well as interacting with ad4 binding protein/steroidogenic factor 1. Mol. Endocrinol., 21(3), 712–725, doi: 10.1210/me.2006-0248.
31. Yang, W.H et al. (2010) Synergistic activation of the Mc2r promoter by FOXL2 and NR5A1 in mice. Biol. Reprod., 83(5), 842–851, doi: 10.1095/biolreprod.110.085621.
32. Lin, L., et al. (2007) Heterozygous missense mutations in steroidogenic factor 1 (SF1/Ad4BP, NR5A1) are associated with 46,XY disorders of sex development with normal adrenal function. J. Clin. Endocrinol. Metab., 92(3), 991–999, doi: 10.1210/jc.2006-1672.
33. Pailhoux, E et al. (2001) A 11.7-kb deletion triggers intersexuality and polledness in goats. Nat. Genet., 29(4), 453–458, doi: 10.1038/ng769.
34. Zhang, Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9, 40, doi: 10.1186/1471-2105-9-40.
35. Caburet, S., et al. (2011) The transcription factor FOXL2: At the crossroads of ovarian physiology and pathology. Molecular and Cellular Endocrinology, 356(1-2), 55–64, doi: 10.1016/j.mce.2011.06.019.
36. Westerhoff et al. (2014) The expression of FOXL2 in pancreatic, hepatobiliary, and renal tumors with ovarian-type stroma. Human Pathology, 45(5), 1010–1014, doi: 10.1016/j.humpath.2013.12.015.
37. Al-Agha, O. M et al. (2011) FOXL2 is a sensitive and specific marker for sex cord-stromal tumors of the ovary. Am J Surg Pathol, 35, 484–494, doi: 10.1097/PAS.0b013e31820a406c.
38. Renehan, A. G et al. (2006) Obesity and cancer risk: the role of the insulin-IGF axis. Trends Endocrinol Metab, 17(8), 328–336, doi: 10.1016/j.tem.2006.08.006.
39. Sarkar, D. K et al. (2001) Role of estrogen in alcohol promotion of breast cancer and prolactinomas. Alcohol Clin Exp Res, 25(5), 230S–236S, doi: 10.1111/j.1530-0277.2001.tb02401.x.
40. Yang, J et al. (2015) The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12(1), 7–8, doi: 10.1038/nmeth.3213.
41. Pawade, J et al. (1993) Cystic hamartoma of the renal pelvis. Am J Surg Pathol., 17(11), 1169–1175.
42. Durham, J. R et al. (1993) Mesoblastic nephroma of adulthood: report of three cases. Am J Surg Pathol., 17(10), 1029–1038.
43. Jung, S. J et al. (2008) Mixed epithelial and stromal tumor of kidney with malignant transformation: report of two cases and review of literature. Hum Pathol., 39(3), 463–468, doi: 10.1016/j.humpath.2007.08.008.
44. Nakachi, Y et al. (1997) Nucleotide compositional constraints on genomes generate alanine-, glycine-, and proline-rich structures in transcription factors. Mol Biol Evol, 14(10), 1042–1049.
45. Svec, A et al. (2001) Malignant mixed epithelial and stromal tumor of the kidney. Virchows Arch., 439(5), 700–702.
46. Truong, L. D et al. (1998) Adult mesoblastic nephroma: expansion of the morphologic spectrum and review of literature. Am J Surg Pathol., 22(7), 827–839.
47. Pierson, C. R et al. (2001) Mixed epithelial and stromal tumor of the kidney lacks the genetic alterations of cellular congenital mesoblastic nephroma. Hum Pathol., 32(5), 513–520, doi: 10.1053/hupa.2001.24323.
48. Michal M, Syrucek M. (1998) Benign mixed epithelial and stromal tumor of the kidney. Pathol Res Pract., 194, 445–558.
49. Comperat E et al. (2005) Benign mixed epithelial and stromal tumor of the kidney (MEST) with cytogenetic alteration. Pathol Res Pract., 200, 865–867, doi: 10.1016/j.prp.2004.05.004.
50. Mohanty, S.K, Parwani, A.V. (2009) Mixed Epithelial and Stromal Tumors of the Kidney. Arch Pathol Lab Med, 133, 1483–1486.
51. Argani, P et al. (1998) Detection of the SYT-SSX chimeric RNA of synovial sarcoma in paraffin-embedded tissue and its application in problematic cases. Mod Pathol, 11, 65–71.
52. Eble, J. N et al. (2004) Pathology and Genetics of Tumours of the Urinary System and Male Genital Organs. Lyon, France: IARC Press.
53. Koyama, S et al. (2001) Primary synovial sarcoma of the kidney: report of a case confirmed by molecular detection of SYT-SSX2 fusion transcripts. Pathol Int., 51(5), 385–391, doi: 10.1046/j.1440-1827.2001.01203.x.
54. Yang J, Zhang Y (2015) I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43(W1), W174–W181, doi: 10.1093/nar/gkv342.
55. Lovell, S.C et al. (2002) Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins: Structure, Function & Genetics, 50(3), 437–450, doi: 10.1002/prot.10286.
56. Kelley, L. A et al. (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols, 10(6), 845–858, doi: 10.1038/nprot.2015.053.
57. Biegel, J. A et al. (1996) Narrowing the critical region for a rhabdoid tumor locus in 22q11. Genes Chromosomes Cancer, 16(2), 94–105, doi: 10.1002/(SICI)1098-2264(199606)16:2<94::AID-GCC3>3.0.CO;2-Y.
58. Chida, D et al. (2011) The role of glucocorticoids in pregnancy, parturition, lactation, and nurturing in melanocortin receptor 2-deficient mice. Endocrinology, 152(4), 1652–1660, doi: 10.1210/en.2010-0935.
59. Cocquet J et al. (2003) Of compositional biases and polyalanine runs in man. Genetics, 165(3), 1613–1617.
60. Nakagawa, T et al. (2004) Malignant mixed epithelial and stromal tumours of the kidney: a report of the first two cases with a fatal clinical outcome. Histopathology, 44(3), 302–304, doi: 10.1111/j.1365-2559.2004.01782.x.
61. Roy, A (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, 5(4), 725–738, doi: 10.1038/nprot.2010.5.
62. Uhlenhaut, N.H et al. (2009) Somatic sex reprogramming of adult ovaries to testes by FOXL2 ablation. Cell, 139(6), 1130–1142, doi: 10.1016/j.cell.2009.11.021.