Department of Chemical Engineering; Heart, Lung, Blood, and Vascular Medicine Institute; Division of Pulmonary, Allergy and Critical Care Medicine
Identifying potential mammalian heme/CO sensors through a bioinformatic analysis of the Per-Arnt-Sim (PAS) protein domain
JonPaul Plesniak (a), Matthew Dent (b), Jason Rose (b, c), Jesus Tejero (b, c)
(a) Chemical Engineering, (b) Heart, Lung, Blood, and Vascular Medicine Institute, (c) Division of Pulmonary, Allergy and Critical Care Medicine
JonPaul Plesniak is a junior chemical engineering major and bioengineering minor at the University of Pittsburgh. He has a passion for innovation and discovery and wants to pursue these interests with a career in product design and development.
Matt Dent is a postdoctoral researcher in Dr. Mark Gladwin’s lab in the Vascular Medicine Institute at the University of Pittsburgh. He earned his Ph.D. in chemistry at the University of Wisconsin-Madison in 2019, where he used biochemical techniques to characterize heme-mediated carbon monoxide sensing in microbial transcription factors. Matt’s current research interests include characterizing novel CO signaling pathways in humans.
Dr. Rose received his BSE in Biomedical Engineering at the University of Michigan in 2006, his MD at Wayne State University in 2010, and his MBA from Carnegie Mellon University in 2017. He completed his internal medicine residency at Duke University Medical Center and his pulmonary and critical care medicine fellowship at the University of Pittsburgh. He joined the University of Pittsburgh faculty in 2016 and is currently an Assistant Professor of Medicine. His research interests focus on discovering and developing new human therapeutics, particularly in medical countermeasures and inhalational respiratory diseases.
Dr. Tejero received his degree in Organic Chemistry at the University of Zaragoza, Spain in 1998 and earned his PhD in Biochemistry at the University of Zaragoza in 2004. He completed postdoctoral work at the Cleveland Clinic and the University of Pittsburgh, and moved to a faculty position at the University of Pittsburgh in 2011. Dr. Tejero is currently an Associate Professor in the Department of Pulmonary, Allergy and Critical Care Medicine.
Significance Statement
The role of carbon monoxide (CO) in human physiology is not well understood. While high concentrations of CO are highly toxic, recent evidence points toward the importance of endogenously produced CO in cellular signaling. Few molecular targets for CO have been characterized in eukaryotes. This study has identified potential mammalian heme/CO sensors for further analysis and characterization.
Category: Computational Research
Keywords: carbon monoxide, PAS domain, heme, phylogenetic analysis
Abstract
Carbon monoxide (CO) is a gaseous molecule commonly found in the environment and also produced endogenously by organisms as a product of heme degradation. A growing body of evidence suggests that CO functions as an important signaling molecule that exhibits cytoprotective and anti-inflammatory properties; however, the precise molecular targets of CO in mammals and other higher organisms are not well characterized. The Per-Arnt-Sim (PAS) domain is a ubiquitous protein fold, found in organisms ranging from bacteria to mammals, that often facilitates sensing of environmental stimuli. Several CO-sensing PAS domains that utilize the iron-containing cofactor heme have been characterized in bacteria; however, few such domains have been studied in higher organisms. The goal of this study is to identify potential heme-based sensors in mammals through bioinformatic analysis of PAS domain proteins. Both structural and sequence alignment tools were implemented to identify phylogenetic relationships between well-studied PAS proteins and uncharacterized PAS proteins. By establishing phylogenetic relationships between PAS proteins of known function and uncharacterized proteins, we identified the mammalian PAS proteins Pask, Kcnh1-8, Pde8A, Hif/Epas1, Sim, Pasd1, and NcoA as potential heme-based sensors.
1. Introduction
CO is produced in the atmosphere as a byproduct of incomplete combustion and is often referred to as the “silent killer” due to its toxicity and lack of color, smell, or taste. Exposure to high levels of CO negatively impacts oxygen transport and mitochondrial respiration, ultimately leading to hypoxia, tissue damage, and death [1]. However, CO is also produced endogenously as a byproduct of heme degradation, and low concentrations of endogenously produced CO play a significant role in maintaining homeostasis in humans [2]. CO has been implicated in regulating numerous physiological processes including vasodilation, mitochondrial function, ion-channel activity, and inflammation pathways [3]. At low concentrations, CO exhibits cytoprotective properties in acute lung injury animal models, demonstrating promise for CO inhalation for anti-inflammatory and therapeutic use [4]. Despite emerging roles in signaling and therapeutics, the specific molecular targets of CO have not been well characterized.
The PAS domain fold is a sequence of ~100 amino acid residues that functions as an input module able to sense gases like oxygen or CO, redox potential, light, and other environmental stimuli. PAS domain secondary structure consists of a series of antiparallel β sheets and α helices in a β-β-α-α-α-β-β-β arrangement (Figure 1) [5]. While the secondary structure is well-conserved, less than 20% average pairwise sequence identity exists between amino acids of PAS proteins. Given this low sequence similarity, sequence-based alignments and bioinformatic analyses prove difficult. Organisms from all kingdoms of life contain proteins bearing a PAS domain, and families of PAS proteins include serine/threonine kinases, chemoreceptors and photoreceptors, circadian clock proteins, voltage-gated ion channels, and regulators of hypoxia response [6]. Many of these proteins have multiple PAS domains, called PAS repeats, in which each PAS domain serves a different role, often environmental sensing or mediation of protein-protein interactions [7].
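The sequence identity figure quoted above can be made concrete with a small sketch. The Python below computes percent identity for every pair of rows in an already-aligned set of sequences and averages the result. The toy sequences, the function names, and the choice to skip columns in which either sequence has a gap are assumptions made for illustration; percent identity can be normalized in several ways, and the text does not state which convention underlies the reported value.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over columns where both aligned rows have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average percent identity over all unordered pairs of aligned rows."""
    pairs = list(combinations(aligned.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not taken from the PAS data set.
toy_alignment = {
    "seqA": "MKT-LLV",
    "seqB": "MRTALLV",
    "seqC": "MKS-LIV",
}
toy_mean_identity = mean_pairwise_identity(toy_alignment)
print(round(toy_mean_identity, 2))
```

Applied to a real PAS alignment, a low average from a calculation of this kind is what motivates pairing sequence alignment with structural comparison.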
Heme-based sensors have been identified in many prokaryotic proteins. The direct oxygen sensor protein from E. coli (DosP) has been identified as a heme-based O2-sensing phosphodiesterase [8]. Likewise, the FixL protein has been identified in bacteria as an oxygen sensor that regulates expression of nitrogen fixation genes in response to hypoxia [9]. The regulator of CO metabolism (RcoM) protein also employs heme to sense CO and regulates expression of genes required for CO metabolism in a variety of microorganisms [10]. Recent studies have broadened the scope of heme-based sensors to include mammalian proteins [6]. A growing body of evidence points to PAS domains involved in regulation of the circadian rhythm of mammals as heme-based sensors: NPAS2 and its homolog CLOCK have been shown to bind heme and may function as heme-based CO sensors [11]. Thus, a feedback mechanism potentially exists that is directly dependent on cellular CO concentration [12].
Figure 1: (top) Secondary structure map of the PAS domain. (bottom) Crystal structure of human CLOCK protein PAS domain (6QPJ) [11]. Structure visualized with the PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC).
Although several PAS domains have been characterized as heme-based sensors, we hypothesize that there may be heretofore unidentified heme-based sensors from the PAS family due to the large sequence diversity in this family. This study utilizes bioinformatics analyses to identify putative heme-based sensors in mammals that bear a PAS domain. Using pairwise sequence alignments, we identified regions of conservation between PAS proteins of known and unknown function and established phylogenetic relationships. To address the problem of low sequence identity between PAS domains, we validated these relationships using pairwise structural alignment of structurally characterized PAS domains. To identify putative heme-binding PAS domains that may act as CO sensors, we established phylogenetic connections between uncharacterized PAS proteins and proteins characterized as heme-based CO sensors.
2. Methods
2.1 Establishing a consensus sequence for PAS domains
645 PAS domain-containing proteins were identified by cross-referencing online databases (UniProt, InterPro and ProSite). Annotated PAS domain sequences were ~70 amino acid residues in length. To validate PAS domain sequence annotations, we reviewed available crystallographic data for PAS domain proteins and correlated sequences and secondary structural elements. The appropriate sequence length that fully encompasses the β-β-α-α-α-β-β-β secondary structure characteristic of PAS domains ranged from 100-150 amino acid residues. As a result, we manually expanded the 645 annotated PAS protein sequences to include all residues in the secondary PAS fold. We conducted this annotation process through visualization of crystallographic 3-D structures using the PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC).
2.2 Sequence Alignment
Given the lack of sequence identity between PAS domains, we subdivided the full group of 645 PAS domains into protein sequences from nonbacterial and bacterial organisms to facilitate sequence alignment. We further divided each of these groups based on preliminary sequence alignments, giving rise to 4 sub-groups in total. We used the ClustalW multiple sequence alignment algorithm to align groups of sequences [13]. We identified regions of conservation corresponding to secondary structural elements within each sub-group alignment and used these conserved regions to manually curate a combined alignment with all 645 proteins. From there, we generated a representative list of 219 proteins for easier visual analysis by removing homologs with identical amino acid sequences. Finally, we created a phylogenetic tree from this alignment using the MEGA-X software. MEGA-X implemented a maximum likelihood statistical method, a Jones-Taylor-Thornton substitution model, and uniform rates among sites [14].
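The reduction from the full set to a representative list by removing homologs with identical amino acid sequences can be sketched as a simple de-duplication step. The function name, the record layout of (name, sequence) tuples, and the rule of keeping the first record seen for each sequence are assumptions for illustration; the text does not specify which homolog was retained.

```python
def drop_identical_sequences(records):
    """Keep one record per distinct amino acid sequence.

    `records` is an iterable of (name, sequence) tuples. Gap characters are
    stripped and case is normalized before comparison, so two rows that differ
    only in alignment padding count as identical. The first record encountered
    for each sequence is kept (an assumption made for this sketch).
    """
    seen = set()
    representatives = []
    for name, seq in records:
        key = seq.replace("-", "").upper()
        if key in seen:
            continue  # identical homolog already represented
        seen.add(key)
        representatives.append((name, seq))
    return representatives


# Hypothetical toy records, not real PAS domain sequences.
toy_records = [
    ("protein1_human", "MKTLLV"),
    ("protein1_mouse", "MKTLLV"),    # identical to the human sequence
    ("protein2_human", "MRTALLV"),
    ("protein1_rat", "MKT-LLV"),     # identical once the gap is removed
]
toy_representatives = drop_identical_sequences(toy_records)
print([name for name, _ in toy_representatives])
```

Keeping the first occurrence makes the result depend on input order, so sorting records beforehand (for example, by organism) would make the selection reproducible.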
2.3 Structural Alignment
To validate pairwise sequence alignment, we carried out a pairwise structural analysis using available crystallographic data of 63 PAS domains (156 proteins from the representative list were not structurally characterized). We employed the DALI server to superimpose pairs of protein structures and measure distances between α carbons in pairwise alignments [15]. The DALI server generated a pairwise distance matrix, and Z-scores from pairwise alignments were used to generate a phylogenetic tree using the MEGA-X software [14].
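Tree-building methods that work from pairwise comparisons need a distance matrix, whereas a DALI Z-score is a similarity measure (higher means more similar). The text does not say how Z-scores were converted before tree construction, so the sketch below shows one possible transformation, labeled here as an assumption: each pair's Z-score is averaged across both directions, normalized by the smaller of the two self-comparison Z-scores, clamped to the range 0 to 1, and subtracted from 1. The function name and the toy matrix are hypothetical.

```python
def zscores_to_distances(z):
    """Convert a square matrix of pairwise Z-scores into a distance matrix.

    Assumed transformation (not taken from the source):
        d_ij = 1 - clamp(mean(z_ij, z_ji) / min(z_ii, z_jj), 0, 1)
    The diagonal of `z` must hold each structure's self-comparison Z-score.
    """
    n = len(z)
    if any(len(row) != n for row in z):
        raise ValueError("Z-score matrix must be square")
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            z_ij = 0.5 * (z[i][j] + z[j][i])  # symmetrize the two directions
            norm = min(z[i][i], z[j][j])
            similarity = z_ij / norm if norm > 0 else 0.0
            similarity = max(0.0, min(1.0, similarity))
            d[i][j] = d[j][i] = 1.0 - similarity
    return d


# Hypothetical Z-scores for three structures.
toy_z = [
    [20.0, 10.0, 2.0],
    [10.0, 25.0, 5.0],
    [2.0, 5.0, 16.0],
]
toy_distances = zscores_to_distances(toy_z)
for row in toy_distances:
    print(row)
```

The resulting symmetric matrix, with zeros on the diagonal, is in the general form that distance-based tree builders accept.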
2.4 Data Processing
We carried out a bootstrap analysis for the sequence-based tree to determine the statistical significance of each node [16]. The MEGA-X software randomly generated 1000 trees and output values correspond to the original tree nodes [14]. Bootstrap values indicate the percent of the 1000 randomly generated trees in which the arrangement from that node occurred. A higher bootstrap value corresponds with a stronger statistical support for the clade. Additionally, we qualitatively inspected sequence-based and structure-based phylogenetic trees to locate potential candidate proteins positioned near experimentally characterized heme-based sensors.
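The bootstrap values described above can be illustrated with a minimal sketch of the general technique, not of the MEGA-X implementation. It shows the two ingredients: resampling alignment columns with replacement to produce a pseudo-replicate alignment, and scoring each clade of the original tree by the percentage of replicate trees that contain it. Tree inference itself is omitted; each tree is represented as a set of clades, each clade being a frozenset of leaf names. All names and toy data are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Return a pseudo-replicate built by sampling columns with replacement."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_support(reference_clades, replicate_trees):
    """Percent of replicate trees that contain each reference clade."""
    n = len(replicate_trees)
    if n == 0:
        raise ValueError("need at least one replicate tree")
    return {
        clade: 100.0 * sum(clade in tree for tree in replicate_trees) / n
        for clade in reference_clades
    }


# Toy example: two clades in the original tree, four replicate trees.
AB, CD = frozenset("AB"), frozenset("CD")
AC, BD = frozenset("AC"), frozenset("BD")
toy_reference = [AB, CD]
toy_replicates = [{AB, CD}, {AB, CD}, {AB, BD}, {AC, BD}]
toy_support = bootstrap_support(toy_reference, toy_replicates)

toy_alignment = ["MKTLLV", "MRTALV", "MKSLIV"]
toy_replicate = resample_columns(toy_alignment, random.Random(0))
```

In this toy case the clade {A, B} appears in three of four replicates and {C, D} in two of four, so their support values are 75 and 50. With 1000 replicates, as in the analysis described here, the same calculation yields the percentages reported at each node.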
3. Results
To identify relationships between known heme-based sensors and uncharacterized PAS domain proteins, we compared PAS domain sequences of all known PAS domain-containing proteins. Figure 2 displays a phylogenetic tree based on pairwise sequence analysis, and proteins in the tree cluster into identifiable regions. Mammalian proteins cluster into two distinct groups that correspond to one of two PAS repeat domains. Some noteworthy exceptions to this trend are the potassium voltage channels Kcnh1-8, which cluster near the light sensors from plants, and Pask and Pde8A, which are positioned near the known bacterial heme-based sensors RcoM, DosP, and FixL. Bootstrap values are lowest for the large cluster mostly comprised of bacterial PAS domain proteins (bootstrap score < 50). This lack of distinct clustering reflects large sequence divergence amongst PAS proteins from a diverse set of microbial organisms. Regions of higher bootstrap values (bootstrap score > 50) around the mammalian PAS repeats, the light sensors, and some of the outer nodes of the bacterial regions are characteristic of higher sequence similarity. These clusters of high sequence similarity may reflect functional similarities, particularly in the mammalian repeat 1 cluster, where four PAS domain proteins have been identified that bind heme.
Figure 2. Representative sequence-based phylogenetic tree for PAS domains (219 proteins). Characterized heme-based sensors are highlighted in light red, and putative heme-based sensors are highlighted in blue. Bootstrap values for individual nodes are highlighted in purple (<50), cyan (50-75), or yellow (75-99).
Pairwise alignments of PAS domain protein structures reveal clustering similar to that observed in the phylogenetic tree derived from pairwise sequence alignments. Specifically, the structure-based phylogenetic tree displays clustering of two mammalian PAS repeats and a group comprised primarily of light sensors (Figure 3). Consistent with the sequence-based tree, several bacterial PAS proteins emerge without distinct phylogenetic clustering in the structure-based tree. The similarities in PAS domain clustering observed between structure- and sequence-based phylogenetic trees serve to validate our approach of identifying putative mammalian heme sensors from clusters of proteins with sequence homology and structural similarities.
Figure 3. Structure-based phylogenetic tree for PAS domains (63 proteins). Characterized heme-based sensors are highlighted in light red, and functional clusters are outlined.
4. Discussion
Analysis of sequence- and structure-based phylogenetic trees reveals the evolutionary origins and functional clustering of mammalian PAS domains. The mammalian proteins Pask, Kcnh1-8, and Pde8A do not cluster with other mammalian proteins and instead cluster near plant-based light sensors and bacterial PAS domains in the sequence-based tree. That these mammalian proteins cluster with lower organisms suggests that Pask, Kcnh1-8, and Pde8A may be the most ancient mammalian PAS domains. Most of the other mammalian PAS domains cluster into one of two groups of PAS repeats in both sequence- and structure-based trees. Many uncharacterized proteins in mammalian PAS repeat 1 fall near known heme-based sensors of the circadian rhythm, suggesting that some of these uncharacterized proteins may bind heme. Pde8A and Pask also lie near the known heme-based bacterial sensors DosP, FixL, and RcoM, suggesting that these mammalian proteins may also interact with heme.
Based on their positioning near known heme-binding PAS proteins, the following mammalian proteins may act as heme-based sensors: Pask, Kcnh1-8, Pde8A, Hif/Epas1, Sim, Pasd1, and NcoA. Excitingly, a recent study demonstrated that one of the Kcnh proteins, Kcnh7, exhibits heme-binding characteristics, supporting our predictive approach [17].
5. Conclusion
A growing body of research suggests that endogenously produced CO functions as a signaling molecule with cytoprotective and anti-inflammatory properties in mammals. Despite these well-documented effects, the molecular mechanisms by which CO acts as a signaling molecule are poorly understood. In this study, we used sequence and structural alignment techniques to establish phylogenetic relationships among proteins bearing a PAS domain. Several PAS proteins, found in organisms ranging from bacteria to humans, have been characterized as hemoprotein-based gas sensors. We identified a number of poorly characterized PAS proteins that exhibit sequence and structural similarities to known heme-binding gas sensors: Pask, Kcnh1-8, Pde8A, Hif/Epas1, Sim, Pasd1, and NcoA. In our ongoing research, we are expressing and purifying these soluble PAS domains to homogeneity and probing heme-binding properties using biochemical and spectroscopic techniques. For those proteins that exhibit heme-binding properties, we will probe CO-dependent changes in protein function in order to identify bona fide hemoprotein-based CO sensors. By identifying specific molecular targets of CO, we may be able to better characterize mammalian CO signaling pathways relevant to human health and disease.
6. Acknowledgments
This project was funded by the University of Pittsburgh Swanson School of Engineering and the Office of the Provost. Project opportunity was provided by the Summer Undergraduate Research Internship (SURI). The authors thank Dr. Mark Gladwin for providing his insight into the biological relevance of putative CO sensors and for use of his lab space.
7. References
[1] Rose, J. J., Wang, L., Xu, Q., McTiernan, C. F., Shiva, S., Tejero, J., and Gladwin, M. T. (2017) Carbon Monoxide Poisoning: Pathogenesis, Management, and Future Directions of Therapy. Am. J. Respir. Crit. Care Med. 195, 596-606
[2] Wu, L., and Wang, R. (2005) Carbon Monoxide: Endogenous Production, Physiological Functions, and Pharmacological Applications. Pharmacol. Rev. 57, 585-630
[3] Ryter, S. W., Ma, K. C., and Choi, A. M. K. (2018) Carbon monoxide in lung cell physiology and disease. American Journal of Physiology-Cell Physiology 314, C211-C227
[4] Motterlini, R., and Otterbein, L. E. (2010) The therapeutic potential of carbon monoxide. Nature Reviews Drug Discovery 9, 728-743
[5] Henry, J. T., and Crosson, S. (2011) Ligand-binding PAS domains in a genomic, cellular, and structural context. Annu. Rev. Microbiol. 65, 261-286
[6] McIntosh, B. E., Hogenesch, J. B., and Bradfield, C. A. (2010) Mammalian Per-Arnt-Sim Proteins in Environmental Adaptation. Annu. Rev. Physiol. 72, 625-645
[7] Ryter, S. W., Alam, J., and Choi, A. M. K. (2006) Heme Oxygenase-1/Carbon Monoxide: From Basic Science to Therapeutic Applications. Physiol. Rev. 86, 583-650
[8] Shimizu, T. (2013) The Heme-Based Oxygen-Sensor Phosphodiesterase Ec DOS (DosP): Structure-Function Relationships. Biosensors (Basel) 3, 211-237
[9] Gong, W., Hao, B., Mansy, S. S., Gonzalez, G., Gilles-Gonzalez, M. A., and Chan, M. K. (1998) Structure of a biological oxygen sensor: a new mechanism for heme-driven signal transduction. Proc Natl Acad Sci U S A 95, 15177-15182
[10] Kerby, R. L., Youn, H., and Roberts, G. P. (2008) RcoM: A New Single-Component Transcriptional Regulator of CO Metabolism in Bacteria. J. Bacteriol. 190, 3336-3343
[11] Freeman, S. L., Kwon, H., Portolano, N., Parkin, G., Venkatraman Girija, U., Basran, J., Fielding, A. J., Fairall, L., Svistunenko, D. A., Moody, P. C. E., Schwabe, J. W. R., Kyriacou, C. P., and Raven, E. L. (2019) Heme binding to human CLOCK affects interactions with the E-box. Proc Natl Acad Sci U S A 116, 19911-19916
[12] Airola, M. V., Du, J., Dawson, J. H., and Crane, B. R. (2010) Heme binding to the Mammalian circadian clock protein period 2 is nonspecific. Biochemistry 49, 4327-4338
[13] Thompson, J. D., Gibson, T. J., and Higgins, D. G. (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics Chapter 2, Unit 2.3
[14] Stecher, G., Tamura, K., and Kumar, S. (2020) Molecular Evolutionary Genetics Analysis (MEGA) for macOS. Mol. Biol. Evol. 37, 1237-1239
[15] Holm, L., and Rosenström, P. (2010) Dali server: conservation mapping in 3D. Nucleic Acids Res. 38, W545-W549
[16] Dopazo, J. (1994) Estimating errors and confidence intervals for branch lengths in phylogenetic trees by a bootstrap approach. J. Mol. Evol. 38, 300-304
[17] Burton, M. J., Cresser-Brown, J., Thomas, M., Portolano, N., Basran, J., Freeman, S. L., Kwon, H., Bottrill, A. R., Llansola-Portoles, M. J., Pascal, A. A., Jukes-Jones, R., Chernova, T., Schmid, R., Davies, N. W., Storey, N. M., Dorlet, P., Moody, P. C. E., Mitcheson, J. S., and Raven, E. L. (2020) Discovery of a heme-binding domain in a neuronal voltage-gated potassium channel. The Journal of Biological Chemistry 295, 13277-13286