UNIVERSITY OF CHICAGO CENTER FOR RESEARCH INFORMATICS
CRI
ANNUAL REPORT
2014
MESSAGE FROM THE OFFICE OF THE CRIO
We live in an age in which the need for informatics to support our research and clinical mission is greater than ever. As datasets grow exponentially and biomedical science becomes increasingly driven by data-intensive research, the Center for Research Informatics (CRI) grows ever more important as a resource for researchers. Like our peers on campus and nationally, the CRI operates in a resource-constrained environment, so it is very important that we hear from researchers about their priorities. I hope you will reach out to me with your best ideas.
Recently it seems that each week another company or organization suffers a data security breach, sometimes exposing the personal information of thousands to millions of people. The BSD Information Security Office was formed this year to improve our information security procedures and to educate the Division about best practices in information security. I urge all BSD units to take proactive advantage of this Office to protect data and systems and to adopt best practices.
I am very proud of the progress that we have made together since the CRI was launched in 2011. As Chief Research Informatics Officer, I am excited by the range of informatics and IT services that this Center is able to offer BSD researchers. The CRI is a wonderful resource that I urge you to explore if you are not already using it.
Robert L. Grossman, PhD Chief Research Informatics Officer, Biological Sciences Division Director, Center for Data Intensive Science Core Faculty and Senior Fellow, Institute for Genomics and Systems Biology Core Faculty and Senior Fellow, Computation Institute Professor of Medicine, Section of Genetic Medicine
ROBERT GROSSMAN, PHD CHIEF RESEARCH INFORMATICS OFFICER As Chief Research Informatics Officer, Bob guides informatics activities and initiatives across the BSD. His research group focuses on big data, biomedical informatics, data science, and cloud computing. He is also the PI for the Bionimbus Protected Data Cloud, an open-source cloud-based platform that allows researchers authorized by NIH to compute over human genomic data in a secure and compliant fashion, and the Director of the not-for-profit Open Cloud Consortium, which develops and operates cloud computing infrastructure for the research community. Bob served as the first Director of the CRI from August 2011 to April 2013.
CENTER FOR RESEARCH INFORMATICS
ANNUAL REPORT 2014 TABLE OF CONTENTS
WHO WE ARE (page 3)
THIS YEAR’S HIGHLIGHTS (page 17)
RESEARCH. POWERED BY THE CRI. (page 25)
CONNECTION TO CAMPUS (page 31)
LOOKING AHEAD (page 39)
APPENDIX (page 41)
LETTER FROM THE DIRECTOR
A year ago, I enthusiastically wrote about the CRI’s growth and emerging impact across the research enterprise. Our success is measured by the programs we enable, the collaborations we foster, and the grants and manuscripts we facilitate. By these metrics, the CRI has had an extremely successful year. This accomplishment is a result of our deep talent pool across several interrelated lines of operation, including high-performance computing, HIPAA-secure storage and backup, application development and support, research data warehouse services, and bioinformatics collaboration. I see all of these groups within the CRI working together every day to enable research throughout the BSD.
The Center for Research Informatics now finds itself at the right place and time to bring an unprecedented suite of tools and resources to researchers throughout the BSD. After an initial period of developing our infrastructure, talent, and knowledge, we are now able to offer collaboration, consulting, and services across a wide array of projects and initiatives. From the plant biologist requiring high-performance computing to the endocrinologist looking for clinical data for a retrospective study to the surgeon requiring a data-entry system to capture information about patients on a trial, the CRI is positioned as a key collaborator to help make these projects a success.
This report is a detailed accounting of our group, our successes, and our plans for the future. We hope you will enjoy learning about us and our offerings, and we look forward to working with you on many exciting projects.
Samuel Volchenboum, MD, PhD Director, Center for Research Informatics Associate CRIO, Biological Sciences Division Associate Director, Institute for Translational Medicine Assistant Professor of Pediatrics Fellow, Computation Institute
WHO WE ARE
Since 2011, the Center for Research Informatics has provided the University of Chicago Biological Sciences Division (BSD) with informatics resources, services, and expertise to enable world-class research. We work with BSD scientists through all stages of their projects, handling clinical, translational, and basic science data with the highest level of efficiency and security, while also collaborating on large-scale initiatives and creating opportunities for informatics education. From secure computing resources to clinical data to bioinformatics analysis, our work contributes to the forefront of informatics and health care innovation.
MISSION AND CORE VALUES
The CRI’s mission is to provide informatics resources and services to the BSD, to participate in clinical and biomedical research of the highest scientific merit, and to support and promote research and education in the field of informatics. Our mission, current activities, and plans for the future are based on and informed by the core values below. In all our work, from everyday provisioning of resources to large-scale collaborative projects, we prioritize these values of scientific progress, partnership, professionalism, security, accessibility, and innovation.
SCIENTIFIC PROGRESS
Our support leads to grants, publications, and prestige for the University. The work we do is scientifically rigorous and of the highest quality. We strive to be the informatics hub of the University and an asset in recruiting faculty and donors.
PARTNERSHIP
As collaborators, we build strong relationships with the researchers who use our services. We work hard to develop customized solutions for challenging problems. We contribute across the multiple phases of a research project, from acquiring data to storing, analyzing, and sharing it, to project and program management. We are deeply engaged in the research we support and proud of the scientific progress and improved health care that result from it.
PROFESSIONALISM
We provide a client experience of the highest quality. Across multiple service lines, users experience a seamless partnership with one primary point of contact. Our staff are experts in their fields. Our fees, requirements, and expected turnaround times are competitive and presented in a transparent way. Our communication with users is clear, consistent, and always professional.
SECURITY
Protecting sensitive data is our top priority and we uphold the highest standards for security and compliance. We consistently adhere to IT security policies, procedures, and best practices to ensure compliance with regulatory standards. Our IT infrastructure and services are the most secure on campus. We are proactive in our security measures, remaining vigilant and adaptable to stay ahead of threats.
ACCESSIBILITY
We are committed to matching researchers with appropriate services and then developing personalized solutions to meet specific needs. Faculty of all technical backgrounds can rely on us for assistance in obtaining the services they need. All BSD researchers are welcomed and treated equally, regardless of faculty rank, department, or other affiliations.
INNOVATION
We are engaged with our research community and committed to evolving along with it. We seek and respond to feedback from our users. We are growing with the ever-changing informatics landscape to continue meeting the needs of researchers. Our work represents the forefront of informatics research.
CRI Annual Report 2014
cri.uchicago.edu
CRI SERVICE LINES
Our administration, communications, and project management team
Since 2011, the CRI has been providing the BSD with a growing and improving selection of state-of-the-art technologies and services for working with research data. These secure, standards-compliant resources are open to all members of the BSD and their collaborators, and are used by faculty across a wide variety of departments. Our support for BSD research is organized into four primary service lines: custom applications development, bioinformatics services, IT resources, and the Clinical Research Data Warehouse.
APPLICATIONS DEVELOPMENT
The CRI Applications Development team, led by Brian Furner, develops and maintains a wide variety of custom applications for BSD research. These offerings are each unique and tailored to the researchers’ specific needs. This team’s work includes support for patient registries, multi-institution clinical research data networks, and unstructured text indexing and searching.
Also maintained by this team are the CRI’s data management solutions. The CRI operates and supports Velos eResearch, a clinical trials management system that integrates study administration and data management, and REDCap, a web-based application supporting data collection strategies for research studies with tools for building and managing online surveys and databases.
The CRI Applications Development team
APPLICATIONS DEVELOPMENT 2014 HIGHLIGHTS
Built a web-based patient registry application for the Pulmonary Hypertension Program, replacing an older system developed at Rush University and extending it with additional functionality.
Building a web-based consent and specimen tracking system for the GAIN project, a multi-institutional effort that will gather solid-tumor specimens from pediatric patients for genomic analysis.
Implemented a unified data request intake form for the Analytics Core that allows users from across the biomedical enterprise to submit requests and track their progress. The intake form feeds a CRI-hosted instance of IBM Rational Team Concert, customized specifically for the Analytics Core to enable a robust reportable workflow of requests as they travel through the data pipeline.
BIOINFORMATICS CORE
Under the leadership of Jorge Andrade, PhD, the Bioinformatics Core provides advanced bioinformatics services to BSD researchers. The team’s advanced bioinformaticians are all PhD scientists and act as co-authors and collaborators on research projects. The Core’s offerings include analysis of high-throughput genomic data using a variety of pipelines developed in-house (listed below), the building of custom pipelines to solve specific problems, grant writing assistance, and the self-service Galaxy analysis environment. They have also developed a popular program of monthly training sessions in bioinformatics tools and technologies. This year, they developed and held the CRI’s first multi-day training workshop (for more on the workshop, see page 34).
BIOINFORMATICS PIPELINES
• Illumina pipelines for RNA-Seq, ChIP-Seq, Exome Sequencing, Whole Genome Re-Sequencing, Consensus Genotyping, De-novo Assembly, and Somatic Mutation Detection for Tumor/Normal Pairs
• SOLiD pipelines for RNA-Seq, Whole Genome Re-Sequencing, ChIP-Seq, and De-novo Assembly
• One pipeline for Illumina and Affymetrix Expression Arrays
• One pipeline for Affymetrix and Exiqon miRNA Arrays
A detailed list of pipelines is available in Appendix C.
The CRI Bioinformatics Core
BIOINFORMATICS CORE IN 2014
116 requests received
76 projects completed
8+ papers published
Bioinformatics Core analysis clients receive not just a dataset but a fully developed report of their analysis results.
IT OPERATIONS AND INFRASTRUCTURE
Under the direction of Thorbjörn Axelsson, our IT Operations and Infrastructure group provides computing resources for BSD researchers, including the resources that support all of the CRI’s other activities. Offerings include a high-performance computing cluster for fast, advanced data processing and analysis; server hosting and virtualization; secure storage, backup, and archive resources for both groups and individuals; support and automation for Galaxy; and expert technical support. In addition, the team works closely with the BSD Information Security Office (ISO) to ensure that all the CRI’s work is held to the highest standards for information security. (For more information on the ISO, see page 24.)
Our IT Operations and Infrastructure team
OUR IT INFRASTRUCTURE
• High Performance Computing (HPC) cluster
    • 36 standard nodes (2.2 GHz, 2304 total cores, 256 GB RAM per node)
    • 2 large memory nodes (2.27 GHz, 80 total cores, 1 TB RAM per node)
    • 61 TB of shared high-performance storage space
• Centralized, automated, encrypted, and secure data backup
• Virtual Server Infrastructure to provision virtual servers on Linux and Windows platforms
• A 1.2-petabyte Isilon cluster for data storage
• Large Memory Linux Compute System
    • Processors: 2.4 GHz, 80 total cores
    • Memory: 1 TB of RAM
• Galaxy, a web-based portal for biomedical analysis that is integrated with the CRI’s HPC resources
CLINICAL RESEARCH DATA WAREHOUSE
The Clinical Research Data Warehouse (CRDW) contains University of Chicago medical data dating back to 2006, available for research. This team’s work, led by Tim Holper, includes maintenance of the CRDW, refreshes of existing data sources, integration of new data sources, and fulfillment of complex data requests for faculty researchers. In addition to maintaining i2b2, a cohort discovery interface that allows researchers to query the data in the CRDW, the team has also contributed to the process of developing a de-identified data portal to be released in 2015.
A LOOK INSIDE THE CRDW
749,621 patients
8,115,516 encounters
43,399,463 procedures
19,316,421 medications
134,156,124 labs
19,156,124 diagnoses
The Clinical Research Data Warehouse team
ORGANIZATION AND LEADERSHIP For a detailed list of CRI staff, please see Appendix A.
Robert Grossman, Chief Research Informatics Officer
Samuel Volchenboum, Associate CRIO and CRI Director
Brian Furner, Manager of Programming
Timothy Holper, Manager of CRDW Development and Operations
Michael Daus, Business Administrator
Seong Choi and Norm Paterson, Programmers
Julie Johnson, Healthcare Business Analyst
Tiffany Cyrus, Brian Leung, and Brad Orr, Project Managers
Julissa Acevedo, Business Systems Analyst
Stacie Landron, Programmer/Report Writer
Tomasz Oliwa, Scientific Software Engineer
Luis Maciel, Database Administrator
Michael Baltasi, Executive Administrator
Caitlin Pike, Communications Manager
Keith Danahey, Lead Programmer, 1200 Patients
Ishai Strauss, Programmer
Thomas Sutton, Sr. DBA/ETL Developer
Jorge Andrade, Director of Bioinformatics
Riyue Bao, Tzuni Garcia, Kyle Hernandez, Lei Huang, Sabah Kadri, Yan Li, and Chunling Zhang, Bioinformaticians
Wenjun Kang, Scientific Programmer
Thorbjörn Axelsson, Director of IT Operations and Infrastructure
Andy Brook, Beth Lynn Eicher, Michael Jarsulic, Sneha Jha, and Olumide Kehinde, Systems Administrators
SAMUEL VOLCHENBOUM, MD, PhD DIRECTOR / ASSOCIATE CRIO Sam has been a part of the CRI since May 2012 in his role as Associate Chief Research Informatics Officer, leading our faculty outreach and education efforts. In April 2013 he was appointed Director of the CRI and now leads our operations and strategic planning. In addition to his work in the CRI, Sam serves the Department of Pediatrics as Assistant Professor, is an Associate Director of the Institute for Translational Medicine, and is a Faculty Fellow in the Computation Institute. His research includes using proteomics to study neuroblastoma, a pediatric solid tumor; applying bioinformatics techniques to large clinical datasets; and creating tools to improve provider communication and patient care.
JORGE ANDRADE, PhD DIRECTOR OF BIOINFORMATICS As the technical director responsible for planning and oversight of the Bioinformatics Core, Jorge brings extensive training in bioinformatics as well as many years of experience applying these tools within the pharmaceutical industry. Most recently, he led a 70-person bioinformatics team at the Beijing Genomics Institute. Since joining the CRI in 2012, he has built an 8-person team of PhD scientists, all focused on delivering high-quality analyses of genomic and proteomic data. He has overseen the development and deployment of over ten industry-grade analysis pipelines, all running in the CRI high-performance computing environment. He has engaged in over 150 collaborations with over 75 researchers from the University of Chicago and elsewhere. The rigorous analysis provided by his group has become the substrate for numerous grant applications and peer-reviewed manuscripts.
THORBJÖRN AXELSSON DIRECTOR OF IT OPERATIONS AND INFRASTRUCTURE With over 15 years of experience in IT for research and higher education, Thorbjörn leads the CRI’s work of running, expanding, and improving our secure computing infrastructure. He works closely with the BSD Information Security Office to ensure that the CRI’s resources and services remain the most secure and compliant available. Prior to joining the CRI in 2014, Thorbjörn was Associate Director of Enterprise Infrastructure at the University of Kansas. He has a broad range of IT expertise, with a background that includes management, research support, IT infrastructure on all levels, security, project management, IT architecture, and software development. Thorbjörn has a Master’s degree in Computing Science from the University of Gothenburg, Sweden, and is particularly interested in research computing, IT architecture, and IT security.
MICHAEL BALTASI, PhD EXECUTIVE ADMINISTRATOR Michael joined the CRI as its Executive Administrator in March 2014, bringing an extensive background in financial analysis, corporate management, and higher education administration to the position. Michael is responsible for planning and oversight of our financial and administrative functions. He works closely with the rest of the leadership team to develop both short- and long-term organizational plans; oversees the CRI’s project portfolio; coordinates activities across service areas; and provides a full range of administrative, financial, and strategic support.
BRIAN FURNER MANAGER OF PROGRAMMING A graduate of the College, Brian has been a part of the University of Chicago community in various capacities since 1993 and with the CRI since its creation. After years of working on the systems administration side of IT within the BSD, Brian transitioned into software development in 2005 and has been focused on that ever since. Through his work in the Department of Medicine, Brian developed broad knowledge of clinical research data, clinical research applications, and the methods for effectively dealing with the complexities of this domain. Lessons learned in the course of this work proved vital as the CRI began development and operations in early 2011. As Manager of Applications Development, Brian oversees a team of developers who are responsible for providing custom software solutions to the BSD research community.
TIMOTHY HOLPER MANAGER OF CRDW OPERATIONS AND DEVELOPMENT Timothy leads the architecture, development, and operations of the Clinical Research Data Warehouse (CRDW) team. With advanced degrees in Computer Science as well as Social Science Research and Statistics, Timothy brings together data warehousing expertise and a deep understanding of research data. Prior to joining the CRI, Timothy developed research applications in the Department of Medicine at the University of Chicago, working on predictive statistical models and parallel processing algorithms. He developed social science databases for crime mapping in public housing and for program evaluations such as the City of Chicago’s Community Policing program. Through the efforts of the CRDW team, Timothy aims to help the CRI significantly increase the volume and quality of data available to researchers in the Biological Sciences Division.
THIS YEAR’S HIGHLIGHTS
This year was one of growth for the CRI. We expanded and improved many of the resources we offer to the BSD community, making it easier than ever to do complex research efficiently and securely. Our move to Kenwood Data Center and partnerships with other University groups positioned us for continued future growth, as we simultaneously streamlined our processes and upgraded our equipment to continue providing our current services at the highest possible level.
THE CRI CONTINUES TO GROW
As we continue to find ways to expand and improve the resources we offer to the BSD research community, our team has grown and changed as well. Since last year, we have welcomed many new staff to the CRI. Two of our leadership roles were filled by Michael Baltasi as Executive Administrator and Thorbjörn Axelsson as Director of IT Operations and Infrastructure. In addition, three bioinformaticians, four programmers, a senior project manager, and a healthcare business analyst have joined our team.
Center for Research Informatics Staff 2014
CRDW IMPROVEMENTS
In November 2013, we made the first of several changes to the CRDW request process when we introduced weekly office hours. Prior to submitting a data request, investigators first meet in person with CRDW staff. The CRI’s experts help researchers get to know the data types that are available and formulate a well-defined request. This also allows the CRI to produce a good-faith estimate of the time involved in fulfilling the request.
Office hours have helped streamline the request process, reduce unnecessary time spent waiting for clarifications, and make it easier for researchers to get the precise data they need each time. In addition, we have made the request process more open and transparent and reduced the average turnaround time for requests. As we introduced a chargeback policy for data requests this year, this transparency and reduced turnaround time, as well as core subsidy grants generously provided by the Institute for Translational Medicine (for details, see http://itm.uchicago.edu/funding/subsidy-awards/core-subsidies), helped us to keep CRDW data accessible to all researchers who need it.
As in past years, we have continued to expand the amount and types of data in the CRDW, making it an even more robust resource for clinical and translational research. Data from billing systems, including both facility and professional fees, have been added, as well as National Death Registry information. We’ve integrated more than 100,000 new patients and one million new encounters. In addition, beginning in the first quarter of 2015, CRDW datamart users will be able to access reports through Cognos, an IBM business intelligence tool.
THE CRDW IN 2014
141 data requests fulfilled
25+ publications enabled
20 departments served
THE CRDW REQUEST PROCESS
1. CRDW team meets with client at weekly office hours.
2. Client submits online data request form.
3. If PHI is requested, CRDW team verifies IRB protocol.
4. CRDW team creates estimate and statement of work.
5. Report writer creates SQL code and integrates data sources.
6. Sample dataset is delivered for client approval.
7. CRDW team conducts data verification and quality check.
8. Full dataset and invoice are delivered to client.
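The CRDW request process can be read as an ordered workflow with one conditional step: IRB protocol verification is needed only when a request includes PHI. The short Python sketch below models that reading. It assumes the eight steps run strictly in sequence in the order numbered in the process diagram; the names DataRequest, applicable_steps, and process are hypothetical and do not describe the CRDW team's actual tooling.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Step wording follows the CRDW request process diagram. The strict
# sequential ordering and every name in this sketch are assumptions.
STEPS: List[str] = [
    "CRDW team meets with client at weekly office hours.",
    "Client submits online data request form.",
    "If PHI is requested, CRDW team verifies IRB protocol.",
    "CRDW team creates estimate and statement of work.",
    "Report writer creates SQL code and integrates data sources.",
    "Sample dataset is delivered for client approval.",
    "CRDW team conducts data verification and quality check.",
    "Full dataset and invoice are delivered to client.",
]
IRB_STEP = 3  # hypothetical: 1-based number of the conditional IRB step


@dataclass
class DataRequest:
    """Hypothetical record for a single CRDW data request."""
    requester: str
    includes_phi: bool
    completed: List[int] = field(default_factory=list)


def applicable_steps(request: DataRequest) -> List[Tuple[int, str]]:
    """Return (number, description) pairs for the steps this request needs.

    IRB protocol verification applies only when PHI is requested.
    """
    return [
        (number, text)
        for number, text in enumerate(STEPS, start=1)
        if number != IRB_STEP or request.includes_phi
    ]


def process(request: DataRequest) -> DataRequest:
    """Walk the applicable steps in order, recording each as completed."""
    for number, _text in applicable_steps(request):
        request.completed.append(number)
    return request


if __name__ == "__main__":
    no_phi = process(DataRequest("example-lab", includes_phi=False))
    print(no_phi.completed)  # [1, 2, 4, 5, 6, 7, 8]
```

Under these assumptions, a request without PHI skips step 3, while a request with PHI passes through all eight steps.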
BIOINFORMATICS AND MOLECULAR PATHOLOGY
In June 2013, the Department of Pathology launched a new division dedicated to genomic and molecular pathology. This division comprises four laboratories, specializing in molecular diagnostics, cytogenetics, clinical genomics, and translational research. Part of the division’s mission is to provide a comprehensive genetic and genomic laboratory service for physicians and patients.
In order to provide these state-of-the-art Next-Generation Sequencing clinical testing services, the division has partnered with the CRI Bioinformatics Core. CRI bioinformatician Sabah Kadri, PhD, works full-time with the Department of Pathology to provide this complex genomic analysis. The CRI is excited to be a part of this important clinical and translational research resource.
DE-IDENTIFIED DATA PORTAL
The CRDW team has spent over a year developing our SEECohorts de-identified data portal, set to launch in spring 2015. Like the existing i2b2 system, SEECohorts will be a cohort discovery tool allowing users to query the data in the CRDW. However, i2b2 only returns cohort counts, and users must then submit a data request to receive any detailed patient data. The SEECohorts portal will take cohort discovery further by allowing users to see and interact with a de-identified version of the data they are seeking in a secure and compliant environment. This will enable researchers to study de-identified data without submitting a data request.
SEECohorts has been demonstrated to investigators from several departments and received an overwhelmingly positive response. Once released, this portal has the potential to greatly simplify cohort discovery, democratize data availability, and enhance clinical research throughout the University.
The SEECohorts interface makes it easy to explore de-identified data.
MOVE TO KENWOOD DATA CENTER
On February 10, 2014, the CRI completed the extensive process of moving all our resources from the outdated equipment in the Prudential Data Center to the state-of-the-art Kenwood Data Center on the University of Chicago campus. Led by then-Director of Systems and Security Plamen Martinov, this ambitious project set the goal of decommissioning or migrating every resource at Prudential while maintaining the integrity of all user data. Our team decommissioned 324 systems, moved 140 terabytes of research data, and migrated hundreds of users to Kenwood. They decommissioned three HPC clusters and donated 80 compute nodes to other departments. The project was completed on schedule with no loss of data.
Moving to Kenwood has allowed the CRI to improve the security, reliability, and recoverability of our systems through the modernization of data center services and standard architecture. Kenwood is equipped to house systems compliant with federal requirements such as those of the Health Insurance Portability and Accountability Act (HIPAA) and the Federal Information Security Management Act (FISMA). This makes the CRI’s systems a valuable resource for researchers who work with patient data or who have received grants that require compliance with these regulations.
Prudential after decommissioning
Servers prepared for recycling
Kenwood Data Center
EXPANDED COMPUTING RESOURCES
Moving to Kenwood also gave us the opportunity to continue expanding and improving the computing resources that we offer to the BSD community.
Most significantly, we expanded our HPC resources, bringing our overall HPC capacity to 2304 standard-memory CPU cores plus 80 high-memory CPU cores and making room for more researchers to analyze larger amounts of data more quickly and powerfully. In 2014, an average of 80 researchers per month used our HPC resources.
We also increased our data storage capacity from 700 terabytes to 1.2 petabytes (1,200 terabytes), allowing us to provide secure storage with encrypted backup and restore capabilities to more groups and individual users.
Finally, we established a vulnerability management program and achieved 100 percent coverage for all systems. The research data stored on CRI resources is backed up daily. All of these additions reflect our commitment to being the University’s most secure and advanced computing resource for research.
CRI STORAGE USAGE
142 groups
696 individuals
24 departments
658 TB data
114M files
BSD SECURITY INITIATIVE
Because the researchers we support often work with patient records and other very sensitive data, the CRI prioritizes information security in our computing infrastructure and operations. We have a strong record of achieving a secure, highly compliant environment while still working as efficiently as possible.
As CRI Director of Systems and Security, Plamen Martinov worked with CRIO Bob Grossman and the research informatics governance committees to spearhead an information security program that included encryption of laptops and mobile devices, consistent cybersecurity policies, and attendant monitoring and auditing.
In August 2014, recognizing the successes of the CRI and the value of a centralized and consistent approach to information security across the Division, the BSD created the Information Security Office (ISO) within the Office of the CRIO. Plamen was appointed inaugural Director of BSD Information Security.
The ISO now offers a comprehensive set of information security services to BSD departments, including risk management and compliance, security and risk consulting, policies and standards, vulnerability management, incident response, security monitoring, firewall management, and security awareness and training. The CRI will continue to work closely with the ISO on security initiatives and to hold our own work to the highest standard of security and compliance.
For more information on the ISO, visit crio.uchicago.edu/security.
RESEARCH. POWERED BY THE CRI.
In the past year, the CRI significantly expanded the scope of our work to become involved in major, collaborative projects that are changing the face of health care, childhood development, clinical research, and more. In our work with these initiatives, we are leveraging the same state-of-the-art computing resources and expertise in custom informatics programming, large-scale data analysis, and data sharing that we already use to enable important BSD research. Our commitment to the projects outlined in this section takes the impact of our work far beyond the University as we contribute to creating opportunities for collaborative research; making health care more personalized, effective, affordable, and accessible; and improving lives.
1200 PATIENTS/GENOMIC PRESCRIBING SYSTEM Jointly sponsored by the CRI and the Cen-
relational database along with curated phar-
ter for Personalized Therapeutics, the 1200
macogenomic data from published studies.
Patients project seeks to develop a new
Through the Genomic Prescribing System
medical system model for personalized care,
(GPS), a research web portal, participating
in which the genetic profile of a patient can
care providers can then access information
be incorporated into treatment decisions.
to help them predict how a patient may
Led by University of Chicago physicians Dr.
respond to a given medication.
Mark Ratain and Dr. Peter O’Donnell, business lead and venture partner Ken Bradley, and CRI lead programmer Keith Danahey, the
Rather than delivering raw genotype information, the GPS provides a patient-specific
project has been underway since 2011.
interpretation of the genomic data for a
Patients who agree to participate in the
This information is distilled into a summary
study are genotyped in a CLIA-certified lab.
designed to be understood in 30 seconds
Their genetic information is then stored in a
or less, so that providers can easily use
group of commonly prescribed medications.
The GPS uses a simple stoplight system to illustrate information for providers.
26
CRI Annual Report 2014
RESEARCH. POWERED BY THE CRI.
Beyond the stoplight system, the GPS offers providers brief summaries of patient-specific pharmacogenomic data for each drug.
the tool during clinic visits. Physicians can
the leadership of Keith Danahey, our 1200
use this knowledge to inform their choices
Patients team provides custom program-
in prescribing a medication—for example,
ming, database design, data stewardship, and
pre-identifying patients at higher risk for
data analysis, among other essential work.
severe side effects or predicting when a
Keith’s team was joined by programmer Ishai
patient may need alternative dosing.
Strauss in fall 2014, and together they design
In 2014, the University of Chicago Innovation Fund awarded a $100,000 investment for further development of the GPS tool. The team plans to use this funding to continue developing a more robust version of the tool and to further validate the system in hospital and health care environments outside the University.
powerful yet simple software solutions for both physicians and researchers. In addition, they have undertaken complex data analysis that has led to exciting developments in the field of pharmacogenomics—the study of the effect of genetic variation on drug response or toxicity. The team’s work was accepted for presen-
The GPS has been designed and maintained by the CRI since its inception. Under
tation at the AMIA 2015 Joint Summits on Translational Science in San Francisco, further underscoring the impact of this project.
cri.uchicago.edu
THIRTY MILLION WORDS
The Thirty Million Words (TMW) initiative, directed by Dr. Dana Suskind, is an innovative, evidence-based intervention program designed to help narrow the language gap between children from lower-income families and those in wealthier households. Studies have demonstrated that the number of words a child is exposed to before the age of four is significantly correlated with the child’s eventual IQ and academic outcomes. Furthermore, this early language exposure is correlated with income: children from lower-income households hear, on average, about thirty million fewer words than their peers from more affluent homes during this critical developmental period, leaving them less likely to achieve academic success.
Dr. Suskind created TMW in 2010 to address this gap by bringing awareness to the importance of spoken language in early childhood development and giving parents the tools and knowledge to enrich their children’s home language environment. TMW combines education, technology, and behavioral strategies in an interactive multimedia curriculum for parents. Home visits from coaches, animations that make the underlying science of the project accessible, and videos teaching easy-to-follow strategies lay the foundation for parents to enhance their linguistic interaction with their children. Quantitative feedback gleaned from weekly recordings of the home language environment is provided to parents to help them monitor progress and set goals.
Recent preliminary trials have shown that parents and caregivers who received this quantitative linguistic feedback spoke and interacted more with their children. In April 2014, TMW was selected for a PNC Foundation multi-year grant that will support a larger-scale, five-year longitudinal study of the program’s impact on vocabulary development and school readiness in 200 to 250 children. TMW will soon be implemented at the community level with a center-based approach that includes daycare facilities, with the long-term goal of reaching parents and caregivers at the citywide level and beyond.
TMW’s curriculum and the scientific evaluation of its results require a significant amount of computing power and data storage. The CRI is proud to partner with TMW to fill these needs. For the first phase of the project, our IT Operations and Infrastructure team has provisioned a set of virtual machines to host TMW’s software in the Kenwood Data Center. Over the next two years, the CRI’s application development experts will take the lead role in creating a suite of applications for Dr. Suskind and the TMW team. We look forward to contributing our resources and expertise to this innovative project.
CAPriCORN
Chicago Area Patient Centered Outcomes Research Network
In December 2013, a $7 million federal grant was awarded by the Patient-Centered Outcomes Research Institute (PCORI) to a coalition of twenty Illinois health and hospital organizations, including the University of Chicago Medicine. That coalition, the Chicago Area Patient Centered Outcomes Research Network (CAPriCORN), is now part of a nationwide network working to reduce health disparities among diverse populations of patients and develop better models for health care delivery.
With the PCORI grant, CAPriCORN was tasked with developing a cross-institutional data infrastructure capable of pooling and sharing electronic health record and outcomes data from more than one million Chicago-area patients, including many high-risk patients. This pool of information will contribute to research about how providers in complex urban settings can overcome barriers to effective treatment, improve health outcomes, and drive down the costs of health care for both common and rare conditions. CAPriCORN’s network will focus specifically on sickle cell disease, anemia, asthma, recurrent Clostridium difficile, diabetes, and obesity.
In building this network, CAPriCORN’s mission is to provide an informatics infrastructure that will support collaborative, patient-centered outcomes research in the Chicago area. The research this network makes possible will help care providers to overcome the barriers of fragmentation and limited resources, identify and fill gaps in coverage through new partnerships, and improve health care delivery and patient outcomes both locally and nationwide.
Building and operating this data network is a major undertaking, requiring a robust computing infrastructure, the creation of procedures for the standardization of data types, and the utmost attention to maintaining patient privacy. Because of the CRI’s successes in building and maintaining the Clinical Research Data Warehouse, the CRI was naturally positioned to play a key role in UCM’s contribution to winning the PCORI grant and the development of the CAPriCORN project. Brian Furner, CRI’s Manager of Applications Development, has been the University’s informatics lead on this high-profile project, providing support at both institutional and network levels through his participation in the Informatics Work Group, the Data Model and Data Standards Committee, and several cohort working groups.
The CRI has provided the local computational infrastructure for the project, including the data in the CRDW and Clarity (UCM’s billing system), the code that transforms and loads the data into the CAPriCORN Common Data Model, and the servers housing the datamart and the PopMedNet client used in the project. Through the CRI’s efforts, the University of Chicago has been a consistent leader in implementing the informatics components of this important project.
SHRINE
Shared Health Research Information Network
The CRI-managed instance of i2b2 allows researchers to query the Clinical Research Data Warehouse to explore available data and identify potential research cohorts. The Chicago-area Shared Health Research Information Network (SHRINE) pilot program builds on this capability and expands it across a wider array of data sources.
SHRINE brings together the CTSA and informatics groups at three local research institutions: Northwestern University, the University of Illinois at Chicago, and the University of Chicago. All three institutions currently have or will soon implement both i2b2 and VIVO, a web application that allows users to search for researchers by various criteria across a network of participating organizations.
The SHRINE project leverages this existing infrastructure to create a common tool that can query data repositories at all participating institutions. Researchers will be able to specify inclusion and exclusion criteria, including demographics, diagnoses, and medications, and receive counts of patients meeting their criteria not just from their own institution, but from all hospitals and health care programs participating in the network. Similar to i2b2, this tool will protect patient privacy by returning only aggregate counts.
The cohort counts returned by i2b2 queries enable investigators to generate better-informed hypotheses, identify potential cohorts for clinical trials, and develop stronger grant applications. Combining the research data from multiple institutions will not only enhance these existing benefits but will also encourage collaboration across institutions and enable the planning of research that requires large sample sizes not easily available at individual locations, including research in population health and health services.
The Chicago-area SHRINE is seen as a key component in the CTSA grant renewal process and will be an important part of a growing federated research portfolio that will give the University of Chicago a competitive advantage among its peers in procuring grant funding. Through our partnership with the Institute for Translational Medicine, the CRI is playing a key role in developing this local network of de-identified patient data.
Building on our previous efforts in deploying i2b2 to our research community, CRI staff are providing technical, regulatory, and research guidance on the project. With the initial proof-of-technology phase of the project successfully completed, we are now focused on the next phase, which will include sharing of cohort counts among the member institutions.
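The count-only query model that SHRINE shares with i2b2 can be illustrated with a minimal Python sketch. It assumes that each site evaluates the researcher’s inclusion and exclusion criteria against its own de-identified records and releases nothing but an integer. The record fields, site names, and the functions site_count and federated_count are hypothetical illustrations; they are not the actual i2b2 or SHRINE software, data model, or APIs, and a production system would need further safeguards that this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


# Hypothetical de-identified record held at one site; the fields are
# illustrative only and are not the i2b2 or SHRINE data model.
@dataclass(frozen=True)
class PatientRecord:
    age: int
    diagnoses: FrozenSet[str]
    medications: FrozenSet[str]


# A criterion is any yes/no test applied to a single record.
Criterion = Callable[[PatientRecord], bool]


def site_count(records: List[PatientRecord],
               include: List[Criterion],
               exclude: List[Criterion]) -> int:
    """Evaluate the cohort definition locally and release only an integer count."""
    return sum(
        1
        for r in records
        if all(c(r) for c in include) and not any(c(r) for c in exclude)
    )


def federated_count(sites: Dict[str, List[PatientRecord]],
                    include: List[Criterion],
                    exclude: List[Criterion]) -> Dict[str, int]:
    """Ask every site for its count; no patient-level rows leave any site."""
    return {name: site_count(records, include, exclude)
            for name, records in sites.items()}


# Toy data for two hypothetical sites.
sites = {
    "site_a": [
        PatientRecord(45, frozenset({"asthma"}), frozenset({"albuterol"})),
        PatientRecord(70, frozenset({"diabetes"}), frozenset({"metformin"})),
    ],
    "site_b": [
        PatientRecord(30, frozenset({"asthma"}), frozenset()),
        PatientRecord(52, frozenset({"asthma", "obesity"}), frozenset({"albuterol"})),
    ],
}
include = [lambda r: "asthma" in r.diagnoses, lambda r: r.age >= 18]  # adults with asthma
exclude = [lambda r: "obesity" in r.diagnoses]                        # minus obesity
counts = federated_count(sites, include, exclude)
print(counts, "total:", sum(counts.values()))
# prints: {'site_a': 1, 'site_b': 1} total: 2
```

Because only the per-site integers cross institutional boundaries in this sketch, a researcher can gauge cohort size across the network without any site disclosing patient-level data.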
CONNECTION TO CAMPUS
In addition to supporting BSD scientists and taking part in innovative research, a third and equally important part of the CRI’s mission is to contribute to an education and training program that will help create the next generation of informatics experts. Our goal is for researchers to be comfortable and confident users of both our resources and other technologies for biological computing. By providing educational opportunities to the University of Chicago community, most of them free of charge, we enable users to take this knowledge back to their own departments and labs, enhancing their ability to conduct meaningful, advanced informatics research and raising the profile of the University in the larger informatics community.
BIOINFORMATICS TRAINING
For over two years, the CRI Bioinformatics Core has offered a free monthly training seminar to the BSD community, covering a variety of bioinformatics tools and techniques. Topics have ranged from the Linux command line to R and Bioconductor to integrating the CRI’s high-performance computing resources into bioinformatics analyses. Since these seminars began in May 2012, they have attracted more than 800 total participants, routinely filling every available seat.
Topics covered this year included a three-part series on R programming, a two-part series on Python programming, an introduction to the Linux command line, analysis of several types of Illumina data, and an overview of how to use the CRI’s computational infrastructure for bioinformatics analysis. In March, guest speakers Natalia Maltsev, MD, PhD, and Dina Sulakhe, MS, demonstrated the Lynx integrated systems biology platform currently being developed in the Human Genetics Department and Computation Institute.
Survey feedback showed that over 98 percent of participants were satisfied with their session, and 98 percent of participants rated the CRI bioinformaticians leading the sessions as “very knowledgeable” in the subjects they were teaching.
Jorge Andrade leads a seminar on the Linux command line
COMMENTS FROM PARTICIPANTS
“This is a tough area to fully cover in three hours but the team did a phenomenal job. Kevin, Jorge, and Olumide were great at making it easy for the layperson to understand. […] I am looking forward to attending regular CRI sessions.” (January)
“I found it remarkably helpful and was exceptionally pleased with being able to attend. I found it useful not only to understand the ‘how’ part of the analysis pipeline, but also the ‘why.’” (February)
“I learned a lot, and this will be very helpful for my own work.” (April)
“This was incredibly informative and the instructors answered all the students’ questions. I am very impressed with this course and feel like I learned a lot today.” (May)
“Very knowledgeable instructor and giving lots of helpful hints.” (July)
“Very clear presentation and intro to the subject.” (September)
WORKSHOP LEARNING SERIES
Based on the success of our monthly bioinformatics training seminars, this year we launched the CRI Workshop Learning Series. By hosting multi-day educational workshops open to researchers and industry professionals from around the United States, we seek to bring educational opportunities in informatics to a larger audience of scientists and students while encouraging collaboration and community-building in informatics research.
Our first workshop was held in December 2014 and was dedicated to the bioinformatics analysis of high-throughput genomic data. This four-day workshop focused on how to make the most of the latest technologies and tools for working with large and complex datasets, including both in-depth practical theory and hands-on training. Instructors included CRI bioinformaticians and distinguished guest speakers from the University of Chicago community and beyond, with sessions covering commonly used bioinformatics tools, genomics data visualization, analysis workflows, R programming, high-performance computing, and more. The workshop also included several social events to encourage discussion and collaboration among participants.
Interest in the workshop was high, with 45 applications received. After an academic review, 33 applicants were accepted and attended the workshop. Participants were primarily graduate students and postdoctoral fellows, along with several staff and faculty. Through a partnership with the Committee on Clinical and Translational Science (see next page), 12 participants received course credit for completing the workshop. Survey feedback after the workshop demonstrated that the vast majority of participants found the material very useful and the quality of instruction highly satisfactory.
For more information on the Workshop Learning Series, please visit learn.cri.uchicago.edu.
COMMENTS FROM PARTICIPANTS
“I was pleased with the wide range of information that was covered during the workshop, and the diversity of skills I got to practice during these four days.”
“Despite having background in the content covered, I found it a very good refresher on things I haven’t thought about in a while.”
“Even after speaking for ~3 hours, the instructor was not boring at all. […] I heard someone say in the lunch line how this tutor spoke with clarity and command.”
“This course was appropriately geared towards researchers who use high-throughput sequencing in their work, regardless of their knowledge level, which I was quite impressed by. […] I am grateful that the CRI held this course and I think the speakers and instructors did an excellent job guiding us through the process.”
CCTS INFORMATICS COURSES
Further reflecting our commitment to informatics education, CRI Director Sam Volchenboum is an instructor for informatics courses offered through the Committee on Clinical and Translational Science (CCTS). The CCTS, a freestanding academic unit within the BSD, is organized by the Center for Health and the Social Sciences (CHeSS) and the Institute for Translational Medicine (ITM) with the goal of enhancing multidisciplinary training in clinical and translational science.
Sam co-led the development of the Spring 2014 course “Introduction to Clinical Research Informatics,” an introductory survey of the fundamentals of information technology as applied to health care. Sam and co-instructors Sameer Badlani, MD, and David McClintock, MD, taught a curriculum tailored to postdoctoral fellows, residents, and faculty. They focused on technology’s impact on patients, providers, and hospitals, including the topics of decision support, system integration, educational applications, emerging technologies, and security and compliance.
Other courses to which Sam will contribute are currently in development, including a clinical research methods course to be offered in 2015.
With joint input from CHeSS and the ITM, the CCTS works to create new course offerings in clinical and translational science. Areas of concentration include:
• Comparative Effectiveness Research
• Translational Informatics
• Health Services Research
• Quality and Safety
• Clinical Research
• Community-Based Research
• Global Health
• Pharmacogenomics
For more information about the CCTS and a list of current course offerings, visit chess.bsd.uchicago.edu/training/ccts.html.
REDCAP TRAINING
The University of Chicago has been part of the REDCap consortium since 2010. REDCap is a web-based application developed by Vanderbilt University that supports data collection strategies for research studies with tools for building online surveys and databases. The University of Chicago’s REDCap now supports more than 1200 users from across the BSD and houses over one thousand projects.
The CRI performs continual upgrades to REDCap to improve the user experience and offer the latest software features. A major upgrade was completed in November 2014. In addition to operating this valuable resource, the CRI is committed to fostering a community of capable, engaged REDCap users by providing training and educational opportunities. We offer online video tutorials and PDF guides on our REDCap website, available at cri.uchicago.edu/redcap. In addition, Julissa Acevedo, CRI Business Systems Analyst and resident REDCap expert, provides consultations, demonstrations, and individual and small-group training sessions for those in need of more personalized guidance. As we continue to upgrade REDCap, Julissa provides new feature demonstrations and updated training documentation on our website with each upgrade.
GROWTH IN REDCAP PROJECTS, 2013-14 (chart: number of REDCap projects by month, January 2013 to December 2014)
OUR PARTNERS
The CRI is proud to work alongside our campus partners in advancing biomedical and informatics research. Our work is partially made possible by a generous investment from the Institute for Translational Medicine (ITM). More information about the ITM is available at itm.uchicago.edu.
Other partners with whom we are proud to collaborate include:
Biological Sciences Division
bsd.uchicago.edu
Biostatistics Core
biotime.uchicago.edu
Chicago Biomedicine Information Systems
help.bsd.uchicago.edu
Center for Health and the Social Sciences
chess.bsd.uchicago.edu
Comprehensive Cancer Center
cancer.uchicago.edu
Computation Institute
ci.uchicago.edu
Genomics Core
genomics.uchicago.edu
Human Imaging Research Office
hiro.bsd.uchicago.edu
Institute for Genomics and Systems Biology
igsb.anl.gov
Institutional Review Board
humansubjects.uchicago.edu
IT Services
itservices.uchicago.edu
Office of Clinical Research
bsdocr.bsd.uchicago.edu
University of Chicago Medicine
uchospitals.edu
UCM Center for Quality
uchospitals.edu/visitor/quality
LOOKING AHEAD
The past year has been the CRI’s most innovative and prolific thus far. Continuing this trend, 2015 will see us play expanding roles in major, multifaceted projects already underway, as well as embark on new ones. In addition to continuing our work on the projects profiled in this report, we will serve as lead developer for the Harvard-led pediatric GAIN Consortium, for which we will build a multi-institutional specimen tracker and database. We will also provide bioinformatics and IT support for the $10M Transdisciplinary Center for Prematurity Research grant awarded by the March of Dimes Foundation in December 2014, and we will serve as technical lead on a redesign of the International Neuroblastoma Risk Group database, which will include expanding the size and scope of data feeds, standardizing clinical information, and building new querying and visualization tools for researchers and clinicians.
In addition to our work on these high-profile projects, we will continue improving and upgrading the resources we offer to BSD researchers by transitioning to new computing and storage environments. Our data warehouse team is in the process of migrating its operations to the exceptionally powerful IBM Netezza data warehouse appliance and advanced analytics applications, and our current high-performance computing cluster will have its hardware refreshed this year to double its processing ability. As our reputation for providing exceptional technical solutions for research continues to grow, we are participating as partners in projects of increasing scope and complexity. To meet these demands, we continue to recruit expert programmers, data scientists, bioinformaticians, and other IT professionals to our team. Our growth both reflects and contributes to the robust research environment of the BSD. Over the next year, we look forward to pushing the limits of technology in our support of the biomedical sciences at the University of Chicago.
The CRI leadership team
APPENDIX
APPENDIX A: CRI USAGE BY THE NUMBERS
2014 CRDW DATA REQUESTS BY DEPARTMENT
2014 BIOINFORMATICS CORE PROJECTS BY DEPARTMENT
GROWTH IN BIOINFORMATICS CORE USAGE, 2012-2014
2014 CRI STORAGE USERS BY DEPARTMENT
2014 CRI HIGH-PERFORMANCE COMPUTING USAGE BY DEPARTMENT
CPU HOURS
NUMBER OF JOBS
2014 REDCAP USERS BY DEPARTMENT
OVERALL CRI IMPACT IN THE BSD
APPENDIX B: CRI STAFF LIST
ADMINISTRATION
Samuel Volchenboum, MD, PhD
Director and Associate CRIO
Michael Baltasi, PhD
Executive Administrator
Tiffany Cyrus, MBA
Project Manager
Michael Daus
Business Administrator
Brian Leung
Senior Project Manager
Brad Orr, MS
Senior Project Manager
Caitlin Pike
Communications Manager
BIOINFORMATICS CORE
Jorge Andrade, PhD
Director of Bioinformatics
Riyue Bao, PhD
Bioinformatician
Tzuni Garcia, PhD
Bioinformatician
Kyle Hernandez, PhD
Bioinformatician
Lei Huang, PhD
Bioinformatician
Sabah Kadri, PhD
Bioinformatician
Wenjun Kang, MS
Scientific Programmer
Yan Li, PhD
Bioinformatician
Chunling Zhang, MS
Bioinformatician
IT OPERATIONS AND INFRASTRUCTURE
Thorbjörn Axelsson, MS
Director of IT Operations and Infrastructure
Andy Brook
Senior Systems Administrator
Beth Lynn Eicher
Senior Systems Administrator
Michael Jarsulic
Senior Systems Administrator
Sneha Jha, MS
Intermediate Systems Administrator
Olumide Kehinde
Lead Systems Administrator
APPLICATIONS DEVELOPMENT
Brian Furner
Manager of Programming
Julissa Acevedo
Business Systems Analyst
Seong Choi
Programmer
Tomasz Oliwa, PhD
Scientific Software Engineer
Norm Paterson
Programmer
1200 PATIENTS
Keith Danahey, MS
Lead Application Developer
Ishai Strauss
Programmer
CLINICAL RESEARCH DATA WAREHOUSE
Timothy Holper, MS, MA
Manager of CRDW Development and Operations
Julie Johnson, MPH, RN
Healthcare Business Analyst
Stacie Landron, MS, RN
Programmer/Report Writer
Luis Maciel
Senior Systems Analyst/DBA
Thomas Sutton, MS
Senior DBA/ETL Developer
APPENDIX C: BIOINFORMATICS PIPELINES
ILLUMINA
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation
Exome Sequencing: Raw Data QC, Pre-processing, Mapping with three different tools, Realignment and Quality Recalibration, Multiple Samples Variant Calling, Variant Annotation, Variant Comparison, Filtration, and Summarization
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
Consensus Genotyping Pipeline: Genotyping, SNP Detection, and InDel Detection using three different methods (SAMtools, GATK, and Atlas-2), comparison of variant calls, list of consensus call variants, and list of method-specific calls
De-novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
Somatic Mutation Detection for Tumor/Normal Pairs: Raw Data QC, Pre-processing, Mapping with two different tools, Realignment and Quality Recalibration, Somatic Mutation Detection with four different tools, Variant Annotation, and Summarization
SOLiD
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation
De-novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
ILLUMINA AND AFFYMETRIX EXPRESSION ARRAYS
Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis
AFFYMETRIX AND EXIQON miRNA ARRAYS
Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, miRNA Target Gene Prediction, Functional Annotation, and Pathway Enrichment Analysis
APPENDIX D: RESEARCH INFORMATICS GOVERNANCE AND OVERSIGHT
A governance structure set up by the Office of the CRIO guides research informatics across the entire BSD, ensuring that informed long-term decisions for the Division are reached in a transparent and accountable way. The five committees of the Research Informatics Governance Structure bring together senior BSD and University of Chicago Medicine (UCM) leadership; information systems experts; patient privacy experts; and faculty representing basic science, clinical research, and translational research. Decisions from these committees guide the CRIO in establishing policies and procedures, prioritizing new initiatives, safeguarding patient information, and complying with BSD policies and applicable federal and state laws.
In addition to the faculty representation on governance committees, an Informatics Oversight Committee is in place to ensure that research informatics activities and future plans are in line with the needs of research faculty. This committee, made up of faculty leaders representing both basic science and clinical departments, reports to the Research Advisory Committee, a BSD/UCM committee that serves as a key advisory body for the Dean for Research and Graduate Education. Membership lists for all research informatics governance and oversight committees are available at crio.uchicago.edu/governance.
APPENDIX E: PUBLICATIONS
Below is a selection of recent publications made possible in part by the CRI’s research resources.
BIOINFORMATICS CORE
Lowry DB, Hernandez K, Taylor SH, et al. The genetics of divergence and reproductive isolation between ecotypes of Panicum hallii. New Phytol. 2015 Jan;205(1):402-14. doi: 10.1111/nph.13027. Epub 2014 Sep 23.
Malcom JW, Hernandez KM, Likos R, Wayne T, Leibold MA, Juenger TE. Extensive cross-environment fitness variation lies along few axes of genetic variation in the model alga, Chlamydomonas reinhardtii. New Phytol. 2015 Jan;205(2):841-51. doi: 10.1111/nph.13063. Epub 2014 Sep 29.
Bao R, Huang L, Andrade J, et al. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Informatics. 2014 Sep 21;13(Suppl 2):67-82. doi: 10.4137/CIN.S13779.
Fahrenbach J, Andrade J, McNally EM. The CO-Regulation Database (CORD): a tool to identify coordinately expressed genes. PLOS One. 2014 Mar.
Volchenboum S, Andrade J, Huang L, et al. Gene expression profiling of Ewing sarcoma tumours reveals the prognostic importance of tumour–stromal interactions: a report from the Children’s Oncology Group. J Pathol: Clinical Research. 2014. doi: 10.1002/cjp2.9.
Spranger S, Bao R, Gajewski T. Melanoma-intrinsic β-catenin signaling prevents T cell infiltration and anti-tumor immunity. Journal for ImmunoTherapy of Cancer. 2014;2(Suppl 3):O15. doi: 10.1186/2051-1426-2-S3-O15.
Widau RC, Parekh A, Ranck MC, et al. The RIG-I like receptor LGP2 protects tumor cells from ionizing radiation. P Natl Acad Sci. 2013 Dec.
Chen B, Moore TV, Li Z, et al. Gata5 Deficiency Causes Airway Constrictor Hyperresponsiveness in Mice. Am J Resp Cell Mol. 2013 Nov.
CLINICAL RESEARCH DATA WAREHOUSE
Bailey KA, Savic D, Zielinski M, et al. Evidence of non-pancreatic beta cell-dependent roles of Tcf7l2 in the regulation of glucose metabolism in mice. Hum Mol Genet. 2014 Nov 14. doi: 10.1093/hmg/ddu577.
He BZ, Ludwig MZ, Dickerson DA, et al. Effect of genetic variation in a Drosophila model of diabetes-associated misfolded human proinsulin. Genetics. 2014 Feb;196(2):557-67. doi: 10.1534/genetics.113.157800.
Hong S, Le-Rademacher J, Artz A, McCarthy PL, Logan BR, Pasquini MC. Comparison of nonmyeloablative conditioning regimens for lymphoproliferative disorders. Bone Marrow Transpl. 2014 Dec 1. doi: 10.1038/bmt.2014.269.
Wang Y, Hong S, Li M, et al. Noggin resistance contributes to the potent osteogenic capability of BMP9 in mesenchymal stem cells. J Orthop Res. 2013 Nov;31(11):1796-803. doi: 10.1002/jor.22427.
Nelson R, Liao C, Fichera A, Rubin DT, Pekow J. Rescue therapy with cyclosporine or infliximab is not associated with an increased risk for postoperative complications in patients hospitalized for severe steroid-refractory ulcerative colitis. Inflamm Bowel Dis. 2014 Jan;20(1):14-20. doi: 10.1097/01.MIB.0000437497.07181.05.
Sofia MA, Rubin DT, Hou N, Pekow J. Clinical presentation and disease course of inflammatory bowel disease differs by race in a large tertiary care hospital. Digest Dis Sci. 2014 Sep;59(9):2228-35. doi: 10.1007/s10620-014-3160-0.
Choi CH, Poroyko V, Watanabe S, et al. Seasonal allergic rhinitis affects sinonasal microbiota. Am J Rhinol Allergy. 2014 Jul-Aug;28(4):281-6. doi: 10.2500/ajra.2014.28.4050.
Kern DW, Wroblewski KE, Schumm LP, Pinto JM, Chen RC, McClintock MK. Olfactory function in Wave 2 of the National Social Life, Health, and Aging Project. J Gerontol B-Psychol. 2014 Nov;69 Suppl 2:S134-43. doi: 10.1093/geronb/gbu093.
Li L, Zhan X, Wang N, et al. Does airway surgery lower serum lipid levels in obstructive sleep apnea patients? A retrospective case review. Med Sci Monitor. 2014 Dec 13;20:2651-7. doi: 10.12659/MSM.892230.
Naclerio RM, Pinto JM, Baroody FM. Drowning in applications for residency training: a program’s perspective and simple solutions. JAMA Otolaryngol Head Neck Surg. 2014 Aug;140(8):695-6. doi: 10.1001/jamaoto.2014.1127.
Patel RM, Pinto JM. Olfaction: anatomy, physiology, and disease. Clin Anat. 2014 Jan;27(1):54-60. doi: 10.1002/ca.22338.
Pinto JM, Schumm LP, Wroblewski KE, Kern DW, McClintock MK. Racial disparities in olfactory loss among older adults in the United States. J Gerontol A-Biol. 2014 Mar;69(3):323-9. doi: 10.1093/gerona/glt063.
Pinto JM, Wroblewski KE, Kern DW, Schumm LP, McClintock MK. Olfactory dysfunction predicts 5-year mortality in older adults. PLOS One. 2014;9(10):e107541. doi: 10.1371/journal.pone.0107541.
Watanabe S, Pinto JM, Bashir ME, et al. Effect of prednisone on nasal symptoms and peripheral blood T-cell function in chronic rhinosinusitis. Int Forum Allergy Rhinol. 2014 Aug;4(8):609-16. doi: 10.1002/alr.21336.
Yao L, Pinto JM, Yi X, Li L, Peng P, Wei Y. Gray matter volume reduction of olfactory cortices in patients with idiopathic olfactory loss. Chem Senses. 2014 Nov;39(9):755-60. doi: 10.1093/chemse/bju047.
Choi CH, Poroyko V, Watanabe S, et al. Allergen exposure affects sinonasal microbiota. Am J Rhinol Allergy. 2014 Jul;28(4):281-6.
Li L, Zhan X, Wang N, et al. Does airway surgery lower serum lipid levels in OSA patients? A retrospective case review. Med Sci Monit. 2014 Dec 13;20:2651-7.
Li H, Giger ML, Sun C, et al. Pilot study demonstrating potential association between breast cancer image-based risk phenotypes and genomic biomarkers. Med Phys. 2014 Mar;41(3):031917. doi: 10.1118/1.4865811.
Feng Y, Stram DO, Rhie SK, et al. A comprehensive examination of breast cancer risk loci in African American women. Hum Mol Genet. 2014 May 22. pii: ddu252.
Blair DR, Wang K, Nestorov S, Evans JA, Rzhetsky A. Quantifying the impact and extent of undocumented biomedical synonymy. PLOS Comput Biol. 2014 Sep;10(9):e1003799. doi: 10.1371/journal.pcbi.1003799.
Liu CC, Tseng YT, Li W, et al. DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res. 2014 Jul;42(Web Server issue):W137-46. doi: 10.1093/nar/gku412.
Rzhetsky A, Bagley SC, Wang K, et al. Environmental and state-level regulatory factors affect the incidence of autism and intellectual disability. PLOS Comput Biol. 2014 Mar;10(3):e1003518. doi: 10.1371/journal.pcbi.1003518.
Churpek MM, Yuen TC, Park SY, Gibbons R, Edelson DP. Using electronic health record data to develop and validate a prediction model for adverse outcomes in the wards. Crit Care Med. 2014 Apr;42(4):841-8.
Town JA, Churpek MM, Yuen TC, Huber MT, Kress JP, Edelson DP. Relationship between ICU bed availability, ICU readmission, and cardiac arrest in the general wards. Crit Care Med. 2014;42(9):2037-41.
Churpek MM, Yuen TC, Winslow C, et al. Multicenter development and validation of a risk stratification tool for ward patients. Am J Respir Crit Care Med. 2014 Sep 15;190(6):649-55.
Churpek MM, Yuen TC, Winslow C, Hall J, Edelson DP. Differences in vital signs between elderly and nonelderly patients prior to ward cardiac arrest. Crit Care Med. 2014 Dec 31. [Epub ahead of print]
PHOTOGRAPHY CREDITS
p. 3: Caitlin Pike
p. 6, 7, 9, 11, 13, 15-16, 18, 35 (middle), 40: David Christopher
p. 17, 35 (bottom): Jorge Andrade
p. 22: Brad Orr
p. 25: “Test Tubes” by Chesapeake Bay Program, available at https://www.flickr.com/photos/29388462@N06/5434154393/ under CC BY-NC 2.0. Full terms at https://creativecommons.org/licenses/by-nc/2.0/
p. 31, 32, 35 (top), 39: Sara Serritella, ITM
Thumbprint art on front cover designed by Griffin Brands.
Written and designed by Caitlin Pike. © The University of Chicago, 2015. All rights reserved.