University of Chicago
Center for Research Informatics
ANNUAL REPORT
2012-2013
Message from the BSD Chief Research Informatics Officer
The Center for Research Informatics (CRI) was established two years ago, in August 2011, to provide services and resources that support biomedical informatics. The CRI views biomedical informatics broadly, to include bioinformatics, clinical informatics, translational informatics, and health care informatics. As detailed in this report, during the past two years the CRI has set up a Clinical Research Data Warehouse and a Bioinformatics Core, updated the BSD’s high-performance computing cluster, and made secure and compliant storage and computing resources available to every BSD researcher. The Office of the CRIO has also established a governance structure. It brings together BSD and UCM leadership with experts in information systems, patient privacy, and a variety of research fields to guide us in making long-term decisions that serve our researchers, comply with all relevant laws and policies, and protect our patients’ data. In addition, the Office of the CRIO has supported initiatives such as building a secure and compliant computing infrastructure so that researchers can analyze large-scale genomic datasets more quickly and easily. As we move forward and continue to grow, it is important that we hear from faculty so that we can provide the services, resources, and education that matter to you. I am very interested in hearing from every BSD researcher to make sure that your needs are met. You can contact me directly or talk to any member of the BSD Research Informatics Oversight Committee (you can find their names on the CRI website and on page 74 of this report). We look forward to hearing from you.
Robert Grossman, Ph.D.
Center for Research Informatics Annual Report 2012-13
TABLE OF CONTENTS
Who We Are 2
Clinical and Translational Informatics 12
Systems and Security 26
Bioinformatics Core 40
Training and Education 53
Faculty Oversight and Governance 59
Our Partners 65
Looking Ahead 67
Appendix 69
WHO WE ARE
A Letter from the Director
As a physician-scientist and informaticist, I know firsthand the problems facing basic researchers, translational medicine specialists, and clinicians. Deficiencies in any part of the pipeline can have serious downstream effects. A productive translational research operation requires a solid and secure infrastructure, easy access to powerful analytical tools and high-performance computing, expertise in complex study design and data analysis, and the ability to create and implement platforms for collecting, storing, studying, and presenting research data. When the Center for Research Informatics was established two years ago, the path from the bench to the bedside at the University of Chicago was fragmented and inefficient, serving neither basic scientists nor clinicians well. In the short time since, the CRI has grown into an active and versatile group of professionals who work across a wide range of technologies to enable world-class biomedical research. To support these endeavors, we have built an industrial-grade, secure, HIPAA-compliant infrastructure able to store and compute over even the largest and most complex datasets. Every research group, from the basic sciences to clinical faculty, has access to the storage and computing resources provided by the CRI. The Bioinformatics Core takes on the most intricate data analysis tasks, working closely with investigators to advise on data collection, perform complex computations, and provide advanced interpretation of the results. Nearly one hundred groups have taken advantage of the Core so far, and the number of grants and papers directly resulting from our assistance continues to grow. Our Clinical and Translational Group provides state-of-the-art custom application development for investigators who need specialized data collection or other assistance with their clinical studies.
The crowning achievement of the Center’s first two years has been the design, building, and implementation of a robust Clinical Research Data Warehouse. Starting with just a data feed from the Centricity billing system, the CRDW now boasts data from over 600,000 patients in an easily searchable system that allows researchers to quickly determine cohort size for their studies. The CRI has the BSD’s only system capable of performing complex queries to study quality and other important clinical metrics. Looking ahead, we have many exciting plans for the CRI. In 2014, we will release a feature-rich system for accessing de-identified patient data in the CRDW. We will implement a full-text searchable system for querying pathology and radiology reports. We will further expand our computational infrastructure, doubling our HPC power and increasing our storage capacity. We continue to develop, test, and roll out new bioinformatics methods, making these available to researchers through our core service offerings and as self-service workflows on our Galaxy platform. We hope that everyone in the BSD will take advantage of the array of resources offered by the CRI. We look forward to continuing our mission of providing advanced informatics support to enable world-class biomedical research throughout the BSD.
Samuel Volchenboum, MD, PhD
our leadership
Director and Associate CRIO
Samuel Volchenboum, MD, PhD
Sam has been part of the CRI since May 2012 in his role as Associate Chief Research Informatics Officer, leading our faculty outreach and education efforts. In April 2013 he was appointed Director of the CRI and now leads our operations and strategic planning. In addition to his work in the CRI, Sam serves the Department of Pediatrics as Assistant Professor, is an Associate Director of the Institute for Translational Medicine, and is a Faculty Fellow in the Computation Institute. His research includes using proteomics to study neuroblastoma, a pediatric solid tumor; developing software to facilitate real-time mass spectrometry peptide identification; and creating tools to improve provider communication and patient care.
cri.uchicago.edu
About the CRI
The Center for Research Informatics (CRI) was created in 2011 to support the University of Chicago Biological Sciences Division (BSD) research community. We offer state-of-the-art, standards-compliant technologies for the acquisition, management, and storage of clinical, translational, and basic research data. Our resources and services are open to all members of the BSD; users include students, postdoctoral fellows, technicians, staff, faculty, researchers, and collaborators. In addition, we support research and education in informatics and work with the Institute for Translational Medicine (ITM), the Clinical and Translational Science Awards (CTSA) program, and other partners on joint initiatives. Finally, we are strong advocates for informatics within our research communities.
How We’ve Grown: A Timeline
Since 2011, when the Office of the CRIO was created to guide research informatics efforts across the BSD, the Center for Research Informatics has grown into a robust service organization. In our first year, we developed a mission, hired several of the Directors who would lead our major initiatives, and began work on key projects. Since early 2012, we’ve seen several important projects come to fruition. These include a functioning and growing Clinical Research Data Warehouse, with faculty data requests evaluated by our Data Use Committee and fulfilled by our staff; a Bioinformatics Core providing pipelines, consulting, training, and other services; and an improved, HIPAA-compliant computing and storage infrastructure for researchers. As we work to improve and expand our existing services, we will continue to reach out to faculty to increase user adoption and develop new initiatives. We look forward to seeing where the next year takes us.
February 2011 The Office of the CRIO is created with a charge of overseeing and directing research informatics across the BSD.
August 2011 Robert Grossman is appointed CRIO.
The CRI is created as a central organization for informatics service, research, and education.
OUR MISSION
The CRI’s mission is to provide informatics resources and services to BSD faculty, to support high-quality biomedical and clinical research, and to promote research and education in informatics.
The CRI’s services and resources are provided by three core groups. The Clinical and Translational Informatics team manages our Clinical Research Data Warehouse (CRDW), provides custom programming for initiatives in clinical research, and supports data management for clinical trials. The Bioinformatics Core provides analysis pipelines, consulting services, and other expertise for researchers working with genomic data. The Systems and Security team maintains and improves our scientific computing infrastructure, provides technical support to users, and ensures our technical compliance with appropriate security regulations. All three of these groups, along with our administration, governance committees, and strategic partners, work together toward our goal of enabling world-class research in a secure environment.
Don Saner joins the CRI as Director of Clinical and Translational Informatics.
Hannah Lawrence joins the CRI as Executive Administrator.
October 2011 The BSD Research Informatics Governance Structure is approved by the BSD Dean’s Office.
our leadership
Chief Research Informatics Officer
Robert Grossman, PhD
As Chief Research Informatics Officer, Bob guides informatics activities and initiatives across the BSD, including providing strategic direction and oversight for the CRI. In addition to his role as CRIO, Bob is a Senior Fellow in the Institute for Genomics and Systems Biology, a Senior Fellow in the Computation Institute, and a Professor of Medicine in the Section of Genetic Medicine. His research group focuses on big data, biomedical informatics, data science, cloud computing, and related areas. Bob served as the first Director of the CRI from August 2011 to April 2013.
The CRI is an initiative developed and managed by the Chief Research Informatics Officer (CRIO) and Associate CRIO. The Office of the CRIO is responsible for advising the Dean and the Dean for Research and Graduate Education of the Biological Sciences Division on the BSD’s investment in research informatics, for managing research informatics services and resources, for creating new research informatics initiatives, and for operating a Research Informatics Governance Structure.
February 2012 The IRB protocol for the CRDW is approved, and the system goes live for BSD researchers.
May 2012 Sam Volchenboum is appointed Associate Chief Research Informatics Officer.
Jorge Andrade joins the CRI as Director of Bioinformatics.
The Bioinformatics Core begins offering services.
September 2012 The Bioinformatics Core makes its core informatics pipelines available to researchers.
October 2012 The cohort discovery tool i2b2 is released to facilitate queries of the CRDW for all BSD researchers.
The CRI’s administrative team: Caitlin Pike, Michael Daus, and Hannah Lawrence
our leadership
Executive Administrator
Hannah Lawrence
As Executive Administrator of the CRI, Hannah is responsible for planning and oversight of our financial and administrative functions, including coordination across service areas, management of governance committees, communications, and project management for CRI initiatives. She works closely with our leadership team to develop short- and long-term organizational plans, manages our budget and hiring process, and provides other administrative and strategic support. Prior to joining the CRI, she served as Strategist and Planner in the Office of the Dean of the BSD. There, she was responsible for organizing the Informatics Advisory Group that ultimately led to the Dean’s decision to appoint a CRIO and create the CRI. She also served as the first administrative manager of the BSD’s Faculty Advisory Committee.
November 2012 The CRI makes available a new, HIPAA-compliant HPC cluster, storage and backup resources, and other computing infrastructure.
April 2013 Sam Volchenboum is appointed Director of the CRI.
Plamen Martinov joins the CRI as Director of Systems and Security.
Our Organization
Robert Grossman, Chief Research Informatics Officer
Sam Volchenboum, Associate CRIO & Director of the CRI
Hannah Lawrence, Executive Administrator
Don Saner, Director of Clinical and Translational Informatics
Caitlin Pike, Communication Specialist
Timothy Holper, Manager of CRDW Development
Michael Daus, Administrative Specialist
Brian Furner, Manager of Programming
Seong Choi, Keith Danahey, Kevin Le, Programmers
Jorge Andrade, Director of Bioinformatics
Plamen Martinov, Director of Systems and Security
Riyue Bao, Elizabeth Bartom, Kyle Hernandez, Lei Huang, Jianpeng Xu, Chunling Zhang, Bioinformaticians
Andy Brook, Beth Lynn Eicher, Michael Jarsulic, Sneha Jha, Olumide Kehinde, Systems Administrators
Wenjun Kang, Scientific Programmer
Bruce Thompson, Security Analyst
Julissa Acevedo, Business Systems Analyst
Brad Orr, Senior Project Manager
Luis Maciel, Database Administrator
Tiffany Cyrus, Project Manager
For a detailed list of CRI employees, please see Appendix A.
CLINICAL & TRANSLATIONAL INFORMATICS
The CRDW
Since the CRI’s beginnings, one of the central projects of the Clinical and Translational team has been to build, populate, and maintain the Clinical Research Data Warehouse. Over the past two years, the team has seen this initiative develop from a concept to a functioning and growing data warehouse. The IRB protocol outlining its standards, governance, and oversight was approved in February 2012, and the team began fulfilling data requests for researchers in May 2012.
The CRDW incorporates six years’ worth of data from electronic medical records and patient billing, including lab values, procedure and diagnosis codes, demographics, medications, and visit information. The Clinical and Translational team continues to work to expand the amount and types of data available; they are currently engaged in integrating radiology and pathology notes and discharge summaries.
our leadership
Director of Clinical and Translational Informatics
Don Saner
Over the past two years, Don has led his team through the process of building and developing the Clinical Research Data Warehouse. In addition, he provides informatics leadership and support for other clinical and translational research projects in conjunction with the Institute for Translational Medicine and our other partners. With over 20 years of experience at the University of Chicago, Don joined the CRI as one of our first staff members in 2011.
As of August 2013, the CRDW contains:¹
607,000 patients
6.1 million encounters
16.1 million medications
36.8 million procedures
93 million labs
13.9 million diagnoses
i2b2 and Data Requests
To interact with the CRDW, researchers use a datamart interface. The first datamart implemented by the CRI, i2b2, went live for users in June 2012. i2b2, or “Informatics for Integrating Biology and the Bedside,” is an NIH-funded open-source project created by a National Center for Biomedical Computing based at Partners HealthCare System. i2b2 is designed for cohort identification, allowing researchers to query the CRDW for sets of patients meeting search criteria. Applications and benefits of the i2b2 interface include:
• Helping investigators create new research hypotheses
• Identifying potential cohorts for clinical trials
• Reducing the time researchers must spend on discovery of research cohorts, study feasibility, and subject recruitment
• Familiarizing researchers with the standard terminologies and data that reside in the CRDW
Researchers log into i2b2 using BSD or hospital credentials and can then explore the CRDW using an intuitive drag-and-drop interface. Queries may be created with multiple and/or logic, using search terms based on standard terminologies. The information returned by the system allows researchers to determine the number of patients meeting their criteria. These results can inform subsequent requests for full datasets with either de-identified data or IRB-approved protected health information. To date, almost one hundred users have created and executed over one thousand searches.
Requests for CRDW data are fulfilled by the Clinical and Translational team, under the oversight of the Data Use and Technical Policy Committees. Researchers submit requests using a simple online form, providing information about the scientific purpose of the requested data. To protect the University’s data and comply with patient privacy laws and our IRB protocol, the Data Use Committee monitors data requests to ensure appropriate use of CRDW data. (For more information about this committee, see page 63.) In addition, the CRI acts as an Honest Broker service for researchers, integrating data from different sources such as the Cancer Registry and Epic and removing identifiers when necessary to protect patient privacy. Since May 2012, the Clinical and Translational team has fulfilled 92 data requests of varying size and complexity.
Figure: Data Request Status Summary (as of August 2013), showing requests by status: completed (92), not approved, on hold, awaiting IRB, awaiting user, and in progress.
¹ Please note that some numbers regarding the data housed in the CRDW differ from those listed in the CRI’s 2012 Annual Report because of changes in how these data are measured. (1) The number of patients reported here represents only those patients who have encounter data associated with them, while last year’s report included all patients regardless of encounter data. (2) Billing records where debits are canceled out by credits (for example, when a procedure is ordered but never performed) are no longer listed here. (3) This year’s report no longer includes “orphaned” records that were previously included in CRDW summary counts but cannot be returned via i2b2 or in custom data requests.
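The following minimal sketch makes the and/or cohort-query idea concrete. It assumes a simplified model in which the terms inside a group are combined with OR and the groups are combined with AND. It returns only a count of matching patients, as a cohort-sizing query would. The data model, the placeholder codes, and the function names are hypothetical. They do not describe the CRDW schema or i2b2’s implementation.

```python
from typing import Dict, List, Set

# Hypothetical toy data model: patient ID -> set of coded facts.
# Codes such as "DX:diabetes_a" are invented placeholders, not real CRDW terms.
PatientFacts = Dict[str, Set[str]]


def matches(facts: Set[str], groups: List[Set[str]]) -> bool:
    """Return True if the patient satisfies every group (AND across groups);
    a group is satisfied when the patient has any one of its terms (OR)."""
    return all(facts & group for group in groups)


def cohort_count(patients: PatientFacts, groups: List[Set[str]]) -> int:
    """Return only the number of matching patients, as a cohort-sizing
    query would, rather than any patient-level records."""
    return sum(1 for facts in patients.values() if matches(facts, groups))


if __name__ == "__main__":
    toy_patients: PatientFacts = {
        "p1": {"DX:diabetes_a", "LAB:hba1c_high"},
        "p2": {"DX:diabetes_b"},
        "p3": {"DX:hypertension", "LAB:hba1c_high"},
    }
    # (diabetes code A OR diabetes code B) AND elevated HbA1c
    query = [{"DX:diabetes_a", "DX:diabetes_b"}, {"LAB:hba1c_high"}]
    print(cohort_count(toy_patients, query))  # prints 1
```

Returning a count instead of records mirrors the description above: the count informs a later, separately reviewed request for the full dataset.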
As the demand for clinical data for research purposes continues to grow, the future of the CRDW’s development will include the creation of a fully de-identified datamart for research, which will allow investigators to access and query de-identified data within a secure data zone. In addition, the continued development of the CRDW includes the incorporation of additional data elements, which have been prioritized by the Research Informatics Governance Committee. These elements include the Cancer Registry; radiology, pathology, and discharge summaries; other data elements from Epic’s Clarity data warehouse; and the integration of research-specific databases including Velos and REDCap.
Who uses the CRDW?
Since May 2012, our Clinical and Translational Informatics team has processed data requests from 11 different University departments.
Figure: total data requests (115) by department: Family Medicine, Human Genetics, Medicine, Neurology, Obstetrics & Gynecology, Orthopaedics, Pathology, Pediatrics, Psychiatry, Radiology, and Surgery.
Biobanking Management
The CRI maintains an instance of caTissue, a robust open-source biobanking management system used by labs to organize freezers and track samples. In addition to the standard deployment of caTissue, the CRI has worked with Dr. Michael Maitland, who manages the Cancer Center’s biofluids core, to add customizations created by Indiana University. These customizations, called caTrack, permit tracking the chain of custody for all samples using barcode labels and handheld barcode readers, which then synchronize with caTissue when placed in docking stations. This system permits tracking of a sample’s origin, when it was drawn, and where it was initially stored.
our contributions
Highlighted Accomplishments from 2012-13
• Expanded the CRDW by incorporating data elements from Epic and Clarity as well as the Cancer Registry
• Released i2b2 and began fulfilling data requests
• Established data request guidelines and policies to protect patient information
• Developed a Data Use Committee review process for data requests
• Expanded the team by hiring a business systems analyst, a database administrator, and a project manager
• Launched a REDCap users group
• Rewrote Dr. David Meltzer’s Hospitalist Protocol application and wrote a dashboard and alerting system for his Continuity of Care program
Brian Furner, Keith Danahey, and Tim Holper
Custom Programming: 1200 Patients and TRIDOM
Beyond maintaining the CRDW and fulfilling data requests, the Clinical and Translational team works directly with research groups to provide custom research application development for their clinical research projects.
One such project is 1200 Patients, a personalized medicine initiative jointly sponsored by the CRI and the Center for Personalized Therapeutics. This pharmacogenomics project seeks to develop a new medical system model for personalized care in which patients’ genetic information can be incorporated into the decision-making process of prescribing medications.
Patients who have consented to participate in the project are genotyped in a CLIA-certified lab. Their genetic information is then stored in a relational database along with curated pharmacogenomic data from published studies. During clinic visits, a physician dashboard displays a “30-second summary” synthesized from the information in the database relevant to the patient’s genomic profile. Physicians can use these summaries to inform their choices in prescribing medication. For example, the summaries can pre-identify patients who are likely to experience severe side effects or predict when a patient may need alternative dosing.
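The 1200 Patients flow stores genotypes alongside curated pharmacogenomic findings and builds a per-patient summary from them. That flow can be sketched as a relational join. This is a minimal illustration only. The table names, columns, gene and drug placeholders, and advice strings are hypothetical. It uses an in-memory SQLite database, not the project’s actual database.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical genotype and
    curated-rule tables populated with placeholder values."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE genotype (patient_id TEXT, gene TEXT, result TEXT);
        CREATE TABLE pgx_rule (gene TEXT, result TEXT, drug TEXT, advice TEXT);
        """
    )
    con.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [
            ("p1", "GENE_A", "variant/variant"),
            ("p1", "GENE_B", "ref/ref"),
            ("p2", "GENE_A", "ref/ref"),
        ],
    )
    con.executemany(
        "INSERT INTO pgx_rule VALUES (?, ?, ?, ?)",
        [
            ("GENE_A", "variant/variant", "drug_x", "consider alternative dosing"),
            ("GENE_B", "variant/variant", "drug_y", "elevated risk of severe side effects"),
        ],
    )
    return con


def summary(con: sqlite3.Connection, patient_id: str) -> List[Tuple[str, str]]:
    """Return (drug, advice) pairs whose curated rule matches one of the
    patient's stored genotype results."""
    return con.execute(
        """
        SELECT r.drug, r.advice
        FROM genotype AS g
        JOIN pgx_rule AS r ON r.gene = g.gene AND r.result = g.result
        WHERE g.patient_id = ?
        ORDER BY r.drug
        """,
        (patient_id,),
    ).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(summary(con, "p1"))  # [('drug_x', 'consider alternative dosing')]
    print(summary(con, "p2"))  # []
```

Keeping the curated findings in their own table means new evidence from published studies can be added without touching patient genotype records.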
Clinical and Translational Informatics team members Tiffany Cyrus, Brian Furner, Don Saner, Luis Maciel, Julissa Acevedo, Tim Holper, Kevin Le, and Keith Danahey
our future
Our Goals for 2013-14
• Improve internal efficiency, including time tracking, data repository, bug tracking, reporting, and status tracking, by implementing team project management software
• Implement a standard procedure for processing project requests that includes defining business requirements and scope of work, providing a time estimate, and receiving client approval
• Create a dashboard with metrics for the CRDW, REDCap, and Velos to increase the visibility of the data in each system
• Enhance the skill set of each team member through professional training
• Develop and maintain tools for data collection and reporting, including continuing to develop in-house systems to support custom applications and datamarts
• Improve internal knowledge and collaboration across the team through biweekly presentations
As new technologies allow the practice of medicine to become increasingly personalized, the 1200 Patients project contributes to this progress by improving doctors’ ability to make patient-specific medication decisions. The CRI helps enable this important initiative by providing custom programming and database design as well as data import and overall technical management.
TRIDOM (Translational Research Initiative in the Department of Medicine) is a biobanking protocol started in 2005 that stores DNA, plasma, and serum for consented patients who are scheduled for a standard-of-care blood draw. The CRI contributed to this project by leading a rewrite of the database that maintains consent and sample information and generates operational reports. In addition, the CRI partnered with eSphere, which makes the Human Tissue Resource Center’s biobanking software, to create an automated feed to the TRIDOM database. To date, TRIDOM has enrolled over 8,800 patients and has banked samples for more than 5,900 of these patients, resulting in a total of over 58,000 samples.
Figure: REDCap adoption has progressed at a steady rate since 2011 (total users and total projects by month, June 2011 to August 2013).
Clinical Trials Management
The Clinical and Translational team supports clinical trials management by operating two data management solutions, REDCap and Velos eResearch.
The University of Chicago has been a member of the REDCap consortium since 2010. This self-managed, secure, web-based application, developed by the Vanderbilt University Clinical and Translational Science Award (CTSA) program, supports data collection strategies for research studies with tools for building and managing online surveys and databases. The University of Chicago’s instance of REDCap currently supports more than 700 users from across the BSD and houses over 600 projects. As the number of REDCap users continues to grow, the CRI has worked to provide opportunities for education and collaboration. A REDCap users group provides the community with an ongoing meeting space for discussion, new feature announcements, tips and tricks, and real-time help. The CRI also offers individual and small-group REDCap tutorial sessions for those in need of more personalized guidance (for more detail, see page 56).
Velos eResearch is a clinical trials management system that integrates study administration and clinical data management. The system supports many aspects of running a clinical trial, including:
• Patient recruitment and scheduling
• IRB and study monitoring
• Project planning and study design
• Protocol compliance
• Web-based data capture on a per-protocol basis
• Data safety monitoring and adverse event reporting
Velos now supports over 1,500 protocols for more than 550 investigators at the University. The CRI is currently working with the vendor to complete a hardware migration and upgrade that will improve the user experience.
faculty spotlight
David Meltzer
David Meltzer, MD, PhD, is Chief of the Section of Hospital Medicine, Director of the Center for Health and the Social Sciences, and an Associate Professor of Medicine, Economics, and Public Policy Studies at the University of Chicago. He also serves as co-leader of the Institute for Translational Medicine’s Training Cluster and as co-director of the ITM’s academic arm, the Committee for Clinical and Translational Science. Dr. Meltzer’s research explores problems in health economics and public policy, focusing on the theoretical foundations of medical cost-effectiveness analysis and the cost and quality of hospital care. In the past year, the CRI’s Clinical and Translational team has worked with Dr. Meltzer in support of two of his projects: the Hospitalist Project and the Comprehensive Care Program (CCP) initiative.
The Hospitalist Project, which is supported in part by the Clinical and Translational Science Awards, has been in operation for over 16 years and has enrolled over 100,000 patients. The aims of this multi-site project are to study the quality and cost of care among hospitalized patients at the University of Chicago and Mercy Hospital, to examine whether there are significant differences in outcomes and costs for patients cared for by hospitalists compared to those cared for by other inpatient attending physicians, and to develop a research infrastructure that allows collaboration among multiple investigators and institutions. Patients enrolled in the Hospitalist Project are given two separate interviews, one during their hospitalization and one 30 days after discharge. The results are recorded during the patient encounter using iPads connected to a custom-written web application. This information is stored in a SQL Server database, from which a custom-built reporting system can export it into SAS and Stata for analysis. This year, the CRI’s Clinical and Translational team updated the database and web-based interface for this project.
The CCP initiative is a randomized study started in 2012 with the aim of testing novel care delivery systems for improving the quality and reducing the cost of health care. The study’s hypothesis is that improving continuity in the doctor-patient relationship, by having a single physician see patients at high risk of hospitalization in both inpatient and outpatient settings, will improve outcomes and lower costs by reducing unnecessary emergency department visits, hospital admissions, and readmissions. The CRI has supported this effort by creating a custom dashboard that serves as a central location for CCP staff and physicians to record study information on patients and integrate this information with data from electronic medical records. In addition to the dashboard, the CRI has implemented a notification system that sends pages and emails to the appropriate CCP physicians and staff when a patient enrolled in the CCP protocol visits the emergency department or is admitted to the hospital.
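The CCP notification behavior alerts the appropriate physicians and staff when an enrolled patient visits the emergency department or is admitted. It reduces to a filter-and-fan-out rule. The sketch below is hypothetical. The event labels, the care-team mapping, and the `Notification` record are illustrative assumptions, and actual paging or email delivery is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical event labels for the two triggering situations in the text.
TRIGGER_KINDS = {"ED_VISIT", "ADMISSION"}
CHANNELS = ("page", "email")


@dataclass(frozen=True)
class Event:
    patient_id: str
    kind: str  # e.g., "ED_VISIT", "ADMISSION", or any other encounter type


@dataclass(frozen=True)
class Notification:
    recipient: str
    channel: str
    message: str


def notifications_for(event: Event, care_teams: Dict[str, List[str]]) -> List[Notification]:
    """Build one page and one email per care-team member, but only when the
    event is a triggering type and the patient is enrolled (has a care team)."""
    if event.kind not in TRIGGER_KINDS:
        return []
    team = care_teams.get(event.patient_id, [])  # not enrolled -> empty team
    message = f"CCP patient {event.patient_id}: {event.kind}"
    return [Notification(r, ch, message) for r in team for ch in CHANNELS]


if __name__ == "__main__":
    teams = {"p1": ["dr_smith", "coordinator_1"]}  # hypothetical enrollment
    for n in notifications_for(Event("p1", "ED_VISIT"), teams):
        print(n.channel, n.recipient)
```

Keeping the rule separate from delivery makes it easy to check that non-enrolled patients and non-triggering visits never generate alerts.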
SYSTEMS & SECURITY
About Us
The Systems and Security team began with just two members and a limited infrastructure set up by the Initiative in Biomedical Informatics. It has now grown to include five full-time employees who support and manage the development of infrastructure in multiple data centers. With generous support from the Institute for Translational Medicine and the BSD, the team has expanded our computing environment considerably since its inception.
The purpose of the Systems and Security team is to provide core services and scientific computing resources with the highest quality of customer service. This enables BSD faculty to conduct advanced biological research in a compliant environment while protecting intellectual property and sensitive information.
our leadership
Director of Systems and Security
Plamen Martinov
Plamen joined the CRI leadership team as Director of Systems and Security in April 2013 and manages the team of engineers responsible for the development and operations of our secure computing infrastructure. He leads our efforts to ensure compliance with security regulations and provides regular reports to the Research Informatics Compliance Review and Technical Policy Committees. Prior to joining the CRI, Plamen was Lead Data Security Engineer for Chicago Biomedicine Information Systems.
Infrastructure
The CRI’s resources, which have been upgraded and expanded over the past year, include:
• 1,024-core high-performance computing (HPC) cluster (2.2 GHz AMD Opteron 6274)
• Large Memory Linux supercomputer with 1 TB of RAM, 8 Intel® Xeon® E7-8870 2.4 GHz processors (80 cores, 160 hardware threads)
• 700-TB ultra-high-density NAS for data storage that can scale up to 20 PB, available for both labshares and individuals
• Virtual Server Infrastructure with the capacity to support up to 1,500 virtual servers on Windows or Linux platforms
• Centralized and automated data backup and encryption with the capability to back up 2.1 PB of data
• Galaxy web-enabled biomedical data analytics tool that is fully integrated with the CRI’s HPC cluster

Storage & Backup By the Numbers
854,597,057 files backed up
1.7 petabytes total labshare capacity
2.5 petabytes total backup capacity
110 terabytes virtual infrastructure total storage
cri.uchicago.edu
The Systems and Security team maintains the computing infrastructure that supports not only the CRI’s activities but also research for faculty across the BSD. These new HIPAA-compliant resources went live in November 2012 and are available to all members of the BSD.

To date, our resources are supporting a total of 479 active users: 279 users of our storage resources, 162 users of our HPC cluster, and 38 Galaxy users. Kenwood Data Center houses 125 virtual machines, with 350 TB of storage in use and 595 TB of data backed up.
Who uses our virtual machines? Our VMs in Kenwood Data Center serve 11 different BSD groups, in addition to supporting CRI activities.
[Chart: Virtual machines in Kenwood Data Center by group; total VMs: 125. Groups shown: Genetics; Health Studies; Cellular Screening Center; Laboratory for Advanced Computing; Cardiology; Pediatrics; Clinical & Translational Informatics; Bioinformatics Core; Clinical Cancer Genetics; Childhood Cancer & Blood Diseases; Medicine Administration; Radiology, HIRO; Academic & Administrative Applications; CRI Infrastructure.]
our contributions
Highlighted Accomplishments from 2012-13
• Made a new HPC cluster and large memory servers available to all BSD researchers
• Launched a VMware farm and began provisioning virtual machines for researchers
• Migrated labshares from outdated equipment to a new state-of-the-art data center
• Introduced the CRI help desk and created an online technical help portal to direct users to the correct sources of information
• Hired Plamen Martinov as Director of Systems and Security and expanded the team by hiring four systems administrators and a security analyst
• Created detailed security procedures to demonstrate HIPAA compliance
• Worked with the Compliance Review Committee to establish policies and procedures for patient privacy and data security
• In collaboration with the Bioinformatics Core, released the CRI’s implementation of Galaxy
Kenwood Data Center
The CRI’s advanced computing resources are housed in the state-of-the-art Kenwood Data Center on the University of Chicago campus. The CRI has made substantial investments in superior, resilient technology at Kenwood, improving the security, reliability, and recoverability of system resources through the modernization of data center services and standard architecture.

A primary focus of the Systems and Security team over the past year has been the migration of users’ data from outdated equipment in the Prudential Data Center to the newer, better-equipped Kenwood Data Center. The closing of Prudential and the move to Kenwood will allow us to shift our investments to more efficient and standardized computing platforms and technologies. The targets for completing this move include:
• Migrating 80 labshares and 300 home directories
• Virtualizing and migrating more than 80 servers
• Adding two 1-TB Large Memory Servers
• Adding an additional 1,024-core HPC cluster

Migration has proceeded carefully over the course of a year to achieve the goals of protecting the integrity of all data and ensuring clear communication with users. As of August 2013, 64 labshares (55 TB) out of a total of 75 (79 TB) have been migrated. All home directories have been migrated to new servers, and the team is on track to migrate 10 servers per month. The completion of this project is expected by the end of 2013.

Kenwood Data Center is equipped to house systems that are compliant with federal guidelines, including HIPAA and the Federal Information Security Management Act (FISMA). Moving all CRI resources to this facility helps us to protect patient privacy and keep our data secure.
[Chart: Total data center users (2,368) by resource, comparing Kenwood and Prudential. Resources shown include HPC, Large Memory, Labshares, Home Directories, and Backup, among others.]
Who uses our labshare storage resources? More than 30 different departments and groups across the University store data on CRI-provided labshares.
[Chart: Labshare storage used by department or group; total storage used: 151,791 GB. Groups shown: Ben May; Biochemistry & Molecular Biophysics; Bioinformatics; Biomedical Sciences Cluster; Cancer Research; Cell & Molecular Biology; Cellular Screening Center; Center for Clinical Cancer Genetics; Chicago Booth School; Childhood Cancer & Blood Diseases; Ecology & Evolution; Endocrinology, Diabetes, & Metabolism; Evolutionary Biology; Genetic Medicine; Genetics & Clinical Cytogenetics; Health Studies; Hematology/Oncology; Human Genetics; Institutional Biosafety; Internal Medicine; Laboratory - Human Genetics; Medicine; Molecular Genetics & Cell Biology; Neurobiology; Neuroscience; Obstetrics & Gynecology; Pathology; Pulmonary/Critical Care; Rheumatology; Science & Education; Surgery.]
Technical Support
To complement our improved computing infrastructure, the Systems and Security team has taken several important steps over the past year to enhance customer service and technical support for users of our resources.

The CRI’s help desk was opened in summer 2012 with phone and email support staffed by employees who either resolve or triage issues. Concurrently, we introduced a new, easier-to-use system for submitting trouble tickets. Including the initial backlog of unresolved tickets, the support staff has now resolved almost ten thousand issues. The CRI also created a web portal to make it easier for users to find sources of technical help.

In addition, the Systems and Security team hosted several events throughout the year to educate users about CRI resources and specific issues. See IT Live! seminars in November and December highlighted our HPC cluster, Galaxy, and i2b2, with each seminar focused on a live demonstration of one resource. In March and April, the team hosted HPC Lunch & Learn events tailored to existing and potential HPC users, both to introduce our newest resources and to spread the word about our migration to Kenwood. For more detail, see pages 57-58.

Systems and Security team members Beth Lynn Eicher, Sneha Jha, Plamen Martinov, Olumide Kehinde, Dan Sullivan, Brad Orr, Mike Jarsulic, and Bruce Thompson

Bruce Thompson and Beth Lynn Eicher in Kenwood Data Center
faculty spotlight
Fabrice Smieliauskas
Fabrice Smieliauskas, PhD, is a health economist whose research interests center on the operation of markets for medical technologies. His work includes studies of financial conflicts of interest in medicine and of disparities in the adoption and abandonment of new medical technologies. Dr. Smieliauskas primarily uses SAS and Stata to perform statistical analyses for his research.

The focus of Dr. Smieliauskas’s ongoing work is the evidence base for new medical technologies. He is analyzing the response of payers and providers to evidence that a common medical treatment is of limited value to patients, as well as the response to state and federal policies that mandate coverage of drugs for “off-label” indications not approved by the FDA. He is also developing a unique, comprehensive database of clinical cancer trials in order to address several open research questions, including the effects of a variety of governmental and institutional policies on the rate and direction of cancer innovation.

Dr. Smieliauskas’s research has been enabled in part by the CRI’s Large Memory Server. Before moving his work to the CRI’s infrastructure, he encountered problems using servers that were not capable of handling the large memory requirements of his research. Other potential options deterred him with their high user fees and their inability to handle sensitive data in compliance with HIPAA. “By contrast,” he noted, “the new CRI server is able to handle confidential data securely, essential to much of the research at the BSD.” Dr. Smieliauskas also spoke positively about the support provided by the Systems and Security team, saying, “The server administrative team is friendly and responds promptly to user needs and requests. I also have great confidence in CRI management and their desire and ability to continue adding capacity and computing capabilities as they grow.”
Data Security and Compliance
Secure handling of sensitive human subject data is of the utmost importance to the CRI’s Systems and Security team. The CRI is the primary computing resource for BSD researchers working with electronic protected health information (ePHI). For this reason, it is essential that our infrastructure and security policies comply with relevant federal guidelines, including HIPAA.

To this end, the Research Informatics Compliance Review Committee, made up of IT security professionals from other IT organizations across the University, spent several months drafting, editing, and approving a set of policies and procedures for data protection. This committee is also responsible for conducting regular audits to ensure compliance with these important standards. (For more information on this committee, see page 63.) In addition, the CRI retains experts in FISMA and HIPAA as consultants.

Plamen Martinov and Olumide Kehinde in Kenwood Data Center

our future
Our Goals for 2013-14
• Redesign and implement secure, stable, and sustainable computing infrastructure and resources at Kenwood Data Center
• Complete the Prudential-to-Kenwood data center move by migrating or discontinuing all resources, while maintaining the integrity of all user data
• Update and deliver improved customer service policies, procedures, services, and automation
• Improve communication internally and within the user community
• Enhance the skill set of each team member and of the team as a whole
BIOINFORMATICS CORE
our leadership
Director of Bioinformatics
Jorge Andrade, PhD
As the technical director responsible for planning and oversight of the Bioinformatics Core, Jorge works closely with CRI leadership to develop and deliver bioinformatics services and expertise. Jorge joined the CRI in May 2012 and brings extensive experience in the pharmaceutical industry and scientific research community, most recently at the Beijing Genomics Institute, where he was an Associate Director.
Bioinformatics Analysis Pipelines
The analysis that transforms high-throughput raw data into biologically meaningful information can present a challenge to clinical, translational, and basic researchers alike. To make it easier for BSD investigators to take full advantage of high-throughput technologies in their research, the Bioinformatics Core has developed a set of pipelines for the analysis of Next-Generation Sequencing and Microarray data. These automated pipelines quickly absorb and process large amounts of raw data and produce meaningful analysis results. They are executed on the CRI’s high-performance computing (HPC) cluster, which provides the significant computational power necessary for working with such large quantities of data. Researchers interested in using one of the CRI’s pipelines for their data analysis can submit a request on the CRI website. The Core currently offers a total of 12 production-ready pipelines for a variety of platforms and analyses.

In addition, the Bioinformatics Core maintains a catalog of publicly available and commercial software tools, reference datasets, and databases for use by the BSD research community. A complete list of these resources is available on the CRI’s website.
The Core currently offers production-ready pipelines for the following platforms and analyses:
• Illumina pipelines for RNA-Seq, ChIP-Seq, Exome Sequencing, Whole Genome Re-Sequencing (WGRS), Consensus Genotyping, and De-Novo Assembly
• SOLiD pipelines for RNA-Seq, WGRS, ChIP-Seq, and De-Novo Assembly
• One pipeline for Illumina and Affymetrix Expression Arrays
• One pipeline for Affymetrix and Exiqon miRNA Arrays
For more information on what each of these pipelines offers, see Appendix B.
faculty spotlight
Kenan Onel
Kenan Onel, MD, PhD, is an Associate Professor of Pediatrics in the Section of Hematology/Oncology and Director of the Pediatric Familial Cancer Clinic. He is an expert on pediatric and other familial genetic cancer syndromes. The Onel Lab uses genomic platforms and systems biology strategies to investigate how genetic factors contribute to cancer risk and response to therapy.

The lab studies families with high-penetrance cancer-predisposing conditions but no known cancer-predisposing gene mutations, in order to discover new genes that may, when mutated, predispose individuals to cancer. Recently, the lab has begun to advance this research by taking advantage of the CRI’s Next-Generation Sequencing offerings. Dr. Onel reports, “The Bioinformatics Core has been instrumental in pushing forward our work because of their expertise in handling genomic data.”

Dr. Onel’s lab used the CRI’s Galaxy instance to develop a pipeline for exome analysis. In addition, the lab worked with the CRI’s team of bioinformaticians to develop a powerful command-line pipeline that uses multiple aligners and callers, allowing robust analysis across a large number of family studies. The Onel Lab and the CRI bioinformaticians continue to meet weekly to discuss progress. According to Dr. Onel, “The analysts have been intellectually engaged in the projects and extremely professional. They have helped us develop methods for analysis of family data that we could not have done on our own.”
Self-Service Data Analysis: Galaxy
For researchers interested in self-service data analysis, the CRI maintains a customized version of Galaxy, an open-source bioinformatics workflow management and system integration tool. The CRI’s Galaxy instance is integrated with our advanced computing infrastructure, allowing researchers to take advantage of our HPC and large-scale storage resources within a self-service data analysis environment. Galaxy can substantially facilitate the use of common bioinformatics tools by non-bioinformaticians and those without extensive computing expertise.
our contributions
Highlighted Accomplishments from 2012-13
• Developed, tested, and implemented 12 production-ready pipelines, some of which are currently in use for the NIH-funded Bionimbus Protected Data Cloud contract
• Expanded the Core to a team of seven scientists
• Hosted monthly training seminars that have drawn over 350 total participants
• Developed a project management and invoicing system now used as a template by other BSD Cores
• Launched the CRI’s implementation of Galaxy
The CRI’s implementation of the Galaxy framework includes workflows for several Next-Generation Sequencing pipelines to enable users to perform, reproduce, and share complete analyses. Available workflows in Galaxy include:
• RNA-Seq: Sample Level, for quality control, mapping, and statistics for paired-end Illumina reads (individual samples)
• RNA-Seq: Project Level Merge, for merging multiple samples and generating a list of differentially expressed genes, for both single- and paired-end Illumina reads
• Exome Sequencing Analysis, for quality control, mapping, and recalibration for both single- and paired-end Illumina reads

The CRI’s Galaxy platform became available to researchers in November 2012, concurrent with the release of our updated HPC and storage resources. It now supports around 40 active users and is maintained and updated with new tools, workflows, and pipelines in collaboration with the CRI’s Systems and Security team.

A selection of completed analysis projects illustrates the diversity of the research facilitated by the Bioinformatics Core:
• Genome assembly and annotation of the Siberian hamster genome, using SOLiD sequences generated at the University of Chicago Genomics Core facility
• ChIP-Seq data analysis for a project studying how hyperglycemia induces epigenetic changes that lead to renal injury
• A gene expression profile analysis of the rat brain in response to perimenopausal hormonal signals
• RNA-Seq analysis of differential gene expression between two types of melanoma tumors
• Quality trimming of large sets of Illumina sequences for a study of nasal microbiota of people with chronic allergies
• Exome-wide analysis of 60 samples from four cohorts for research on the genetic components of several cancer types
Bioinformatics Core team members Jorge Andrade, Wenjun Kang, Riyue Bao, Jianpeng Xu, Chunling Zhang, and Lei Huang
Collaboration and Custom Analysis
For researchers who are looking for personalized analysis, including custom-built pipelines, the Bioinformatics Core provides consulting and customized services. Our bioinformaticians’ expertise extends to areas beyond those of our standard pipelines, including proteomics and genome-wide association studies.
Who uses the Bioinformatics Core? Since May 2012, the Core has received project requests from 18 different University departments and several external organizations.
[Chart: Project requests by department since May 2012; total project requests: 99. Departments shown: Ben May; Biochemistry Molecular Biology; Ecology & Evolution; External Organizations; Health Studies; Human Genetics; Medicine; Microbiology; Molecular Genetics; Neurobiology; Neurology; Obstetrics & Gynecology; Organismal Biology; Pathology; Pediatrics; Radiation Oncology; Social Sciences; Surgery.]
The Bioinformatics Core has completed 53 projects since May 2012, with 24 currently in progress and new requests submitted each month.
[Chart: Projects submitted, in progress, and completed by month, Sept. 2012 to July 2013; total projects completed: 53.]
When a researcher submits an online project request form, the Core returns a proposal, including the scope of deliverables, a timeline for completion, and the estimated cost. The execution of each project is guided by frequent discussion between researchers and bioinformaticians, and updates are provided regularly. When a project is complete, results are delivered in the form of a written report and presentation.

Since May 2012, the Core’s bioinformaticians have completed a total of 53 projects, with many more in progress. An average of seven new project requests are submitted each month. These projects, conducted for researchers from over 25 different University departments and sections, vary widely in both scope and subject matter (for examples, see sidebar on page 46).
Other Services: Grant Analysis and Training
The Bioinformatics Core supports the creation of research grants in several ways, with the goal of fully developing and integrating the bioinformatics components of each grant and increasing its competitiveness for funding. CRI bioinformaticians can collaborate with researchers directly on the bioinformatics components of their grants, or they can provide cost analysis services and letters of support. In addition, standard language is available to be added to grants, documenting access to the tools and expertise needed to complete the proposed bioinformatics research. The Core has so far contributed to the writing and submission of 13 research grants, and has established this as an area of focus for future growth.

Another important part of the Core’s mission is to provide training opportunities that will help investigators develop bioinformatics expertise within their own laboratories. To this end, the CRI hosts a free monthly training seminar open to all members of the BSD, covering a different topic in bioinformatics analysis each month. For more detail and a list of past topics, see pages 54-55.
Director of Bioinformatics Jorge Andrade
our future
Our Goals for 2013-14
• Recruit and hire two additional Bioinformatics Scientists
• Improve efficiency by reducing the standard turnaround time of projects in the following production pipelines: Exome-Seq, RNA-Seq, and Microarray expression arrays
• Increase customer satisfaction by providing high-quality, customer-oriented services (to this end, the Core has already introduced a feedback survey to gather information on potential areas of improvement)
• Improve existing pipelines by performing comparative analysis of tools, and develop new pipelines to accommodate new technologies, protocols, and data types
• Continue to author and coauthor scientific publications
• Increase interest in the Core by developing advertising materials and meeting with department chairs
• Increase the Core’s funding through chargebacks, grant inclusion, and expanded internal and external collaboration
faculty spotlight
Ernst Lengyel
Ernst Lengyel, MD, PhD, is a Professor of Obstetrics/Gynecology, specializing in advanced surgical treatments for patients with ovarian cancer. The Lengyel Lab is dedicated to studying the biology of ovarian cancer metastasis and finding new drugs for its treatment.

One of the scientific goals of the Lengyel Lab is to understand the mechanisms behind a common problem in the treatment of ovarian cancer: most patients will develop resistance to carboplatin and taxol chemotherapy. To study this, the lab sought to identify the miRNA expression profiles of chemoresistant versus chemosensitive patients. They worked with the CRI’s Bioinformatics Core to obtain and analyze these genomic data.

With the assistance of the CRI’s bioinformaticians, the lab mined The Cancer Genome Atlas (TCGA) and distinguished unique groups of chemosensitive and chemoresistant patients. They then performed six analysis sets and were able to identify seven miRNA genes upregulated in chemoresistant disease and three in very chemosensitive disease. These findings have since been validated in an independent cohort of patients. The Lengyel Lab’s findings, enabled in part by the CRI, have prognostic and functional implications that may aid in developing therapies to target these miRNAs in chemoresistant patients. Dr. Lengyel said, “Without the CRI we would not have had the expertise to take advantage of the TCGA ovarian cancer data.”
TRAINING & EDUCATION
An integral part of the CRI’s mission is providing training and education for our users so that they become comfortable and confident both with our resources and with other technologies for biological computing.
Bioinformatics Training
The Bioinformatics Core presents a free training seminar with a different topic each month, taught by PhD bioinformaticians. These seminars cover the use and application of a variety of publicly and commercially available software and tools for bioinformatics analysis, such as R, Bioconductor, and Galaxy. In some cases, the training is directly tied in to CRI resources such as our HPC cluster. As investigators bring this education back to their laboratories, bioinformatics expertise can be further developed throughout the BSD. Since these seminars began in May 2012, they have attracted over 350 participants.

A post-training survey, introduced in February 2013, requests opinions from users after each seminar in an effort to improve future sessions and cover the topics most important to researchers. Of those responding to the survey in February through July 2013, 96 percent found their course worthwhile, with 80 percent choosing “very” or “extremely” worthwhile. The CRI’s bioinformaticians have received high marks, with 96 percent of respondents calling their instructors “very knowledgeable.” Overall, 92 percent of respondents reported satisfaction with the course they attended. Survey participants also provided suggestions and requests for future seminar topics, helping the Bioinformatics Core to continue to design valuable training opportunities that meet the needs of our research community.
Past training seminars have covered a range of systems and software programs useful for bioinformatics analysis.
Date | Topic | Attendance
7/2013 | Introduction to Linux Command Line for Bioinformatics | 24
6/2013 | Introduction to Linux Command Line for Bioinformatics | 37
5/2013 | Analyzing Illumina ChIP-Seq Data with the CRI | 32
4/2013 | Introduction to CRI’s HPC Cluster for Bioinformatics Computing | 34
3/2013 | Analysis of Illumina and Microarray Data with R and Bioconductor | 12
2/2013 | Analyzing Illumina RNA-Seq Data with the CRI | 25
1/2013 | Analysis of Microarrays with R and Bioconductor | 22
11/2012 | Analyzing Illumina Whole Exome Data with the CRI | 31
9/2012 | Galaxy: Web-Based Bioinformatics Analysis and RNA-Seq Workflow Management | 40
8/2012 | Analyzing Illumina RNA-Seq Data with the CRI | 41
7/2012 | Analysis of Microarray Data with R and Bioconductor | 16
6/2012 | Introduction to R and Execution on HPC | 12
5/2012 | Introduction to R | 25
What bioinformatics training participants had to say...
“Very approachable and helpful instructors.” (June 2013)
“I thought it was really cool and I learned a lot about how to navigate around a Linux system. Great job!” (June 2013)
“Very helpful.” (April 2013)
“Well organized, very knowledgeable, and well presented.” (March 2013)
“Very good training class, need more like this.” (February 2013)

Other CRI Training
The CRI provides individual and small-group training sessions upon request for researchers who need assistance with using REDCap for their studies. Julissa Acevedo, the CRI’s Business Systems Analyst, holds an average of eight training and demo sessions per month for groups of one to three researchers at a time.

In addition, the CRI hosted several events over the past year with the goal of sharing information about our computing resources with existing and potential users. Three See IT Live! seminars were held in November and December, aligning with the release of new CRI resources. Each seminar was centered on a live demonstration of the featured resource and included an overview of the CRI, instructions on obtaining an account, and an opportunity to meet our technical staff and ask questions, as well as free refreshments. These events were open to all members of the BSD and were attended by a total of 32 participants.
See IT Live! sessions highlighted the following resources:
• Galaxy: Learn how to access Galaxy, a web-based portal providing data storage, data management, and analytical tools integrated with our computing resources
• i2b2: Learn how to use our de-identified datamart to identify cohorts and request data for your research
• HPC Cluster: Learn how to optimize and run jobs on our new high-performance computing cluster
Jorge Andrade presents a bioinformatics training seminar
In addition to the HPC Cluster session of See IT Live!, the CRI’s Systems and Security team hosted two Lunch & Learn events in the spring to further educate HPC users about our available resources and to provide an overview of the data center migration from Prudential to Kenwood. Each event included a discussion of available resources in Kenwood, the rationale and timeline for moving out of Prudential, and a question-and-answer session. These events were advertised to both potential and current users and were attended by a total of 32 participants.
Don Saner presents at the CRI’s first HPC Lunch & Learn event
FACULTY OVERSIGHT & GOVERNANCE
Governance Structure
The CRI’s strategic decision-making and long-term planning are led by a governance structure set up by the Office of the CRIO. The committees that make up this structure guide research informatics activities across the entire BSD, ensuring that informed long-term decisions for the Division are reached in a transparent and accountable way.
[Figure: Governance structure, comprising the Research Informatics Executive Governance Committee; the Research Informatics Governance Committee; and the Research Informatics Technical Policy, Data Use, and Compliance Review Committees.]
The five committees outlined below bring together senior BSD and University of Chicago Medicine (UCM) leadership, information systems experts, patient privacy experts, and faculty representing basic science, clinical research, and translational research. Decisions from these committees guide us in establishing policies and procedures, prioritizing new initiatives, safeguarding patient information, and complying with BSD policies and applicable federal and state laws.
For a full list of governance committee membership, see Appendix C.
Research Informatics Executive Governance Committee
Chair: Dr. Kenneth Polonsky, Dean and Executive Vice President for Medical Affairs
Mission: To provide high-level strategic decisions for all research informatics activities across the BSD, integrating the needs of faculty, clinicians, and BSD and hospital leadership
Members: BSD and UCM executive leadership
Research Informatics Governance Committee
Chair: Dr. Robert Grossman, Chief Research Informatics Officer
Mission: To establish priorities and policies for research informatics across the BSD, including those for the development and use of the CRDW and the comprehensive computing resources provided to BSD faculty
Members: Senior faculty and staff leadership from across the BSD and UCM
Research Informatics Technical Policy Committee
Chair: Dr. Robert Grossman, Chief Research Informatics Officer
Mission: To provide oversight and governance for the technical aspects of research informatics across the BSD and to ensure appropriate safeguards for ePHI used in research
Members: Staff and faculty with expertise in informatics and information technology security
Research Informatics Data Use Committee
Chair: Dr. Dana Edelson, Assistant Professor of Medicine
Mission: To review, approve, monitor, and prioritize requests for CRDW data release to individual investigators and to approved datamarts and systems, including i2b2 and any subsequently developed datamarts
Members: Staff and faculty with expertise in regulatory issues, compliance, and data management

Research Informatics Compliance Review Committee
Chair: Tyler DeNormandie, Information Systems Manager and Senior Systems Engineer, Health Studies and Family Medicine
Mission: To advise the CRI Systems and Security group on best practices for ensuring compliance with BSD policies and to help ensure the CRI’s implementation of appropriate safeguards for ePHI used in research
Members: Information technology security experts representing the major IT organizations at the University and UCM
Faculty Oversight
Guidance and oversight for research informatics throughout the BSD are provided by the Informatics Oversight Committee, made up of faculty leaders representing both basic science and clinical departments. The recommendations of this committee guide us in ensuring that the direction and activities of the CRI are in line with the needs of the research faculty we serve. This committee reports to the Research Advisory Committee, a BSD/UCM committee that reports to the Dean for Research and Graduate Education.

The Informatics Oversight Committee is chaired by Dr. John Cunningham, Chief of the Section of Pediatric Hematology/Oncology. For a full list of members, see Appendix D.
Figure: Reporting structure linking the Office of the CRIO, the Informatics Oversight Committee, and the Research Advisory Committee
OUR PARTNERS
The Center for Research Informatics is grateful for the support of our strategic partners and collaborators across the University of Chicago. These partners enable the collaborative work that has made the CRI successful in its first years of operation.
Biological Sciences Division bsd.uchicago.edu
Chicago Biomedicine Information Systems help.bsd.uchicago.edu
Comprehensive Cancer Center cancer.uchicago.edu
Computation Institute ci.uchicago.edu
Human Imaging Research Office hiro.bsd.uchicago.edu
Institute for Genomics & Systems Biology igsb.anl.gov
Institute for Translational Medicine itm.uchicago.edu
Institutional Review Board humansubjects.uchicago.edu
IT Services itservices.uchicago.edu
Office of Clinical Research bsdocr.bsd.uchicago.edu
LOOKING AHEAD
The Center for Research Informatics has achieved many important goals over the past two years. The establishment of the Clinical Research Data Warehouse, the design and implementation of a state-of-the-art high-performance computing and storage infrastructure that can house protected health information, and the development of a robust Bioinformatics Core together provide a foundation of support for a large number of research programs throughout the BSD. Many grants, papers, and research projects owe part of their success to the services delivered by the CRI. Providing services and training to enable high-quality scientific research was our goal when we were established, and we are happy to have achieved this level of success in the short time since the CRI was founded.
Building on these accomplishments, we plan to improve and expand our offerings in several ways over the coming years.
Our Clinical Research Data Warehouse now serves BSD faculty by offering cohort discovery tools that can search across millions of patient encounters. The next phase for the CRDW will include the development of a de-identified datamart and tools for analyzing de-identified data securely. We will further enhance the CRDW by adding full-text search of pathology and radiology notes.
We are now beginning to work with researchers to use data from the CRDW to build predictive models that generate alerts to improve hospital operations and quality of care. From identifying patients at risk of hospital readmission to predicting which inpatients may suffer a cardiac arrest, we are leading the way in helping clinical researchers design, develop, and use complex alerts in their practice. We are working closely with CBIS to ensure that we can implement alert notifications within the Epic electronic medical record.
In the next year, we will further expand our high-performance computing resources to provide increased capacity, functionality, and performance, with the goal of ensuring that all faculty have access to agile and advanced computing. Alongside these efforts, we will expand our training and educational offerings to enable more faculty to use these computing tools in their research. The CRI is fully invested in advancing world-class research within the Biological Sciences Division. Our successes thus far and our ambitious plan going forward demonstrate our commitment to this goal.
A CRI weekly planning meeting
APPENDIX
Appendix A: CRI Staff List
Samuel Volchenboum, MD, PhD
Director & Associate CRIO
Administration
Tiffany Cyrus
Project Manager
Hannah Lawrence
Executive Administrator
Michael Daus
Administrative Specialist
Caitlin Pike
Communication Specialist
Bioinformatics Core
Jorge Andrade, PhD
Director of Bioinformatics
Riyue Bao, PhD
Bioinformatician
Elizabeth Bartom, PhD
Bioinformatician
Kyle Hernandez, PhD
Bioinformatician
Lei Huang, PhD
Bioinformatician
Wenjun Kang, MS
Scientific Programmer
Jianpeng Xu, PhD
Bioinformatician
Chunling Zhang, PhD
Bioinformatician
Clinical and Translational Informatics
Don Saner
Director of Clinical and Translational Informatics
Julissa Acevedo
Business Systems Analyst
Seong Choi
Programmer
Keith Danahey
Database/Systems Administrator and Programmer
Brian Furner
Manager of Programming
Timothy Holper
Manager of CRDW Development
Kevin Le
Programmer/Analyst
Luis Maciel
Database Administrator
Systems and Security
Plamen Martinov
Director of Systems and Security
Andy Brook
Senior Systems Administrator
Beth Lynn Eicher
Senior Systems Administrator
Michael Jarsulic
Senior Systems Administrator
Sneha Jha
Systems Administrator
Olumide Kehinde
Senior Systems Administrator
Brad Orr
Senior Project Manager
Bruce Thompson
Security Analyst
Appendix B: Bioinformatics Core Pipelines
Illumina
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak-Related Genes Analysis, Gene Ontology Analysis, and Annotation
Exome Sequencing: Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, and Annotation
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
Consensus Genotyping Pipeline: Genotyping, SNP Detection, and InDel Detection using three different methods (SAMtools, GATK, and Atlas-2), comparison of variant calls, a list of consensus variant calls, and a list of method-specific calls
De Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
SOLiD
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak-Related Genes Analysis, Gene Ontology Analysis, and Annotation
De Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
Microarrays
Illumina and Affymetrix Expression Arrays: Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis
Affymetrix and Exiqon miRNA Arrays: Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, miRNA Target Gene Prediction, Functional Annotation, and Pathway Enrichment Analysis
Appendix C: Research Informatics Governance Committees
Research Informatics Executive Governance Committee
Name
Title and Affiliation(s)
Kenneth Polonsky (Chair)
Dean and Executive Vice President for Medical Affairs
Conrad Gilliam
Dean for Research and Graduate Education
Robert Grossman
Chief Research Informatics Officer, BSD
Sharon O’Keefe
President, UCM
Eric Yablonka
Vice President and Chief Information Officer, CBIS
Research Informatics Governance Committee
Name
Title and Affiliation(s)
Robert Grossman (Chair)
Chief Research Informatics Officer, BSD
Sameer Badlani
Chief Medical Information Officer, UCM
John Cunningham
Chief, Section of Pediatric Hematology/Oncology
Chris Daugherty
Chair, Institutional Review Board
Dana Edelson
Assistant Professor of Medicine
Conrad Gilliam
Dean for Research and Graduate Education
Marilyn Hanzal
Associate General Counsel, Legal Affairs
Catherine Ostapina
Senior Compliance Advisor & Director, Office of Corporate Compliance
Lainie Ross
Professor of Pediatrics, Medicine, and Surgery
Julian Solway
Associate Dean for Translational Medicine
Walter Stadler
Associate Dean for Clinical Research
Samuel Volchenboum
Director, CRI
Eric Yablonka
Vice President and Chief Information Officer, CBIS
Research Informatics Technical Policy Committee
Name
Title and Affiliation(s)
Robert Grossman (Chair)
Chief Research Informatics Officer, BSD
Paul Chang
Professor of Radiology; Vice Chair of Radiology Informatics
Tyler DeNormandie
Information Systems Manager and Senior Systems Engineer, Health Studies
Roger Engelmann
Image Analysis Software Developer, Human Imaging Research Office
Rajan Gopalakrishnan
Director for Informatics and Information Technology, Comprehensive Cancer Center
John Moses
Director of Enterprise Architecture and New Technologies, UCM
Prasanna Nippani
Assistant Director of Information Technology, UCM
Don Saner (Co-Chair)
Director of Clinical and Translational Informatics, CRI
Samuel Volchenboum
Director, CRI
Research Informatics Data Use Committee
Name
Title and Affiliation(s)
Dana Edelson (Chair)
Assistant Professor of Medicine
Samuel Armato
Associate Professor of Radiology
Rajan Gopalakrishnan
Director for Informatics and Information Technology, Comprehensive Cancer Center
Nick Gruszauskas
Technical Director, Human Imaging Research Office
Contessa Hsu
Application Manager, UCM
Millie Maleckar
Director of Regulatory Compliance for Human Subjects, Institutional Review Board
Prasanna Nippani
Assistant Director of Information Technology, UCM
Don Saner
Director of Clinical and Translational Informatics, CRI
Phil Schumm
Senior Biostatistician, Health Studies; Director, Research Computing Group
Cassie Simon
Assistant Director, UCM Cancer Registry
Research Informatics Compliance Review Committee
Name
Title and Affiliation(s)
Tyler DeNormandie (Chair)
Information Systems Manager and Senior Systems Engineer, Health Studies
James Clark
Network Security Officer, IT Services
Andrew Kramski
Infrastructure Security Engineer, UCM
Plamen Martinov
Director of Systems and Security, CRI
Catherine Ostapina
Senior Compliance Advisor & Director, Office of Corporate Compliance
Daniel Sullivan
Web Developer and Infrastructure Architect Specialist, CBIS
Bruce Thompson
Security Analyst, CRI
Appendix D: Faculty Oversight
Informatics Oversight Committee
Name
Title and Affiliation(s)
John Cunningham (Chair)
Chief, Section of Pediatric Hematology/Oncology
Michael Glotzer
Professor of Molecular Genetics and Cell Biology
Robert Grossman
Chief Research Informatics Officer, BSD
Michelle Le Beau
Director, Comprehensive Cancer Center
Marsha Rosner
Chair, Ben May Department for Cancer Research
Robert Rosner
Professor of Astronomy/Astrophysics and Physics
Matthew Stephens
Professor of Human Genetics and Statistics
Ronald Thisted
Professor of Statistics, Health Studies, and Anesthesia/Critical Care
Samuel Volchenboum
Director, CRI
© The University of Chicago, 2013. All rights reserved. Written and designed by Caitlin Pike. Photography by Robert Kozloff.