CRI Annual Report 2012-2013


University of Chicago

Center for Research Informatics

ANNUAL REPORT

2012-2013


Message from the BSD Chief Research Informatics Officer

The Center for Research Informatics (CRI) was set up two years ago in August 2011 to provide services and resources to support biomedical informatics. The CRI views biomedical informatics very broadly to include bioinformatics, clinical informatics, translational informatics, and health care informatics.

As is detailed in this report, during the past two years, the CRI has set up a Clinical Research Data Warehouse and a Bioinformatics Core, updated the BSD’s high-performance computing cluster, and made secure and compliant storage and computing resources available to every BSD researcher. The Office of the CRIO has also set up a governance structure, bringing together BSD and UCM leadership and experts in information systems, patient privacy, and a variety of research fields to guide us in making long-term decisions that serve our researchers, comply with all relevant laws and policies, and protect our patients’ data. In addition, the Office of the CRIO has supported initiatives such as setting up a secure and compliant computing infrastructure so that researchers can more quickly and easily analyze large-scale genomic datasets.

As we move forward and continue to grow, it is important that we hear from faculty so that we can provide the services, resources, and education that are important to you. I am very interested in hearing from every BSD researcher to make sure that your needs are met. You can contact me directly or talk to any member of the BSD Research Informatics Oversight Committee (you can find their names on the CRI website and on page 74 of this report). We look forward to hearing from you.

Robert Grossman, Ph.D.



TABLE OF CONTENTS

Who We Are (page 2)
Clinical and Translational Informatics (page 12)
Systems and Security (page 26)
Bioinformatics Core (page 40)
Training and Education (page 53)
Faculty Oversight and Governance (page 59)
Our Partners (page 65)
Looking Ahead (page 67)
Appendix (page 69)



WHO WE ARE



A Letter from the Director

As a physician scientist and informaticist, I know firsthand the problems facing basic researchers, translational medicine specialists, and clinicians. Deficiencies in any part of the pipeline can have serious downstream effects. A productive translational research operation requires a solid and secure infrastructure, easy access to powerful analytical tools and high-performance computing, expertise in complex study design and data analysis, and the ability to create and implement platforms for collecting, storing, studying, and presenting research data.

When the Center for Research Informatics was established two years ago, the path from the bench to the bedside at the University of Chicago was fragmented and inefficient, serving neither basic scientists nor clinicians well. In the short time since, the CRI has grown into an active and versatile group of professionals, capable of executing on a wide range of technologies to enable world-class biomedical research. To support these endeavors, we have built an industrial-grade, HIPAA-compliant secure infrastructure, able to store and compute over even the largest and most complex datasets. Every research group, from the basic sciences to clinical faculty, has access to the storage and computing resources provided by the CRI.

The Bioinformatics Core takes on the most intricate and complicated data analysis tasks, working closely with investigators to advise on data collection, perform complex computations, and render advanced interpretation of the results. Nearly one hundred groups have taken advantage of the Core so far, and the number of grants and papers directly resulting from our assistance is growing every day. Our Clinical and Translational Group provides state-of-the-art custom application development for investigators in need of specialized data collection or other assistance with their clinical studies.

The crowning achievement of the Center’s first two years has been the design, building, and implementation of a robust Clinical Research Data Warehouse. Starting with just a data feed from the Centricity billing system, the CRDW now boasts data from over 600,000 patients in an easily searchable system that allows researchers to quickly determine cohort size for their studies. The CRI has the BSD’s only system capable of performing complex queries to study quality and other important clinical metrics.

Looking ahead, we have many exciting plans for the CRI. In 2014, we will release a feature-rich system for accessing de-identified patient data in the CRDW. We will implement a full-text searchable system for querying pathology and radiology reports. We will further expand our computational infrastructure, doubling our HPC power and increasing our storage capacity. We continue to develop, test, and roll out new bioinformatics methods, making these available to researchers through our core service offerings and as self-service workflows on our Galaxy platform.

We hope that everyone in the BSD will take advantage of the array of resources offered by the CRI. We look forward to continuing our mission of providing advanced informatics support to enable world-class biomedical research throughout the BSD.

Samuel Volchenboum, MD, PhD

our leadership

Director and Associate CRIO

Samuel Volchenboum, MD, PhD

Sam has been a part of the CRI since May 2012 in his role as Associate Chief Research Informatics Officer, leading our faculty outreach and education efforts. In April 2013 he was appointed Director of the CRI and now leads our operations and strategic planning. In addition to his work in the CRI, Sam serves the Department of Pediatrics as Assistant Professor, is an Associate Director of the Institute for Translational Medicine, and is a Faculty Fellow in the Computation Institute. His research includes using proteomics to study neuroblastoma, a pediatric solid tumor; developing software to facilitate real-time mass spectrometry peptide identification; and creating tools to improve provider communication and patient care.

cri.uchicago.edu



About the CRI

The Center for Research Informatics (CRI) was created in 2011 to support the University of Chicago Biological Sciences Division (BSD) research community. We offer state-of-the-art, standards-compliant technologies for the acquisition, management, and storage of clinical, translational, and basic research data. Our resources and services are open to all members of the BSD; users include students, postdoctoral fellows, technicians, staff, faculty, researchers, and collaborators. In addition, we support research and education in informatics and work with the Institute for Translational Medicine (ITM), Clinical and Translational Science Awards (CTSA) program, and other partners on joint initiatives. Finally, we are strong advocates for informatics within our research communities.

How We’ve Grown: A Timeline

Since 2011, when the Office of the CRIO was created to guide research informatics efforts across the BSD, the Center for Research Informatics has grown into a robust service organization. In our first year, we developed a mission, hired several of the Directors who would lead our major initiatives, and began work on key projects. Since early 2012, we’ve seen several important projects come to fruition: a functioning and growing Clinical Research Data Warehouse, with faculty data requests being evaluated by our Data Use Committee and fulfilled by our staff; a Bioinformatics Core providing pipelines, consulting, training, and other services; and an improved, HIPAA-compliant computing and storage infrastructure for researchers. As we work to improve and expand our existing services, we will continue to reach out to faculty to increase user adoption and develop new initiatives. We look forward to seeing where the next year of our timeline will bring us.

February 2011: The Office of the CRIO is created with a charge of overseeing and directing research informatics across the BSD.

August 2011: Robert Grossman is appointed CRIO. The CRI is created as a central organization for informatics service, research, and education.

OUR MISSION

The CRI’s mission is to provide informatics resources and services to BSD faculty, to support high-quality biomedical and clinical research, and to promote research and education in informatics.

The CRI’s services and resources are provided by three core groups. The Clinical and Translational Informatics team manages our Clinical Research Data Warehouse (CRDW), provides custom programming for initiatives in clinical research, and supports data management for clinical trials. The Bioinformatics Core provides analysis pipelines, consulting services, and other expertise for researchers working with genomic data. The Systems and Security team maintains and improves our scientific computing infrastructure, provides technical support to users, and ensures our technical compliance with appropriate security regulations. All three of these groups, along with our administration, governance committees, and strategic partners, work together toward our goal of enabling world-class research in a secure environment.

Don Saner joins the CRI as Director of Clinical and Translational Informatics.

Hannah Lawrence joins the CRI as Executive Administrator.

October 2011: The BSD Research Informatics Governance Structure is approved by the BSD Dean’s Office.



our leadership

Chief Research Informatics Officer

Robert Grossman, PhD

As Chief Research Informatics Officer, Bob guides informatics activities and initiatives across the BSD, including providing strategic direction and oversight for the CRI. In addition to his role as CRIO, Bob is a Senior Fellow in the Institute for Genomics and Systems Biology, a Senior Fellow in the Computation Institute, and a Professor of Medicine in the Section of Genetic Medicine. His research group focuses on big data, biomedical informatics, data science, cloud computing, and related areas. Bob served as the first Director of the CRI from August 2011 to April 2013.

The CRI is an initiative developed and managed by the Chief Research Informatics Officer (CRIO) and Associate CRIO. The Office of the CRIO is responsible for advising the Dean and the Dean for Research and Graduate Education of the Biological Sciences Division on the BSD’s investment in research informatics, for managing research informatics services and resources, for creating new research informatics initiatives, and for operating a Research Informatics Governance Structure.

February 2012: The IRB protocol for the CRDW is approved, and the system goes live for BSD researchers.

May 2012: Sam Volchenboum is appointed Associate Chief Research Informatics Officer.

Jorge Andrade joins the CRI as Director of Bioinformatics.

The CRI’s administrative team: Caitlin Pike, Michael Daus, and Hannah Lawrence

The Bioinformatics Core begins offering services.

September 2012: The Bioinformatics Core makes its core informatics pipelines available to researchers.

October 2012: The cohort discovery tool i2b2 is released to facilitate queries of the CRDW for all BSD researchers.

our leadership

Executive Administrator

Hannah Lawrence

As Executive Administrator of the CRI, Hannah is responsible for planning and oversight of our financial and administrative functions, including coordination across service areas, management of governance committees, communications, and project management for CRI initiatives. She works closely with our leadership team to develop short- and long-term organizational plans, manages our budget and hiring process, and provides other administrative and strategic support. Prior to joining the CRI, she served as Strategist and Planner in the Office of the Dean of the BSD. There, she was responsible for organizing the Informatics Advisory Group that ultimately led to the Dean’s decision to appoint a CRIO and create the CRI. She also served as the first administrative manager of the BSD’s Faculty Advisory Committee.

November 2012: The CRI makes available a new, HIPAA-compliant HPC cluster, storage and backup resources, and other computing infrastructure.

April 2013: Sam Volchenboum is appointed Director of the CRI. Plamen Martinov joins the CRI as Director of Systems and Security.

Our Organization

Chief Research Informatics Officer: Robert Grossman
Associate CRIO & Director of the CRI: Sam Volchenboum
Executive Administrator: Hannah Lawrence
Director of Clinical and Translational Informatics: Don Saner
Communication Specialist: Caitlin Pike
Manager of CRDW Development: Timothy Holper
Administrative Specialist: Michael Daus
Manager of Programming: Brian Furner
Programmers: Seong Choi, Keith Danahey, Kevin Le
Director of Bioinformatics: Jorge Andrade
Director of Systems and Security: Plamen Martinov
Bioinformaticians: Riyue Bao, Elizabeth Bartom, Kyle Hernandez, Lei Huang, Jianpeng Xu, Chunling Zhang
Systems Administrators: Andy Brook, Beth Lynn Eicher, Michael Jarsulic, Sneha Jha, Olumide Kehinde
Scientific Programmer: Wenjun Kang
Security Analyst: Bruce Thompson
Business Systems Analyst: Julissa Acevedo
Senior Project Manager: Brad Orr
Database Administrator: Luis Maciel
Project Manager: Tiffany Cyrus

For a detailed list of CRI employees, please see Appendix A.



CLINICAL & TRANSLATIONAL INFORMATICS


The CRDW

Since the CRI’s beginnings, one of the central projects of the Clinical and Translational team has been to build, populate, and maintain the Clinical Research Data Warehouse. Over the past two years, the team has seen this initiative develop from a concept to a functioning and growing data warehouse. The IRB protocol outlining its standards, governance, and oversight was approved in February 2012, and the team began fulfilling data requests for researchers in May 2012.

The CRDW incorporates six years’ worth of data from electronic medical records and patient billing, including lab values, procedure and diagnosis codes, demographics, medications, and visit information. The Clinical and Translational team continues to work to expand the amount and types of data available; they are currently engaged in integrating radiology and pathology notes and discharge summaries.

our leadership

Director of Clinical and Translational Informatics

Don Saner

Over the past two years, Don has led his team through the process of building and developing the Clinical Research Data Warehouse. In addition, he provides informatics leadership and support for other clinical and translational research projects in conjunction with the Institute for Translational Medicine and our other partners. With over 20 years of experience at the University of Chicago, Don joined the CRI as one of our first staff members in 2011.



As of August 2013, the CRDW contains:¹

• Patients: 607,000
• Encounters: 6.1 million
• Medications: 16.1 million
• Procedures: 36.8 million
• Labs: 93 million
• Diagnoses: 13.9 million

¹ Please note that some numbers regarding the data housed in the CRDW differ from those listed in the CRI’s 2012 Annual Report due to changes in how these data are measured. (1) The number of patients reported here represents only those patients who have encounter data associated with them, while last year’s report included all patients regardless of encounter data. (2) Billing records where debits are canceled out by credits (for example, when a procedure is ordered but never performed) are no longer listed here. (3) This year’s report no longer includes “orphaned” records that were previously included in CRDW summary counts but cannot be returned via i2b2 or in custom data requests.

i2b2 and Data Requests

To interact with the CRDW, researchers use a datamart interface. The first datamart implemented by the CRI, i2b2, went live for users in June 2012. i2b2, or “Informatics for Integrating Biology and the Bedside,” is an NIH-funded open-source project created by a National Center for Biomedical Computing based at Partners HealthCare System. i2b2 is designed for cohort identification, allowing researchers to query the CRDW for sets of patients meeting search criteria. Applications and benefits of the i2b2 interface include:

• Helping investigators create new research hypotheses
• Identifying potential cohorts for clinical trials
• Reducing the time researchers must spend on discovery of research cohorts, study feasibility, and subject recruitment
• Familiarizing researchers with the standard terminologies and data that reside in the CRDW

Researchers log into i2b2 using BSD or hospital credentials and can then explore the CRDW using an intuitive drag-and-drop interface. Queries may be created with multiple and/or logic, using search terms based on standard terminologies. The information returned by the system allows researchers to determine the number of patients meeting their criteria. These results can inform subsequent requests for full datasets with either de-identified data or IRB-approved protected health information. To date, almost one hundred users have created and executed over one thousand searches.

Requests for CRDW data are fulfilled by the Clinical and Translational team, under the oversight of the Data Use and Technical Policy Committees. Researchers submit requests using a simple online form, providing information about the scientific purpose of the requested data. To protect the University’s data and comply with patient privacy laws and our IRB protocol, the Data Use Committee monitors data requests to ensure appropriate use of CRDW data. (For more information about this committee, see page 63.) In addition, the CRI acts as an Honest Broker service for researchers, integrating data from different sources such as the Cancer Registry and Epic and removing identifiers when necessary to protect patient privacy. Since May 2012, the Clinical and Translational team has fulfilled 92 data requests of varying size and complexity.

[Figure: Data Request Status Summary (as of August 2013). Status categories: completed (92), not approved, on hold, awaiting IRB, awaiting user, in progress.]
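The and/or query logic used for cohort discovery can be illustrated with a minimal sketch. It is not the i2b2 implementation: the function name, the patient records, and the code strings below are all hypothetical, and the sketch assumes a simple model in which terms inside one group are combined with OR and the groups themselves are combined with AND. Like the cohort discovery step described above, it returns only a count of matching patients.

```python
from typing import Dict, List, Set


def count_cohort(patients: Dict[str, Set[str]], groups: List[Set[str]]) -> int:
    """Count patients who match every group (AND).

    A patient matches a group if they carry at least one of its terms (OR).
    """
    return sum(
        1
        for codes in patients.values()
        if all(codes & group for group in groups)
    )


# Hypothetical patients, each with a set of made-up coded facts.
patients = {
    "p1": {"DX:diabetes_type2", "RX:drug_a"},
    "p2": {"DX:diabetes_type2"},
    "p3": {"DX:hypertension", "RX:drug_b"},
    "p4": {"DX:diabetes_type1", "RX:drug_b"},
}

# (type 1 OR type 2 diabetes) AND (drug_a OR drug_b)
query = [
    {"DX:diabetes_type1", "DX:diabetes_type2"},
    {"RX:drug_a", "RX:drug_b"},
]

cohort_size = count_cohort(patients, query)
print(cohort_size)
```

Returning only the count keeps this step free of patient-level data; a full dataset would still go through the data request process.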


As the demand for clinical data for research purposes continues to grow, the future of the CRDW’s development will include the creation of a fully de-identified datamart for research, which will allow investigators to access and query de-identified data within a secure data zone. In addition, the continued development of the CRDW includes the incorporation of additional data elements, which have been prioritized by the Research Informatics Governance Committee. These elements include the Cancer Registry; radiology, pathology, and discharge summaries; other data elements from Epic’s Clarity data warehouse; and the integration of research-specific databases including Velos and REDCap.

Who uses the CRDW?

Since May 2012, our Clinical and Translational Informatics team has processed data requests from 11 different University departments.

[Figure: Total data requests by department (115 total). Departments: Family Medicine, Human Genetics, Medicine, Neurology, Obstetrics & Gynecology, Orthopaedics, Pathology, Pediatrics, Psychiatry, Radiology, Surgery.]


Biobanking Management

The CRI maintains an instance of caTissue, a robust open-source biobanking management system used by labs to organize freezers and track samples. In addition to the standard deployment of caTissue, the CRI has worked with Dr. Michael Maitland, who manages the Cancer Center’s biofluids core, to add customizations created by Indiana University. These customizations, called caTrack, permit tracking the chain of custody for all samples using barcode labels and handheld barcode readers, which then synchronize with caTissue when placed in docking stations. This system permits tracking of a sample’s origin, when it was drawn, and where it was initially stored.

our contributions

Highlighted Accomplishments from 2012-13

• Expanded the CRDW by incorporating data elements from Epic and Clarity as well as the Cancer Registry
• Released i2b2 and began fulfilling data requests
• Established data request guidelines and policies to protect patient information
• Developed a Data Use Committee review process for data requests
• Expanded the team by hiring a business systems analyst, a database administrator, and a project manager
• Launched a REDCap users group
• Rewrote Dr. David Meltzer’s Hospitalist Protocol application and wrote a dashboard and alerting system for his Continuity of Care program


Brian Furner, Keith Danahey, and Tim Holper

Custom Programming: 1200 Patients and TRIDOM

Beyond maintaining the CRDW and fulfilling data requests, the Clinical and Translational team works directly with research groups to provide custom research application development for their clinical research projects.

One such project is 1200 Patients, a personalized medicine initiative jointly sponsored by the CRI and the Center for Personalized Therapeutics. This pharmacogenomics project seeks to develop a new medical system model for personalized care in which patients’ genetic information can be incorporated into the decision-making process of prescribing medications.

Patients who have consented to participate in the project are genotyped in a CLIA-certified lab. Their genetic information is then stored in a relational database along with curated pharmacogenomic data from published studies. During clinic visits, a physician dashboard displays a “30-second summary” synthesized from the information in the database relevant to the patient’s genomic profile. Physicians can use these summaries to inform their choices in prescribing medication, for example by pre-identifying patients who are likely to experience severe side effects or by predicting when a patient may need alternative dosing.
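The idea of matching a patient’s stored genotype against curated pharmacogenomic annotations can be sketched as a single relational join. This is an illustration only, not the 1200 Patients schema: the table names, column names, gene and variant labels, drug names, and notes are all invented placeholders, and Python’s built-in sqlite3 module stands in for whatever database the project uses.

```python
import sqlite3


def summarize(conn: sqlite3.Connection, patient_id: str) -> list:
    """Return one summary line per curated annotation matching the patient's genotype."""
    rows = conn.execute(
        """
        SELECT a.drug, a.note
        FROM genotypes AS g
        JOIN annotations AS a
          ON a.gene = g.gene AND a.variant = g.variant
        WHERE g.patient_id = ?
        ORDER BY a.drug
        """,
        (patient_id,),
    ).fetchall()
    return [f"{drug}: {note}" for drug, note in rows]


# Hypothetical in-memory database with placeholder data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genotypes (patient_id TEXT, gene TEXT, variant TEXT);
    CREATE TABLE annotations (gene TEXT, variant TEXT, drug TEXT, note TEXT);
    INSERT INTO genotypes VALUES
        ('P1', 'GENE_A', 'v1'),
        ('P1', 'GENE_B', 'v2'),
        ('P2', 'GENE_A', 'v3');
    INSERT INTO annotations VALUES
        ('GENE_A', 'v1', 'drug_x', 'consider alternative dosing'),
        ('GENE_B', 'v9', 'drug_y', 'increased side-effect risk');
    """
)

summary = summarize(conn, "P1")
print(summary)
```

Only annotations whose gene and variant both match the patient’s record are returned, so a patient with no matching variants gets an empty summary.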

Clinical and Translational Informatics team members Tiffany Cyrus, Brian Furner, Don Saner, Luis Maciel, Julissa Acevedo, Tim Holper, Kevin Le, and Keith Danahey


our future

Our Goals for 2013-14

• Improve internal efficiency, including time, data repository, bug tracking, reporting, and status tracking, by implementing team project management software
• Implement a standard procedure for processing project requests that includes defining business requirements and scope of work, providing a time estimate, and receiving client approval
• Create a dashboard with metrics for the CRDW, REDCap, and Velos to increase the visibility of the data in each system
• Enhance the skill set of each team member through professional training
• Develop and maintain tools for data collection and reporting, including continuing to develop in-house systems to support custom applications and datamarts
• Improve internal knowledge and collaboration across the team through biweekly presentations

As new technologies allow the practice of medicine to become increasingly personalized, the 1200 Patients project contributes to this progress by improving doctors’ ability to make patient-specific medication decisions. The CRI helps enable this important initiative by providing custom programming and database design as well as data import and overall technical management.

TRIDOM (Translational Research Initiative in the Department of Medicine) is a biobanking protocol started in 2005 that stores DNA, plasma, and serum for consented patients who are scheduled for a standard-of-care blood draw. The CRI contributed to this project by leading a rewrite of the database that maintains consent and sample information and generates operational reports. In addition, the CRI partnered with eSphere, which makes the Human Tissue Resource Center’s biobanking software, to create an automated feed to the TRIDOM database. To date, TRIDOM has enrolled over 8,800 patients and has banked samples for more than 5,900 of these patients, resulting in a total of over 58,000 samples.

[Figure: REDCap adoption has progressed at a steady rate since 2011. Series: total projects and total users, plotted by month from 2011 to 2013.]

Clinical Trials Management

The Clinical and Translational team supports clinical trials management by operating two data management solutions, REDCap and Velos eResearch.

The University of Chicago has been a member of the REDCap consortium since 2010. This self-managed, secure, web-based application, developed by the Vanderbilt University Clinical and Translational Science Awards (CTSA), supports data collection strategies for research studies with tools for building and managing online surveys and databases. The University of Chicago’s instance of REDCap currently supports more than 700 users from across the BSD and houses over 600 projects. As the number of REDCap users continues to grow, the CRI has worked to provide opportunities for education and collaboration. A REDCap users group provides the community with an ongoing meeting space for discussion, new feature announcements, tips and tricks, and real-time help. The CRI also offers individual and small group REDCap tutorial sessions for those in need of more personalized guidance (for more detail, see page 56).

Velos eResearch is a clinical trials management system that integrates study administration and clinical data management. The system supports many aspects of running a clinical trial, including:

• Patient recruitment and scheduling
• IRB and study monitoring
• Project planning and study design
• Protocol compliance
• Web-based data capture on a per-protocol basis
• Data safety monitoring and adverse event reporting

Velos is now supporting over 1,500 protocols for more than 550 investigators at the University. The CRI is currently working with the vendor to complete a hardware migration and upgrade that will improve the user experience.

faculty spotlight

David Meltzer

David Meltzer, MD, PhD, is Chief of the Section of Hospital Medicine, Director of the Center for Health and the Social Sciences, and an Associate Professor of Medicine, Economics, and Public Policy Studies at the University of Chicago. He also serves as the co-leader of the Institute for Translational Medicine’s Training Cluster, as well as the co-director of the ITM’s academic arm, the Committee for Clinical and Translational Science. Dr. Meltzer’s research explores problems in health economics and public policy, focusing on the theoretical foundations of medical cost-effectiveness analysis and the cost and quality of hospital care.

In the past year, the CRI’s Clinical and Translational team has worked with Dr. Meltzer in support of two of his projects: the Hospitalist Project and the Comprehensive Care Program (CCP) initiative.

The Hospitalist Project, which is supported in part by the Clinical and Translational Science Awards, has been in operation for over 16 years and has enrolled over 100,000 patients. The aims of this multi-site project are to study the quality and cost of care among hospitalized patients at the University of Chicago and Mercy Hospital, to examine whether there are significant differences in outcomes and costs for patients cared for by hospitalists compared to those cared for by other inpatient attending physicians, and to develop a research infrastructure that allows collaboration among multiple investigators and institutions. Patients enrolled in the Hospitalist Project are administered two separate interviews, one during their hospitalization and one 30 days after discharge. The results are recorded during the patient encounter using iPads connected to a custom-written web application. This information is stored in an SQL server database, from which it can be exported with a custom-built reporting system into SAS and Stata for analysis. This year, the CRI’s Clinical and Translational team updated the database and web-based interface for this project.

The CCP initiative is a randomized study started in 2012 with the aim of testing novel care delivery systems for improving the quality and reducing the cost of health care. The study’s hypothesis is that improving continuity in the doctor-patient relationship by having a single physician see patients at high risk of hospitalization in both inpatient and outpatient settings will improve outcomes and lower costs by reducing unnecessary emergency department visits, hospital admissions, and readmissions. The CRI has supported this effort by creating a custom dashboard that serves as a central location for CCP staff and physicians to record study information on patients and integrate this information with data from electronic medical records. In addition to the dashboard, the CRI has implemented a notification system that sends pages and emails to the appropriate CCP physicians and staff when a patient enrolled in the CCP protocol visits the emergency department or is admitted to the hospital.
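The routing rule behind such a notification system can be sketched in a few lines. This is a hypothetical illustration, not the CRI’s implementation: the event names, the `Encounter` record, the enrollment mapping, and the message format are all assumptions, and actual delivery by pager or email is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed event types that should trigger an alert.
ALERT_EVENTS = {"ED_VISIT", "ADMISSION"}


@dataclass
class Encounter:
    patient_id: str   # hypothetical study identifier
    event_type: str   # e.g. "ED_VISIT", "ADMISSION", "CLINIC_VISIT"


def build_alerts(encounter: Encounter, enrolled: Dict[str, List[str]]) -> List[str]:
    """Return one message per recipient if an enrolled patient has an alerting event."""
    if encounter.event_type not in ALERT_EVENTS:
        return []
    # Patients not enrolled in the study have no recipients, so no alerts are built.
    recipients = enrolled.get(encounter.patient_id, [])
    return [
        f"To {recipient}: patient {encounter.patient_id} had event {encounter.event_type}"
        for recipient in recipients
    ]


# Hypothetical enrollment list mapping each patient to the staff who should be notified.
enrolled = {"S001": ["physician@example.org", "coordinator@example.org"]}

alerts = build_alerts(Encounter("S001", "ED_VISIT"), enrolled)
for message in alerts:
    print(message)
```

Keeping the rule as a pure function that returns messages makes it easy to check separately from the paging and email channels that would actually send them.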




SYSTEMS & SECURITY


About Us

The Systems and Security team began with just two members and a limited infrastructure set up by the Initiative in Biomedical Informatics. It has now grown to include five full-time employees who support and manage the development of infrastructure in multiple data centers. The team has grown our computing environment considerably since its inception with generous support from the Institute for Translational Medicine and the BSD.

The purpose of the Systems and Security team is to provide core services and scientific computing resources with the highest quality of customer service, in order to enable BSD faculty to conduct advanced biological research in a compliant environment while simultaneously protecting intellectual property and sensitive information.

our leadership

Director of Systems and Security

Plamen Martinov

Plamen joined the CRI leadership team as Director of Systems and Security in April 2013 and manages the team of engineers responsible for the development and operations of our secure computing infrastructure. He leads our efforts to ensure compliance with security regulations and provides regular reports to the Research Informatics Compliance Review and Technical Policy Committees. Prior to joining the CRI, Plamen was Lead Data Security Engineer for Chicago Biomedicine Information Systems.


Infrastructure

The CRI’s resources, which have been upgraded and expanded over the past year, include:

• 1,024-core high-performance computing (HPC) cluster (2.2 GHz AMD Opteron 6274)
• Large Memory Linux supercomputer with 1 TB of RAM, 8 Intel® Xeon® E7-8870 2.4 GHz processors (160 cores)
• 700-TB ultra-high-density NAS for data storage that can scale up to 20 PB, available for both labshares and individuals
• Virtual Server Infrastructure with the capacity to support up to 1,500 virtual servers on Windows or Linux platforms
• Centralized and automated data backup and encryption with the capability to back up 2.1 PB of data
• Galaxy web-enabled biomedical data analytics tool that is fully integrated with the CRI’s HPC cluster

Storage & Backup By the Numbers

• 854,597,057 files backed up
• 1.7 petabytes total labshare capacity
• 2.5 petabytes total backup capacity
• 110 terabytes virtual infrastructure total storage


The Systems and Security team maintains the computing infrastructure that supports not only the CRI’s activities but also research for faculty across the BSD. These new HIPAA-compliant resources went live in November 2012 and are available to all members of the BSD.

To date, our resources are supporting a total of 479 active users: 279 users of our storage resources, 162 users of our HPC cluster, and 38 Galaxy users. Kenwood Data Center houses 125 virtual machines, with 350 TB of storage in use and 595 TB of data backed up.

Who uses our virtual machines?

Our VMs in Kenwood Data Center serve 11 different BSD groups, in addition to supporting CRI activities.

[Bar chart: number of VMs per group. Total VMs: 125. Groups shown: Genetics; Health Studies; Cellular Screening Center; Laboratory for Advanced Computing; Cardiology; Pediatrics; Clinical & Translational Informatics; Bioinformatics Core; Clinical Cancer Genetics; Childhood Cancer & Blood Diseases; Medicine Administration; Radiology, HIRO; Academic & Administrative Applications; CRI Infrastructure.]

our contributions

Highlighted Accomplishments from 2012-13

• Made a new HPC cluster and large memory servers available to all BSD researchers
• Launched a VMware farm and began provisioning virtual machines for researchers
• Migrated labshares from outdated equipment to a new state-of-the-art data center
• Introduced the CRI help desk and created an online technical help portal to direct users to the correct sources of information
• Hired Plamen as Director and expanded the team by hiring four systems administrators and a security analyst
• Created detailed security procedures to demonstrate HIPAA compliance
• Worked with the Compliance Review Committee to establish policies and procedures for patient privacy and data security
• In collaboration with the Bioinformatics Core, released the CRI’s implementation of Galaxy


Kenwood Data Center

The CRI’s advanced computing resources are housed in the state-of-the-art Kenwood Data Center on the University of Chicago campus. The CRI has made substantial investments in superior, resilient technology at Kenwood, improving the security, reliability, and recoverability of system resources through the modernization of data center services and standard architecture.

A primary focus of the Systems and Security team over the past year has been the migration of users’ data from outdated equipment in the Prudential Data Center to the newer, better-equipped Kenwood Data Center. The closing of Prudential and the move to Kenwood will allow us to shift our investments to more efficient and standardized computing platforms and technologies. The targets for completing this move include:

• Adding two 1-TB Large Memory Servers
• Adding an additional 1,024-core HPC cluster
• Virtualizing and migrating more than 80 servers
• Migrating 80 labshares and 300 home directories

Migration has proceeded carefully over the course of a year to achieve the goals of protecting the integrity of all data and ensuring clear communication with users. As of August 2013, 64 labshares (55 TB) out of a total of 75 (79 TB) have been migrated. All home directories have been migrated to new servers, and the team is on track to migrate 10 servers per month. The completion of this project is expected by the end of 2013.

Kenwood Data Center is equipped to house systems that are compliant with federal guidelines, including HIPAA and the Federal Information Security Management Act (FISMA). Moving all CRI resources to this facility helps us to protect patient privacy and keep our data secure.
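The labshare migration figures reported for August 2013 (64 of 75 labshares, 55 of 79 TB) can be expressed as progress percentages. The short Python sketch below is only an illustrative calculation; the helper name `percent_complete` is hypothetical and not part of any CRI tooling.

```python
def percent_complete(done: float, total: float) -> float:
    """Return progress as a percentage, rounded to one decimal place."""
    return round(100.0 * done / total, 1)


# Figures quoted in the report for August 2013.
labshares_done, labshares_total = 64, 75
tb_done, tb_total = 55, 79

labshare_pct = percent_complete(labshares_done, labshares_total)
data_pct = percent_complete(tb_done, tb_total)

print(f"Labshares migrated: {labshare_pct}% ({labshares_total - labshares_done} remaining)")
print(f"Data migrated: {data_pct}% ({tb_total - tb_done} TB remaining)")
```

By count, the migration is further along (about 85 percent of labshares) than by volume (about 70 percent of the data), which suggests that the labshares still to be moved are larger than average.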


[Bar chart: Total Data Center Users by resource, comparing the Kenwood and Prudential data centers. Total Data Center Users: 2,368.]

Who uses our labshare storage resources? More than 30 different departments and groups across the University store data on CRI-provided labshares.

[Bar chart: storage used per department or group, in GB. Total Storage Used: 151,791 GB. Departments and groups shown: Ben May; Biochemistry & Molecular Biophysics; Bioinformatics; Biomedical Sciences Cluster; Cancer Research; Cell & Molecular Biology; Cellular Screening Center; Center for Clinical Cancer Genetics; Chicago Booth School; Childhood Cancer & Blood Diseases; Ecology & Evolution; Endocrinology, Diabetes, & Metabolism; Evolutionary Biology; Genetic Medicine; Genetics & Clinical Cytogenetics; Health Studies; Hematology/Oncology; Human Genetics; Institutional Biosafety; Internal Medicine; Laboratory - Human Genetics; Medicine; Molecular Genetics & Cell Biology; Neurobiology; Neuroscience; Obstetrics & Gynecology; Pathology; Pulmonary/Critical Care; Rheumatology; Science & Education; Surgery.]

Technical Support

To complement our improved computing infrastructure, the Systems and Security team has taken several important steps over the past year to enhance customer service and technical support for users of our resources.

The CRI’s help desk was opened in summer 2012 with phone and email support staffed by employees who either resolve or triage issues. Concurrently, we introduced a new, easier-to-use system for submitting trouble tickets. Including working through the initial backlog of unresolved tickets, the support staff has now resolved almost ten thousand issues. The CRI also created a web portal to make it easier for users to find sources of technical help.

In addition, the Systems and Security team hosted several events throughout the year to educate users about CRI resources and specific issues. See IT Live! seminars in November and December highlighted our HPC cluster, Galaxy, and i2b2, with each seminar focused on a live demonstration of one resource. In March and April, the team hosted HPC Lunch & Learn events tailored to existing and potential HPC users, both to introduce our newest resources and spread the word about our migration to Kenwood. For more detail, see pages 57-58.

Systems and Security team members Beth Lynn Eicher, Sneha Jha, Plamen Martinov, Olumide Kehinde, Dan Sullivan, Brad Orr, Mike Jarsulic, and Bruce Thompson

Bruce Thompson and Beth Lynn Eicher in Kenwood Data Center


faculty spotlight

Fabrice Smieliauskas

Fabrice Smieliauskas, PhD, is a health economist whose research interests center on the operation of markets for medical technologies. His work includes studies of financial conflicts of interest in medicine and of disparities in the adoption and abandonment of new medical technologies. Dr. Smieliauskas primarily uses SAS and Stata to perform statistical analyses for his research.

The focus of Dr. Smieliauskas’s ongoing work is the evidence base for new medical technologies. He is analyzing the response of payers and providers to evidence that a common medical treatment is of limited value to patients, as well as the response to state and federal policies that mandate coverage of drugs for “off-label” indications not approved by the FDA. He is also developing a unique comprehensive database on clinical cancer trials in order to address several open research questions, including the effects of a variety of governmental and institutional policies on the rate and direction of cancer innovation.

Dr. Smieliauskas’s research has been enabled in part by the CRI’s Large Memory Server. Before moving his work to the CRI’s infrastructure, he encountered problems using servers that were not capable of handling the large memory requirements of his research. He was deterred from other potential options by high user fees and inability to handle sensitive data in compliance with HIPAA. “By contrast,” he noted, “the new CRI server is able to handle confidential data securely, essential to much of the research at the BSD.”

Dr. Smieliauskas also spoke positively about the support provided by the Systems and Security team, saying, “The server administrative team is friendly and responds promptly to user needs and requests. I also have great confidence in CRI management and their desire and ability to continue adding capacity and computing capabilities as they grow.”


Data Security and Compliance

Secure handling of sensitive human subject data is of the utmost importance to the CRI’s Systems and Security team. The CRI is the primary computing resource for BSD researchers working with electronic protected health information (ePHI). For this reason, it is essential that our infrastructure and security policies comply with relevant federal guidelines, including HIPAA.

To this end, the Research Informatics Compliance Review Committee, made up of IT security professionals from other IT organizations across the University, spent several months drafting, editing, and approving a set of policies and procedures for data protection. This committee is also responsible for conducting regular audits to ensure compliance with these important standards. (For more information on this committee, see page 63.) In addition, the CRI retains experts in FISMA and HIPAA as consultants.

Plamen Martinov and Olumide Kehinde in Kenwood Data Center

our future

Our Goals for 2013-14

• Redesign and implement secure, stable, and sustainable computing infrastructure and resources at Kenwood Data Center
• Complete the Prudential-to-Kenwood data center move by migrating or discontinuing all resources, while maintaining the integrity of all user data
• Update and deliver improved customer service policies, procedures, services, and automation
• Improve communication internally and within the user community
• Enhance the skill set of each team member and of the team as a whole




BIOINFORMATICS CORE


Bioinformatics Core

our leadership

Director of Bioinformatics

Jorge Andrade, PhD

As the technical director responsible for planning and oversight of the Bioinformatics Core, Jorge works closely with CRI leadership to develop and deliver bioinformatics services and expertise. Jorge joined the CRI in May 2012 and brings extensive experience in the pharmaceutical industry and scientific research community, most recently at the Beijing Genome Institute, where he was an Associate Director.

Bioinformatics Analysis Pipelines

The analysis that transforms high-throughput raw data into biologically meaningful information can present a challenge to clinical, translational, and basic researchers alike. To make it easier for BSD investigators to take full advantage of high-throughput technologies in their research, the Bioinformatics Core has developed a set of pipelines for the analysis of Next-Generation Sequencing and Microarray data.

These automated pipelines quickly absorb and process large amounts of raw data and produce meaningful analysis results. They are executed on the CRI’s high-performance computing (HPC) cluster, which provides the significant computational power necessary for working with such large quantities of data. Researchers interested in using one of the CRI’s pipelines for their data analysis can request this on the CRI website. The Core currently offers a total of 12 production-ready pipelines for a variety of platforms and analyses.

In addition, the Bioinformatics Core maintains a catalog of publicly-available and commercial software tools, reference datasets, and databases for use by the BSD research community. A complete list of these resources is available on the CRI’s website.

The Core currently offers production-ready pipelines for the following platforms and analyses:

• Illumina pipelines for RNA-Seq, ChIP-Seq, Exome Sequencing, Whole Genome Re-Sequencing (WGRS), Consensus Genotyping, and De-Novo Assembly
• SOLiD pipelines for RNA-Seq, WGRS, ChIP-Seq, and De-Novo Assembly
• One pipeline for Illumina and Affymetrix Expression Arrays
• One pipeline for Affymetrix and Exiqon miRNA Arrays

For more information on what each of these pipelines offers, see Appendix B.



faculty spotlight

Kenan Onel

Kenan Onel, MD, PhD, is an Associate Professor of Pediatrics in the Section of Hematology/Oncology and Director of the Pediatric Familial Cancer Clinic. He is an expert on pediatric and other familial genetic cancer syndromes.

The Onel Lab uses genomic platforms and systems biology strategies to investigate how genetics contribute to cancer risk and response to therapy. The lab studies families with high-penetrance cancer-predisposing conditions, but no known cancer-predisposing gene mutations, in order to discover new genes that may, when mutated, predispose individuals to cancer. Recently, they have begun to advance this research by taking advantage of the CRI’s Next-Generation Sequencing offerings. Dr. Onel reports, “The Bioinformatics Core has been instrumental in pushing forward our work because of their expertise in handling genomic data.”

Dr. Onel’s lab used the CRI’s Galaxy instance to develop a pipeline for exome analysis. In addition, they worked with the CRI’s team of bioinformaticians to develop a powerful command-line pipeline utilizing multiple aligners and callers, allowing a robust analysis over a large number of family studies. The Onel Lab and the CRI bioinformaticians continue to meet weekly to discuss progress. According to Dr. Onel, “The analysts have been intellectually engaged in the projects and extremely professional. They have helped us develop methods for analysis of family data that we could not have done on our own.”


Self-Service Data Analysis: Galaxy

For researchers interested in self-service data analysis, the CRI maintains a customized version of Galaxy, an open-source bioinformatics workflow management and system integration tool. The CRI’s Galaxy instance is integrated with our advanced computing infrastructure, allowing researchers to take advantage of our HPC and large-scale storage resources within a self-service data analysis environment. Galaxy can substantially facilitate the use of common bioinformatics tools by non-bioinformaticians and those without extensive computing expertise.

our contributions

Highlighted Accomplishments from 2012-13

• Developed, tested, and implemented 12 production-ready pipelines, some of which are currently in use for the NIH-funded Bionimbus Protected Data Cloud contract
• Expanded the Core to a team of seven scientists
• Hosted monthly training seminars which have drawn over 350 total participants
• Developed a project management and invoicing system now used as a template by other BSD Cores
• Launched the CRI’s implementation of Galaxy


The CRI’s implementation of the Galaxy framework includes workflows for several Next-Generation Sequencing pipelines to enable users to perform, reproduce, and share complete analyses. Available workflows in Galaxy include:

• RNA-Seq: Sample Level for quality control, mapping, and statistics for paired-end Illumina reads (individual samples)
• RNA-Seq: Project Level Merge for merging multiple samples and generating a list of differentially expressed genes, for both single- and paired-end Illumina reads
• Exome Sequencing Analysis for quality control, mapping, and recalibration for both single- and paired-end Illumina reads

The CRI’s Galaxy platform became available to researchers in November 2012, concurrent with the release of our updated HPC and storage resources. It now supports around 40 active users and is maintained and updated with new tools, workflows, and pipelines in collaboration with the CRI’s Systems and Security team.

A selection of completed analysis projects illustrates the diversity of the research facilitated by the Bioinformatics Core:

• Genome assembly and annotation of the Siberian hamster genome, using SOLiD sequences generated at the University of Chicago Genomics Core facility
• ChIP-Seq data analysis for a project studying how hyperglycemia induces epigenetic changes that lead to renal injury
• A gene expression profile analysis of the rat brain in response to perimenopausal hormonal signals
• RNA-Seq analysis of differential gene expression between two types of melanoma tumors
• Quality trimming of large sets of Illumina sequences for a study of nasal microbiota of people with chronic allergies
• Exome-wide analysis of 60 samples from four cohorts for research on the genetic components of several cancer types

Bioinformatics Core team members Jorge Andrade, Wenjun Kang, Riyue Bao, Jianpeng Xu, Chunling Zhang, and Lei Huang


Collaboration and Custom Analysis

For researchers who are looking for personalized analysis, including custom-built pipelines, the Bioinformatics Core provides consulting and customized services. Our bioinformaticians’ expertise extends to areas beyond those of our standard pipelines, including proteomics and genome-wide association studies.

Who uses the Bioinformatics Core? Since May 2012, the Core has received project requests from 18 different University departments and several external organizations.

Ben May Biochemistry Molecular Biology Ecology & Evolution External Organizations Health Studies Human Genetics Medicine Microbiology Molecular Genetics Neurobiology Neurology Obstetrics & Gynecology Organismal Biology Pathology Pediatrics Radiation Oncology Social Sciences Surgery

Total Project Requests: 99

The Bioinformatics Core has completed 53 projects since May 2012, with 24 currently in progress and new requests submitted each month.

[Chart: projects submitted, in progress, and completed, by month from September 2012 through July 2013. Total Projects Completed: 53.]

When a researcher submits an online project request form, the Core returns a proposal, including the scope of deliverables, a timeline for completion, and the estimated cost. The execution of each project is guided by frequent discussion between researchers and bioinformaticians, and updates are provided regularly. When a project is complete, results are delivered in the form of a written report and presentation.

Since May 2012, the Core’s bioinformaticians have completed a total of 53 projects, with many more in progress. An average of seven new project requests are submitted each month. These projects, conducted for researchers from over 25 different University departments and sections, vary widely in both scope and subject matter (for examples, see sidebar on page 46).


Other Services: Grant Analysis and Training

The Bioinformatics Core supports the creation of research grants in several ways, with the goal of fully developing and integrating the bioinformatics components of each grant and increasing its competitiveness for funding. CRI bioinformaticians can collaborate with researchers directly on the bioinformatics components of their grants, or they can provide cost analysis services and letters of support. In addition, standard language is available to be added to grants, documenting the accessibility of the necessary tools and expertise to complete the bioinformatics research indicated. The Core has so far contributed to the writing and submission of 13 research grants, and has established this as an area of focus for future growth.

One more important part of the Core’s mission is to provide training opportunities that will help investigators develop bioinformatics expertise within their own laboratories. To this end, the CRI hosts a free monthly training seminar open to all members of the BSD, covering a different topic in bioinformatics analysis each month. For more detail and a list of past topics, see pages 54-55.

Director of Bioinformatics Jorge Andrade


our future

Our Goals for 2013-14

• Recruit and hire two additional Bioinformatics Scientists
• Improve efficiency by reducing the standard turnaround time of projects in the following production pipelines: Exome-Seq, RNA-Seq, and Microarray expression arrays
• Increase customer satisfaction by producing high-quality and customer-oriented services (to this end, the Core has already introduced a feedback survey to gather information on potential areas of improvement)
• Improve existing pipelines by performing comparative analysis of tools, and develop new pipelines to accommodate new technologies, protocols, and data types
• Continue to author and coauthor scientific publications
• Increase interest in the Core by developing advertising materials and meeting with department chairs
• Increase the Core’s funding through chargebacks, grant inclusion, and expanding internal and external collaboration


faculty spotlight

Ernst Lengyel

Ernst Lengyel, MD, PhD, is a Professor of Obstetrics/Gynecology, specializing in advanced surgical treatments for patients with ovarian cancer. The Lengyel Lab is dedicated to studying the biology of ovarian cancer metastasis and finding new drugs for its treatment.

One of the scientific goals of the Lengyel Lab is to understand the mechanisms of a common problem in the treatment of ovarian cancer: that most patients will develop a resistance to carboplatin and taxol chemotherapy. To study this, the lab sought to identify the miRNA expression profiles of chemoresistant versus chemosensitive patients. They worked with the CRI’s Bioinformatics Core to obtain and analyze these genomic data.

With the assistance of the CRI’s bioinformaticians, the lab mined the Cancer Genome Atlas (TCGA) and distinguished unique patient groups of chemosensitive and chemoresistant patients. They then performed six analysis sets, and were able to identify seven miRNA genes upregulated in chemoresistant disease and three in very chemosensitive disease. These findings have since been validated in an independent cohort of patients.

The Lengyel Lab’s findings, enabled in part by the CRI, have prognostic and functional implications that may aid in developing therapies to target these miRNA in chemoresistant patients. Dr. Lengyel said, “Without the CRI we would not have had the expertise to take advantage of the TCGA ovarian cancer data.”



TRAINING & EDUCATION


Training and Education

An integral part of the CRI’s mission is providing training and education for our users so that they become comfortable and confident both with our resources and with other technologies for biological computing.

Bioinformatics Training

The Bioinformatics Core presents a free training seminar with a different topic each month, taught by PhD bioinformaticians. These seminars cover the use and application of a variety of publicly and commercially available software and tools for bioinformatics analysis, such as R, Bioconductor, and Galaxy. In some cases, the training is directly tied in to CRI resources such as our HPC cluster. As investigators bring this education back to their laboratories, bioinformatics expertise can be further developed throughout the BSD. Since these seminars began in May 2012, they have attracted over 350 participants.

A post-training survey, introduced in February 2013, requests opinions from users after each seminar in an effort to improve future sessions and cover the topics most important to researchers. Of those responding to the survey in February through July 2013, 96 percent found their course worthwhile, with 80 percent choosing “very” or “extremely” worthwhile. The CRI’s bioinformaticians have received high marks, with 96 percent of respondents calling their instructors “very knowledgeable.” Overall, 92 percent of respondents reported satisfaction with the course they attended. Survey participants also provided suggestions and requests for future seminar topics, helping the Bioinformatics Core to continue to design valuable training opportunities that meet the needs of our research community.

Past training seminars have covered a range of systems and software programs useful for bioinformatics analysis.

Date     Attendance  Topic
7/2013   24          Introduction to Linux Command Line for Bioinformatics
6/2013   37          Introduction to Linux Command Line for Bioinformatics
5/2013   32          Analyzing Illumina ChIP-Seq Data with the CRI
4/2013   34          Introduction to CRI’s HPC Cluster for Bioinformatics Computing
3/2013   12          Analysis of Illumina and Microarray Data with R and Bioconductor
2/2013   25          Analyzing Illumina RNA-Seq Data with the CRI
1/2013   22          Analysis of Microarrays with R and Bioconductor
11/2012  31          Analyzing Illumina Whole Exome Data with the CRI
9/2012   40          Galaxy: Web-Based Bioinformatics Analysis and RNA-Seq Workflow Management
8/2012   41          Analyzing Illumina RNA-Seq Data with the CRI
7/2012   16          Analysis of Microarray Data with R and Bioconductor
6/2012   12          Introduction to R and Execution on HPC
5/2012   25          Introduction to R
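The attendance figures listed above can be totaled to check them against the "over 350 participants" figure quoted for the seminar series. The following Python sketch is only an illustrative calculation; the variable names are ours, and the values are copied from the seminar list.

```python
# Attendance per seminar, keyed by month (values taken from the seminar list).
attendance = {
    "7/2013": 24, "6/2013": 37, "5/2013": 32, "4/2013": 34, "3/2013": 12,
    "2/2013": 25, "1/2013": 22, "11/2012": 31, "9/2012": 40, "8/2012": 41,
    "7/2012": 16, "6/2012": 12, "5/2012": 25,
}

total = sum(attendance.values())          # total participants across all seminars
seminars = len(attendance)                # number of seminars held
average = total / seminars                # mean attendance per seminar
best_month = max(attendance, key=attendance.get)

print(f"{seminars} seminars, {total} participants, {average:.0f} per seminar on average")
print(f"Best-attended seminar: {best_month} ({attendance[best_month]} participants)")
```

The total comes to 351 across 13 seminars, consistent with the "over 350" figure, with an average of 27 participants per session.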


What bioinformatics training participants had to say...

“Very approachable and helpful instructors.” (June 2013)

“I thought it was really cool and I learned a lot about how to navigate around a Linux system. Great job!” (June 2013)

“Very helpful.” (April 2013)

“Well organized, very knowledgeable, and well presented.” (March 2013)

“Very good training class, need more like this.” (February 2013)

Other CRI Training

The CRI provides individual and small-group training sessions upon request for researchers who need assistance with using REDCap for their studies. Julissa Acevedo, the CRI’s Business Systems Analyst, holds an average of eight training and demo sessions per month for groups of one to three researchers at a time.

In addition, the CRI hosted several events over the past year with the goal of sharing information about our computing resources with existing and potential users. Three See IT Live! seminars were held in November and December, aligning with the release of new CRI resources. Each seminar was centered on a live demonstration of the featured resource and included an overview of the CRI, instructions on obtaining an account, and an opportunity to meet our technical staff and ask questions, as well as free refreshments. These events were open to all members of the BSD and were attended by a total of 32 participants.

See IT Live! sessions highlighted the following resources:

• Galaxy: Learn how to access Galaxy, a web-based portal providing data storage, data management, and analytical tools integrated with our computing resources
• i2b2: Learn how to use our de-identified datamart to identify cohorts and request data for your research
• HPC Cluster: Learn how to optimize and run jobs on our new high-performance computing cluster

Jorge Andrade presents a bioinformatics training seminar


In addition to the HPC Cluster session of See IT Live!, the CRI’s Systems and Security team hosted two Lunch & Learn events in the spring to further educate HPC users about our available resources and to provide an overview of the data center migration from Prudential to Kenwood. Each event included a discussion of available resources in Kenwood, the rationale and timeline for moving out of Prudential, and a question and answer session. These events were advertised to both potential and current users and were attended by a total of 32 participants.

Don Saner presents at the CRI’s first HPC Lunch & Learn event



FACULTY OVERSIGHT & GOVERNANCE


Faculty Oversight and Governance

Governance Structure

The CRI’s strategic decision-making and long-term planning are led by a governance structure set up by the Office of the CRIO. These committees guide research informatics activities across the entire BSD, ensuring that informed long-term decisions for the Division are reached in a transparent and accountable way.

[Organizational chart: Research Informatics Executive Governance Committee; Research Informatics Governance Committee; Research Informatics Technical Policy Committee; Research Informatics Data Use Committee; Research Informatics Compliance Review Committee.]

The five committees outlined below bring together senior BSD and University of Chicago Medicine (UCM) leadership, information systems experts, patient privacy experts, and faculty representing basic science, clinical research, and translational research. Decisions from these committees guide us in establishing policies and procedures, prioritizing new initiatives, safeguarding patient information, and complying with BSD policies and applicable federal and state laws.

For a full list of governance committee membership, see Appendix C.

Research Informatics Executive Governance Committee Chair

Mission

Dr. Kenneth Polonsky, Dean and Executive Vice President for Medical Affairs

To provide high-level strategic decisions for all research informatics activities across the BSD, integrating the needs of faculty, clinicians, and BSD and hospital leadership

Members

BSD and UCM executive leadership

cri.uchicago.edu

61


Faculty Oversight and Governance

Research Informatics Governance Committee Chair

Dr. Robert Grossman, Chief Research Informatics Officer

To establish priorities and policies for research informatics across

Mission

the BSD, including those for the development and use of the CRDW and the comprehensive computing resources provided to BSD faculty

Members

Senior faculty and staff leadership from across the BSD and UCM

Research Informatics Technical Policy Committee Chair

Dr. Robert Grossman, Chief Research Informatics Officer

To provide oversight and governance for the technical aspects of

Mission

research informatics across the BSD and to ensure appropriate safeguards for ePHI used in research

Members

62

Staff and faculty with expertise in informatics and information technology security

CRI Annual Report 2012-13


Faculty Oversight and Governance

Research Informatics Data Use Committee

Chair
Dr. Dana Edelson, Assistant Professor of Medicine

Mission
To review, approve, monitor, and prioritize requests for CRDW data release to individual investigators and to approved datamarts and systems, including i2b2 and any subsequently developed datamarts

Members
Staff and faculty with expertise in regulatory issues, compliance, and data management

Research Informatics Compliance Review Committee

Chair
Tyler DeNormandie, Information Systems Manager and Senior Systems Engineer, Health Studies and Family Medicine

Mission
To advise the CRI Systems and Security group on best practices for ensuring compliance with BSD policies and to help ensure the CRI’s implementation of appropriate safeguards for ePHI used in research

Members
Information technology security experts representing the major IT organizations at the University and UCM


Faculty Oversight

Guidance and oversight for research informatics throughout the BSD are provided by the Informatics Oversight Committee, made up of faculty leaders representing both basic science and clinical departments. The recommendations of this committee guide us in ensuring that the direction and activities of the CRI are in line with the needs of the research faculty we serve. This committee reports to the Research Advisory Committee, a BSD/UCM Committee that reports to the Dean for Research and Graduate Education.

The Informatics Oversight Committee is chaired by Dr. John Cunningham, Chief of the Section of Pediatric Hematology/Oncology. For a full list of members, see Appendix D.

[Organizational chart: Research Advisory Committee, Informatics Oversight Committee, Office of the CRIO]



OUR PARTNERS

The Center for Research Informatics is grateful for the support of our strategic partners and collaborators across the University of Chicago. These partners enable the collaborative work that has made the CRI successful in our first years of operation.

Biological Sciences Division bsd.uchicago.edu

Chicago Biomedicine Information Systems help.bsd.uchicago.edu

Comprehensive Cancer Center cancer.uchicago.edu

Computation Institute ci.uchicago.edu


Human Imaging Research Office hiro.bsd.uchicago.edu

Institute for Genomics & Systems Biology igsb.anl.gov

Institute for Translational Medicine itm.uchicago.edu

Institutional Review Board humansubjects.uchicago.edu

IT Services itservices.uchicago.edu

Office of Clinical Research bsdocr.bsd.uchicago.edu



LOOKING AHEAD

The Center for Research Informatics has achieved many important goals over the past two years. The establishment of the Clinical Research Data Warehouse, the design and implementation of a state-of-the-art high-performance computing and storage infrastructure that can house protected health information, and the development of a solid and robust Bioinformatics Core are all providing the foundation of support for a large number of research programs throughout the BSD. Many grants, papers, and research projects owe part of their success to the services delivered by the CRI. Providing services and training to enable high-quality scientific research was our goal when we were established, and we are happy to have achieved this level of success in the short period of time since the CRI was founded.

Building on these accomplishments, we plan to improve and expand our offerings in several ways over the coming years:

Our Clinical Research Data Warehouse now serves BSD faculty by offering cohort discovery tools able to search over millions of patient encounters. The next phase for the CRDW will include the development of a de-identified datamart and tools for analyzing de-identified data in a secure manner. We will further enhance the CRDW by including full-text search functionality for pathology and radiology notes.

We are starting now to work with researchers to use data from the CRDW to build predictive models that produce alerts to improve hospital operations and quality of care. From identifying patients at risk for readmission to the hospital to predicting which patients may suffer a cardiac arrest while inpatient, we are leading the way for clinical researchers to design, develop, and use complex alerts in their practice. We are working closely with CBIS to ensure that we will be able to implement alert notifications within the Epic electronic medical record.


In the next year we will further expand our high-performance computational resources to provide increased capacity, functionality, and performance with the goal of ensuring that all faculty have access to agile and advanced computing. Coupled with these efforts, we will expand our training and educational offerings to enable more faculty to leverage these computing tools to enhance their research.

The CRI is fully invested in advancing world-class research within the Biological Sciences Division. Our successes thus far and our ambitious plan going forward demonstrate our commitment to this goal.

A CRI weekly planning meeting



APPENDIX


Appendix A: CRI Staff List

Samuel Volchenboum, MD, PhD
Director & Associate CRIO

Administration

Hannah Lawrence
Executive Administrator

Michael Daus
Administrative Specialist

Caitlin Pike
Communication Specialist

Bioinformatics Core

Jorge Andrade, PhD
Director of Bioinformatics

Riyue Bao, PhD
Bioinformatician

Elizabeth Bartom, PhD
Bioinformatician

Kyle Hernandez, PhD
Bioinformatician

Lei Huang, PhD
Bioinformatician

Wenjun Kang, MS
Scientific Programmer

Jianpeng Xu, PhD
Bioinformatician

Chunling Zhang, PhD
Bioinformatician

Clinical and Translational Informatics

Don Saner
Director of Clinical and Translational Informatics

Julissa Acevedo
Business Systems Analyst

Seong Choi
Programmer

Tiffany Cyrus
Project Manager

Keith Danahey
Database/Systems Administrator and Programmer

Brian Furner
Manager of Programming

Timothy Holper
Manager of CRDW Development

Kevin Le
Programmer/Analyst

Luis Maciel
Database Administrator

Systems and Security

Plamen Martinov
Director of Systems and Security

Andy Brook
Senior Systems Administrator

Beth Lynn Eicher
Senior Systems Administrator

Michael Jarsulic
Senior Systems Administrator

Sneha Jha
Systems Administrator

Olumide Kehinde
Senior Systems Administrator

Brad Orr
Senior Project Manager

Bruce Thompson
Security Analyst



Appendix B: Bioinformatics Core Pipelines

Illumina

RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis

ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation

Exome Sequencing: Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, and Annotation

Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation

Consensus Genotyping Pipeline: Genotyping, SNP Detection, and InDel Detection using three different methods (Samtools, GATK, and Atlas-2), comparison of variant calls, list of consensus call variants, and list of method-specific calls

De-Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contigs Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis

SOLiD

RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis

Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation

ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation

De-Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contigs Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis

Illumina and Affymetrix Expression Arrays

Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis

Affymetrix and Exiqon miRNA Arrays

Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, miRNA Target Gene Prediction, Functional Annotation, and Pathway Enrichment Analysis
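As an illustration of the comparison step in the Consensus Genotyping Pipeline, the short Python sketch below shows one way variant calls from three methods could be compared. It is not the CRI's implementation. It assumes that each caller's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples, that a "consensus" variant is one reported by all three methods, and that a "method-specific" variant is one reported by exactly one method. The function name and the example variants are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def compare_calls(calls: Dict[str, Set[Variant]]):
    """Split variant calls into consensus and method-specific sets.

    Assumption: "consensus" means called by every method, and
    "method-specific" means called by one method and no other.
    """
    # Variants present in every method's call set.
    consensus = set.intersection(*calls.values())

    # Variants present in one method's call set and in none of the others.
    method_specific = {}
    for name, variants in calls.items():
        others = set().union(*(v for n, v in calls.items() if n != name))
        method_specific[name] = variants - others

    return consensus, method_specific


# Hypothetical example data (not real results).
a = ("chr1", 1000, "A", "G")
b = ("chr1", 2000, "C", "T")
c = ("chr2", 3000, "G", "A")
d = ("chr2", 4000, "T", "C")
e = ("chr3", 5000, "A", "T")

example_calls = {
    "samtools": {a, b, c},
    "gatk": {a, b, d},
    "atlas2": {a, e},
}

consensus, method_specific = compare_calls(example_calls)
```

In this example, variant `b` is reported by two of the three methods, so under the stated assumptions it appears in neither the consensus list nor any method-specific list. A real pipeline would also need to decide how to treat such partial agreement.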



Appendix C: Research Informatics Governance Committees

Research Informatics Executive Governance Committee

Name

Title and Affiliation(s)

Kenneth Polonsky (Chair)

Dean and Executive Vice President for Medical Affairs

Conrad Gilliam

Dean for Research and Graduate Education

Robert Grossman

Chief Research Informatics Officer, BSD

Sharon O’Keefe

President, UCM

Eric Yablonka

Vice President and Chief Information Officer, CBIS

Research Informatics Governance Committee


Name

Title and Affiliation(s)

Robert Grossman (Chair)

Chief Research Informatics Officer, BSD

Sameer Badlani

Chief Medical Information Officer, UCM

John Cunningham

Chief, Section of Pediatric Hematology/Oncology

Chris Daugherty

Chair, Institutional Review Board

Dana Edelson

Assistant Professor of Medicine

Conrad Gilliam

Dean for Research and Graduate Education

Marilyn Hanzal

Associate General Counsel, Legal Affairs

Catherine Ostapina

Senior Compliance Advisor & Director, Office of Corporate Compliance

Lainie Ross

Professor of Pediatrics, Medicine, and Surgery

Julian Solway

Associate Dean for Translational Medicine

Walter Stadler

Associate Dean for Clinical Research

Samuel Volchenboum

Director, CRI

Eric Yablonka

Vice President and Chief Information Officer, CBIS


Research Informatics Technical Policy Committee

Name

Title and Affiliation(s)

Robert Grossman (Chair)

Chief Research Informatics Officer, BSD

Paul Chang

Professor of Radiology; Vice Chair of Radiology Informatics

Tyler DeNormandie

Information Systems Manager and Senior Systems Engineer, Health Studies

Roger Engelmann

Image Analysis Software Developer, Human Imaging Research Office

Rajan Gopalakrishnan

Director for Informatics and Information Technology, Comprehensive Cancer Center

John Moses

Director of Enterprise Architecture and New Technologies, UCM

Prasanna Nippani

Assistant Director of Information Technology, UCM

Don Saner (Co-Chair)

Director of Clinical and Translational Informatics, CRI

Samuel Volchenboum

Director, CRI

Research Informatics Data Use Committee

Name

Title and Affiliation(s)

Dana Edelson (Chair)

Assistant Professor of Medicine

Samuel Armato

Associate Professor of Radiology

Rajan Gopalakrishnan

Director for Informatics and Information Technology, Comprehensive Cancer Center

Nick Gruszauskas

Technical Director, Human Imaging Research Office

Contessa Hsu

Application Manager, UCM

Millie Maleckar

Director of Regulatory Compliance for Human Subjects, Institutional Review Board

Prasanna Nippani

Assistant Director of Information Technology, UCM

Don Saner

Director of Clinical and Translational Informatics, CRI

Phil Schumm

Senior Biostatistician, Health Studies; Director, Research Computing Group

Cassie Simon

Assistant Director, UCM Cancer Registry


Research Informatics Compliance Review Committee

Name

Title and Affiliation(s)

Tyler DeNormandie (Chair)

Information Systems Manager and Senior Systems Engineer, Health Studies

James Clark

Network Security Officer, IT Services

Andrew Kramski

Infrastructure Security Engineer, UCM

Plamen Martinov

Director of Systems and Security, CRI

Catherine Ostapina

Senior Compliance Advisor & Director, Office of Corporate Compliance

Daniel Sullivan

Web Developer and Infrastructure Architect Specialist, CBIS

Bruce Thompson

Security Analyst, CRI

Appendix D: Faculty Oversight

Informatics Oversight Committee


Name

Title and Affiliation(s)

John Cunningham (Chair)

Chief, Section of Pediatric Hematology/Oncology

Michael Glotzer

Professor of Molecular Genetics and Cell Biology

Robert Grossman

Chief Research Informatics Officer, BSD

Michelle Le Beau

Director, Comprehensive Cancer Center

Marsha Rosner

Chair, Ben May Department for Cancer Research

Robert Rosner

Professor of Astronomy/Astrophysics and Physics

Matthew Stephens

Professor of Human Genetics and Statistics

Ronald Thisted

Professor of Statistics, Health Studies, and Anesthesia/Critical Care

Samuel Volchenboum

Director, CRI



© The University of Chicago, 2013. All rights reserved. Written and designed by Caitlin Pike. Photography by Robert Kozloff.


