CRI 2014 Annual Report


UNIVERSITY OF CHICAGO CENTER FOR RESEARCH INFORMATICS

CRI

ANNUAL REPORT

2014


MESSAGE FROM THE OFFICE OF THE CRIO

We live in an age in which the need for informatics to support our research and clinical mission is greater than ever. As datasets grow exponentially and biomedical science becomes increasingly driven by data-intensive research, the Center for Research Informatics (CRI) becomes a more and more important resource for researchers. Like our peers on campus and nationally, the CRI exists in a resource-constrained environment and for this reason it is very important that we hear from researchers about their priorities. I hope you will reach out to me with your best ideas.

Recently it has seemed that each week another company or organization has suffered a data security breach, sometimes exposing the personal information of thousands to millions of people. The BSD Information Security Office was formed this year to improve our information security procedures and to educate the Division about best practices in information security. I urge all BSD units to proactively take advantage of this Office to protect data and systems and work to adopt best practices.

I am very proud of the progress that we have made together since the CRI was launched in 2011. As Chief Research Informatics Officer, I am excited by the various informatics and IT services that this Center is able to offer BSD researchers. The CRI is a wonderful resource that I urge you to explore if you are not already using it.

Robert L. Grossman, PhD
Chief Research Informatics Officer, Biological Sciences Division
Director, Center for Data Intensive Science
Core Faculty and Senior Fellow, Institute for Genomics and Systems Biology
Core Faculty and Senior Fellow, Computation Institute
Professor of Medicine, Section of Genetic Medicine

ROBERT GROSSMAN, PHD
CHIEF RESEARCH INFORMATICS OFFICER

As Chief Research Informatics Officer, Bob guides informatics activities and initiatives across the BSD. His research group focuses on big data, biomedical informatics, data science, and cloud computing. He is also the PI for the Bionimbus Protected Data Cloud, an open-source cloud-based platform that allows researchers authorized by NIH to compute over human genomic data in a secure and compliant fashion, and the Director of the not-for-profit Open Cloud Consortium, which develops and operates cloud computing infrastructure for the research community. Bob served as the first Director of the CRI from August 2011 to April 2013.


CENTER FOR RESEARCH INFORMATICS

ANNUAL REPORT 2014 TABLE OF CONTENTS

WHO WE ARE
THIS YEAR’S HIGHLIGHTS
RESEARCH. POWERED BY THE CRI.
CONNECTION TO CAMPUS
LOOKING AHEAD
APPENDIX


LETTER FROM THE DIRECTOR

A year ago, I enthusiastically wrote about the CRI’s growth and emerging impact across the research enterprise. Our success is measured by the programs we enable, the collaborations we foster, and the grants and manuscripts we facilitate. By these metrics, the CRI has had an extremely successful year. This accomplishment is a result of our deep talent pool across several interrelated lines of operation, including high-performance computing, HIPAA-secure storage and backup, application development and support, research data warehouse services, and bioinformatics collaboration. I see all of these groups within the CRI working together every day to enable research throughout the BSD.

The Center for Research Informatics now finds itself at the right place and time to bring an unprecedented suite of tools and resources to researchers throughout the BSD. After an initial period of developing our infrastructure, talent, and knowledge, we are now able to offer collaboration, consulting, and services across a wide array of projects and initiatives. From the plant biologist requiring high-performance computing to the endocrinologist looking for clinical data for a retrospective study to the surgeon requiring a data-entry system to capture information about patients on a trial, the CRI is positioned as a key collaborator to help make these projects a success.

This report is a detailed accounting of our group, our successes, and our plans for the future. We hope you will enjoy learning about us and our offerings, and we look forward to working with you on many exciting projects.

Samuel Volchenboum, MD, PhD
Director, Center for Research Informatics
Associate CRIO, Biological Sciences Division
Associate Director, Institute for Translational Medicine
Assistant Professor of Pediatrics
Fellow, Computation Institute


WHO WE ARE

Since 2011, the Center for Research Informatics has provided the University of Chicago Biological Sciences Division (BSD) with informatics resources, services, and expertise to enable world-class research. We work with BSD scientists through all stages of their projects, handling clinical, translational, and basic science data with the highest level of efficiency and security, while also collaborating on large-scale initiatives and creating opportunities for informatics education. From secure computing resources to clinical data to bioinformatics analysis, our work contributes to the forefront of informatics and health care innovation.


MISSION AND CORE VALUES

The CRI’s mission is to provide informatics resources and services to the BSD, to participate in clinical and biomedical research of the highest scientific merit, and to support and promote research and education in the field of informatics. Our mission, current activities, and plans for the future are based on and informed by the core values below. In all our work, from everyday provisioning of resources to large-scale collaborative projects, we prioritize these values of scientific progress, partnership, professionalism, security, accessibility, and innovation.

SCIENTIFIC PROGRESS
• Our support leads to grants, publications, and prestige for the University.
• The work we do is scientifically rigorous and of the highest quality.
• We strive to be the informatics hub of the University and an asset in recruiting faculty and donors.

PARTNERSHIP
• As collaborators, we build strong relationships with the researchers who use our services.
• We work hard to develop customized solutions for challenging problems.
• We contribute across the multiple phases of a research project, from acquiring data to storing, analyzing, and sharing it, to project and program management.
• We are deeply engaged in the research we support and proud of the scientific progress and improved health care that result from it.

PROFESSIONALISM
• We provide a client experience of the highest quality.
• Across multiple service lines, users experience a seamless partnership with one primary point of contact.
• Our staff are experts in their fields.
• Our fees, requirements, and expected turnaround times are competitive and presented in a transparent way.
• Our communication with users is clear, consistent, and always professional.

SECURITY
• Protecting sensitive data is our top priority and we uphold the highest standards for security and compliance.
• We consistently adhere to IT security policies, procedures, and best practices to ensure compliance with regulatory standards.
• Our IT infrastructure and services are the most secure on campus.
• We are proactive in our security measures, remaining vigilant and adaptable to stay ahead of threats.

ACCESSIBILITY
• We are committed to matching researchers with appropriate services and then developing personalized solutions to meet specific needs.
• Faculty of all technical backgrounds can rely on us for assistance in obtaining the services they need.
• All BSD researchers are welcomed and treated equally, regardless of faculty rank, department, or other affiliations.

INNOVATION
• We are engaged with our research community and committed to evolving along with it.
• We seek and respond to feedback from our users.
• We are growing with the ever-changing informatics landscape to continue meeting the needs of researchers.
• Our work represents the forefront of informatics research.

cri.uchicago.edu


CRI SERVICE LINES

Since 2011, the CRI has been providing the BSD with a growing and improving selection of state-of-the-art technologies and services for working with research data. These secure, standards-compliant resources are open to all members of the BSD and their collaborators, and are used by faculty across a wide variety of departments. Our support for BSD research is organized into four primary service lines: custom applications development, bioinformatics services, IT resources, and the Clinical Research Data Warehouse.

[Photo: Our administration, communications, and project management team]


APPLICATIONS DEVELOPMENT

The CRI Applications Development team, led by Brian Furner, develops and maintains a wide variety of custom applications for BSD research. These offerings are each unique and tailored to the researchers’ specific needs. This team’s work includes support for patient registries, multi-institution clinical research data networks, and unstructured text indexing and searching.

Also maintained by this team are the CRI’s data management solutions. The CRI operates and supports Velos eResearch, a clinical trials management system that integrates study administration and data management, and REDCap, a web-based application supporting data collection strategies for research studies with tools for building and managing online surveys and databases.

[Photo: The CRI Applications Development team]



APPLICATIONS DEVELOPMENT 2014 HIGHLIGHTS

Built a web-based patient registry application for the Pulmonary Hypertension Program, replacing an older system developed at Rush University and extending it with additional functionality.

Building a web-based consent and specimen tracking system for the GAIN project, a multi-institutional effort that will gather solid-tumor specimens from pediatric patients for genomic analysis.

Implemented a unified data request intake form for the Analytics Core that allows users from across the biomedical enterprise to submit requests and track their progress. The intake form feeds a CRI-hosted instance of IBM Rational Team Concert, customized specifically for the Analytics Core to enable a robust reportable workflow of requests as they travel through the data pipeline.

BIOINFORMATICS CORE

Under the leadership of Jorge Andrade, PhD, the Bioinformatics Core provides advanced bioinformatics services to BSD researchers. The team’s advanced bioinformaticians are all PhD scientists and act as co-authors and collaborators on research projects. The Core’s offerings include analysis of high-throughput genomic data using a variety of pipelines developed in-house (see Bioinformatics Pipelines), the building of custom pipelines to solve specific problems, grant writing assistance, and the self-service Galaxy analysis environment. They have also developed a popular program of monthly training sessions in bioinformatics tools and technologies. This year, they developed and held the CRI’s first multi-day training workshop (for more on the workshop, see page 34).


BIOINFORMATICS PIPELINES
• Illumina pipelines for RNA-Seq, ChIP-Seq, Exome Sequencing, Whole Genome Re-Sequencing, Consensus Genotyping, De-novo Assembly, and Somatic Mutation Detection for Tumor/Normal Pairs
• SOLiD pipelines for RNA-Seq, Whole Genome Re-Sequencing, ChIP-Seq, and De-novo Assembly
• One pipeline for Illumina and Affymetrix Expression Arrays
• One pipeline for Affymetrix and Exiqon miRNA Arrays

A detailed list of pipelines is available in Appendix C.

[Photo: The CRI Bioinformatics Core]


BIOINFORMATICS CORE IN 2014
• 116 requests received
• 76 projects completed
• 8+ papers published

Bioinformatics Core analysis clients receive not just a dataset but a fully developed report of their analysis results.


IT OPERATIONS AND INFRASTRUCTURE

Under the direction of Thorbjörn Axelsson, our IT Operations and Infrastructure group provides computing resources for BSD researchers, including the resources that support all of the CRI’s other activities. Offerings include a high-performance computing cluster for fast, advanced data processing and analysis; server hosting and virtualization; secure storage, backup, and archive resources for both groups and individuals; support and automation for Galaxy; and expert technical support. In addition, the team works closely with the BSD Information Security Office (ISO) to ensure that all the CRI’s work is held to the highest standards for information security. (For more information on the ISO, see the BSD Security Initiative section.)

[Photo: Our IT Operations and Infrastructure team]


OUR IT INFRASTRUCTURE

High Performance Computing (HPC) cluster
• 36 standard nodes (2.2 GHz, 2304 total cores, 256 GB RAM per node)
• 2 large memory nodes (2.27 GHz, 80 total cores, 1 TB RAM per node)
• 61 TB of shared high-performance storage space

Centralized, automated, encrypted, and secure data backup

Virtual Server Infrastructure to provision virtual servers on Linux and Windows platforms

A 1.2-petabyte Isilon cluster for data storage

Large Memory Linux Compute System
• Processors: 2.4 GHz, 80 total cores
• Memory: 1 TB of RAM

Galaxy, a web-based portal for biomedical analysis that is integrated with the CRI’s HPC resources
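As a quick sanity check on the cluster figures listed above, per-node core counts and aggregate memory can be derived from the published totals. The sketch below is illustrative only: the names are ours, it assumes cores are spread evenly across nodes (the report does not say so), and it treats 1 TB as 1024 GB.

```python
# Figures taken from the HPC cluster list above.
STANDARD_NODES = 36
STANDARD_TOTAL_CORES = 2304
STANDARD_RAM_GB_PER_NODE = 256

LARGE_MEM_NODES = 2
LARGE_MEM_TOTAL_CORES = 80
LARGE_MEM_RAM_GB_PER_NODE = 1024  # "1 TB per node", taken as 1024 GB (assumption)


def cores_per_node(total_cores: int, nodes: int) -> int:
    """Cores per node, assuming an even split across nodes (our assumption)."""
    if total_cores % nodes != 0:
        raise ValueError("cores do not divide evenly across nodes")
    return total_cores // nodes


def total_ram_gb(nodes: int, ram_gb_per_node: int) -> int:
    """Aggregate memory for a group of identical nodes."""
    return nodes * ram_gb_per_node


standard_cores_each = cores_per_node(STANDARD_TOTAL_CORES, STANDARD_NODES)
large_cores_each = cores_per_node(LARGE_MEM_TOTAL_CORES, LARGE_MEM_NODES)
cluster_cores = STANDARD_TOTAL_CORES + LARGE_MEM_TOTAL_CORES
cluster_ram_gb = (
    total_ram_gb(STANDARD_NODES, STANDARD_RAM_GB_PER_NODE)
    + total_ram_gb(LARGE_MEM_NODES, LARGE_MEM_RAM_GB_PER_NODE)
)

print(standard_cores_each, large_cores_each, cluster_cores, cluster_ram_gb)
```

Under these assumptions the standard nodes work out to 64 cores each, and the cluster as a whole to 2384 cores.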

CLINICAL RESEARCH DATA WAREHOUSE

The Clinical Research Data Warehouse (CRDW) contains University of Chicago medical data dating back to 2006, available for research. This team’s work, led by Tim Holper, includes maintenance of the CRDW, refreshes of existing data sources, integration of new data sources, and fulfillment of complex data requests for faculty researchers. In addition to maintaining i2b2, a cohort discovery interface that allows researchers to query the data in the CRDW, the team has also contributed to the process of developing a de-identified data portal to be released in 2015.


A LOOK INSIDE THE CRDW
• 749,621 patients
• 8,115,516 encounters
• 43,399,463 procedures
• 19,316,421 medications
• 134,156,124 labs
• 19,156,124 diagnoses

[Photo: The Clinical Research Data Warehouse team]


ORGANIZATION AND LEADERSHIP

For a detailed list of CRI staff, please see Appendix A.

Robert Grossman, Chief Research Informatics Officer
Samuel Volchenboum, Associate CRIO and CRI Director
Brian Furner, Manager of Programming
Timothy Holper, Manager of CRDW Development and Operations
Michael Daus, Business Administrator
Seong Choi and Norm Paterson, Programmers
Julie Johnson, Healthcare Business Analyst
Tiffany Cyrus, Brian Leung, and Brad Orr, Project Managers
Julissa Acevedo, Business Systems Analyst
Stacie Landron, Programmer/Report Writer
Tomasz Oliwa, Scientific Software Engineer
Luis Maciel, Database Administrator
Michael Baltasi, Executive Administrator
Caitlin Pike, Communications Manager
Keith Danahey, Lead Programmer, 1200 Patients
Ishai Strauss, Programmer
Thomas Sutton, Sr. DBA/ETL Developer
Jorge Andrade, Director of Bioinformatics
Riyue Bao, Tzuni Garcia, Kyle Hernandez, Lei Huang, Sabah Kadri, Yan Li, and Chunling Zhang, Bioinformaticians
Wenjun Kang, Scientific Programmer
Thorbjörn Axelsson, Director of IT Operations and Infrastructure
Andy Brook, Beth Lynn Eicher, Michael Jarsulic, Sneha Jha, and Olumide Kehinde, Systems Administrators


SAMUEL VOLCHENBOUM, MD, PhD
DIRECTOR / ASSOCIATE CRIO

Sam has been a part of the CRI since May 2012 in his role as Associate Chief Research Informatics Officer, leading our faculty outreach and education efforts. In April 2013 he was appointed Director of the CRI and now leads our operations and strategic planning. In addition to his work in the CRI, Sam serves the Department of Pediatrics as Assistant Professor, is an Associate Director of the Institute for Translational Medicine, and is a Faculty Fellow in the Computation Institute. His research includes using proteomics to study neuroblastoma, a pediatric solid tumor; applying bioinformatics techniques to large clinical datasets; and creating tools to improve provider communication and patient care.

JORGE ANDRADE, PhD
DIRECTOR OF BIOINFORMATICS

As the technical director responsible for planning and oversight of the Bioinformatics Core, Jorge has extensive training in bioinformatics as well as many years of experience applying these tools within the pharmaceutical industry. Most recently, he led a 70-person bioinformatics team at the Beijing Genomics Institute. Since joining the CRI in 2012, he has built an 8-person team of PhD scientists, all focused on delivering high-quality analyses of genomic and proteomic data. He has instituted the development and deployment of over ten industry-grade analysis pipelines, all running in the CRI high-performance computing environment. He has engaged in over 150 collaborations with over 75 researchers from the University of Chicago and elsewhere. The rigorous analysis provided by his group has become the substrate for numerous grant applications and peer-reviewed manuscripts.

THORBJÖRN AXELSSON
DIRECTOR OF IT OPERATIONS AND INFRASTRUCTURE

With over 15 years of experience in IT for research and higher education, Thorbjörn leads the CRI’s work of running, expanding, and improving our secure computing infrastructure. He works closely with the BSD Information Security Office to ensure that the CRI’s resources and services remain the most secure and compliant available. Prior to joining the CRI in 2014, Thorbjörn was Associate Director of Enterprise Infrastructure at the University of Kansas. He has a broad range of IT expertise, with a background that includes management, research support, IT infrastructure on all levels, security, project management, IT architecture, and software development. Thorbjörn has a Master’s degree in Computing Science from the University of Gothenburg, Sweden, and is particularly interested in research computing, IT architecture, and IT security.


MICHAEL BALTASI, PhD
EXECUTIVE ADMINISTRATOR

Michael joined the CRI as its Executive Administrator in March 2014, bringing an extensive background in financial analysis, corporate management, and higher education administration to the position. Michael is responsible for planning and oversight of our financial and administrative functions. He works closely with the rest of the leadership team to develop both short- and long-term organizational plans; oversees the CRI’s project portfolio; coordinates activities across service areas; and provides a full range of administrative, financial, and strategic support.

BRIAN FURNER
MANAGER OF PROGRAMMING

A graduate of the College, Brian has been a part of the University of Chicago community in various capacities since 1993 and with the CRI since its creation. After years of working on the systems administration side of IT within the BSD, Brian transitioned into software development in 2005 and has been focused on that ever since. Through his work in the Department of Medicine, Brian developed broad knowledge of clinical research data, clinical research applications, and the methods for effectively dealing with the complexities of this domain. Lessons learned in the course of this work proved vital as the CRI began development and operations in early 2011. As Manager of Applications Development, Brian oversees a team of developers who are responsible for providing custom software solutions to the BSD research community.

TIMOTHY HOLPER
MANAGER OF CRDW OPERATIONS AND DEVELOPMENT

Timothy leads the architecture, development, and operations of the Clinical Research Data Warehouse (CRDW) team. With advanced degrees in Computer Science as well as Social Science Research and Statistics, Timothy brings together data warehousing expertise and a deep understanding of research data. Prior to joining the CRI, Timothy developed research applications in the Department of Medicine at the University of Chicago, working on predictive statistical models and parallel processing algorithms. He developed social science databases for crime mapping in public housing and program evaluations such as the City of Chicago’s Community Policing program. Timothy aspires for the CRI to make a significant impact on the volume and quality of data available to researchers in the Biological Sciences Division through the efforts of the CRDW team.


THIS YEAR’S HIGHLIGHTS

This year was one of growth for the CRI. We expanded and improved many of the resources we offer to the BSD community, making it easier than ever to do complex research efficiently and securely. Our move to Kenwood Data Center and partnerships with other University groups positioned us for continued future growth, as we simultaneously streamlined our processes and upgraded our equipment to continue providing our current services at the highest possible level.


THE CRI CONTINUES TO GROW

As we continue to find ways to expand and improve the resources we offer to the BSD research community, our team has grown and changed as well. Since last year, we have welcomed many new staff to the CRI. Two of our leadership roles were filled by Michael Baltasi as Executive Administrator and Thorbjörn Axelsson as Director of IT Operations and Infrastructure. In addition, three bioinformaticians, four programmers, a senior project manager, and a healthcare business analyst have joined our team.

[Photo: Center for Research Informatics Staff 2014]


CRDW IMPROVEMENTS

In November 2013, we made the first of several changes to the CRDW request process when we introduced weekly office hours. Prior to submitting a data request, investigators first meet in person with CRDW staff. The CRI’s experts help researchers to get to know the data types that are available and formulate a well-defined request. This also allows the CRI to produce a good-faith estimate of the time involved in fulfilling the request.

Office hours have helped to streamline the request process, reduced unnecessary time spent waiting for clarifications, and made it easier for researchers to get the precise data they need each time. In addition, we have made the request process more open and transparent and reduced the average turnaround time for requests. As we introduced a chargeback policy for data requests this year, this transparency and reduced turnaround time, as well as core subsidy grants generously provided by the Institute for Translational Medicine (for details, see http://itm.uchicago.edu/funding/subsidy-awards/core-subsidies), helped us to keep CRDW data accessible to all researchers who need it.

As in past years, we have continued to expand the amount and types of data in the CRDW, making it an even more robust resource for clinical and translational research. Data from billing systems, including both facility and professional fees, have been added, as well as National Death Registry information. We’ve integrated more than 100,000 new patients and one million new encounters. In addition, beginning in the first quarter of 2015, CRDW datamart users will be able to access reports through Cognos, an IBM business intelligence tool.

THE CRDW IN 2014
• 141 data requests fulfilled
• 25+ publications enabled
• 20 departments served


THE CRDW REQUEST PROCESS

1. CRDW team meets with client at weekly office hours.
2. Client submits online data request form.
3. If PHI is requested, CRDW team verifies IRB protocol.
4. CRDW team creates estimate and statement of work.
5. Report writer creates SQL code and integrates data sources.
6. Sample dataset is delivered for client approval.
7. CRDW team conducts data verification and quality check.
8. Full dataset and invoice are delivered to client.
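For readers who think in code, the eight-step request process above can be sketched as a small state list. This is a hypothetical illustration, not the CRDW team's actual tooling: the `Step`, `DataRequest`, and `planned_steps` names are invented here, and the sketch assumes the IRB verification step is simply skipped when no PHI is requested.

```python
from dataclasses import dataclass
from enum import IntEnum


class Step(IntEnum):
    """The eight steps of the request process, numbered as in the report."""
    OFFICE_HOURS = 1
    REQUEST_FORM = 2
    IRB_VERIFICATION = 3        # applies only when PHI is requested
    ESTIMATE_AND_SOW = 4
    SQL_AND_INTEGRATION = 5
    SAMPLE_DATASET = 6
    QUALITY_CHECK = 7
    DELIVERY_AND_INVOICE = 8


@dataclass
class DataRequest:
    """Hypothetical request record; the field names are illustrative."""
    title: str
    requests_phi: bool


def planned_steps(request: DataRequest) -> list[Step]:
    """Return the steps for a request in order.

    Assumption: IRB verification is omitted when no PHI is requested.
    """
    return [
        step
        for step in Step
        if step is not Step.IRB_VERIFICATION or request.requests_phi
    ]


with_phi = planned_steps(DataRequest("Retrospective cohort study", True))
without_phi = planned_steps(DataRequest("Aggregate counts only", False))
print([s.name for s in with_phi])
print([s.name for s in without_phi])
```

A request involving PHI walks through all eight steps, while one without PHI goes straight from the request form to the estimate.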

BIOINFORMATICS AND MOLECULAR PATHOLOGY

In June 2013, the Department of Pathology launched a new division dedicated to genomic and molecular pathology. This division comprises four laboratories, specializing in molecular diagnostics, cytogenetics, clinical genomics, and translational research. Part of the division’s mission is to provide a comprehensive genetic and genomic laboratory service for physicians and patients.

In order to provide these state-of-the-art Next-Generation Sequencing clinical testing services, the division has partnered with the CRI Bioinformatics Core. CRI bioinformatician Sabah Kadri, PhD, works full-time with the Department of Pathology to provide this complex genomic analysis. The CRI is excited to be a part of this important clinical and translational research resource.


DE-IDENTIFIED DATA PORTAL

The CRDW team has spent over a year developing our SEECohorts de-identified data portal, set to launch in spring 2015. Like the existing i2b2 system, SEECohorts will be a cohort discovery tool allowing users to query the data in the CRDW. However, i2b2 returns only cohort counts, and users must then submit a data request to receive any detailed patient data. The SEECohorts portal will take cohort discovery further by allowing users to see and interact with a de-identified version of the data they are seeking in a secure and compliant environment. This will enable researchers to study de-identified data without submitting a data request.

SEECohorts has been demonstrated to investigators from several departments and received an overwhelmingly positive response. Once released, this portal has the potential to greatly simplify cohort discovery, democratize data availability, and enhance clinical research throughout the University.

[Figure: The SEECohorts interface makes it easy to explore de-identified data.]


MOVE TO KENWOOD DATA CENTER

On February 10, 2014, the CRI completed the extensive process of moving all our resources from the outdated equipment in the Prudential Data Center to the state-of-the-art Kenwood Data Center on the University of Chicago campus. Led by then–Director of Systems and Security Plamen Martinov, this ambitious project set the goal of decommissioning or migrating every resource at Prudential, while maintaining the integrity of all user data. Our team decommissioned 324 systems, moved 140 terabytes of research data, and migrated hundreds of users to Kenwood. They decommissioned three HPC clusters and donated 80 compute nodes to other departments. The project was completed on schedule with no loss of data.

Moving to Kenwood has allowed the CRI to improve the security, reliability, and recoverability of our systems through the modernization of data center services and standard architecture. Kenwood is equipped to house systems compliant with federal guidelines, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Federal Information Security Management Act (FISMA). This makes the CRI’s systems a valuable resource for researchers working with patient data or who have received grants that require compliance with these guidelines.

[Photo: Prudential after decommissioning]
[Photo: Servers prepared for recycling]
[Photo: Kenwood Data Center]


EXPANDED COMPUTING RESOURCES

Moving to Kenwood also gave us the opportunity to continue expanding and improving the computing resources that we offer to the BSD community.

Most significantly, we expanded our HPC resources, bringing our overall HPC capacity to 2304 standard-memory CPU cores plus 80 high-memory CPU cores and making room for more researchers to analyze larger amounts of data more quickly and powerfully. In 2014, an average of 80 researchers per month used our HPC resources.

We also increased our data storage capacity from 700 terabytes to 1.2 petabytes (1,200 terabytes), allowing us to provide secure storage with encrypted backup and restore capabilities to more groups and individual users.

Finally, we established a vulnerability management program and achieved 100 percent coverage for all systems. The research data stored on CRI resources is backed up daily. All of these additions reflect our commitment to being the University’s most secure and advanced computing resource for research.

CRI STORAGE USAGE
• 142 groups
• 696 individuals
• 658 TB data
• 24 departments
• 114M files


BSD SECURITY INITIATIVE

Because the researchers we support often work with patient records and other very sensitive data, the CRI prioritizes information security in our computing infrastructure and operations. We have a strong record of achieving a secure, highly compliant environment while still working as efficiently as possible.

As CRI Director of Systems and Security, Plamen Martinov worked with CRIO Bob Grossman and the research informatics governance committees to spearhead an information security program that included encryption of laptops and mobile devices, consistent cybersecurity policies, and attendant monitoring and auditing.

In August 2014, recognizing the successes of the CRI and the value of a centralized and consistent approach to information security across the Division, the BSD created the Information Security Office (ISO) within the Office of the CRIO. Plamen was appointed inaugural Director of BSD Information Security.

The ISO now offers a comprehensive set of information security services to BSD departments, including risk management and compliance, security and risk consulting, policies and standards, vulnerability management, incident response, security monitoring, firewall management, and security awareness and training. The CRI will continue to work closely with the ISO on security initiatives and to hold our own work to the highest standard of security and compliance.

For more information on the ISO, visit crio.uchicago.edu/security.


RESEARCH. POWERED BY THE CRI.

In the past year, the CRI significantly expanded the scope of our work to become involved in major, collaborative projects that are changing the face of health care, childhood development, clinical research, and more. In our work with these initiatives, we are leveraging the same state-of-the-art computing resources and expertise in custom informatics programming, large-scale data analysis, and data sharing that we already use to enable important BSD research. Our commitment to the projects outlined in this section takes the impact of our work far beyond the University as we contribute to creating opportunities for collaborative research; making health care more personalized, effective, affordable, and accessible; and improving lives.


RESEARCH. POWERED BY THE CRI.

1200 PATIENTS/GENOMIC PRESCRIBING SYSTEM

Jointly sponsored by the CRI and the Center for Personalized Therapeutics, the 1200 Patients project seeks to develop a new medical system model for personalized care, in which the genetic profile of a patient can be incorporated into treatment decisions. Led by University of Chicago physicians Dr. Mark Ratain and Dr. Peter O’Donnell, business lead and venture partner Ken Bradley, and CRI lead programmer Keith Danahey, the project has been underway since 2011.

Patients who agree to participate in the study are genotyped in a CLIA-certified lab. Their genetic information is then stored in a relational database along with curated pharmacogenomic data from published studies. Through the Genomic Prescribing System (GPS), a research web portal, participating care providers can then access information to help them predict how a patient may respond to a given medication.

Rather than delivering raw genotype information, the GPS provides a patient-specific interpretation of the genomic data for a group of commonly prescribed medications. This information is distilled into a summary designed to be understood in 30 seconds or less, so that providers can easily use the tool during clinic visits. Physicians can use this knowledge to inform their choices in prescribing a medication, for example by pre-identifying patients at higher risk for severe side effects or predicting when a patient may need alternative dosing.

The GPS uses a simple stoplight system to illustrate information for providers.

Beyond the stoplight system, the GPS offers providers brief summaries of patient-specific pharmacogenomic data for each drug.

In 2014, the University of Chicago Innovation Fund awarded a $100,000 investment for further development of the GPS tool. The team plans to use this funding to continue developing a more robust version of the tool and to further validate the system in hospital and health care environments outside the University.

The GPS has been designed and maintained by the CRI since its inception. Under the leadership of Keith Danahey, our 1200 Patients team provides custom programming, database design, data stewardship, and data analysis, among other essential work. Keith’s team was joined by programmer Ishai Strauss in fall 2014, and together they design powerful yet simple software solutions for both physicians and researchers. In addition, they have undertaken complex data analysis that has led to exciting developments in the field of pharmacogenomics, the study of the effect of genetic variation on drug response or toxicity. The team’s work was accepted for presentation at the AMIA 2015 Joint Summits on Translational Science in San Francisco, further underscoring the impact of this project.
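As a rough illustration of the idea described in this section (genotypes stored in a relational database next to curated rules, with providers shown a stoplight signal and a brief summary instead of raw genotype data), the following minimal Python sketch may help. It is not the GPS implementation: the table names, the `stoplight_report` function, the gene and drug identifiers, and every rule are hypothetical placeholders and carry no clinical meaning.

```python
import sqlite3

# All table names, gene/drug identifiers, and rules below are hypothetical
# placeholders for illustration; they are not real pharmacogenomic guidance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_genotype (patient_id TEXT, gene TEXT, genotype TEXT);
    CREATE TABLE drug_rule (drug TEXT, gene TEXT, genotype TEXT,
                            signal TEXT, summary TEXT);
""")
conn.executemany("INSERT INTO patient_genotype VALUES (?, ?, ?)", [
    ("P001", "GENE_A", "AA"),
    ("P001", "GENE_B", "CT"),
])
conn.executemany("INSERT INTO drug_rule VALUES (?, ?, ?, ?, ?)", [
    ("drug_x", "GENE_A", "AA", "green", "Typical response expected."),
    ("drug_y", "GENE_B", "CT", "yellow", "Use with caution."),
    ("drug_z", "GENE_B", "CT", "red", "Elevated risk; consider an alternative."),
])


def stoplight_report(patient_id):
    """Return {drug: (signal, summary)} for each curated rule matching the patient.

    The join happens inside the database, so the caller receives only the
    interpretation (signal and summary), never the raw genotype.
    """
    rows = conn.execute("""
        SELECT r.drug, r.signal, r.summary
        FROM patient_genotype AS g
        JOIN drug_rule AS r
          ON r.gene = g.gene AND r.genotype = g.genotype
        WHERE g.patient_id = ?
        ORDER BY r.drug
    """, (patient_id,)).fetchall()
    return {drug: (signal, summary) for drug, signal, summary in rows}


report = stoplight_report("P001")
for drug, (signal, summary) in report.items():
    print(f"{drug}: {signal.upper()} - {summary}")
```

Keeping the interpretation step in one query is only one possible design; it is used here because it makes the separation between stored genotype data and the provider-facing summary easy to see.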


THIRTY MILLION WORDS

The Thirty Million Words (TMW) initiative, directed by Dr. Dana Suskind, is an innovative, evidence-based intervention program designed to help narrow the language gap between children from lower-income families and those in wealthier households. Studies have demonstrated that the number of words a child is exposed to before the age of four is significantly correlated with the child’s eventual IQ and academic outcomes. Furthermore, this early language exposure is correlated with income: children from lower-income households hear, on average, about thirty million fewer words than their peers from more affluent homes during this critical developmental period, leaving them less likely to achieve academic success. Dr. Suskind created TMW in 2010 to address this gap by bringing awareness to the importance of spoken language in early childhood development and giving parents the tools and knowledge to enrich their children’s home language environment.

TMW combines education, technology, and behavioral strategies in an interactive multimedia curriculum for parents. Home visits from coaches, animations that make the underlying science of the project accessible, and videos teaching easy-to-follow strategies lay the foundation for parents to enhance their linguistic interaction with their children. Quantitative feedback gleaned from weekly recordings of the home language environment is provided to parents to help monitor progress and set goals.

Recent preliminary trials have shown that parents and caregivers who received this quantitative linguistic feedback spoke and interacted more with their children. In April 2014, TMW was selected for a PNC Foundation multi-year grant that will support a larger-scale, five-year longitudinal study of the program’s impact on vocabulary development and school readiness in 200 to 250 children. TMW will soon be implemented at the community level with a center-based approach that includes daycare facilities, with the long-term goal of reaching parents and caregivers at the citywide level and beyond.

TMW’s curriculum and the scientific evaluation of its results require a significant amount of computing power and data storage. The CRI is proud to partner with TMW to fill these needs. For the first phase of the project, our IT Operations and Infrastructure team has provisioned a set of virtual machines to host TMW’s software in Kenwood Data Center. Over the next two years, the CRI’s application development experts will have the lead role in creating a suite of applications for Dr. Suskind and the TMW team. We look forward to contributing our resources and expertise to this innovative project.


CAPriCORN
Chicago Area Patient Centered Outcomes Research Network

In December 2013, a $7 million federal grant was awarded by the Patient-Centered Outcomes Research Institute (PCORI) to a coalition of twenty Illinois health and hospital organizations, including the University of Chicago Medicine. That coalition, the Chicago Area Patient Centered Outcomes Research Network (CAPriCORN), is now part of a nationwide network working to reduce health disparities among diverse populations of patients and develop better models for health care delivery.

With the PCORI grant, CAPriCORN was tasked with developing a cross-institutional data infrastructure capable of pooling and sharing electronic health record and outcomes data from more than one million Chicago-area patients, including many high-risk patients. This pool of information will contribute to research about how providers in complex urban settings can overcome barriers to effective treatment, improve health outcomes, and drive down the costs of health care for both common and rare conditions. CAPriCORN’s network will focus specifically on sickle cell disease, anemia, asthma, recurrent Clostridium difficile, diabetes, and obesity.

In building this network, CAPriCORN’s mission is to provide an informatics infrastructure that will support collaborative, patient-centered outcomes research in the Chicago area. The research this network makes possible will help care providers to overcome the barriers of fragmentation and limited resources, identify and fill gaps in coverage through new partnerships, and improve health care delivery and patient outcomes both locally and nationwide.

Building and operating this data network is a major undertaking, requiring a robust computing infrastructure, the creation of procedures for the standardization of data types, and the utmost attention to maintaining patient privacy. Because of the CRI’s successes in building and maintaining the Clinical Research Data Warehouse, the CRI was naturally positioned to play a key role in UCM’s contribution to winning the PCORI grant and the development of the CAPriCORN project. Brian Furner, CRI’s Manager of Applications Development, has been the University’s informatics lead on this high-profile project, providing support at both institutional and network levels through his participation in the Informatics Work Group, the Data Model and Data Standards Committee, and several cohort working groups.

The CRI has provided the local computational infrastructure for the project, including the data in the CRDW and Clarity (UCM’s billing system), the code that transforms and loads the data into the CAPriCORN Common Data Model, and the servers housing the datamart and the PopMedNet client used in the project. Through the CRI’s efforts, the University of Chicago has been a consistent leader in implementing the informatics components of this important project.


SHRINE
Shared Health Research Information Network

The CRI-managed instance of i2b2 allows researchers to query the Clinical Research Data Warehouse to explore available data and identify potential research cohorts. The Chicago-area Shared Health Research Information Network (SHRINE) pilot program builds on this capability and expands it across a wider array of data sources. SHRINE brings together the CTSA and informatics groups at three local research institutions: Northwestern University, the University of Illinois at Chicago, and the University of Chicago. All three institutions currently have or will soon implement both i2b2 and VIVO, a web application that allows users to search for researchers by various criteria across a network of participating organizations.

The SHRINE project leverages this existing infrastructure to create a common tool that can query data repositories at all participating institutions. Researchers will be able to specify inclusion and exclusion criteria, including demographics, diagnoses, and medications, and receive counts of patients meeting their criteria not just from their own institutions but from all hospitals and health care programs participating in the network. Similar to i2b2, this tool will protect patient privacy by returning only aggregate counts.

The cohort counts returned by i2b2 queries enable investigators to generate better-informed hypotheses, identify potential cohorts for clinical trials, and develop stronger grant applications. Combining the research data from multiple institutions will not only enhance these existing benefits but will encourage collaboration across institutions and enable the planning of research that requires large sample sizes not easily available at individual locations, including research in population health and health services.

The Chicago-area SHRINE is seen as a key component in the CTSA grant renewal process and will be an important part of a growing federated research portfolio that will give the University of Chicago a competitive advantage among its peers in procuring grant funding. Through our partnership with the Institute for Translational Medicine, the CRI is playing a key role in developing this local network of de-identified patient data.

Building on our previous efforts in deploying i2b2 to our research community, CRI staff are providing technical, regulatory, and research guidance on the project. With the initial proof-of-technology phase of the project successfully completed, we are now focused on the next phase, to include sharing of cohort counts among the member institutions.
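The query pattern described in this section, in which each institution applies inclusion and exclusion criteria locally and shares only an aggregate count, can be sketched in a few lines of Python. This is a toy model under stated assumptions and does not use the SHRINE or i2b2 software or their interfaces: the `Patient` record, the `site_count` and `federated_counts` functions, the site names, and all data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical, minimal patient record used only for this illustration.
    age: int
    diagnoses: frozenset
    medications: frozenset


def site_count(patients, include, exclude):
    """Apply the criteria inside one institution and return only a count."""
    return sum(1 for p in patients if include(p) and not exclude(p))


def federated_counts(sites, include, exclude):
    """Collect one aggregate count per site; patient rows never leave a site."""
    counts = {name: site_count(patients, include, exclude)
              for name, patients in sites.items()}
    counts["network_total"] = sum(counts.values())
    return counts


# Made-up data for three hypothetical sites.
sites = {
    "site_a": [
        Patient(34, frozenset({"asthma"}), frozenset()),
        Patient(12, frozenset({"asthma"}), frozenset()),
        Patient(50, frozenset({"asthma", "diabetes"}), frozenset({"drug_q"})),
    ],
    "site_b": [
        Patient(61, frozenset({"asthma"}), frozenset()),
        Patient(45, frozenset({"diabetes"}), frozenset()),
        Patient(29, frozenset({"asthma"}), frozenset({"drug_r"})),
    ],
    "site_c": [],
}

# Inclusion: adults with an asthma diagnosis. Exclusion: anyone on "drug_q".
result = federated_counts(
    sites,
    include=lambda p: p.age >= 18 and "asthma" in p.diagnoses,
    exclude=lambda p: "drug_q" in p.medications,
)
print(result)
```

The point of the sketch is the shape of the return value: the requester sees a number per site and a network total, and nothing about individual patients.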



CONNECTION TO CAMPUS

In addition to supporting BSD scientists and taking part in innovative research, a third and equally important part of the CRI’s mission is to contribute to an education and training program that will help to create the next generation of informatics experts. It is our goal to have our researchers be comfortable and confident users of both our resources and other technologies for biological computing. By providing educational opportunities to the University of Chicago community—most of them free of charge—we enable users to take this knowledge back to their own departments and labs, enhancing their ability to conduct meaningful, advanced informatics research and raising the profile of the University in the larger informatics community.



BIOINFORMATICS TRAINING

For over two years, the CRI Bioinformatics Core has offered a free monthly training seminar to the BSD community, covering a variety of bioinformatics tools and techniques. Topics have ranged from the Linux command line to R and Bioconductor to integrating the CRI’s high-performance computing resources into bioinformatics analyses. Since these seminars began in May 2012, they have attracted more than 800 total participants, routinely filling every available seat.

Topics covered this year included a three-part series on R programming, a two-part series on Python programming, an introduction to the Linux command line, analysis of several types of Illumina data, and an overview of how to use the CRI’s computational infrastructure for bioinformatics analysis. In March, guest speakers Natalia Maltsev, MD, PhD, and Dina Sulakhe, MS, demonstrated the Lynx integrated systems biology platform currently being developed in the Human Genetics Department and Computation Institute.

Survey feedback showed that over 98 percent of participants were satisfied with their session, and the CRI bioinformaticians leading the sessions were evaluated by 98 percent of participants as “very knowledgeable” in the subjects they were teaching.

Jorge Andrade leads a seminar on the Linux command line

COMMENTS FROM PARTICIPANTS

“This is a tough area to fully cover in three hours but the team did a phenomenal job. Kevin, Jorge, and Olumide were great at making it easy for the layperson to understand. […] I am looking forward to attending regular CRI sessions.” (January)

“I found it remarkably helpful and was exceptionally pleased with being able to attend. I found it useful not only to understand the ‘how’ part of the analysis pipeline, but also the ‘why.’” (February)

“I learned a lot, and this will be very helpful for my own work.” (April)

“This was incredibly informative and the instructors answered all the students’ questions. I am very impressed with this course and feel like I learned a lot today.” (May)

“Very knowledgeable instructor and giving lots of helpful hints.” (July)

“Very clear presentation and intro to the subject.” (September)


WORKSHOP LEARNING SERIES

Based on the success of our monthly bioinformatics training seminars, this year we launched the CRI Workshop Learning Series. By hosting multi-day educational workshops open to researchers and industry professionals from around the United States, we seek to bring educational opportunities in informatics to a larger audience of scientists and students while encouraging collaboration and community-building in informatics research.

Our first workshop was held in December 2014 and was dedicated to the bioinformatics analysis of high-throughput genomic data. This four-day workshop focused on how to make the most of the latest technologies and tools for working with large and complex datasets, including both in-depth practical theory and hands-on training. Instructors included CRI bioinformaticians and distinguished guest speakers from the University of Chicago community and beyond, with sessions covering commonly used bioinformatics tools, genomics data visualization, analysis workflows, R programming, high-performance computing, and more. The workshop also included several social events to encourage discussion and collaboration among participants.

Interest in the workshop was high, with 45 applications received. After an academic review, 33 applicants were accepted and attended the workshop. Participants were primarily graduate students and postdoctoral fellows, along with several staff and faculty. Through a partnership with the Committee on Clinical and Translational Science (see CCTS Informatics Courses), 12 participants received course credit for completing the workshop. Survey feedback after the workshop demonstrated that the vast majority of participants found the material very useful and the quality of instruction highly satisfactory.

For more information on the Workshop Learning Series, please visit learn.cri.uchicago.edu.

COMMENTS FROM PARTICIPANTS

“I was pleased with the wide range of information that was covered during the workshop, and the diversity of skills I got to practice during these four days.”

“Despite having background in the content covered, I found it a very good refresher on things I haven’t thought about in a while.”

“Even after speaking for ~3 hours, the instructor was not boring at all. […] I heard someone say in the lunch line how this tutor spoke with clarity and command.”

“This course was appropriately geared towards researchers who use high-throughput sequencing in their work, regardless of their knowledge level, which I was quite impressed by. […] I am grateful that the CRI held this course and I think the speakers and instructors did an excellent job guiding us through the process.”


CCTS INFORMATICS COURSES

Further reflecting our commitment to informatics education, CRI Director Sam Volchenboum is an instructor for informatics courses offered through the Committee on Clinical and Translational Science (CCTS). The CCTS, a freestanding academic unit within the BSD, is organized by the Center for Health and the Social Sciences and the Institute for Translational Medicine with the goal of enhancing multidisciplinary training in clinical and translational science.

Sam co-led the development of the Spring 2014 course “Introduction to Clinical Research Informatics,” an introductory survey of the fundamentals of information technology as applied to health care. Sam and co-instructors Sameer Badlani, MD, and David McClintock, MD, taught a curriculum tailored to post-doctoral fellows, residents, and faculty. They focused on technology’s impact on patients, providers, and hospitals, including the topics of decision support, system integration, educational applications, emerging technologies, and security and compliance.

Other courses to which Sam will contribute are currently in development, including a clinical research methods course to be offered in 2015.

CCTS

With joint input from CHeSS and the ITM, the CCTS works to create new course offerings in clinical and translational science. Areas of concentration include:

• Comparative Effectiveness Research
• Translational Informatics
• Health Services Research
• Quality and Safety
• Clinical Research
• Community-Based Research
• Global Health
• Pharmacogenomics

For more information about the CCTS and a list of current course offerings, visit chess.bsd.uchicago.edu/training/ccts.html.


REDCAP TRAINING

The University of Chicago has been part of the REDCap consortium since 2010. REDCap, developed by Vanderbilt University, is a web-based application that supports data collection for research studies with tools for building online surveys and databases. The University of Chicago’s REDCap now supports more than 1200 users from across the BSD and houses over one thousand projects.

The CRI performs continual upgrades to REDCap to improve the user experience and offer the latest software features. A major upgrade was completed in November 2014. In addition to operating this valuable resource, the CRI is committed to fostering a community of capable, engaged REDCap users by providing training and educational opportunities.

We offer online video tutorials and PDF guides on our REDCap website, available at cri.uchicago.edu/redcap. In addition, Julissa Acevedo, CRI Business Systems Analyst and resident REDCap expert, provides consultations, demonstrations, and individual and small-group training sessions for those in need of more personalized guidance. As we continue to upgrade REDCap, Julissa provides new feature demonstrations and updated training documentation on our website with each upgrade.

GROWTH IN REDCAP PROJECTS, 2013-14
(Chart: number of REDCap projects by month, January 2013 through December 2014; vertical axis from 200 to 1000.)


OUR PARTNERS

The CRI is proud to work alongside our campus partners in advancing biomedical and informatics research. Our work is partially made possible by a generous investment from the Institute for Translational Medicine (ITM). More information about the ITM is available at itm.uchicago.edu.

Other partners with whom we are proud to collaborate include:

Biological Sciences Division: bsd.uchicago.edu
Biostatistics Core: biotime.uchicago.edu
Chicago Biomedicine Information Systems: help.bsd.uchicago.edu
Center for Health and the Social Sciences: chess.bsd.uchicago.edu
Comprehensive Cancer Center: cancer.uchicago.edu
Computation Institute: ci.uchicago.edu
Genomics Core: genomics.uchicago.edu
Human Imaging Research Office: hiro.bsd.uchicago.edu
Institute for Genomics and Systems Biology: igsb.anl.gov
Institutional Review Board: humansubjects.uchicago.edu
IT Services: itservices.uchicago.edu
Office of Clinical Research: bsdocr.bsd.uchicago.edu
University of Chicago Medicine: uchospitals.edu
UCM Center for Quality: uchospitals.edu/visitor/quality



LOOKING AHEAD

The past year has been the CRI’s most innovative and prolific thus far. Continuing this trend, 2015 will see us play expanding roles in major, multifaceted projects already underway, as well as embarking on new ones. In addition to continuing our work on the projects profiled in this report, we will serve as lead developer for the Harvard-led pediatric GAIN Consortium, for which we will build a multi-institutional specimen tracker and database. We will also provide bioinformatics and IT support for the $10M Transdisciplinary Center for Prematurity Research grant awarded by the March of Dimes Foundation in December 2014, and we will serve as technical lead on a redesign of the International Neuroblastoma Risk Group database, which will include expanding the size and scope of data feeds, standardizing clinical information, and building new querying and visualization tools for researchers and clinicians.


In addition to our work on these high-profile projects, we will continue improving and upgrading the resources we offer to BSD researchers by transitioning to new computing and storage environments. Our data warehouse team is in the process of migrating its operations to the exceptionally powerful IBM Netezza data warehouse appliance and advanced analytics applications, and our current high-performance computing cluster will have its hardware refreshed this year to double its processing ability. As our reputation for providing exceptional technical solutions for research continues to grow, we participate as partners on projects of increasing scope and complexity. To meet these demands, we continue to recruit expert programmers, data scientists, bioinformaticians, and other IT professionals to our team. Our growth both reflects and contributes to the robust research environment of the BSD. Over the next year, we look forward to pushing the limits of technology in our support of the biomedical sciences at the University of Chicago.

The CRI leadership team



APPENDIX


APPENDIX A: CRI USAGE BY THE NUMBERS

2014 CRDW DATA REQUESTS BY DEPARTMENT


2014 BIOINFORMATICS CORE PROJECTS BY DEPARTMENT

GROWTH IN BIOINFORMATICS CORE USAGE, 2012-2014



2014 CRI STORAGE USERS BY DEPARTMENT



2014 CRI HIGH-PERFORMANCE COMPUTING USAGE BY DEPARTMENT

CPU HOURS

NUMBER OF JOBS


2014 REDCAP USERS BY DEPARTMENT


OVERALL CRI IMPACT IN THE BSD


APPENDIX B: CRI STAFF LIST

Samuel Volchenboum, MD, PhD: Director and Associate CRIO

ADMINISTRATION
Michael Baltasi, PhD: Executive Administrator
Tiffany Cyrus, MBA: Project Manager
Michael Daus: Business Administrator
Brian Leung: Senior Project Manager
Brad Orr, MS: Senior Project Manager
Caitlin Pike: Communications Manager

APPLICATIONS DEVELOPMENT
Brian Furner: Manager of Programming
Julissa Acevedo: Business Systems Analyst
Seong Choi: Programmer
Tomasz Oliwa, PhD: Scientific Software Engineer
Norm Paterson: Programmer

BIOINFORMATICS CORE
Jorge Andrade, PhD: Director of Bioinformatics
Riyue Bao, PhD: Bioinformatician
Tzuni Garcia, PhD: Bioinformatician
Kyle Hernandez, PhD: Bioinformatician
Lei Huang, PhD: Bioinformatician
Sabah Kadri, PhD: Bioinformatician
Wenjun Kang, MS: Scientific Programmer
Yan Li, PhD: Bioinformatician
Chunling Zhang, MS: Bioinformatician

CLINICAL RESEARCH DATA WAREHOUSE
Timothy Holper, MS, MA: Manager of CRDW Development and Operations
Julie Johnson, MPH, RN: Healthcare Business Analyst
Stacie Landron, MS, RN: Programmer/Report Writer
Luis Maciel: Senior Systems Analyst/DBA
Thomas Sutton, MS: Senior DBA/ETL Developer

IT OPERATIONS AND INFRASTRUCTURE
Thorbjörn Axelsson, MS: Director of IT Operations and Infrastructure
Andy Brook: Senior Systems Administrator
Beth Lynn Eicher: Senior Systems Administrator
Michael Jarsulic: Senior Systems Administrator
Sneha Jha, MS: Intermediate Systems Administrator
Olumide Kehinde: Lead Systems Administrator

1200 PATIENTS
Keith Danahey, MS: Lead Application Developer
Ishai Strauss: Programmer


APPENDIX C: BIOINFORMATICS PIPELINES

ILLUMINA

RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis

ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation

Exome Sequencing: Raw Data QC, Pre-processing, Mapping with 3 different tools, Realignment and Quality Recalibration, Multiple Samples Variant Calling, Variant Annotation, Variant Comparison, Filtration, and Summarization

Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation

Consensus Genotyping Pipeline: Genotyping, SNP Detection, and InDel Detection using three different methods (Samtools, GATK, and Atlas-2), comparison of variant calls, list of consensus call variants, and list of method-specific calls

De-novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis

Somatic Mutation Detection for Tumor/Normal Pairs: Raw Data QC, Pre-processing, Mapping with 2 different tools, Realignment and Quality Recalibration, Somatic Mutation Detection with 4 different tools, Variant Annotation, and Summarization

SOLiD

RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis

Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation

ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation

De-novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contig Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis

ILLUMINA AND AFFYMETRIX EXPRESSION ARRAYS

Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis

AFFYMETRIX AND EXIQON miRNA ARRAYS

Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, Predict miRNA Targeted Genes, Functional Annotation, and Pathway Enrichment Analysis
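To make the comparison step of the Consensus Genotyping Pipeline in this appendix more concrete, here is a small Python sketch of how calls from three methods might be split into a consensus list and method-specific lists. It is an illustration only and not the CRI pipeline: it does not run Samtools, GATK, or Atlas-2, the `compare_calls` function and the variant tuples are hypothetical, and "consensus" is assumed here to mean a variant called by all three methods.

```python
def compare_calls(calls_by_method):
    """Split variant calls into a consensus set and per-method unique sets.

    calls_by_method maps a method name to a set of
    (chromosome, position, ref_allele, alt_allele) tuples.
    """
    methods = list(calls_by_method)
    # Assumption: a consensus variant is one reported by every method.
    consensus = set.intersection(*(calls_by_method[m] for m in methods))
    method_specific = {}
    for m in methods:
        # Calls made by any of the other methods.
        others = set().union(*(calls_by_method[o] for o in methods if o != m))
        method_specific[m] = calls_by_method[m] - others
    return consensus, method_specific


# Made-up variant calls keyed by a label for each method.
calls = {
    "samtools": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                 ("chr2", 50, "G", "A")},
    "gatk": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr3", 75, "T", "C")},
    "atlas2": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
}

consensus, method_specific = compare_calls(calls)
print("consensus:", sorted(consensus))
for method, unique_calls in method_specific.items():
    print(method, "only:", sorted(unique_calls))
```

A looser definition, such as agreement by two of the three methods, would change only the line that computes `consensus`.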


APPENDIX D: RESEARCH INFORMATICS GOVERNANCE AND OVERSIGHT

A governance structure set up by the Office of the CRIO guides research informatics across the entire BSD, ensuring that informed long-term decisions for the Division are reached in a transparent and accountable way. The five committees of the Research Informatics Governance Structure bring together senior BSD and University of Chicago Medicine (UCM) leadership; information systems experts; patient privacy experts; and faculty representing basic science, clinical research, and translational research. Decisions from these committees guide the CRIO in establishing policies and procedures, prioritizing new initiatives, safeguarding patient information, and complying with BSD policies and applicable federal and state laws.

In addition to the faculty representation on governance committees, an Informatics Oversight Committee is in place to ensure that research informatics activities and future plans are in line with the needs of research faculty. This committee, made up of faculty leaders representing both basic science and clinical departments, reports to the Research Advisory Committee, a BSD/UCM committee that serves as a key advisory body for the Dean for Research and Graduate Education.

Membership lists for all research informatics governance and oversight committees are available at crio.uchicago.edu/governance.

APPENDIX E: PUBLICATIONS

Below is a selection of recent publications made possible in part by the CRI’s research resources.

BIOINFORMATICS CORE

Lowry DB, Hernandez K, Taylor SH, et al. The genetics of divergence and reproductive isolation between ecotypes of Panicum hallii. New Phytol. 2015 Jan;205(1):402-14. doi: 10.1111/nph.13027. Epub 2014 Sep 23.

Malcom JW, Hernandez KM, Likos R, Wayne T, Leibold MA, Juenger TE. Extensive cross-environment fitness variation lies along few axes of genetic variation in the model alga, Chlamydomonas reinhardtii. New Phytol. 2015 Jan;205(2):841-51. doi: 10.1111/nph.13063. Epub 2014 Sep 29.

Bao R, Huang L, Andrade J, et al. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Informatics. 2014 Sep 21;13(Suppl 2):67-82. doi: 10.4137/CIN.S13779.


Fahrenbach J, Andrade J, McNally EM. The CO-Regulation Database (CORD): a tool to identify coordinately expressed genes. PLOS One. 2014 Mar.

Volchenboum S, Andrade J, Huang L, et al. Gene expression profiling of Ewing sarcoma tumours reveals the prognostic importance of tumour–stromal interactions: a report from the Children’s Oncology Group. J Pathol: Clinical Research. 2014. doi: 10.1002/cjp2.9.

Spranger S, Bao R, Gajewski T. Melanoma-intrinsic β-catenin signaling prevents T cell infiltration and anti-tumor immunity. Journal for ImmunoTherapy of Cancer. 2014;2(Suppl 3):O15. doi: 10.1186/2051-1426-2-S3-O15.

Widau RC, Parekh A, Ranck MC, et al. The RIG-I like receptor LGP2 protects tumor cells from ionizing radiation. P Natl Acad Sci. 2013 Dec.

Chen B, Moore TV, Li Z, et al. Gata5 Deficiency Causes Airway Constrictor Hyperresponsiveness in Mice. Am J Resp Cell Mol. 2013 Nov.

CLINICAL RESEARCH DATA WAREHOUSE

Bailey KA, Savic D, Zielinski M, et al. Evidence of non-pancreatic beta cell-dependent roles of Tcf7l2 in the regulation of glucose metabolism in mice. Hum Mol Genet. 2014 Nov 14. doi: 10.1093/hmg/ddu577.

He BZ, Ludwig MZ, Dickerson DA, et al. Effect of genetic variation in a Drosophila model of diabetes-associated misfolded human proinsulin. Genetics. 2014 Feb;196(2):557-67. doi: 10.1534/genetics.113.157800.

Hong S, Le-Rademacher J, Artz A, McCarthy PL, Logan BR, Pasquini MC. Comparison of nonmyeloablative conditioning regimens for lymphoproliferative disorders. Bone Marrow Transpl. 2014 Dec 1. doi: 10.1038/bmt.2014.269.

Wang Y, Hong S, Li M, et al. Noggin resistance contributes to the potent osteogenic capability of BMP9 in mesenchymal stem cells. J Orthop Res. 2013 Nov;31(11):1796-803. doi: 10.1002/jor.22427.

Nelson R, Liao C, Fichera A, Rubin DT, Pekow J. Rescue therapy with cyclosporine or infliximab is not associated with an increased risk for postoperative complications in patients hospitalized for severe steroid-refractory ulcerative colitis. Inflamm Bowel Dis. 2014 Jan;20(1):14-20. doi: 10.1097/01.MIB.0000437497.07181.05.

Sofia MA, Rubin DT, Hou N, Pekow J. Clinical presentation and disease course of inflammatory bowel disease differs by race in a large tertiary care hospital. Digest Dis Sci. 2014 Sep;59(9):2228-35. doi: 10.1007/s10620-014-3160-0.

Choi CH, Poroyko V, Watanabe S, et al. Seasonal allergic rhinitis affects sinonasal microbiota. Am J Rhinol Allergy. 2014 Jul-Aug;28(4):281-6. doi: 10.2500/ajra.2014.28.4050.

Kern DW, Wroblewski KE, Schumm LP, Pinto JM, Chen RC, McClintock MK. Olfactory function in Wave 2 of the National Social Life, Health, and Aging Project. J Gerontol B-Psychol. 2014 Nov;69 Suppl 2:S134-43. doi: 10.1093/geronb/gbu093.

Li L, Zhan X, Wang N, et al. Does airway surgery lower serum lipid levels in obstructive sleep apnea patients? A retrospective case review. Med Sci Monitor. 2014 Dec 13;20:2651-7. doi: 10.12659/MSM.892230.



Naclerio RM, Pinto JM, Baroody FM. Drowning in applications for residency training: a program’s perspective and simple solutions. J Otolaryngol - Head N. 2014 Aug;140 (8):695-6. doi: 10.1001/jamaoto.2014.1127.
Patel RM, Pinto JM. Olfaction: anatomy, physiology, and disease. Clin Anat. 2014 Jan;27 (1):54-60. doi: 10.1002/ca.22338.
Pinto JM, Schumm LP, Wroblewski KE, Kern DW, McClintock MK. Racial disparities in olfactory loss among older adults in the United States. J Gerontol A-Biol. 2014 Mar;69 (3):323-9. doi: 10.1093/gerona/glt063.
Pinto JM, Wroblewski KE, Kern DW, Schumm LP, McClintock MK. Olfactory dysfunction predicts 5-year mortality in older adults. PLOS One. 2014;9 (10):e107541. doi: 10.1371/journal.pone.0107541.
Watanabe S, Pinto JM, Bashir ME, et al. Effect of prednisone on nasal symptoms and peripheral blood T-cell function in chronic rhinosinusitis. International Forum Of Allergy & Rhinology. 2014 Aug;4 (8):609-16. doi: 10.1002/alr.21336.
Yao L, Pinto JM, Yi X, Li L, Peng P, Wei Y. Gray matter volume reduction of olfactory cortices in patients with idiopathic olfactory loss. Chem Senses. 2014 Nov;39 (9):755-60. doi: 10.1093/chemse/bju047.
Choi CH, Poroyko V, Watanabe S, et al. Allergen Exposure Affects Sinonasal Microbiota. Am J Rhinol Allergy. 2014 Jul;28(4):281-6.
Li L, Zhan X, Wang N, et al. Does Airway Surgery Lower Serum Lipid Levels in OSA patients? A Retrospective Case Review. Med Sci Monit. 2014 Dec 13;20:2651-2657.
Li H, Giger ML, Sun C, et al. Pilot study demonstrating potential association between breast cancer image-based risk phenotypes and genomic biomarkers. Med Phys. 2014 Mar;41(3):031917. doi: 10.1118/1.4865811.
Feng Y, Stram DO, Rhie SK, et al. A Comprehensive Examination of Breast Cancer Risk Loci in African American Women. Hum Mol Genet. 2014 May 22. pii: ddu252.
Blair DR, Wang K, Nestorov S, Evans JA, Rzhetsky A. Quantifying the impact and extent of undocumented biomedical synonymy. PLOS Comput Biol. 2014 Sep;10 (9):e1003799. doi: 10.1371/journal.pcbi.1003799.
Liu CC, Tseng YT, Li W, et al. DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res. 2014 Jul;42 (Web Server issue):W137-46. doi: 10.1093/nar/gku412.
Rzhetsky A, Bagley SC, Wang K, et al. Environmental and state-level regulatory factors affect the incidence of autism and intellectual disability. PLOS Comput Biol. 2014 Mar;10 (3):e1003518. doi: 10.1371/journal.pcbi.1003518.
Churpek MM, Yuen TC, Park SY, Gibbons R, Edelson DP. Using electronic health record data to develop and validate a prediction model for adverse outcomes in the wards. Crit Care Med. 2014 Apr;42(4):841-8.
Town JA, Churpek MM, Yuen TC, Huber MT, Kress JP, Edelson DP. Relationship Between ICU Bed Availability, ICU Readmission, and Cardiac Arrest in the General Wards. Crit Care Med. 2014;42(9):2037-41.
Churpek MM, Yuen TC, Winslow C, et al. Multicenter Development and Validation of a Risk Stratification Tool for Ward Patients. Am J Respir Crit Care Med. 2014 Sep 15;190(6):649-55.
Churpek MM, Yuen TC, Winslow C, Hall J, Edelson DP. Differences in Vital Signs Between Elderly and Nonelderly Patients Prior to Ward Cardiac Arrest. Crit Care Med. 2014 Dec 31. [Epub ahead of print]



PHOTOGRAPHY CREDITS
p. 3: Caitlin Pike
p. 6, 7, 9, 11, 13, 15-16, 18, 35 (middle), 40: David Christopher
p. 17, 35 (bottom): Jorge Andrade
p. 22: Brad Orr
p. 25: “Test Tubes” by Chesapeake Bay Program, available at https://www.flickr.com/photos/29388462@N06/5434154393/ under CC BY-NC 2.0. Full terms at https://creativecommons.org/licenses/by-nc/2.0/
p. 31, 32, 35 (top), 39: Sara Serritella, ITM

Thumbprint art on front cover designed by Griffin Brands.

Written and designed by Caitlin Pike. © The University of Chicago, 2015. All rights reserved.


