Columbia’s Institute for Data Sciences and Engineering
An Applied Sciences Innovation Hub

“Friday’s inaugural symposium for Columbia University’s new Institute for Data Science and Engineering was a celebration of an idea and an ambition. The idea: there is a deep, historic movement taking place across disciplines, which is that data science as a field is increasingly core to the way we understand the world in all fields of endeavor. Columbia University’s ambition is to build a leading program in research and education in this emerging field.”
NYC Media Lab / The Lab Report

A Broad Institute
Nine Schools
• SEAS (School of Engineering and Applied Science) (lead)
• Arts and Science
• Journalism
• Business
• Architecture, Planning and Preservation
• International and Public Affairs
• Medical School
• Public Health
• Law

Institute Plans
• Initial plans to hire 30 new faculty in data science in 5 years and recruit 150 doctoral students
• 45 additional faculty will be hired, at 5 a year, over the next 15 years
• 44,000 sq. feet of new academic space will be ready by 2016

Institute Status
• 48 founding Institute faculty
• 9 Executive Committee Members
• Organizing committees for each Center
• ~150 affiliated faculty members, University-wide

The Centers of the Columbia Institute for Data Sciences and Engineering
• SMART CITIES
• NEW MEDIA
• HEALTH ANALYTICS
• CYBERSECURITY
• FINANCIAL ANALYTICS
• FOUNDATIONS OF DATA SCIENCE

Center for Smart Cities
Co-chairs from Civil Engineering and Electrical Engineering
7 committee members, 23 affiliates

Research in Smart Cities
Integrating the digital city with the physical city
• Monitoring building energy consumption in New York
• Improving the power supply through smart grid technology
• Deploying sensing devices to facilitate everyday activities in a crowded urban environment

Infrastructure Monitoring
Monitoring large suspension bridge vibrations
[Figure label: Fixed Reference]

Developing Green Infrastructure

Urban visualization
Visualizing and interacting in 3D with georeferenced urban data

Center for New Media
Co-chairs from Journalism and the Center for Computational Learning Systems
10 committee members, 19 affiliates

Research in New Media
New forms of digital media
• Analyzing and creating social media
• Creating visualizations
Acquiring information
• From language – speech analysis, machine translation, identifying emotions
• From images and video – extracting information from images

Center for Health Analytics
Chair from Biomedical Informatics
10 committee members, 15 affiliates

Research in Health Analytics
Analyzing big data for:
• Patient data
• Genomic databases
• Public health records
Using electronic health records
• To discover patterns of diseases, effective drugs, treatments, and therapies
Sequencing genomics
• Showing associations between single mutations and genetically associated diseases
DNA sequencing on a chip

Health Analytics Center
Individual, Population
Clinical, Healthcare
Molecular, Cellular

EHR and time series analysis – Glucose predictability
[Figure: glucose plot with axes labeled MI, tau, and delta-t (days); Albers et al., 2009]

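The axis labels on this slide (MI, tau) suggest a mutual-information measure between glucose values separated by a time lag. The slide does not give the method, so the following is only a minimal sketch of one common approach, a histogram estimate of time-delayed mutual information. It assumes evenly spaced samples, which real EHR data rarely provide. The function name, the bin count, and the toy series are all hypothetical.

```python
import math
from collections import Counter

def delayed_mutual_information(series, tau, bins=4):
    """Histogram estimate (in nats) of the MI between x(t) and x(t + tau)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for a constant series

    def bin_of(x):
        # Map a value to a bin index, clamping the maximum into the last bin.
        return min(int((x - lo) / width), bins - 1)

    # Pair each sample with the sample tau steps later.
    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[tau:])]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # Sum p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )

# Hypothetical toy series, not patient data.
alternating = [0, 1] * 10   # the next value is fully determined by the current one
blocky = [0, 0, 1, 1] * 5   # the next value is close to a coin flip
mi_alternating = delayed_mutual_information(alternating, tau=1, bins=2)
mi_blocky = delayed_mutual_information(blocky, tau=1, bins=2)
```

A higher value at a given lag means the current measurement says more about the measurement tau steps later. In this toy case the alternating series scores well above the blocky one.
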
Center for Financial Analytics
Chair from Industrial Engineering and Operations Research
6 committee members, 24 affiliates

Research in Financial Analytics
Big data for better financial services and solutions
• Use predictive analytics to optimize financial decisions
• Understand and regulate high-frequency trading
• Predict and manage systemic risks
• Real-time analysis of unstructured data/information, e.g., corporate and government actions, commentary, social media

Systemic risk
“Social” network of financial institutions
• Complex
• Very high dimensional
“Edges”
• Lending
• Assets
• Derivatives
Minoui and Reyes (2011 IMF Report)

The Contagion Effect
• A vicious cycle during crisis time, leading to contagion
• Approach: stochastic network using publicly available data

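To make the contagion idea concrete, here is a minimal sketch of a default cascade on a lending network. It is a deterministic threshold model chosen for illustration, not the stochastic network approach named on the slide. The bank names, exposures, capital figures, and the `cascade` function are all hypothetical.

```python
def cascade(exposures, capital, initially_failed):
    """Propagate failures through a lending network.

    exposures[lender][borrower] is the amount the lender has lent to the borrower.
    A bank fails once its losses on loans to failed borrowers reach its capital.
    Returns the set of all failed banks.
    """
    failed = set(initially_failed)
    changed = True
    while changed:  # repeat until a full pass adds no new failure
        changed = False
        for bank, loans in exposures.items():
            if bank in failed:
                continue
            loss = sum(amount for borrower, amount in loans.items() if borrower in failed)
            if loss >= capital[bank]:
                failed.add(bank)
                changed = True
    return failed

# Hypothetical four-bank network: A lends to B, while B and D lend to C.
exposures = {"A": {"B": 5}, "B": {"C": 4}, "C": {}, "D": {"C": 1}}
capital = {"A": 3, "B": 3, "C": 2, "D": 5}
failed_banks = cascade(exposures, capital, initially_failed={"C"})
```

In this toy network the failure of C brings down B, whose failure then brings down A. D survives because its exposure to C is smaller than its capital.
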
Foundations of Data Science
Co-chairs from Computer Science and Statistics
6 committee members, 42 affiliates

Foundations of Data Science
• Machine learning
• Computational learning theory
• Statistical prediction
• Algorithms and optimization
• Software and hardware infrastructure for computation with big data

Graph & Network Algorithms
• Matching nodes into a network
• New students show up to school
• Have a matrix of their profile vectors
• At graduation, observe formed network
• Predict network for next year’s freshmen?

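The prediction question above can be sketched with a simple baseline: score each pair of students by the cosine similarity of their profile vectors, fit a cutoff on the network observed at graduation, then apply that cutoff to the incoming class. This is an illustrative assumption, not the matching algorithm the slide refers to. The function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two profile vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fit_threshold(profiles, edges):
    """Pick the similarity cutoff that best separates observed edges from non-edges."""
    edge_set = {frozenset(e) for e in edges}
    scored = [
        (cosine(profiles[i], profiles[j]), frozenset((i, j)) in edge_set)
        for i, j in combinations(range(len(profiles)), 2)
    ]
    best_t, best_acc = 0.0, -1.0
    for t in sorted({s for s, _ in scored}):
        acc = sum((s >= t) == linked for s, linked in scored) / len(scored)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict_edges(profiles, threshold):
    """Predict an edge for every pair whose similarity reaches the cutoff."""
    return [
        (i, j)
        for i, j in combinations(range(len(profiles)), 2)
        if cosine(profiles[i], profiles[j]) >= threshold
    ]

# Hypothetical graduating class: two tight pairs with different interests.
graduates = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
observed = [(0, 1), (2, 3)]
threshold = fit_threshold(graduates, observed)

# Hypothetical freshmen: students 0 and 1 have similar profiles, student 2 does not.
freshmen = [[1, 0.1], [0.95, 0], [0, 1]]
predicted = predict_edges(freshmen, threshold)
```

With these toy profiles the fitted cutoff links only the two similar freshmen. A realistic model would also need to respect constraints such as how many connections each student can form.
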
New York, color-coded by inferred similar social behavior

Center for Cybersecurity
Chair from Computer Science

Research in Cybersecurity
Essential to critical infrastructure
• Government, financial transactions, electronic commerce, and personal computing
Security and survivability of large-scale, heterogeneous cybersystems
• Threat mitigation
• Threat detection and analysis
• Cyberattack reaction and recovery
• Cyberattack tolerance
• Large-scale distributed (re)action

Breaking commodity devices to learn how to fix them using Symbiotes
• Cisco IP phone vulnerability
• HP printer firmware update vulnerability

Collaborations
• Industrial Affiliates
• Foundations
• International

Industrial Affiliates
• Different levels of access
• Reduction in tuition and ICR for members
• Project work
• Capstone project course participation
• Events, recruiting

Initial Industrial Affiliates Partners
• Bloomberg
• Mediaocean
• Microsoft Research
• Google

Degree Programs
Certification of Achievement in Data Science
• Fall 2013
• Four courses
MS in Data Science
• Fall 2014
• Core in fundamentals of data science
• Tracks in application areas corresponding to centers

Certification in Data Science
Joint between SEAS and GSAS
All courses specifically designed for Certification
• Probability & Statistics (STATS)
• Algorithms for Data Science (CS/IEOR)
• Machine Learning for Data Science (CS)
• Exploratory Data Analysis and Visualization (STATS)
Aiming for ~25 on-campus students in Fall 2013