Kathleen McKeown (Columbia Univ.): Columbia’s Institute for Data Sciences and Engineering


Columbia’s Institute for Data Sciences and Engineering: An Applied Sciences Innovation Hub


“Friday’s inaugural symposium for Columbia University’s new Institute for Data Science and Engineering was a celebration of an idea and an ambition. The idea: there is a deep, historic movement taking place across disciplines, which is that data science as a field is increasingly core to the way we understand the world in all fields of endeavor. Columbia University’s ambition is to build a leading program in research and education in this emerging field.”

NYC Media Lab / The Lab Report


A Broad Institute  Nine Schools • SEAS (School of Engineering and Applied • • • • • • • •

Science) (lead) Arts and Science Journalism Business Architecture, Planning and Preservation International and Public Affairs Medical School Public Health Law

3


Institute Plans  Initial plans to hire 30 new faculty in data science in 5 years  recruit 150 doctoral students;

 45 additional faculty will be hired, at 5 a year, over the next 15 years

 44,000 sq. feet of new academic space will be ready by 2016 4


Institute Status  48 founding Institute faculty  9 Executive Committee Members  Organizing committees for each Center

 ~150 affiliated faculty members, University-wide

5


The Centers of the Columbia Institute for Data Sciences and Engineering
• Smart Cities
• New Media
• Health Analytics
• Cybersecurity
• Financial Analytics
• Foundations of Data Science


Center for Smart Cities
Co-chairs from Civil Engineering and Electrical Engineering
7 committee members, 23 affiliates


Research in Smart Cities
Integrating the digital city with the physical city
• Monitoring building energy consumption in New York
• Improving the power supply through smart-grid technology
• Deploying sensing devices to facilitate everyday activities in a crowded urban environment


Infrastructure Monitoring
Monitoring vibrations in large suspension bridges


Developing Green Infrastructure



Urban Visualization
Visualizing and interacting in 3D with georeferenced urban data


Center for New Media
Co-chairs from Journalism and the Center for Computational Learning Systems
10 committee members, 19 affiliates


Research in New Media
New forms of digital media
• Analyzing and creating social media
• Creating visualizations
Acquiring information
• From language: speech analysis, machine translation, identifying emotions
• From images and video: extracting information from images



Center for Health Analytics
Chair from Biomedical Informatics
10 committee members, 15 affiliates


Research in Health Analytics
Analyzing big data from:
• Patient data
• Genomic databases
• Public health records
Using electronic health records
• To discover patterns of diseases, effective drugs, treatments, and therapies
Genome sequencing
• Showing associations between single mutations and genetically associated diseases
DNA sequencing on a chip


Health Analytics Center
• Individual, population
• Clinical, healthcare
• Molecular, cellular


EHR and Time Series Analysis: Glucose Predictability
[Figure: mutual information (MI) of glucose measurements versus time lag tau (delta-t, days) (Albers et al., 2009)]
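The glucose predictability figure plots mutual information (MI) against a time lag. A minimal sketch of how such a time-delayed MI curve could be estimated follows, assuming evenly spaced samples, equal-width binning, and a simple plug-in (histogram) estimator; this is not necessarily the estimator used by Albers et al. (2009), and real EHR glucose data is irregularly sampled, which this sketch ignores. The function names and the synthetic readings are hypothetical.

```python
import math
from collections import Counter


def bin_series(values, n_bins):
    """Map each value to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant series: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in (histogram) estimate of I(X; Y) in bits from paired symbols."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), with counts rearranged to c*n/(px*py)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def lagged_mi(series, lag, n_bins=8):
    """MI between an evenly sampled series and itself shifted by `lag` steps."""
    if not 0 <= lag < len(series):
        raise ValueError("lag must be in [0, len(series))")
    symbols = bin_series(series, n_bins)
    return mutual_information(symbols[: len(symbols) - lag], symbols[lag:])


if __name__ == "__main__":
    # Hypothetical synthetic readings (assumption), not patient data.
    readings = [100 + 30 * math.sin(2 * math.pi * t / 12) for t in range(240)]
    for lag in (1, 3, 6, 12):
        print(lag, round(lagged_mi(readings, lag), 3))
```

A high MI at lag tau means the value tau steps earlier carries much information about the current value; plotting `lagged_mi` over a range of lags gives a curve of the kind shown on the slide. Histogram estimates are biased upward for short series, so bin count matters.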


Center for Financial Analytics
Chair from Industrial Engineering and Operations Research
6 committee members, 24 affiliates


Research in Financial Analytics
Big data for better financial services and solutions
• Using predictive analytics to optimize financial decisions
• Understanding and regulating high-frequency trading
• Predicting and managing systemic risks
• Real-time analysis of unstructured data and information, e.g., corporate and government actions, commentary, social media


Systemic risk  “Social” network of financial institutions  Complex  Very high dimensional  “Edges”  Lending  Assets  Derivatives Minoui and Reyes (2011 IMF Report)

24


The Contagion Effect
• A vicious cycle during times of crisis, leading to contagion
• Approach: a stochastic network model using publicly available data
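To make the contagion idea concrete, here is a minimal sketch of how a default can spread along lending edges, under stated assumptions: a simple deterministic threshold cascade in which a lender fails once its losses on failed borrowers exceed its capital. This is a generic textbook-style illustration, not the Center's stochastic model; the function name, parameters, and numbers are hypothetical.

```python
from collections import deque


def default_cascade(capital, exposures, initial_failures, recovery_rate=0.0):
    """Return the set of banks that fail once an initial shock propagates.

    capital: bank -> loss-absorbing capital (hypothetical units)
    exposures: (lender, borrower) -> amount the lender has lent to the borrower
    initial_failures: banks assumed to fail because of an external shock
    recovery_rate: fraction of an exposure recovered from a failed borrower
    """
    # Index creditors by borrower so each failure is pushed to those who lent to it.
    creditors = {}
    for (lender, borrower), amount in exposures.items():
        creditors.setdefault(borrower, []).append((lender, amount))

    remaining = dict(capital)  # capital left after losses so far
    failed = set(initial_failures)
    queue = deque(failed)
    while queue:
        borrower = queue.popleft()
        for lender, amount in creditors.get(borrower, []):
            if lender in failed:
                continue
            remaining[lender] -= (1.0 - recovery_rate) * amount
            if remaining[lender] <= 0:  # capital exhausted: lender defaults too
                failed.add(lender)
                queue.append(lender)
    return failed


if __name__ == "__main__":
    capital = {"A": 5.0, "B": 3.0, "C": 10.0}
    exposures = {("B", "A"): 4.0, ("C", "B"): 6.0}  # B lent to A, C lent to B
    print(sorted(default_cascade(capital, exposures, {"A"})))
```

In this toy network, A's failure wipes out B's capital, and B's failure then hits C, which survives. A stochastic variant, closer in spirit to the slide, could wrap this cascade in a Monte Carlo loop over randomly sampled shocks or uncertain exposures and report the distribution of failures.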


Foundations of Data Science
Co-chairs from Computer Science and Statistics
6 committee members, 42 affiliates


Foundations of Data Science  Machine learning  Computational learning theory

 Statistical prediction  Algorithms and optimization

 Software and hardware infrastructure for computation with big data 27


Graph & Network Algorithms
• Matching nodes into a network
  – New students show up to school
  – We have a matrix of their profile vectors
  – At graduation, we observe the network that has formed
  – Can we predict the network for next year’s freshmen?
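One simple baseline for the network-prediction question on the Graph & Network Algorithms slide, offered only as an illustrative assumption: score each pair of students by the cosine similarity of their profile vectors, then greedily add the highest-scoring edges while capping every student at `b` connections (greedy b-matching, a standard approximation). The slide does not say which method is used; in practice the similarity function would be learned from the network observed at graduation. All names here are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two profile vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_b_matching(profiles, b):
    """Predict an undirected network in which every node has at most b edges.

    Candidate edges are considered in order of decreasing profile similarity;
    an edge is kept only if both endpoints still have spare degree.
    """
    pairs = sorted(
        combinations(range(len(profiles)), 2),
        key=lambda ij: cosine(profiles[ij[0]], profiles[ij[1]]),
        reverse=True,
    )
    degree = [0] * len(profiles)
    edges = []
    for i, j in pairs:
        if degree[i] < b and degree[j] < b:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return edges


if __name__ == "__main__":
    # Hypothetical 2-D profile vectors for four incoming students.
    freshmen = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(greedy_b_matching(freshmen, b=1))
```

The degree cap is what distinguishes this from simply thresholding similarities: it keeps a few very "central" profiles from absorbing all the predicted edges, which matches the intuition that each student forms a limited number of ties.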


New York, color-coded by inferred similarity of social behavior


Center for Cybersecurity
Chair from Computer Science


Research in Cybersecurity
Essential to critical infrastructure
• Government, financial transactions, electronic commerce, and personal computing
Security and survivability of large-scale, heterogeneous cybersystems
• Threat mitigation
• Threat detection and analysis
• Cyberattack reaction and recovery
• Cyberattack tolerance
• Large-scale distributed (re)action


Breaking commodity devices to learn how to fix them, using Symbiotes
• Cisco IP phone vulnerability
• HP printer firmware update vulnerability


Collaborations  Industrial Affiliates  Foundations  International

33


Industrial Affiliates  Different levels of access  Reduction in tuition and ICR for members

 Project work  Capstone project course participation  Events, recruiting 34


Initial Industrial Affiliate Partners
• Bloomberg
• Mediaocean
• Microsoft Research
• Google


Degree Programs  Certification of Achievement in Data Science  Fall 2013  Four courses

 MS in Data Science  Fall 2014  Core in fundamentals of data science  Tracks in application areas corresponding to centers 36


Certification in Data Science
• Joint between SEAS and GSAS
• All courses specifically designed for the Certification:
  – Probability & Statistics (STATS)
  – Algorithms for Data Science (CS/IEOR)
  – Machine Learning for Data Science (CS)
  – Exploratory Data Analysis and Visualization (STATS)
• Aiming for ~25 on-campus students in Fall 2013

