Gleaning Knowledge from Big Data for Connected Context Computing
Jane Hsu
National Taiwan University, Intel-NTU CCC Center
July 18, 2013
Intel-NTU Connected Context Computing Center
Outline
• Connected Context Computing
  • motivation
  • vision
• The IoT Data Challenges
• From Data to Knowledge
• Conclusion
Technologies Today
• Ubiquity
• Portability
• Productivity
By 2015…
• More Users: >1 billion more netizens¹
• More Devices: >15 billion connected devices²
• More Data: >>1 zettabyte of Internet traffic³
Sources:
• IDC “Server Workloads Forecast” 2009.
• IDC “The Internet Reaches Late Adolescence” Dec 2009, extrapolation by Intel for 2015.
• ECG “Worldwide Device Estimates Year 2020 - Intel One Smart Network Work” forecast.
• http://www.cisco.com/assets/cdc_content_elements/networking_solutions/service_provider/visual_networking_ip_traffic_chart.html, extrapolated to 2015.
• Gartner, June 2010, CAGR from 2009→2014.
Dream for The Future
Needed: Smart Machines to Work Together
A Global Network of Things
Instrumented and Connected Devices
Temperature
Traffic
Water Flow
Electricity
How do we use and manage the devices?
The Lifecycle of Data
• Generate
• Transport
• Store
• Compute
• Apply
• Protect
* Slides adapted from a presentation at Asia Academic Forum 2012 by Dr. Wen-Hann Wang
Vision: Connected Context Computing
To design end-to-end solutions for intelligent interaction and secure information sharing amongst a multitude of connected devices that
• efficiently sense data
• effectively communicate data
• collaboratively analyze the context, and
• proactively serve their users
Outline
• Connected Context Computing
• The IoT Data Challenges
  • Greener Buildings
  • Smarter Agriculture
  • Safer Transportation
• From Data to Knowledge
• Conclusion
The IoT Data Challenge
• Thousands in diversity
• Trillions in scale
• Complex in interdependency
• Continuous in time
• Distributed in space
• Dynamic, noisy, and unreliable in nature
[Figure labels: Cloud, Big Data, ?]
Data to Decision, Info to Insight
[Figure labels: Business intelligence, Biodiversity trends, Virtual travel guides, Medical scans, Social media, News & journals, Language translation, Satellite images, Insight, Compute, Transport, E-commerce, TV & video, Health monitoring, Sensors & surveillance, Augmented reality, Video analytics, Extreme weather prediction]
Greener Buildings
Power Monitoring
Appliance State Monitoring & Control
Video-based Activity Recognition
MicroGrid of Smart Homes Private Cloud
Ref Source: http://austinspc.com/2011/04/10/what-is-a-microgrid/
Contexts for Greener Buildings
• Activity Recognition
• Appliance states
• Power consumption
• Temperature
• Humidity
• Light
• Air (CO2)
• Sound
• Indoor location
Data → Features → Context → Service
Smarter Agriculture
Automatic Greenhouse
• Water curtain
• Heater
• Exhaust
• Automatic frames
Smart Greenhouse
• Automatic greenhouse
• 12 fixed nodes
• 8 mobile nodes
• M2M networking (fixed + mobile nodes)
• Gateway
• Real-time data inquiry interface
• Visualization interface
Objectives
• Scalability (unlimited number of sensor nodes in a single PAN)
• Robustness (dynamic topology, routing and localization)
• Heterogeneity (ZigBee + WiFi + different devices)
• Smart service (light, irrigation and inspection)
Temperature/Humidity Map by DADSM2M
• Map estimated from 29 real sensor readings
• Map estimated from 9 real sensor readings and 9 synthetic UD readings (MAE: 0.43)
• Map estimated from 9 real sensor readings (MAE: 0.74)
• Map estimated from 9 real sensor readings, 9 synthetic UD readings, and 50 synthetic random readings (MAE: 0.46)
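The MAE values on this slide score each reduced-sensor map against a reference. The sketch below shows how such a mean absolute error could be computed, assuming (the slide does not say so) that the error is averaged over grid cells against the map built from all 29 real readings. The function name and the grid values are hypothetical and are not data from the talk.

```python
def mean_absolute_error(reference, estimate):
    """Average of |reference - estimate| over all grid cells."""
    if len(reference) != len(estimate):
        raise ValueError("maps must have the same number of cells")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


# Hypothetical flattened temperature maps in degrees Celsius (made-up values).
reference_map = [24.0, 25.0, 26.0, 27.0]
estimated_map = [24.5, 24.5, 26.0, 28.0]

mae = mean_absolute_error(reference_map, estimated_map)
print(f"MAE = {mae:.2f}")
```

A lower MAE means the sparse-sensor map is closer to the reference, which is how the 0.43, 0.74, and 0.46 figures can be compared.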
Crowd Sensing
Contexts for Smarter Agriculture
• Weather
• Sunlight
• Pest population
• Battery level
• Price
• Energy supply
• Energy costs
• Demand/yield
Data → Features → Context → Service
Safer Transportation
Intra-Vehicle Sensing
Inter-Vehicle Sensing
Distributed Environment Sensing –  Road-side unit (RSU) design for inter-vehicle map reconstruction
Online Traffic Information
Contexts for Intelligent Transportation
• The vehicle in front is turning
• The vehicle in front is braking
• Vehicle location
• Crossroads
• Traffic
• Long weekend
• Distraction
• Sleepy or drunk driver
Data → Features → Context → Service
Outline
• Connected Context Computing
• The IoT Data Challenges
  • Greener Buildings
  • Smarter Agriculture
  • Safer Transportation
• From Data to Knowledge
• Conclusion
Not So Intelligent Transportation
http://www.youtube.com/watch?v=U_MnvHwjcIQ
The Key Challenges/Barriers
• Streaming data from heterogeneous devices with varying capabilities
• Limited bandwidth/computation/storage resources
• Non-scalability due to computational complexity
• Expensive (if not impossible) to label data
• Incomplete or inaccurate information
• Non-cooperative things
→ centralized global optimization does not work
→ needs to anticipate probable actions by others
Context + Prediction → Action
Reactive vs. Anticipatory
• Reactive: “The vehicle in front is braking, step on the brakes.”
  Context → Reactive Engine → Action
• Anticipatory: “The traffic light ahead is turning yellow, and the vehicles in front may stop at the red light, step on the brakes.”
  Context → Anticipatory Reasoning (with Predictive Model) → Action
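The contrast between a reactive engine and anticipatory reasoning on this slide can be made concrete with a small sketch. All names (`Context`, `predict_lead_will_brake`, the 0.5 threshold) and the probabilities are hypothetical placeholders for whatever predictive model a real system would learn. They are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Context:
    lead_vehicle_braking: bool
    light_ahead: str  # "green", "yellow", or "red"


def reactive_action(ctx: Context) -> str:
    """React only to what is already happening."""
    return "brake" if ctx.lead_vehicle_braking else "cruise"


def predict_lead_will_brake(ctx: Context) -> float:
    """Hypothetical predictive model: chance the lead vehicle brakes soon."""
    if ctx.lead_vehicle_braking:
        return 1.0
    # Made-up probabilities keyed on the traffic light state.
    return {"green": 0.05, "yellow": 0.8, "red": 0.95}[ctx.light_ahead]


def anticipatory_action(ctx: Context, threshold: float = 0.5) -> str:
    """Act on the predicted behaviour of others, not just the observed one."""
    return "brake" if predict_lead_will_brake(ctx) >= threshold else "cruise"


ctx = Context(lead_vehicle_braking=False, light_ahead="yellow")
print(reactive_action(ctx), anticipatory_action(ctx))
```

With a yellow light and a lead vehicle that has not yet braked, the reactive engine keeps cruising while the anticipatory one already brakes. That gap is the extra reaction time the slide argues for.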
Rule-based Reasoning
• If the traffic light is red, then stop.
• If the traffic is heavy, then slow down.
• If vehicle in front is braking, then step on the brakes.
• If it is raining, then the road is slippery. (world model)
• If driver brakes, then vehicle slows down. (action model)
Context → Rule Engine (Condition-action Rules) → Action
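The condition-action rules on this slide can be run by a very small rule engine. The sketch below is one possible encoding, not the engine used at the center. The fact and action names are invented. The rule "slippery road, then slow down" is a hypothetical addition that shows the world-model rule feeding an action rule, and the action-model rule is left out for brevity.

```python
# Each rule maps a set of required facts to one derived fact or action.
WORLD_RULES = [
    ({"raining"}, "road_slippery"),          # world model rule from the slide
]
ACTION_RULES = [
    ({"traffic_light_red"}, "stop"),
    ({"traffic_heavy"}, "slow_down"),
    ({"vehicle_in_front_braking"}, "brake"),
    ({"road_slippery"}, "slow_down"),        # hypothetical extra rule
]


def infer_facts(facts):
    """Forward-chain the world rules until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in WORLD_RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def choose_actions(facts):
    """Return every action whose conditions hold, in rule order, no duplicates."""
    known = infer_facts(facts)
    actions = []
    for conditions, action in ACTION_RULES:
        if conditions <= known and action not in actions:
            actions.append(action)
    return actions


print(choose_actions({"raining", "vehicle_in_front_braking"}))
```

Every rule here fires only on facts that are already true, which is why a pure rule engine behaves reactively unless the world model is rich enough to derive what is about to happen.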
Knowledge Extraction from Unstructured Text
Corpus: Clueweb09-Chinese
• 177,489,357 pages
• 592 GB compressed
• 4.5 TB uncompressed
After 5 iterations (on Hadoop version)
• 145 categories
• 206 predicates and 61 relations
• overall precision 71.2%
• categories precision 76%
• relations precision 52.6%
Possibilistic Reasoning
Context → Possibilistic Reasoning (Fuzzy Petri Nets) → Action
Probabilistic Reasoning
[Figure: state-transition diagram with states Stop, Slow, Fast, and Crash; actions accelerate and brake; edge probabilities 0.8, 0.3, 0.7, 0.2, 0.05, 0.95, 0.99, 0.01]
Context → Approximate/Exact Solver (POMDP) → Action
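A POMDP solver works on a belief, that is, a probability distribution over hidden states such as Stop, Slow, Fast, and Crash. The sketch below shows only the standard Bayes-filter belief update, not a solver. The assignment of probabilities to transitions is an assumption, since the figure's edge labels cannot be matched to edges in the extracted text. The observation model (a noisy "low"/"high" speed reading) is entirely hypothetical.

```python
STATES = ["stop", "slow", "fast", "crash"]

# TRANSITION[action][state] -> {next_state: probability}; numbers are illustrative.
TRANSITION = {
    "accelerate": {
        "stop": {"slow": 0.8, "stop": 0.2},
        "slow": {"fast": 0.7, "slow": 0.3},
        "fast": {"fast": 0.95, "crash": 0.05},
        "crash": {"crash": 1.0},
    },
    "brake": {
        "stop": {"stop": 1.0},
        "slow": {"stop": 0.99, "slow": 0.01},
        "fast": {"slow": 0.99, "crash": 0.01},
        "crash": {"crash": 1.0},
    },
}

# OBSERVATION[state] -> {reading: probability}; a noisy, hypothetical speed sensor.
OBSERVATION = {
    "stop": {"low": 0.9, "high": 0.1},
    "slow": {"low": 0.7, "high": 0.3},
    "fast": {"low": 0.1, "high": 0.9},
    "crash": {"low": 0.9, "high": 0.1},
}


def update_belief(belief, action, reading):
    """One Bayes-filter step: predict with the transition model, then correct."""
    predicted = {s: 0.0 for s in STATES}
    for state, p_state in belief.items():
        for nxt, p_next in TRANSITION[action][state].items():
            predicted[nxt] += p_state * p_next
    # Weight each predicted state by how well it explains the sensor reading.
    weighted = {s: predicted[s] * OBSERVATION[s][reading] for s in STATES}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: w / total for s, w in weighted.items()}


belief = {"stop": 0.0, "slow": 0.5, "fast": 0.5, "crash": 0.0}
new_belief = update_belief(belief, "brake", "low")
print({s: round(p, 3) for s, p in new_belief.items()})
```

An exact or approximate solver would then choose the action that maximizes expected reward over such beliefs. That step is omitted here.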
Appliance State Recognition
5-fold cross-validation
• For average accuracy and joint accuracy, FCRFs give the best results
• Comparing PCRFs with FCRFs shows that co-temporal relationships help improve the recognition accuracy
AAAI10 PAIR
Anticipatory Reasoning for IoT
[Figure labels: Machine Learning, Prediction Engine, Context Engine, Anticipatory Reasoning, Data, Data Streams, Gateway, Control/Action, WuDevice (×3), Wukong self-configurable platform]
Talking Tail-light
• “I am braking hard!!”
• “Okay!! I need to apply the brake now!”
• Extra time to react
Multi-Sensor Fusion
• Belief-based localization and tracking
• Stereo-based moving object detection algorithm
System Specification
UX-Based Design and Simulation
• Immersive VR
• Proactive Alerts
Driver Prediction Model
• Sparse learning process for big data
  – Guaranteed global convergence
  – Saved lots of I/O and memory
• Active set: $\alpha_i$ with $\nabla_i^P f(\alpha) \gg 0$
• Solve each block with standard solvers
[Figure: hinge loss and ramp loss plotted over the range -3 to 3; learning algorithm with memory, disk, network capacity, and active set]
[Figure: index node v with pivots v.pv1, v.pv2, v.pv3 and distances dist(q, N(v.pv1)), dist(q, N(v.pv2))]
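The figure on this slide contrasts hinge loss with ramp loss. A minimal sketch of the two functions follows, using the common definition of ramp loss as hinge loss clipped at 1 - s. The default s = -1 is an assumption. The slide does not state which parameter was used.

```python
def hinge_loss(margin: float) -> float:
    """Hinge loss max(0, 1 - margin); grows without bound for bad margins."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin: float, s: float = -1.0) -> float:
    """Ramp loss: the hinge loss clipped at 1 - s, so outliers have bounded cost."""
    return min(1.0 - s, hinge_loss(margin))


# Same margin range as the horizontal axis of the figure.
for margin in (-3, -2, -1, 0, 1, 2, 3):
    print(margin, hinge_loss(margin), ramp_loss(margin))
```

Because the ramp loss is flat for badly misclassified points, those samples stop contributing to the gradient. That is one reason it tends to yield a sparser set of samples that matter.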
Distributed Anomaly Detection
• Traveling route anomaly detection – surroundings of NTU
• Aggressive driver detection
[Figure: traveling route from start to end, showing the normal route and abnormal routes of type I and type II]
Distributed Learning Challenges
• How to reduce expensive network I/O?
$w^* = \arg\min_w f(w; D)$
Existing Approaches
• Select important samples on each site [2].
• Still expensive for sensor network applications (large n = |sites|, small |D_n|).
Our Approach
• Build index (tree/hash-table) through network
• Visit only sites containing crucial samples
• Sparsify number of crucial samples via sparse modeling
[Figure, marked [1]: model M1 trained over distributed datasets D1, D2, D3, …, Dn]
[Figure: model M1 and index node v with pivots v.pv1, v.pv2, v.pv3 pointing to the crucial samples within D1, D2, D3, …, Dn]
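The idea of visiting only sites that hold crucial samples can be illustrated with a toy sketch. It assumes, as a stand-in for the tree/hash-table index on the slide, a per-site count of margin-violating samples under the current model w. It also treats a sample as "crucial" when its margin is below 1, as in hinge-loss training. Both choices, and all names and data, are hypothetical.

```python
def margin(w, x, y):
    """Signed margin y * (w . x) of one labelled sample."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def crucial_samples(w, site_data):
    """Samples that violate the margin; only these affect a hinge-style update."""
    return [(x, y) for x, y in site_data if margin(w, x, y) < 1.0]


def build_index(w, sites):
    """Hypothetical index: site name -> number of crucial samples held there."""
    return {name: len(crucial_samples(w, data)) for name, data in sites.items()}


def collect(w, sites):
    """Visit only the sites the index flags and pull just their crucial samples."""
    index = build_index(w, sites)
    visited = [name for name, count in index.items() if count > 0]
    pulled = [s for name in visited for s in crucial_samples(w, sites[name])]
    return visited, pulled


# Made-up 2-D data on three sites; labels are +1 / -1.
sites = {
    "D1": [((2.0, 0.0), 1), ((3.0, 1.0), 1)],
    "D2": [((0.5, 0.0), 1), ((-2.0, 0.0), -1)],
    "D3": [((-3.0, 1.0), -1)],
}
w = (1.0, 0.0)
visited, pulled = collect(w, sites)
total = sum(len(d) for d in sites.values())
print(f"visited {visited}, transferred {len(pulled)} of {total} samples")
```

In this toy setup only one site is contacted and one sample crosses the network. That is the kind of I/O saving the approach aims for when each site holds little data but there are many sites.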
Sweetfeedback for Energy Savings
• 16 window sensors
• 3 Arduinos
• 2 gumball machines w/ Arduinos
• 2 displays
Conclusion
IoT data challenge: volume, velocity, variety, veracity
Approach: data-driven + model-driven
Many forms of knowledge may be gleaned from data generated by sensors, online resources, and crowds using scalable machine learning algorithms.
• Contexts
• Alert/anomaly event detection
• Commonsense knowledge
• Predictive models
Acknowledgements
• Our Sponsors
  • National Science Council
  • National Taiwan University
  • Intel Corporation
• At the Intel-NTU Connected Context Computing Center, we are designing end-to-end solutions to achieve greener buildings, smarter agriculture, and safer transportation.
Thank You! Q&A http://ccc.ntu.edu.tw/