prof._saurabh_amin by Epoch Foundation

High-Confidence Control of Networked Infrastructures .

Saurabh Amin 2013 MIT ILP-EPOCH Taiwan Symposium Big Data: Technologies and Applications July 18th, 2013

Outline

Infrastructure resilience: motivation & approach

High-Confidence Control Electricity networks Transportation networks Water networks

Economic Incentive Mechanisms

Outline

Infrastructure resilience: motivation & approach

High-Confidence Control Electricity networks Transportation networks Water networks

Economic Incentive Mechanisms

3 / 42

Motivation: Big data for networked infrastructures . Sensor-actuator nets â&#x2021;&#x2019; New capabilities . , State awareness & diagnostics , Real-time closed-loop control , Demand management , Incident management . . Interdependencies â&#x2021;&#x2019; New concerns . / Security & reliability failures / Remote accessibility via open nets / Strategic behavior by private entities .

/ Increased failure surface due large number of field devices Large-scale critical infrastructures are Cyber-Physical Systems (CPS) 4 / 42

Cyber-attacks to infrastructures

Insiders: Maroochy Shire plant(2000)

Insiders: LA traffic control (2008)

Hackers: Building EMS (2012)

Govt. demo: NYC power grid (2012) 5 / 42

Cyber-attacks and privacy threats Integrity: A1 & A3 A5

Deception causes lack of integrity Trustworthiness of CPS data

yË&#x153;

Physical System

Availability: A2 & A4

Denial-of-service (DoS) causes lack of availability

u Ë&#x153;

Controller

Accessibility of CPS components Privacy Disaggregate usage data collection causes lack of privacy Minimization of privacy-sensitive data

Deception & DoS attacks to CPS u(t) h0 = ti+1

Plant

Hold

y(t) Sample

Controller

h = tk+1

Privacy-preserving sampling of CPS

6 / 42

Claim #1: Cyber attacks ̸= Random faults . Attackers . Malicious insiders Computer hackers cyber criminals, cyber warriors, hacktivists, rogue hackers, spies

. . Attacker may manipulate CPS data . Time between telemetry requests can be used for malicious traffic injection

Both malicious and legitimate traffic can travel through encrypted tunnels

A. Cárdenas, S. Amin, S. Sastry, et al. [ASIACCS] S. Amin, X. Litrico, S. Sastry, A. Bayen. [HSCC ’10] 7 / 42

Claim #2: IT security is necessary but not sufficient

A. Cárdenas, S. Amin, S. Sastry. [HotSec ’08] A. Cárdenas, S. Amin, G. Schwartz. [HiCoNS’12] 8 / 42

Claim #3: CPS operators underinvest in security Stuxnet worm [’10-’11] Targets SCADA systems Four zero-day exploits, windows rootkit, antivirus evasion, p-2-p updates, network infection routines Reprograms PLC code Information stealing: Duqu [’11-’12] Network induced risks Security is a public good

Supervisory Control Regulatory Control

Infrastructures are privately managed Individual & social incentives differ S. Amin, G. Schwartz, S. Sastry. GameSec ’10, CDC ’11, Automatica

Source: Symantec, NYT

9 / 42

Claim #4: Reliability-Security failures are non-isolable

G. Schwartz, S. Amin, et al. [Allerton â&#x20AC;&#x2122;11], S. Amin, G. Schwartz, S. Sastry. [CDCâ&#x20AC;&#x2122;11] 10 / 42

Claim #5: Security legislation needs a scientific base . Cybersecurity Act S.2105 vs. SECURE IT Act S. 2151 . S.2105 [Lieberman et al.]: DHS to access risks and vulnerabilities to critical infrastructures. Recommends a regulation that requires private companies owning designated critical infrastructure to certify that their cybersecurity capabilities rise to an appropriate level.

S. 2151 [McCain et al.]: Federal contractors required to inform the government about cyber threats. Provides liability protections for the private sector to share cyber threat information through established channels and the Department of Commerce.

Big questions: Regulations? Incentives? Privacy laws? R. BĂśhme, G. Schwartz. [WEISâ&#x20AC;&#x2122;10] G. Schwartz, B. Johnson, S. Sastry [Work-in-progress]

11 / 42

Outline

Infrastructure resilience: motivation & approach

High-Confidence Control Electricity networks Transportation networks Water networks

Economic Incentive Mechanisms

12 / 42

Resilient control for infrastructure security 1

Threat assessment How to model attacker and his strategy? Consequences to the physical infrastructure

Attack diagnosis How to detect manipulations of sensor-control data? Stealthy [undetected] attacks

Attack resilient control Design of resilient control algorithms? Fundamental limitations

Diagnosis

Assessment

Response 13 / 42

Indian Blackout of 2012

. Control + Incentive issues: . 1 Overdraw by utilities

High loading

Weak transmission

Mis-operation of protection systems

620M people without power 10x severe that US blackout of 2003 14 / 42

Electricity Theft: India

World Bank Reports: â&#x2C6;ź 30 â&#x2C6;&#x2019; 50% electricity is stolen in some jurisdictions 15 / 42

Deception attacks to AMIs 5.,#'67248/9"72'1,%,'

!"#$%&'

*26700.6%'90$6.'4$32,#4'

. Stealthy attacker . Knows/learns CPS parameters

+,-.'/.%.0'0.,1$234'

()*'

Adapts to diagnosis algorithm Injects malicious data after obtaining unauthorized access Achieves his goal yet evade detection

A. Cárdenas et al. 2012 Real data: Y1 , . . . , Yn ˆ1 , . . . , Y ˆn Fake readings: Y ˆi = Y ˆi + ai Attack model: Y 16 / 42

Previous work on energy usage profiles

!"#$%&'($)*+$,%-'-.*

*+("/&'0%(&.)$&,'+%+1) *+$,-&$&.).,#,)

!"#$%&'()

Y1 , . . . , Yn

Difficult to obtain “attack” data

More false alarm rates

Difficult to generalize to new “smart” attacks

Easier to attack and difficult to tune 17 / 42

Detection of stealthy attacks . Adversary’s goal . ˆ1 , . . . , Y ˆn ) min f (Y

Exponential weighted moving average (EWMA)

+,-.'/.%.0'0.,1$234'

ˆ1 , . . . , Y ˆ) ⩽ 0 g(Y

!"#$%&'

*26700.6%'90$6.'4$32,#4'

()*'

ˆn ˆ1 ,...,Y Y

E.g.: Minimize energy bill while not being .detected by a classifier . Anomaly detection . Cumulative sum (CUSUM)

5.,#'67248/9"72'1,%,'

Local outlier factor (LOF)

Real data: Y1 , . . . , Yn ˆ1 , . . . , Y ˆn Fake readings: Y

Generalized likelihood ratio (GLR)

ˆi = Y ˆi + ai Attack model: Y 18 / 42

Learning with good data and attack invariants We only have “good” data No access to “attack” data Train only one class (“good” data) We know “attack invariant” Known attacker objective: minimize energy bill while not being detected by classifier Use composite hypothesis testing to select attack probability distribution Find worst possible undetected attack for each classifier, and compute the corresponding cost (e.g., kWh lost).

A. Cárdenas et al. 2012 19 / 42

Evaluation Autoregressive moving average (ARMA) based generalized likelihood ratio (GLR) provides maximum diagnostic ability for “stealthy attacks”. A. Cárdenas, et. al. ’12

Work in progress Attacker mistraining classifier and detection other anomalies (e.g., consumer on vacation) Incentives for deploying diagnostic systems 20 / 42

Incidents on transportation infrastructures Random failures: due to congestion, accidents, inclement weather Strategic failures: security exploits by hackers / malicious entities

Hackers: Road signs near MIT (2008)

Insiders: LA traffic control (2008)

Hackers: Tolling system(2008)

UCSD-UW Demo: Car hacking (2011) 21 / 42

Active management of traffic incidents . Incident report . CA highway patrol: Accident on I-680 (11/15/2010) blocking two right lanes near post-mile 35 upstream of the offramp to Crow Canyon Rd. . . Scenario simulation (cell transmission model) . 120(veh.Ghours(

Total(Network(Delay(per(5(minutes(

(b)( Mainline(Speed(Contour( accident(

(c)( HOV(Lane(Speed(Contour(

(a)( (d)(

Courtesy: A. Kurzhanskiy

22 / 42

Incident response: control strategy I (a) Open HOV lane for everyone upstream of the accident (b) Start ramp metering (queue control on El Cerro & Diablo ramps) (c) Redirect traffic from Sycamore Valley Rd. to Crow Canyon Rd. . Control improves performance and reliability . El(Cerro:(ALINEA(RM(

65(veh.Jhours(

Total(Network(Delay(per(5(minutes(

Diablo:(ALINEA(RM( Sycamore(Valley:( Redirect(to(Crow(Canyon( (b)( Mainline(Speed(Contour( accident(

(c)( HOV(Lane(Speed(Contour(

(a)( (d)(

23 / 42

Incident response: control strategy II Strategy (a)–(c) plus (d) Diverge traffic to a parallel arterial (San Ramon Valley Blvd.) using changeable message signs (CMSs) . Strategy I (a)–(c) is better than (a)–(d) . CMS$ Signalized$intersec:on$at$ Sycamore$Valley$Rd.$&$ San$Ramon$Valley$Blvd.$

75$veh.Hhours$

Total$Network$Delay$per$5$minutes$

San n$V mo

$Ra

(b)$ Mainline$Speed$Contour$

alle r$

tou

.$de

lvd y$B

(c)$ San$Ramon$Valley$Detour$Speed$Contour$ signal$

(a)$

(d)$

24 / 42

Game-theoretic routing for incident response . Two questions . Q1 How to optimally route (using ramp meters) a fraction of compliant drivers, when rest are non-compliant (selfish)? [Roughgarden, Tardos (2002)],[Stier-Moses (2007)], [Krichene, Amin, Bayen (2012)]

α -fraction of drivers can be routed by central authority (1 − α )-fraction of drivers are choose their route selfishly after knowing the central authority’s strategy Q2 How to optimally inform (via CMSs) a fraction of informed drivers, when rest are uninformed? [Liu, Amin (in progress)]

λ -fraction of drivers know congestion levels via CMS (1 − λ )-fraction of drivers are uninformed and learn routing strategies after observing their own delays

. 25 / 42

Gignac water SCADA system . SCADA components . Level & velocity sensors ASA : Canal manager

PLCs & gate actuators

Feeder canal : 8 km

Wireless communication . Right Bank : 15 km

GIGNAC Secondary channels : ~270 km

Left Bank : 30 km

26 / 42

Gignac canal network . Physical infrastructure .

. Cyber infrastructure .

. 27 / 42

Regulatory control of canal pools . Control objective . Manipulate gate opening Control upstream water level .

Reject disturbances (offtake withdrawals)

. SCADA interface .

. Avencq cross-regulator .

. 28 / 42

Defender and attacker models Defender Estimate Model [Freq. Domain] ˆ ydi

ad ad î−1 − i [q î + p î ] = i e−τi s q s s

y id q i -1 Pool i p i -1

y id+1 qi

Pool i+1

q i +1 p i +1

Parameters: adi , τi , Laplace variable: s Design robust decentralized PI control ˆi−1 = κi−1i ˆydi , q

ˆi = κii ˆ q ydi

Controllers: κi−1i , κii

Test site before attack

Attacker Compromise ydi and inject gi ˜ydi = ydi + gi Regulate pi to steal water

Test site after attack 29 / 42

Cyber-attack on the Avencq canal pool Field operational test (October 12th , 2009)

30 / 42

Cyber-attack on the Avencq canal pool Successful attack

31 / 42

Model-based diagnosis scheme Sensors: ydi , ydi+1 and yui , yui+1 1

0.06

Observer 2 0.04

Observer 1

p i -1

0.02

0 0

y iu

Decision rule

Pool i

Correct Diagnosis

y iu+1 Pool i + 1

Defense

y id

y id+1

p i +1

Water Withdrawal (m /s) in pools through offtake 0.15

Fault

Pool i+1 p withdrawal

Pool i withdrawal

0.1

0.05

0 0

Î´p2

Î´p1 5

Hours

time (hours)

32 / 42

Attack diagnosis: upstream level sensors hacked u

0.06

Observer 2 0.04

Observer 1

p i -1

0.02

0 0

Upstream Sensors Hacked

~y u i

Pool i

pi Correct Diagnosis

Attack Defense

y id

~y u i +1 Pooli + 1

p i +1

y id+1

Water Withdrawal (m /s) in pools through offtake

0.15

Fault

p1 p2

0.1

Pool i withdrawal Î´p1

0.05

0 0

Pool i+1 withdrawal

Î´p2 15

Hours

time (hours)

Correct diagnosis of withdrawal in both pools 33 / 42

Attack diagnosis: downstream level sensors hacked d

Downstream Sensors Hacked

0.06

Observer 2 0.04

Observer 1

p i -1

0.02

0 0

y iu

Pool i

pi Incomplete Diagnosis

Incomplete Diagnosis

~y d i y iu+1

Pool i + 1 ~ d

yi +1

Attack Defense

p i +1 Water Withdrawal (m /s) in pools through offtake 0.15

Fault

p1 p2

0.1

Pool i withdrawal Î´p1

0.05

0 0

Pool i+1 withdrawal

Î´p2 15

Hours

time (hours)

Withdrawal detected in both pools 34 / 42

Model-based fault/attack diagnosis . Security implications .

Enhanced model (redundancy) improves detection

Multiple pool sensor attacks can evade detection [stealth]

? (6 @A" (& )$ 7% $ ,"(-./0"$ 0"(%1)"0"'2%$

3"'%+)$ ?+00D$!"2D$ %*4'(-%$

Sensors located near offtakes are critical Localized sensor attacks do not lead to global degradation

-$ &( @%* %$ EF (&7 (6

;(1-2%$ <1'7'+#'$*'=12%>$

?+'2)+-$<7'+#'$*'=12%>$

!"#$ %&"'()*+%$

?-(%%*&(-$9(1-2$$ :*(4'+%*%$$

3"&1)*2@$8$ )"-*(A*-*2@$$ %&"'()*+%$

5-")2%$B$ (:C*%+)*"%$

56(&7$8$9(1-2$$ :*(4'+%*%$ $

Downstream Sensors Hacked

0.06

Observer 2 0.04

Observer 1

p i -1

0.02

0 0

y iu

Pool i

pi Incomplete Diagnosis

Incomplete Diagnosis

~ y id y iu+1

Pool i + 1 ~ d

yi +1

Attack Defense

p i +1 Water Withdrawal (m /s) in pools through offtake 0.15

Fault

p1 p2

0.1

Pool i withdrawal Î´p1

0.05

0 0

Pool i+1 withdrawal

Î´p2 15

Hours

time (hours)

35 / 42

Outline

Infrastructure resilience: motivation & approach

High-Confidence Control Electricity networks Transportation networks Water networks

Economic Incentive Mechanisms

36 / 42

Interdependent security & incentives to secure . A problem of incentives . Due to presence of network-induced interdependencies, the individually optimal [Nash] security allocations are sub-optimal. . . Interdependencies due to . Network induced risks â&#x2021;&#x2019; vulnerability to distributed DOS attacks

Courtesy: C. Goldschmidt (Symantec)

Negative externalities

Goal: Develop mechanisms to reduce CPS incentive sub-optimality [Amin, Schwartz, Sastry, CDC â&#x20AC;&#x2122;11, 37 / 42

Interdependence for networked control systems (NCS) . NCS security & reliability . Security failures (attacks S) & reliability failures (faults R) are difficult to distinguish Model for communication network failures F: Pr(S ∪ R | F) = Pr(R | F) + Pr(S | F) − Pr(R | F)Pr(S | F) = Pr(R | F) + (1 − Pr(R | F))Pr(S | F), | {z } | {z } direct failure (reliability)

indirect failure (security)

Interdependence: Pr(S | F) = α (η ) .

α (·): strictly increasing function η : number of insecure players (NCS)

Network induced interdependencies 38 / 42

Environment summary A game of M plant-controller systems (players)

For player i 1

(S) or (N) (Stage 1 choice variable) If (S) then i incurs per period security cost, ℓi ∈ [0, ∞) { S, player i invests in security, V i := N, player i does not invest in security u i ∈ Rm – control input (Stage 2 choice variable) 39 / 42

Individual optima [Nash equilibria] and social optima . Increasing incentive case . If a player secures, other player gain from securing increases

Security cost of Player 2

Individual optima {S,S} {N,N}

{S,S} & {N,N} Social Optima

{S,S} {N,N} Individual optima ยน Social optima

Security cost of Player 1 40 / 42

Individual optima [Nash equilibria] and social optima . Decreasing incentive case . If a player secures, other player gain from securing decreases

Security cost of Player 2

Individual optima {S,S} {N,N}

{S,N} & {N,S} Social Optima {S,S} {N,N} {S,N} & {N,S} Individual optima ยน Social optima

Security cost of Player 1 41 / 42

High-confidence networked infrastructure systems . Theory of robust control . Assessment, diagnosis, & response

Reliability and Security Risk Management Internet Diagnosis, Response, and Reconfiguration

Stealthy attack diagnosis .

Attack-resilient control

. Theory of incentive mechanisms . Information deficiencies Individual vs. social incentives .

Control Network Detection and Regulation

Electric Power

Sensor Actuator Network

Buildings

Physical Infrastructures Water & Gas

Transportation

Interdependent network risks Attacks

Defenses

Faults

IT security + Robust control + Incentivesâ&#x2021;&#x2019; Resilience 42 / 42