prof._roberto_rigobon

Page 1

Avoiding the Big Data – Small Info Syndrome Roberto Rigobon

MIT Sloan, NBER, CSAC


Big Distance!

Data

The world is not lacking of Data

Information

Lacking of Careful Empirics

Knowledge

Lacking of Managerial Data Analysis


Data Types Type

Advantages

Disadvantages

Standard

• Representative • High Quality

• • • • •

Limited Collection Cost Understand process Identification


Data Types Type

Advantages

Disadvantages

Standard

• Representative • High Quality

• • • • •

Limited Collection Cost Understand process Identification

Big Data

• Size, details, frequency • Might solve some identification problems • Might improve measurement

• • • • •

Statistics is hard Not representative Need question Understand process Identification


Principles of Good Big-Data Users Warnings Purpose of collection

• Have a question before you have the data! • Data collection is part of the design

Understand process

• Correlation is not causality • Forecast-in-sample is useless

Empirical procedure

• Identification is usually crucial • Control for spurious correlation

Outcome

• What is the big data providing? • Visualization, • Measurement, • or Prediction?


Inflation and Price Indexes  Alberto Cavallo

 We want to produce alternative measures of inflation  We need prices  We need methodology to weight those prices  We need to understand product introductions and discontinuations  We need to understand store behavior  How to collect prices of products not sold on the internet? Services?


Online Information and Indexes Our Approach to Daily Inflation Statistics

1

2 Use scraping technology

Connect to thousands of online retailers every day

3 Find individual items

4 Store and process key item information in a database

• Date • Item • Price • Description

5 Develop daily inflation statistics for ~20 countries


How do we collect data? 

Our prices are collected from public online sources, using a technique called “web scraping”

A software downloads a webpage, analyses the html code, “scrapes” price data, and stores it in a database


Countries covered


Argentina


USA


Annual Inflation Rates Argentina

Colombia

Russia

Australia

Germany

UK

China – Supermarket Index

Ireland

Venezuela


US Recession

Sep 16 2008


US Sector Inflation Series Food and Beverage Inflation Index

Energy and Transportation Inflation Index

Furnishing and Household Equipment Inflation Index

Recreation and Electronics Inflation Index


Properties of Inflation Indexes  Congruence  Stores keep markups between online and offline prices relatively constant in the medium run

 Anticipation  Online Price Indexes trends anticipate official inflation shifts.

 Online prices are easier to change, retailers are more competitive, and consumers have less memory.

 Demand trends  Changes in inflation trends by retailers reflect changes in the demand they are facing


Measuring Anticipation  Due to the behavior of online stores in some countries we find anticipation 

We compute anticipation in a simple VAR framework

Compute impulse responses

USA 1.8

1.6

1.4

1.2 95 1

90 10

0.8

5 mean

0.6

0.4

0.2

16 0 -1

0

1

2

3

4

5

6

7

8

9

10

11

12


Anticipation of US CPI Sectors 1% Shock to PriceStats Aggregate Inflation 2

1% Shock to PriceStats Food Inflation

1% Shock to PriceStats Household Inflation

1.5

1.5

1

1

.5

.5

1.5

1

.5

0

0 0

1

2

3

4 5 Month

CI

6

7

8

9

0 0

1

2

Cumulative IRF

3

4 5 Month

CI

1% Shock to PriceStats Health Inflation

6

7

8

9

0

2

Cumulative IRF

3

4 5 Month

CI

1% Shock to PriceStats Transportation Inflation

1.5

1

6

7

8

9

Cumulative IRF

1% Shock to PriceStats Recreation Inflation

2

2

1.5

1.5

1

1

.5

.5

1

.5

0

0 0

1

2

3 CI

Source: PriceStats

4 5 Month

6

7

8

Cumulative IRF

9

0 0

1

2

3 CI

4 5 Month

6

7

8

Cumulative IRF

9

0

1

2

3 CI

4 5 Month

6

7

8

Cumulative IRF

9


Global Inflation Series ď‚— World, Emerging Markets, Developed Markets, and Eurozone ď‚— By Sector: All, Food, and Fuel


Apr-13

Feb-13 Mar-13

Jan-13

Dec-12

Nov-12

Oct-12

Sep-12

3

Aug-12

Jul-12

Jun-12

May-12

Apr-12

Mar-12

Feb-12

Jan-12

Dec-11

Nov-11

Oct-11

Sep-11

Aug-11

Jul-11

Jun-11

May-11

Apr-11

Feb-11 Mar-11

Jan-11

Dec-10

Nov-10

Oct-10

Sep-10

Aug-10

Jul-10

Jun-10

May-10

Apr-10

Mar-10

Inflation Rate (%)

PriceStats World Inflation Series World Inflation (Annual Inflation Rate)

4

3.5 PriceStats World Inflation

CPI - World Inflation

2.5

2

1.5


Big Data Advantages

Challenges

Cost

Non-Representative

Volume, Variety, and Velocity

Econometrics is much harder • Understanding of the phenomena is even more important • No technique can substitute for deeper thinking

If collection is properly designed it can solve problems of identification

If collection is NOT properly design, it mostly computes irrelevant correlations

Good Outcomes: • Visualization • Measurement • Predictability

Spurious Relationships


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.