AirBnB: Getting the Perfect Score


Analyzed and Created by: Caroline Deng, Jantima Somboonsong, Lauren Tai, Lauren Yee, Gordon Wong


Table of Contents
1.  Company Overview
2.  Business Understanding
3.  Data Understanding
4.  Data Preparation
5.  Modeling
6.  Evaluation
7.  Deployment
8.  Risks & Implications


1. Company Overview


Company Overview

“Airbnb is a trusted community marketplace for people to list, discover, and book unique accommodations around the world”

Countries: 191+
Cities: 34,000+
Listings Worldwide: 2,000,000+
Total Guests: 60 mil+



Company Overview

35,957 listings in New York City as of February 2, 2016


2. Business Understanding


Business Understanding

Goals

We wanted to dive into AirBnB ratings and reviews to see whether we could predict which listings would get a perfect rating, and what goes into one.

Perfect rating = Score of 100


Business Understanding

Rating System

How does the rating system work? What factors go into it? What impact does it have on the lister and the user?


3. Data Understanding


Data Understanding

The Data

A snapshot:
-  Taken from InsideAirbnb.com (not official AirBnB data)
-  Last updated February 2, 2016
-  CSV format: 124 MB
-  35,957 records
-  92 columns
-  Mixture of numerical, categorical, and text

Data limitations:
-  Messy


Word Cloud


4. Data Preparation


Data Preparation

Tools: OpenRefine, Python, Pandas, R, Excel

3 Steps:
1.  Sample & Eliminate
2.  Clean
3.  Create New Variables


Data Preparation: Sample & Eliminate

Take random sample
-  10% of original data

Remove columns
-  Discard irrelevant and repetitive columns
-  Based on low correlation to review score
-  19 columns remain

0 id
1 boroughs
2 zipcode
3 property_type
4 room_type
5 bathrooms
6 bedrooms
7 beds
8 bed_type
9 amenities
10 price
11 security_deposit.2
12 cleaning_fee.2
13 guests_included
14 extra_people
15 availability_365
16 number_of_reviews
17 review_scores_rating
18 description
19 name
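The sample-and-eliminate step could look roughly like the sketch below. It is a minimal illustration in pandas, not the authors' actual script: the toy DataFrame, the `KEEP` list (a subset of the columns above), and the random seed are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the Inside Airbnb listings CSV (the real file has 92 columns).
listings = pd.DataFrame({
    "id": range(1, 101),
    "price": [50 + i for i in range(100)],
    "number_of_reviews": [i % 30 for i in range(100)],
    "review_scores_rating": [100 if i % 4 == 0 else 90 for i in range(100)],
    "listing_url": ["https://example.invalid/%d" % i for i in range(100)],  # dropped below
})

# Hypothetical subset of the columns the slides say were kept.
KEEP = ["id", "price", "number_of_reviews", "review_scores_rating"]

# Draw a reproducible 10% random sample, then discard every other column.
sample = listings.sample(frac=0.10, random_state=42)[KEEP].reset_index(drop=True)

print(sample.shape)
```

Fixing `random_state` makes the sample repeatable, which helps when several people work from the same subset.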


Data Preparation: Cleaning

-  Remove all entries in which the Review Score Rating was not a number
-  Remove entries with an empty ‘Summary’
-  Transform Boolean True/False values into 0’s and 1’s
-  Assign numeric codes to categorical variables
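A minimal pandas sketch of the first three cleaning steps follows. The rows are invented, and `instant_bookable` with `t`/`f` values is used only as an illustrative Boolean column; the slides do not say which Boolean columns were converted.

```python
import pandas as pd

# Invented rows that exercise each cleaning rule.
raw = pd.DataFrame({
    "review_scores_rating": ["100", "95", "", "N/A", "80"],
    "summary": ["Cozy loft", "", "Sunny room", "Quiet studio", "Big house"],
    "instant_bookable": ["t", "f", "t", "f", "t"],  # assumed flag encoding
})

# 1. Keep only rows whose rating parses as a number.
raw["review_scores_rating"] = pd.to_numeric(raw["review_scores_rating"], errors="coerce")
clean = raw.dropna(subset=["review_scores_rating"])

# 2. Drop rows with an empty summary.
clean = clean[clean["summary"].str.strip() != ""]

# 3. Turn true/false flags into 1/0.
clean = clean.assign(instant_bookable=clean["instant_bookable"].map({"t": 1, "f": 0}))

print(clean)
```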


Data Preparation: Data Dictionary

Boroughs:
1 - Bronx
2 - Brooklyn
3 - Manhattan
4 - Queens
5 - Staten Island

Property Type:
1 - Apartment
2 - Bed & Breakfast
3 - Condominium
4 - Dorm
5 - House
6 - Hut
7 - Loft
8 - Other
9 - Townhouse
0 - (blank)

Listing Type:
1 - Entire home/apt
2 - Private room
3 - Shared room

Bed Type:
1 - Airbed
2 - Couch
3 - Futon
4 - Pull-out Sofa
5 - Real Bed
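The data dictionary above can be written down as plain lookup tables. The sketch below is one possible encoding helper; the function name `encode` is hypothetical, and using 0 for blank or unknown values in every column is an assumption (the slides list a blank code only for Property Type).

```python
# Lookup tables copied from the data dictionary.
BOROUGH_CODES = {"Bronx": 1, "Brooklyn": 2, "Manhattan": 3, "Queens": 4, "Staten Island": 5}
PROPERTY_TYPE_CODES = {
    "Apartment": 1, "Bed & Breakfast": 2, "Condominium": 3, "Dorm": 4, "House": 5,
    "Hut": 6, "Loft": 7, "Other": 8, "Townhouse": 9,
}
ROOM_TYPE_CODES = {"Entire home/apt": 1, "Private room": 2, "Shared room": 3}
BED_TYPE_CODES = {"Airbed": 1, "Couch": 2, "Futon": 3, "Pull-out Sofa": 4, "Real Bed": 5}


def encode(value, codes, blank_code=0):
    """Return the integer code for a category label, or blank_code if missing or unknown."""
    return codes.get((value or "").strip(), blank_code)


print(encode("Manhattan", BOROUGH_CODES))
print(encode("", PROPERTY_TYPE_CODES))
```

Integer codes like these impose an arbitrary order on the categories, which tree models tolerate better than linear models do.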


Data Preparation: Create New Variables

Create a new variable for the target
-  Binary variable of 1 and 0
-  1 - Score of 100
-  0 - Score of less than 100

Create a variable for count of amenities
-  Using R, count the number of amenities offered by each AirBnB listing
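The two derived variables can be sketched as follows. The slides say the amenities count was done in R; this is a Python equivalent for illustration only. The helper names and the assumed amenities format (a brace-wrapped, comma-separated string with optional quotes) are assumptions, not taken from the slides.

```python
import csv


def is_perfect(score):
    """Binary target: 1 for a rating of exactly 100, otherwise 0."""
    return 1 if float(score) == 100 else 0


def count_amenities(raw):
    """Count entries in a string such as '{TV,"Wireless Internet",Kitchen}'."""
    inner = raw.strip().strip("{}")
    if not inner:
        return 0
    # csv handles quoted entries that may themselves contain commas.
    items = next(csv.reader([inner]))
    return sum(1 for item in items if item.strip())


print(is_perfect(100), is_perfect(98))
print(count_amenities('{TV,"Wireless Internet",Kitchen}'))
```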


5. Modeling


Logistic Regression

Coefficients:

Boroughs: 0.0090
Zip Code: -0.00002
Property Type: -0.0003
Room Type: 0.0272
Bathrooms: 0.0404
Beds: -0.0254
Bed Type: 0.0225
Amenities Count: 0.0078
Price: 0.00010
Security Deposit: -0.00002
Cleaning Fee: -0.000075
Number of Guests: 0.0283
Extra People Cost: -0.0009
Availability 365: -0.0003
Number of Reviews: -0.0064

AUC: 0.674
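For readers who want to see how coefficients and an AUC like the ones above are produced, here is a minimal scikit-learn sketch. It runs on synthetic data with two invented features, so its numbers will not match the slide; the train/test split, seed, and model settings are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for two of the features; not the real listings.
number_of_reviews = rng.poisson(10, n)
amenities_count = rng.integers(0, 30, n)
X = np.column_stack([number_of_reviews, amenities_count])

# Synthetic target: fewer reviews makes a "perfect" label more likely.
logit = 1.5 - 0.25 * number_of_reviews + 0.02 * amenities_count
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC is computed from predicted probabilities, not hard 0/1 predictions.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(dict(zip(["number_of_reviews", "amenities_count"], model.coef_[0].round(4))))
print(round(auc, 3))
```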


Classification Tree

Split features shown in the tree: Number of Reviews, Amenities Count, Guests Included

Base Rate = 0.280
AUC = 0.746

Perfect Review:
-  Less than 6.5 reviews
-  Less than 1.5 reviews (aka 1)
-  Include more than 3.5 guests

What to Avoid:
-  More than 6.5 reviews
-  More than 19.5 reviews
-  Include less than 16.5 amenities


Classification Tree

Split features shown in the tree: Number of Reviews, Guests Included, Extra People, Price, Amenities Count, Security Deposit, 365 Availability

Base Rate = 0.280
AUC = 0.754


Classification Tree

Perfect Review:
-  Less than 6.5 reviews
-  Less than 1.5 reviews (aka 1)
-  Include more than 3.5 guests
-  Include less than 21.5 amenities

What to Avoid:
-  More than 6.5 reviews
-  Less than 19.5 reviews
-  Include less than 6.5 amenities
-  Make them pay less than $425 for security deposit
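Rules such as "less than 6.5 reviews" are what a fitted decision tree looks like when printed. The sketch below fits a small scikit-learn tree on synthetic data and dumps its rules as text. The data, the depth of 2, and the feature choice are assumptions for illustration, and the tool the authors used is not stated in the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 1000
# Synthetic integer-valued features; not the real listings.
number_of_reviews = rng.integers(1, 60, n)
guests_included = rng.integers(1, 7, n)
X = np.column_stack([number_of_reviews, guests_included])

# Synthetic rule: listings with few reviews are labeled "perfect" more often.
y = (rng.random(n) < np.where(number_of_reviews < 7, 0.7, 0.2)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned splits as indented if/else rules.
rules = export_text(tree, feature_names=["number_of_reviews", "guests_included"])
print(rules)
```

scikit-learn places each threshold halfway between two adjacent observed values, which is why integer features such as review counts show cutoffs ending in .5.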


Nearest Neighbor

AUC: 0.6620499
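A nearest-neighbor classifier scored by AUC can be sketched as below. The slides give neither the number of neighbors nor any preprocessing, so `n_neighbors=25`, the feature scaling, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 1500
# Synthetic features on very different scales; not the real listings.
price = rng.uniform(30, 500, n)
number_of_reviews = rng.integers(1, 60, n)
X = np.column_stack([price, number_of_reviews])

# Synthetic target with an arbitrary cutoff at 20 reviews.
y = (rng.random(n) < np.where(number_of_reviews < 20, 0.7, 0.15)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale first so that price does not dominate the distance calculation.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=25))
knn.fit(X_train, y_train)

knn_auc = roc_auc_score(y_test, knn.predict_proba(X_test)[:, 1])
print(round(knn_auc, 3))
```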


Text Mining

Description
•  Count Vectorizer: AUC 0.534
•  TFIDF: AUC 0.551

Space
•  Count Vectorizer: AUC 0.578
•  TFIDF: AUC 0.603

Name
•  Count Vectorizer: AUC 0.514
•  TFIDF: AUC 0.522
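The count-vectorizer versus TF-IDF comparison can be set up as two otherwise identical pipelines. This sketch uses scikit-learn with a tiny invented corpus; the slides do not say which classifier sat on top of the text features, so logistic regression here is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented corpus and labels; real input would be the description,
# space, or name column, with the perfect-score flag as the label.
texts = [
    "sunny spacious loft near subway",
    "cozy private room in brooklyn",
    "spacious apartment with great view",
    "small shared room near airport",
    "charming sunny studio in manhattan",
    "basic room shared bathroom",
]
labels = [1, 0, 1, 0, 1, 0]

# Same classifier, two different text representations.
models = {
    "count": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
    "tfidf": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
}

probs = {}
for name, pipe in models.items():
    pipe.fit(texts, labels)
    # Probability of the positive class for a new, unseen text.
    probs[name] = pipe.predict_proba(["sunny spacious studio"])[0, 1]
    print(name, round(probs[name], 3))
```

Keeping the classifier fixed means any difference in AUC can be attributed to the representation rather than the model.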


6. Evaluation


Method: K-Fold Cross Validation

For the classification tree, a max depth of 4 achieves the highest area under the ROC curve.

For logistic regression, varying the complexity has little impact on the area under the ROC curve.
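Choosing a tree depth by k-fold cross-validation can be sketched as below. The data is synthetic, and k = 5, the depth range, and the seeds are assumptions; the slides do not report the settings used.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 1000
# Synthetic features: only the first one carries signal.
X = rng.normal(size=(n, 5))
# Label follows a threshold on feature 0, with 20% of labels flipped as noise.
y = ((X[:, 0] > 0.5) ^ (rng.random(n) < 0.2)).astype(int)

folds = KFold(n_splits=5, shuffle=True, random_state=0)

mean_auc = {}
for depth in range(1, 9):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Each depth is scored by AUC on the held-out fold, averaged over 5 folds.
    scores = cross_val_score(tree, X, y, cv=folds, scoring="roc_auc")
    mean_auc[depth] = scores.mean()

best_depth = max(mean_auc, key=mean_auc.get)
print(best_depth, round(mean_auc[best_depth], 3))
```

Reusing the same `folds` object for every depth keeps the comparison fair, since each candidate is scored on identical splits.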


Metric: Area Under the ROC Curve
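The area under the ROC curve has a simple reading: it is the probability that a randomly chosen perfect-score listing receives a higher predicted score than a randomly chosen non-perfect one. A minimal pure-Python sketch of that pairwise definition follows; the function name `roc_auc` and the example scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives context for the values reported in the modeling section.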


7. Risks & Limitations


Limitations

-  Data may be skewed towards 100
-  Users may face pressure to give higher ratings
-  Fake reviews


Deployment Hypotheticals

Mobile app or online host recommendation system
●  Suggestions to hosts:
○  Have fewer reviews
○  Increase the number of guests that you can accommodate
○  Have fewer than 16 amenities


Deployment

http://thenextweb.com/apps/2015/02/21/airbnb-launches-new-dashboard-updated-app-hosts/#gref


Questions?


Appendix 1: Original data set features

id

medium_url

host_thumbnail_url

state

bedrooms

listing_url

picture_url

host_picture_url

zipcode

beds

scrape_id

xl_picture_url

host_neighbourhood

market

bed_type

last_scraped

host_id

host_listings_count

smart_location

amenities

name

host_url

host_total_listings_count

country_code

square_feet

summary

host_name

host_verifications

country

price

space

host_since

host_has_profile_pic

latitude

weekly_price

description

host_location

host_identity_verified

longitude

monthly_price

experiences_oered

host_about

street

is_location_exact

security_deposit

neighborhood_overview

host_response_time

neighbourhood

property_type

cleaning_fee

notes

host_response_rate

neighbourhood_cleansed

room_type

guests_included

transit

host_acceptance_rate

neighbourhood_group_cleansed

accommodates

extra_people

thumbnail_url

host_is_superhost

city

bathrooms


maximum_nights

review_scores_accuracy

require_guest_phone_verification

calendar_updated

review_scores_cleanliness

calculated_host_listings_count

has_availability

review_scores_checkin

reviews_per_month

availability_30

review_scores_communication

availability_60

review_scores_location

availability_90

review_scores_value

availability_365

requires_license

calendar_last_scraped

license

number_of_reviews

jurisdiction_names

first_review

instant_bookable

last_review

cancellation_policy

review_scores_rating

require_guest_profile_picture

Count: 92 features

