Talk on our Cross-Device Tracking algorithm at Adform

Page 1

Cross-Device Tracking September 2017 Morten Arngren Lead Data Scientist

Vlad Sandulescu Senior Data Scientist

Jan Kremer Data Scientist


đ&#x;“ˇ a lot of experimentation

Snapshot

!" Still under development


ADFORM PLATFORM


Cookie Churn

đ&#x;‘ť

' '

' '

Cross-Device

' đ&#x;“ą % đ&#x;’ť đ&#x;“ž time



⌖ mission

đ&#x;‘ť đ&#x;‘˝ )


⌖ mission

đ&#x;‘ť đ&#x;‘˝ )


đ&#x;“ˆ data

â?“

Ground Truth?

Cookie

â?˛

#



đ&#x;“ž

Time

Location

OS

Device

External Deterministic Providers


(Deterministic)

>700K

(Adform bid responses)

events/day

â?˛ Cookie

#



đ&#x;“ž

OS

Device

Time Location

1.5B

events/day


(Adform bid responses) (Deterministic)

â?˛

>700K events/day

Cookie

#



đ&#x;“ž

OS

Device

Time Location

1.5B

events/day 1 Months of data

94%

2M events Filtering Remove 
 cookies with
 multiple clusters

74K

Data set may not be representative.

6%

clusters

Co o

kie

Ch

Cro ss-

u rn

De vic e


2017-01-21

Training 2.236.222 events 2017-02-21 2017-03-10

Validation 4.776.974 events 2017-04-27

300M 300M

More of it‌

time


Cross-Device Graph

) đ&#x;‘˝

Data

đ&#x;‘ť

Cookie Time

â?˛

Location

#

OS



⌖

đ&#x;“ž

Cluster cookies into user profiles.

Device

Cookie IP

Device User-Agent


Cross-Device Graph Graph cutting algorithms • • • •

Min-Cut Markov Random Fields Community Detection Graph Clustering…

Production Scale

B

events Cookie IP

Device User-Agent

B

nodes

B

edges


Scale….what to do?? Divide Graph …by location (Pre-Clustering) Ignored

Cookie IP

1

2

3

(Connected Components Algorithm)

4

n-1

n


Cookie association by Pair-wise classification

đ&#x;‘ť A

đ&#x;‘˝ B

C

) D

E Positive

Negative

A B

A C

A E

C D

A D

B E

Create cookie pairs as classification data set

B C

C E

‌with ground truth

B D

D E


Crafting Features for each cookie pair‌ A B

A

B Pageviews

Observations ID Time

Location

Device

cookie_id

Aggregation 0

12

24 hours

log_time ip_v4 country_id region_id city_id zip_code_id device_type_id os_id browser_id browser_language_codes mobile_app_id

0

set a set b

31 days


Crafting Features for each cookie pair‌ Similarity

A B Pageviews

-

Observations ID Time

Location

Device

cookie_id

Aggregation 0

12

24 hours

log_time ip_v4 country_id region_id city_id zip_code_id device_type_id os_id browser_id browser_language_codes mobile_app_id

Features

Cosine Similarity Correlation Similarity Hellinger Distance

0

set a set b

31 days

Max Overlap

Jaccard Index


Crafting Features

PC 1+2

for each cookie pair‌ A B Observations ID Time

Location

Device

Features

cookie_id

25 dim. feature vector

Numerical

log_time ip_v4 country_id region_id city_id zip_code_id device_type_id os_id browser_id browser_language_codes mobile_app_id

Vectorial

Classifier

Sets

đ&#x;Œ´

đ&#x;Œ´đ&#x;Œ´

0.78

A B

Gradient Boosted Decision Trees

(http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html)


Classification

Clustering

Pruning Degrees of edges

A

A

B

G

B

C D E

F

2

2

2

2

A

B

D

E

C E

2

3

C

F

D

F

1 95% percentile

G

(Connected Components Algorithm)

G


Pre-Clustering

Classification

Clustering

Pruning

A

B

A

B

A A

B

C

G

C D

C

B

F

C

D

E

D

E

E E

F

D

F

F

G

G

Clusters

Users

G

Full Graph

Components

(Connected Components Algorithm)

Sub-Graph (XGBoost)

(Connected Components Algorithm)


đ&#x;“ˆ

performance


(Preliminary YouGov data set…)

Performance Graphs


(Preliminary YouGov data set…)

Performance Graphs

(Dasgupta et al. “Overcomming Browser Cookie Churn with Clustering” 2012)


(Preliminary YouGov data set…)

Performance Graphs


(Preliminary YouGov data set…)

Performance Graphs

Precision

Recall

F1-score

91% 32% 47% (Preliminary YouGov data set…)


(Preliminary YouGov data set…)

Performance Graphs

Precision

Recall

F1-score

93% 43% 59% (Preliminary YouGov data set…)


production


Scheduled tasks Predicting

Training XD Deterministic Data

Serving

PreProcessing

PreClustering

PreProcessing

Train Classifier

PreClustering

Model parameters

Model ready

( ) S3

Kafka

/

Predict Cross Device ID’s

Cross-Device Cookies Ready

XD Data

( ) /

S3

Kafka

Serve‌


đ&#x;Œ?

2017-05-01

â?˛

Germany 7 days

2017-05-08

1.3B events (10%)

2017-06-25

đ&#x;Œ?

Germany

2017-07-24

â?˛

56B 30 days events 300M 300M

More of it‌

210M 138M time

175M


Predicting

đ&#x;Œ?

2017-05-01

â?˛

Germany 7 days

2017-05-08

1.3B events (10%)

XD Data

2017-06-25

PreProcessing

đ&#x;Œ?

Germany

2017-07-24

â?˛

56B 30 days events Model parameters

PreClustering

30 min.

Predict Cross Device ID’s

2 hours

More of it‌ Cross-Device Cookies Ready

time


â˜˘ challenges


Large pre-clusters

IP Obscurity

176.22.252.???

Cookie IP

One IP to rule them all….

Noisy pre-clusters


â?˛ future


Data

Predicting

đ&#x;“ˆ

Features

Contextual Information

XD Data

PreProcessing

More of it‌

Topic modelling on website content

PreClustering

Validate‌ Model parameters

4

Predict Cross Device ID’s

Large pre-clusters IP Obscurity More‌

Cross-Device Cookies Ready

✈ đ&#x;“š

Streaming

Context


References [1] Tran et al. “Classification and Learning-to-rank Approaches for Cross-Device Matching” CIKM Cup 2016 [2] Dasgupta et al. ”Overcoming Browser Cookie Churn with Clustering” CIKM Cup 2016 [3] Blondel et al. “Fast unfolding of communities in large networks” 2008 [4] Malloy et al. “Internet Device Graphs” KDD 2017


6 thank you


Profile-cookie space Precision / Recall Precision =

TP TP + FP

=

Recall =

TP TP + FN

=

FP TN FN

Estimated Class A Class B

TP Precision • Recall F1 = 2 • Precision + Recall


đ&#x;Œ´

đ&#x;Œ˛

F1

F3

>1

F4 <0.5

2

>0

<=0

<=1

F8

+

>=0.5

17

<0.8

F5 <=6

>6

>=0.8

F2

12 10

4

>1

<=1

3

12

+

1

+ Boosted Decision Trees (http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html)

0.78

A B


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.