Instance-based Learning
Corso di Apprendimento Automatico, Laurea Magistrale in Informatica
Nicola Fanizzi, Dipartimento di Informatica, Università degli Studi di Bari
November 12, 2009
Outline
k-NEAREST NEIGHBOR
Locally weighted regression
kD-trees, ball trees
Rectangular generalizations
Radial basis functions
Case-based reasoning
Lazy and eager learning
Instance-Based Learning I
Key idea: just store all training examples $\langle x_i, f(x_i) \rangle$
NEAREST NEIGHBOR: given query instance $x_q$, first locate the nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$
k-NEAREST NEIGHBOR: given $x_q$, take a vote among its $k$ nearest neighbors (if discrete-valued target function); take the mean of the $f$ values of the $k$ nearest neighbors (if real-valued):

$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$
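Not part of the original slides: a minimal brute-force sketch of both estimators, assuming numeric attributes and NumPy; the function name knn_predict is illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3, regression=False):
    """Brute-force k-NN: majority vote (discrete target) or mean (real-valued target)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)        # Euclidean distances to x_q
    nn_idx = np.argsort(dists)[:k]                        # indices of the k nearest neighbors
    if regression:
        return np.mean(y_train[nn_idx])                   # mean of the neighbors' f values
    return Counter(y_train[nn_idx]).most_common(1)[0][0]  # majority vote among the neighbors

# toy usage
X = np.array([[0.0, 0.0], [1.0, 0.1], [0.9, 0.9], [5.0, 5.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.8, 0.8]), k=3))       # -> 0 (two of the three nearest have label 0)
```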
Instance-Based Learning II
kNN and Voronoi Diagram
When To Consider N EAREST N EIGHBOR
Instances map to points in $\mathbb{R}^n$
Less than 20 attributes per instance
Lots of training data
Advantages: training is very fast; can learn complex target functions; doesn't lose information
Disadvantages: slow at query time; easily fooled by irrelevant attributes
Behavior in the Limit I
Consider $p(x)$, the probability that instance $x$ is labeled 1 (positive) rather than 0 (negative).
NEAREST NEIGHBOR: as the number of training examples → ∞, it approaches the Gibbs algorithm
Gibbs: with probability $p(x)$ predict 1, else 0
k-NEAREST NEIGHBOR: as the number of training examples → ∞ and $k$ grows large, it approaches the Bayes optimal classifier
Bayes optimal: if $p(x) > 0.5$ then predict 1, else 0
Note: Gibbs has at most twice the expected error of the Bayes optimal classifier
Behavior in the Limit II
Distance-Weighted kNN
Might want to weight nearer neighbors more heavily:

$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$  where  $w_i \equiv \frac{1}{d(x_q, x_i)^2}$

and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$.
Note that now it makes sense to use all training examples instead of just $k$ → Shepard's method
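A hedged sketch of the distance-weighted estimate above for a real-valued target; passing k=None uses all training examples, i.e. Shepard's method. Function and parameter names are illustrative.

```python
import numpy as np

def dw_knn_predict(X_train, y_train, x_q, k=None):
    """Distance-weighted k-NN with w_i = 1 / d(x_q, x_i)^2.
    With k=None all training examples are used (Shepard's method)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    idx = np.argsort(dists) if k is None else np.argsort(dists)[:k]
    d = dists[idx]
    if np.any(d == 0):                    # query coincides with a training point: return its value
        return y_train[idx[d == 0]][0]
    w = 1.0 / d ** 2                      # nearer neighbors get larger weights
    return np.sum(w * y_train[idx]) / np.sum(w)
```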
Distances
Most instance-based schemes use Euclidean distance:

$\sqrt{(x_1^{(1)} - x_1^{(2)})^2 + (x_2^{(1)} - x_2^{(2)})^2 + \cdots + (x_k^{(1)} - x_k^{(2)})^2}$

where $x^{(1)}$ and $x^{(2)}$ are two instances with $k$ attributes.
Taking the square root is not required when comparing distances.
Other popular metric: the city-block (Manhattan) metric, which adds the differences without squaring them.
Normalization
Different attributes are measured on different scales → they need to be normalized:

$a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}$

where $v_i$ is the actual value of attribute $i$.
Nominal attributes: distance is either 0 or 1
Common policy for missing values: assumed to be maximally distant (given normalized attributes)
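An illustrative sketch of the min-max normalization defined above; the min and max are computed on the training set and must be reused to rescale query instances.

```python
import numpy as np

def minmax_normalize(X):
    """Rescale each numeric attribute to [0, 1]: a_i = (v_i - min v_i) / (max v_i - min v_i)."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # avoid division by zero for constant attributes
    return (X - mins) / span, (mins, span)

X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
X_norm, (mins, span) = minmax_normalize(X)
x_q = (np.array([2.5, 250.0]) - mins) / span         # a query instance uses the same mins/span
```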
Curse of Dimensionality
Imagine instances described by 20 attributes, but only 2 are relevant to the target function.
Curse of dimensionality: NEAREST NEIGHBOR is easily misled when $X$ is high-dimensional.
One approach: stretch the $j$-th axis by weight $z_j$, where $z_1, \ldots, z_n$ are chosen to minimize prediction error
Use cross-validation to automatically choose the weights $z_1, \ldots, z_n$
Note: setting $z_j$ to zero eliminates that dimension altogether; see [Moore and Lee, 1994]
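A small sketch of the axis-stretching idea: each axis $j$ is scaled by a weight $z_j$ inside the distance, and $z_j = 0$ removes the attribute; in practice the weights would be chosen by cross-validation, which is omitted here.

```python
import numpy as np

def stretched_distance(x, y, z):
    """Euclidean distance after stretching axis j by weight z_j (z_j = 0 drops the attribute)."""
    return np.linalg.norm(z * (x - y))

x = np.array([1.0, 5.0, 0.2])
y = np.array([2.0, 5.5, 0.9])
z = np.array([1.0, 1.0, 0.0])   # third attribute judged irrelevant, so it is ignored
print(stretched_distance(x, y, z))
```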
Finding Nearest Neighbors Efficiently
Simplest way of finding nearest neighbor: linear scan of the data Classification takes time proportional to the product of the number of instances in training and test sets
Nearest-neighbor search can be done more efficiently using appropriate data structures We will discuss two methods that represent training data in a tree structure: kD-trees and ball trees
kD-tree Example I
kD-tree Example II
kD-tree Example III
More on kD-trees
Complexity depends on depth of tree, given by logarithm of number of nodes Amount of backtracking required depends on quality of tree (”square” vs. ”skinny” nodes) How to build a good tree? Need to find good split point and split direction Split direction: direction with greatest variance Split point: median value along that direction
Using value closest to mean (rather than median) can be better if data is skewed Can apply this recursively
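A rough sketch of top-down kD-tree construction following the recipe above (split direction: greatest variance; split point: median). Class and function names are illustrative, and the nearest-neighbor search with backtracking is not shown.

```python
import numpy as np

class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right

def build_kdtree(X):
    """Recursively split on the axis of greatest variance, at the median value."""
    if len(X) == 0:
        return None
    if len(X) == 1:
        return KDNode(point=X[0])
    axis = int(np.argmax(X.var(axis=0)))          # split direction: greatest variance
    X = X[np.argsort(X[:, axis])]                 # sort along that axis
    mid = len(X) // 2                             # split point: median along that axis
    return KDNode(point=X[mid], axis=axis, split=X[mid, axis],
                  left=build_kdtree(X[:mid]), right=build_kdtree(X[mid + 1:]))

tree = build_kdtree(np.random.rand(20, 2))
```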
Building Trees Incrementally
Big advantage of instance-based learning: classifier can be updated incrementally Just add new training instance!
Can we do the same with kD-trees? Heuristic strategy: Find leaf node containing new instance Place instance into leaf if leaf is empty Otherwise, split leaf according to the longest dimension (to preserve squareness)
Tree should be re-built occasionally (i.e. if depth grows to twice the optimum depth)
Ball Trees
Problem in kD-trees: corners
Observation: no need to make sure that regions don't overlap
Can use balls (hyperspheres) instead of hyperrectangles
A ball tree organizes the data into a tree of k-dimensional hyperspheres
Normally allows for a better fit to the data and thus more efficient search
Ball Trees Examples I
Ball Trees Examples II
Ball Trees Examples III
Nearest-neighbor search is done using the same backtracking strategy as in kD-trees
A ball can be ruled out from consideration if the distance from the target to the ball's center exceeds the ball's radius plus the current upper bound
Building Ball Trees
Ball trees are built top down (like kD-trees)
Don't have to continue until leaf balls contain just two points: can enforce a minimum occupancy (same as in kD-trees)
Basic problem: splitting a ball into two
Simple (linear-time) split selection strategy:
Choose the point farthest from the ball's center
Choose the second point farthest from the first one
Assign each point to the closer of these two points
Compute cluster centers and radii based on the two subsets to get two balls
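An illustrative sketch of the linear-time ball-splitting heuristic described above, assuming the ball holds at least two distinct points; names are illustrative.

```python
import numpy as np

def split_ball(X):
    """Split the points of one ball into two child balls (farthest-point heuristic).
    Assumes X contains at least two distinct points."""
    center = X.mean(axis=0)
    p1 = X[np.argmax(np.linalg.norm(X - center, axis=1))]   # farthest from the ball's center
    p2 = X[np.argmax(np.linalg.norm(X - p1, axis=1))]       # farthest from the first point
    to_first = np.linalg.norm(X - p1, axis=1) <= np.linalg.norm(X - p2, axis=1)
    balls = []
    for subset in (X[to_first], X[~to_first]):
        c = subset.mean(axis=0)
        r = np.linalg.norm(subset - c, axis=1).max()         # radius covering all member points
        balls.append((c, r, subset))
    return balls

left, right = split_ball(np.random.rand(30, 3))
```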
Discussion of Nearest-Neighbor Learning I Often very accurate Assumes all attributes are equally important Remedy: attribute selection or weights
Possible remedies against noisy instances: Take a majority vote over the k nearest neighbors Removing noisy instances from dataset (difficult!)
Statisticians have used k-NN since the early 1950s
If $n \to \infty$, $k \to \infty$ and $k/n \to 0$, the error approaches the minimum (Bayes) error
kD-trees become inefficient when number of attributes is too large (approximately > 10) Ball trees (which are instances of metric trees) work well in higher-dimensional spaces
Discussion of Nearest-Neighbor Learning II
Instead of storing all training instances, compress them into regions
Example: hyperpipes (from the discussion of 1R)
Another simple technique (Voting Feature Intervals):
Construct intervals for each attribute
Discretize numeric attributes
Treat each value of a nominal attribute as an "interval"
Count number of times class occurs in interval Prediction is generated by letting intervals vote (those that contain the test instance)
Instance-based Learning
Practical problems of 1-NN scheme: Slow (but: fast tree-based approaches exist) Remedy: remove irrelevant data
Noise (but: k-NN copes quite well with noise) Remedy: remove noisy instances
All attributes deemed equally important Remedy: weight attributes (or simply select)
Doesn't perform explicit generalization Remedy: rule-based NN approach
Applications: agricultural land classification I
STATLOG project [Michie et al., 1994]
dataset: 82 x 100 LANDSAT images (heat maps) for areas in Australia: 2 in the visible band and 2 in the infrared band
each pixel has a class (manually determined); different colors indicate the classes G = {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}
for each of the 4 spectral bands and for each pixel p, an 8-neighbor feature map is extracted (the pixel itself plus its 8 surrounding neighbors), i.e. (1 + 8) × 4 = 36 input features per pixel
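An illustrative sketch of the (1 + 8) × 4 = 36 feature extraction for an interior pixel, assuming the four band images are given as equally sized 2-D arrays; names are hypothetical.

```python
import numpy as np

def pixel_features(bands, r, c):
    """bands: list of 4 arrays (one per spectral band), all with the same 2-D shape.
    Returns the (1 + 8) * 4 = 36 values: the pixel plus its 8 neighbors, in each band.
    Assumes (r, c) is an interior pixel."""
    feats = []
    for img in bands:
        patch = img[r - 1:r + 2, c - 1:c + 2]    # 3x3 window centered on the pixel
        feats.extend(patch.ravel())
    return np.array(feats)

bands = [np.random.rand(82, 100) for _ in range(4)]
print(pixel_features(bands, 10, 20).shape)        # -> (36,)
```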
Applications: agricultural land classification II objective: classify land usage at a pixel, based on the information in the 4 spectral bands
Applications: agricultural land classification III
5-NN produced the predicted land-usage map; test error rate: ≈ 9.5%
In the STATLOG project many methods were attempted, including LVQ, CART, ANN, LDA, ...
k-NN performed best on this task; hence it is likely that the decision boundaries in $\mathbb{R}^{36}$ are quite irregular
Learning Prototypes (Editing)
Only those instances involved in a decision need to be stored Noisy instances should be filtered out Idea: only use prototypical examples
Speed up, Combat noise
IB2: save memory, speed up classification
Work incrementally
Only incorporate misclassified instances
Problem: noisy data gets incorporated
IB3: deal with noise
Discard instances that don't perform well
Compute confidence intervals for
1. Each instance's success rate
2. Default accuracy of its class
Accept/reject instances Accept if lower limit of 1 exceeds upper limit of 2 Reject if upper limit of 1 is below lower limit of 2
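IB3's exact confidence-interval formula and acceptance/rejection thresholds are not given on the slide; the sketch below only illustrates the accept/reject logic, using a simple normal-approximation interval, so the details are indicative rather than faithful to IB3.

```python
import math

def normal_ci(successes, trials, z=1.28):
    """Approximate confidence interval for a success proportion (normal approximation)."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half), min(1.0, p + half)

def ib3_decision(inst_succ, inst_trials, class_succ, class_trials):
    """Accept if the instance's lower bound beats the class default's upper bound;
    reject (discard) if its upper bound falls below the class default's lower bound."""
    lo_i, hi_i = normal_ci(inst_succ, inst_trials)
    lo_c, hi_c = normal_ci(class_succ, class_trials)
    if lo_i > hi_c:
        return "accept"
    if hi_i < lo_c:
        return "reject"
    return "undecided"
```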
Weight Attributes
IB4: weight each attribute (weights can be class-specific)
Weighted Euclidean distance:

$\sqrt{w_1^2 (x_1 - y_1)^2 + \cdots + w_n^2 (x_n - y_n)^2}$

Update weights based on the nearest neighbor:
Class correct: increase weight
Class incorrect: decrease weight
Amount of change for the i-th attribute depends on $|x_i - y_i|$
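A hedged sketch of IB4-style attribute weighting: the weighted distance from the formula above plus one possible weight-update rule driven by $|x_i - y_i|$; the actual IB4 update schedule differs, so this is illustrative only.

```python
import numpy as np

def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i^2 (x_i - y_i)^2)."""
    return np.sqrt(np.sum((w * (x - y)) ** 2))

def update_weights(w, x, neighbor, correct, eta=0.05):
    """Nudge each attribute weight up if the neighbor's class was correct, down otherwise;
    attributes whose values were similar (small |x_i - y_i|) get most of the credit or blame.
    Assumes normalized attributes in [0, 1]."""
    delta = 1.0 - np.abs(x - neighbor)
    return np.clip(w + (eta * delta if correct else -eta * delta), 0.0, 1.0)
```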
Rectangular Generalizations
Nearest-neighbor rule is used outside rectangles
Rectangles are rules! (But they can be more conservative than "normal" rules.)
Nested rectangles are rules with exceptions
Generalized Exemplars
Generalize instances into hyperrectangles Online: incrementally modify rectangles Offline version: seek small set of rectangles that cover the instances
Important design decisions: Allow overlapping rectangles? Requires conflict resolution
Allow nested rectangles? Dealing with uncovered instances?
Separating Generalized Exemplars
Generalized Distance Functions
Given: some transformation operations on attributes
K*: similarity = probability of transforming instance A into B by chance
Average over all transformation paths
Weight paths according to their probability (need a way of measuring this)
Uniform way of dealing with different attribute types Easily generalized to give distance between sets of instances
Locally Weighted Regression I
Note that kNN forms a local approximation to $f$ for each query point $x_q$
Why not form an explicit approximation $\hat{f}(x)$ for the region surrounding $x_q$?
Fit a linear function to the $k$ nearest neighbors, or fit a quadratic, ...
Produces a "piecewise approximation" to $f$
Several choices of error to minimize; squared error over the $k$ nearest neighbors:

$E_1(x_q) \equiv \frac{1}{2} \sum_{x \in kNN(x_q)} (f(x) - \hat{f}(x))^2$
Locally Weighted Regression II
Distance-weighted squared error over all neighbors:

$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$

...
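A sketch of locally weighted linear regression: the kernel-weighted squared error $E_2$ is minimized in closed form by weighted least squares, here with a Gaussian kernel; names and the bandwidth parameter tau are illustrative.

```python
import numpy as np

def lwr_predict(X, y, x_q, tau=1.0):
    """Locally weighted linear regression: minimize sum_x K(d(x_q, x)) (f(x) - f_hat(x))^2
    with a Gaussian kernel K, solved by weighted least squares."""
    Xb = np.hstack([np.ones((len(X), 1)), X])               # add intercept term
    xqb = np.concatenate([[1.0], x_q])
    d2 = np.sum((X - x_q) ** 2, axis=1)
    k = np.exp(-d2 / (2 * tau ** 2))                        # kernel weights K(d(x_q, x))
    W = np.diag(k)
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y     # weighted least-squares fit
    return xqb @ beta

# toy usage: noisy sine
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
print(lwr_predict(X, y, np.array([3.0]), tau=0.5))
```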
Radial Basis Function Networks I Global approximation to target function, in terms of linear combination of local approximations Used, e.g., for image classification A different kind of neural network Closely related to distance-weighted regression, but “eager” instead of “lazy”
Radial Basis Function Networks II
where $a_i(x)$ are the attributes describing instance $x$, and

$f(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))$

One common choice for $K_u(d(x_u, x))$ is the Gaussian kernel

$K_u(d(x_u, x)) = e^{-\frac{1}{2\sigma_u^2} d^2(x_u, x)}$
Training Radial Basis Function Networks
Q1: What $x_u$ to use for each kernel function $K_u(d(x_u, x))$?
Scatter uniformly throughout the instance space, or use the training instances (reflects the instance distribution)
Q2: How to train the weights (assume here Gaussian $K_u$)?
First choose the variance (and perhaps the mean) for each $K_u$, e.g., using EM
Then hold $K_u$ fixed and train the linear output layer (efficient methods exist to fit linear functions)
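A minimal sketch of one common training recipe: kernel centers taken from the training instances, a fixed Gaussian width sigma, and the linear output layer fit by least squares; other choices (EM for the variances, uniformly scattered centers) are possible as the slide notes. Names are illustrative.

```python
import numpy as np

def train_rbf(X, y, sigma=1.0):
    """RBF network: centers x_u = training instances, Gaussian kernels with fixed sigma,
    then the linear output layer (w_0, w_u) is fit by least squares."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)        # pairwise squared distances
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * sigma ** 2))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def rbf_predict(X_train, w, x_q, sigma=1.0):
    d2 = np.sum((X_train - x_q) ** 2, axis=1)
    phi = np.concatenate([[1.0], np.exp(-d2 / (2 * sigma ** 2))])
    return phi @ w                                                     # f(x) = w_0 + sum_u w_u K_u(d(x_u, x))

X = np.random.rand(40, 2)
y = np.sin(X[:, 0]) + X[:, 1]
w = train_rbf(X, y, sigma=0.5)
print(rbf_predict(X, w, np.array([0.3, 0.7]), sigma=0.5))
```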
Case-Based Reasoning
Can apply instance-based learning even when $X \neq \mathbb{R}^n$ → need a different "distance" metric
Case-Based Reasoning is instance-based learning applied to instances with symbolic logic descriptions
((user-complaint error53-on-shutdown) (cpu-model PowerPC) (operating-system Windows) (network-connection PCIA) (memory 48meg) (installed-applications Excel Netscape VirusScan) (disk 1gig) (likely-cause ???))
Case-Based Reasoning in CADET
CADET: 75 stored examples of mechanical devices
each training example: ⟨qualitative function, mechanical structure⟩
new query: desired function; target value: mechanical structure for this function
Distance metric: match qualitative function descriptions
Case-Based Reasoning in CADET
Case-Based Reasoning in CADET
Instances represented by rich structural descriptions Multiple cases retrieved (and combined) to form solution to new problem Tight coupling between case retrieval and problem solving Bottom line: Simple matching of cases useful for tasks such as answering help-desk queries Area of ongoing research
Lazy and Eager Learning
Lazy: wait for the query before generalizing
k-NEAREST NEIGHBOR, case-based reasoning
Eager: generalize before seeing the query
Radial basis function networks, ID3, Backpropagation, Naive Bayes, ...
Does it matter?
An eager learner must create a single global approximation
A lazy learner can create many local approximations
If they use the same H, the lazy learner can represent more complex functions (e.g., consider H = linear functions)
Credits
T. M. Mitchell: Machine Learning, McGraw-Hill
I. Witten & E. Frank: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann
T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning, Springer
R. Duda, P. Hart, D. Stork: Pattern Classification, Wiley