
Statistical Artificial Intelligence: Learning from Complex Data

Globally Sparse Discriminant Analysis of High-Dimensional Data

Charles BOUVEYRON
Professor of Statistics
Inria Chair in “Data Science”
Laboratoire LJAD, UMR CNRS 7351
Epione team, Inria Sophia-Antipolis
Université Côte d’Azur
charles.bouveyron@univ-cotedazur.fr
@cbouveyron




Introduction: the AI revolution hasn’t happened yet!

Artificial intelligence is a strategic field of research:

with direct applications in most scientific fields (Medicine, Biology, Astrophysics, Humanities) and probably the greatest impact on innovation and transfer (health, transport, defense).

The recent and impressive NN results should not hide the remaining issues:

deep learning achieves impressive results in a few specific cases and with a high level of supervision,

the use of DL techniques in various fields is promising but not well understood.

“Artificial Intelligence: the revolution hasn’t happened yet”, M. Jordan (UC Berkeley)




Introduction: open problems for AI

Some open problems are critical:

reliability of models and algorithms,

handling data heterogeneity (categorical, functional, networks, images, texts, ...),

unsupervised learning (clustering, dimension reduction),

learning from small data (n small / p large),

Combining statistical theory with deep learning techniques is certainly the future of AI!



Introduction: a new AI team @ UCA/Inria

To address those problems, we are building a new research team at UCA & Inria:

Unsupervised learning

Theory of deep learning

Models & Algorithms

Adaptive & robust learning for AI

Heterogeneous & complex data

Applications & innovation

Figure: Scientific objectives of the (upcoming) Maasai team.


Introduction: a few recent projects

FunLBM: Coclustering of functional data

HDMI: single-image denoising

WAMiC: Deep clustering with mixtures

gsHDDA: classification of HD data


A motivating example: lung cancer diagnosis

The considered data set:

data from n = 87 patients with lung cancer,

p = 636 radiomic features were extracted from CT images,

the objective is to build a classifier able to discriminate 3 subtypes of lesions.

Beyond the diagnosis task, the goal is to isolate some radiomic markers of the lesion subtypes.



Basics of PCA & sparsity: PCA

Let us consider an n × p data matrix X = (x1, ..., xn)^T that one wants to project onto a "good" d-dimensional subspace.

Principal component analysis (PCA):

the optimal subspace is spanned by the top-d eigenvectors of X^T X,

PCA can also be viewed as a factorization into a low-rank decomposition.

Figure: PCA viewed as a low-rank decomposition.
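As a minimal illustration (not from the slides), the low-rank view of PCA can be sketched in a few lines of NumPy: the top-d right singular vectors of the centered data play the role of W, and X is approximated by Y W^T.

```python
# Minimal sketch (illustrative, not the authors' code): PCA as a rank-d
# factorization X ≈ Y W^T, obtained from the truncated SVD of the centered data.
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 100, 20, 3
X = rng.standard_normal((n, p))      # placeholder n x p data matrix

Xc = X - X.mean(axis=0)              # center the columns
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

W = Vt[:d].T                         # p x d loadings: top-d eigenvectors of Xc^T Xc
Y = Xc @ W                           # n x d scores (principal components)
X_hat = Y @ W.T                      # rank-d reconstruction
print(np.linalg.norm(Xc - X_hat))    # approximation error shrinks as d grows
```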


Basics of PCA & sparsity: SPCA

However, regular PCA fails when p is large (Johnstone & Lu ’09):

sparse versions of PCA (SPCA, Zou et al., ’06) have consequently been developed,

sparse PCA regularizes the problem but does not significantly improve the interpretation of the results.

Figure: Sparse PCA viewed as a low-rank decomposition.
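As a hedged pointer (not mentioned on the slide), scikit-learn's SparsePCA yields entrywise-sparse loadings, which illustrates why this kind of sparsity does not yet select variables: zeros are scattered over W rather than aligned on whole rows.

```python
# Illustrative sketch: entrywise-sparse loadings with scikit-learn's SparsePCA.
# Individual entries of W are zeroed, but most variables still load somewhere,
# so no original variable is discarded outright.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))            # placeholder data

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
Y = spca.fit_transform(X)                     # n x d scores
W = spca.components_.T                        # p x d loadings with many zero entries
print((W == 0).mean())                        # fraction of zero entries
print(np.all(W == 0, axis=1).sum())           # variables with an all-zero row (often few)
```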


Basics of PCA & sparsity: gsPPCA

Our objective is to truly perform unsupervised variable selection within PCA:

the projection matrix W should be row-sparse, leading to the globally sparse PCA problem,

this solution identifies the relevant original variables while reducing the dimensionality.

Figure: Globally sparse PCA viewed as a low-rank decomposition.
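A toy contrast (purely illustrative) between entrywise sparsity and the row sparsity targeted here: only a row-sparse W discards whole original variables.

```python
# Toy example: entrywise-sparse vs. row-sparse loadings (illustrative values).
import numpy as np

W_entrywise = np.array([[0.7, 0.0],   # zeros scattered over entries:
                        [0.0, 0.5],   # every variable still loads on some component
                        [0.3, 0.0]])
W_rowsparse = np.array([[0.7, 0.4],   # variable 1 kept
                        [0.0, 0.0],   # variable 2 discarded entirely (zero row)
                        [0.3, 0.5]])  # variable 3 kept

print(np.all(W_entrywise == 0, axis=1))  # [False False False] -> no variable removed
print(np.all(W_rowsparse == 0, axis=1))  # [False  True False] -> variable 2 removed
```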



Probabilistic PCA

Let us consider probabilistic PCA (PPCA, Tipping & Bishop, ’99), which assumes that each observation is generated by the following model:

x = W y + ε    (1)

where y ∼ N(0, Id) is a low-dimensional Gaussian latent vector,

W is a p × d parameter matrix called the loading matrix,

and ε ∼ N(0, σ² Ip) is a Gaussian noise term.

Two remarks:

the PPCA model justifies PCA under a Gaussian assumption,

PPCA allows one to recover the principal components even in the noiseless limit σ → 0!

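A minimal sketch of the generative model (1), with arbitrary illustrative values for W and σ: as σ shrinks, the sample covariance concentrates on the rank-d matrix W W^T, which is the noiseless-recovery remark above.

```python
# Sketch of the PPCA generative model (1): x = W y + eps,
# with y ~ N(0, I_d) and eps ~ N(0, sigma^2 I_p). All values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, p, d, sigma = 500, 10, 2, 0.1

W = rng.standard_normal((p, d))             # loading matrix (p x d)
Y = rng.standard_normal((n, d))             # latent vectors y_i ~ N(0, I_d)
E = sigma * rng.standard_normal((n, p))     # isotropic Gaussian noise
X = Y @ W.T + E                             # rows are the observations x_i

cov = (X - X.mean(0)).T @ (X - X.mean(0)) / n
print(np.round(np.linalg.eigvalsh(cov)[::-1], 3))  # d large eigenvalues, rest ≈ sigma^2
```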


The gsPPCA model

We propose the following model to handle variable selection within PPCA:

x = V W y + V̄ ε1 + V ε2    (2)

where V = diag(v) is such that the matrix V W is row-sparse, leading to global sparsity,

ε1 ∼ N(0, σ1² Ip) and ε2 ∼ N(0, σ2² Ip) are respectively the noises of the inactive and active variables,

and we impose Gaussian priors wij ∼ N(0, α⁻²) on the loadings.

We want to investigate the noiseless case σ2 → 0.

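A hedged sketch of sampling from model (2) (the function name and default values below are my own, for illustration): V = diag(v) keeps the q active variables, V̄ = I − V the inactive ones, and sending σ2 to 0 gives the noiseless regime studied below.

```python
# Illustrative sampler for model (2): x = V W y + Vbar eps1 + V eps2,
# with v in {0,1}^p, V = diag(v), Vbar = I - V. Not the authors' implementation.
import numpy as np

def sample_gsppca(n, p, d, q, sigma1=1.0, sigma2=0.1, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[:q] = 1.0                                      # the q active variables come first
    V, Vbar = np.diag(v), np.diag(1.0 - v)
    W = rng.normal(scale=1.0 / alpha, size=(p, d))   # loadings w_ij ~ N(0, alpha^-2)
    Y = rng.standard_normal((n, d))                  # latent vectors y_i ~ N(0, I_d)
    E1 = sigma1 * rng.standard_normal((n, p))        # noise on the inactive variables
    E2 = sigma2 * rng.standard_normal((n, p))        # noise on the active variables
    X = Y @ (V @ W).T + E1 @ Vbar + E2 @ V           # rows are the observations
    return X, v

# e.g. the setting of the introductory example further below: n=50, p=30, d=5, q=10
X, v = sample_gsppca(n=50, p=30, d=5, q=10)
print(X.shape, int(v.sum()))                         # (50, 30) 10
```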



The gsPPCA model

In the context of the globally sparse PPCA model, we demonstrate that:

Theorem. In the noiseless limit σ2 → 0, x converges in probability to a random variable x̃ whose density is

p(x̃ | v, α, σ1²) = N(x̃_V̄ | 0, σ1 Ip−q) Bessel(x̃_V | 1/α, (d − q)/2).    (3)

This theorem allows us to efficiently compute the noiseless marginal log-likelihood, defined as

L(X, v, α, σ1) = ∑_{i=1}^{n} log P(x̃ = xi | v, α, σ1).

A last (big) issue:

there are 2^p possible models for v!

the direct optimization over v is intractable.



The relaxed gsPPCA model

Our solution:

use a relaxed version of model (2) to propose a family of p ordered models,

only compute the marginal likelihood along the proposed path of models.

The gsPPCA algorithm:

input: a data matrix X ∈ R^{n×p}

2 main steps:

a VEM algorithm on a relaxed model for finding a path of candidate variables,

model selection using (3) along the path of models for finding the relevant variables.

output: the number q of relevant variables and the sparsity pattern v ∈ {0, 1}^p.
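A schematic sketch of the second step under stated assumptions: `variable_order` is the ranking produced by the VEM step and `log_evidence` is a placeholder for the noiseless marginal log-likelihood of (3), which is not implemented here.

```python
# Schematic model selection along the path of p nested models (placeholder code):
# activate the q top-ranked variables, score each candidate with the marginal
# log-likelihood, and keep the best number q and sparsity pattern v.
import numpy as np

def select_sparsity_pattern(X, variable_order, log_evidence):
    p = X.shape[1]
    best_q, best_v, best_score = None, None, -np.inf
    for q in range(1, p + 1):                 # p ordered candidate models
        v = np.zeros(p, dtype=int)
        v[variable_order[:q]] = 1             # the q top-ranked variables are active
        score = log_evidence(X, v)            # noiseless marginal log-likelihood (3)
        if score > best_score:
            best_q, best_v, best_score = q, v, score
    return best_q, best_v
```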


An introductory example

As an educational example, we simulated a data set according to model (2), with n = 50, p = 30, d = 5 and q = 10, such that v = (1, ..., 1, 0, ..., 0), with q ones followed by p − q zeros.

Figure: Variable selection with gsPPCA on the introductory example (left: Values vs. Variables; right: Evidence vs. Variables).


Radiomics: lung cancer diagnosis

The considered data set:

data from n = 87 patients with lung cancer,

p = 636 radiomic features were extracted from CT images,

the objective is to build a classifier able to discriminate 3 subtypes of lesions.

Experimental setup:

the cohort was randomly split 50 times into a learning subset (57 patients) and a test subset (30 patients).

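A small sketch of this evaluation protocol (illustrative only; `fit_and_score` stands in for fitting a classifier and computing its correct classification rate on the held-out patients):

```python
# Illustrative evaluation protocol: 50 random splits of the 87 patients into
# a learning set of 57 and a test set of 30. Not the authors' code.
import numpy as np

def repeated_splits(X, y, fit_and_score, n_repeats=50, n_train=57, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]                               # n = 87 patients here
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        scores.append(fit_and_score(X[train], y[train], X[test], y[test]))
    return np.array(scores)                      # accuracies over the 50 splits
```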


Radiomics: classification results

Figure: Correct classification rate (percentage of observations correctly identified) for gsHDDA and its competitors (LDA, sLDA, pLDA, ASDA, PLS-DA, sPLS-DA, HDDA, sHDDA).


Radiomics: variable selection

Figure: Variable selection for each class using gsHDDA (left: panels sHDDA-class 1, sHDDA-class 2 and sHDDA-class 3) and with sPLS-DA (right); y-axis: percentage of tests where each variable is selected.



Conclusion

Take home messages:

AI is just rising! The real AI revolution is still ahead!

academic research is a key element in solving the (numerous) open problems,

we believe in an enriching combination of statistical and deep learning tools.

References:

C. Bouveyron, P. Latouche and P.-A. Mattei, Bayesian Variable Selection for Globally Sparse Probabilistic PCA, Electronic Journal of Statistics, in press, 2018.

F. Orlhac, P.-A. Mattei, C. Bouveyron and N. Ayache, Class-specific Variable Selection in High-Dimensional Discriminant Analysis through Bayesian Sparsity, Journal of Chemometrics, in press, 2019.


