Thierry NAGELLEN — Orange — Ubiquitous big data for knowledge extraction from secured data silos


Ubiquitous Big Data for knowledge extraction from secured data silos
Thierry Nagellen – Orange Labs
thierry.nagellen@orange.com


Why Ubiquitous Big Data?


Orange

Thierry Nagellen - 2018


AI: beyond software, some hardware constraints



"Our work in this area was initially motivated by our aim to reduce service response times and resource usage in our cloud environment which operates globally and at scale... Managing data access locality in geo-distributed systems is important because doing so can significantly improve data access latencies, given that intra-datacenter communication latencies are two orders of magnitude smaller than cross-datacenter communication latencies: e.g. 1ms vs 100ms."
Facebook - Akkio


Ref: https://www.usenix.org/conference/osdi18/presentation/annamalai



Reality of constraints: Energy, Latency...


Ref: http://web.eecs.umich.edu/~jahausw/publications/kang2017neurosurgeon.pdf



Lambda architecture: historical data + streaming data (Nathan Marz – 2011)
Objective: Maintaining code that needs to produce the same result in two complex distributed systems
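As a rough illustration of the lambda idea, the sketch below keeps a batch view over historical events and a speed view over recent events, and merges them at query time. The event format, the function names and the cutoff value are assumptions made for this example; they do not come from the talk or from any specific framework. Both layers reuse one counting function here, which is a simplification: when the two layers run on different engines, the same logic usually has to be written and maintained twice.

```python
from collections import Counter

def count_by_key(events):
    """Shared logic: number of events per key. Events are (key, timestamp) pairs."""
    return Counter(key for key, _timestamp in events)

def batch_view(master_dataset, cutoff):
    """Batch layer: recompute the view from all historical events before the cutoff."""
    return count_by_key(e for e in master_dataset if e[1] < cutoff)

def speed_view(stream, cutoff):
    """Speed layer: cover only the events the last batch run has not seen yet."""
    return count_by_key(e for e in stream if e[1] >= cutoff)

def query(key, batch, speed):
    """Serving layer: merge both views at query time."""
    return batch[key] + speed[key]

# Hypothetical events: (page, timestamp)
events = [("page_a", 1), ("page_b", 2), ("page_a", 3), ("page_a", 5)]
cutoff = 3  # the last batch run covered timestamps < 3

batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
print(query("page_a", batch, speed))  # 3
```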


Ref: https://dzone.com/articles/lambda-architecture-with-apache-spark



Some emerging trends

SOLID (Tim Berners-Lee): re-decentralize the web
Smart IoT
Embedded narrow AI



Federated Learning (Google)
1. A subset of existing clients is selected, each of which downloads the current model. (M -> A)
2. Each client in the subset computes an updated model based on their local data. (A -> B)
3. The model updates are sent from the selected clients to the server. (B -> C -> M)
4. The server aggregates these models (typically by averaging) to construct an improved global model. (M)
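The four steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Google's implementation: the model is a toy linear fit y = w * x + b, each client's data is a short hard-coded list, and the names local_update and federated_round, the learning rate and the selection fraction are all invented for this example. The raw data never leaves the client; only the updated weights are returned to the server.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Step 2: a client refines the downloaded model on its own data
    (gradient descent on the mean squared error of y = w * x + b)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_round(global_weights, clients, fraction, rng):
    # Step 1: select a subset of clients; each receives the current model.
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, k)
    # Steps 2-3: each selected client trains locally and sends back its model.
    updates = [(local_update(global_weights, data), len(data)) for data in selected]
    # Step 4: the server averages the models, weighted by local sample count.
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return (w, b)

def make_client(xs):
    """Hypothetical local dataset drawn from y = 2x + 1."""
    return [(x, 2 * x + 1) for x in xs]

clients = [
    make_client([0.0, 0.4, 0.8]),
    make_client([0.1, 0.5, 0.9]),
    make_client([0.2, 0.6, 1.0]),
    make_client([0.0, 0.5, 1.0]),
]

rng = random.Random(0)
weights = (0.0, 0.0)
for _ in range(100):
    weights = federated_round(weights, clients, fraction=0.5, rng=rng)
print(weights)  # approaches (2.0, 1.0)
```

Weighting the average by the number of local samples is one common choice; a plain unweighted mean would also match the "typically by averaging" wording of step 4.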


Ref: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html



Parameter Server: split the data – centralize the model

Spark Summit 2016 talk by Erik Ordentlich (Yahoo) and Badri Bhaskar (Yahoo)
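A minimal sketch of "split the data, centralize the model", assuming a single-weight model y = w * x and two hard-coded data shards. The ParameterServer class, its pull/push methods and worker_gradient are hypothetical names for this illustration and are not taken from the Yahoo talk or from any library. Workers are simulated sequentially in one process; in a real deployment each shard would live on a different machine and only weights and gradients would travel over the network.

```python
class ParameterServer:
    """Holds the single, central copy of the model (here one weight w)."""

    def __init__(self, w=0.0, lr=0.05):
        self.w = w
        self.lr = lr

    def pull(self):
        """Workers fetch the current weight."""
        return self.w

    def push(self, gradients):
        """Apply the average of the gradients sent by the workers."""
        self.w -= self.lr * sum(gradients) / len(gradients)

def worker_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

# The data is split: each worker only ever sees its own shard (y = 3x).
shards = [
    [(0.5, 1.5), (1.0, 3.0)],
    [(1.5, 4.5), (2.0, 6.0)],
]

server = ParameterServer()
for _ in range(50):
    w = server.pull()
    grads = [worker_gradient(w, shard) for shard in shards]
    server.push(grads)
print(server.w)  # approaches 3.0
```

This loop is synchronous: the server waits for every worker before updating. An asynchronous variant would apply each gradient as soon as it arrives.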



Parameter server with TensorFlow (Deep learning example)


Ref: https://medium.com/polyaxon/distributed-deep-learning-with-polyaxon-6d9f1288e4b8
Ref: https://docs.riseml.com/guide/advanced/distributed_tensorflow/



IBM: Centralized vs decentralized (Parallel SGD vs Asynchronous SGD)


Ref: https://www.ibm.com/blogs/research/2017/12/deep-learning-training-10x-improvement/



What about Data & Model parallelization?


Ref: https://www.youtube.com/watch?v=vwXolaBQfaU



But how to discover data characteristics?

Our main challenges:
  Find relevant data to solve my problem (from the user's point of view)
  No time to describe the data
  Solve the vocabulary issue: (a) per vertical silo, (b) across multiple languages
  Reduce data preparation time when applying algorithms automatically

Some answers:
  Semantic search engine
  Natural-language interface for data description
  Semantization chain
  New approach for structured data: Probabilistic Relational Model

"Probabilistic relational models (PRMs) are a rich representation language for structured statistical models. They combine a frame-based logical representation with probabilistic semantics based on directed graphical models (Bayesian networks)."


Ref: https://ai.stanford.edu/~koller/Papers/Getoor+al:SRL07.pdf



Orange Dataforum: semantization chain


Languages: English, French, Dutch, Spanish, Slovak, Romanian, German



Probabilistic Relational Model to combine with semantics


Ref: https://www.slideshare.net/AnthonyCoutant/



Thanks!

Thierry Nagellen – Orange Labs thierry.nagellen@orange.com

