Bruno BACHIMONT — Université de Technologie de Compiègne — Building Science on Data : Epistemologica


Building Science on Data: Epistemological Questions for a Paradigm Shift
Bruno Bachimont
Sorbonne Université
Costech, Université de technologie de Compiègne


Contents
•  An approach to reality
   o  A paradigm shift?
•  Some questions:
   o  An epistemological question:
      •  A new version of Meno's paradox?
   o  A hermeneutical question:
      •  A new nominalism?
   o  A semantic question:
      •  Is meaning an ergodic system?

Sophi.I.A.

08/11/18



An approach to reality
A new paradigm



3 main properties
•  Big
   o  Huge amount of data: need for automatic treatments.
   o  The whole as such: specific properties of the mass.
•  Dynamic
   o  Automatic and periodic production of data
•  Heterogeneous
   o  The digital as a single medium for manipulating different data:
      •  Data are co-manipulated even if they differ in nature or in meaning



4V

http://www.datasciencecentral.com/profiles/blogs/data-veracity



Google Flu

France:



Discussions



New predictions / vaticinations
•  New scientific paradigm for humanities (Manovich)
•  New empiricism: replacement of theory by data exploration (Anderson)
•  New management tool: anchoring decisions in data objectivity



Cultural analytics
•  Term coined by Lev Manovich
•  Three steps:
   o  Data collection
   o  Statistical analysis
   o  Dynamic visualization

[Diagram: DATA → Analysis → Visualization]


Linkfluence.net



A new paradigm?



Epistemological question



A triple phenomenotechnique
•  Collecting and formatting data
   o  Data are collected by means of tools that select and format them in order to capture them.

•  Transforming and analyzing data
   o  Data are transformed in order to be analyzed;
   o  Several tools (machine learning, deep learning, statistical and probabilistic analyses, etc.) are used

•  Visualizing and presenting results
   o  Graphic tools are used to present meaningful results



The collection problem
•  Data are not reality but a construction based on it:
   o  Data are the consequence of some activity (of the user).
   o  Collected data are not representative as such: their representativeness has to be made explicit.

•  Double data curation:
   o  Data are records of some activity
      •  Their link to that activity should be qualified;
      •  The data collection should be made explicit with respect to this activity: to what extent is the collection really representative?
   o  Data are transformed and formatted through collecting:
      •  For example, texts become bags of words (without any structure)
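
The loss of structure in a bag of words can be seen in a minimal Python sketch. The helper `bag_of_words` and the two sentences are illustrative assumptions, not taken from the talk, and the tokenization (lowercasing, splitting on whitespace) is deliberately naive.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: count word occurrences, discarding word order."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings (illustrative examples).
a = bag_of_words("the dog bites the man")
b = bag_of_words("the man bites the dog")

# True when both sentences collapse to the same bag of words.
same_bag = a == b
print(same_bag)
print(sorted(a.items()))
```

Once the text has been formatted this way, who bites whom can no longer be recovered from the data: the formatting step has already decided which differences count.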



The treatment problem

Uninterpretable intermediate steps



Indistinguishable distinctions



The visualization question

•  Graphical metaphor:
   o  Visualization has semantics of its own: it may suggest meaning with no guarantee of how that meaning relates to the data

•  Interpretation model:
   o  Graphical semiotics is in one-to-one correspondence with the data: it reflects the systematicity and semantics of the data.
   o  Exploring with visualization tools amounts to exploring the data.



Images



Mythologies…

© Vincent Minier

« The Pillars of Creation »


Aporia: Meno's paradox
•  How is it possible to learn something new?
   o  If it is really new, how can we recognize it?
   o  If it is recognizable, it is not new.

•  Big data are so complex that interpreting them amounts to recognizing something we already know:
   o  Treatments are complex and not interpretable;
   o  Visualization shows properties that are only loosely connected to the data.



A hermeneutical question
A new nominalism?



Heritage: epistemology of measurement

•  The phenomenotechnique that produces measurements and the theories that turn them into scientific facts are homogeneous:
   o  The very same theories are used to build measurement instruments and to interpret their outputs

•  Such an epistemology is the result of centuries of work:
   o  Measurement, calculation, mathematization
   o  Scientific imagery (cf. Daston & Galison)



Toward a new epistemology: an epistemology of data
•  Bridging three gaps:
   o  Gap between data and the activities they are records of;
   o  Gap between the nature of data and the treatments applied to them;
   o  Gap between a displayed objectivity and the properties of the data.



A new nominalism?
•  Nominalism:
   o  Can be defined as the criticism of the analogy established between language and reality

•  Historically:
   o  First nominalist revolution in the 14th century:
      •  Language is no longer an access to nature: it should be replaced by experiment and calculation.
   o  A second nominalist revolution today?
      •  Language is no longer an access to culture: it should be replaced by data collection and exploration.



Roscelin of Compiègne
•  1050 (Compiègne) – 1121 (Besançon)
•  Words are « flatus vocis »



William of Ockham



First nominalist revolution
•  Criticism of medieval realism
   o  Propositions are true insofar as their terms and their grammatical structure mimic the structure of reality
      •  Terms correspond to essences;
      •  Syntax corresponds to dependencies between essences;
         o  E.g. "man is an animal"
      •  Every linguistic difference refers to a real difference.

•  Ockham (with others) criticizes this vision:
   o  The world is composed of individuals: no essences
   o  Individuals are of two kinds: substance and property.



Second nominalist revolution
•  Huge databases
   o  Contents:
      •  E.g. 2 million digitized hours of video at INA (French legal deposit for TV and radio)
      •  E.g. YouTube
      •  Etc.
   o  Data:
      •  Commercial metadata (e.g. Amazon)
      •  Linked data
      •  Etc.



Second nominalist revolution
•  Understanding culture and society is no longer an interpretative/hermeneutical process but:
   o  A statistical and quantitative analysis;
   o  A qualitative and perceptual visualisation.

•  Language is no longer a common environment but a useful syntax to be manipulated:
   o  Words are syntactical tokens but not meaningful signs.



A new approach to meaning?
Between ergodicity and singularity



Computation turns meaning into data
•  A datum:
   o  Is a « datum » (i.e. given): without history or origin, it is simply there, with its bold positivity;
   o  Is a starting point, on which we rely to compute something;
   o  Can be reduced to the fact of being computable or manipulable;
   o  Is a surrogate reality:
      •  Data are the very reality for the computations performed



Is the world a datum?
•  By hypothesis, data are considered to be homogeneous in order to be computed together:
   o  Co-manipulation;
   o  To be distinguished from commensurability

•  Computations are interpretable only if the variability of reality is coherent with the variability of the data:
   o  Does variability in meaning correspond to the computed variability?



Two variability schemes
[Diagram labels: Singular, What is given, Particular, General]


Ergodicity in a picture

A forest, as a set of trees in their various possible states
=
A single tree in its temporal evolution
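
The picture can be made concrete with a small simulation, offered as a sketch under stated assumptions rather than as the author's method. Informally, a process is ergodic when the average over time of one trajectory (one tree) matches the average over the ensemble at one instant (the forest). The two toy processes below, their names, the seed and the sample size are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(0)   # fixed seed (arbitrary) so runs are repeatable
N = 10_000               # number of trees, and number of time steps


def ergodic_tree(steps):
    """Toy ergodic process: the tree's state (0 or 1) is redrawn at each step."""
    return [rng.randint(0, 1) for _ in range(steps)]


def frozen_tree(steps):
    """Toy non-ergodic process: the state is drawn once and never changes."""
    state = rng.randint(0, 1)
    return [state] * steps


def mean(values):
    return sum(values) / len(values)


# One tree followed through time.
time_avg_ergodic = mean(ergodic_tree(N))
time_avg_frozen = mean(frozen_tree(N))

# A forest observed at a single instant: one state per tree.
forest_avg_ergodic = mean([ergodic_tree(1)[0] for _ in range(N)])
forest_avg_frozen = mean([frozen_tree(1)[0] for _ in range(N)])

print(time_avg_ergodic, forest_avg_ergodic)  # close to each other
print(time_avg_frozen, forest_avg_frozen)    # far apart
```

In the frozen case, observing a single tree for any length of time says nothing about the forest, and the forest average describes no individual tree. This is the kind of situation at stake when a phenomenon is not ergodic.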



Ergodicity
•  The problem:
   o  Reality is not ergodic (cf. D. North)
   o  Meaning is not ergodic (Hermeneutic tradition).

•  Two approaches:
   o  Naturwissenschaften:
      •  Difference is only an approximation, an error to be reduced in order to reach reality itself:
         o  Difference should be eliminated: it is contingency;
   o  Kulturwissenschaften (Weber, Rastier, Rickert):
      •  Difference is essential, since the phenomenon is defined by its difference from the convention / norm, which is only a descriptive tool:
         o  Difference is what is essential.



Consequences
•  If meaning is not ergodic, data cannot simply be analysed to understand culture and the humanities;
•  Need for
   o  An epistemological interpretation: big data as heuristics
      •  Assessing the ergodic hypothesis regarding the studied phenomena.
   o  A hermeneutical interpretation: storytelling about big data
      •  Recovering the singularity behind computed results by going back to human and social facts.



Conclusion
•  Big data
   o  Are probably a real paradigm shift;
   o  Should be decomposed into several tasks:
      •  Data curation:
         o  Representativity, semiotic bias;
      •  Result interpretation:
         o  Data epistemology: assessing the ergodic hypothesis;
         o  Data hermeneutics: recovering singularity, from results back to human experience.
