Building Science on Data: Epistemological Questions for a Paradigm Shift
Bruno Bachimont
Sorbonne Université
Costech, Université de technologie de Compiègne
Contents
• An approach to reality
  o A paradigm shift?
• Some questions:
  o An epistemological question:
    • A new version of the Meno paradox?
  o A hermeneutical question:
    • A new nominalism?
  o A semantic question:
    • Is meaning an ergodic system?
Sophi.I.A.
08/11/18
An approach to reality
A new paradigm
3 main properties
• Big
  o Huge amount of data: need for automatic processing.
  o The whole as such: specific properties of the mass.
• Dynamic
  o Automatic and periodic production of data
• Heterogeneous
  o The digital as a single medium for manipulating different data:
    • Data are co-manipulated even if they differ in nature or in meaning
4V
http://www.datasciencecentral.com/profiles/blogs/data-veracity
Google Flu
France:
Discussions
New predictions / vaticinations
• New scientific paradigm for the humanities (Manovich)
• New empiricism: replacement of theory by data exploration (Anderson)
• New management tool: anchoring decisions in the objectivity of data
Cultural analytics
• Term coined by Lev Manovich
• Three steps:
  o Data collection
  o Statistical analysis
  o Dynamic visualization
[Diagram: Data, Analysis, Visualization]
Linkfluence.net
A new paradigm?
Epistemological question
A triple phenomenotechnique
• Collecting and formatting data
  o Data are collected by means of tools that select and format them in order to capture them.
• Transforming and analyzing data
  o Data are transformed in order to be analyzed;
  o Several tools (machine learning, deep learning, statistical and probabilistic analyses, etc.) are used.
• Visualizing and presenting results
  o Graphic tools are used to present meaningful results.
The collection problem
• Data are not reality but a construction based on it:
  o Data are the consequence of some activity (of the user).
  o Collected data are not representative as such; their representativeness must be made explicit.
• Double data curation:
  o Data are records of some activity
    • Their link to the activity should be qualified;
    • Data collection should be made explicit with respect to this activity: to what extent is the collection really representative?
  o Data are transformed and formatted through collection:
    • For example, texts become bags of words (without any structure)
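The loss of structure in a bag of words can be illustrated with a minimal sketch. It is not taken from the talk: the function name `bag_of_words`, the whitespace tokenization and the two example sentences are assumptions chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count each token.

    Word order and syntax are discarded; only token frequencies remain.
    (Hypothetical helper, assumed for this illustration.)
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings (illustrative examples).
sentence_a = "the dog bites the man"
sentence_b = "the man bites the dog"

bag_a = bag_of_words(sentence_a)
bag_b = bag_of_words(sentence_b)

# The sentences differ, yet their bags of words are identical:
# the formatting step has erased who bites whom.
print(sentence_a == sentence_b)
print(bag_a == bag_b)
```

Any analysis run on these bags cannot distinguish the two sentences, which is one concrete sense in which collected data are a construction rather than the activity itself.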
The processing problem
Uninterpretable intermediate steps
Indistinguishable distinctions
The visualization question
• Graphical metaphor:
  o Visualization has a semantics of its own: it may suggest meaning with no guarantee that this meaning relates to the data.
• Interpretation model:
  o Graphical semiotics is in one-to-one correspondence with the data: it reflects the systematicity and semantics of the data.
  o Exploring visualization tools amounts to exploring the data.
Images
Mythologies…
© Vincent Minier
"The Pillars of Creation"
Aporia: the Meno paradox
• How is it possible to learn something new?
  o If it is really new, how can we recognize it?
  o If it is recognizable, it is not new.
• Big data are so complex that interpreting them amounts to recognizing something we already know:
  o Processing is complex and not interpretable;
  o Visualization shows properties only loosely connected to the data.
A hermeneutical question
A new nominalism?
Heritage: epistemology of measurement
• The phenomenotechnique that produces measurements and the theories that transform them into scientific facts are homogeneous:
  o The very same theories are used to build measurement instruments and to interpret what they produce.
• Such an epistemology is the result of centuries of work:
  o Measurement, calculation, mathematization
  o Scientific imagery (cf. Daston & Galison)
Toward a new epistemology: an epistemology of data
• Bridging three gaps:
  o Gap between data and the activities they are records of;
  o Gap between the nature of data and the processing applied to them;
  o Gap between the objectivity displayed and the properties of the data.
A new nominalism?
• Nominalism:
  o Can be defined as the criticism of the analogy established between language and reality
• Historically:
  o First nominalist revolution in the 14th century:
    • Language is no longer an access to nature: it should be replaced by experiment and calculation.
  o A second nominalist revolution today?
    • Language is no longer an access to culture: it should be replaced by data collection and exploration.
Roscelin of Compiègne
• 1050 (Compiègne) – 1121 (Besançon)
• Words are "flatus vocis"
William of Ockham
First nominalist revolution
• Criticism of medieval realism
  o Propositions are true insofar as their terms and their grammatical structure mirror the structure of reality
    • Terms correspond to essences;
    • Syntax corresponds to dependencies between essences;
      o E.g. "man is an animal"
    • Every linguistic difference refers to a real difference.
• Ockham (with others) criticizes this vision:
  o The world is composed of individuals: there are no essences
  o Individuals are of two kinds: substance and property.
Second nominalist revolution
• Huge databases
  o Contents
    • E.g. 2 million hours of digitized video at INA (French legal deposit for TV and radio)
    • E.g. YouTube
    • Etc.
  o Data:
    • Commercial metadata (e.g. Amazon)
    • Linked data
    • Etc.
Second nominalist revolution
• Understanding culture and society is no longer an interpretative/hermeneutical process but:
  o A statistical and quantitative analysis;
  o A qualitative and perceptual visualization.
• Language is no longer a common environment but a useful syntax to be manipulated:
  o Words are syntactic tokens, not meaningful signs.
A new approach to meaning?
Between ergodicity and singularity
Computation turns meaning into data
• A datum:
  o Is a "datum" (i.e. given): without history or origin, a datum is simply there, with its bold positivity;
  o Is a starting point, on which we rely to compute something;
  o Can be reduced to the fact of being computable or manipulable;
  o Is a surrogate reality:
    • Data are the very reality for the computations performed
Is the world a datum?
• By hypothesis, data are considered homogeneous so that they can be computed together:
  o Co-manipulation;
  o To be distinguished from commensurability.
• Computations are interpretable only if the variability of reality is coherent with the variability of the data:
  o Does the variability of meaning correspond to the computed variability?
Two variability schemes
[Diagram labels: Singular, What is given, Particular, General]
Ergodicity in a picture
A forest, as a set of trees in their various possible states = a single tree in its temporal evolution
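The forest/tree picture can be made concrete with a small simulation. This is a sketch under stated assumptions, not the author's model: each "tree" is reduced to a hypothetical two-state history (0 or 1), and the names `simulate_forest`, `time_average` and `ensemble_average` are invented for the illustration. In the ergodic case every tree redraws its state at each step. In the non-ergodic case each tree draws a state once and keeps it, so its history is singular.

```python
import random


def simulate_forest(n_trees, n_steps, ergodic, seed=0):
    """Return one state history (list of 0/1) per tree.

    ergodic=True : each tree redraws its state at every time step.
    ergodic=False: each tree draws one state at the start and keeps it.
    (Hypothetical toy model, assumed for this illustration.)
    """
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        if ergodic:
            history = [rng.randint(0, 1) for _ in range(n_steps)]
        else:
            history = [rng.randint(0, 1)] * n_steps
        forest.append(history)
    return forest


def time_average(forest, tree=0):
    """Average state of a single tree over its whole history."""
    history = forest[tree]
    return sum(history) / len(history)


def ensemble_average(forest, step=0):
    """Average state of all trees at one moment in time."""
    return sum(history[step] for history in forest) / len(forest)


ergodic_forest = simulate_forest(1000, 1000, ergodic=True)
frozen_forest = simulate_forest(1000, 1000, ergodic=False)

# Ergodic case: one tree followed over time looks like the whole forest.
print(time_average(ergodic_forest), ensemble_average(ergodic_forest))

# Non-ergodic case: the forest average stays near one half, but a single
# tree's time average is exactly 0.0 or 1.0.
print(time_average(frozen_forest), ensemble_average(frozen_forest))
```

In the second case, statistics computed over the forest say little about any individual tree, which is the situation the ergodic hypothesis rules out.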
Ergodicity
• The problem:
  o Reality is not ergodic (cf. D. North)
  o Meaning is not ergodic (hermeneutic tradition).
• Two approaches:
  o Naturwissenschaften:
    • Difference is only an approximation, an error, to be reduced in order to find the very reality:
      o Difference should be eliminated: it is contingency;
  o Kulturwissenschaften (Weber, Rastier, Rickert):
    • Difference is essential, since it is the very definition of the phenomenon in its difference from the convention / norm, which is only a descriptive tool:
      o Difference is what is essential.
Consequences
• If meaning is not ergodic, data cannot simply be analysed to understand culture and the humanities;
• Need for
  o An epistemological interpretation: big data as heuristics
    • Assessing the ergodic hypothesis for the phenomena studied.
  o A hermeneutical interpretation: storytelling about big data
    • Recovering singularity behind computed results, by coming back to human and social facts.
Conclusion
• Big data
  o Are probably a real paradigm shift;
  o Should be decomposed into several tasks:
    • Data curation:
      o Representativeness, semiotic bias;
    • Result interpretation:
      o Data epistemology: assessing the ergodic hypothesis;
      o Data hermeneutics: recovering singularity, from results back to human experience.