Computational and Evolutionary Aspects of Language


By Melih Sözdinler, for CMPE 58B: Language Tree for Indo-European Languages


Introduction 

A review paper covering the mathematical description of language on three different levels.

Computational:

− A1: Formal Language Theory
− A2: Learning Theory

Evolutionary:

− A3: Evolutionary Dynamics

We will mainly address a few questions.


Introduction 

Questions that come to mind:

− What is language?
− What is grammar?
− What is the difference between learning language and learning other generative systems?
− In what sense is there a logical necessity, and is it genetically determined?

We will cover these questions gradually through the presentation.




Before we get into it

− The structure of the brain needs to be understood.
− Efforts to translate written text, such as “Google Translate” and “Yahoo Babel Fish”, exist and are working.


Before we get into it

− Can there be a machine that takes a sentence from any language and decides: “The sentence belongs to language X1 and I can understand it”?
  − No machine can do this.
  − “Google Translate” and “Babel Fish” probably memorize phrases in every language for which they offer a translation service.


Before we get into it

Chomsky Hierarchy




A1: Formal Language Theory

Language is:

− The mode of communication
− A crucial part of our behaviour
− A cultural object that defines our social identity

Language has rules. Roughly, word order differs between languages:

− In English: “We went to the school”
− In Turkish word order: “We the school went to”

There are always specific rules for generating valid and meaningful linguistic structures*.

*The scientific study of natural languages** is called “linguistics”.
**A natural language is one that arises as the result of the innate facility for language possessed by the human intellect.


A1: Formal Language Theory  

− L is a language.
− {S, A, B, F} are the 'non-terminals'.
− For simplicity, the alphabet consists of two symbols (0 and 1).
− The grammar generates sentences of the form 0 1^m 0 1^n 0.

[Figure: a finite language and an infinite language]
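As a rough illustration of how rewrite rules generate sentences, the sketch below uses a hypothetical right-linear grammar over the non-terminals S, A, B, F. The rules themselves are an assumption made for this example (the slide's rule list did not survive extraction); they were chosen so that every derived sentence has the form 0 1^m 0 1^n 0.

```python
import random
import re

# Hypothetical rewrite rules (an assumption, not copied from the slide).
# Non-terminals: S, A, B, F. Terminals: 0 and 1. F rewrites to the empty string.
RULES = {
    "S": ["0A"],
    "A": ["1A", "0B"],
    "B": ["1B", "0F"],
    "F": [""],
}


def derive(rng, max_steps=50):
    """Rewrite the trailing non-terminal until only terminals remain."""
    sentence = "S"
    for _ in range(max_steps):
        symbol = sentence[-1:]  # in a right-linear grammar the non-terminal is last
        if symbol not in RULES:
            return sentence  # no non-terminal left: the derivation is complete
        sentence = sentence[:-1] + rng.choice(RULES[symbol])
    return None  # derivation did not finish within max_steps


def has_expected_form(sentence):
    """Check the form 0 1^m 0 1^n 0 with m, n >= 0."""
    return re.fullmatch(r"01*01*0", sentence) is not None


rng = random.Random(0)
samples = [s for s in (derive(rng) for _ in range(10)) if s is not None]
```

Each entry of `samples` is one sentence of the language; because m and n are unbounded, the language is infinite even though the rule list is finite.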


A1: Formal Language Theory

− There are countably infinitely many grammars: any finite list of “rewrite rules”* can be encoded by an integer.
− A language that can be specified by such a grammar is called “computable”.
− Computable languages can be represented by machines called “Turing machines”.

*“Rewrite rule”: a rule stating that a certain string can be rewritten as another string.
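The counting argument can be made concrete with a small sketch: write the rule list as text and read the bytes of that text as one integer. The separators `>` and `;` and the function names are choices made for this example only, and the sketch assumes rule symbols are ASCII and never contain those two separators.

```python
def encode(rules):
    """Map a finite list of rewrite rules to a single non-negative integer."""
    text = ";".join(f"{lhs}>{rhs}" for lhs, rhs in rules)
    # The leading 0x01 byte keeps the encoding unambiguous, even for an empty list.
    return int.from_bytes(b"\x01" + text.encode("ascii"), "big")


def decode(number):
    """Invert encode(): recover the list of rewrite rules from the integer."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    text = raw[1:].decode("ascii")
    return [tuple(rule.split(">")) for rule in text.split(";")] if text else []


# Hypothetical rule list used only to exercise the encoding.
example_rules = [("S", "0A"), ("A", "1A"), ("A", "0B"), ("F", "")]
example_code = encode(example_rules)
```

Since every grammar maps to a distinct integer, the grammars can be listed one after another, which is what "countably infinite" means here.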


A1: Machines for Languages

Regular Languages – Finite State Automata

Context-Free Languages – Push-down Automata

Context-Sensitive Languages – Turing Machines with bounded tape (linear bounded automata)
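A minimal sketch of the first two machine types, using two textbook languages that are not taken from the slides: a*b* (regular) and a^n b^n (context-free but not regular). The finite-state recognizer needs only its current state, while the push-down recognizer needs a stack to match each b against an earlier a.

```python
def accepts_regular(sentence):
    """Finite-state automaton for a*b*: the only memory is the current state."""
    state = "A"  # "A": still reading a's, "B": reading b's
    for ch in sentence:
        if state == "A" and ch == "a":
            state = "A"
        elif ch == "b":
            state = "B"
        else:
            return False  # an 'a' after a 'b', or an unknown symbol
    return True


def accepts_context_free(sentence):
    """Push-down automaton for a^n b^n (n >= 0): the stack holds unmatched a's."""
    stack = []
    seen_b = False
    for ch in sentence:
        if ch == "a" and not seen_b:
            stack.append("a")
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()
        else:
            return False  # wrong order, too many b's, or an unknown symbol
    return not stack  # accept only if every a was matched by a b
```

For example, `accepts_regular("aab")` is true, whereas `accepts_context_free("aab")` is false because one a stays unmatched on the stack.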


A1: Chomsky Hierarchy

− Finite-state grammars are a subset of context-free grammars.
− Context-free grammars are a subset of context-sensitive grammars.
− Context-sensitive grammars are a subset of phrase-structure grammars, which are Turing complete.


A1: Natural Languages (NL)

− Natural languages are infinite: it is impossible to imagine a finite list that contains all sentences in Turkish.
− Finite-state grammars are inadequate to cover NL.
− The fundamental structures of NL are trees.
− A tree is the derivation of a sentence within the rule system of a particular grammar.
− Trees may give rise to ambiguity: more than one tree can be associated with a given sentence.
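Ambiguity can be checked mechanically by counting derivation trees. The sketch below does this for a deliberately tiny, hypothetical grammar in Chomsky normal form (S -> S S | "a"), which is an assumption made for illustration and not a grammar of any natural language. A sentence is ambiguous when its count is greater than one.

```python
from functools import lru_cache

# Hypothetical grammar in Chomsky normal form: S -> S S | "a"
BINARY_RULES = {"S": [("S", "S")]}
LEXICAL_RULES = {"S": ["a"]}


def count_trees(words, start="S"):
    """Count the distinct derivation trees of `words` rooted at `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of trees in which `symbol` derives words[i:j].
        if j - i == 1:
            return int(words[i] in LEXICAL_RULES.get(symbol, []))
        total = 0
        for left, right in BINARY_RULES.get(symbol, []):
            for k in range(i + 1, j):  # every split point between the two children
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words)) if words else 0
```

With this grammar, `count_trees("aaa")` finds two trees, one grouping the first two words and one grouping the last two, so the sentence is ambiguous.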


A2: Learning Theory

− One can define a grammar by deciding which trees it accepts.
− This connects the Chomsky hierarchy with “learning theory”.
− Learning differs from memorization: we can understand and produce sentences that we have never heard or used before.


A2: Paradox of Language Acquisition

− From the “environmental input”, the child constructs an internal representation of the underlying grammar.
− ‘Poverty of stimulus’: the environmental input does not uniquely specify the grammatical rules (Chomsky 1972).
− ‘The paradox of language acquisition’: children of the same speech community nevertheless reliably grow up to speak the same language (Wexler 1980).


A2: How Children Learn the Correct Grammar

− There is a restricted set of grammars.
− The theory of this restricted set is “universal grammar” (UG).
− Formally, UG is not just a grammar; it is a theory of a collection of grammars.
− UG has recently become a more widely accepted theory. 40 years ago, it was controversial.
  − The controversial idea was that of an innate, genetically determined UG.
  − In the mathematical approach of learning theory, however, UG is a logical necessity.

Universal grammar is made up of a set of rules that apply to most or all natural human languages.


A2: Learnability

Speaker–hearer pair:

− The speaker uses grammar G to construct sentences of language L.
− The hearer receives sentences and should eventually be able to use grammar G to construct other sentences of L.
− From a mathematical perspective, the hearer has an algorithm A that takes a list of sentences as input and generates a language as output.
− A “text” T of L is an infinite list of sentences in which every sentence of L occurs at least once.
− Text T_N is the first N sentences of T.
− Algorithm A provides the correct language if there is an M such that A(T_N) = L for all N > M.
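The convergence criterion A(T_N) = L for N > M can be tried out on a toy case. The sketch below assumes a very simple learner that conjectures exactly the set of sentences seen so far, which can only succeed when the target language is finite. Because a real text is infinite, the code inspects a finite prefix only; the function names and the sample text are made up for this example.

```python
def learner(sample):
    """Conjecture the language that contains exactly the sentences seen so far."""
    return frozenset(sample)


def convergence_point(target, text):
    """Smallest N0 such that learner(text[:N]) == target for every N0 <= N <= len(text).

    Returns None if the learner's last conjecture on this prefix is still wrong.
    """
    target = frozenset(target)
    point = None
    for n in range(1, len(text) + 1):
        if learner(text[:n]) == target:
            if point is None:
                point = n
        else:
            point = None  # a wrong conjecture resets the candidate point
    return point


# Hypothetical finite target language and a prefix of one of its texts.
target_language = {"s1", "s2", "s3"}
text_prefix = ["s1", "s2", "s1", "s3", "s2", "s1"]
first_correct = convergence_point(target_language, text_prefix)
```

Here the conjecture becomes correct once the fourth sentence has been seen, so M = 3 in the notation of the slide.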


Box 1: Gold's Theorem

− We are interested in which sets of languages are learnable.
− The key result of learning theory is Gold's theorem (Gold 1967).
  − It implies that no algorithm can learn the set of regular languages, or any set of languages that contains it.
  − A super-finite set of languages contains all finite languages and at least one infinite language.
  − Suppose the learner infers that the target language is an infinite language, while the actual target is a finite language contained in that infinite language.
  − The learner then never sees evidence against its guess, so it never converges onto the correct language.

All finite languages are regular. An example of an infinite regular language is L = { a^n b^m | n, m >= 0 }.


Box 1: Gold's Theorem

Classical learning theory was formulated by Gold.

Assumptions, restrictive:

− i) The learner has to identify the target language exactly.
− ii) The learner receives only positive examples.

Assumptions, unrestrictive:

− iii) The learner has access to an arbitrarily large number of examples.
− iv) The learner is not limited by computational complexity.

Extensions

Statistical learning theory (Vapnik 1971, 1998):

− Languages are represented by indicator functions (is sentence S in L? [T/F]).
− Linguistic examples are provided according to a distribution P, as both positive and negative examples.
− P also provides a metric for distances between languages.
− Learnability is judged by how close the learner's hypothesis comes to the target under this metric.


Box 1: Gold's Theorem

Extensions

Statistical learning theory (Vapnik 1971, 1998):

− Vapnik (1971) concluded that a set of languages is learnable if and only if it has finite VC dimension.
− The VC dimension is a combinatorial measure of the complexity of a set of languages.
− If the set of languages is completely arbitrary, and therefore has infinite VC dimension, learning is not possible.
− In the VC framework, assumptions i), ii) and iii) are removed.

Valiant (1975) added the issue of computational complexity, which concerns assumption iv) in Gold's model.

− Consequently, there are sets of languages that are learnable in principle but cannot be learned in polynomial time.

Angluin (1995) recast the problem as query-based learning.

− Such models can learn regular languages in polynomial time, but not context-free languages.

Restricting the set of languages is therefore also necessary because of complexity.
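For a small finite set of languages over a handful of sentences, the VC dimension can be computed by brute force, which may help make the definition concrete. In the sketch below a set of sentences is "shattered" when every one of its subsets can be cut out by intersecting with some language in the set. The function names and the three-sentence universe are assumptions made for this example.

```python
from itertools import combinations


def shatters(languages, sentences):
    """True if every subset of `sentences` equals (L & sentences) for some language L."""
    patterns = {lang & sentences for lang in languages}
    return len(patterns) == 2 ** len(sentences)


def vc_dimension(languages, universe):
    """Size of the largest subset of `universe` shattered by `languages` (brute force)."""
    languages = [frozenset(lang) for lang in languages]
    best = 0
    for size in range(1, len(universe) + 1):
        if any(shatters(languages, frozenset(c)) for c in combinations(universe, size)):
            best = size
        else:
            break  # if no set of this size is shattered, no larger set is either
    return best


# Hypothetical universe of three sentences.
universe = ["S1", "S2", "S3"]
all_languages = [set(c) for r in range(len(universe) + 1) for c in combinations(universe, r)]
restricted = [{"S1", "S2"}, {"S3"}]
```

The unrestricted set of all eight languages has VC dimension 3, the largest possible here, while the restricted pair of languages has VC dimension 1.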


A2: Learning Finite Languages

S1, S2 and S3 are three sentences, which gives 8 possible languages.

− Learner A considers all 8 possible languages.
− Learner B considers only 2 languages: L1 = {S1, S2} and L2 = {S3}.

If sentence S1 is given to both A and B:

− A cannot decide on the target language.
− B can draw conclusions: he knows that S2 is part of the language and S3 is not.

B extrapolates beyond experience. The ability to search for underlying rules requires a restricted search space.
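The comparison between the two learners can be replayed in a few lines. The sketch keeps, for each learner, the candidate languages that contain everything observed so far; the helper name `consistent` is made up for this example.

```python
from itertools import combinations

SENTENCES = ("S1", "S2", "S3")


def consistent(hypotheses, observed):
    """Keep the candidate languages that contain every observed sentence."""
    return [lang for lang in hypotheses if set(observed) <= lang]


# Learner A: all 8 subsets of the three sentences.
learner_a = [
    frozenset(c)
    for r in range(len(SENTENCES) + 1)
    for c in combinations(SENTENCES, r)
]
# Learner B: only the two languages L1 and L2 named on the slide.
learner_b = [frozenset({"S1", "S2"}), frozenset({"S3"})]

after_a = consistent(learner_a, ["S1"])
after_b = consistent(learner_b, ["S1"])
```

After hearing S1, learner A still has four candidates left, whereas learner B is left with L1 alone and can therefore predict S2 without ever having heard it.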


A2: The Necessity of Innate Expectations

− The human brain is equipped with a learning algorithm A_H, which enables us to learn certain languages.
− There exist over 6000 languages in the world. A_H can learn each of these, but it cannot learn every computable language.
− The existence of expectations 'before data' is equivalent to 'innate'.
− Is it our job as computer scientists to discover A_H biologically? No.


A2: The Necessity of Innate Expectations

There are conflicting views:

− Is A_H language-specific or general-purpose?
− Either way, the mechanism A_H operates on linguistic input and enables the child to learn the rules of a language.
− This mechanism can learn the rules of a restricted set of languages; the theory behind it is UG.
− Individual linguistic universals can be disputed (Greenberg 1978; Comrie 1981), but their existence in general cannot be denied.
− Neural networks are important tools for modelling neural mechanisms, but no neural network can learn an unrestricted set of languages.


A3: Language Evolution

Language Evolution

Understanding language evolution requires a theoretical framework explaining how Darwinian dynamics lead to fundamental properties of human language such as arbitrary signs, lexicons, syntax and grammar.

Basic approach

The basic approach is similar to evolutionary game theory. There is a population of individuals. Each individual uses a particular language. Individuals talk to each other. Successful communication results in a pay-off that contributes to fitness.


A3: Language Evolution

Cultural evolution with constant universal grammar

− From a biological perspective, language is not only the property of an individual but also the extended phenotype* of a population.
− Eq 1 is calculated by assuming that UG is constant from generation to generation.
  − The fitness of each language L_i is calculated from the communicative pay-offs.
  − The learning matrix Q and the average fitness of the population give a measure of “linguistic coherence”.
  − See Box 2 for details.

Eq 1 describes the selection of languages for increased communicative function and increased learnability.

*Phenotype: the outward appearance that results from heredity.
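Eq 1 is not reproduced on the slide, so the sketch below rests on an assumption: that it has the replicator-mutator form dx_i/dt = sum_j x_j f_j Q_ji - phi x_i, with fitness f_i = sum_j F_ij x_j and average fitness phi = sum_i f_i x_i. The pay-off matrix F, the learning matrix Q, and the values of n, a, q and the starting frequencies are all made up for illustration.

```python
def fitness(x, F):
    """f_i = sum_j F[i][j] * x[j]: expected pay-off of a speaker of language i."""
    return [sum(Fij * xj for Fij, xj in zip(row, x)) for row in F]


def coherence(x, F):
    """Average fitness phi = sum_i f_i * x_i, read here as linguistic coherence."""
    return sum(fi * xi for fi, xi in zip(fitness(x, F), x))


def language_dynamics(x, F, Q, dt=0.1, steps=5000):
    """Euler-integrate the assumed form of Eq 1 and return the final frequencies."""
    x = list(x)
    n = len(x)
    for _ in range(steps):
        f = fitness(x, F)
        phi = sum(fi * xi for fi, xi in zip(f, x))
        # Q[j][i]: probability that a child learning from a speaker of j ends up with i.
        dx = [sum(x[j] * f[j] * Q[j][i] for j in range(n)) - phi * x[i] for i in range(n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical parameters: 3 grammars, pay-off a between different grammars,
# learning accuracy q, and an initial population biased towards grammar 0.
n, a, q = 3, 0.5, 0.95
F = [[1.0 if i == j else a for j in range(n)] for i in range(n)]
Q = [[q if i == j else (1 - q) / (n - 1) for j in range(n)] for i in range(n)]
start = [0.5, 0.3, 0.2]
end = language_dynamics(start, F, Q)
```

With accurate learning (q close to 1) the initially most common grammar takes over most of the population and `coherence(end, F)` is higher than `coherence(start, F)`, which is the sense in which the dynamics select for communicative function and learnability.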


A3: Language Evolution

− Figure 4 plots the relation between linguistic coherence and the number of candidate grammars in UG.
− Remember that the average fitness of the population gives a measure of “linguistic coherence”.
− Fitness depends on the cumulative communicative pay-off.


A3: Language Evolution

Figure 4


A3: Evolution of Universal Grammar

Evolution of universal grammar

− The evolution of UG requires variation in UG.
  − UG is in fact neither a grammar nor universal.
− In Eq 2 there are M universal grammars, U_1 to U_M, each admitting n grammars.
  − U_I mutates genetically into another U_J with probability W_IJ.
  − The equation describes mutation and selection among different universal grammars.
− In the course of human evolution, a succession of UGs led to the UG of humans living today.
− At some point a UG emerged that allowed languages of unlimited expressibility.


Outlook 

− Languages change because transmission between generations is not perfect.
  − Grammaticalization and creolization are possible.
− Many language changes are selectively neutral.
− Some questions arise:
  − What is the interplay between the biological evolution of UG and the cultural evolution of language?
  − What is the mechanism for adaptation among the various languages generated by a given UG?
  − What are the restrictions imposed by UG?
  − Can we identify genes that are crucial for linguistic or other cognitive functions? What can we say about the evolution of those genes?


Outlook

− All of these questions need to be evaluated jointly by many disciplines, including linguistics, cognitive science, psychology, genetics, animal behaviour, evolutionary biology, neurobiology and computer science.


Discussion

− Fortunately we have language to talk to each other.

