Arabic Nominals in HPSG A Verbal Noun Perspective


Abstract

Semitic languages exhibit rich nonconcatenative morphological operations, which can generate a myriad of derived lexemes. In particular, the feature-rich, root-driven morphology of Arabic supports the construction of several kinds of verbal nouns, such as gerunds, active participles, passive participles and locative nouns. Head-driven Phrase Structure Grammar (HPSG) is a strong candidate for capturing this rich morphology in natural language processing. It combines the best ideas from its predecessors and integrates all linguistic layers (phonology, morphology, syntax, semantics, context, etc.). Although HPSG is a successful syntactic theory, it lacks a representation of complex nonconcatenative morphology. In this work, we propose a novel HPSG representation that includes the morphological, syntactic and semantic features of Arabic nominals and various verbal nouns. We also present the lexical type hierarchy and derivational rules for generating these verbal nouns within the HPSG framework. Finally, we have implemented the lexical type hierarchy, Attribute Value Matrix (AVM) and construction rules in the TRALE platform (an extension of the Attribute Logic Engine) to validate the proposed HPSG formalism.

Chapter 1

Introduction

Head-driven Phrase Structure Grammar (HPSG) is an attractive tool for capturing complex linguistic constructs. It combines the best ideas from its predecessors: Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It is well suited to natural language processing because it integrates the essential linguistic layers (phonology, morphology, syntax, semantics, context, etc.). It is also flexible enough to be adapted to a specific language.

1.1

Motivation

Semitic languages such as Arabic, Amharic and Hebrew exhibit rich nonconcatenative morphological operations for constructing their lexicons. Computational modeling of this morphology can give broad vocabulary coverage for these languages. Among the Semitic languages, we have chosen Arabic for nonconcatenative morphological analysis. It is the best instance of nonconcatenative morphology among the living languages. More than two hundred and eighty million people speak it as a first language, and it is an official language of twenty-two countries. It ranks fifth by number of native speakers. It is also the intellectual and liturgical language of the Islamic world. Despite these facts, the morphological analysis of Arabic is a relatively new area of research.

1.2

Scope of the Work

HPSG analyses of nonconcatenative morphology in general, and of Semitic languages in particular, are relatively new. However, the intricate nature of Arabic morphology has motivated several research projects addressing these issues [1, 7, 40]. HPSG representations of Arabic verbs and morphologically complex predicates are discussed in [2–4]. An in-depth analysis of declensions in Arabic nouns is presented in [18].

Arabic nominals are more diverse and more important than their counterparts in other languages. Modifiers, such as adjectives and adverbs, are treated as nominals in Arabic. Moreover, Arabic nouns can be derived from verbs or other nouns. Derivation from verbs is one of the primary means of forming Arabic nouns, for which no HPSG analysis has been conducted yet. Arabic nouns can be categorized along several dimensions, such as derivation (derived from a verb or a noun), ending type (sound ending or weak ending) and declension (declinable or indeclinable). Based on derivation, Arabic nouns can be divided into two categories:

1. Non-derived nouns: These are not derived from any other noun or verb.
2. Derived nouns: These are derived from other nouns or verbs.

An example of a non-derived, static noun is ḥiṣānun (which means “horse”): it is not derived from any noun or verb, and no verb is generated from it. On the other hand, kātibun (which means “writer”) is an example of a derived noun. This word is generated from the verb kataba, which means “He wrote” in English. This simple example provides a glimpse of the complexity of the derivational, nonconcatenative morphology involved in constructing a noun from a verb in Arabic. In this work, we analyze and propose the HPSG constructs required for capturing the syntactic and semantic effects of this rich morphology.

An HPSG formalization of Arabic nominal sentences is presented in [29]. It covers seven types of simple Arabic nominal sentences while taking care of agreement. In [24], an HPSG analysis of the broken plural and the gerund is presented. The main assumption in that work revolves around Concrete Lexical Representations (CLRs) located between an HPSG type lexicon and the phonological realization. However, the authors do not address other forms of verbal nouns, including participles.

In this work, we analyze all types of verbal nouns generated from strong (or sound) triliteral root verbs: their derivation from the verb and their syntactic and semantic information. We do not analyze the derivation of verbal nouns from strong quadriliteral or weak verbs, for two reasons. First, all eight types of verbal nouns are derived from strong triliteral root verbs, and these derivations follow regular patterns, whereas derivation from quadriliteral or weak verbs is less regular and needs more effort to analyze. Second, in most cases at most three types of verbal nouns are derived from these kinds of root verbs.

1.3

Contribution

Our contributions towards the HPSG analysis of Arabic nouns presented in this dissertation are as follows:

• We formulate the structure of the Attribute Value Matrix (AVM) for Arabic nouns and extend the AVM for Arabic verbs proposed in [2]. We make this design robust so that it can handle not only lexeme and word construction but also phrase and sentence construction.
• We capture the syntactic and semantic effects of Arabic morphology.
• We determine the placement of verbal nouns and their subtypes in the lexical type hierarchy, with proper justification.
• Arabic morphology is, in general, root-pattern morphology: different lexemes can be generated from the same root using different patterns. We utilize this root-pattern morphology to design lexical rules that avoid the need for exhaustive lexical entries for four types of verbal nouns derived from all strong triliteral root verbs. As a result, hundreds of verbal nouns can be recognized merely by associating each root verb with the set of lexical rules applicable to it. Thus, the number of lexical entries in the dictionary is greatly reduced.
• We implement the designed AVM, type hierarchy and lexical rules in TRALE (an extension of the Attribute Logic Engine) [34], a freeware system developed in Prolog that integrates phrase structure parsing, semantic-head-driven generation and constraint logic programming with typed feature structures as terms.

1.4

Organization of Rest of the Dissertation

Chapter 2 gives the background by explaining the linguistic concepts and necessary tools. It discusses several linguistic topics ranging from morphology and syntax to semantics. It then provides a sketch of Arabic grammar, mainly the morphology associated with its word construction. Next, it gives a brief introduction to HPSG, the mathematical theory of language used in this thesis. The last part of the chapter presents a detailed discussion of related work.

Chapter 3 presents our contribution to the development of a generic structure for the Attribute Value Matrix of Arabic nouns. It also describes the type hierarchy of the Arabic noun and its subtypes along the derivation dimension. Next, it discusses the construction rules for four types of Arabic verbal nouns derived from strong triliteral root verbs. It also designs the lexical entries for the other four types of verbal nouns, which do not follow strictly regular patterns.

Chapter 4 gives a brief description of the TRALE lexical compiler. It then shows the necessary components of TRALE and how we implement our HPSG formalism with it.

Finally, Chapter 5 gives the conclusion. In that chapter, we state the concrete contributions of our work from a technical point of view and finish by giving directions for further research on this topic.

Chapter 2

Background and Related Works

The topics discussed in this chapter serve as background for the rest of the thesis. In Section 2.1 we explain some theoretical linguistics that is necessary for developing linguistic models. Section 2.2 gives an introduction to morphology, and more specifically to morphology in Arabic and its effect on other linguistic layers. Section 2.3 gives an overview of Head-driven Phrase Structure Grammar (HPSG). Finally, in Section 2.4, we present the state of research on HPSG modeling, with emphasis on the Arabic language. In this chapter, we frequently use the Arabic alphabet. The transliteration of the Arabic alphabet is presented in Table 5.1.

2.1

Theoretical Linguistics

The scientific study of human language is called linguistics. Among all branches of linguistics, theoretical linguistics is the most important for developing models of linguistic knowledge. The core subjects of theoretical linguistics are phonology, morphology, syntax and semantics. The parts of theoretical linguistics can be summarized as follows:

• Phonology: is the systematic use of sound to encode meaning in any spoken human language, or the field of linguistics studying this use. In other words, it is concerned with the function, behaviour and organization of sounds as linguistic items.
• Morphology: is the study of word formation, that is, of the internal structure of words. It covers the patterns of word formation in a particular language, the description of such patterns, and the behavior and combination of morphemes.
• Syntax: is the study of the principles and rules for constructing phrases or sentences in natural languages.
• Semantics: is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for.
• Pragmatics: is the study of the ways in which context contributes to meaning. It studies how the transmission of meaning depends not only on the linguistic knowledge (e.g. grammar, lexicon etc.) of the speaker and listener, but also on the context of the utterance, knowledge about the status of those involved, the inferred intent of the speaker, and so on.
• Discourse: is the study of connected speech. A discourse constitutes sequences of relations to objects, subjects or predicates. Discourse can be observed in multimodal/multimedia forms of communication, including the use of spoken, written and signed language in contexts spanning from oral history to instant message conversations to textbooks.

Although phonology is a significant part of theoretical linguistics, it is beyond the scope of this thesis, because it deals with language sounds whereas our work begins from word formation, i.e. morphology. For background, we discuss the concepts related to the morphological, syntactic and semantic layers. We have taken the linguistic definitions from [25].


2.1.1

Morphology

Morphology is the study of the internal structure of words; in other words, it is the study of the patterns of word formation in a particular language, the description of such patterns, and the behavior and combination of morphemes. It can be thought of as a system of adjustments in the shapes of words that contribute to adjustments in the way speakers intend their utterances to be interpreted. A word is sometimes placed, in a hierarchy of grammatical constituents, above the morpheme level and below the phrase level. We discuss the concept of constituents further in Section 2.1.2.

A morpheme is the smallest meaningful unit in the grammar of a language. The word ‘dogs’ consists of two morphemes: ‘dog’, and ‘-s’, a plural marker on nouns. A morpheme can be categorized by how it combines with other morphemes to form a word. Some kinds of morphemes are:

• Bound morpheme: A bound morpheme is a grammatical unit that never occurs by itself, but is always attached to some other morpheme. In the above example, ‘-s’ is a bound morpheme.
• Free morpheme: A free morpheme is a grammatical unit that can occur by itself. However, other morphemes such as affixes can be attached to it. In the above example, ‘dog’ is a free morpheme.
• Affix: An affix is a bound morpheme that is joined before, after, or within a root or stem. In the above example, ‘-s’ is an affix.
• Root: A root is the portion of a word that carries the principal portion of meaning of the words in which it functions. It is common to a set of derived or inflected forms, if any, when all affixes are removed. A root is also a stem. In the above example, ‘dog’ is a root. Another example of a root is ‘speak’, which carries the principal portion of meaning of the word ‘speaker’. ‘Speaker’ is not a root; rather, it is derived from a root.


• Stem: A stem is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. In the above two examples, ‘dog’ is both a root and a stem, whereas ‘speaker’ is a stem and ‘speak’ is its root.
• Clitic: A clitic is a morpheme that has the syntactic characteristics of a word, but shows evidence of being phonologically bound to another word. Examples of clitics are the English contracted forms ‘’m’ in ‘I’m’ and ‘’ll’ in ‘we’ll’.

Among these morphemes, the clitic is beyond the scope of this thesis. Root, stem and affix are discussed further after the discussion of morphosyntactic operations.

A morphosyntactic operation is an ordered, dynamic relation between one linguistic form and another. There are two kinds of morphosyntactic operations:

• Derivation - is the formation of a new word or inflectable stem from another word or stem. It typically occurs by the addition of an affix. The derived word is often of a different word class (or category) from the original. It may thus take the inflectional affixes of the new word class. Example - ‘speaker’ is derived from ‘speak’. ‘Speak’ is both a root and a stem. ‘Speaker’ is a new stem derived from ‘speak’ by a derivational operation that uses the derivational suffix ‘-er’. The derived word ‘speaker’ is a stem but not a root, because it can be further analyzed into the meaningful unit ‘speak’, which is its root. Note also that ‘speak’ is a verb whereas the derived word ‘speaker’ is a noun. Thus the word class of the derived word differs from that of its root.
• Inflection - is variation in the form of a word, typically by means of an affix, that expresses a grammatical contrast which is obligatory for the stem's word class in some given grammatical context. As an example, ‘speakers’ is inflected from the stem ‘speaker’. This inflection is necessary when ‘speaker’ is used in the plural. Here the suffix ‘-s’ is used for inflection. The word ‘speakers’ is not a stem. Its category is the same as the category of ‘speaker’. Thus, inflection differs from derivation in that the syntactic category does not change.

Morphology deals with two kinds of information.

• Firstly, what information is encoded by the morpheme. For example, take the Arabic word kataba (he wrote). A variety of information is encoded in this word and its other inflected or derived forms. Some are listed below:
  – Agreement: kataba – he wrote. Person – 3rd, Number – Singular, Gender – Masculine, Mood – Indicative.
  – Event structure: kataba – he wrote. Tense – Past, Aspect – Perfect.
  – Agency: kutiba – it was written. Voice – Passive.
  – Illocutionary force: uktub – Write. Mode – Command.
  – Part-of-Speech: kitabun – a book. kataba – verb.
  – Definiteness: al-kitabu – the book. Determiner – Definite.
  – Complex Predicate: kattaba – he made to write. Semantic relation – Causation.
  There are many more syntactic and semantic phenomena that can be expressed using morphology.
• Secondly, the morphological process, which is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context. It encodes morphosyntactic operations. As an example, plural formation is a morphosyntactic operation, whereas suffixation is a kind of morphological process that English uses to encode plural formation. The morphological processes for concatenative and nonconcatenative morphosyntactic operations are shown below:
  – Concatenative operations are those where morphemes are linearly concatenated. This process is also called agglutination, and a language that uses it extensively is called an agglutinative language. For example:
    ∗ Prefixation: Morphemes concatenated at the front, e.g., clear – unclear
    ∗ Suffixation: Morphemes concatenated at the back, e.g., walk – walked
    ∗ Circumfixation: Morphemes concatenated both at the front and back, e.g., mind – unmindful
  – Nonconcatenative operations are those where morphemes are nonlinearly embedded. A language that uses this process frequently is called a fusional language. For example:
    ∗ Infixation: Root letter morphemes embedded at the middle, e.g., kataba – kattaba
    ∗ Simulfixation: Front morpheme shifted to the back, e.g., eat – ate
    ∗ Modification: Middle vowel changed, e.g., man – men
    ∗ Suppletion: Whole stem changed, e.g., go – went

In this thesis, we mainly focus on nonconcatenative operations, while also covering concatenative operations, and give a mathematical formalism to capture their rich diversity in Arabic.

2.1.2

Syntax

Syntax is the study of the principles and rules for constructing phrases or sentences in natural languages. The term syntax is also used to refer directly to the rules and principles that govern the sentence structure of any individual language. There are a number of theoretical approaches to the discipline of syntax. Some popular approaches among these are:

• Generative grammar,
• Categorial grammar,
• Dependency grammar,
• Stochastic/probabilistic grammars/network theories,
• Functionalist grammars.

Modern research in syntax attempts to describe languages in terms of such rules, which are often called construction rules. These rules are the basis of generative grammar. Our current research is also concerned with forming these rules, so in the discussion of syntax we put most emphasis on constructions.

A construction is an ordered arrangement of grammatical units forming a larger unit. Different usages of the term construction include or exclude stems and words. There are several kinds of construction. Some of these are:

• Apposition - is a construction consisting of two or more adjacent units that have identical referents. Example - My friend John.
• Clause - is a grammatical unit that includes, at minimum, a predicate and an explicit or implied subject, and expresses a proposition. Example - It is cold, although the sun is shining. This sentence contains two clauses: ‘It is cold’ is the main clause and ‘although the sun is shining’ is the subordinate clause.
• Direct speech - is quoted speech that is presented without modification, as it might have been uttered by the original speaker. Example - Patrick Henry said, “Give me liberty or give me death”.


• Indirect speech - is reported speech that is presented with grammatical modifications, rather than as it might have been uttered by the original speaker. Example - Patrick Henry said to give him liberty or give him death.
• Phrase - is a syntactic structure that consists of more than one word but lacks the subject-predicate organization of a clause. For example, ‘the house at the end of the street’ is a phrase. It acts like a noun.
• Sentence - is a grammatical unit that is composed of one or more words or phrases that generally bear minimal syntactic relation to the words or phrases that precede or follow it. Example - I am reading a book. This sentence is a composition of three phrases.
• Stem - is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. It has been discussed in detail in Section 2.1.1.
• Word - is a unit which is a constituent at the phrase level and above. It is sometimes identifiable according to such criteria as being the minimal possible unit in a reply.

All these constructions can be classified into two categories: lexical construction and phrasal (or combinatoric) construction. Lexical construction deals with forming the lexicon, that is, forming words and stems. As an example, forming ‘speaker’ from ‘speak’ is a lexical construction. Phrasal construction, on the other hand, deals with the formation of units larger than words and stems, so this type of construction forms phrases, clauses and sentences.

The constituent is an important concept in the discussion of constructions. A constituent is one of two or more grammatical units that enter syntactically or morphologically into a construction at any level. For example, the sentence ‘I eat bananas every day.’ contains the following constituents:

1. Immediate constituents: I, eat bananas every day
2. Ultimate constituents: I, eat, banana, -s, every day

There are several related, cross-cutting and sometimes confusing concepts related to constituents. We explain these concepts at the syntactic level.

Syntactic constituents can be classified under syntactic categories. A syntactic category is a set of words and/or phrases in a language which share a significant number of common characteristics. The classification is based on similar structure and sameness of distribution (the structural relationships between these elements and other items in a larger grammatical structure), and not on meaning. It is also known as a syntactic class. The major syntactic categories include phrasal syntactic categories such as NP (noun phrase), VP (verb phrase) and PP (prepositional phrase), and lexical categories that serve as heads of phrasal syntactic categories, such as noun and verb. For example, a prepositional phrase (PP) is a phrase that has a preposition as its head. The definition is similar for the noun phrase (NP) and the verb phrase (VP). Constituents can perform syntactic functions in the construction.
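
The constituent analysis of ‘I eat bananas every day’ given above can be pictured as a nested structure. The following is a minimal Python sketch under stated assumptions: only the top-level split and the leaves come from the text, the inner bracketing is illustrative, and the names SENTENCE, immediate_constituents and ultimate_constituents are hypothetical.

```python
# Assumed bracketing of "I eat bananas every day": the top-level split and
# the leaves follow the text; the grouping inside the second element is
# only an illustration.
SENTENCE = ["I", ["eat", ["banana", "-s"], "every day"]]


def immediate_constituents(node):
    """Return the units that directly make up this construction."""
    return list(node)


def ultimate_constituents(node):
    """Return the smallest units (leaves) of the construction, in order."""
    if isinstance(node, str):
        return [node]
    leaves = []
    for child in node:
        leaves.extend(ultimate_constituents(child))
    return leaves


print(len(immediate_constituents(SENTENCE)))
print(ultimate_constituents(SENTENCE))
```

The immediate constituents are the two top-level elements, while the ultimate constituents are obtained by flattening the whole structure.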

A syntactic function is the grammatical relationship of one constituent to another within a syntactic construction. There are various kinds of syntactic functions, such as subject, predicate, object, complement, adjunct and modifier. Syntactic functions are significant in categorial grammar. As HPSG is based on the generative paradigm, syntactic functions are not used for syntax modeling here; instead, we model syntax by construction rules.

2.1.3

Semantics

Semantics is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for. In linguistics, it is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts. The formal study of semantics intersects with many other fields of inquiry, including lexicology, syntax, pragmatics, etymology and others. The formal study of semantics is therefore complex.

Semantics is closely related to reference. References are used for agreement. There are several types of agreement, as mentioned in HPSG 94 [33]. Some of these are:

1. Index agreement: It arises when indices are required to be token identical. That is, the value of the semantic index of one lexical item needs to agree with the value of the semantic index of another lexical item.
2. Syntactic agreement: It arises when strictly syntactic objects (e.g. CASE values) are identified. That is, a lexical item has a syntactic requirement, and this requirement can be fulfilled by another lexical item that has a certain syntactic object value.
3. Pragmatic agreement: It arises when contextual background assumptions are required to be consistent.

Agreement is not syntactic in most languages. To show this, consider the sentence ‘the beef sandwich at table six is getting restless’. The referent of the subject in this sentence is not “the beef sandwich” but rather the customer who ordered it. As in English, agreement in Arabic is not syntactic; rather, it is semantic. Which properties of referents are encoded by agreement features is subject to cross-linguistic variation, but common choices include person, number and gender. In some languages, gender distinctions correspond to semantic sortal distinctions such as sex, human/nonhuman, animate/inanimate or shape. Arabic is an example of this type of language. So, along with person, number and gender, the human/nonhuman distinction must be preserved for agreement. We discuss this with an example in Section 3.1.3.

2.2

Arabic Morphology

Arabic is rich in nonconcatenative morphology, which is mainly root-pattern morphology. In this section, we introduce root-pattern morphology and its effect on Arabic verbs and verbal nouns. Then, we discuss the different types of Arabic verbal nouns.

2.2.1

Root-Pattern Morphology

The Arabic verb is an excellent example of nonconcatenative, root-pattern-based morphology. A combination of root letters is plugged into a variety of morphological patterns with a priori fixed letters and a particular vowel melody, generating verbs of a particular type that carry certain syntactic and semantic information [3]. The root of a stem denotes a semantic core, and the vowel pattern bears the syntactic information. Derivations from a common root but different patterns share a common meaning. Similarly, derivations from the same pattern but different roots share common syntactic information. A particular combination of root and pattern yields a fixed syntactic and semantic meaning. Root and pattern must co-exist, and it is their combination that specifies the semantic meaning. The following figures illustrate this. Figure 2.1 shows how different sets of root letters plugged into the same vowel pattern generate different verbs with the same syntactic information. Similarly, Figure 2.2 shows how the same set of root letters plugged into different vowel patterns generates two lexemes with completely different syntactic information but related semantic meaning. Besides the vowel pattern, a particular verb type depends on the root class. The root class is determined by the phonological characteristics of the root letters. Root classes can be categorized by the number of root letters, the position or existence of vowels among these root letters, and the existence of gemination (tashdeed).
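
The root-pattern combination described above can be sketched in a few lines of code. The following is a minimal Python illustration, assuming a pattern is written as in Figures 2.1 and 2.2, with each underscore marking the slot for the next root letter. The function name interdigitate is hypothetical and is not part of the TRALE implementation.

```python
def interdigitate(root, pattern):
    """Fill each '_' slot of `pattern` with the next letter of `root`."""
    if pattern.count("_") != len(root):
        raise ValueError("pattern must have exactly one slot per root letter")
    letters = iter(root)
    return "".join(next(letters) if ch == "_" else ch for ch in pattern)


# Same pattern, different roots (as in Figure 2.1).
print(interdigitate(("k", "t", "b"), "_a_a_a"))
print(interdigitate(("n", "s", "r"), "_a_a_a"))
# Same root, different pattern (as in Figure 2.2).
print(interdigitate(("k", "t", "b"), "_aa_i_un"))
```

Keeping the root and the pattern as separate inputs mirrors the observation that the root carries the semantic core while the pattern carries the syntactic information.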

Most Arabic verbs are generated from triliteral and quadriliteral roots. In Modern Standard Arabic, five-letter roots are obsolete. Phonological and morphophonemic rules can be applied to various kinds of sound and irregular roots. Among these root classes, the sound root class is the simplest, and its morphological information is easy to categorize. A sound root consists of three consonants, all of which are different [37]. On the other hand, non-sound root classes are categorized into several subtypes depending on the position of weak letters (i.e., vowels) and of gemination or hamza. All these subtypes carry morphological information.

[Figure 2.1: the roots (k,t,b) and (n,s,r) combine with the pattern (_a_a_a) to give the stems kataba (He wrote) and nasara (He helped).]

Figure 2.1: Root-pattern morphology 1: 3rd person singular masculine sound perfect active Form I verb formation from the same pattern (_a_a_a)

2.2.2

Morphology in Arabic Verb and Verbal Noun

From any particular sequence of root letters (i.e., triliteral or quadriliteral or weak or sound), up to fifteen different verb stems may be derived, each with its own template or vowel pattern. These stems have different semantic information. Western scholars usually refer to these forms as Form I, II, . . . , XV. Form XI to Form XV are rare in Classical Arabic and are even more rare in Modern Standard Arabic. These forms are discussed in detail in [37]. Table 2.1 shows the semantic effect and example of the mostly used verb


CHAPTER 2. BACKGROUND AND RELATED WORKS

Root (k,t,b)

kataba (He wrote)

Pattern (_a_a_a)

ste m

kaa ti bun(Writer)

Pattern (_aa_i_un)

Figure 2.2: Root-pattern morphology2: same root (k,t,b) contains same kind of semantic meaning forms [i.e. Form I to X]. Every particular sequence of root letters may not have a meaning word for a particular verb form. As an example, the root sequence - k, t, b, does not have a meaning word for Form IX. These morphological verb forms has no relation with the verb form based on events structure. There are three type of verb form based on event structure - perfect, imperfect and imperative. Perfect indicates that the event has been completed, imperfect indicates that the event has not yet been completed, and imperative indicates that the event is a command. It is worth mentioning that Form I has eight subtypes depending on the vowel following the middle letter in perfect and imperfect forms. Some types of verbal noun formation depend on these subtypes. Any combination of root letters for Form I verb will follow any one of these eight patterns. We refer these patterns as Form IA, IB, IC, . . ., IH. These subtypes are shown in Table 2.2 with corresponding examples. For example, the vowels on the middle letter for Form IA: nasara yansuru are a and u for perfect and imperfect forms, respectively. Similarly, other forms depend on the combination of vowels on these two positions. Not all kinds of combinations exist. In Form IH, the middle letter is a long vowel and there is no short vowel on this letter. In summary, we can generate different types of verbal nouns based on these verb forms, root types (position of weak


CHAPTER 2. BACKGROUND AND RELATED WORKS Table 2.1: Arabic Verb Form

Form

Example

☛✳✏❏☛ ■ ➺ (kataba )

Form I (Transitive)

✎✏☛ ☛ ☛ ■✳ ❏➺ (kattaba ) ☛ ✏❑ ☛ ☛ ■✳ ❆➾ ☛ ❏➺ ☛ (kataba ) ☛ ✎✏☛ ☛

Form II (Causative) Form III (Ditransitive) Form IV (Factitive) Form V (Reflexive)

■✳

Form VI (Reciprocity) Form VII (Submissive) Form VIII (Reciprocity) Form IX (Color or bodily defect) Form X (Control)

❅ (aktaba )

Meaning He wrote He caused to write He corresponded He dictated It was written on its own

❑ ✏❑ ☛ ☛ ☛

They wrote to each other ✏ ■✳ ❏➸ ✏ ❑ ✏❏ ☛) ✠ He was subscribed ☛(takattaba

☛☛ ☛ ■ ✏☛ ☛ ✳ ✏ ❆➽ (takataba ) ☛☛ ✎☛ ■✳ ➸❑ ☛❅ (inkataba ) ■☛

☛✳ ✏☛ ❏■ ➸ ✳ ✜❏➺ (istaktaba ) ☛❅ ☛❅ (iktataba ) ◗Ô❣ ☛❅ (ih. marra )

They wrote to each other It turned to red He asked to write


letter or gemination) and number of root letters.

Table 2.2: Subtype of Form I Root Verb

Form      Example              Perfect mid-vowel   Imperfect mid-vowel
--------  -------------------  ------------------  -------------------
Form-IA   naṣara yanṣuru       a                   u
Form-IB   ḍaraba yaḍribu       a                   i
Form-IC   fataḥa yaftaḥu       a                   a
Form-ID   samiʿa yasmaʿu       i                   a
Form-IE   karuma yakrumu       u                   u
Form-IF   ḥasiba yaḥsibu       i                   i
Form-IG   faḍula yafḍilu       u                   i
Form-IH   kāda yakādu          (long vowel)        (long vowel)
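The subtype assignment described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `form_i_subtype` and the dictionary are hypothetical, the vowel pairs are read from Table 2.2, and Form IH is left out because its middle letter is a long vowel rather than a short one.

```python
# Hypothetical lookup: (perfect mid-vowel, imperfect mid-vowel) -> Form I subtype.
# The pairs follow Table 2.2; Form IH is omitted because it has no short mid-vowel.
FORM_I_SUBTYPES = {
    ("a", "u"): "IA",  # e.g. nasara / yansuru
    ("a", "i"): "IB",
    ("a", "a"): "IC",
    ("i", "a"): "ID",
    ("u", "u"): "IE",
    ("i", "i"): "IF",
    ("u", "i"): "IG",
}


def form_i_subtype(perfect_vowel, imperfect_vowel):
    """Return the Form I subtype label, or None if the vowel pair is not listed."""
    return FORM_I_SUBTYPES.get((perfect_vowel, imperfect_vowel))
```

A pair that is absent from the table, such as ("i", "u"), simply yields `None`, which mirrors the remark that not all combinations exist.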

All these verb stems, derived from a single root verb, have different verbal nouns. Table 2.3 lists the active participle and passive participle for all verb stems, including the root verb kataba. Not every type of verbal noun exists for every form. In Table 2.3, for example, the passive participle does not exist for Form-IX.
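As a rough illustration of how the participles in Table 2.3 relate to the root, the sketch below fills root consonants into per-form templates. The templates are surface generalizations read off the table for the sound root k, t, b; they are an assumption of this sketch, not the construction rules proposed later, and they ignore weak or geminated roots. The names `PARTICIPLE_TEMPLATES` and `participles` are hypothetical.

```python
# Hypothetical templates: form -> (active participle, passive participle).
# {0}, {1}, {2} stand for the first, second and third root consonant.
# None marks a participle that Table 2.3 lists as non-existent (N/A).
PARTICIPLE_TEMPLATES = {
    "I":    ("{0}ā{1}i{2}un",        "ma{0}{1}uw{2}un"),
    "II":   ("mu{0}a{1}{1}i{2}un",   "mu{0}a{1}{1}a{2}un"),
    "III":  ("mu{0}ā{1}i{2}un",      "mu{0}ā{1}a{2}un"),
    "IV":   ("mu{0}{1}i{2}un",       "mu{0}{1}a{2}un"),
    "V":    ("muta{0}a{1}{1}i{2}un", "muta{0}a{1}{1}a{2}un"),
    "VI":   ("muta{0}ā{1}i{2}un",    "muta{0}ā{1}a{2}un"),
    "VII":  ("mun{0}a{1}i{2}un",     "mun{0}a{1}a{2}un"),
    "VIII": ("mu{0}ta{1}i{2}un",     "mu{0}ta{1}a{2}un"),
    "IX":   ("mu{0}{1}a{2}{2}un",    None),
    "X":    ("musta{0}{1}i{2}un",    "musta{0}{1}a{2}un"),
}


def participles(root, form):
    """Return (active, passive) participles for a three-consonant root and a verb form."""
    active_template, passive_template = PARTICIPLE_TEMPLATES[form]
    active = active_template.format(*root)
    passive = passive_template.format(*root) if passive_template else None
    return active, passive
```

For example, `participles(("k", "t", "b"), "I")` gives the Form I pair from the table, and Form IX returns `None` for the passive participle.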

2.2.3

Classification of Arabic Verbal Nouns

In this part, we discuss the eight types of nouns derived from verbs [22]:


Table 2.3: Verbal Nouns Derived from Different Forms

Form        Verb Stem    Active Participle   Passive Participle
----------  -----------  ------------------  ------------------
Form-I      kataba       kātibun             maktuwbun
Form-II     kattaba      mukattibun          mukattabun
Form-III    kātaba       mukātibun           mukātabun
Form-IV     aktaba       muktibun            muktabun
Form-V      takattaba    mutakattibun        mutakattabun
Form-VI     takātaba     mutakātibun         mutakātabun
Form-VII    inkataba     munkatibun          munkatabun
Form-VIII   iktataba     muktatibun          muktatabun
Form-IX     iktabba      muktabbun           N/A
Form-X      istaktaba    mustaktibun         mustaktabun


1. Gerund (ism maṣdar): names the action denoted by its corresponding verb.
2. Active participle (ism alfāʿil): the entity that enacts the base meaning, i.e. the general actor.
3. Hyperbolic participle (ism almubālaġah): the entity that enacts the base meaning exaggeratedly. It modifies the actor with the meaning that the actor does the action excessively.
4. Passive participle (ism almafʿuwl): the entity upon which the base meaning is enacted. It corresponds to the object of the verb.
5. Resembling participle (alṣifatu 'lmušabbahah): the entity enacting (or upon which is enacted) the base meaning intrinsically or inherently. It modifies the actor with the meaning that the actor does the action inherently.
6. Utilitarian noun (ism alālah): the entity used to enact the base meaning, i.e. the instrument used to conduct the action.
7. Locative noun (ism alẓarf): the time or place at which the base meaning is enacted.
8. Comparative and superlative (ism altafḍil): the entity that enacts (or upon whom is enacted) the base meaning the most. In Arabic, this type of word is categorized as a noun, but it is similar to an English adjective.

Examples of these eight types of verbal nouns are presented in Table 2.4. Each of these types can be subcategorized on the basis of the type of verb. To understand the complete variation of verbs and their morphology, some preliminary knowledge of the Arabic verb is needed [20].


Table 2.4: Different Types of Verbal Nouns

Root verb: ʿalima, which means "he knew"

Verbal noun                    Example       Meaning
-----------------------------  ------------  -------------------------------
Gerund                         …             "Knowing"
Active participle              …             "One who knows"
Hyperbolic participle          …             "One who knows a lot"
Passive participle             maʿluwmun     "That which is known"
Resembling participle          …             "One who knows intrinsically"
Utilitarian noun               …             "Through which we know"
Locative noun                  …             "Where/when we know"
Comparative and Superlative    …             "One who knows the most"


2.3

An HPSG Primer

HPSG is a highly lexicalized, non-derivational, constraint-based, surface-oriented grammatical architecture developed by Carl Pollard and Ivan Sag [32, 33]. It combines the best ideas from its predecessors: Generalized Phrase Structure Grammar (GPSG) [15], Lexical Functional Grammar (LFG) [6] and Government and Binding theory (GB) [8]. It integrates the linguistic layers (Phonology, Morphology, Syntax, Semantics, Context etc.), which makes it very attractive for natural language processing. Its highly lexicalized nature gives the flexibility to modify the lexicon for a given language in order to capture different features. A lexical entry, represented as an AVM (Attribute Value Matrix), may describe a sign only partially. Each lexical entry must have a type, and types and their subtypes are part of a large structure that forms the type hierarchy. Thus, HPSG can be seen as consisting of an inheritance hierarchy of sorts, with constraints of various kinds on the sorts of linguistic objects in the hierarchy [16].

There is no distinction between terminal and non-terminal nodes in HPSG. This is related to the fact that HPSG is "fractal" (a fractal is a rough or fragmented geometric shape that can be split into parts, each of which is, at least approximately, a reduced-size copy of the whole): every sign down to the word level has syntactic, semantic and phonological features encoded in a similar manner [31]. Thus we can work at a specific level of this hierarchy and use unification to reuse and extend the structure.

HPSG includes grammar rules and lexical entries. Normally, the latter are not considered to belong to a grammar. The formalism is centered around the lexicon. This means that the lexicon is more than just a list of entries; it is in itself richly structured. In HPSG terminology, the basic grammatical type is the sign, which is a formal representation of words, phrases and sentences. All human utterances are captured by signs.

A rule that licenses a sign is captured by another object called a construct. Signs and constructs are formalized as typed feature structures, each of which is a set of attribute-value pairs. Attributes are called linguistic objects. The value of an attribute may be either atomic or complex, i.e. a function. Functions are those feature structures which are described using an attribute value matrix (AVM). The generic construct of a sign is presented in Figure 2.3. The AVM basically maps features to feature structures. A feature in an AVM can be of two types: (a) category name, i.e., sort description and (b) agreement (or constraints), which is a list of attributes and their values.

  Feature   Value
[ PHON      phon-obj  ]   Phonology
[ MORPH     morph-obj ]   Morphology
[ SYN       syn-obj   ]   Syntax
[ SEM       sem-obj   ]   Semantics
[ ...       ...       ]

Figure 2.3: An HPSG Sign.

A construct is represented using a feature structure with a MOTHER (MTR) feature and a DAUGHTERS (DTRS) feature. The value of the MTR feature is a sign and the value of DTRS is a nonempty list of signs. A typical description of a construct is shown in Figure 2.4. The licensing of signs follows the Sign Principle, which states that "Every sign must be lexically or constructionally licensed. A sign is lexically licensed only if it satisfies some lexical entry, and constructionally licensed only if it is the mother of some construct" [39].

  Feature   Value
[ MTR       sign       ]   Mother
[ DTRS      list(sign) ]   List of daughters

Figure 2.4: An HPSG Construction.

HPSG modeling of any language starts from building a very detailed type hierarchy which is linguistically motivated and also captures the language-independent constraints. From this type hierarchy, the attribute value matrix for linguistic signs can be constructed. In this thesis, we use the Sign-Based Construction Grammar (SBCG) [38] version of HPSG. Unlike standard presentations of HPSG, where the type constraints form part of the signature of a grammar, the type constraints of SBCG are an essential part of the body of the grammar. A standard SBCG type hierarchy is shown in Figure 2.5.

From the type hierarchy, we know that every linguistic object can be modeled using a feature structure. There are two types of feature structures. Atoms are simple feature structures, which indicate the terminal value of various linguistic attributes. Functions are complex feature structures, which are expressed using an attribute value matrix and can contain other feature structures as their feature values. Sign and cxt (construct) are both feature structures. The attributes of signs are also feature structures: phon-obj, syn-obj, sem-obj, etc. Frames are semantic representations of events. There are two types of constructions: phr-cxt (phrasal) and lex-cxt (lexical). There are also two types of signs: lex-sign and expression. For the details of this type hierarchy, see [38].

In HPSG, the semantic information is expressed in Minimal Recursion Semantics (MRS), as developed in CSLI's Linguistic Grammars Online (LinGO) project [10, 11]. Most semantic information in MRS is contained under the feature FRAMES. For a verb, this list contains a frame event-fr, which holds a Davidsonian event variable and index-valued features such as act(or) and und(ergoer) [12, 13]. These variables also carry information that is used for agreement. These semantic agreements are discussed in Section 2.1.3.
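The role of unification over feature structures can be made concrete with a minimal sketch. It assumes feature structures are encoded as nested Python dictionaries with atomic string values; it has no types and no structure sharing (reentrancy), so it is far simpler than what an HPSG platform provides. The function name `unify` and the feature values used below are illustrative assumptions.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts with atomic string values.

    Returns the merged structure, or None when two values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a dict) unify only if they are identical.
    return a if a == b else None


# A partial description of a sign, extended by a compatible description.
partial = {"SYN": {"CAT": {"CASE": "nominative"}}}
extra = {"SYN": {"CAT": {"DEF": "yes"}}, "SEM": {"INDEX": {"NUMBER": "sg"}}}
combined = unify(partial, extra)
```

Unifying two descriptions that disagree on an atomic value, for instance two different CASE values, returns `None`, which corresponds to a failed unification.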


[Type hierarchy diagram. Recoverable node labels: feature-structure; function, atom; sign, cxt, cat, pos, phon-obj, syn-obj, sem-obj, frame; lex-sign, expression, word, phrase, lexeme; si-lxm, sc-lxm, sr-lxm, trans-lxm; phr-cxt, lex-cxt, infl-cxt, deriv-cxt; noun, verb; event-fr, act-fr, und-fr, act-und-fr, und-only-fr, soa-fr, act-soa-fr, act-und-soa-fr, to-be-split-fr, write-fr, cause-fr, try-fr.]

Figure 2.5: A Standard SBCG type hierarchy

2.4

Related Works

This section discusses work related to the linguistic modeling of morphology. We first give an overview of work on the computational modeling of morphology in general, and then focus on HPSG modeling of morphology. As Semitic languages like Arabic, Amharic and Hebrew are rich in morphology, we give a brief account of HPSG modeling of Hebrew, for which a notable amount of work exists. At the end of this section, we discuss HPSG modeling of the Arabic language and its morphology.

2.4.1

HPSG Modeling of Morphology

HPSG is one of the most successful grammars for processing natural languages, especially their syntactic and semantic aspects, but it has inadequate coverage of morphological construction, especially nonconcatenative morphology. Nonconcatenative morphology is not plentiful in the most widely used languages, but it is abundant in Semitic languages such as Arabic, Amharic and Hebrew. Among these Semitic languages, Arabic is the most widely used and is very rich in nonconcatenative morphology. Its rich morphology has attracted several research projects [1, 7, 40]. These projects mainly develop toolkits for Arabic morphological analysis. They are not concerned with compiler development; rather, they are dedicated to morphological analyzers that design and implement finite-state morphological models. From a linguistic perspective, these models describe rules of lexicon development and derive lexicons.

The morphology of Sierra Miwok and French was modeled in HPSG through phonological realization [5]. The author also showed how nonconcatenative morphology can be captured by his framework, and mentioned how consonant and vowel melodies form words in Arabic, but he did not show a construction rule for any language.

Susanne modeled concatenative morphology in German and English in the HPSG formalism in 1998 [35, 36]. In that paper, she captured morphological derivation with a special feature called MORPH-B, which stands for morphological base. The MORPH-B feature serves the purpose of derivation and can also be used to capture nonconcatenative morphology. The alternative to this mechanism is the lexical construction rule [38], which is also widely used in HPSG modeling.

An HPSG formalism of morphologically complex predicates is outlined in [9]. There the author mostly focused on the syntax and semantics of causative constructions and used lexical rules with semantic frames to capture the morphological effect. As Japanese, the language analyzed there, is agglutinative, the morphology involved is concatenative. Thus HPSG modeling of nonconcatenative morphology remained untouched.

As mentioned earlier, HPSG modeling of nonconcatenative morphology is a relatively new area of research. There are few notable works on the nonconcatenative morphology of Semitic languages. We discuss these in detail in Sections 2.4.2 and 2.4.3.

2.4.2

HPSG Modeling of Hebrew

Semitic languages exhibit rich morphological operations. Both concatenative and nonconcatenative morphology are abundant in these languages. Among these languages, HPSG modeling of Hebrew is not new, but its coverage of morphology is limited. In 2000, Nathan Vaillette presented a paper on Hebrew relative clauses [41]. In this paper, he modeled the phrasal construction rules needed to capture Hebrew relative clauses, but did not put emphasis on morphological operations.

Susanne extended her work on German and English concatenative morphology in 2001 and, along with German and English, added the nonconcatenative morphology of Hebrew verbal nouns [36]. She proposed an AVM for the Hebrew verbal noun. With respect to the morphological feature, this AVM is similar to the AVM we propose for verbal nouns, but she did not show any syntactic effect of this morphology. She articulated the AVM with placeholders for consonants: by filling in the list of root consonants, a verbal noun AVM is generated from this AVM. She did not ensure that only valid verbal nouns are generated. Her solution can be used to automate lexical entries in a dictionary or corpus but does not reduce the number of entries. In effect, she gave only a brief account of the morphology of the Hebrew verbal noun within a much larger work.

A detailed work on the verb-initial construction (also called the verbal sentence, as opposed to the nominal sentence; in this type of sentence the verb precedes the subject) was presented in [26]. In that work, the author put emphasis on phrasal constructions related to the Modern Hebrew verb. She discussed the agreement of the verb with its subject and complement. She also showed the concatenative and nonconcatenative morphology of the Hebrew verb, but did not give any formalism for this morphology comparable to what was modeled for German or Japanese [9, 36]. She mainly discussed the syntactic effect of these inflected verb forms. She also presented an implementation framework for HPSG grammars.

In 2007, Nurit presented a comparison of HPSG implementation platforms [27]. She discussed the advantages and disadvantages of TRALE (an extension of the Attribute Logic Engine) and the Linguistic Knowledge Building (LKB) system. This paper is very useful for choosing an HPSG implementation platform.

2.4.3

HPSG Modeling of Arabic

In 2006, an HPSG analysis of the broken plural and the gerund was presented [24]. The main assumption in that work revolves around Concrete Lexical Representations (CLRs) located between an HPSG-type lexicon and the phonological realization. There, the HPSG sign was represented using a CLR function rather than an AVM, and this function puts more emphasis on phonology than on morpho-syntactic operations. The main drawbacks of this work are that it does not deal with other types of verbal nouns and that it does not specify any implementation of CLRs.

HPSG modeling of the Arabic triliteral strong verb was proposed in 2008 [2–4]. The authors of these papers show the regular morphology of the Arabic verb. They designed the SBCG AVM of the Arabic verb, together with several verb lexeme constructions and morphologically complex predicates (MCP). But they did not address the morphological derivation of verbal nouns, and they did not give any concrete way to implement the constructs proposed in their works. In our work on verbal noun constructs, we also have to work with the SBCG verb lexeme. We adopt the verb lexeme proposed in these papers and modify it to cope with all the cases that we have found. The authors did not propose any treatment of SIT-INDEX and INDEX; in effect they duplicated the INDEX feature with a ref-fr semantic frame, which is not used elsewhere in the HPSG or SBCG literature. The atomic features (person, number and gender) that Pollard and Sag [33] place under the INDEX function feature are placed under ref-fr in these papers, while the INDEX feature is still kept without its components being shown. We correct this INDEX and SIT-INDEX related problem, as discussed in Section 3.2.


An HPSG formalism of the Arabic nominal sentence is presented in [29]. The paper introduces a grammar for the Arabic nominal sentence, and the authors implemented their formalization using the LKB system. The main limitation of this work is that it deals only with agreement in nominal sentences and does not discuss morphology at all. Another significant limitation is the assumption that agreement information in Arabic arises from syntactic rules and that it obeys grammar rules. In Sections 2.1.3 and 3.1, we establish that agreement in Arabic is not always syntactic and that the agreement feature needs another feature, humanness (HUM), which is not mentioned in that work. A parser for Arabic relative clauses is designed in [17]. It is not an in-depth study; it surveys different forms of relative clauses in order to process relative sentences.

Thus, we can conclude that the rich nonconcatenative morphology of the Arabic verbal noun has not yet been explored, which gives us the opportunity to do so. Part of this work was published in 2010 [19]. In that paper, we proposed the construction rules but did not articulate any implementation.

Chapter 3 HPSG Formalism for Verbal Noun

In this chapter, we model the HPSG categories of verbal nouns and their derivation from different types of verbs through the HPSG formalism. In Section 2.3, we mention that we adopt the SBCG [38] for this analysis. Here, we give an AVM for nouns and extend it for verbal nouns. We extend the verb AVM proposed by Bhuyan et al. [2–4]. We propose a multiple inheritance hierarchical model for Arabic verbal nouns and show how to get a sort description from the type hierarchy. Finally, we propose construction rules for verbal nouns derived from strong triliteral, i.e. Form I, root verbs.

3.1

AVM of Arabic Nouns

We modify the SBCG feature geometry for English and adapt it for Arabic. The SBCG AVMs for nouns in English and in Arabic are shown in Figure 3.1 and Figure 3.2, respectively. The PHON feature is out of the scope of this paper. The three main function features, MORPH, SYN and SEM, are discussed in the following subsections.

[ noun-lex
  PHON     []
  FORM     []
  ARG-ST   list(sign)
  SYN      [ CAT    [ noun
                      CASE     ...
                      SELECT   ...
                      XARG     ...
                      LID      ... ]
             VAL    list(sign)
             MRKG   mrk ]
  SEM      [ INDEX    i
             FRAMES   list(frame) ] ]

Figure 3.1: AVM for English noun

3.1.1

MORPH

The MORPH feature captures the morphological information of signs and replaces the FORM feature of English AVMs. This feature is similar to the MORPH feature used for the Hebrew verbal noun [36]. The value of the feature FORM is a sequence of morphological objects (formatives); these are the elements that will be phonologically realized within the sign's PHON value [38]. MORPH, on the other hand, is a function feature. It contains not only these phonologically realized elements but also their origins. MORPH contains three features: ROOT, STEM and DEC. The ROOT feature contains root letters in the following cases:

1. The root is characterized as a part of a lexeme, and is common to a set of derived or inflected forms.
2. The root cannot be further analyzed into meaningful units when all affixes are removed.
3. The root carries the principal portion of the meaning of the lexeme.


[ noun-lex
  PHON     []
  MORPH    [ ROOT   list(letter)
             STEM   list(letter)
             DEC    ... ]
  ARG-ST   list(sign)
  SYN      [ CAT    [ noun
                      CASE     ...
                      DEF      ...
                      SELECT   ...
                      XARG     ...
                      LID      ... ]
             VAL    list(sign)
             MRKG   mrk ]
  SEM      [ INDEX    [ PERSON   ...
                        NUMBER   ...
                        GENDER   ...
                        HUM      ... ]
             FRAMES   list(frame) ] ]

Figure 3.2: AVM for Arabic noun


In the rest of the cases, the content of this feature is empty. The STEM feature contains a list of letters, which comprises the word, phrase or lexeme. If a root exists in STEM, we can identify the pattern of a lexeme by substituting placeholders for the root letters. As an example, the ROOT of the lexeme 'kataba' contains 'k', 't' and 'b', and the pattern of the STEM is (_a_a_a). Without this pattern, the ROOT is irrelevant. Thus a pattern bears the syntactic information and a ROOT bears the semantic information. Lexemes which share a common pattern must also share some common syntactic information. Similarly, lexemes which share a common root must also share some common semantic information. If a root exists, STEM is derived from the root letters by morphology.

The DEC (declension type) feature under the MORPH feature maps to the declension type of the noun. It determines how the end vowel of a noun lexeme changes to reflect its case. The change of end vowel changes the form of the lexeme. There are nine possible ways in which grammatical cases can be represented on an Arabic noun. So, for a declinable noun, the value of the DEC feature can be T1, T2, T3, . . . , T9, corresponding to the nine declension types. The value of the DEC feature can be determined from the type hierarchy of the noun lexeme. This needs further research and is beyond the scope of this thesis. We do not mention this feature in the AVMs that follow, but we keep it in our basic design so that the design also remains robust for inflection.
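The relation between ROOT, STEM and the pattern can be sketched as follows. This is a minimal illustration, not part of the proposed formalism: the function name `extract_pattern` and the underscore placeholder are assumptions, root letters are matched greedily from left to right, and a doubled (geminated) root letter is therefore only replaced once.

```python
def extract_pattern(stem, root, placeholder="_"):
    """Replace the root letters in a stem, matched left to right, with a placeholder.

    `stem` and `root` are lists of letters, mirroring the list(letter) values
    of the STEM and ROOT features.
    """
    remaining = list(root)
    pattern = []
    for letter in stem:
        if remaining and letter == remaining[0]:
            pattern.append(placeholder)
            remaining.pop(0)
        else:
            pattern.append(letter)
    if remaining:
        raise ValueError("root letters do not occur in order in the stem")
    return pattern


# MORPH value for the lexeme 'kataba', written as a plain dictionary.
morph = {"ROOT": list("ktb"), "STEM": list("kataba")}
pattern = "".join(extract_pattern(morph["STEM"], morph["ROOT"]))
```

With the values above, `pattern` is the string `_a_a_a`, the pattern cited for 'kataba'.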

3.1.2

SYNTAX

The SYN feature contains the CAT, VAL and MRKG features. We modify the CAT feature of SBCG to adapt it to the Arabic language. Note that, for all kinds of verbal nouns, the sort description of the CAT feature is noun. In Arabic there are only three parts of speech (POS) for lexemes or words: noun (in Arabic a pronoun is also considered a noun), verb and particle. Any verbal noun serving as a modifier is also treated as a noun. In the case of the Arabic noun, the CAT feature consists of the CASE, DEF, SELECT, XARG and LID features. Among these features, we introduce the DEF feature, which is used for syntactic agreement in phrasal construction. This feature also strengthens our design. As Arabic has three cases for nouns, the value of CASE is nominative, accusative or genitive.

The DEF feature denotes the definiteness of an Arabic noun. There are eight ways in which a noun word or lexeme becomes definite [21]. Personal pronouns such as "he", "I" and "you" are inherently definite. Proper nouns are also definite; ʾallahu is another instance of a definite lexeme. These examples confirm that definiteness has to be specified at the lexeme level. The article 'al' also expresses the definite state of a noun of any gender and number. Thus if a noun is definite, the noun lexeme contains yes as the value of DEF; otherwise its value is no.

In Arabic, the definiteness (DEF) feature plays a significant role in syntactic agreement. A noun and its modifier must agree on the DEF feature value. For example, alkitabu means "the book" and ʾaḥmaru means "red", while alkitabu 'l-ʾaḥmaru means "the red book". As "red" is used as a modifier for "the book", the definiteness prefix 'al' has been added to ʾaḥmaru, yielding al-ʾaḥmaru.
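The DEF agreement requirement can be illustrated with a small check. The sketch assumes lexemes are plain dictionaries with a `form` string and a `DEF` value of "yes" or "no"; the helper names are hypothetical, the transliterations are simplified to plain ASCII, and the assimilation of the article to the following sound is ignored.

```python
def make_definite(modifier):
    """Return a definite copy of a modifier by prefixing the article 'al'."""
    return {"form": "al" + modifier["form"], "DEF": "yes"}


def def_agrees(noun, modifier):
    """A noun and its modifier must carry the same DEF value."""
    return noun["DEF"] == modifier["DEF"]


the_book = {"form": "alkitabu", "DEF": "yes"}
red = {"form": "ahmaru", "DEF": "no"}

# The indefinite modifier does not agree with the definite noun;
# the prefixed form does.
mismatch = def_agrees(the_book, red)
match = def_agrees(the_book, make_definite(red))
```

Here `mismatch` is false and `match` is true, reflecting that "red" must take the article before it can modify "the book".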

3.1.3

SEMANTICS

As in SBCG for English, the SEM feature in Arabic contains two function features: INDEX and FRAMES. INDEX is used for the index-based semantic agreement mentioned in Section 2.1.3, and FRAMES contains the list of frames which carry semantic information in Minimal Recursion Semantics (MRS).

As mentioned earlier in Section 2.1.3, information on person, number, gender and human/non-human status must be kept for semantic agreement. So the INDEX feature is composed of PERSON, NUMBER, GENDER and HUM, and it is contained under SEM. We use this index-based agreement [33] as opposed to putting the agreement information under an AGR feature [23], because index-based agreement is more customary in HPSG and is used by most scholars. The HUM feature is introduced by us for Arabic. The other three features are also used for semantic agreement in English [33]. The HUM feature denotes humanness. Depending on the language, agreement may involve gender, human/non-human, animate/inanimate or shape features [33].

In Arabic, humanness is a crucial grammatical factor for predicting certain kinds of plural formation and for agreement with other components of a phrase or clause within a sentence. The grammatical criterion of humanness applies only to nouns in the plural form. As an example, consider "these boys are intelligent" (haʾulāʾ alawlādu ʾaḏkiyāʾ) and "these birds are intelligent" (haḏihi 'lṭuywru ḏakiyyatun). The subjects of both sentences are plural, but the former refers to human beings whereas the latter refers to non-humans. So the same word "intelligent" (ḏakiyyun) takes two different forms in the two sentences: ʾaḏkiyāʾ and ḏakiyyatun. In the case of boys, it is in the third person masculine plural form (ʾaḏkiyāʾ), whereas in the case of birds, it is in the third person feminine singular form (ḏakiyyatun). Also, from the third person feminine singular form (ḏakiyyatun) alone, we cannot readily say that it refers to a feminine entity. In fact, it may also refer to a plural of non-human beings.

This is why, along with PERSON, NUMBER and GENDER, we keep HUM as a semantic agreement feature.


If the noun refers to a human being then the value of HUM is yes; otherwise it is no. The value of PERSON for an Arabic noun can be 1st, 2nd or 3rd. There are three number values in Arabic, so the value of NUMBER can be sg, dual or pl, denoting singular, dual or plural, respectively. The GENDER feature contains either masc or fem, denoting masculine and feminine, respectively.
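The effect of HUM on agreement can be sketched as below. This is a simplified reading of the boys/birds example, not the formal constraint of the grammar: the class `Index`, the function `expected_modifier_agreement`, and the gender value assigned to "birds" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    """The four INDEX features, with the atomic values named in the text."""
    person: str  # "1st", "2nd" or "3rd"
    number: str  # "sg", "dual" or "pl"
    gender: str  # "masc" or "fem"
    hum: str     # "yes" or "no"


def expected_modifier_agreement(noun):
    """Return the (number, gender) a modifier is expected to show.

    Simplified rule taken from the example: a non-human plural noun takes a
    feminine singular modifier; otherwise the modifier copies number and gender.
    """
    if noun.number == "pl" and noun.hum == "no":
        return ("sg", "fem")
    return (noun.number, noun.gender)


boys = Index(person="3rd", number="pl", gender="masc", hum="yes")
birds = Index(person="3rd", number="pl", gender="masc", hum="no")  # gender assumed
```

For `boys` the expected modifier is masculine plural, while for `birds` it is feminine singular, even though both nouns are plural.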

3.2

AVM of Arabic Verbs

As we will formulate construction rules which capture the linguistic derivation of nouns from verbs, we need to model the AVM of the verb. We modify the verb AVM proposed by Bhuyan et al. [2] and correct the index-related problem found in that work. The problem is discussed in detail in Section 2.4.3. We try to align the design of the verb AVM with that of the noun AVM. Figure 3.3 shows the SBCG AVM of the Arabic verb.

[ verb-lex
  PHON     []
  MORPH    [ ROOT   list(letter)
             STEM   list(letter)
             VDEC   list(letter) ]
  ARG-ST   list(sign)
  SYN      [ CAT    [ verb
                      VFORM    ...
                      VOICE    ...
                      MOOD     ...
                      SELECT   ...
                      XARG     ...
                      LID      ... ]
             VAL    list(sign)
             MRKG   mrk ]
  SEM      [ SIT-INDEX   [ SITUATION   ... ]
             FRAMES      list(frame) ] ]

Figure 3.3: AVM for Arabic verb




The MORPH feature in the verb AVM is similar to MORPH in the noun AVM except for the VDEC feature. VDEC captures the declension type of the verb and replaces the DEC feature, which captures the declension type of the noun. Like DEC, it determines how the end vowel changes, here the end vowel of a verb lexeme, which changes to reflect the mood of the verb. The change of end vowel changes the form of the verb lexeme. There are five possible ways in which grammatical mood can be represented on an Arabic verb. So, for a declinable verb, the value of the VDEC feature can be VT1, VT2, VT3, . . . , VT5, corresponding to the five declension types. The value of the VDEC feature can be determined from the type hierarchy of the verb lexeme. This needs further research and is beyond the scope of this thesis. We do not mention this feature in the verb AVMs that follow, but we keep it in our basic design so that the design also remains robust for inflection.

SYN in this AVM is the same as in the standard SBCG verb AVM. VFORM contains the type of verb form. In Arabic, there are three types of verb form, so the value of VFORM can be perfect, imperfect or imperative. Perfect indicates that the event has been completed, imperfect indicates that the event has not yet been completed, and imperative indicates that the event is a command. There are two types of voice in Arabic, active and passive, so the value of the VOICE feature can be either active or passive. The value of MOOD can be indicative, subjunctive or jussive.

Like SYN, the SEM feature in this AVM is the same as in the SBCG English verb AVM. SIT-INDEX, i.e. the situation index, is used for index-based semantic agreement. SBCG does not draw any distinction between INDEX and SIT-INDEX, and it does not give the feature description of SIT-INDEX. We treat it as a function feature, but currently it has only one atomic attribute, SITUATION, which contains the name of the verb. SIT-INDEX is used in the event frames of verb and verbal noun lexemes. Thus it is ultimately very similar to the Davidsonian event variable [12].

As in the AVM for the noun, FRAMES contains the list of frames which carry semantic information in Minimal Recursion Semantics (MRS). These frames contain indices of both kinds, INDEX and SIT-INDEX.

3.3

Type Hierarchy of Verbal Noun

As mentioned in Section 2.2, the derivation of verbal nouns from verbs depends on the number of root letters, the verb form and the root type. In Figure 3.4, we give a type hierarchy of Arabic verbal nouns.

nounlex …

DERIVATION

non-derived …

derived verb-derived

gerund activ hyperbolic- passiveresembling- locativeeparticiple partici partici partici ple ple ple

nou n

utilitar comparative iannoun

Figure 3.4: Lexical type hierarchy of Arabic noun lexeme.

We analyze the Arabic noun from the verbal noun perspective, so we classify noun lexemes only along the derivation dimension. Other possible dimensions are the ending type or the declension type of the noun lexeme. As shown in Figure 3.4, eight types of verbal nouns are immediate daughters of verb-derived-noun. Each of these eight verbal nouns can be subcategorized on the basis of the properties of the root verb mentioned in Section 2.2. Each verb carries distinct information on these properties, which form the dimensions of classification for verbs. The three dimensions for root verbs are therefore the number of root letters, the type of the root and the verb form. For lack of space we discuss in detail only the subtypes of active participles.

active-participle
  NUMBER OF ROOT LETTER
    triliteral-root-derived
  TYPE OF ROOT
    sound-root-derived
  VERB FORM
    form-I-derived
    form-II-derived

form-I-triliteral-sound-active-participle: subtype of triliteral-root-derived, sound-root-derived and form-I-derived
form-II-triliteral-sound-active-participle: subtype of triliteral-root-derived, sound-root-derived and form-II-derived

Figure 3.5: Lexical type hierarchy of active participle.

In Figure 3.5, active-participle is at the root. Categorizing it along the number of letters in the root verb, we get two types of active participles, derived from triliteral and quadriliteral root verbs. Classifying the active participle along the root type, we find several types of roots and thus of verbal nouns. Categorizing along the verb form dimension, we get Form I, ..., Form X active participles. Categories in one dimension cross-classify with categories in the other dimensions and form different subtypes such as form-I-triliteral-sound-active-participle, form-I-triliteral-sound-passive-participle, form-I-triliteral-sound-gerund, etc. Not all of these forms generate all types of verbal nouns, i.e. some forms do not have verbal nouns of every type. For example, locative nouns are generated from triliteral Form I root verbs only, so for this type of verbal noun, classifying along other forms does not generate any new type.
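The cross-classification described above can be illustrated with a small sketch. It is not part of the TRALE grammar: the value lists are deliberately partial, and the names `is_licensed` and `subtype_names` are hypothetical helpers introduced only for this illustration. The single restriction encoded is the one stated in the text, namely that locative nouns come from triliteral Form I root verbs only.

```python
from itertools import product

# Dimension values named in the text; the lists are deliberately partial
# (assumption: only enough values to illustrate the idea).
VERB_FORMS = ["I", "II"]                      # the text mentions Form I to Form X
ROOT_LETTERS = ["triliteral", "quadriliteral"]
ROOT_TYPES = ["sound"]                        # other root types exist
NOUN_TYPES = ["active-participle", "passive-participle", "gerund", "locative-noun"]


def is_licensed(form, letters, root_type, noun_type):
    """Return False for combinations the text explicitly rules out."""
    # Locative nouns are generated from triliteral Form I root verbs only.
    if noun_type == "locative-noun":
        return form == "I" and letters == "triliteral"
    return True


def subtype_names():
    """Cross-classify the dimensions and build hyphenated subtype names."""
    names = []
    for form, letters, root_type, noun_type in product(
        VERB_FORMS, ROOT_LETTERS, ROOT_TYPES, NOUN_TYPES
    ):
        if is_licensed(form, letters, root_type, noun_type):
            names.append(f"form-{form}-{letters}-{root_type}-{noun_type}")
    return names


for name in subtype_names():
    print(name)
```

Each printed name corresponds to one leaf type obtained by picking one value per dimension; combinations excluded by `is_licensed` never become types.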


3.4 Construction Rules for Verbal Nouns

As mentioned in Section 2.2.3, there are eight types of verbal nouns: gerund, active participle, hyperbolic participle, passive participle, resembling participle, utilitarian noun, locative noun and comparative participle. We have developed construction rules for the active participle, passive participle, locative noun and comparative participle derived from Form I strong root verbs, i.e. strong triliteral root verbs which have no extra character. We have found that for the other categories of verbal nouns we have to give exhaustive lexical entries. All eight types of verbal noun are derived from strong Form I root verbs. On the other hand, only the gerund, active participle and passive participle are derived from quadriliteral root verbs or weak verbs. Moreover, their derivation pattern is not very regular, so it requires further research.

3.4.1 Active Participle

A sample AVM for an active participle is shown in Figure 3.6. All features of this AVM have been discussed before. In this example, the event frame is write-fr, which denotes the write frame. Throughout this formalism, we use the event frame for verbs and verbal nouns to capture their semantic content efficiently. This event frame takes an event or situational index variable (SIT) and index-valued features such as actor, undergoer, instrument and location. In the case of write-fr, the event frame contains three indices: one for the action or event (SIT), another for the actor (ACTOR) and the last one for the undergoer of the action (UNDGR), i.e. the object of the verb. We do not store this AVM as a lexical entry. Rather, this AVM is recognized from the AVM in Figure 3.7 by our lexical construction rules. The construction rule in Figure 3.8 does this job. As we use the SBCG version of


kaatibun-form-I-triliteral-sound-active-participle-lex
[ morph   [ root <k, t, b>
            stem <k, a, a, t, i, b, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case   nominative
                   def    no
                   select none
                   xarg   none
                   lid    none ]
            val  <>
            mrkg none ]
  sem     [ index  i [ person 3rd
                       number sg
                       gender masc
                       hum    yes ]
            frames < write-fr [ sit   [ situation writing ]
                                actor i ] > ] ]

Figure 3.6: AVM for active participle


kataba-form-IA-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ phon    []
  morph   [ root <k, t, b>
            stem <k, a, t, a, b, a> ]
  arg-st  < [1] [ syn [ cat [ noun
                              case accusative ]
                        opt − ]
                  sem [ index j ] ] >
  syn     [ cat  [ verb
                   vform  perfect
                   voice  active
                   mood   indicative
                   select none
                   xarg   none
                   lid    none ]
            val  [1]
            mrkg none ]
  sem     [ sit-index s [ situation writing ]
            frames    < write-fr [ sit        s
                                   actor      i [ person 3rd
                                                  number sg
                                                  gender masc
                                                  hum    yes ]
                                   undgr      j
                                   location   k
                                   instrument l ] > ] ]

Figure 3.7: AVM for a sample root verb


HPSG, the construction rule contains two parts: MTR, which contains the AVM of the verbal noun, and DTRS, which contains the AVM of the base verb. This rule demonstrates how a Form I triliteral sound active participle is recognized from the lexeme of a Form I triliteral sound root verb.

form-I-triliteral-sound-active-participle-lex-cxt
[ mtr   form-I-triliteral-sound-active-participle-lex
        [ morph   [ stem <[1], a, a, [2], i, [3], u, n> ]
          arg-st  <>
          syn     [ cat [ noun
                          case nominative ]
                    val <> ]
          sem     [ index  i
                    frames < event-fr [ sit   s
                                        actor i ] > ] ]
  dtrs  < form-I-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
          [ morph [ stem <[1], a, [2], VOWEL, [3], a> ]
            syn   [ cat [ verb
                          vform perfect
                          voice active ] ]
            sem   [ sit-index s
                    frames    < event-fr [ sit   s
                                           actor i ] > ] ] > ]

Figure 3.8: Lexical rule for active participle construction


The construction rule contains three placeholders for the three root letters. Thus an active participle generated from the letters ‘k’, ‘t’ and ‘b’, or from ‘n’, ‘s’ and ‘r’, can be recognized by this construction rule. In other words, ‘kaatibun’, ‘naasirun’, ‘saajidun’ and ‘saamiun’ are all active participles that can be recognized. There is no difference between constructing an active participle from a sound triliteral Form IB-IF verb and from a sound triliteral Form IA verb. This is denoted in Figure 3.8 by the VOWEL variable positioned just after the middle placeholder in the daughter AVM. The distinction between the subtypes of Form I verbs is reflected by this vowel in the perfect form. As variation of the vowel in this position has no impact on active participle formation, the Form I verb may contain any of the three vowel letters in this position. The rule shown in Figure 3.8 also shows how semantic information is propagated from the root verb to the active participle. The content of the event frame is the same in mother and daughter. The only change in the SEM feature is the semantic index: the actor index (i) of the event frame becomes the semantic index of the active participle, whereas the event index (s) of the event frame is the semantic index of the verb lexeme. Thus, the semantic information of the active participle is successfully derived from the root verb. The syntactic information is fixed by the vowel pattern. This process of derivation is the same for the other verbal nouns too. Note that we have derived the active participle stem. To use it in a sentence, we need another construction rule which captures the inflection of the noun. The construction of the active participle from a Form I verb is the most regular. Constructions from other verb forms are complex and the derivation pattern is not regular. Thus, they require further analysis.
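The effect of the rule in Figure 3.8 can be sketched outside the grammar formalism. The following is a minimal illustration, not the TRALE implementation: it assumes nested dictionaries as stand-ins for AVMs, one Latin character per letter as in the transliterations used in the figures, and a hypothetical function name `active_participle_cxt`. It shows the two points made above: the VOWEL slot of the daughter is unconstrained, and the actor index of the shared event frame becomes the semantic index of the mother.

```python
VOWELS = {"a", "i", "u"}


def active_participle_cxt(verb):
    """Build the mother (participle) AVM from a Form I sound triliteral verb."""
    stem = verb["morph"]["stem"]
    # Daughter pattern: <C1, a, C2, VOWEL, C3, a>; VOWEL may be any of the three vowels.
    if len(stem) != 6 or stem[1] != "a" or stem[5] != "a" or stem[3] not in VOWELS:
        raise ValueError("not a Form I sound triliteral perfect stem")
    c1, c2, c3 = stem[0], stem[2], stem[4]
    frame = verb["sem"]["frames"][0]
    return {
        "morph": {
            "root": [c1, c2, c3],
            # Mother pattern: <C1, a, a, C2, i, C3, u, n>
            "stem": [c1, "a", "a", c2, "i", c3, "u", "n"],
        },
        "arg-st": [],
        "syn": {"cat": {"type": "noun", "case": "nominative"}, "val": []},
        # The event frame is shared; its actor index becomes the noun's index.
        "sem": {"index": frame["actor"], "frames": [frame]},
    }


kataba = {
    "morph": {"root": ["k", "t", "b"], "stem": list("kataba")},
    "sem": {
        "sit-index": "s",
        "frames": [{"type": "write-fr", "sit": "s", "actor": "i", "undgr": "j"}],
    },
}
kaatibun = active_participle_cxt(kataba)
print("".join(kaatibun["morph"]["stem"]))  # kaatibun
print(kaatibun["sem"]["index"])            # i
```

The sketch runs in the generation direction for readability; in the grammar the same constraint is used to recognize the participle from the verb lexeme.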

3.4.2 Passive Participle

Like that of the active participle, the construction of the passive participle from Form I triliteral sound root verb is simple. There is just one pattern for its construction from Form I triliteral sound root verb. So for all Form I subtypes, the construction rule of Figure 3.9 will be applicable. Derivation from other forms of verbs is complex and not regular. For some forms this type of participle does not exist either, which requires further analysis.


form-I-triliteral-sound-passive-participle-lex-cxt
[ mtr   form-I-triliteral-sound-passive-participle-lex
        [ morph   [ stem <m, a, [1], [2], u, u, [3], u, n> ]
          arg-st  <>
          syn     [ cat [ noun
                          case nominative ]
                    val <> ]
          sem     [ index  j
                    frames < event-fr [ sit   s
                                        undgr j ] > ] ]
  dtrs  < form-I-triliteral-sound-passive-perfect-3rd-sg-masc-verb-lex
          [ morph   [ stem <[1], a, [2], VOWEL, [3], a> ]
            arg-st  < [4] [ syn [ cat [ noun
                                        case accusative ]
                                  opt − ]
                            sem [ index j ] ] >
            syn     [ cat [ verb
                            vform perfect
                            voice active ]
                      val [4] ]
            sem     [ sit-index s
                      frames    < event-fr [ sit   s
                                             undgr j ] > ] ] > ]

Figure 3.9: Lexical rule for passive participle construction


The verbs from which passive participles are derived must be transitive. For this reason, in the AVM of the DTR, the ARG-ST feature is not empty and its semantic index (j) is co-indexed with the undergoer index in the event-fr. Note that the ARG-ST of the DTR contains one sign, for the object only, and it is in the accusative case. It does not contain any sign for the actor. This is because, in Arabic, the actor is implicitly expressed in the verb and the verb does not syntactically require the actor. If a subject is explicitly mentioned in the sentence, it can be parsed by a phrasal construction rule. As with the active participle, the semantic information is derived from the root verb. The undergoer index (j) in the event frame of the root verb becomes the semantic index of the passive participle. The VOWEL variable in STEM works in the same way as in the active participle construction. As an example, the verb (‘kataba’) shown in Figure 3.7 is a transitive verb. So, from this verb lexeme, we can recognize the passive participle (‘maktuubun’) shown in Figure 3.10.

3.4.3 Locative Noun

A locative noun can be generated from triliteral Form I root verbs only. There are two patterns of derivation, and which pattern is used is predictable. Locative nouns generated from Form IA, IC, ID, IE and IG root verbs use one pattern, whereas those from Form IB and IF use another. For this reason, locative nouns are of two types: Form IA locative noun and Form IB locative noun. Figure 3.7 shows the AVM of a Form IA root verb (‘kataba’). The locative noun (‘maktabun’) derived from this verb is shown in Figure 3.11. Because they share the same pattern, one construction rule, shown in Figure 3.12, captures the derivation of locative nouns from verb forms IA, IC, ID and IE. As in the construction rules for active and passive participles, the syntactic information is derived from the vowel pattern of the lexeme and the semantic information is derived from the root verb. Thus, the location index


maktuubun-form-I-triliteral-sound-passive-participle-lex
[ morph   [ root <k, t, b>
            stem <m, a, k, t, u, u, b, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case   nominative
                   def    no
                   select none
                   xarg   none
                   lid    none ]
            val  <>
            mrkg none ]
  sem     [ index  j [ person 3rd
                       number sg
                       gender masc
                       hum    no ]
            frames < write-fr [ sit   s [ situation writing ]
                                actor i
                                undgr j ] > ] ]

Figure 3.10: AVM for passive participle


maktabun-form-IA-triliteral-sound-locative-noun-lex
[ morph   [ root <k, t, b>
            stem <m, a, k, t, a, b, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case   nominative
                   def    no
                   select none
                   xarg   none
                   lid    none ]
            val  <>
            mrkg none ]
  sem     [ index  k [ person 3rd
                       number sg
                       gender masc
                       hum    no ]
            frames < write-fr [ sit      s [ situation writing ]
                                actor    i
                                location k ] > ] ]

Figure 3.11: AVM for Form IA locative noun


(j) in the event frame of the root verb becomes the semantic index of the locative noun.

form-IA-triliteral-sound-locative-noun-lex-cxt
[ mtr   form-IA-triliteral-sound-locative-noun-lex
        [ morph   [ stem <m, a, [1], [2], a, [3], u, n> ]
          arg-st  <>
          syn     [ cat [ noun
                          case nominative ]
                    val <> ]
          sem     [ index  j
                    frames < event-fr [ sit      s
                                        location j ] > ] ]
  dtrs  < form-IA-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
          [ morph [ stem <[1], a, [2], a, [3], a> ]
            syn   [ cat [ verb
                          vform perfect
                          voice active ] ]
            sem   [ sit-index s
                    frames    < event-fr [ sit      s
                                           actor    i
                                           location j ] > ] ] > ]

Figure 3.12: Lexical rule for the locative noun construction from Form IA sound root verb

Similarly, Figure 3.13 shows the AVM of the Form IB root verb ‘sajada’. The locative noun (‘masjidun’) derived from this verb is shown in Figure 3.14. As mentioned above, locative nouns generated from Form IB and IF verbs have the same pattern. Thus, one construction rule captures the derivation from both of these types of verb. The construction rule is shown in Figure 3.15. This is the same as construction rule


sajada-form-IB-triliteral-sound-active-perfect-3rd-sg-masc-verb-lex
[ phon    []
  morph   [ root <s, j, d>
            stem <s, a, j, a, d, a> ]
  arg-st  < [1] [ syn [ cat [ noun
                              case accusative ]
                        opt − ]
                  sem [ index j ] ] >
  syn     [ cat  [ verb
                   vform  perfect
                   voice  active
                   mood   indicative
                   select none
                   xarg   none
                   lid    none ]
            val  [1]
            mrkg none ]
  sem     [ sit-index s [ situation prostration ]
            frames    < write-fr [ sit      s
                                   actor    i [ person 3rd
                                                number sg
                                                gender masc
                                                hum    yes ]
                                   undgr    j
                                   location k ] > ] ]

Figure 3.13: AVM for a Form IB root verb


masjidun-form-IB-triliteral-sound-locative-noun-lex
[ morph   [ root <s, j, d>
            stem <m, a, s, j, i, d, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case   nominative
                   def    no
                   select none
                   xarg   none
                   lid    none ]
            val  <>
            mrkg none ]
  sem     [ index  k [ person 3rd
                       number sg
                       gender masc
                       hum    no ]
            frames < prostrate-fr [ sit      s [ situation prostration ]
                                    actor    i
                                    location k ] > ] ]

Figure 3.14: AVM for Form IB locative noun


form-IA-triliteral-sound-locative-noun-lex-cxt except for the vowel pattern in the mother, i.e. the vowel pattern of the locative noun, and the sort description of the daughter.

form-IB-triliteral-sound-locative-noun-lex-cxt
[ mtr   form-IB-triliteral-sound-locative-noun-lex
        [ morph   [ stem <m, a, [1], [2], i, [3], u, n> ]
          arg-st  <>
          syn     [ cat [ noun
                          case nominative ]
                    val <> ]
          sem     [ index  j
                    frames < event-fr [ sit      s
                                        location j ] > ] ]
  dtrs  < form-IB-triliteral-sound-locative-perfect-3rd-sg-masc-verb-lex
          [ morph [ stem <[1], a, [2], a, [3], a> ]
            syn   [ cat [ verb
                          vform perfect
                          voice active ] ]
            sem   [ sit-index s
                    frames    < event-fr [ sit      s
                                           actor    i
                                           location j ] > ] ] > ]

Figure 3.15: Lexical rule for locative noun construction from Form IB sound root verb

3.4.4 Comparative Participle

Figure 3.16 shows the AVM for the comparative participle ‘aktabu’. It is derived from the root verb ‘kataba’ shown in Figure 3.7. We have introduced a new semantic frame, compare-fr, inspired by the analysis of Farkas et al. [14]. This frame has three features. The first feature, COMPARED, contains the index of the object that we want to compare. The second feature, COMPAREWITH, contains the index of the object with which we want to compare it. The last feature, DIMENSION, is the dimension of comparison. This dimension is actually a SIT-INDEX, which must be co-indexed with the situational index of the verb lexeme from which the participle is derived. Figure 3.17 shows the construction rule for comparative participles. This participle has an optional syntactic requirement, which is contained in the ARG-ST feature. The case of the required sign must be genitive. Its semantic index is co-indexed with the index of COMPAREWITH in compare-fr. At the same time, the situational index of DIMENSION in compare-fr must be co-indexed with the SIT-INDEX of the verb lexeme. What this rule says is that a comparative participle expresses the comparison of two things along the dimension given by the verb.
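The stem patterns used by the construction rules of Sections 3.4.1 to 3.4.4 can be summarized as templates. The sketch below is an illustration under stated assumptions rather than the grammar itself: the templates are read off the mother STEM values in Figures 3.8, 3.9, 3.12, 3.15 and 3.17, roots are written as three Latin characters, and the names `TEMPLATES`, `fill` and `derive` are hypothetical. The choice between the two locative patterns follows the Form I subtype, as described in Section 3.4.3.

```python
# Mother stem templates; the integers 1-3 stand for the three root letters.
TEMPLATES = {
    "active-participle": [1, "a", "a", 2, "i", 3, "u", "n"],
    "passive-participle": ["m", "a", 1, 2, "u", "u", 3, "u", "n"],
    "locative-noun-IA": ["m", "a", 1, 2, "a", 3, "u", "n"],
    "locative-noun-IB": ["m", "a", 1, 2, "i", 3, "u", "n"],
    "comparative-participle": ["a", 1, 2, "a", 3, "u"],
}

# Form I subtypes that share the Form IB locative pattern (Section 3.4.3).
IB_PATTERN_SUBFORMS = frozenset({"IB", "IF"})


def fill(template, root):
    """Replace slots 1-3 with the root letters and join the result."""
    return "".join(root[s - 1] if isinstance(s, int) else s for s in template)


def derive(noun_type, root, subform="IA"):
    """Derive a stem for a sound triliteral Form I root such as "ktb"."""
    if len(root) != 3:
        raise ValueError("expected a triliteral root")
    if noun_type == "locative-noun":
        suffix = "IB" if subform in IB_PATTERN_SUBFORMS else "IA"
        noun_type = f"locative-noun-{suffix}"
    return fill(TEMPLATES[noun_type], root)


print(derive("locative-noun", "ktb", subform="IA"))  # maktabun
print(derive("locative-noun", "sjd", subform="IB"))  # masjidun
print(derive("comparative-participle", "ktb"))       # aktabu
```

Only the morphological side is covered here; the semantic co-indexation (for example DIMENSION with the verb's SIT-INDEX) is left to the AVMs in the figures.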

3.4.5 Other Types of Verbal Noun

The constructions of the remaining four types of verbal nouns are complex and we cannot capture them by construction rules. We have to give the lexical entries for these verbal nouns individually. Each verb form has a gerund, which uses the most unpredictable pattern. Modeling its construction rule is a vast area of research. For now we can only list lexical entries for all gerunds individually. Figure 3.18 shows a lexical entry for the gerund ‘kitaabatun’, which means writing. Hyperbolic participles are generated only from triliteral sound Form I root verbs, but not all verbs possess a corresponding hyperbolic participle. There are eleven patterns for deriving hyperbolic participles from verbs. However, we cannot predict from the root letters which of these eleven patterns will be used; neither can we infer the existence


aktabu-form-I-triliteral-sound-comparative-participle-lex
[ morph   [ root <k, t, b>
            stem <a, k, t, a, b, u> ]
  arg-st  < [1] [ syn [ cat [ noun
                              case genitive ]
                        opt + ]
                  sem [ index j ] ] >
  syn     [ cat  [ noun
                   case   nominative
                   def    no
                   select none
                   xarg   none
                   lid    none ]
            val  [1]
            mrkg none ]
  sem     [ index  i [ person 3rd
                       number sg
                       gender masc
                       hum    yes ]
            frames < write-fr   [ sit   s [ situation writing ]
                                  actor i ],
                     compare-fr [ compared    i
                                  comparewith j
                                  dimension   s ] > ] ]

Figure 3.16: AVM for Form I comparative participle


form-I-triliteral-sound-comparative-participle-lex-cxt
[ mtr   form-I-triliteral-sound-comparative-participle-lex
        [ morph   [ stem <a, [1], [2], a, [3], u> ]
          arg-st  < [4] [ syn [ cat [ noun
                                      case genitive ]
                                opt + ]
                          sem [ index j ] ] >
          syn     [ cat [ noun
                          case nominative ]
                    val [4] ]
          sem     [ index  i [ person PERS
                               number sg
                               gender masc
                               hum    HUM ]
                    frames < event-fr   [ sit   s
                                          actor i ],
                             compare-fr [ compared    i
                                          comparewith j
                                          dimension   s ] > ] ]
  dtrs  < form-I-triliteral-sound-comparative-perfect-3rd-sg-masc-verb-lex
          [ morph [ stem <[1], a, [2], VOWEL, [3], a> ]
            syn   [ cat [ verb
                          vform perfect
                          voice active ] ]
            sem   [ sit-index s
                    frames    < event-fr [ sit s ] > ] ] > ]

Figure 3.17: Lexical rule for comparative participle construction




kitaabatun-gerund-lex
[ morph   [ root <k, t, b>
            stem <k, i, t, a, a, b, a, t, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case nominative
                   def  no ]
            val  <>
            mrkg none ]
  sem     [ index  i [ person 3rd
                       number sg
                       gender fem
                       hum    no ]
            frames < write-fr [ sit    s [ situation writing ]
                                action i ] > ] ]

Figure 3.18: Sample lexical entry for ‘kitaabatun’ gerund


of a hyperbolic participle for the given root letters. So we have to list a lexical entry for each of these hyperbolic participles. Figure 3.19 shows a sample lexical entry for the hyperbolic participle ‘kattaabun’, which means a person who writes a lot. We have used the modifier-fr frame to capture the modification information. The LEVEL feature contains the value excessive, which means that the actor performs this action excessively.

kattaabun-hyperbolic-participle-lex
[ morph   [ root <k, t, b>
            stem <k, a, t, t, a, a, b, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case nominative
                   def  no ]
            val  <>
            mrkg none ]
  sem     [ index  i [ person 3rd
                       number sg
                       gender masc
                       hum    yes ]
            frames < write-fr    [ sit   s [ situation writing ]
                                   actor i ],
                     modifier-fr [ arg   i
                                   level excessive ] > ] ]

Figure 3.19: Sample lexical entry for ‘kattaabun’ hyperbolic participle

Resembling participles are similar to hyperbolic participles. They are generated only from triliteral sound Form I root verbs. There exist a large number of derivational patterns in this case, so it is not feasible to formulate a lexical construction rule for these nouns. Thus in this case we also need to give the lexical entries. Figure 3.20 shows the lexical entry for ‘katiibun’, which means a person who always writes. As with the hyperbolic participle, we have used the modifier-fr frame to capture its information as a modifier. Unlike the hyperbolic participle, the value of the LEVEL feature is intrinsic, which captures its difference from the hyperbolic participle.


katiibun-resembling-participle-lex
[ morph   [ root <k, t, b>
            stem <k, a, t, i, i, b, u, n> ]
  arg-st  <>
  syn     [ cat  [ noun
                   case nominative
                   def  no ]
            val  <>
            mrkg none ]
  sem     [ index  i [ person 3rd
                       number sg
                       gender masc
                       hum    yes ]
            frames < write-fr    [ sit   s [ situation writing ]
                                   actor i ],
                     modifier-fr [ arg   i
                                   level intrinsic ] > ] ]

Figure 3.20: Sample lexical entry for ‘katiibun’ resembling participle


Utilitarian Nouns are also generated only from triliteral sound Form I root verbs. There are four patterns of derivation, but for a given set of root letters it is unpredictable which pattern will be used. For this reason, despite the limited number of patterns, we have to list the lexical entries exhaustively. Figure 3.21 shows a lexical entry for the utilitarian noun ‘miktabun’, which means an instrument for writing.

miktabun-utilitarian-noun-lex
   MORPH   [ ROOT ⟨k, t, b⟩
             STEM ⟨m, i, k, t, a, b, u, n⟩ ]
   ARG-ST  ⟨⟩
   SYN     [ CAT  noun [ CASE nominative, DEF no ]
             VAL  ⟨⟩
             MRKG none ]
   SEM     [ INDEX  k [ PERSON 3rd, NUMBER sg, GENDER masc, HUM no ]
             FRAMES ⟨ write-fr [ SIT s [ SITUATION writing ], ACTOR i, UNDGR j, INSTRUMENT k ] ⟩ ]

Figure 3.21: Sample lexical entry for ‘miktabun’ utilitarian noun




Chapter 4

TRALE Implementation

We have implemented the HPSG formalism described in the previous chapter in TRALE, an extension of the Attribute Logic Engine. Section 4.1 introduces the TRALE system, Section 4.2 discusses the components of TRALE needed for our implementation, and Section 4.3 describes the methodology we followed to implement the HPSG formalism in TRALE.

4.1

Introduction to TRALE

TRALE is a lexical rule compiler. It integrates phrase structure parsing, semantic-head-driven generation and constraint logic programming with typed feature structures as terms, and it compiles feature structure descriptions into Prolog code. It descends from two other systems, the Attribute Logic Engine (ALE) and ConTroll [30, 34]. Both of these were designed around the formalism of HPSG 87 [32]. With Gerald Penn as its chief developer, TRALE inherited the core of the ALE system, but with the underlying logic specialized to the tradition of HPSG 94 [33].

We decided to use TRALE for our implementation. The main alternative is LKB, in which both HPSG and LFG grammars can be implemented. TRALE, in contrast, was developed solely to capture HPSG grammars, and it was developed after LKB. For these reasons we chose TRALE. Before choosing the implementation platform, we also consulted the comparison between TRALE and LKB presented by Melnik [27].

We use TRALE on the Grammix operating system, version of June 01, 2007 [28]. This is the older version of TRALE. A newer version, published in 2008 [34], is not complete but can be run stand-alone on Linux. Grammix is built for grammar development and contains two complete grammar development systems, TRALE and LKB. Its TRALE system was last updated on May 31, 2007.


4.2

Necessary Components for Implementation

TRALE has two major component files, a signature file and a theory file, together with an I/O console and a graphical interface (GRISU) that displays AVMs and type hierarchies graphically.

4.2.1

Signature File

This file contains the type hierarchy of features in HPSG. It also contains the description of function features, i.e. the features that constitute each function feature. The file has no extension and is loaded from the theory file. The hierarchy is expressed through indentation: a feature on the following line that is indented by three additional blank spaces is an immediate child. Figure 4.1 shows some sample lines of the signature file. TRALE requires the bot feature to be at the top of the hierarchy, above all other features. The constituents of a function feature are specified on the same line as the feature itself. Here sign and all of its children are function features, and the constituents of sign are listed on its line. In the signature file, multiple inheritance is marked by &: writing & before a type means that the type is also mentioned elsewhere in the file, where it is a child of another type.

type_hierarchy
bot
   sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
      lexeme
         noun_lex
            ...
               active_participle_lex
                  trilateral_root_derived_ap_lex
                     formI_sound_trilateral_ap_lex
                  sound_root_derived_ap_lex
                     &formI_sound_trilateral_ap_lex
                  formI_derived_ap_lex
                     &formI_sound_trilateral_ap_lex
   ...
.

Figure 4.1: Signature file
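The indentation and & conventions described above can be illustrated outside TRALE. The following Python sketch is not part of the implementation; it only shows, under the assumption of three spaces per level, how such a listing determines the immediate parents of each type. The excerpt and the function name parse_hierarchy are illustrative choices, not TRALE functionality.

```python
from collections import defaultdict

# Illustrative excerpt modelled on Figure 4.1 (three spaces per level is assumed).
SIGNATURE = """\
active_participle_lex
   trilateral_root_derived_ap_lex
      formI_sound_trilateral_ap_lex
   sound_root_derived_ap_lex
      &formI_sound_trilateral_ap_lex
   formI_derived_ap_lex
      &formI_sound_trilateral_ap_lex
"""


def parse_hierarchy(text, indent=3):
    """Return a mapping from each type to the list of its immediate parents."""
    parents = defaultdict(list)
    stack = []  # stack[d] holds the most recent type seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        # The first token is the type name; a leading & marks a repeated mention.
        name = line.split()[0].lstrip("&")
        del stack[depth:]
        if stack:
            parents[name].append(stack[-1])
        else:
            parents.setdefault(name, [])
        stack.append(name)
    return dict(parents)


hierarchy = parse_hierarchy(SIGNATURE)
print(hierarchy["formI_sound_trilateral_ap_lex"])
```

In this reading, formI_sound_trilateral_ap_lex ends up with three parents, one from its plain mention and two from its & mentions, which is the multiple inheritance the signature file expresses.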

4.2.2

Theory File

This file is composed of SWI-Prolog code and must have the .pl extension. It is the starting point for the TRALE compiler: it loads the signature file and any additional Prolog files. Together with those files, it contains the lexical entries and construction rules. Figure 4.2 shows the lexical entry for the root verb ‘kataba’. The detailed lexical properties of an entry are given after ~~>. This entry closely mirrors the AVM of the root verb ‘kataba’ shown in Figure 3.7. Figure 4.3 shows part of a lexical construction rule.

kataba ~~>
   (formIA_triliteral_root_verb_lex,
      (morph:(root:[k,t,b]),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
            val:[OBJ_SIGN]),
       sem:(sit_index:(SIT_INDEX, (situation:writing)),
            frames:[(sit:SIT_INDEX,
                     actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                     undgr:OBJ_INDEX,
                     location:LOC_INDEX)]))).

Figure 4.2: Sample lexical entry in theory file

trilateral-active-lex-cxt ##
   (formI_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       syn:( ... ),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   **>
   (formI_sound_trilateral_ap_lex,
      (morph:(root:ROOTS),
       syn:( ... ),
       sem:(index:SUB_INDEX,
            frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).

Figure 4.3: Sample lexical construction rule in theory file

The body of a construction rule starts after ##. In a lexical construction rule the daughter comes first and then the mother; the two are separated by **>. At the end of the rule, the morphs keyword introduces the morphological operation performed by the rule. The change of the STEM feature from daughter to mother is captured by the command morphs DTR becomes MTR.
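To make the effect of the morphs clauses concrete, the following Python sketch applies the three patterns of the active participle rule to a verb form. It is an illustration only and does not reproduce how TRALE evaluates morphs: it assumes that each upper-case variable binds exactly one letter and that every letter of the transliteration is a single character. The names apply_morph, derive and ACTIVE_PARTICIPLE are hypothetical.

```python
def apply_morph(pattern, template, word):
    """Match `word` against `pattern`; on success build the output from `template`.

    Upper-case symbols are variables that bind exactly one letter (a simplifying
    assumption of this sketch); lower-case symbols are literal letters.
    """
    if len(word) != len(pattern):
        return None
    bindings = {}
    for symbol, letter in zip(pattern, word):
        if symbol.isupper():
            # A repeated variable must bind the same letter each time.
            if bindings.setdefault(symbol, letter) != letter:
                return None
        elif symbol != letter:
            return None
    return "".join(bindings[s] if s.isupper() else s for s in template)


# The three `becomes` clauses of the active participle rule, daughter -> mother.
ACTIVE_PARTICIPLE = [
    (("X", "a", "Y", "a", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
    (("X", "a", "Y", "u", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
    (("X", "a", "Y", "i", "Z", "a"), ("X", "a", "a", "Y", "i", "Z", "u", "n")),
]


def derive(rules, word):
    """Return the output of the first clause whose pattern matches `word`."""
    for pattern, template in rules:
        result = apply_morph(pattern, template, word)
        if result is not None:
            return result
    return None


print(derive(ACTIVE_PARTICIPLE, "kataba"))
```

For ‘kataba’ the first clause binds X, Y and Z to k, t and b, and the template yields ‘kaatibun’, the active participle discussed in Section 4.3.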

4.2.3

Input and Output System

TRALE provides an I/O console inside the Emacs editor. AVMs and type hierarchies are displayed graphically in an interface called GRISU. The console accepts commands to compile the signature and theory files and shows the compilation output. It also accepts phrases and sentences to be recognized by the grammar defined in the signature and theory files, and commands issued in the console bring up the GRISU output. GRISU renders AVMs, construction rules and type hierarchies in a clear pictorial form. The order of features in a GRISU AVM can be configured in the theory.pl file in the following manner: >>>phon. phon <<< morph. Here >>>phon places PHON at the top of the feature listing, and phon <<< morph states that the PHON feature precedes the MORPH feature in the GRISU AVM. The GRISU output for the AVM of the root verb ‘kataba’ in Figure 3.7 is shown in Figure 4.4. Similarly, the GRISU output for the type hierarchy of the active participle in Figure 3.5 is shown in Figure 4.5.
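One way to read a series of <<< declarations is as precedence constraints that together fix a display order. The Python sketch below illustrates that reading with the top-level declarations listed in theory.pl; it is not GRISU's own algorithm, and the names PRECEDES and display_order are hypothetical. It requires Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each pair (before, after) stands for a declaration `before <<< after`.
PRECEDES = [
    ("phon", "morph"),
    ("morph", "arg_st"),
    ("arg_st", "syn"),
    ("syn", "sem"),
]


def display_order(pairs):
    """Return one feature order that is consistent with all precedence pairs."""
    sorter = TopologicalSorter()
    for before, after in pairs:
        # `after` may only be shown once `before` has been shown.
        sorter.add(after, before)
    return list(sorter.static_order())


print(display_order(PRECEDES))
```

Because these declarations form a single chain, only one order satisfies them, with phon first and sem last.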

4.3

Implementation Methodology

Figure 4.4: GRISU output for AVM of ‘kataba’ root verb.

Figure 4.5: GRISU output for type hierarchy of active participle.

Our implementation can be divided into the following steps:

1. We translated the SBCG type hierarchy in Figure 3.4 and Figure 3.5 into the signature file.

2. We gave the description of the function features in the signature file.

3. We loaded the signature file from the theory file and checked whether the hierarchy and the function feature descriptions were correct by viewing GRISU outputs such as Figure 4.5.

4. We specified the order of features at the beginning of the theory file using the >>> and <<< operators.

5. We gave the lexical entries for all types (Form IA, IB, IC, ..., IG) of Form I triliteral strong root verbs. We then checked whether our entries were correct by viewing GRISU outputs such as Figure 4.4.

6. We gave the lexical construction rules, which are at the core of our implementation. Then, in the input console, we entered verbal nouns derived from the root verbs given as lexical entries. We found that TRALE recognizes these verbal nouns from their root verbs and the construction rules. For example, when we entered ‘kaatibun’ to be recognized, it gave the GRISU output in Figure 4.6.

7. In the same manner, we tested all five construction rules proposed in Section 3.4 for all types of Form I triliteral strong root verbs.

Following these steps, we have successfully implemented the HPSG formalism. The detailed contents of the signature and theory files are provided in the Appendix.
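The recognition test in steps 6 and 7 can be mimicked by a small generate-and-test sketch in Python. It is an illustration under stated assumptions rather than a description of TRALE's licensing procedure: the lexicon holds only the three root verbs listed in theory.pl together with their Form I class, the rules keep only the first morphs clause of each construction rule, each upper-case variable binds exactly one letter, and LEXICON, RULES, rewrite and recognize are hypothetical names.

```python
# Root verbs from theory.pl with their Form I class.
LEXICON = {"kataba": "IA", "nasara": "IA", "sajada": "IB"}

# (derived category, verb classes the rule applies to, pattern, template);
# upper-case letters are variables, lower-case letters are literals.
RULES = [
    ("active participle", {"IA", "IB"}, "XaYaZa", "XaaYiZun"),
    ("passive participle", {"IA", "IB"}, "XaYaZa", "maXYuuZun"),
    ("locative noun", {"IA"}, "XaYaZa", "maXYaZun"),
    ("locative noun", {"IB"}, "XaYaZa", "maXYiZun"),
    ("comparative", {"IA", "IB"}, "XaYaZa", "aXYaZu"),
]


def rewrite(pattern, template, word):
    """Apply one pattern/template pair to `word`, or return None if it fails."""
    if len(pattern) != len(word):
        return None
    binding = {}
    for symbol, letter in zip(pattern, word):
        if symbol.isupper():
            binding[symbol] = letter
        elif symbol != letter:
            return None
    return "".join(binding.get(symbol, symbol) for symbol in template)


def recognize(surface):
    """List every (category, root verb) pair whose derived form equals `surface`."""
    analyses = []
    for verb, verb_class in LEXICON.items():
        for category, classes, pattern, template in RULES:
            if verb_class in classes and rewrite(pattern, template, verb) == surface:
                analyses.append((category, verb))
    return analyses


print(recognize("kaatibun"))
print(recognize("masjidun"))
```

Here ‘kaatibun’ is analysed as the active participle of ‘kataba’, and ‘masjidun’ as the locative noun of ‘sajada’ through the Form IB rule; a form that no rule derives from a listed verb yields an empty list.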



Figure 4.6: GRISU output of AVM for ‘kaatibun’ active participle.


Chapter 5

Conclusion

In this last chapter, we draw the conclusion of our thesis by describing the major contributions made through this research, followed by some directions for future research.

5.1

Summary of Contributions

The contributions that have been made in this thesis can be enumerated as follows:

• We have formulated a concrete AVM for Arabic nouns and verbs. We have made the design robust so that it can handle not only lexical construction but also phrasal construction. We have implemented it in TRALE. We have extended it to capture the root pattern morphology.

• When a verbal noun stem is derived from a root verb, some syntactic and semantic information is encoded in the derived stem. We have captured this syntactic and semantic information. We have modified the INDEX feature for Arabic to reference and incorporate semantic meaning.

• We have given a concrete description of SIT-INDEX and shown its differences from INDEX. No earlier literature shows its use and its distinction from INDEX.

• We have articulated the type hierarchy of Arabic nouns and placed the verb-derived nouns and their subtypes in the lexical type hierarchy. We have also provided the justification for this placement and implemented it in the TRALE platform.

• We have utilized the root pattern morphology, so that only minimal lexical entries are needed to resolve a lexicon. We have developed lexical construction rules for four types of verbal noun (active participle, passive participle, locative participle and comparative participle) derived from triliteral sound root verbs. The other four types of verbal noun need an exhaustive lexical entry for each verb stem. We have given a sample lexical entry for each of these stems. Further lexical construction rules can be constructed to capture the inflection of these stems for gender and number.

• We have implemented all the lexical type hierarchies, lexical entries and lexical construction rules proposed in this thesis. We have then verified these construction rules by recognizing the verbal noun stems from their root verbs.

5.2

Future Directions for Further Research

Modeling a natural language is a large research undertaking, and ours is only the start of this work. The following directions should be considered to make the best use of our research.

• Verbal nouns derived from strong quadriliteral verbs and weak verbs should be modeled. To accomplish this, we have to model the quadriliteral and weak verbs as well.

• We have not developed any construction rules for four types of verbal noun (gerund, hyperbolic participle, resembling participle and utilitarian noun). Our investigation suggests, however, that hyperbolic participles and utilitarian nouns can be derived by construction rules based on specific root classes. For this, Arabic roots should be classified at a more granular level so that different construction rules can be generated from those root classes.

• As mentioned in Section 5.1, we have developed construction rules for verbal noun stems. Other construction rules should be developed to capture the inflection of these verbal noun stems. Some examples of inflection are number, gender and declension. Declension is a distinctive linguistic feature of Arabic: noun and verb lexemes are inflected and take different forms based on case and mood in phrases.

• So far, we have only dealt with lexical construction rules. To parse an Arabic noun in a sentence, phrasal construction rules must also be constructed.


Bibliography

[1] Kenneth R. Beesley. Finite-State Morphological Analysis and Generation of Arabic at Xerox Research: Status and Plans in 2001. In Proceedings of the Workshop on Arabic Language Processing: Status and Prospects, Association for Computational Linguistics, 2001.
[2] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Passive. In Proceedings of the 11th International Conference on Computer and Information Technology, 2008.
[3] Md. Shariful Islam Bhuyan and Reaz Ahmed. An HPSG Analysis of Arabic Verb. In Proceedings of the 8th International Arab Conference on Information Technology, 2008.
[4] Md. Shariful Islam Bhuyan and Reaz Ahmed. Nonconcatenative Morphology: An HPSG Analysis. In Proceedings of the 5th International Conference on Electrical and Computer Engineering, 2008.
[5] Steven Bird and Ewan Klein. Phonological Analysis in Typed Feature Systems. Computational Linguistics, 20:455–491, 1994.
[6] Joan Bresnan. The Mental Representation of Grammatical Relations. Cambridge, MA, USA: MIT Press, 1982.
[7] Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Version 2.0. In Linguistic Data Consortium, Philadelphia, PA, USA, 2004.
[8] Noam Chomsky. Lectures on Government and Binding, 1981.
[9] Domenic Cipollone. Morphologically complex predicates in Japanese and what they tell us about grammar architecture. In OSU Working Papers in Linguistics 56, pages 1–52. Ohio State University, 2001.
[10] Ann Copestake, Dan Flickinger, Rob Malouf, Susanne Riehemann, and Ivan Sag. Translation using minimal recursion semantics. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation, Leuven, 1995.
[11] Ann Copestake, Dan Flickinger, Susanne Riehemann, and Ivan Sag. Minimal recursion semantics: An introduction. Research on Language and Computation, 3(4):281–332, 2006.
[12] Anthony R. Davis. Linking and the Hierarchical Lexicon. PhD thesis, Stanford University, 1996.
[13] Anthony R. Davis. Linking by Types in the Hierarchical Lexicon. Chicago: University of Chicago Press, 2001.
[14] Donka F. Farkas and Katalin É. Kiss. On the comparative and absolute readings of superlatives. Natural Language and Linguistic Theory, 18(3):417–455, 2000.
[15] Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan A. Sag. Generalized Phrase Structure Grammar. Chicago: University of Chicago Press, 1985.
[16] Georgia M. Green. Elementary principles of HPSG. In FIPS PUB, pages 140–1, 1999.
[17] Kais Haddar, Ines Zalila, and Sirine Boukedi. An HPSG parser generation with the LKB for Arabic relatives. International Journal of Computing and Information Sciences, 7(5):51–60, 2009.
[18] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz Ahmed. An HPSG Analysis of Declension in Arabic Grammar. In Proceedings of the 9th International Arab Conference on Information Technology, 2009.
[19] Md. Sadiqul Islam, Mahmudul Hasan Masum, Md. Shariful Islam Bhuyan, and Reaz Ahmed. Arabic Nominals in HPSG: A Verbal Noun Perspective. In Proceedings of the 17th International HPSG Conference, Université Paris Diderot, pages 158–178. On-line: CSLI Publications, 2010.
[20] Mohtanick Jamil. Arabic verb paradigms. Website, 2003-2011. http://www.learnarabiconline.com/verbal-paradigms.shtml.
[21] Mohtanick Jamil. Definiteness. Website, 2003-2011. http://www.learnarabiconline.com/definiteness.shtml/.
[22] Mohtanick Jamil. Derived nouns. Website, 2003-2011. http://www.learnarabiconline.com/derived-nouns.shtml.
[23] Andreas Kathol. Agreement and the syntax-morphology interface in HPSG. In Studies in contemporary phrase structure grammar, pages 223–274. UC Berkeley, 1999.
[24] Alain Kihm. Nonsegmental Concatenation: A Study of Classical Arabic Broken Plurals and Verbal Nouns. Morphology, 16:69–105, 2006.
[25] Eugene E. Loos, Susan Anderson, Dwight H. Day, Jr., Paul C. Jordan, and J. Douglas Wingate. Glossary of linguistic terms. Website, 2011. http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/.
[26] Nurit Melnik. Verb-Initial Construction in Modern Hebrew. PhD thesis, University of California, Berkeley, 2002.
[27] Nurit Melnik. From “hand-written” to computationally implemented HPSG theories. In Proceedings of the 12th International HPSG Conference, University of Lisbon, pages 311–321. On-line: CSLI Publications, 2005.
[28] Stefan Müller. Grammix. Website, 2007. http://hpsg.fu-berlin.de/Software/Grammix/.
[29] A.M. Mutawa, Salah Alnajem, and Fadi Alzhouri. An HPSG Approach to Arabic Nominal Sentences. Journal of the American Society for Information Science and Technology, 59(3):422–434, 2008.
[30] Gerald Penn and Mohammad Haji-Abdolhosseini. ALE Documentation. Website, 2003. http://www.ale.cs.toronto.edu/docs/.
[31] Carl J. Pollard. Lectures on the foundations of HPSG, 1997.
[32] Carl J. Pollard and Ivan A. Sag. Information-based syntax and semantics. Stanford: Center for the Study of Language and Information (CSLI), 1:262–267, 1987.
[33] Carl J. Pollard and Ivan A. Sag. Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press, 1994.
[34] Frank Richter. Preliminary TRALE Page. Website, 2008. http://milca.sfs.uni-tuebingen.de/A4/Course/trale/.
[35] Susanne Z. Riehemann. Type-Based Derivational Morphology. Journal of Comparative Germanic Linguistics, 2:49–77, 1998.
[36] Susanne Z. Riehemann. A Constructional Approach to Idioms and Word Formation. PhD thesis, Stanford University, 2001.
[37] Karin C. Ryding. Modern Standard Arabic. Cambridge University Press, UK, 2005.
[38] Ivan A. Sag. Sign-Based Construction Grammar, chapter 2. Stanford University, August 2010.


[39] Ivan A. Sag and Thomas Wasow. Syntactic Theory: A Formal Introduction. Stanford: Center for the Study of Language and Information (CSLI), 1999.
[40] Otakar Smrž. Functional Arabic Morphology. Formal System and Implementation. PhD thesis, Charles University in Prague, 2007.
[41] Nathan Vaillette. Hebrew relative clauses in HPSG. In Proceedings of the 7th International HPSG Conference, UC Berkeley (22–23), pages 305–324. On-line: CSLI Publications, 2000.

Glossary of Terms

Affix: is a bound morpheme that is joined before, after, or within a root or stem.
Agreement: refers to a formal relationship between elements whereby a form of one word requires a corresponding form of another. It is also known as concord.
Arabic quadriliteral verb: is the Arabic verb which contains four consonants.
Arabic strong verb: is the Arabic verb which does not contain any vowel in its long form.
Arabic triliteral verb: is the Arabic verb which contains three consonants.
Arabic weak verb: is the Arabic verb which contains any vowel in its long form.
Arabic verb form: In Arabic, from any particular sequence of root letters, up to fifteen different verb stems may be derived, each with its own template or vowel pattern and semantic information. These stems are called verb forms.
AVM: Attribute Value Matrix.
Bound morpheme: is a morpheme that never occurs by itself, but is always attached to some other morpheme.
Concatenative morphology: is the process where bound morphemes are linearly concatenated.
Construct: is a formal linguistic representation of a construction rule.
Construction: is an ordered arrangement of grammatical units forming a larger unit.
Construction rule: is a rule for constructing phrases or sentences.
Declension: is the process of disambiguating the grammatical roles of words by slightly changing their end vowels. In Arabic, the end vowel indicates grammatical case for nominals and mood for verbs.
Derivation: is the formation of a new word or inflectable stem from another word or stem. It typically occurs by the addition of an affix. The derived word is often of a different syntactic category from the original.
Free morpheme: is a morpheme that can occur by itself. However, other morphemes such as affixes can be attached to it.
Gemination: is the consecutive double occurrence of a letter.
HPSG: Head-driven Phrase Structure Grammar, developed by Carl Pollard and Ivan Sag in 1994.
Inflection: is variation in the form of a word that expresses a grammatical contrast which is obligatory for the stems. It does not change the syntactic category of the word.
Lexeme: is the minimal unit of language which has a semantic interpretation and embodies a distinct cultural concept. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as run.
Lexical construction rule: deals with the forming of the lexicon, i.e. the forming of words and stems.
Lexical entry: is the entry of a lexeme in the dictionary.
Lexicon: is the vocabulary of a language, including its words and expressions. A lexicon is also a synonym of the word thesaurus. It includes the lexemes used to actualize words. Grammatical rules are not considered part of the lexicon.
LFG: Lexical functional grammar, developed by Joan Bresnan in 1982.
Morpheme: is the smallest meaningful unit in the grammar of a language.
Morphological process: is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context. There are two types of processes, concatenative and nonconcatenative.
Morphology: is the study of word formation, i.e. the internal structure of words.
Morphosyntactic operation: is an ordered, dynamic relation between one linguistic form and another. Derivation and inflection are morphosyntactic operations.
Nonconcatenative morphology: is the process where bound morphemes are nonlinearly concatenated.
Phonology: is the systematic use of sound to encode meaning.
Phrasal construction rule: is a construction rule that deals with the forming of phrases and sentences.
Root: is the portion of a word that carries the principal portion of meaning of the words in which it functions. It is common to a set of derived or inflected forms, if any, when all affixes are removed. A root is also a stem.
Root class: is a set of roots which share a common derivational and inflectional paradigm.
SBCG: Sign-Based Construction Grammar. It is a variation of HPSG and was proposed by Ivan Sag in 2007.
Semantics: is the study of meaning. It typically focuses on the relation between signifiers, such as words, phrases, signs and symbols.
Sign: is a formal linguistic representation of words, phrases and sentences. All human utterances are captured by signs.
Stem: is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added.
Syntactic category: is a set of words and/or phrases in a language which share a significant number of common characteristics. It is also known as syntactic class.
Syntax: is the study of the principles and rules for constructing phrases or sentences.
TRALE: is a lexical rule compiler specially developed for HPSG. It is an extension of the Attribute Logic Engine.

Appendix A

In Table 5.1, we give the romanized transliteration of the Arabic alphabet.

Table 5.1: Transliteration Table of Arabic Alphabet


Arabic Letter   Transliteration   Arabic Letter   Transliteration
ب               b                 ظ               ẓ
ت               t                 ع               ʿ
ث               ṯ                 غ               ġ
ج               ǧ                 ف               f
ح               ḥ                 ق               q
خ               ḫ                 ك               k
د               d                 ل               l
ذ               ḏ                 م               m
ر               r                 ن               n
ز               z                 ه               h
س               s                 و               w
ش               š                 ي               y
ط               ṭ

Appendix B

Content of signature file

type_hierarchy
bot
   list
      ne_list hd:bot tl:list
      e_list
   char
      k
      t
      b
      a
      i
      u
      n
      r
      s
      j
      d
      m
      h
   sign phon:ne_list morph:morph arg_st:list syn:syn sem:sem
      word dtrs:list dtr:sign
      lexeme
         noun_lex
            derived_noun_lex
               verbderived_noun_lex
                  gerund_lex
                  active_participle_lex
                     trilateral_root_derived_ap_lex
                        formI_sound_trilateral_ap_lex
                     sound_root_derived_ap_lex
                        &formI_sound_trilateral_ap_lex
                     formI_derived_ap_lex
                        &formI_sound_trilateral_ap_lex
                  hyperbolic_participle_lex
                  passive_participle_lex
                     trilateral_root_derived_pp_lex
                        formI_sound_trilateral_pp_lex
                     sound_root_derived_pp_lex
                        &formI_sound_trilateral_pp_lex
                     formI_derived_pp_lex
                        &formI_sound_trilateral_pp_lex
                  resembling_participle_lex
                  locative_noun_lex
                     trilateral_root_derived_ln_lex
                        formI_sound_trilateral_ln_lex
                     sound_root_derived_ln_lex
                        &formI_sound_trilateral_ln_lex
                     formI_derived_ap_lex
                        &formI_sound_trilateral_ln_lex
                           formIA_sound_trilateral_ln_lex
                           formIB_sound_trilateral_ln_lex
                  utilitarian_noun_lex
                  comparative_lex
                     trilateral_root_derived_com_lex
                        formI_sound_trilateral_com_lex
                     sound_root_derived_com_lex
                        &formI_sound_trilateral_com_lex
                     formI_derived_com_lex
                        &formI_sound_trilateral_com_lex
               nounderived_noun_lex
            dual_noun_lex
            sg_noun_lex
            pl_noun_lex
         verb_lex
            triliteral_root_verb_lex
               formI_triliteral_root_verb_lex
                  formIA_triliteral_root_verb_lex
                  formIB_triliteral_root_verb_lex
                  formIC_triliteral_root_verb_lex
                  formID_triliteral_root_verb_lex
                  formIE_triliteral_root_verb_lex
                  formIF_triliteral_root_verb_lex
                  formIG_triliteral_root_verb_lex
                  formIH_triliteral_root_verb_lex
               formII_root_verb_lex
            quadriliteral_root_verb_lex
   morph root:list stem:list
   syn cat:cat val:list mrkg:mrkg
   cat case:case def:def mood:mood vform:vform voice:voice
      noun
      verb
   case
      nom
      acc
      gen
   person
      first
      second
      third
   number
      sg
      dual
      plural
   situation
      writing
      prostration
      helping
      honour
      suffice
      hearing
      drinking
   gender
      male
      female
   def
      yes
      no
   hum
      y
      n
   vform
      perf
      imperf
   voice
      active
      passive
   mood
      subjunctive
      indicative
      jussive
   mrkg
      none
      that
      lid
      select
   sem index:index sit_index:sit_index frames:list
   index pers:person num:number gen:gender hum:hum
   sit_index situation:situation
   frame
      event_fr sit:sit_index actor:index undgr:index location:index
      compare_fr compared:index comparedwith:index dimension:sit_index
      ref_fr ref_index:cat
.

Content of theory.pl

% theory.pl

% Multifile declarations.
%:- multifile '##'/2.
%:- multifile '~~>'/2.

% load phonology and tree output
:- [trale_home(tree_extensions)].

% maximum 4 rules will be used for licensing.
:- lex_rule_depth(4).

% specify signature file
signature(signature).

>>> phon.
phon <<< morph.
morph <<< arg_st.
arg_st <<< syn.
syn <<< sem.
index <<< frames.
sit_index <<< frames.
sit <<< actor.
actor <<< undgr.
undgr <<< location.

% lexical entry
kataba ~~>
   (formIA_triliteral_root_verb_lex,
      (morph:(root:[k,t,b]),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
            val:[OBJ_SIGN]),
       sem:(sit_index:(SIT_INDEX, (situation:writing)),
            frames:[(sit:SIT_INDEX,
                     actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                     undgr:OBJ_INDEX,
                     location:LOC_INDEX)]))).

nasara ~~>
   (formIA_triliteral_root_verb_lex,
      (morph:(root:[n,s,r]),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
            val:[OBJ_SIGN]),
       sem:(sit_index:(SIT_INDEX, (situation:helping)),
            frames:[(sit:SIT_INDEX,
                     actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                     undgr:OBJ_INDEX,
                     location:LOC_INDEX)]))).

sajada ~~>
   (formIB_triliteral_root_verb_lex,
      (morph:(root:[s,j,d]),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
            val:[OBJ_SIGN]),
       sem:(sit_index:(SIT_INDEX, (situation:prostration)),
            frames:[(sit:SIT_INDEX,
                     actor:(SUB_INDEX, (pers:third, num:sg, gen:male, hum:y)),
                     undgr:OBJ_INDEX,
                     location:LOC_INDEX)]))).

% lexical construction rules
trilateral-active-lex-cxt ##
   (formI_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   **>
   (formI_sound_trilateral_ap_lex,
      (morph:(root:ROOTS),
       arg_st:[],
       syn:(cat:(noun, case:nom, def:no),
            val:[],
            mrkg:none),
       sem:(index:SUB_INDEX,
            frames:[(sit:SIT_INDEX, actor:SUB_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,u,Z,a) becomes (X,a,a,Y,i,Z,u,n),
      (X,a,Y,i,Z,a) becomes (X,a,a,Y,i,Z,u,n).

trilateral-passive-lex-cxt ##
   (formI_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:acc,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive),
            val:[OBJ_SIGN]),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)])))
   **>
   (formI_sound_trilateral_pp_lex,
      (morph:(root:ROOTS),
       arg_st:[],
       syn:(cat:(noun, case:nom, def:no),
            val:[],
            mrkg:none),
       sem:(index:OBJ_INDEX,
            frames:[(sit:SIT_INDEX, undgr:OBJ_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
      (X,a,Y,u,Z,a) becomes (m,a,X,Y,u,u,Z,u,n),
      (X,a,Y,i,Z,a) becomes (m,a,X,Y,u,u,Z,u,n).

trilateral-locative-formIA-lex-cxt ##
   (formIA_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
   **>
   (formIA_sound_trilateral_ln_lex,
      (morph:(root:ROOTS),
       arg_st:[],
       syn:(cat:(noun, case:nom, def:no),
            val:[],
            mrkg:none),
       sem:(index:LOC_INDEX,
            frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (m,a,X,Y,a,Z,u,n).

trilateral-locative-formIB-lex-cxt ##
   (formIB_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
   **>
   (formIB_sound_trilateral_ln_lex,
      (morph:(root:ROOTS),
       arg_st:[],
       syn:(cat:(noun, case:nom, def:no),
            val:[],
            mrkg:none),
       sem:(index:LOC_INDEX,
            frames:[(sit:SIT_INDEX, location:LOC_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (m,a,X,Y,i,Z,u,n).

trilateral-comparative-formI-lex-cxt ##
   (formI_triliteral_root_verb_lex,
      (morph:(root:ROOTS),
       syn:(cat:(verb, vform:perf, voice:active, mood:subjunctive)),
       sem:(sit_index:SIT_INDEX,
            frames:[(sit:SIT_INDEX)])))
   **>
   (formI_sound_trilateral_com_lex,
      (morph:(root:ROOTS),
       arg_st:[(OBJ_SIGN, (syn:(cat:(case:gen,def:no)),
                           sem:(index:OBJ_INDEX)))],
       syn:(cat:(noun, case:nom, def:no),
            val:[],
            mrkg:none),
       sem:(index:SUB_INDEX,
            frames:[(sit:SIT_INDEX,
                     actor:(SUB_INDEX2, (pers:PERS, num:sg, gen:male, hum:HUM))),
                    (dimension:SIT_INDEX,
                     compared:SUB_INDEX2,
                     comparedwith:OBJ_INDEX)])))
   morphs
      (X,a,Y,a,Z,a) becomes (a,X,Y,a,Z,u),
      (X,a,Y,u,Z,a) becomes (a,X,Y,a,Z,u),
      (X,a,Y,i,Z,a) becomes (a,X,Y,a,Z,u).

