Perceptual category mining in human second language speech perception

Yizhou Lan Department of Chinese, Translation and Linguistics City University of Hong Kong Kowloon, Hong Kong ylylan2-c@my.cityu.edu.hk

Will X. Y. Li Department of Electronic Engineering City University of Hong Kong Kowloon, Hong Kong xiangyuli4-c@my.cityu.edu.hk

Abstract—This paper clarifies how the Perceptual Assimilation Model (PAM) predicts patterns of human categorical perception of second language (L2) speech signals. The initial stage of categorizing acoustic stimuli in an L2 signal often involves assimilation to L1 categories. Whether an L2 sound is assimilated to a given L1 category is determined by the perceptual distance between the two categories. With evidence from Cantonese learners' perception of English /tr/ onsets, this study proposes that a careful mining process is needed to identify the L1 category to which an L2 sound is actually assimilated. The chosen L1 category should be tested for perceptual similarity to the L2 category in fine-grained phonetic environments, rather than inferred only from possible L1 phonological mappings suggested by observed production errors. Two experiments with different L1 assimilation candidates were conducted using the AX identification and ABX discrimination paradigms. Target trVC syllables were paired both with twVC, a representative of phonological closeness established through contrastive analysis, and with chVC, a representative of a perceptually similar candidate that is less often mentioned in the literature. Results from both identification and discrimination show that accuracy for twVC is near ceiling, whereas accuracy for trVC is significantly lower. The results suggest that perceptual distance is better represented by perceptual similarity, and that such a screening process should be applied as a pre-examination procedure instead of choosing the L1 assimilator based on phonological similarity.

Index Terms—human speech perception, category assimilation, perceptual distance, L2 speech processing.

I. INTRODUCTION

The context of second language (L2) speech processing offers a valuable setting for studying the capability of human speech perception per se. Speech, as part of human cognition, is usually perceived by drawing on higher-level category knowledge, much as shapes or colors are recognized [1]. A listener does not need to attend to every acoustic detail to comprehend a speech sound [2]. Instead, previous studies have shown that economical extraction of specific acoustic cues [2], [3] or articulatory cues [4], [5] in top-down processing helps humans recognize fast and variable speech tokens efficiently and accurately. However, in L2 speech, although the incoming signals are linguistic sounds, the higher-level category knowledge that lets us process our native language (L1) automatically may not function directly, because L1 and L2 categories may differ. Lacking readily usable categories, L2 learners "borrow" L1 categories to increase perceptual efficiency, a process called "equivalence classification" [3]; this "laziness" of equating L1 and L2 categories is referred to as perceptual assimilation [5]. Even where L1 and L2 categories are labeled as the same phoneme (the smallest contrastive unit of sound in a language), the perception of L1 and L2 speakers may still reveal subtle mismatches.

The Perceptual Assimilation Model (PAM [5], [6]) describes how L2 speech sounds, perceived as discrete constellations of articulatory gestures, can be assimilated to L1 categories. Assimilation can either facilitate or hinder communication. Depending on the perceptual distance between L1 and L2 categories, a candidate L1 sound may be perceived as identical to, similar to, or distinct from the target L2 sound to be perceived or learned. If the L2 sound is perceived as phonologically identical to the candidate L1 sound, it maps onto that L1 category and becomes indistinguishable from it. Given two L2 sounds and their respective perceptual distances to L1 categories, PAM proposes six assimilation types and predicts how well an L2 perceiver can discriminate the two L2 sounds in each case. The types and predictions are as follows:

TABLE I. ASSIMILATION TYPES AND THEIR PREDICTIONS IN PAM

Assimilation type                     Predicted discrimination
Two-category (TC)                     Excellent
Uncategorizable–categorizable (UC)    Very good
Both uncategorizable (UU)             Fair to good
Category-goodness (CG)                Moderate to very good
Single-category (SC)                  Very poor
Non-assimilable (NA)                  Very good to excellent
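Table I can be read as a decision procedure over how each of two L2 sounds is assimilated. The sketch below encodes that procedure together with the predictions listed in Table I. It is an illustrative assumption, not the authors' method: the `Percept` record, the `pam_type` function, and the `goodness_gap` threshold that separates Category-goodness from Single-category are hypothetical, and the two-letter abbreviations follow common usage in the PAM literature.

```python
from dataclasses import dataclass
from typing import Optional

# Predicted discrimination for each PAM assimilation type, as listed in Table I.
PREDICTED_DISCRIMINATION = {
    "TC": "Excellent",
    "UC": "Very good",
    "UU": "Fair to good",
    "CG": "Moderate to very good",
    "SC": "Very poor",
    "NA": "Very good to excellent",
}


@dataclass
class Percept:
    """Hypothetical record of how a listener perceives one L2 sound."""
    is_speech: bool              # False if the sound is not heard as speech at all
    l1_category: Optional[str]   # L1 category it is assimilated to; None = uncategorizable
    goodness: float = 0.0        # goodness-of-fit rating to that L1 category


def pam_type(a: Percept, b: Percept, goodness_gap: float = 1.0) -> str:
    """Classify a pair of L2 sounds into one of the six PAM assimilation types.

    goodness_gap is a hypothetical threshold: fit ratings differing by at least
    this much count as Category-goodness, otherwise as Single-category.
    """
    if not a.is_speech and not b.is_speech:
        return "NA"  # neither sound is assimilated to speech categories
    if not a.is_speech or not b.is_speech:
        # A mixed speech/non-speech pair is not one of the six types in Table I.
        raise ValueError("mixed speech/non-speech pair is outside the six PAM types")
    if a.l1_category is None and b.l1_category is None:
        return "UU"
    if a.l1_category is None or b.l1_category is None:
        return "UC"
    if a.l1_category != b.l1_category:
        return "TC"
    # Same L1 category: the fit difference decides between CG and SC.
    if abs(a.goodness - b.goodness) >= goodness_gap:
        return "CG"
    return "SC"


if __name__ == "__main__":
    # Hypothetical pair: both sounds heard as the same L1 category with similar fit.
    kind = pam_type(Percept(True, "cat_A", 5.0), Percept(True, "cat_A", 4.8))
    print(kind, PREDICTED_DISCRIMINATION[kind])  # prints: SC Very poor
```

Treating the goodness-of-fit difference as a single threshold is a simplification; in practice the boundary between the CG and SC types is an empirical question, which is why the threshold is exposed as a parameter here.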

Since a major purpose of research on L2 speech perception is to identify the phonemes that are difficult to learn in a specific L2, perceptual distance has become especially important for researchers testing whether a non-discriminable assimilation has taken place [7]. The SC type (bolded in Table I) has been discussed extensively to this end. When two L2 sounds are assimilated to the same L1 category, L2 learners, who fail to discriminate the subtle acoustic differences and face two possible categories to access, will pick one candidate category at random in a forced-choice discrimination task. Taking
