The Speech Chain


Science and Technical Writing

I Hear What You Say

It is a marvelous evolution. We draw in air and as it leaves our bodies, muscles coordinate with a multitude of structures to phonate and articulate words, musical notes, grunts, whistles, clicks, snaps, hisses, and all the other utterances of the over 2,700 human languages.

By Rob Crimmins

Speech is a common skill and an evolutionary development of astounding complexity. We think about it when speaking to a crowd or when a sore throat makes talking painful, but most of the time speech just happens. You need to tell someone something and the words come out. They don't just come out, though. The mind and body are engaged in a fantastic process every time you speak, sing, whistle, or shout.

For a different perspective on spoken communication, imagine an intelligence from another planet is studying us. Let's say they use telepathy to communicate with each other. One of them has to explain to another how speech works. "I tell you, it's amazing. By manipulating the fluid in which they live, they're able to convey their most complex and abstract thoughts." The second alien cocks his antenna, rolls his third and fifth eyes, and says telepathically, "Oh really? How do they do that?" "They do it by drawing the fluid into their bodies, then expelling it. As they expel it they cause it to vibrate. The exhausted fluid causes a wave propagation that impinges upon receptors in the bodies of those within range of the wave. The mechanical action of the receptors is converted into electrical energy that is instantly converted to meaningful symbols. It permits millions of combinations of signals. For most of them the skill is perfect and automatic, seemingly thoughtless. It is a marvelous evolution." Still skeptical, the second alien thinks to his friend, "They must have very long life spans to master such a skill." "Not at all. Their children learn it by themselves before they have reached a fifth of their maturity."

It is a marvelous evolution. We draw in air and as it leaves our bodies, muscles coordinate with a multitude of structures to phonate and articulate words, musical notes, grunts, whistles, clicks, snaps, hisses, and all the other utterances of the over 2,700 human languages.

The process of producing and perceiving speech is referred to as the "speech chain." It begins in the mind of the speaker, who chooses the words, organization, and grammar that will convey his thoughts. The linguistic information is transmitted along three neural pathways, which simultaneously control the tongue, lips, jaw, velum (or soft palate), vocal cords, larynx, and lungs. The changes in the surrounding air create sound waves, which the speaker and his listeners perceive. For the speaker, his own words are important feedback. The listener, with his perceptual and interpretive equipment and faculties, forms the other half of the chain. For him, the moving air is converted to the mechanical motion of the eardrum and the bones of the middle ear. The inner ear fluid then excites the auditory nerve.

As with everything else we need and study, we categorize the sounds that we make. Speech sounds are described in reference to the place along the vocal tract where the sound is articulated. The point of articulation is where maximum constriction occurs.
Most sounds, and the mechanisms that form them, can be at least partially understood by making the sound and feeling the structures at work to produce it. For the major languages the primary types of speech sounds are vowels, nasals, plosives, and fricatives.

We form vowel sounds at the glottis, the opening manipulated by the vocal cords at the top of the larynx. The vocal cords are two strong bands of connective tissue (the inferior thyro-arytenoid ligaments and muscular fibers) covered on their surface with a thin layer of mucous membrane. They are attached to the Adam's apple (thyroid cartilage) at the front of the larynx and to the arytenoid cartilages at the back. The location and action of the vocal cords and glottis can be felt when vowel sounds are sung.

The laryngeal musculature tenses the vocal cords to narrow the opening between them (the glottis). As the air passes through, its velocity increases, creating a negative pressure above the glottis. Air pressure from the lungs builds up below the glottis, forcing it to open. These forces cause the elastic vocal cords to vibrate, producing a tone. They continue to vibrate at a constant frequency as long as the aerodynamic forces and the muscular tension are maintained. (This explanation is part of the myoelastic-aerodynamic theory of phonation.) Sounds involving vibrating vocal cords are said to be "voiced." The speech sounds produced when the vocal cords are apart and not vibrating are unvoiced. All the glottal sounds produced while whispering are unvoiced.

By lowering the velum, or soft palate, airflow is channeled through the nasal cavity, producing nasal tones. The soft palate can be felt when you hold your nose and speak. Nasalized vowels are produced with the addition of nasal resonances. Nasal consonants are formed when airflow through the mouth is completely cut off. "M," "N," and "ng" are nasal consonant sounds in which the airflow constriction is, respectively, at the lips, the hard palate (roof of the mouth), and the velum.

Plosives are formed by stopping airflow and releasing it suddenly. The location along the vocal tract where constriction occurs, and the manner in which the air is released, determine the specific sound. The words "cat," "bake," and "put" begin and end with plosive sounds. Plosives in which the interruption of airflow is brief and not necessarily complete are called flaps. Trills are plosives formed by tensing the articulatory device so that it periodically interrupts the airflow. A raspberry is a trill. So is Roy Orbison's growl.

The beginning and ending sounds of the words "these," "froth," "vase," and "hash" are fricatives. These are formed by the partial constriction of airflow and the resulting turbulence.
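The constant-frequency vibration described above can be imitated in a few lines of code. Below is a minimal sketch in plain Python; the 120 Hz fundamental and the half-sine pulse shape are invented for illustration (real glottal waveforms are more complex), but the essential idea holds: while tension and airflow are steady, the cycle repeats at a fixed rate.

```python
import math

SAMPLE_RATE = 16_000   # samples per second
F0 = 120               # fundamental frequency in Hz (a typical speaking pitch)
DURATION = 0.05        # seconds of signal

def voiced_source(f0=F0, sr=SAMPLE_RATE, dur=DURATION):
    """Crude stand-in for the vibrating vocal cords: a pulse train.

    Each glottal cycle lasts 1/f0 seconds. The first quarter of the
    cycle is an open-glottis burst shaped with a half-sine; for the
    rest of the cycle the glottis is treated as closed (zero output).
    """
    n = int(sr * dur)
    period = sr / f0                       # samples per glottal cycle
    signal = []
    for i in range(n):
        phase = (i % period) / period      # position within the cycle, 0..1
        signal.append(math.sin(math.pi * phase / 0.25) if phase < 0.25 else 0.0)
    return signal

src = voiced_source()
# Successive pulse onsets are spaced one glottal period apart,
# i.e. about 16000 / 120 ≈ 133 samples.
```

Changing `F0` changes the pitch of the buzz without altering its shape, which mirrors how the laryngeal muscles retune the cords while the same aerodynamic mechanism keeps them oscillating.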
To make the correct sounds and to join them into intelligible speech requires rapid decision making and considerable neural activity. In addition to grammar and syntax, speech formation also depends heavily on context, conversational rules, memory, meaning, and the speaker's knowledge of the listener. The intended meaning has as much to do with speech production and perception as the choice of words. People usually remember what was meant more than what was said. The millions of signals, effects, and responses that occur in the mind of the speaker are part of an extremely complex electrochemical process. Roger Penrose, the English physicist, has suggested that the incredibly rapid switching associated with brain activity may actually be governed by the rules of quantum mechanics. In such a system neurons could make quantum jumps, moving from one place to another in no time and without traversing the space between them.

The listener (and, with the exception of a deaf person talking to himself, there is always at least one listener for every speaker) forms the other half of the speech chain. Speech perception is influenced by many of the same factors affecting the speaker, and a few more. The theories of speech perception and production are closely interrelated. Aspects of each fall within the fields of psycholinguistics and psychoacoustics.

We perceive speech sounds differently than other types of sounds. Speech is processed in a very complicated fashion and more rapidly than other auditory signals. Several parts of the brain are engaged in the reception and interpretation of speech. Sound spectrograms show slight differences in syllabic units, but compared to the analysis that the human listener applies, these differences are small. Some consonant sounds that are actually quite different will appear nearly identical on the spectrogram. The machine may fail to discern even these primary differences, much less the difference in meaning of homonyms, the effect of accent, tone, context, and the many other factors that the human processes immediately. It's little wonder that reliable and consistent speech processing equipment has yet to achieve widespread application.

Given the complexity of the process, it's not surprising that a lot can go wrong. Speech disorders are classified according to their causes or symptoms. The major causes are physical, imitative/environmental, and psychogenic. Each of these is further broken down into disorders of articulation, rhythm, voice, and symbolization. Lisping is a disorder of articulation.
So is lalling, in which r, l, t, and d sounds are mis-articulated. Others are delayed speech, involving the absence of consonants and a lack of intelligibility, and dysarthria, characterized by distortions and sound substitutions. Dysarthria is caused by lesions in the central or peripheral nervous system.

Stuttering and stammering are common disorders of rhythm. Others are cluttering, characterized by omitted and slurred syllables and words; plastic speech, in which the transition between sounds isn't smooth and breath control is poor; and athetotic speech, in which the normal rate of speech is disturbed by a general jerkiness. A number of diseases, such as multiple sclerosis and Parkinson's disease, can lead to disorders of rhythm.

Voice disorders affect loudness, pitch, and voice quality. Laryngitis leading to hoarseness is one example of a disorder of voice. A high-pitched voice caused by a small larynx or psychological tension, and a monotonous voice perhaps due to a lack of pitch perception, are other examples. A striking example of nasality, another general classification of voice disorder, is cleft palate speech. This is the result of an open palate due to the failure of the bilateral structures of the soft palate to join during fetal development.

Disorders involving expression and language formulation are disorders of symbolization. Aphasia, a lack of understanding of the symbolic aspects of language that results in the inability to communicate effectively in speech or writing, is an example. Aphasia can result from injury or birth defect, but in either case it involves an abnormality in specific parts of the brain.

Much can go wrong with the process, but most of us are effective communicators. We may not all be orators, but the average speaker's abilities are impressive, particularly from the view of our intelligent and objective aliens. The beauty of the evolution is particularly striking considering that the generation of speech sounds is accomplished by physiological devices that exist primarily for very different purposes. Language developed long after the mechanisms that produce speech sounds were formed. The vocal cords were (and still are) primarily for closing the glottis to stop airflow out of the lungs, so that we can hold our breath during periods of exertion when the rib cage must be rigid. The degree to which we have learned to manipulate that device is amazing.
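The disorder classification described earlier (causes crossed with symptom classes) can be captured in a small lookup table. This is a sketch using only the article's own category and disorder names; the `classify` helper is hypothetical, not any clinical coding scheme:

```python
# Disorders grouped by symptom class, as listed in the article.
# The orthogonal axis of causes (physical, imitative/environmental,
# psychogenic) is kept separately, since any cause can produce
# a disorder in any symptom class.
DISORDERS = {
    "articulation": ["lisping", "lalling", "delayed speech", "dysarthria"],
    "rhythm": ["stuttering", "stammering", "cluttering",
               "plastic speech", "athetotic speech"],
    "voice": ["hoarseness", "high pitch", "monotone", "nasality"],
    "symbolization": ["aphasia"],
}
CAUSES = ("physical", "imitative/environmental", "psychogenic")

def classify(disorder):
    """Return the symptom class for a named disorder, or None if unknown."""
    for symptom_class, names in DISORDERS.items():
        if disorder in names:
            return symptom_class
    return None
```

For example, `classify("aphasia")` returns `"symbolization"`, matching the article's grouping of language-formulation problems.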
The next time you hear an opera singer or four-part harmony, remember that those beautiful tones are being produced by ligaments and muscle whose primary function is as a simple closure. We take our speech for granted because it is done so naturally, but our thoughtless treatment of such a wonderful skill only adds to its grace. Like the athlete, the machine operator, the swordsman, the typist, the tradesman, and the pilot, we are each equipped with a tool that we adroitly manipulate without difficulty. It is another example of man's ability to take what's available and put it to good use. Considering the secondary nature of the tooling and what is done with it, I would say it is the finest example.

