Silent speech recognition from articulatory motion
Abstract
Silent speech recognition is the process of converting motion data from the articulators (e.g., tongue, lips, and jaw) into speech in the form of text. The primary objective of this dissertation was to develop new approaches for silent speech recognition from segmented and continuous tongue and lip movement data at three levels of speech units of increasing conceptual complexity: phonemes, words, and sentences. At each level, unique theoretical issues were addressed and plans for specific applications were described. This dissertation is motivated by the need for (1) speech movement-based treatment options for people with speech and voice impairments and (2) computational approaches for recognizing speech when acoustic data are unavailable or extremely noisy.

Machine learning and statistical shape analysis were used to classify and quantify the articulatory distinctiveness of phonemes, words, and sentences. The approach is unique in that it maps the motion data directly to speech units, rather than to intermediate articulatory features. Procrustes analysis, a statistical shape-matching approach, provided an index of the articulatory distinctiveness of vowels and consonants, which was used to derive quantitative articulatory vowel and consonant spaces. The derived vowel space resembles long-standing descriptions of the articulatory vowel space. The theoretical properties of these spaces and their practical applications in speech pathology (e.g., tracking motor speech decline in amyotrophic lateral sclerosis) were also discussed.

In addition, support vector machine, Procrustes analysis, and eigenspace approaches were used to classify a set of phonetically balanced words and functional sentences from articulatory motion. The direct mapping approaches resulted in high classification accuracy levels that were adequate for practical applications. A near real-time algorithm, Holistic Articulatory Recognition (HAR), for recognizing whole words and sentences from continuous (unsegmented) articulatory motion was proposed and evaluated; its accuracy and speed demonstrated its potential for practical applications. HAR is based on classification probabilities, so any classifier that can estimate them can be incorporated seamlessly. HAR can serve as the recognition component of an articulation-based silent speech interface, which may provide an alternative oral communication modality for persons with speech impairments.
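To make the two core computational ideas above concrete, the sketch below pairs a Procrustes disparity between two articulatory movement traces with a probability-emitting classifier of the kind HAR can build on. It is a minimal illustration in Python, assuming hypothetical 2-D marker trajectories and randomly generated features; the variable names, array shapes, and preprocessing are placeholders, not the dissertation's actual data pipeline.

    import numpy as np
    from scipy.spatial import procrustes
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Hypothetical tongue-marker traces: 50 time samples x 2 coordinates,
    # one trace per spoken word (placeholders for real motion-capture data).
    trace_word_a = rng.standard_normal((50, 2))
    trace_word_b = rng.standard_normal((50, 2))

    # Procrustes analysis removes translation, scale, and rotation, leaving
    # a disparity that can serve as a shape-based articulatory distance.
    _, _, disparity = procrustes(trace_word_a, trace_word_b)
    print(f"Procrustes disparity: {disparity:.3f}")

    # HAR needs per-class probabilities; any classifier that supplies them
    # can plug in. A probability-enabled SVM is one such choice.
    X = rng.standard_normal((40, 100))   # flattened motion features (hypothetical)
    y = rng.integers(0, 4, size=40)      # four hypothetical word labels
    classifier = SVC(probability=True).fit(X, y)
    print(classifier.predict_proba(X[:1]))  # class probabilities for one sample

In practice, the disparity would presumably be computed against stored templates for each candidate speech unit, and the probabilities accumulated over the continuous motion stream; both details are assumptions here rather than the dissertation's specification.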
Subject Area
Speech therapy|Computer science
Recommended Citation
Wang, Jun, "Silent speech recognition from articulatory motion" (2011). ETD collection for University of Nebraska-Lincoln. AAI3487114.
https://digitalcommons.unl.edu/dissertations/AAI3487114