Authors
Elizabeth Ellen Shriberg
Publication date
1994
Institution
University of California, Berkeley
Description
This thesis examines disfluencies (e.g., “um”, repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications.
The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics (“features”) in the data, including: 1) the speech domain; 2) the speaker; 3) the sentence in which a disfluency occurs; 4) word-related characteristics of the disfluency; and 5) simple acoustic characteristics of the disfluency. A methodology is developed for representing these features in a database format, and an algorithm is provided for automatic disfluency type classification based on this representation.
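As a rough illustration of what one row in such a feature database might look like, the sketch below stores the five feature groups listed above in a single record and tallies records by speech domain. It is not the thesis's actual schema or classification algorithm: the class name, field names, helper function, and sample values are all hypothetical and were invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DisfluencyRecord:
    """One annotated disfluency. All field names are hypothetical."""
    domain: str                          # 1) speech domain
    speaker_id: str                      # 2) speaker
    sentence_id: str                     # 3) sentence containing the disfluency
    words: Tuple[str, ...]               # 4) word-related characteristics
    duration_s: Optional[float] = None   # 5) a simple acoustic characteristic


def count_by_domain(records):
    """Count how many disfluency records fall in each speech domain."""
    return Counter(r.domain for r in records)


# Made-up sample rows, for illustration only (not data from the thesis).
records = [
    DisfluencyRecord("human-computer dialog", "spk01", "s001", ("um",), 0.31),
    DisfluencyRecord("human-human dialog", "spk02", "s002", ("the", "the")),
    DisfluencyRecord("human-human dialog", "spk03", "s003", ("uh",), 0.25),
]

domain_counts = count_by_domain(records)
```

Keeping each disfluency as a flat, immutable record makes it straightforward to filter or group by any one of the feature groups, which is the kind of query a data-driven analysis of this sort would need.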
Total citations
[Chart: citations per year, 1995 to 2024]