In this study, we examine factors that lead to the misperception of speech in conversational speech transcription, an area that has received little prior research. We investigate the role of surprisal, a measure of expectedness and language processing effort, in misperceptions. We generated a dataset of word errors from two sets of transcriptions, each of which underwent two iterations; the second iteration was hand-corrected to reduce the error rate. Errors fell into three basic categories: insertions (a word occurred in the first but not the second iteration), deletions (a word occurred in the second but not the first iteration), and substitutions (a mismatch between iterations). The dataset includes surprisal measures and syntactic information for the word errors and their respective baselines. Surprisal was calculated using a trigram model and an LSTM neural network model. Using linear mixed-effects models, we found that word error category and syntactic class have a complex relationship to surprisal. A qualitative analysis shows that first-iteration transcribers often produce errors that yield a more formal grammatical structure. Future work will consider the expectedness of both lexical and acoustic information and its relationship to misperceptions.
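The three error categories can be illustrated with a small word-level alignment between a first-iteration and a second-iteration transcript. The sketch below is only an illustration under stated assumptions, not the procedure used in the study: the function name `categorize_errors`, the use of Python's `difflib.SequenceMatcher` for alignment, the position-by-position pairing inside mismatched spans, and the example sentences are all hypothetical.

```python
from difflib import SequenceMatcher


def categorize_errors(first, second):
    """Align two token lists and label word-level differences.

    Labels follow the definitions in the abstract:
      insertion    - word in the first iteration but not the second
      deletion     - word in the second iteration but not the first
      substitution - mismatch between iterations
    Returns a list of (label, first_word, second_word) tuples.
    """
    errors = []
    # autojunk=False keeps frequent words from being ignored during alignment.
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a_span, b_span = first[i1:i2], second[j1:j2]
        if tag == "delete":
            # Only in the first iteration: the transcriber inserted it.
            errors += [("insertion", w, None) for w in a_span]
        elif tag == "insert":
            # Only in the corrected iteration: the transcriber missed it.
            errors += [("deletion", None, w) for w in b_span]
        elif tag == "replace":
            # Assumption: pair words by position, then treat any
            # leftover words as insertions or deletions.
            n = min(len(a_span), len(b_span))
            errors += [("substitution", x, y)
                       for x, y in zip(a_span[:n], b_span[:n])]
            errors += [("insertion", w, None) for w in a_span[n:]]
            errors += [("deletion", None, w) for w in b_span[n:]]
    return errors


# Hypothetical example: the first pass is the more "formal" version.
first_pass = "we are going to the store".split()
corrected = "we going to a store yeah".split()
for error in categorize_errors(first_pass, corrected):
    print(error)
```

On this toy pair the alignment reports "are" as an insertion, "the"/"a" as a substitution, and "yeah" as a deletion. Real transcripts would likely need time-aligned or speaker-aware matching, which this sketch does not attempt.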
We're Not Surprised You Didn't Notice That: Linguistic Surprisal and Misperception in Conversation
Room 409