Title: Tones, Tunes and the Dynamics of Intonational Categories
Speaker: Jennifer S Cole, Northwestern University, USA
In contemporary linguistic models of intonation, surface pitch contours derive from a sparse specification of tonal primitives, and a fundamental question for this theoretical approach concerns the nature and categorical status of the phonological elements that make up these representations. This remains an open question, even for an intensively studied language like English, due to several factors: analytic challenges in determining the appropriate segmentation and parameterization of the F0 signal, ubiquitous and extensive intra- and inter-speaker variability in the scaling of F0 movements, many-to-many mapping between proposed tonal features (e.g., pitch accents or edge tones) and pragmatic context, and challenges in F0 estimation due to voiceless phones or creaky voice. This talk presents highlights from a series of experiments on American English intonation that seek evidence for categorical variation among phrase-final (‘nuclear’) tunes and their component features (pitch accents, phrase accents, and boundary tones), with data elicited through an intonation imitation paradigm. Quantitative modeling of dynamic F0 trajectories from imitated productions reveals a much smaller number of robust, categorical distinctions in nuclear tunes than is predicted from the prevailing Autosegmental-Metrical (AM) model (Pierrehumbert 1980; see also Ladd 2008). These primary categories distinguish phrase-final high-rising F0 trajectories from other (low-rising, rising-falling, falling) trajectories that end in a lower F0. Additional distinctions emerge, for some speakers, in fine-grained, continuous patterns of F0 variation within each of these two primary categories. Data from perceptual discrimination of model tunes points to the same partitioning of F0 trajectories into primary and secondary tune classes.
While these findings have immediate implications for the AM model of American English, they also provide cues to an underlying dynamical system that governs the temporal and scaling patterns of F0 trajectories in all of our data. This talk presents a simple, unified dynamical systems model of the F0 trajectories of pitch accents as a first step in an ongoing project. The proposed dynamical system is based on the empirical F0 trajectories from our production experiments, and captures the small number of categorical distinctions and the finer-grained within-category variation observed in our data. The talk concludes with a discussion of dynamical systems as an analytic framework that introduces a new perspective on the question of categorical vs. gradient phonetic variation, suggesting a shift in focus for future empirical work on intonation in English and other languages, and possible extensions to F0 patterning in tone languages.
Jennifer Cole is a Professor in the Department of Linguistics. Her research investigates the sound patterns of human languages and how speech sounds are used to signal meaning about words, sentences and utterances in everyday communication. Her current work focuses on prosody—the intonation and rhythmic patterns of language—and its role in conveying information about linguistic structure, pragmatic meaning, speaker emotion, and the dynamics of social interaction. Dr. Cole has pioneered methods of prosodic annotation for large speech databases using crowd-sourcing. Her work combines experimental methods with large-scale observational analyses of natural interactions, in English, Hindi-Urdu, Spanish, and many other languages, using computational and statistical modeling with acoustic and behavioral data.
Title: Perceptual similarity and acoustic variability as filters on tonal variation and change
Speaker: James Kirby, Ludwig-Maximilians-Universität München, Germany
In order to reason about the nature and directionality of constraints on tonal variation and change, researchers have typically focused on the comparison of acoustic f0 trajectories in real (or apparent) time, as well as the analysis of patterns of contextual tonal variation. However, like segmental sound change, tone change must ultimately involve perceptual realignment as well as changes in production norms. If there exist regions of greater or lesser perceptual equivalence within the tonal perceptual space, we could then ask whether these correlate with regions of higher or lower acoustic variability in tone pitch trajectories. Establishing this relation between production and perception is crucial to formulating testable hypotheses about more and less likely patterns of tone change.
In this talk, we present our ongoing work probing the perceptual space of lexical tonal representations, focusing on case studies of Thai and Vietnamese. First, using data from a similarity judgment task, we show that the two languages partition the perceptual tonal contour space in different ways. In Thai, the primary dimension of perceptual similarity is rises vs. falls, while in Vietnamese, contours are clustered according to the register of the midpoint (high, mid, or low). Importantly, listeners of both languages seem to prioritize information in the second half of tone contours, consistent with previous work establishing the relatively greater perceptual salience of f0 offsets.
We then proceed to assess the variability of tonal production patterns on the basis of data drawn from spoken language corpora of the two languages. We find that the portions of the f0 trajectories with the highest stability across talkers do not correlate straightforwardly with the regions of maximal perceptual salience, as regions of lowest variability are almost invariably located from the onset to midpoints of tonal contours. We conclude by proposing a possible account of this production-perception mismatch and outline its implications for theories of tonal variation and change.
James Kirby is a linguist and speech scientist. His research focuses on tone and register, sound change, computational and statistical methods for speech processing, and language and music, with an areal focus on languages of East and Southeast Asia.
He received an MA from the University of California, San Diego in 2005 and completed his PhD at the University of Chicago in 2010 under the supervision of Alan Yu and John Goldsmith. He spent the next 10 years in the Department of Linguistics and English Language at the University of Edinburgh, where he remains an Honorary Fellow. In 2021, he moved to LMU Munich to take up the Bavarian AI Chair in Spoken Language Processing at the Institute of Phonetics and Speech Processing.
He has previously been the recipient of an AHRC Early Career fellowship (2015-2017) to study tonal text-setting, and together with Marc Brunelle, an AHRC Research Grant to investigate the evolution of register in Southeast Asia (2017-2021). He is currently the Principal Investigator of the ERC-funded EVOTONE project, studying the emergence and evolution of linguistic tone.
Title: Exploring the Relationship Between Gesture and Pitch ‘Prominence’ From the Perspective of African Tonal Languages
Speaker: Katie Franich, Harvard University, USA
The relationship between speech and co-speech gesture has been explored in a number of non-tonal languages, many of which have shown important relationships between intonational prominence and gesture. For example, gestures have been found to occur more frequently with higher and more dynamic pitch accent types in multiple languages (Loehr 2012, Im & Baumann 2020, Baills et al. 2023), and the timing of gesture peaks (or apexes) has been shown to track closely with pitch peaks (Jannedy & Mendoza-Denton 2005, Leonard & Cummins 2010, Esteve-Gibert & Prieto 2013, Pouw et al. 2019). This has led some researchers to go so far as to posit a biomechanical basis for the link between gesture and pitch (Pouw & Fuchs 2022). Little work has explored the relationship between gesture and pitch in lexical tonal languages, where a primary function of pitch is to distinguish between words, and where pitch-based intonational events (if present) tend to take the form of boundary tones or global changes in pitch height or range, rather than pitch accents (Chao 1933; Trần 1967; Luksaneeyanawin 1998; Brunelle et al. 2012; Downing & Rialland 2017; Xu 2017; DiCanio & Hatcher 2018). Here, I explore the gesture-tone/pitch relationship in three Niger-Congo languages, Medʉmba and Kejom/Babanki (both Grassfields Bantu) and Igbo, in order to examine whether gesture presence and timing are modulated by tone and pitch, in addition to other prosodic factors. While lexical tone in and of itself does not predict gesture presence, I show that the timing of co-speech gestures is nonetheless sensitive to pitch information, such that coupling relationships between tonal gestures and manual gestures appear likely. Data from a pointing task show that tone melody also appears to play a modest role in gesture alignment in Grassfields Bantu languages (but not in Igbo), suggesting language-specific links between tone, morphological structure, and prominence.
Kathryn Franich is an Assistant Professor in the Department of Linguistics at Harvard University, where she is also a member of the Mind, Brain, Behavior faculty initiative and a faculty affiliate of the Center for African Studies. She directs the Harvard PhonLab, where her team examines patterns of speech acoustics, articulation, and perception in order to understand cross-linguistic patterns of prosody and rhythm and how they link up with other aspects of behavior such as co-speech gesture, music, and dance. Much of her work draws on data from Niger-Congo languages. This includes Medʉmba, a Grassfields Bantu language spoken in Cameroon, on which she has been conducting fieldwork since 2010.
Title: When East speaks West: Assumptions about Prominence and Intonation in Singapore English
Speaker: TAN Ying Ying, Nanyang Technological University, Singapore
Many speakers of postcolonial Englishes exhibit anxieties about their own varieties of English, and are often concerned about the perceived “correctness” of their English language usage. Typically, speakers and language pedagogists make observations about how these Englishes exhibit different intonation and/or stress patterns as compared to British or American English. Much work has also been done to suggest that such “deviations” create problems for intelligibility. Yet intelligibility is a perceptual issue, and a large body of phonetic research shows that prosody perception is not a straightforward affair. Using Singapore English as a point of reference, this paper looks at data on the production and perception of prominence and intonation gathered from several studies, and highlights the assumptions and challenges of doing prosody research in sites where multilingualism is the norm. More critically, this paper suggests that prosodic differences in World Englishes need to be embraced without the preconceived, traditional notions of Western “nativeness”.
Ying-Ying Tan is an Associate Professor of Linguistics and Multilingual Studies at the Nanyang Technological University, Singapore. She is the first Singaporean to have received the prestigious Fung Global Fellowship from Princeton University. She is a sociophonetician who has published on accents, prosody, and intelligibility, focusing primarily on languages in Singapore.