Keynote Speakers

We are honored to welcome the following keynote speakers to present at the conference:

Speaker 1: Prof. Xipeng Qiu, Fudan University

Biography

Prof. Xipeng Qiu is a professor at the School of Computer Science, Fudan University. His research interests include natural language processing and deep learning. He has published more than 100 papers in top journals and conferences. He led the development of the large language model MOSS and of the open-source natural language processing toolkits FudanNLP and FastNLP, which have been widely adopted in both academia and industry.


Speaker 2: Prof. Eng Siong Chng, Nanyang Technological University (NTU), Singapore

Biography

Prof. Eng Siong Chng is currently an Associate Professor in the College of Computing and Data Science (CCDS) at Nanyang Technological University (NTU) in Singapore. Before joining NTU in 2003, he worked at Knowles Electronics (USA), Lernout and Hauspie (Belgium), the Institute of Infocomm Research (I2R) in Singapore, and RIKEN in Japan. He received his BEng (Hons) in 1991 and his PhD in 1996, both from the University of Edinburgh, U.K., specializing in digital signal processing. His areas of expertise include speech research, large language models, machine learning, and speech enhancement.

He is currently the Principal Investigator (PI) of the AI Singapore Speech Lab, a role he holds from 2023 to 2025. Over his career, he has secured research grants from institutions including Alibaba ANGEL Lab, NTU-Rolls Royce, MINDEF, MOE, and A*STAR. These grants, totaling over S$18 million, were awarded under the “Speech and Language Technology Program (SLTP)” in the School of Computer Science and Engineering (SCSE) at NTU. In recognition of his expertise, he was awarded the Tan Chin Tuan Fellowship in 2007 to conduct research in Fang Zheng’s lab at Tsinghua University. He also received a JSPS travel grant in 2008 to visit Furui’s lab at the Tokyo Institute of Technology.

He has supervised more than 19 PhD students and 13 Master’s students to graduation. His publication record includes 2 edited books and over 200 journal and conference papers. He has also served the academic community as publication chair for 5 international conferences: Human Agent Interaction 2016, INTERSPEECH 2014, APSIPA 2010, APSIPA 2011, and ISCSLP 2006. In addition, he has served on the organizing committees of ASRU 2019 (Singapore), ICAICTA 2024 (General Co-chair), and SLT 2024 (General Co-chair).

Title: Enabling LLM for ASR

Abstract:

Decoder-only LLMs such as ChatGPT were originally designed to accept only text input. Recent advances have extended them to other modalities, including audio, video, and images. This talk focuses on integrating the speech modality into LLMs. The research community has proposed a range of approaches to this task, including discrete speech representations, pairing pre-trained encoders with existing LLM decoders (e.g., Qwen), multitask learning, and multimodal pretraining. In the talk, I will review recent approaches to ASR using LLMs and introduce two works from NTU’s Speech Lab. (i) “Hyporadise” applies an LLM to the N-best hypotheses produced by a conventional ASR model in order to improve the top-1 transcription. It shows that LLMs not only outperform traditional language-model rescoring but can also recover correct words that appear in none of the N-best hypotheses, an ability we call Generative Error Correction (GER). (ii) The second work applies LLMs to noise-robust ASR by extending the Hyporadise approach with a noisy language embedding that captures the diversity of the N-best hypotheses under low-SNR conditions, and shows that fine-tuning with this embedding improves GER performance.