Keynote Speakers

We are honored to welcome the following keynote speakers to present at the conference:

Speaker 1: Prof. Eng Siong Chng, Nanyang Technological University (NTU), Singapore

Biography

Prof. Eng Siong Chng is currently an Associate Professor in the College of Computing and Data Science (CCDS) at Nanyang Technological University (NTU) in Singapore. Prior to joining NTU in 2003, he worked at Knowles Electronics (USA), Lernout and Hauspie (Belgium), the Institute for Infocomm Research (I2R) in Singapore, and RIKEN in Japan. He received his BEng (Hons) in 1991 and his PhD in 1996, both from the University of Edinburgh, U.K., specializing in digital signal processing. His areas of expertise include speech research, large language models, machine learning, and speech enhancement.

He is the Principal Investigator (PI) of the AI-Singapore Speech Lab (2023 to 2025). Throughout his career, he has secured research grants totaling over S$18 million from institutions including Alibaba ANGEL Lab, NTU-Rolls Royce, MINDEF, MOE, and A*STAR, awarded under the Speech and Language Technology Program (SLTP) in the School of Computer Science and Engineering (SCSE) at NTU. In recognition of his expertise, he was awarded the Tan Chin Tuan Fellowship in 2007 to conduct research in Prof. Fang Zheng's lab at Tsinghua University, and he received a JSPS travel grant award in 2008 to visit Prof. Furui's lab at the Tokyo Institute of Technology.

He has supervised more than 19 PhD students and 13 Master's students to graduation. His publication record includes 2 edited books and over 200 journal and conference papers. He has also contributed to the academic community as publication chair for five international conferences: Human Agent Interaction 2016, INTERSPEECH 2014, APSIPA 2010, APSIPA 2011, and ISCSLP 2006. Furthermore, he has served on the organizing committees of ASRU 2019 (Singapore), ICAICTA 2024 (General Co-chair), and SLT 2024 (General Co-chair).

Title: Enabling LLM for ASR

Abstract:

Decoder-only LLMs such as ChatGPT were originally developed to accept only text input; recent advances have enabled them to handle other modalities, such as audio, video, and images. This talk focuses on integrating the speech modality into LLMs. The research community has proposed various innovative approaches to this task, including discrete speech representations, integrating pre-trained speech encoders with existing LLM decoders (e.g., Qwen), multitask learning, and multimodal pretraining. In the talk, I will review recent approaches to the ASR task using LLMs and introduce two works from NTU's Speech Lab. The first, "Hyporadise," applies an LLM to the N-best hypotheses generated by a traditional ASR model to improve the top-1 transcription result. It demonstrates that LLMs not only exceed the performance of traditional language-model rescoring but can also recover and generate correct words that appear in none of the N-best hypotheses, an ability we call Generative Error Correction (GER). The second work extends the Hyporadise approach to noise-robust ASR by incorporating noisy language embeddings that capture the diversity of N-best hypotheses under low-SNR conditions, showing improved GER performance with fine-tuning.
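As a rough illustration of the GER idea described above, the Python sketch below prompts an instruction-tuned LLM with an ASR N-best list and asks it to produce a corrected transcription. This is a minimal sketch only: the Hugging Face pipeline, the model name, and the prompt template are illustrative assumptions, not the Hyporadise authors' actual implementation.

# Minimal sketch: LLM-based generative error correction (GER) over an
# ASR N-best list. The model name and prompt template are assumptions
# for illustration, not the Hyporadise paper's exact setup.
from transformers import pipeline

# Any small instruction-tuned causal LM can stand in for the demo.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def generative_error_correction(nbest: list[str]) -> str:
    """Ask the LLM to infer the true transcription from N-best hypotheses."""
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "Below are N-best hypotheses from a speech recognizer for one "
        "utterance. Output the most likely true transcription only.\n"
        f"{hyps}\nTranscription:"
    )
    out = generator(prompt, max_new_tokens=64, return_full_text=False)
    return out[0]["generated_text"].strip()

# Toy 3-best list: no single hypothesis is fully correct.
print(generative_error_correction([
    "the whether is nice today",
    "the weather is nice to day",
    "the whether is nice to day",
]))

Because the LLM generates the transcription rather than merely re-ranking the candidates, it can output words that appear in none of the hypotheses, which is the key difference from traditional language-model rescoring.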


Speaker 2: Prof. Xipeng Qiu, Fudan University

Biography

Xipeng Qiu is a professor in the School of Computer Science at Fudan University, where he also received his B.S. and Ph.D. degrees. His main research areas are natural language processing (NLP) and large language models (LLMs). He has published more than 100 papers in top journals and conferences (ACL, EMNLP, IJCAI, AAAI, etc.), with more than 20,000 citations. His textbook "Neural Networks and Deep Learning", also available on GitHub, has been adopted by hundreds of universities and organizations. He received the Outstanding Paper Award at ACL 2017 and the Best Paper Award at CCL 2019 and CCL 2023. He is also the project leader of the MOSS LLM.

Title: From Large Language Model to World Model

Abstract:

Large language models (LLMs) still fall short on complex, multimodal, and long-term-memory tasks. To address these limitations, an LLM can interact with a real environment and learn continuously, so that it becomes a world model, overcoming some of the current shortcomings of LLMs on tasks that require an understanding of the physical and social world. However, compared with LLMs, the technical route to world models is not yet clear, and the future development path remains controversial. This talk discusses how to improve the capabilities of LLMs from the perspectives of multilingual and multimodal expansion, embodied learning, and related directions, so as to move toward a world model. The talk will also report some of the latest research progress on the MOSS2 LLM.