Special Sessions

The Organizing Committee of SLT2021 is proud to announce the following Special Sessions.

Integration of Speech Separation, Recognition, and Diarization towards Real Conversation Processing

Organizers: Zhuo Chen (Microsoft), Paola Garcia (Johns Hopkins University), and Shinji Watanabe (Johns Hopkins University)

Owing to the success of deep learning, the focus of speech processing has shifted rapidly from simple automatic speech recognition to the transcription of real-world conversations, such as meetings and dinner parties. Real-world conversation transcription systems require the proper integration of speech and audio processing tasks, including speech separation, speech enhancement, speaker diarization, and speech recognition, to handle long unsegmented recordings with multiple overlapping speakers under far-field acoustic conditions. This integrated task is one of the most challenging problems facing the speech community. This special session provides an opportunity for world-leading researchers in related areas, from academia and industry, to come together, present their latest work, and discuss the progress and future of real-world conversation transcription systems. The session primarily focuses on the integration of speech separation, speech enhancement, speaker diarization, and/or speech recognition components, as well as on fundamental technologies for the individual components of real-world conversation transcription.

Anti-spoofing in Speaker Recognition

Organizers: Kong Aik Lee (A*STAR, Singapore) and Man Wai Mak (The Hong Kong Polytechnic University)

With advances in deep learning and the availability of big data, speaker recognition, an important spoken language technology, has reached a milestone in accuracy, bringing it one step closer to real-world applications. Beyond accuracy, however, spoofing remains a major challenge that hinders the use of speaker recognition in high-security scenarios. Voice spoofing techniques have also improved greatly in recent years, including adversarial samples and voice cloning for targeted attacks on a specific person, and voice morphing and speaker anonymization for non-targeted attacks that hide a speaker's original identity. In response to these trends, this special session focuses on anti-spoofing for speaker recognition, with the aim of developing more secure biometric technology.

The topics of interest include but are not limited to the following:

Spoofing detection for both logical and physical access
Adversarial attacks on speaker recognition and their detection
DeepFake attacks on speaker recognition and their detection
Speaker anonymization and de-identification
Feature and representation learning for spoofing detection
Joint optimization of speaker recognition and spoofing countermeasures

Papers for approved Special Sessions must be submitted and will be reviewed according to the same schedule and procedure as regular papers. A sufficient number of papers must be accepted into each Special Session for the session to go ahead.