
Ass. Lect. Wael Ali Basiony Sultan :: Publications:

Title:
An Efficient Speaker Diarization Pipeline for Conversational Speech
Authors: Wael Ali Sultan, Mourad Samir Semary, Sherif Mahdy Abdou
Year: 2024
Keywords: speaker diarization, speaker separation, voice activity detection, optimization, pipeline
Journal: Benha Journal of Applied Sciences
Volume: 9
Issue: 5
Pages: 141-146
Publisher: Benha University
Local/International: Local
Paper Link:
Full paper: Wael Ali Basiony Sultan_BJAS_Volume 9_Issue 5_Pages 141-146.pdf
Supplementary materials: Not Available
Abstract:

Accurate and efficient diarization of conversational speech remains a challenging task in audio signal processing, particularly in environments with significant speaker overlap and diverse acoustic conditions. This paper introduces a speaker diarization pipeline that improves both accuracy and efficiency when processing conversational speech. The pipeline comprises several components: Voice Activity Detection (VAD), Speaker Overlap Detection (SOD), a speaker separation model, robust speaker embedding, clustering, and post-processing. First, VAD discriminates between speech and non-speech segments, reducing processing overhead. Next, the SOD component identifies segments that contain overlapping speakers, and a speaker separation model splits the overlapping speech into distinct streams. A key enhancement in the pipeline is the integration of robust speaker embedding and clustering techniques, which capture speaker-specific characteristics to improve the grouping of speech segments. Finally, the post-processing stage refines these segments to ensure temporal consistency and improve overall diarization accuracy. We evaluated the pipeline on multiple benchmark datasets and observed reductions of up to 10% in Diarization Error Rate (DER) compared to existing methods.
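
For readers unfamiliar with the metric quoted in the abstract, DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the authors' scoring tool. It assumes one label per fixed-length frame, at most one speaker per frame (so it ignores the overlap the paper targets), no forgiveness collar, and a brute-force speaker mapping that is only practical for a handful of speakers. The function name `frame_der` and the toy labels are hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER sketch: (missed + false alarm + confusion) / reference speech.

    Each sequence holds one label per frame; None marks non-speech.
    Assumption: single speaker per frame, no collar (illustrative only).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both sides claim speech; confusion is counted here.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_ids = sorted({r for r, _ in both} | {r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})

    # Hypothesis labels are arbitrary, so search for the one-to-one mapping
    # onto reference speakers that maximizes agreement. Extra hypothesis
    # speakers are mapped to None and therefore never match.
    targets = ref_ids + [None] * max(0, len(hyp_ids) - len(ref_ids))
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / ref_speech


# Toy example: 8 frames, 6 of them reference speech.
ref = ["A", "A", "A", "B", "B", None, None, "B"]
hyp = ["x", "x", "y", "y", "y", "y", None, None]
print(frame_der(ref, hyp))  # 1 missed + 1 false alarm + 1 confused over 6 frames
```

In the toy example, one frame is missed, one is a false alarm, and one is attributed to the wrong speaker under the best mapping, so three error frames are divided by six reference speech frames. Standard scoring tools work on time intervals rather than frames and handle overlapping speakers, which this sketch does not.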
