Weighted finite-state transducers (WFSTs) have revolutionized
automatic speech recognition (ASR) by enabling significantly faster decoding
than traditional systems that build the search space
progressively. However, applying WFSTs to morphology-rich languages
such as Arabic presents challenges due to the large vocabulary, resulting in
extensive networks that exceed the memory available on standard CPU machines. This
study introduces various strategies to reduce the size of large-vocabulary
Arabic WFSTs with minimal impact on accuracy. We employed a star
architecture for the network topology, which effectively reduced the network
size and improved the decoding speed. Additionally, a two-pass decoding
approach was adopted: the first pass used a smaller network with a
short-history language model, and the second pass rescored the resulting lattice
with a longer-history language model. We explored several tuning parameters
to find the optimal balance between network size and accuracy. Our results
show that by using an optimized search graph built with a 2-gram language
model instead of a 3-gram model, we achieve a 45% reduction in the graph’s
memory footprint with a negligible accuracy loss of less than 0.2% in multi-reference word error rate (MR-WER).
On the MGB3 benchmark, our method achieved 40x real-time Arabic ASR with an
accuracy of 83.67%, compared to the 85.82% accuracy of state-of-the-art
systems, which reach only 8x real-time performance on standard CPUs.
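
The two-pass scheme above can be summarized, in notation we introduce purely for
illustration (acoustic score $S_{\text{ac}}$, first-pass and rescoring language
models $P_{\text{short}}$ and $P_{\text{long}}$, language-model scale $\lambda$),
as the standard lattice-rescoring substitution applied to each path $W$ in the
first-pass lattice $\mathcal{L}$; this is a generic sketch rather than the exact
scoring used in this work:

% Generic lattice-rescoring substitution (notation ours, for illustration only):
% S_ac            : acoustic score of path W from the first pass
% P_short, P_long : short- and longer-history language models
% lambda          : language-model scale
\begin{align*}
  S_{\text{1st}}(W) &= S_{\text{ac}}(W) + \lambda \log P_{\text{short}}(W), \\
  S_{\text{2nd}}(W) &= S_{\text{1st}}(W) - \lambda \log P_{\text{short}}(W)
                       + \lambda \log P_{\text{long}}(W), \\
  \hat{W} &= \operatorname*{arg\,max}_{W \in \mathcal{L}} \; S_{\text{2nd}}(W).
\end{align*}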