Assist. Ahmed Megahed :: Publications:

Title:
BFCAI at CoLI-Tunglish@ FIRE 2023: Machine Learning Based Model for Word-level Language Identification in Code-mixed Tulu Texts
Authors: Ahmed M. Fetouh; Hamada Nayel
Year: 2023
Keywords: Natural Language Processing; Code-mixed Text; Language Identification; Machine Learning; TF-IDF; SVM
Journal: FIRE 2023 – Working Notes (CoLI-Tunglish Shared Task)
Volume: Not Available
Issue: Not Available
Pages: 205–212
Publisher: CEUR Workshop Proceedings
Local/International: International
Paper Link: Not Available
Full paper: Ahmed Megahed_T4-4.pdf
Supplementary materials: Not Available
Abstract:

This paper describes the model submitted by the BFCAI team for the CoLI-Tunglish shared task held at FIRE 2023. The proposed approach employs character n-gram TF-IDF vectorization enhanced with word-length features. Several machine learning classifiers were evaluated, including Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), and Multi-layer Perceptron (MLP). Experimental results demonstrate that SVM outperformed all other models. System submissions were evaluated using the macro-averaged F1 score; the proposed SVM-based model achieved an F1 score of 81.2% on the test set, ranking second among all participating teams.
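
As an illustration of the feature representation and evaluation metric named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The n-gram range (1 to 3), the smoothed IDF formula, the L2 normalisation, the reserved feature key, and all function names are assumptions made for this example; the abstract does not specify them. The words and labels in the usage note are toy placeholders, not shared-task data.

```python
import math
from collections import Counter

# Reserved key for the word-length feature (hypothetical name for this sketch).
LENGTH_KEY = "<word_length>"


def char_ngrams(word, n_min=1, n_max=3):
    """Return every character n-gram of `word` for n in [n_min, n_max]."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]


def fit_idf(words, n_min=1, n_max=3):
    """Compute a smoothed IDF weight for each n-gram, treating each word as a document."""
    doc_freq = Counter()
    for w in words:
        doc_freq.update(set(char_ngrams(w, n_min, n_max)))
    total = len(words)
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1.
    return {g: math.log((1 + total) / (1 + df)) + 1 for g, df in doc_freq.items()}


def featurize(word, idf, n_min=1, n_max=3):
    """Map a word to a sparse dict of L2-normalised TF-IDF weights plus its length."""
    counts = Counter(g for g in char_ngrams(word, n_min, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    vec = {g: v / norm for g, v in vec.items()}
    vec[LENGTH_KEY] = float(len(word))  # word-length feature appended to the vector
    return vec


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

A call such as `idf = fit_idf(["hello", "enna", "school"])` followed by `featurize("hello", idf)` yields a sparse feature dictionary that could be passed to any of the classifiers listed in the abstract. Because macro-averaging weights every class equally, a rare language tag influences `macro_f1` as much as a frequent one.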