Assist. Ahmed Megahed :: Publications: |
| Title: | BFCAI at CoLI-Tunglish@ FIRE 2023: Machine Learning Based Model for Word-level Language Identification in Code-mixed Tulu Texts |
| Authors: | Ahmed M. Fetouh; Hamada Nayel |
| Year: | 2023 |
| Keywords: | Natural Language Processing; Code-mixed Text; Language Identification; Machine Learning; TF-IDF; SVM |
| Journal: | FIRE 2023 – Working Notes (CoLI-Tunglish Shared Task) |
| Volume: | Not Available |
| Issue: | Not Available |
| Pages: | 205–212 |
| Publisher: | CEUR Workshop Proceedings |
| Local/International: | International |
| Paper Link: | |
| Full paper: | Ahmed Megahed_T4-4.pdf |
| Supplementary materials: | Not Available |
| Abstract: |
This paper describes the model submitted by the BFCAI team to the CoLI-Tunglish shared task at FIRE 2023. The proposed approach uses character n-gram TF-IDF vectorization combined with word-length features. Several machine learning classifiers were evaluated: Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), and a Multi-layer Perceptron (MLP). In the experiments, the SVM outperformed the other classifiers. Submissions were evaluated using the macro-averaged F1 score, and the proposed SVM-based model achieved a macro F1 score of 81.2% on the test set, ranking second among all participating teams. |
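The abstract names the feature pipeline and the evaluation metric but not their parameters. The following is a minimal pure-Python sketch for illustration only, not the authors' code. It assumes n-gram lengths of 1 to 3, `<`/`>` word-boundary padding, scikit-learn-style smoothed IDF, L2 normalisation, and a raw (unscaled) word-length feature stored under a hypothetical key `__word_length__`. The helper names (`char_ngrams`, `fit_idf`, `featurize`, `macro_f1`) and the example tokens and labels are all hypothetical. The SVM classifier itself is not reimplemented here.

```python
import math
from collections import Counter

LENGTH_KEY = "__word_length__"  # hypothetical key for the appended word-length feature


def char_ngrams(word, n_min=1, n_max=3):
    """Character n-grams of a lowercased word padded with assumed boundary markers."""
    padded = f"<{word.lower()}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def fit_idf(words, n_min=1, n_max=3):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, treating each word as a document."""
    df = Counter()
    for w in words:
        df.update(set(char_ngrams(w, n_min, n_max)))
    n_docs = len(words)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def featurize(word, idf, n_min=1, n_max=3):
    """L2-normalised TF-IDF over known n-grams, plus a raw word-length feature."""
    tf = Counter(g for g in char_ngrams(word, n_min, n_max) if g in idf)
    vec = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm > 0:
        vec = {g: v / norm for g, v in vec.items()}
    vec[LENGTH_KEY] = float(len(word))
    return vec


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    idf = fit_idf(["bus", "busu", "school"])  # hypothetical training tokens
    print(featurize("bus", idf)[LENGTH_KEY])  # 3.0
    # Hypothetical gold and predicted labels for four tokens.
    print(round(macro_f1(["en", "tu", "tu", "kn"], ["en", "tu", "kn", "kn"]), 3))  # 0.778
```

Macro averaging gives every label equal weight regardless of its frequency, so rare token classes count as much as frequent ones. In practice, scikit-learn's `TfidfVectorizer(analyzer="char_wb")` and `LinearSVC` are a common way to build this kind of pipeline, though the exact settings used in the paper are not stated here.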