Hamada Ali Mohamed Ali Nayel|Publications:Character N-gram model for toxicity prediction

Dr. Hamada Ali Mohamed Ali Nayel :: Publications:

Title:	Character N-gram model for toxicity prediction
Authors:	Eman Shehab; Hamada Nayel; Mohamed Taha
Year:	2024
Keywords:	Feature extraction; Machine learning; Molecular toxicity prediction; N-gram; Simplified molecular-input line-entry system
Journal:	IAES International Journal of Artificial Intelligence (IJ-AI)
Volume:	13
Issue:	4
Pages:	4380-4387
Publisher:	IAES
Local/International:	International
Paper Link:	http://doi.org/10.11591/ijai.v13.i4.pp4380-4387

Abstract:

Molecular toxicity prediction is a crucial step in the drug discovery process because it bears directly on human health. Accurately assessing a molecule’s toxicity helps weed out low-quality compounds early in the drug discovery phase, avoiding wasted resources later in the drug development process. Computational models have been used to automate molecular toxicity prediction. This paper proposes a machine learning-based model in which character N-grams extracted from simplified molecular-input line-entry system (SMILES) strings are represented using a TF-IDF weighting scheme. Multiple machine learning classifiers were implemented: logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), k-nearest neighbors (KNN), AdaBoost, multi-layer perceptron (MLP), and stochastic gradient descent (SGD). A wide range of N-gram models was evaluated, and the trigram model gave the best results. RF and SVM achieved 85% and 84% accuracy, respectively. These results are comparable to those of state-of-the-art models and are acceptable given that the model uses minimal resources.
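As a rough illustration of the feature-extraction idea in the abstract, the sketch below splits SMILES strings into overlapping character trigrams and weights them with TF-IDF. It is not the authors' implementation: the function names `char_ngrams` and `tfidf_vectors`, the example molecules, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def char_ngrams(smiles, n=3):
    """Return all overlapping character n-grams of a SMILES string."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def tfidf_vectors(corpus, n=3):
    """Return one sparse TF-IDF vector (a dict) per SMILES string.

    Assumed weighting (not taken from the paper):
      tf  = n-gram count / total n-grams in the string
      idf = ln((1 + N) / (1 + df)) + 1
    """
    docs = [Counter(char_ngrams(s, n)) for s in corpus]
    n_docs = len(docs)

    # Document frequency: number of strings that contain each n-gram.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}

    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append({g: (c / total) * idf[g] for g, c in counts.items()})
    return vectors


# Illustrative molecules only: ethanol, acetic acid, benzene.
smiles = ["CCO", "CC(=O)O", "c1ccccc1"]
vectors = tfidf_vectors(smiles, n=3)

for s, vec in zip(smiles, vectors):
    print(s, {g: round(w, 3) for g, w in vec.items()})
```

The resulting sparse vectors could then be passed to any of the classifiers listed in the abstract. In practice a library vectorizer with a character-level analyzer would likely replace this hand-written version, and its exact weighting may differ from the formula assumed here.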
