
Dr. Hamada Ali Mohamed Ali Nayel :: Theses :

Title Biomedical Named Entity Recognition
Type PhD
Supervisors H. L. Shashirekha
Year 2018
Abstract Named Entity Recognition (NER) is a crucial Natural Language Processing (NLP) task that extracts Named Entities (NEs) from text. Names of persons, places, dates and times are examples of NEs in general-domain texts, while names of genes, proteins and diseases are examples of NEs in the biomedical domain, termed BioNEs. NER in the biomedical domain (BioNER) is an important preprocessing step for many downstream tasks such as relation extraction between entities, knowledge discovery and hypothesis generation. The tremendous growth of publications in biomedical research makes BioNER vital, as it is hard to extract NEs manually. Furthermore, BioNEs pose several challenges related to ambiguous names, synonyms, variations, multi-word NEs and nested NEs. Different approaches have been used for BioNER, such as dictionary approaches, rule-based approaches, Machine Learning (ML) approaches and hybrid approaches. Of late, ML approaches, especially Artificial Neural Network based models, have become popular for BioNER. Annotating the dataset used to train models to recognize and classify NEs is a crucial task in BioNER. Many methods are used for annotating datasets, such as XML format, BioNE offsets and Segment Representation (SR). SR is an efficient way of annotating BioNEs within a sentence in order to differentiate them from non-BioNEs. Different SR schemes such as IO, IOE2, IOB2, IOBE and IOBES are used to annotate the dataset to develop efficient BioNER systems. Support Vector Machines (SVM) and Conditional Random Fields (CRF) have been used to train different BioNER models with different SR schemes.
The Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA 2004) shared task dataset, the National Center for Biotechnology Information (NCBI) dataset, the BioCreative II Gene Mention (BC2GM) recognition shared task dataset, the BioCreative V Chemical-Disease Relation (BC5CDR) task dataset and the i2b2/VA 2010 shared task dataset are used to assess the performance of BioNER systems. The ensemble approach, which combines the outputs of a pool of base classifiers to obtain better performance than an individual classifier, has achieved meaningful and encouraging results. The strength of an ensemble classifier is its generalization capability, which is usually much better than that of its base classifiers. It has been applied to different NLP tasks such as BioNER, word segmentation, word sense disambiguation and Part-of-Speech (PoS) tagging. An ensemble-based BioNER system using two different classification algorithms, CRF and SVM, with the IO, IOB2, IOE2, IOBE and IOBES SR schemes is proposed. To study the impact of the ensemble approach on another NLP task, an ensemble-based system is developed for Native Language Identification (NLI), an important NLP task that has recently attracted the attention of the research community. In recent years, Deep Learning (DL) models have become important due to their demonstrated success at overcoming complex learning problems. DL models have been applied effectively to different NLP tasks such as PoS tagging and Machine Translation. A DL model for Disease Named Entity Recognition (Disease-NER) using dictionary information is proposed and evaluated on the NCBI disease corpus and the BC5CDR dataset. Pre-trained word embeddings trained over general-domain texts as well as biomedical texts have been used to represent the input to the proposed model. This study also compares two different SR schemes, namely IOB2 and IOBES, for Disease-NER.
The results illustrate that using dictionary information, pre-trained word embeddings, character embeddings and CRF with global score improves the performance of the BioNER system. An extension of the IOBES SR scheme is proposed to improve the representation of multi-word entities and hence the performance of BioNER. A Bidirectional Long Short-Term Memory (BiLSTM) network is used to design a baseline system for BioNER, and the new SR model is evaluated on the i2b2/VA 2010 challenge dataset and the JNLPBA 2004 shared task dataset. The results obtained illustrate that the proposed SR model outperforms the IOB2 and IOBES schemes for multi-word entities with length greater than two. Further, combining the outputs of different SR models using a majority voting ensemble method outperforms the baseline models.
Keywords NLP; Named Entity Recognition; Biomedical NLP
University Mangalore University
Country India
Full Paper download paper
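
The segment representation schemes named in the abstract above can be illustrated with a small sketch. It is not taken from the thesis: the function name `tag_sentence`, the span format (start index, exclusive end index, label) and the toy sentence are all assumptions made for illustration. It covers IO, IOB2, IOE2 and IOBES as they are commonly defined; IOBE and the extended scheme proposed in the thesis are left out because the abstract does not specify them.

```python
def tag_sentence(tokens, entities, scheme="IOB2"):
    """Tag tokens under a segment representation scheme.

    entities: non-overlapping (start, end, label) spans, end exclusive.
    This span format is an assumption made for this sketch.
    """
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in entities:
        for i in range(start, end):
            first = i == start
            last = i == end - 1
            if scheme == "IO":
                prefix = "I"
            elif scheme == "IOB2":
                # B marks the first token of every entity
                prefix = "B" if first else "I"
            elif scheme == "IOE2":
                # E marks the last token of every entity
                prefix = "E" if last else "I"
            elif scheme == "IOBES":
                if first and last:
                    prefix = "S"  # single-token entity
                elif first:
                    prefix = "B"
                elif last:
                    prefix = "E"
                else:
                    prefix = "I"
            else:
                raise ValueError(f"unsupported scheme: {scheme}")
            tags[i] = f"{prefix}-{label}"
    return tags


# Hypothetical example sentence with one three-token and one single-token entity.
tokens = ["Mutations", "of", "tumor", "necrosis", "factor", "and", "p53"]
entities = [(2, 5, "protein"), (6, 7, "protein")]

for scheme in ("IO", "IOB2", "IOE2", "IOBES"):
    print(scheme, tag_sentence(tokens, entities, scheme))
```

Under IOBES the three-token entity receives distinct begin, inside and end tags, and the single-token entity receives its own S tag, so boundaries are marked more explicitly than under IO or IOB2.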

Title On Semantics of Programming Languages: MSc. Report
Type PhD
Supervisors Maher Zayed; Mohamed A. El-Zawawy
Year 2012
Abstract Multi-threaded programs have many widely used applications, such as operating systems. Analyzing multi-threaded programs differs from analyzing sequential ones; the main difference is that many threads execute at the same time. The effect of all other running threads must be taken into account. This thesis focuses on the analysis of multi-threaded programs. The first aim of our work is to implement partial redundancy elimination for multi-threaded programs via type systems. Partial redundancy elimination is among the most powerful compiler optimizations: it performs loop-invariant code motion and common subexpression elimination. In chapter 3, we present a type system with an optimization component which performs partial redundancy elimination for multi-threaded programs. In chapter 4, we design a type-system-based data race detector. A data race occurs when two threads try to access a shared variable at the same time without proper synchronization. A detector is software that determines whether or not a program contains a data race. In this thesis we develop a detector that has the form of a type system. We present a type system which discovers data-race problems. We also prove the soundness of our type system.
Keywords
University Benha University
Country Egypt
Full Paper download paper
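
The abstract above defines a data race as two threads accessing a shared variable at the same time without proper synchronization. The following sketch only illustrates that definition at run time; it is not the type-system detector developed in the thesis, and the names `SharedCounter` and `run_workers` are hypothetical.

```python
import threading


class SharedCounter:
    """A counter shared by several threads (hypothetical illustration)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write on shared state with no synchronization:
        # two threads may interleave here, which is a data race.
        self.value += 1

    def safe_increment(self):
        # The lock serializes the read-modify-write, removing the race.
        with self._lock:
            self.value += 1


def run_workers(increment, n_threads=4, n_iters=10_000):
    """Call `increment` n_iters times from each of n_threads threads."""

    def worker():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


racy, guarded = SharedCounter(), SharedCounter()
run_workers(racy.unsafe_increment)
run_workers(guarded.safe_increment)
print("unsynchronized:", racy.value)   # may be lower than 40000 if updates are lost
print("lock-protected:", guarded.value)
```

A static detector such as the one described in the abstract aims to flag the unsynchronized access without running the program, whereas this sketch can only show the symptom on a particular run.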

Title On Semantics of Programming Languages: MSc. Report
Type MSc
Supervisors Maher Zayed; Mohamed Elzawawy
Year 2012
Abstract Multi-threaded programs have many widely used applications, such as operating systems. Analyzing multi-threaded programs differs from analyzing sequential ones; the main difference is that many threads execute at the same time. The effect of all other running threads must be taken into account. This thesis focuses on the analysis of multi-threaded programs. The first aim of our work is to implement Partial Redundancy Elimination (PRE) for multi-threaded programs via type systems. PRE is among the most powerful compiler optimizations: it performs loop-invariant code motion and common subexpression elimination. We present a type system with an optimization component which performs PRE for multi-threaded programs. In addition, we design a type-system-based data race detector. A data race occurs when two threads try to access a shared variable at the same time without proper synchronization. A detector is software that determines whether or not a program contains a data race. In this work, we develop a detector that has the form of a type system. We present a type system which discovers data-race problems. The soundness of our type system has been proved.
Keywords
University Benha University
Country Egypt
Full Paper download paper
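
To make the idea of partial redundancy concrete, the sketch below applies the transformation by hand to a small sequential function. It is an assumption-laden illustration, not the type-system-based optimizer from the thesis: the function names are hypothetical, the code is single-threaded, and the rewrite is done manually rather than by a compiler.

```python
def before_pre(a, b, flag):
    """Original code: `a + b` is partially redundant at `y`."""
    if flag:
        x = a + b      # a + b is computed on this path ...
    else:
        x = 0          # ... but not on this one
    y = a + b          # redundant only when flag is true
    return x + y


def after_pre(a, b, flag):
    """Same function after a hand-applied PRE step."""
    if flag:
        t = a + b      # keep the existing computation in a temporary
        x = t
    else:
        x = 0
        t = a + b      # inserted so that a + b is available on every path
    y = t              # the recomputation is replaced by a reuse
    return x + y


# Both versions return the same result; the second evaluates a + b
# exactly once on every path instead of twice when flag is true.
for flag in (True, False):
    print(flag, before_pre(2, 3, flag), after_pre(2, 3, flag))
```

Inserting the computation on the path that lacked it turns the partial redundancy into a full one, which can then be removed; no path ends up evaluating `a + b` more often than before.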

Title Rumors Detection on Social Platforms Using NLP Methods
Type PhD
Supervisors Mohamed Taha; Hamada Nayel
Year 2023
Abstract Social media platforms have grown rapidly in recent years, with billions of people worldwide using them for communication, entertainment, and information. The development of social media has dramatically impacted society, affecting how people interact, communicate, and consume information. While social media has numerous advantages, it has also prompted worries about privacy, misinformation, and the influence on mental health, especially among young people. Social media platforms have significantly affected the dissemination of rumors. Twitter has been the major platform for spreading news regarding the Covid-19 pandemic, and a considerable amount of false material about the pandemic has spread on social media. Several artificial intelligence methods have been proposed to curb the spread of fake news. In this study, we propose a model that can discriminate between “fake” and “true” news tweets and is capable of working with any up-to-date problem. To address this issue, this research explores various learning approaches to detect fake news. We compare different deep learning and machine learning methods for fake news detection, such as CNN, LSTM, Naive Bayes, and Support Vector Machine. The efficiency of these models was evaluated on benchmark datasets and a self-collected dataset. This research aims to improve the model used in classifying rumors by utilizing various techniques for text representation such as word embeddings and TF-IDF. It involves extracting the underlying meanings in texts by searching for semantic relationships between words, phrases, and texts. These processes help in analyzing and understanding texts. The efficiency of these models was tested by training them on a set of tweets. New tweets were collected using Snscrape to track different writing styles and to build a model capable of detecting errors across all the changes that occur in a word and of returning the word to its original form.
The results of the first model, which combined TF-IDF with machine learning algorithms, showed the superiority of the Multi-Layer Perceptron, which achieved an accuracy of 93.8% and an F-score of 93.6% on the English language. For the Arabic language, the Support Vector Machine achieved the best accuracy, 82.90%, while K-Nearest Neighbor achieved a better F-score of 57.5%. The results showed the superiority of unigram text vectorization over bigram. GloVe word embeddings were used with deep learning algorithms to improve text understanding and discover relationships between words. Recurrent neural networks achieved the best accuracy for the English language, 99%, but the ensemble learning model achieved a better F-score of 97%. The Convolutional Neural Network achieved the best accuracy for the Arabic language, 83%, while the ensemble learning model achieved a better F-score of 81.7%. The second step was to test the model on a new test set that had not been used before. A significant decline of about 25% was found in the English language model, which achieved an accuracy of 74%. The experiments showed that modifying the evidence processing stage, so that the model could deal with all the changes that occur in a word, yielded an improvement of about 8%, reaching an accuracy of 83%. As for the proposed model for the Arabic language, there was a decline of about 5%, with an accuracy of 70%. The results vary between deep learning models, but the BI-LSTM showed the difference between the differences in the data. With some modifications to the word processing stage, so that the model could deal with all the changes that occur in a word, there was an improvement of about 8%, reaching an accuracy of 78%.
Keywords
University Benha University
Country Egypt
Full Paper download paper
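
The abstract above compares unigram and bigram TF-IDF vectorization. The sketch below shows one way such weights can be computed; it is an illustration under stated assumptions, not the pipeline used in the thesis. It uses the textbook definition (term frequency times log of inverse document frequency, with no smoothing), whitespace tokenization, and three made-up example texts; the names `ngrams` and `tfidf` are hypothetical, and library implementations typically apply different smoothing and normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute TF-IDF weights per document for word n-grams.

    tf  = term count / number of n-grams in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in tokenized for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for very short texts
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example texts, not data from the thesis.
docs = [
    "vaccine causes harm claim",
    "vaccine trial results published",
    "vaccine rollout begins",
]

unigram_vectors = tfidf(docs, n=1)
bigram_vectors = tfidf(docs, n=2)
print(unigram_vectors[0])
print(bigram_vectors[0])
```

A term that occurs in every document, such as "vaccine" here, receives a weight of zero under this definition, while terms specific to one text receive positive weights. The resulting sparse vectors are the kind of input a classifier such as an SVM or a Multi-Layer Perceptron would be trained on.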
