Please use this identifier to cite or link to this item: http://142.54.178.187:9060/xmlui/handle/123456789/4137
Title: Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach
Authors: Chiragh, Neelam.
Keywords: Enhancing accuracy of Urdu sentiments Analysis, Using Lexicon-Based Approach
Issue Date: 2018
Publisher: University of Peshawar, Peshawar
Series/Report no.: 17015
Abstract: This research enhances the accuracy of Urdu Sentiment Analysis across multiple domains using a Lexicon-based approach. Unlike the traditional approach, which considers adjectives only, the lexicon also includes nouns and verbs. An efficient Urdu Sentiment Analyzer is developed that applies rules and uses this new lexicon to classify sentences as positive, negative or neutral. To improve accuracy, the analyzer incorporates specific rules for handling negations, intensifiers and context-dependent words. To test the Lexicon-based approach, a corpus of 6025 sentences from 151 blogs belonging to 14 different genres was collected, and three human annotators labeled each sentence as positive, negative or neutral. Evaluated on sentences from this corpus, the Urdu Sentiment Analyzer yields the most promising results reported so far for the Urdu language (to the best of the author's knowledge): 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 f-measure. A comparison with previous work in Urdu Sentiment Analysis shows that the combination of this Urdu Sentiment Lexicon and Urdu Sentiment Analyzer is much more effective than previous such combinations. The main reasons for the improvement are the wide-coverage lexicon and the analyzer's effective handling of negations, intensifiers and context-dependent words. Although achieving high accuracy with the Lexicon-based approach in multiple domains is the main objective of this research, a Supervised Machine Learning approach is also used for comparison.
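The rule-based classification described in this part of the abstract can be illustrated with a minimal sketch. It is not the thesis's analyzer: the lexicon entries, weights, romanized spellings and the two positional rules (an intensifier directly before a sentiment word scales it, a negation directly after it flips it) are all assumptions made for illustration, and context-dependent words are not handled here.

```python
# Toy lexicon: every entry and weight is an illustrative assumption,
# not taken from the thesis. Words are romanized for readability.
LEXICON = {"achi": 1.0, "behtareen": 2.0, "buri": -1.0, "kharab": -1.0}
INTENSIFIERS = {"bohat": 1.5, "nihayat": 2.0}
NEGATIONS = {"nahin", "na"}


def classify(sentence: str) -> str:
    """Label a whitespace-tokenized sentence as positive, negative or neutral."""
    tokens = sentence.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        # Assumed rule: an intensifier directly before the word scales its score.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
        # Assumed rule: a negation directly after the word flips its polarity.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            score = -score
        total += score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("yeh film bohat achi hai"))  # positive
print(classify("yeh film achi nahin hai"))  # negative
print(classify("yeh film hai"))             # neutral
```

Summing signed word scores and thresholding at zero is only one possible design; the abstract does not state how the analyzer combines scores.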
Three well-known classifiers, Support Vector Machine, Decision Tree and K Nearest Neighbor, are tested; their outputs are compared and their results are improved over several iterations. K Nearest Neighbor is found to perform better than Support Vector Machine and Decision Tree. To verify this result, three evaluation measures are used: McNemar’s Test, the Kappa Statistic and Root Mean Squared Error. All three confirm that K Nearest Neighbor performs much better than the other two classifiers, achieving 67.02% accuracy with precision, recall and f-measure of 0.68, 0.67 and 0.67 respectively. The results of the two approaches are then compared. On the basis of the experiments performed in this research, it is concluded that the Lexicon-based approach outperforms the Supervised Machine Learning approach for Urdu Sentiment Analysis in multiple domains in terms of accuracy, precision, recall and f-measure, as well as economy of time and effort.
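As a quick consistency check on the reported figures, the sketch below computes the balanced f-measure as the harmonic mean of precision and recall. This assumes the abstract's "f-measure" is the standard F1 score; the abstract does not say how the measures were averaged over the three classes, so the helper name `f_measure` and this interpretation are assumptions.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
lexicon_f = f_measure(0.86, 0.90)  # Lexicon-based approach
knn_f = f_measure(0.68, 0.67)      # K Nearest Neighbor

print(f"{lexicon_f:.2f}")  # 0.88
print(f"{knn_f:.2f}")      # 0.67
```

Under that assumption, both computed values agree with the f-measures stated in the abstract.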
Gov't Doc #: 17015
URI: http://142.54.178.187:9060/xmlui/handle/123456789/4137
Appears in Collections: Thesis

Files in This Item:
File        Description   Size    Format
9257.htm                  120 B   HTML


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.