A Framework to Improve Classification of Positive and Negative Opinions in Roman Urdu-English Code Switching Environment

Hassan, Muhammad Awais

Please use this identifier to cite or link to this item: http://localhost:80/xmlui/handle/123456789/5289

Title:	A Framework to Improve Classification of Positive and Negative Opinions in Roman Urdu-English Code Switching Environment
Authors:	Hassan, Muhammad Awais
Keywords:	Computer Science
Issue Date:	2016
Publisher:	University of Engineering & Technology, Lahore.
Abstract:	In computational linguistics, sentiment analysis classifies an opinion as positive or negative. Over the last decade, sentiment analysis of English has been explored extensively with a variety of techniques that have improved overall performance. Urdu is the language of sixty-six million people and is widely spoken in the South Asian subcontinent. It is also the national language of Pakistan, which is the world's sixth most populous country according to the United Nations Population Division. Sentiment analysis of Urdu is an important tool for understanding the behavioural aspects, cultural values and social habits of the people living in this part of the world. Opinion mining is also crucial for governments, policy makers, business owners and brand ambassadors, who need to make decisions in accordance with public sentiment. However, sentiment analysis of Urdu is not as well explored as that of English. Urdu sentiment analysis has been performed with the simple Bag-of-Words (BoW) method and with machine learning (ML) techniques using a limited set of features. The BoW method is not sufficient to handle complex opinions, and the accuracy of ML techniques with legacy features is not comparable to that achieved in sentiment classification for other languages. For English, discourse information (sub-sentence-level information) has boosted the performance of both the BoW method and ML techniques. This thesis presents a theory for Urdu sentiment analysis that extracts and uses discourse information at the sub-sentence level, and suggests a computational model that achieves more accurate results than the simple bag-of-words approach. The proposed solution segments the sentiment into two sub-opinions, extracts discourse information (discourse relation and polarity relation), proposes an extended BoW method (a rule-based method) and suggests a new, small subset of features for ML techniques.
The results show a significant improvement (p < 0.001) in recall, precision and accuracy of 37.25%, 8.46% and 24.75%, respectively. The current research targets sentiments with two sub-opinions, an approach that works well as long as the opinions are short messages such as those on Twitter, in forum comments or in Facebook status posts. The proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews and TV talk shows.
Gov't Doc #:	18589
URI:	http://142.54.178.187:9060/xmlui/handle/123456789/5289
Appears in Collections:	Thesis

Files in This Item:

File	Description	Size	Format
11361.htm		121 B	HTML
