Please use this identifier to cite or link to this item: http://142.54.178.187:9060/xmlui/handle/123456789/5335
Title: REDEFINING URDU MORPHOLOGY AND GRAMMAR FOR THE DEVELOPMENT OF AN INTEGRATED SENTIMENT ANALYSIS FRAMEWORK
Authors: SYED, AFRAZ ZAHRA
Keywords: Computer science, information & general works
Issue Date: 2013
Publisher: UNIVERSITY OF ENGINEERING AND TECHNOLOGY LAHORE – PAKISTAN
Abstract: The rise of social networking sites and blogs has stimulated a bull market in personal opinion: consumer recommendations, product reviews, ratings, and other types of online expression. For computational linguistics researchers, this fast-growing heap of information has opened an exciting research frontier, referred to as Sentiment Analysis (SA). For English, this area has been under investigation for the last decade, but other major languages, such as Urdu, have been totally overlooked by the research community. Urdu is a morphologically rich and resource-poor language. Its distinctive features, such as complex morphology, flexible grammar rules, context-sensitive orthography and free word order, make Urdu language processing a challenging problem domain. For the same reasons, sentiment analysis approaches and techniques developed for other well-explored languages are not workable for Urdu text. This dissertation presents a grammatically motivated sentiment classification framework to handle these distinctive features of the Urdu language. The main research contributions are: to highlight the linguistic (orthography, grammar, morphology, etc.) as well as technical (parsing algorithm, lexicon, corpus, etc.) aspects of this multidimensional research problem; to explore Urdu morphological operations, grammar and orthographic rules; and to redefine these operations and rules with respect to the requirements of a sentiment analysis framework. The orthographic, morphological, grammatical and, finally, conceptual details of the language are our target concerns. Additionally, our approach can help in the sentiment analysis of other languages, such as Arabic, Persian, Hindi and Punjabi. The proposed framework emphasizes the identification of SentiUnits rather than the subjective words in the given text. SentiUnits are the sentiment-carrying expressions that reveal the inherent sentiment of a sentence toward a specific target.
The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through shallow-parsing-based chunking. A dependency parsing algorithm then creates associations between these extracted expressions. The framework uses a sentiment-annotated, lexicon-based approach. Each lexicon entry is marked with its orientation (positive or negative) and its intensity (force of orientation) score. An experimental evaluation of the system, using a sentiment-annotated lexicon of Urdu words and two corpora of reviews as test beds, shows encouraging results in terms of accuracy, precision, recall and F-measure.
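The lexicon-based scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the names `LexiconEntry`, `SentiUnit`, `score_unit` and `classify` are hypothetical, the romanized words and intensity values are toy entries rather than items from the actual lexicon, and the sketch assumes that chunking and dependency parsing have already paired each SentiUnit with its target. Summing orientation times intensity is likewise an assumed combination rule chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of orientation (hypothetical 0..1 scale)


# Toy lexicon with romanized Urdu words; the scores are made up for illustration.
LEXICON: Dict[str, LexiconEntry] = {
    "acha": LexiconEntry(+1, 0.6),       # "good"
    "bura": LexiconEntry(-1, 0.6),       # "bad"
    "behtareen": LexiconEntry(+1, 0.9),  # "excellent"
}


@dataclass(frozen=True)
class SentiUnit:
    words: Tuple[str, ...]  # sentiment-carrying expression
    target: str             # noun phrase the opinion is about


def score_unit(unit: SentiUnit, lexicon: Dict[str, LexiconEntry] = LEXICON) -> float:
    """Sum orientation * intensity over the words found in the lexicon."""
    return sum(
        lexicon[w].orientation * lexicon[w].intensity
        for w in unit.words
        if w in lexicon
    )


def classify(units: Iterable[SentiUnit]) -> Dict[str, str]:
    """Aggregate unit scores per target and map each total to a label."""
    totals: Dict[str, float] = {}
    for unit in units:
        totals[unit.target] = totals.get(unit.target, 0.0) + score_unit(unit)
    return {
        target: "positive" if s > 0 else "negative" if s < 0 else "neutral"
        for target, s in totals.items()
    }
```

In this sketch, words missing from the lexicon contribute nothing to the score, so a target whose SentiUnits contain only unknown words is labeled neutral.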
URI: http://142.54.178.187:9060/xmlui/handle/123456789/5335
Appears in Collections:Thesis

Files in This Item:
File: 2223.htm | Description: (none) | Size: 128 B | Format: HTML


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.