Multi-Label Classification of Computer Science Research Papers Using Papers' Metadata

Sajid, Naseer Ahmed

Title:	Multi-Label Classification of Computer Science Research Papers Using Papers' Metadata
Authors:	Sajid, Naseer Ahmed
Keywords:	Database, Data Mining
Issue Date:	2018
Publisher:	Capital University of Science and Technology, Islamabad.
Abstract:	In scientific literature, a publication is a way of expressing a scientific contribution within the specific context of a discipline. This view is reflected in the well-known quote, “Communication in science is realized through research publications”. Over the decades, there has been a tremendous increase in the number of documents available in digital form, to the point that their rate of production doubles every five years. A large share of these documents consists of research publications resulting from successive discoveries and inventions in science. This incessant flow of research publications has never been interrupted; on the contrary, it has gained significant momentum. Almost 28,100 active scholarly journals publish almost 2.5 million articles per year. These articles are searched over the Internet via search engines, digital libraries, and citation indexes. However, retrieving relevant research papers for user queries remains a pipe dream, because scientific documents are not indexed according to subject classification hierarchies such as the ACM classification system for Computer Science. This has motivated researchers to propose innovative approaches to research paper classification. Such classification benefits not only the retrieval of relevant research papers but also many other application scenarios, such as when: (1) journal or conference editors want to identify reviewers; (2) a research scholar wishes to identify a suitable supervisor; (3) authors intend to submit their research papers; and (4) one seeks to analyze trends, find experts, or recommend relevant papers.
In this dissertation, the author critically reviews the literature on research paper classification and identifies the following research deficiencies, which this dissertation addresses: (1) Existing research paper classification schemes rely on the content of papers, and most of the time the content is not available, which makes those schemes inapplicable. There is a need to explore alternative features for classifying research articles that could produce results close to those of content-based approaches. (2) The majority of state-of-the-art approaches focus on single-label classification, while experiments on a comprehensive dataset revealed that a research article may belong to multiple categories. There is a need for a multi-label classification system that uses the best possible alternative to content-based approaches and achieves comparable or improved accuracy. (3) Existing multi-label classification schemes classify citations into a limited number of categories, whereas in the Computer Science domain the ACM classification system contains 11 classes at its root level. An approach that can classify research articles at least to the root level of the ACM classification system is needed. The objective of this dissertation is to use freely available metadata in the best possible way to perform multi-label classification and to evaluate the extent to which metadata-based features can perform similarly to content-based approaches. We have proposed, developed, and evaluated techniques based on metadata such as Title, Keywords, Title & Keywords, and References of research papers, and have reported the results achieved. To classify research articles into multiple labels based on metadata, we have harnessed metadata in diverse ways, for example: (1) Multi-label Document Classification using Papers’ Metadata (Title & Keywords); and (2) Multi-label Document Classification based on Research Articles’ References. These techniques have been evaluated on two different and diverse datasets.
One dataset is from an online journal, the Journal of Universal Computer Science (J.UCS), and the other is a benchmark dataset comprising research papers published by the ACM. Using only freely available metadata, these techniques yield encouraging results (i.e., 88% accuracy) compared to state-of-the-art techniques on both datasets.
Gov't Doc #:	17634
URI:	http://142.54.178.187:9060/xmlui/handle/123456789/5248
Appears in Collections:	Thesis
