Research Paper Recommendation by Exploiting Co-citation Occurrences in Sections of Scientific Papers

Ahmed, Riaz

Title:	Research Paper Recommendation by Exploiting Co-citation Occurrences in Sections of Scientific Papers
Authors:	Ahmed, Riaz
Keywords:	Information Retrieval, Digital Library
Issue Date:	2018
Publisher:	Capital University of Science and Technology, Islamabad
Abstract:	Citation indexes and digital libraries index millions of research papers and make them available to the scientific community; however, finding the intended information in these huge repositories remains a challenge. Every day, the number of research papers in online digital libraries grows because of the many conferences, workshops, and journals organized throughout the world. According to 2017 statistics, PubMed, a digital library in the medical domain, contained 28 million research documents. Manually searching for relevant research papers in such a large collection is very difficult. This area has therefore attracted researchers worldwide to propose and implement innovative techniques that can recommend relevant papers to researchers, and the identification of relevant research papers has become an important research area. The research community has proposed more than 90 different approaches in the past 15 years. These approaches have utilized different data sources, such as metadata, content, profile-based data, and citations of research papers. These techniques have certain strengths and limitations, which are critically reviewed and presented in this document. One of the important approaches in this area is co-citation analysis, which considers two documents relevant if they are co-cited in other scientific documents. The original approach used the reference lists of scientific documents to make such observations. In recent years, however, the content of documents has also been exploited along with the reference list to enhance accuracy. These approaches include Citation Proximity Analysis (CPA), Citation Order Analysis (COA), and approaches that exploit byte offsets within the content of scientific papers.
These approaches conceptualize the occurrence of co-citations at different levels of proximity and give more weight to documents that are co-cited closely. However, documents closely co-cited in the "Methodology/Results" sections may be considered more relevant than those closely co-cited in the "Introduction/Discussion" sections. This thesis explores the structural organization of scientific documents by assigning weights according to the importance of different generic sections, and investigates whether such an approach may increase the accuracy of identifying relevant papers. This work addresses the following research challenges, which can be considered the contributions of the thesis: (1) generic section identification in the citing document; (2) identification of in-text citation patterns and frequencies in the citing document; and (3) design of an algorithm that uses evidence from the above-mentioned sources (section names, their weights, and the frequency of co-citations) to identify and recommend relevant papers. For each contribution, the detailed architecture, dataset, and evaluation are discussed in this thesis. First, the generic section identification component was designed, implemented, and then evaluated against state-of-the-art approaches. The proposed approach was evaluated on two datasets consisting of 150 and 300 citing documents, respectively. The aggregated F-score of the proposed approach was 92% over both datasets, while the F-score of the state-of-the-art technique was 81%. Second, the component for identifying in-text citation patterns and frequencies was implemented, with a detailed architecture, dataset, and evaluation. For the evaluation, two datasets were prepared from openly available digital libraries, the Journal of Universal Computer Science (J.UCS) and CiteSeerX. The proposed model outperformed the state-of-the-art approach, increasing the F-score from 0.58 to 0.97.
The third contribution of this thesis is section-wise co-citation analysis, which depends on the two earlier components. The proposed approach was designed to rank co-cited documents. For evaluation, two benchmark rankings, based on JSD and cosine similarity, were selected for comparing the proposed and state-of-the-art approaches. The scores of the proposed and state-of-the-art approaches were compared using Spearman's and Kendall's tau measures. The results show that the proposed approach outperformed state-of-the-art techniques such as standard co-citation and CPA based on byte offset.
Gov't Doc #:	17622
URI:	http://142.54.178.187:9060/xmlui/handle/123456789/5165
Appears in Collections:	Thesis
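
The section-wise co-citation idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the section weights, the use of the smaller in-text citation count as the pair frequency, and the names `SECTION_WEIGHTS`, `section_weighted_cocitation`, and `recommend` are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical weights: the abstract only says that co-citations in
# Methodology/Results may matter more than in Introduction/Discussion.
SECTION_WEIGHTS = {
    "introduction": 1.0,
    "methodology": 3.0,
    "results": 3.0,
    "discussion": 1.0,
}


def section_weighted_cocitation(citing_docs, weights=SECTION_WEIGHTS, default_weight=1.0):
    """Score each pair of cited papers across all citing documents.

    citing_docs: iterable of dicts mapping a generic section name to the
    list of paper ids cited in that section (repeats = in-text frequency).
    """
    scores = defaultdict(float)
    for doc in citing_docs:
        for section, cited in doc.items():
            weight = weights.get(section.lower(), default_weight)
            counts = Counter(cited)
            for a, b in combinations(sorted(counts), 2):
                # Assumption: pair frequency is the smaller of the two counts.
                scores[(a, b)] += weight * min(counts[a], counts[b])
    return dict(scores)


def recommend(paper_id, scores, top_k=5):
    """Return up to top_k (paper, score) pairs co-cited with paper_id."""
    related = []
    for (a, b), score in scores.items():
        if a == paper_id:
            related.append((b, score))
        elif b == paper_id:
            related.append((a, score))
    # Highest score first; ties broken by paper id for a stable order.
    related.sort(key=lambda item: (-item[1], item[0]))
    return related[:top_k]


# Toy citing documents with made-up paper ids.
docs = [
    {"Introduction": ["P1", "P2", "P3"], "Methodology": ["P1", "P2"]},
    {"Results": ["P1", "P3", "P3"], "Discussion": ["P2", "P3"]},
]
scores = section_weighted_cocitation(docs)
print(recommend("P1", scores))
```

In this toy input, the pair (P1, P2) gains most of its score from a single Methodology co-citation, whereas a plain co-citation count would treat every section alike.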
