Performance Improvement of Parallel Sparse Matrix-Vector Product on PC Cluster

Shahnaz, Rukhsana

Please use this identifier to cite or link to this item: http://localhost:80/xmlui/handle/123456789/2437

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shahnaz, Rukhsana	-
dc.date.accessioned	2017-12-04T06:50:06Z	-
dc.date.accessioned	2020-04-09T16:31:09Z	-
dc.date.available	2020-04-09T16:31:09Z	-
dc.date.issued	2010	-
dc.identifier.uri	http://142.54.178.187:9060/xmlui/handle/123456789/2437	-
dc.description.abstract	The efficient parallelization of the sparse matrix-vector product (SMVP) is of prime importance in scientific computing. To achieve this on distributed-memory computers, we concentrate on minimizing inter-processor communication, balancing the workload, and overlapping communication with computation, along with optimizing single-processor performance. The thesis consists of two parts, covering the optimization of sparse matrix-vector multiplication on a single processor and on multiple processors. To improve SMVP performance on a single scalar processor, we propose two sparse storage formats: grouped compressed row storage with permutation (GCRSP) and blocked compressed row storage with permutation (BCRSP). The proposed formats are designed to exploit the benefits of blocking, such as reduced indirect addressing and increased spatial and temporal locality, while eliminating the corresponding overheads. For good load balancing and low communication cost, reordering sparse matrices according to their sparsity structure is highly important. For this purpose we propose reordering-based partitioning strategies that exploit the sparsity of the input matrix to give a balanced load distribution together with a reduced communication cost. GCRSP improves performance over simple compressed row storage (CRS) and compressed row storage with permutation (CRSP) by an average of 16% and 25%, respectively. Moreover, owing to blocking, BCRSP improves performance by an average of 32%, 41% and 20% over CRS, CRSP and GCRSP, respectively. Likewise, the proposed partitioning models (permuted row, column and matrix) produce an average of 49% better load balancing and 14% better communication than the corresponding naïve row/column and checkerboard models.
Moreover, they produce the same level of load balance and an average of 78% better communication than the corresponding balanced naïve partitioning models, i.e. the row/column block and balanced checkerboard (BCH) models. Overall, an average performance gain of 30% for parallel SMVP is achieved on a cluster of eight processors by using the BCRSP format with permuted row partitioning, compared with an implementation using the CRS format with naïve row partitioning.	en_US
dc.description.sponsorship	Higher Education Commission, Pakistan	en_US
dc.language.iso	en	en_US
dc.publisher	Pakistan Institute of Engineering and Applied Sciences Islamabad, Pakistan	en_US
dc.subject	Applied Sciences	en_US
dc.title	Performance Improvement of Parallel Sparse Matrix-Vector Product on PC Cluster	en_US
dc.type	Thesis	en_US
Appears in Collections:	Thesis
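
For readers unfamiliar with the baseline format named in the abstract, the following is a minimal sketch of a sparse matrix-vector product over compressed row storage (CRS). It is not taken from the thesis and does not implement GCRSP or BCRSP; the function names (crs_from_dense, crs_matvec) and the array names (values, col_idx, row_ptr) are assumptions chosen for illustration. The inner loop shows the indirect addressing through col_idx that the abstract says blocking is meant to reduce.

```python
def crs_from_dense(dense):
    """Build CRS arrays from a dense row-major matrix (list of lists).

    Hypothetical helper for illustration only:
      values  - nonzero entries, stored row by row
      col_idx - column index of each stored entry
      row_ptr - row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A held in CRS form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx[k]: one indirect access per nonzero.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
values, col_idx, row_ptr = crs_from_dense(A)
print(crs_matvec(values, col_idx, row_ptr, [1, 2, 3]))  # prints [7.0, 6.0, 17.0]
```

In a row-partitioned parallel version, each processor would typically run this loop over its own block of rows and would need the entries of x referenced by its col_idx values, which is where the communication cost discussed in the abstract arises.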

Files in This Item:

File	Description	Size	Format
1192.htm		128 B	HTML
