Performance Improvement of Parallel Sparse Matrix-Vector Product on PC Cluster

Shahnaz, Rukhsana

Please use this identifier to cite or link to this item: http://localhost:80/xmlui/handle/123456789/2437

Title:	Performance Improvement of Parallel Sparse Matrix-Vector Product on PC Cluster
Authors:	Shahnaz, Rukhsana
Keywords:	Applied Sciences
Issue Date:	2010
Publisher:	Pakistan Institute of Engineering and Applied Sciences Islamabad, Pakistan
Abstract:	The efficient parallelization of sparse matrix-vector product (SMVP) is of prime importance in scientific computing. To achieve this on a distributed memory computers, we concentrate on minimizing the inter-processor communication, achieving a good balance of workload, overlapping communication with computation along with optimizing single processor performance. The thesis consists of two parts presenting the optimization and improvement of sparse matrix-vector multiplication performance on single as well as multi processors. For the performance improvement of SMVP on a single scalar processor, we propose two sparse storage formats, namely the grouped compressed row storage with permutation (GCRSP) and the blocked compressed row storage with permutation (BCRSP). The proposed formats are designed to efficiently exploit the benefits of blocking such as reduced indirect addressing, increased spatial and temporal locality along with eliminating the corresponding overheads. For the good load balancing and low communication cost, reordering of sparse matrices according to their sparsity structure is highly important. For this purpose we proposed reordering based partitioning strategies that tend to exploit sparsity of input matrix presenting the balanced load distribution along with the reduced communication cost. It has been observed that GCRSP improves the performance over simple compressed row storage (CRS) and compressed row storage with permutation (CRSP) with an average of 16% and 25%, respectively. Moreover, due to blocking in BCRSP, the performance improvements of an average of 32%, 41% and 20% are observed over CRS, CRSP and GCRSP respectively. Likewise, the proposed partitioning models permuted row column matrix produce an average of 49% better load balancing and 14% better communication than the corresponding naïve row/column and checker board models. Moreover, they produce same level of balanced load and an average of 78% better communication than the corresponding balanced naïve partitioning i.e. row/column block and balanced checker board (BCH) models. On the whole an average of 30% performance gain for parallel SMVP is achieved by using BCRSP format along with permuted row partitioning over the implementation using CRS format with naïve row partitioning using cluster of eight processors.
URI:	http://142.54.178.187:9060/xmlui/handle/123456789/2437
Appears in Collections:	Thesis

Files in This Item:

File	Description	Size	Format
1192.htm		128 B	HTML	View/Open

Show full item record

DSpace JSPUI

DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets