Please use this identifier to cite or link to this item:
http://localhost:80/xmlui/handle/123456789/2900
Title: | Fast Speech Processing Algorithms for Real-Time Identification of Wanted Persons using Digital Communication Networks |
Authors: | Afzal, Muhammad |
Keywords: | Applied Sciences |
Issue Date: | 2012 |
Publisher: | University of Engineering and Technology, Lahore, Pakistan |
Abstract: | Telephony networks are frequently connected to computers that process speech to extract useful information, for example through automatic speaker identification (ASI). Matching the feature vectors extracted from a speech sample of an unknown speaker against the models of registered speakers is the most time-consuming component of a real-time speaker identification system. The parameters that control this time are the size d and count T of the extracted test feature vectors, together with the size M, complexity, and count N of the registered speaker models. Reported speedup techniques for vector quantization (VQ) and Gaussian mixture model (GMM) based ASI systems reduce the test feature vector count T by pre-quantization and reduce the number of candidate registered speakers N by pruning unlikely models, which degrades accuracy. Vantage point tree (VPT) indexing of code vectors has also been used to decrease the effect of parameter M on ASI speed in VQ based systems. Parameter d, however, has remained unexplored in ASI speedup studies. This thesis presents speedup techniques for VQ based and GMM based real-time ASI without loss of accuracy. For VQ based systems, the focus is on speeding up closest code vector search (CCS). Partial distortion elimination (PDE), which works by reducing the d parameter of the codebook, was found more promising than VPT for speeding up CCS. Advancing in this direction, speech signal stationarity is exploited to a greater extent than in the previously proposed technique of cluster size based sorting of code vectors to speed up PDE. The proximity relationship among code vectors established through the Linde-Buzo-Gray (LBG) process of codebook generation has been substantiated. Based on the high correlation of proximate code vectors, circular partial distortion elimination (CPDE) and toggling-CPDE (TCPDE) algorithms are proposed to speed up CCS. 
Further speedup for ASI is proposed through test feature vector sequence pruning (VSP), applied when a codebook proves unlikely during the search for the best-match speaker. Empirical results presented in this thesis show that integrating VSP and TCPDE achieves an average speedup factor of up to 5.8 for the 630 registered speakers of the TIMIT 8 kHz corpus and 6.6 for the 230 speakers of the NIST-1999 database. The speedup potential of hierarchical speaker pruning (HSP) for faster ASI is also demonstrated. HSP prunes unlikely candidate speakers based on ranking results from coarse speaker models; the best match is then found from the detailed models of the remaining speakers. VQ based and GMM based ASI systems are explored in depth for the parameters governing the speedup performance of HSP. The key objective for speedup through HSP is to use the smallest possible coarse model while pruning the largest number of detailed candidate models. City block distance (CBD) is proposed instead of Euclidean distance (EUD) for ranking speakers in VQ based systems. This allows a smaller codebook to be used for ranking and a greater number of speakers to be pruned. Previous authors ignored HSP for GMM based ASI systems because of discouraging speedup results in their studies of VQ based systems. However, we achieved speedup factors of up to 6.61 and 10.40 for GMM based ASI systems using HSP, for 230 speakers from NIST-1999 and 630 speakers from TIMIT data, respectively. For VQ based systems, speedup factors of up to 22.46 and 34.78 are achieved on TIMIT and NIST-1999 data, respectively. All reported speedup factors are achieved without any accuracy loss. |
URI: | http://142.54.178.187:9060/xmlui/handle/123456789/2900 |
Appears in Collections: | Thesis |
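Illustrative note: the abstract names partial distortion elimination (PDE) as the basis of its closest code vector search. The following is a minimal sketch of generic PDE, not the thesis's CPDE or TCPDE algorithms. It assumes squared Euclidean distance and a codebook stored as an M x d NumPy array; the function name pde_search and the start parameter (a hypothetical way to begin a circular scan at, say, the previous frame's winning code vector) are illustrative assumptions and do not come from the record.

```python
import numpy as np


def pde_search(codebook, x, start=0):
    """Return (index, squared distance) of the code vector closest to x.

    The squared Euclidean distance is accumulated one dimension at a time,
    and a code vector is abandoned as soon as its partial sum reaches the
    best full distance found so far.
    """
    m, d = codebook.shape
    best_idx, best_dist = -1, float("inf")
    for step in range(m):
        i = (start + step) % m  # circular scan order beginning at `start` (assumption)
        partial = 0.0
        for k in range(d):
            diff = x[k] - codebook[i, k]
            partial += diff * diff
            if partial >= best_dist:
                break  # early exit: this code vector cannot beat the current best
        else:
            # Loop finished without an early exit, so this is the new best match.
            best_idx, best_dist = i, partial
    return best_idx, best_dist


# Usage with a toy 3-vector codebook of dimension 2.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx, dist = pde_search(codebook, np.array([0.9, 1.2]))
print(idx, dist)
```

The result is the same as an exhaustive search, because a code vector is rejected only once its partial distance already reaches the current best. The saving comes from the dimensions skipped, which is larger when a good match is found early in the scan.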
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.