

Accelerating dynamic time warping for speech recognition with SSE

Title: Accelerating dynamic time warping for speech recognition with SSE
Authors: Yurii Vash, Mariana Rol, Mykola Chyzhmar
Affiliation: Uzhhorod National University, Uzhhorod, Ukraine
Bibliographic description: Accelerating dynamic time warping for speech recognition with SSE / Yurii Vash, Mariana Rol, Mykola Chyzhmar // Scientific Journal of TNTU. Tern.: TNTU, 2024. Vol. 114, No. 2. P. 30–38.
Bibliographic description (English): Vash Y., Rol M., Chyzhmar M. (2024) Accelerating dynamic time warping for speech recognition with SSE. Scientific Journal of TNTU (Tern.), vol. 114, no. 2, pp. 30–38.
DOI: https://doi.org/10.33108/visnyk_tntu2024.02.030
UDC: 004.421:004.934.1’1

Keywords: Dynamic Time Warping, Speech Recognition, Euclidean Distance, Manhattan Distance.

This study presents an enhancement to the Dynamic Time Warping (DTW) algorithm for real-time applications such as speech recognition. By integrating SIMD (Single Instruction, Multiple Data) instructions into the distance function, the research demonstrates how SSE accelerates DTW and markedly reduces computation time. The paper explores the theoretical aspects of DTW and of this optimization and provides empirical evidence of its effectiveness. A diverse dataset of 18 voice command classes was assembled, recorded in controlled settings to ensure audio quality. The audio signal of each speech sample was segmented into frames for detailed analysis of temporal dynamics. The DTW search was performed on a feature set based on Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC), combined with delta features; in total, 27 features were extracted from each frame to capture critical speech characteristics. Traditional DTW served as the baseline for performance comparison with the SSE-optimized DTW. The evaluation focused on computation time and included the minimum, maximum, average, and total computation times for both the standard and the SSE-optimized implementations. Experiments conducted on datasets ranging from 5 to 60 WAV files per class showed that the SSE-optimized DTW significantly outperformed the standard implementation across all dataset sizes. Particularly noteworthy was the consistent speed of the SSE-optimized Manhattan and Euclidean distance functions, which is crucial for real-time applications. The SSE-optimized DTW maintained a low average time, demonstrating stability and efficiency, especially with larger datasets. The study illustrates the potential of SSE optimizations in speech recognition, emphasizing the capability of the SSE-optimized DTW to process large datasets efficiently.
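The abstract does not reproduce the authors' code, so the following is only a minimal sketch of the general idea, assuming an x86 target, single-precision features, and the classic DTW recurrence D(i, j) = d(x_i, y_j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)). The function names (`manhattan_sse`, `euclidean_sse`, `dtw`), the `DIM` constant, and the rolling-row layout are illustrative choices, not taken from the paper; only the intrinsics themselves come from the SSE instruction set.

```c
#include <assert.h>
#include <math.h>       /* INFINITY macro only */
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

#define DIM 27  /* features per frame, as stated in the abstract */

/* Horizontal sum of the four float lanes of v. */
static float hsum_ps(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Manhattan (L1) distance: four lanes per step, scalar loop for the tail. */
static float manhattan_sse(const float *a, const float *b, int n) {
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* only the sign bit set */
    __m128 acc = _mm_setzero_ps();
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, _mm_andnot_ps(sign_mask, d));  /* |d| */
    }
    float sum = hsum_ps(acc);
    for (; i < n; ++i) {
        float d = a[i] - b[i];
        sum += d < 0.0f ? -d : d;
    }
    return sum;
}

/* Euclidean (L2) distance: accumulate squared differences, then one sqrt. */
static float euclidean_sse(const float *a, const float *b, int n) {
    __m128 acc = _mm_setzero_ps();
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }
    float sum = hsum_ps(acc);
    for (; i < n; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(sum)));
}

typedef float (*dist_fn)(const float *, const float *, int);

static float min3(float a, float b, float c) {
    float m = a < b ? a : b;
    return m < c ? m : c;
}

/* Classic O(n*m) DTW with two rolling rows.
 * x holds n frames and y holds m frames, each frame being DIM floats.
 * Returns the accumulated cost, or -1 if allocation fails. */
static float dtw(const float *x, int n, const float *y, int m, dist_fn dist) {
    assert(n > 0 && m > 0);
    float *prev = malloc((size_t)(m + 1) * sizeof *prev);
    float *curr = malloc((size_t)(m + 1) * sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1.0f; }
    for (int j = 0; j <= m; ++j) prev[j] = INFINITY;
    prev[0] = 0.0f;  /* D(0, 0) = 0; the rest of row 0 is unreachable */
    for (int i = 1; i <= n; ++i) {
        curr[0] = INFINITY;
        for (int j = 1; j <= m; ++j) {
            float d = dist(x + (i - 1) * DIM, y + (j - 1) * DIM, DIM);
            curr[j] = d + min3(prev[j], curr[j - 1], prev[j - 1]);
        }
        float *tmp = prev; prev = curr; curr = tmp;
    }
    float result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

A call such as `dtw(x, n, y, m, manhattan_sse)` would compare two utterances frame by frame. Because 27 is not a multiple of four, each distance call in this sketch processes 24 features in vector lanes and the remaining 3 in the scalar tail; padding each frame to 28 floats would be one way to remove the tail, at the cost of extra memory.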

ISSN: 2522-4433
