Speaker Recognition Based on Deep Bidirectional GRU and SE Block with Small Training Set

XU Xin, YANG Cheng

Computer & Telecommunication ›› 2025, Vol. 1 ›› Issue (5) : 22-27.

Abstract

With the development of deep learning and large models, speaker recognition models require a large number of training samples, and when the training set is limited they often fail to converge well. To address the small-training-set problem, a speaker recognition method is proposed that combines a deep bidirectional gated recurrent unit (GRU) neural network with an SE block (Squeeze-and-Excitation block). The deep bidirectional GRU network extracts information from the input speech in both directions and at multiple depths. The SE block then assigns different weights to the extracted information, which is finally used for classification and recognition. Experimental results show that the recognition accuracy reaches 90.34% when each speaker has only 6 training utterances, indicating that the model performs well with a small number of training samples.
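
As a rough illustration of the weighting step described in the abstract, the following sketch applies a squeeze-and-excitation operation to a sequence of feature frames in plain Python. It is not the authors' implementation. The names `se_block` and `random_matrix`, the sizes `T`, `C`, and `R`, the choice to squeeze over the time axis, and the random (untrained) weights are all assumptions made for illustration, and the random `features` only stand in for the outputs of a bidirectional GRU.

```python
import math
import random


def se_block(features, w1, b1, w2, b2):
    """Reweight the channels of `features` (T frames, each a list of C floats)."""
    num_frames = len(features)
    num_channels = len(features[0])

    # Squeeze: average each channel over time, giving one descriptor per channel.
    squeezed = [sum(frame[c] for frame in features) / num_frames
                for c in range(num_channels)]

    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
              for row, b in zip(w1, b1)]

    # Excitation, step 2: expand back to C channels; sigmoid keeps gates in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]

    # Scale: multiply every frame channel-wise by its gate.
    scaled = [[x * g for x, g in zip(frame, gates)] for frame in features]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, C, R = 5, 8, 4  # frames, channels, reduction ratio (all assumed values)
features = random_matrix(T, C, rng)  # stand-in for bidirectional GRU outputs
w1, b1 = random_matrix(C // R, C, rng), [0.0] * (C // R)
w2, b2 = random_matrix(C, C // R, rng), [0.0] * C

scaled, gates = se_block(features, w1, b1, w2, b2)
```

In a trained model the two weight matrices would be learned together with the recurrent layers, so that channels carrying more speaker-discriminative information receive gates closer to 1.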

Key words

gated recurrent unit / attention mechanism / speaker recognition / recurrent neural networks / deep learning

Cite this article

XU Xin, YANG Cheng. Speaker Recognition Based on Deep Bidirectional GRU and SE Block with Small Training Set[J]. Computer & Telecommunication. 2025, 1(5): 22-27
