Abstract
General-domain named entity recognition (NER) methods struggle to identify specialized terms and other security entities in the cybersecurity domain, and their insufficient feature extraction leads to low recognition accuracy. To address these problems, this paper proposes a Bi-LSTM-CRF model fused with a Res-Inception network (Res-Inception Bi-LSTM-CRF, RIBIC). The Res-Inception network extracts multi-granularity features to capture richer feature information. In addition, a self-built cybersecurity domain dictionary is combined with a dictionary-matching correction algorithm to further improve recognition accuracy. On two public threat intelligence datasets, the model achieves F1 scores of 94.09% and 83.91%, which are 15.02% and 15.72% higher than the baseline model, respectively, demonstrating the effectiveness of the proposed method for NER in the threat intelligence domain.
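The abstract does not spell out the dictionary-matching correction step, so the following is only a minimal sketch of one plausible reading: after the sequence model predicts BIO tags, any token span that matches a domain-dictionary entry has its tags overwritten, with longer matches taking priority. The function name `dictionary_correct`, the `LEXICON` entries, the entity type names, and the `max_len` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def dictionary_correct(tokens: List[str], tags: List[str],
                       lexicon: Lexicon, max_len: int = 5) -> List[str]:
    """Overwrite predicted BIO tags wherever a lexicon phrase matches.

    Assumption: longest-match-first, case-insensitive lookup. The paper's
    actual correction rule may differ.
    """
    corrected = list(tags)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            label = lexicon.get(phrase)
            if label is not None:
                corrected[i] = "B-" + label
                for j in range(i + 1, i + n):
                    corrected[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return corrected


# Hypothetical dictionary entries and model output, for illustration only.
LEXICON: Lexicon = {("wannacry",): "Malware", ("cobalt", "strike"): "Tool"}
tokens = ["Attackers", "used", "Cobalt", "Strike", "and", "WannaCry"]
predicted = ["O", "O", "B-Tool", "O", "O", "O"]
corrected = dictionary_correct(tokens, predicted, LEXICON)
```

In this toy input the model truncates one entity and misses another; the dictionary pass extends the first span and tags the second. A real system would also need a policy for conflicts between confident model predictions and dictionary hits, which this sketch resolves by always trusting the dictionary.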
Key words
Cyber Threat Intelligence /
Named Entity Recognition /
Res-Inception Network
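Funding
Key Laboratory of Enterprise Informatization and Internet of Things Measurement and Control Technology of Sichuan Provincial Universities (Project No. 2022WYJ03); Sichuan University of Science and Engineering 2023 University-Level Teaching Reform Research Project "Exploration and Practice of Comprehensive Experimental Teaching Reform in Cybersecurity under Industry-Education Integration" (Project No. JG-2307)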