Research on Automatic Proofreading Method of Chinese Text Sensitive Information

GONG Yong-gang, WANG Xin-yu, LI Yu-ying, WANG Yun-qi

Computer & Telecommunication, 2018, Vol. 1, Issue (12): 66-69.

Applied Technology and Research



Abstract

To address the automatic proofreading of sensitive information in massive volumes of text, this paper proposes a method that combines rules with an SVM (support vector machine). Sensitive information is classified on the basis of Prohibited and Cautionary Words in Xinhua News Agency News and Information Reporting (《新华社新闻信息报道中的禁用词和慎用词》), relevant central documents, and important sensitive information gathered from online texts. For each category, a rule base is built and a corresponding rule-processing algorithm is designed, so that sensitive information is proofread automatically. An SVM model then performs sentiment analysis on the rule-processing results, which greatly reduces false positives. Test results show a recall of 89.98% and an accuracy of 98.31%, with a throughput of more than 100,000 characters per second, which addresses key difficulties in practical engineering applications.
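The two-stage idea in the abstract (rule matching first, then an SVM that filters the rule hits) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the placeholder term `TERMX`, the `RULES` table, the tiny training set and its labels, the whitespace-token features, and the sub-gradient training loop for a linear SVM. For real Chinese text, character n-grams or a word segmenter would likely replace the whitespace tokenizer.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Hit:
    category: str   # rule category that fired
    term: str       # matched sensitive term
    context: str    # sentence containing the term

# Hypothetical rule base: category -> terms (placeholders, not the paper's rules).
RULES = {"prohibited": ["TERMX"]}

# Hypothetical labelled contexts: +1 = keep the alert, -1 = likely false positive.
TRAIN = [
    ("we support TERMX fully", 1),
    ("we reject TERMX firmly", -1),
    ("they endorse TERMX", 1),
    ("they condemn TERMX", -1),
]

def rule_stage(text):
    """Stage 1: flag every sentence that contains a term from the rule base."""
    hits = []
    for sentence in re.split(r"[.!?。！？]\s*", text):
        for category, terms in RULES.items():
            for term in terms:
                if term in sentence:
                    hits.append(Hit(category, term, sentence.strip()))
    return hits

def featurize(context):
    """Bag of lowercase whitespace tokens (an assumption made for this demo)."""
    return Counter(context.lower().split())

def score(model, x):
    w, b = model
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b

def train_linear_svm(data, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM trained by sub-gradient descent on the L2-regularised hinge loss."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for context, y in data:
            x = featurize(context)
            margin = y * score((w, b), x)
            for f in w:                      # shrink step from the L2 penalty
                w[f] *= 1.0 - lr * lam
            if margin < 1:                   # hinge loss is active for this sample
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
                b += lr * y
    return w, b

def predict(model, x):
    return 1 if score(model, x) >= 0 else -1

def proofread(text, model):
    """Stage 2: keep only the rule hits that the SVM does not reject."""
    return [h for h in rule_stage(text) if predict(model, featurize(h.context)) > 0]

DEMO_TEXT = "We support TERMX fully. They condemn TERMX. Nothing to see here."

if __name__ == "__main__":
    model = train_linear_svm(TRAIN)
    for hit in proofread(DEMO_TEXT, model):
        print(hit.category, hit.term, "|", hit.context)
```

In this sketch the rule stage alone flags both sentences that mention the placeholder term, and the classifier drops the one whose context resembles the negatively labelled training examples. Which sentiment polarity should suppress an alert in practice is a design decision that the abstract does not specify.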

Key words

content security / sensitive information / automatic proofreading / rule processing / SVM / sentiment analysis

Cite this article

GONG Yong-gang, WANG Xin-yu, LI Yu-ying, WANG Yun-qi. Research on Automatic Proofreading Method of Chinese Text Sensitive Information[J]. Computer & Telecommunication, 2018, 1(12): 66-69.
CLC number: TP309; TP391.1
