Research on Automatic Proofreading Method of Chinese Text Sensitive Information


Abstract: To address the automatic proofreading of sensitive information in massive volumes of text, this paper proposes a method that combines rules with an SVM (support vector machine). Taking as its basis the key sensitive information given in 《新华社新闻信息报道中的禁用词和慎用词》 (Prohibited and Cautious Words in Xinhua News Agency News and Information Reporting), relevant central documents, and online texts, the method classifies sensitive information into categories, builds a rule library for each category, and designs a corresponding automatic rule-processing algorithm so that sensitive information is proofread automatically. An SVM model then performs sentiment analysis on the rule-processing results, which greatly reduces false positives. Test results show a recall of 89.98% and an accuracy of 98.31%, with a throughput of more than 100,000 characters per second, addressing key difficulties in practical engineering applications.

Authors: GONG Yong-gang, WANG Xin-yu, LI Yu-ying, WANG Yun-qi

Keywords: content security, sensitive information, automatic proofreading, rule processing, SVM, sentiment analysis
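The two-stage idea in the abstract (rule matching first, then a classifier that filters the rule hits) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `RULES` entries are placeholder patterns rather than rules from the paper, `is_problematic` is a stand-in for the trained SVM sentiment model, and the context window size is arbitrary. The helper `precision_recall` assumes the reported "accuracy" is computed as precision, which the abstract does not state.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical rule library: category -> regex patterns.
# Placeholder entries only; the paper's actual rules are not given in the abstract.
RULES: Dict[str, List[str]] = {
    "prohibited": [r"forbidden_term"],
    "cautious": [r"risky_term"],
}


@dataclass
class Hit:
    category: str
    start: int
    end: int
    text: str


def rule_candidates(doc: str, rules: Dict[str, List[str]] = RULES) -> List[Hit]:
    """Stage 1: flag every span that matches a rule in any category."""
    hits = []
    for category, patterns in rules.items():
        for pattern in patterns:
            for m in re.finditer(pattern, doc):
                hits.append(Hit(category, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h.start)


def proofread(
    doc: str,
    is_problematic: Callable[[str], bool],
    rules: Dict[str, List[str]] = RULES,
    window: int = 10,
) -> List[Hit]:
    """Stage 2: keep a rule hit only if the classifier judges its context problematic.

    `is_problematic` stands in for the SVM sentiment model (an assumption).
    """
    kept = []
    for hit in rule_candidates(doc, rules):
        context = doc[max(0, hit.start - window): hit.end + window]
        if is_problematic(context):
            kept.append(hit)
    return kept


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "This praises forbidden_term warmly. That attacks risky_term badly."
    # Toy stand-in classifier: treats a context containing "attacks" as problematic.
    for hit in proofread(text, lambda ctx: "attacks" in ctx):
        print(hit.category, hit.text)
```

In a real system the stand-in callable would presumably wrap a trained model, for example scikit-learn's `sklearn.svm.LinearSVC` over text features; that choice is an assumption, since the abstract names no library or feature set.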

Issue date: 2018-12-10    Publication date: 2019-07-30

CLC number (ZTFLH): TP309; TP391.1

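About the author: GONG Yong-gang (born 1973), male, from Luoyang, Henan; associate professor; research interest: natural language processing.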

Cite this article:

GONG Yong-gang, WANG Xin-yu, LI Yu-ying, WANG Yun-qi. Research on Automatic Proofreading Method of Chinese Text Sensitive Information[J]. Computer & Telecommunication (电脑与电信), 2018, 1(12): 66-69.

Link to this article:

http://www.computertelecom.com.cn/CN/ or http://www.computertelecom.com.cn/CN/Y2018/V1/I12/66