Abstract: To address the problem of automatically proofreading sensitive information in large volumes of text, this paper presents a method for automatically correcting sensitive information that combines rules with a support vector machine (SVM). Sensitive information is classified into categories based on the prohibited words and cautious words in News and Information Reporting of Xinhua News Agency, relevant central documents, and online texts. For each category, a rule library is built and a corresponding rule-processing algorithm is designed, so that sensitive information can also be corrected automatically. An SVM model then performs sentiment analysis on the rule-processing results, which greatly reduces false positives. Test results show that the recall rate and accuracy rate of the method are 89.98% and 98.31%, respectively, and that it can process more than 100,000 words of text per second, which addresses key difficulties in practical engineering applications.
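The two-stage flow summarized in the abstract (category rules first, then a classifier that discards false positives) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Rule` and `Hit` types, the placeholder terms in `RULES`, and the `keep_hit` callback are all hypothetical, and `keep_hit` merely stands in for the SVM-based sentiment check, which is not implemented here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    category: str                # hypothetical label, e.g. "prohibited" or "cautious"
    pattern: str                 # literal term to search for
    replacement: Optional[str]   # suggested correction; None means "flag only"


@dataclass(frozen=True)
class Hit:
    rule: Rule
    start: int
    end: int


# Placeholder rule library (assumption: real rules would come from the
# categorized prohibited/cautious word lists described in the abstract).
RULES = [
    Rule("prohibited", "foo-term", "bar-term"),
    Rule("cautious", "baz-term", None),
]


def find_hits(text: str, rules: List[Rule]) -> List[Hit]:
    """Stage 1: collect every literal match of every rule, ordered by position."""
    hits = []
    for rule in rules:
        for m in re.finditer(re.escape(rule.pattern), text):
            hits.append(Hit(rule, m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)


def proofread(
    text: str,
    rules: List[Rule],
    keep_hit: Callable[[str, Hit], bool] = lambda text, hit: True,
) -> Tuple[str, List[Hit]]:
    """Stage 2: drop hits rejected by keep_hit, then apply the corrections.

    keep_hit is a stand-in for the classifier stage; by default it keeps
    every hit.
    """
    applied, out, pos = [], [], 0
    for hit in find_hits(text, rules):
        if hit.start < pos or not keep_hit(text, hit):
            continue  # skip overlapping matches and rejected (false-positive) hits
        out.append(text[pos:hit.start])
        original = text[hit.start:hit.end]
        out.append(hit.rule.replacement if hit.rule.replacement is not None else original)
        applied.append(hit)
        pos = hit.end
    out.append(text[pos:])
    return "".join(out), applied
```

In a fuller version, `keep_hit` would wrap a trained classifier that looks at the context around each hit; keeping it as a plain callback lets the rule stage be tested on its own.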
GONG Yong-gang, WANG Xin-yu, LI Yu-ying, WANG Yun-qi. Research on Automatic Proofreading Method of Chinese Text Sensitive Information[J]. Computer & Telecommunication, 2018, 1(12): 66-69.