Text Classification Algorithm Based on Genetic Algorithm and Probability Theory

Song Qian; Wang Dongming

Computer & Telecommunication, 2015, Vol. 1, Issue 3, pp. 49-52.

Abstract

This paper aims to improve the accuracy and speed of text classification. The tf algorithm assigns initial weights to the feature terms, and a stop-word list then filters out function words that carry no real meaning. The paper introduces an original probability-distribution method that uses the L-E operator for weighting, so that feature terms in special positions or widely distributed across the text are weighted exponentially and better results converge faster. A genetic algorithm with crossover and mutation operators and a suitable objective function speeds up retrieval and makes it more likely that the optimal result is found. The resulting hybrid algorithm can eliminate interference from synonyms and non-feature terms.
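The abstract names the building blocks but gives no implementation details, so the following is only a minimal sketch of how tf weighting with stop-word filtering and the two genetic operators might look. The stop-word list, the bit-list chromosome encoding, single-point crossover, bit-flip mutation, the truncation-style selection in `evolve`, and all function names are assumptions made for illustration, not the authors' method. The L-E operator is left out because the abstract does not define it.

```python
import random
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def tf_weights(tokens, stop_words=STOP_WORDS):
    """Return term-frequency weights for the tokens that are not stop words."""
    kept = [t for t in tokens if t not in stop_words]
    if not kept:
        return {}
    counts = Counter(kept)
    total = len(kept)
    return {term: n / total for term, n in counts.items()}


def crossover(parent_a, parent_b, rng):
    """Single-point crossover on two equal-length bit lists (length >= 2)."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def evolve(fitness, n_bits, pop_size=20, generations=30, rate=0.05, seed=0):
    """Tiny generational GA: keep the fitter half, refill with mutated children."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = parents + children[: pop_size - len(parents)]
    return max(population, key=fitness)


if __name__ == "__main__":
    weights = tf_weights("the cat and the cat sat".split())
    print(weights)  # weights for "cat" and "sat" only; stop words are dropped
    # Toy objective (an assumption): prefer chromosomes with more bits set.
    print(evolve(sum, n_bits=8))
```

In a classifier of the kind the abstract describes, each bit could mark whether a feature term is kept, and the objective function would score classification quality rather than the toy `sum` used here.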

Cite as

Song Qian, Wang Dongming. Text Classification Algorithm Based on Genetic Algorithm and Probability Theory[J]. Computer & Telecommunication, 2015, 1(3): 49-52.

CLC number: TP301.6