摘要
本文意在提高文本分类的准确度和速度。利用tf算法对特征项进行初步赋予权值,再使用屏蔽词对特殊非实意词进行屏蔽。本文独创概率论分布法,使用L-E算子进行加权,使得特殊位置与分布广泛的特征项,呈指数形式加权,较优结果能更快收敛。本文利用遗传算法,采用交叉算子和变异算子,采用适宜的目标函数,加快了检索速度,并有更大概率得到最优结果。采用混合算法,可以排除同义词和非特征项的干扰。
Abstract
This article aims to improve the accuracy and speed of text classification. The tf algorithm is first used to assign initial weights to the feature terms, and a stop-word list is then used to filter out specific non-content words. An original probability-distribution method is proposed that uses the L-E operator for weighting, so that feature terms in special positions or with a wide distribution are weighted exponentially and the better results converge faster. A genetic algorithm with crossover and mutation operators and a suitable objective function speeds up retrieval and raises the probability of reaching the optimal result. The resulting hybrid algorithm can eliminate interference from synonyms and non-feature terms.
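The first step described in the abstract, initial tf weighting followed by stop-word filtering, can be illustrated with a minimal sketch. It assumes that tf means relative term frequency within a single document; the stop-word list, the function name `tf_weights`, and the sample sentence are hypothetical and are not taken from the paper. The L-E operator and the genetic-algorithm stage are not shown, since the abstract does not define them.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tf_weights(tokens, stop_words=STOP_WORDS):
    """Return relative term frequencies after dropping stop words.

    Assumption: tf is the count of a term divided by the number of
    remaining (non-stop-word) tokens in the document.
    """
    kept = [t.lower() for t in tokens if t.lower() not in stop_words]
    if not kept:
        return {}
    counts = Counter(kept)
    total = len(kept)
    return {term: n / total for term, n in counts.items()}


# Illustrative input only.
doc = ("the genetic algorithm improves the speed of text "
       "classification and text retrieval").split()
weights = tf_weights(doc)
```

In this sketch the stop words are removed before the frequencies are normalised, so the remaining weights sum to 1. Whether the paper filters before or after weighting is not stated in the abstract.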
关键词
遗传算法 /
文本分类 /
特征项
Key words
genetic algorithm /
text classification /
term
宋倩, 王东明.
基于遗传算法及概率论的文本分类算法[J]. 电脑与电信. 2015, 1(3): 49-52
Song Qian, Wang Dongming.
Text Classification Algorithm Based on Genetic Algorithm and Probability Theory[J]. Computer & Telecommunication. 2015, 1(3): 49-52
Funding
Daxia Fund Project (大夏基金项目), project No. 2013DX-241.