Funded Project

Research on Keyword Extraction Technology Based on Remote Learning

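  • School of Artificial Intelligence; Institute of Artificial Intelligence, Jianghan University; Dongfeng Motor Finance Co., Ltd.
Cao Conghui (born 1992), female, from Datong, Shanxi; Ph.D., lecturer; research interests: signal processing and data mining.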

Online publication date: 2021-09-02

Funding

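Guiding Project of the Scientific Research Program of the Hubei Provincial Department of Education, project No. B2020224; 2019 Open Project of the Hubei Provincial Key Discipline of Management Science and Engineering at Jianghan University, project No. ZDXK2019YB05; Jianghan University High-Level Talent Research Start-up Fund, project No. 2019032.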

Research on Extraction Technology Based on Remote Learning

  • School of Artificial Intelligence, Jianghan University; Dongfeng Motor Finance Co., Ltd.

Published online: 2021-09-02

Abstract

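With the development of Internet technology, the explosive growth in the volume of text has made text data harder to process, and traditional text clustering and keyword extraction techniques cannot adequately meet the need to screen big data precisely. To address this, a latent semantic model based on the LDA algorithm is used to cluster the texts, yielding both the clustering result and the topic words extracted by LDA. The FP-growth algorithm is then applied to the LDA output to mine the texts and obtain a Chinese keyword set. Drawing on the idea of online knowledge bases, a Chinese comparison algorithm built on Baidu Baike is proposed to screen the Chinese keyword set, filtering out many noise words. Experiments show that the method performs text clustering and keyword extraction well on a given Chinese corpus. In particular, after the Baidu Baike-based remote-learning screening step is added, the accuracy of the system improves substantially.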

Cite this article

CAO Conghui, LAN Qiang, HOU Qun, QI Weimin. Research on Keyword Extraction Technology Based on Remote Learning[J]. Computer & Telecommunication, 2021, 1(8): 1-5. DOI: 1008-6609(2021)08-0001-05

Abstract

With the development of Internet technology, the explosive growth in the volume of text has made text data harder to process. Traditional text clustering and keyword extraction techniques cannot adequately meet the need for precise screening of large data sets. This paper combines text clustering and keyword extraction. Text clustering based on the LDA algorithm is proposed, which yields the clustering results and the topic words extracted by LDA. The FP-growth algorithm is then used to analyze the output of the LDA algorithm and mine the text. Following the idea of using an online knowledge base, a Chinese comparison algorithm built on Baidu Baike is proposed to screen the Chinese keyword set and filter out many noise words. Experimental results show that, compared with existing methods, the method clusters text and extracts keywords well for a given Chinese corpus. After the Baidu Baike-based word screening step is added, the accuracy of the system improves considerably.
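
The last two stages of the pipeline described above (mining frequent word sets from the LDA topic words, then screening them against a knowledge base) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the LDA step has already produced a list of topic words per document, it uses a brute-force co-occurrence count as a stand-in for FP-growth (which finds the same frequent itemsets but uses an FP-tree), and it replaces the Baidu Baike lookup with a local set. The names `frequent_itemsets`, `filter_by_knowledge_base`, `topic_words_per_doc`, and `known_terms` are hypothetical, as is the toy data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return word sets that occur in at least `min_support` transactions.

    Brute-force stand-in for FP-growth: it enumerates every itemset up to
    `max_size` instead of building an FP-tree.
    """
    counts = Counter()
    for words in transactions:
        unique = sorted(set(words))  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(unique, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def filter_by_knowledge_base(itemsets, known_terms):
    """Keep only words that have an entry in the (hypothetical) knowledge base."""
    candidates = {word for itemset in itemsets for word in itemset}
    return sorted(word for word in candidates if word in known_terms)


# Assumed LDA output: top topic words per document (toy data).
# 算法 = algorithm, 聚类 = clustering, 文本 = text, 关键词 = keyword,
# 的确 = "indeed" (a noise word with no encyclopedia entry in this toy setup).
topic_words_per_doc = [
    ["算法", "聚类", "文本", "的确"],
    ["算法", "聚类", "关键词"],
    ["算法", "文本", "的确"],
]

# Assumed stand-in for "this word has a Baidu Baike entry".
known_terms = {"算法", "聚类", "文本", "关键词"}

frequent = frequent_itemsets(topic_words_per_doc, min_support=2)
keywords = filter_by_knowledge_base(frequent, known_terms)
print(keywords)
```

In this toy run the noise word 的确 is frequent but is dropped by the knowledge-base check, while 关键词 is in the knowledge base but is dropped earlier because it appears in only one document. A real system would replace the local set with an actual encyclopedia lookup and the brute-force count with a proper FP-growth implementation.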
