With the development of Internet technology, the explosive growth in the volume of text has made text data harder to process. Traditional text clustering and keyword extraction techniques cannot adequately meet the need for precise screening of large-scale data. This paper combines text clustering with keyword extraction. It proposes a text clustering method based on the LDA (Latent Dirichlet Allocation) algorithm, which yields both the clustering results and the topic terms extracted by LDA. The FP-growth algorithm is then applied to the LDA output to mine the text further. Following the idea of drawing on an online knowledge base, the paper also uses Baidu Encyclopedia to propose a Chinese comparison algorithm that selects the Chinese keyword set and filters out a large number of noise words. Experimental results on a given Chinese corpus show that, compared with existing methods, the proposed method clusters text and extracts keywords effectively, and that adding word selection based on Baidu Encyclopedia substantially improves the accuracy of the system.
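The post-LDA stages described above can be illustrated with a minimal Python sketch. It assumes LDA has already produced a list of topic terms for each document, and it replaces the paper's Baidu Encyclopedia comparison algorithm, whose details are not given here, with a plain set-membership test against a hypothetical `kb_terms` vocabulary. The names `fp_growth`, `mine_keywords`, and `kb_terms` are illustrative, not the paper's code. The filter runs before mining in this sketch; the abstract does not state the order.

```python
from collections import defaultdict


class _Node:
    """One node of an FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def _build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return frequent items and header table."""
    counts = defaultdict(int)
    for items, cnt in transactions:
        for it in set(items):
            counts[it] += cnt
    frequent = {it: c for it, c in counts.items() if c >= min_support}
    root = _Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, cnt in transactions:
        # Keep only frequent items, ordered by descending support (ties broken by name).
        path = sorted((it for it in set(items) if it in frequent),
                      key=lambda it: (-frequent[it], it))
        node = root
        for it in path:
            child = node.children.get(it)
            if child is None:
                child = _Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += cnt
            node = child
    return frequent, header


def fp_growth(transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    result = {}
    frequent, header = _build_tree(transactions, min_support)
    for item, support in frequent.items():
        itemset = (item,) + suffix
        result[frozenset(itemset)] = support
        # Conditional pattern base: the prefix path above each node holding `item`.
        cond = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond.append((path, node.count))
        result.update(fp_growth(cond, min_support, itemset))
    return result


def mine_keywords(doc_terms, kb_terms, min_support):
    """Hypothetical pipeline: knowledge-base filter, then FP-growth over topic terms."""
    # Stand-in for the Baidu Encyclopedia comparison: drop terms not in the vocabulary.
    cleaned = [[t for t in terms if t in kb_terms] for terms in doc_terms]
    return fp_growth([(terms, 1) for terms in cleaned], min_support)


if __name__ == "__main__":
    # Made-up LDA topic terms per document; 我们 ("we") plays the noise word.
    doc_terms = [
        ["模型", "训练", "我们"],
        ["模型", "训练", "数据"],
        ["模型", "数据"],
        ["训练", "数据", "我们"],
    ]
    kb_terms = {"模型", "训练", "数据"}  # hypothetical knowledge-base vocabulary
    for itemset, support in sorted(mine_keywords(doc_terms, kb_terms, 2).items(),
                                   key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), support)
```

With `min_support=2`, the example corpus yields three single terms (support 3) and three term pairs (support 2). The knowledge-base filter removes the noise word 我们, and the three-term set is dropped because it occurs in only one document.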