Research on Personal Microblog Clustering Based on CBOW Model

SONG Tian-shu, LI Jiang-yu, ZHANG Qin-zhe

Computer & Telecommunication ›› 2018, Vol. 1 ›› Issue (4) : 69-72.


Abstract

Personal microblogs are a popular social tool, but the sheer volume of posts can be confusing to users. This paper clusters microblogs with high semantic similarity to make browsing easier. The main work is as follows: (1) use the jieba segmentation library in Python to segment personal microblog text into words and remove stopwords; (2) train word vectors on the segmented dataset with the CBOW model; (3) represent each personal microblog as a sentence vector built from its word vectors; (4) treat the sentence vectors as points in space and compute the distances between them, i.e., the similarities between individual microblogs, using a modified Manhattan distance algorithm; (5) cluster the microblogs with a modified CLARANS algorithm. Experiments show that the proposed method clearly improves on traditional clustering algorithms such as partitioning, hierarchical, and density-based methods.
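Steps 3 to 5 of the pipeline can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the word vectors are a small hand-made table (hypothetical stand-ins for CBOW-trained vectors over jieba-segmented text), a sentence vector is the plain average of its word vectors, the distance is the unmodified Manhattan distance, and the clustering is standard CLARANS. The paper's own modifications to the distance and to CLARANS are not described here, so they are not reproduced.

```python
import random

# Hypothetical toy word vectors; in the paper these would come from a CBOW
# model trained on jieba-segmented microblog text.
WORD_VECTORS = {
    "football": [0.90, 0.10, 0.00],
    "match":    [0.80, 0.20, 0.10],
    "goal":     [0.85, 0.15, 0.05],
    "noodles":  [0.10, 0.90, 0.10],
    "dinner":   [0.00, 0.80, 0.20],
    "tasty":    [0.10, 0.85, 0.15],
}


def sentence_vector(tokens, word_vectors):
    """Average the vectors of known tokens (an assumed composition rule)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def manhattan(a, b):
    """Plain (unmodified) Manhattan distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=4, max_neighbor=20, seed=0):
    """Standard CLARANS: randomized search over single medoid swaps."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(num_local):
        current = rng.sample(range(len(points)), k)
        cost = total_cost(points, current)
        tries = 0
        while tries < max_neighbor:
            # Neighbor: replace one random medoid with one random non-medoid.
            non_medoids = [i for i in range(len(points)) if i not in current]
            candidate = list(current)
            candidate[rng.randrange(k)] = rng.choice(non_medoids)
            cand_cost = total_cost(points, candidate)
            if cand_cost < cost:
                current, cost, tries = candidate, cand_cost, 0
            else:
                tries += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return best


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: manhattan(p, points[medoids[j]]))
        for p in points
    ]


# Already-segmented toy posts (segmentation and stopword removal assumed done).
posts = [
    ["football", "match"],
    ["goal", "football"],
    ["match", "goal"],
    ["noodles", "dinner"],
    ["tasty", "noodles"],
    ["dinner", "tasty"],
]
vectors = [sentence_vector(p, WORD_VECTORS) for p in posts]
medoids = clarans(vectors, k=2)
labels = assign(vectors, medoids)
```

On this toy data the three sports posts and the three food posts fall into separate clusters. In practice the word vectors would be trained with a word2vec implementation in CBOW mode rather than written by hand, and `k`, `num_local`, and `max_neighbor` would need tuning for the dataset.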

Key words

individual microblog / semantic / clustering / machine learning

Cite this article

SONG Tian-shu, LI Jiang-yu, ZHANG Qin-zhe. Research on Personal Microblog Clustering Based on CBOW Model[J]. Computer & Telecommunication. 2018, 1(4): 69-72
