Abstract: Personal microblogs are a popular social tool, but the large number of posts makes them confusing for users to browse. This
paper clusters microblogs with high semantic similarity to make browsing easier. The main work of this paper is as
follows: 1. Use the jieba library in Python to segment personal microblog text into words and remove stopwords; 2. Train
word vectors on the segmented dataset with the CBOW model; 3. Represent each personal microblog as a sentence vector built from its word vectors;
4. Treat the sentence vectors as points in space and use a modified Manhattan distance algorithm
to compute the distance, i.e. the similarity, between individual microblogs; 5. Cluster the microblogs with a modified CLARANS algorithm. Experiments
show that the proposed method clearly improves on traditional clustering algorithms such as
partitioning, hierarchical, and density-based methods.
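Steps 3 to 5 above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the posts are already segmented (the jieba step is skipped), the toy table `WORD_VECTORS` stands in for vectors trained with CBOW, sentence vectors are formed by simple averaging, and the distance and clustering routines are the plain Manhattan distance and the standard CLARANS search rather than the authors' modified versions. All function names are hypothetical.

```python
import random

# Hypothetical toy word vectors; real ones would come from a trained CBOW model.
WORD_VECTORS = {
    "weather": [0.9, 0.1], "sunny": [0.8, 0.2], "rain": [0.7, 0.3],
    "match": [0.1, 0.9], "goal": [0.2, 0.8], "team": [0.0, 1.0],
}

def sentence_vector(tokens, vectors=WORD_VECTORS):
    """Average the vectors of known tokens (an assumed composition rule)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def manhattan(a, b):
    """Plain Manhattan (L1) distance, not the paper's modified variant."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)

def clarans(points, k, num_local=4, max_neighbor=20, seed=0):
    """Standard CLARANS: randomized local search over sets of k medoids."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(num_local):
        current = rng.sample(range(len(points)), k)
        cost = total_cost(points, current)
        tries = 0
        while tries < max_neighbor:
            # A neighbor differs from the current set in exactly one medoid.
            non_medoids = [j for j in range(len(points)) if j not in current]
            neighbor = current[:]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            new_cost = total_cost(points, neighbor)
            if new_cost < cost:
                current, cost, tries = neighbor, new_cost, 0
            else:
                tries += 1
        if cost < best_cost:
            best, best_cost = current, cost
    # Label each point with the index of its nearest medoid.
    labels = [min(range(k), key=lambda c: manhattan(p, points[best[c]]))
              for p in points]
    return best, labels

# Four pre-segmented toy posts: two about weather, two about sport.
posts = [["weather", "sunny"], ["rain", "weather"],
         ["match", "goal"], ["team", "goal"]]
points = [sentence_vector(p) for p in posts]
medoids, labels = clarans(points, k=2)
print(labels)
```

On this toy input the two weather posts receive one label and the two sport posts the other. With real data, the averaging step and the two parameters `num_local` and `max_neighbor` are the places where a modified method would most plausibly differ from this sketch.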
宋添树, 李江宇, 张沁哲. 基于CBOW模型的个人微博聚类研究[J]. 电脑与电信, .
SONG Tian-shu, LI Jiang-yu, ZHANG Qin-zhe. Research on Personal Microblog Clustering Based on CBOW Model. Computer & Telecommunication, 2018, 1(4): 69-72.