Research on Personal Microblog Clustering Based on CBOW Model
SONG Tian-shu, LI Jiang-yu, ZHANG Qin-zhe
Inner Mongolia University of Science and Technology
Abstract Personal microblogs are a popular social tool, but the large number of posts can make browsing confusing for users. This
paper clusters microblogs with high semantic similarity to make browsing easier. The main work of this paper is as
follows: 1. Use jieba segmentation in Python to segment personal microblog text and remove stopwords; 2. Train word vectors
on the segmented dataset using the CBOW model; 3. Represent each personal microblog as a sentence vector built from its word vectors;
4. Treat the sentence vectors as points in space and use a modified Manhattan distance algorithm for sentences
to calculate the distances, i.e. the similarities, between individual microblogs; 5. Cluster the microblogs with a modified CLARANS algorithm. Experiments
show that the proposed method clearly improves on traditional clustering algorithms, such as
partitioning, hierarchical, and density-based methods.
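
Steps 3 to 5 of the pipeline can be illustrated with a minimal sketch. The abstract does not describe how the Manhattan distance or CLARANS were modified, so the code below uses the standard Manhattan distance and a plain CLARANS-style randomized medoid search. It also assumes that a sentence vector is the average of its word vectors, which the abstract does not confirm. The `WORD_VECTORS` table holds made-up toy values standing in for CBOW output, and the names `sentence_vector`, `manhattan`, `total_cost`, and `clarans` are illustrative only. Segmentation with jieba and CBOW training are not shown.

```python
import random

# Toy 2-D word vectors standing in for CBOW output (hypothetical values).
WORD_VECTORS = {
    "football": [0.9, 0.1],
    "match": [0.8, 0.2],
    "goal": [1.0, 0.0],
    "noodles": [0.1, 0.9],
    "dinner": [0.2, 0.8],
    "tasty": [0.0, 1.0],
}


def sentence_vector(tokens, vectors=WORD_VECTORS):
    """Average the vectors of known tokens (an assumed composition rule)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def manhattan(a, b):
    """Standard (unmodified) Manhattan distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=4, max_neighbor=20, seed=0):
    """Plain CLARANS-style search: random restarts plus random medoid swaps.

    Requires len(points) > k so that a non-medoid is always available.
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(num_local):
        current = rng.sample(range(n), k)
        cost = total_cost(points, current)
        tries = 0
        while tries < max_neighbor:
            # A neighbour differs from the current solution in one medoid.
            i = rng.randrange(k)
            candidate = rng.choice([p for p in range(n) if p not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < cost:
                current, cost, tries = neighbor, neighbor_cost, 0
            else:
                tries += 1
        if cost < best_cost:
            best, best_cost = current, cost
    # Assign every point to the index of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: manhattan(p, points[best[j]]))
        for p in points
    ]
    return best, labels


# Usage: four already-segmented toy posts, clustered into two groups.
posts = [
    ["football", "match"],
    ["goal", "football"],
    ["noodles", "dinner"],
    ["tasty", "dinner"],
]
vectors = [sentence_vector(p) for p in posts]
medoids, labels = clarans(vectors, k=2)
print(labels)
```

In a real run the toy table would be replaced by vectors trained on the segmented microblog corpus, and the distance and search routines by the paper's modified versions.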