Research on Part-of-Speech Tagging of Agglutinative Languages Using a Big-Data-Based Structured SVM

LIU Wanwan (刘婉婉)

Computer & Telecommunication, 2021, Vol. 1, Issue (1): 23-26.




Abstract

Although the traditional conditional random fields (CRF) method can accommodate context of arbitrary length and allows flexible feature design, its training cost and model complexity are high. These drawbacks are especially pronounced in sequence tagging tasks, where the joint probability distribution over the entire tag sequence must be computed. This paper therefore adopts a structured support vector machine (SSVM) method for part-of-speech tagging, drawing on the word-formation features of agglutinative languages and the contextual information in the corpus. Compared with a traditional SVM, the model adds extra constraints so that the feature functions can fit the distribution, which allows it to handle part-of-speech tagging across different domains. Experiments on agglutinative-language part-of-speech tagging show that the SSVM method achieves somewhat higher accuracy than traditional part-of-speech tagging algorithms.
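To make the SSVM idea concrete, the following is a minimal sketch of structured-SVM training for sequence tagging. It is not the paper's implementation. It assumes subgradient steps on the structured hinge loss with Hamming-loss-augmented Viterbi decoding, so no joint probability over tag sequences is ever normalised. The suffix features, the function names (`emission_features`, `viterbi`, `train`), the two-tag set, and the romanised Turkish word forms are all hypothetical illustrations, not data or features taken from the paper.

```python
# Minimal structured-SVM tagger sketch (assumptions: suffix features, Hamming loss,
# plain subgradient updates; none of this is taken from the paper itself).

def emission_features(words, i):
    """Hypothetical features: the word form plus its last 1-3 characters,
    a crude stand-in for the suffixes of an agglutinative language."""
    w = words[i]
    feats = ["w=" + w]
    for k in (1, 2, 3):
        if len(w) > k:
            feats.append("suf%d=%s" % (k, w[-k:]))
    return feats

def score_local(weights, words, i, prev_tag, tag):
    """Score of tagging position i with `tag`, given the previous tag."""
    s = weights.get(("T", prev_tag, tag), 0.0)
    for f in emission_features(words, i):
        s += weights.get(("E", f, tag), 0.0)
    return s

def viterbi(weights, words, tags, gold=None):
    """Best tag sequence under the model. If `gold` is given, a Hamming cost of 1
    per wrong tag is added (loss-augmented decoding, used only during training)."""
    n = len(words)
    best = [{} for _ in range(n)]
    back = [{} for _ in range(n)]
    for i in range(n):
        for t in tags:
            cost = 1.0 if gold is not None and t != gold[i] else 0.0
            if i == 0:
                best[i][t] = score_local(weights, words, i, "<s>", t) + cost
                back[i][t] = None
            else:
                p = max(tags, key=lambda q: best[i - 1][q]
                        + score_local(weights, words, i, q, t))
                best[i][t] = (best[i - 1][p]
                              + score_local(weights, words, i, p, t) + cost)
                back[i][t] = p
    t = max(tags, key=lambda q: best[n - 1][q])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]

def update(weights, words, tag_seq, delta):
    """Add `delta` to every feature that fires for (words, tag_seq)."""
    prev = "<s>"
    for i, t in enumerate(tag_seq):
        key = ("T", prev, t)
        weights[key] = weights.get(key, 0.0) + delta
        for f in emission_features(words, i):
            key = ("E", f, t)
            weights[key] = weights.get(key, 0.0) + delta
        prev = t

def train(data, tags, epochs=5, lr=1.0, reg=0.0):
    """Subgradient descent on the structured hinge loss.
    `reg` is an optional L2 coefficient; it is off by default in this toy run."""
    weights = {}
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(weights, words, tags, gold=gold)  # most violated labelling
            if reg > 0:
                for k in weights:
                    weights[k] *= (1.0 - lr * reg)
            if pred != gold:
                update(weights, words, gold, +lr)
                update(weights, words, pred, -lr)
    return weights

# Toy, hand-made examples for illustration only.
TAGS = ["NOUN", "VERB"]
DATA = [
    (["adam", "geldi"], ["NOUN", "VERB"]),
    (["adam", "kitaplar", "okudu"], ["NOUN", "NOUN", "VERB"]),
    (["arabalar", "gitti"], ["NOUN", "VERB"]),
]

weights = train(DATA, TAGS)
# "okullar" is unseen, but shares the suffix features of "kitaplar" and "arabalar".
print(viterbi(weights, ["okullar", "geldi"], TAGS))  # → ['NOUN', 'VERB']
```

The design point this sketch illustrates is that training only needs the single highest-scoring (loss-augmented) tag sequence per sentence, found by Viterbi decoding, rather than a sum over all sequences as in CRF training. A real system would use a much richer feature set and a tuned regularisation constant.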