To reduce fabricated information and repetition in summaries generated by the T5 PEGASUS model, a text summarization model based on T5 PEGASUS and DeepKE, named T5 PEGASUS-DK, is proposed. The model integrates the T5 PEGASUS model with the DeepKE framework. First, the Pkuseg word segmentation method is used to improve segmentation quality. Then, the DeepKE framework is used to extract triples from the text. Finally, the set of word vectors of the triples is concatenated with the representation vector of the text. By establishing a mapping between the text and its triples, the model can extract factual knowledge and thus select information that is more consistent with the original content as the summary. Experimental results show that the T5 PEGASUS-DK model achieves the highest values on all ROUGE metrics, and the generated summaries are more factual, coherent, and consistent with the original content.
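The final fusion step, in which the triple word vectors are concatenated with the text representation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical hash-based stand-in for a learned embedding, the token list and the triple are hand-written placeholders for what Pkuseg and DeepKE would produce, and concatenation along the sequence axis is only one plausible reading of the description. It is not the authors' implementation.

```python
import hashlib
from typing import List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)

DIM = 4  # hypothetical embedding width, chosen only to keep the example small


def toy_embed(token: str, dim: int = DIM) -> Vector:
    """Deterministic stand-in for a learned embedding lookup (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def embed_triples(triples: List[Triple]) -> List[Vector]:
    """Embed the head, relation, and tail of every triple, in order."""
    return [toy_embed(part) for triple in triples for part in triple]


def concat_text_and_triples(
    text_vectors: List[Vector], triple_vectors: List[Vector]
) -> List[Vector]:
    """Append the triple vectors after the text vectors (sequence-axis concatenation)."""
    return text_vectors + triple_vectors


# Hand-written placeholders for segmenter output and extracted triples.
tokens = ["Curie", "discovered", "polonium"]
triples = [("Curie", "discovered", "polonium")]

text_vectors = [toy_embed(token) for token in tokens]
triple_vectors = embed_triples(triples)
fused = concat_text_and_triples(text_vectors, triple_vectors)

print(len(fused), len(fused[0]))
```

In this reading, the fused sequence keeps the original text positions intact and exposes the extracted facts as extra positions that a downstream encoder could attend to. Other fusion schemes, such as concatenating along the feature axis, would be equally compatible with the description.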