Abstract: In the era of big data, it is widely accepted that business departments can unlock the value of data by
building on their accumulated business data. However, because different business systems lack unified data standards,
problems such as disorganized metadata, data silos, and low-quality data keep emerging, hindering the effective use of
data and making data governance necessary. Data lineage analysis is one of the key tasks of metadata management and is
of great significance for data traceability and data governance. However, traditional methods for constructing data
lineage often suffer from high computational complexity, poor accuracy, and high execution costs. To overcome these
issues, a data lineage construction method based on data table similarity calculation is proposed: three elements of a
data table (table name, table structure, and data fields) are represented as text features, TF-IDF is used to calculate
the similarity between data tables, and an improved Jaro-Winkler distance algorithm then verifies field overlap and
table name similarity to construct the lineage relationships between data tables. The results show that the algorithm
is effective in constructing data table lineage, facilitating the smooth progress of data governance work.
PAN Qi, CAI Si-bo, WEI Fang-fang. Building Method for Data Lineage Based on Data Table Similarity Calculation[J]. Computer & Telecommunication, 2024, 1(6): 11-.
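
The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard Jaro-Winkler similarity rather than the improved variant the paper proposes, a smoothed TF-IDF weighting chosen here for convenience, and hypothetical table names, field names, and thresholds (`sim_threshold`, `overlap_threshold`, `name_threshold`) that do not come from the paper.

```python
import math
from collections import Counter


def tokenize(table):
    """Build a token list from the table name and field names (assumed text features)."""
    text = table["name"] + " " + " ".join(table["fields"])
    return text.lower().replace("_", " ").split()


def tfidf_vectors(docs):
    """Return one {token: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def jaro(s, t):
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_m = [c for c, h in zip(s, s_hit) if h]
    t_m = [c for c, h in zip(t, t_hit) if h]
    transpositions = sum(a != b for a, b in zip(s_m, t_m)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Standard Jaro-Winkler: boost the Jaro score by a common prefix of up to 4 chars."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def lineage_candidates(tables, sim_threshold=0.3, overlap_threshold=0.5,
                       name_threshold=0.7):
    """Stage 1: TF-IDF cosine filter. Stage 2: field overlap and table name check."""
    vecs = tfidf_vectors([tokenize(t) for t in tables])
    pairs = []
    for i in range(len(tables)):
        for j in range(i + 1, len(tables)):
            a, b = tables[i], tables[j]
            if cosine(vecs[i], vecs[j]) < sim_threshold:
                continue
            fa, fb = set(a["fields"]), set(b["fields"])
            smaller = min(len(fa), len(fb))
            overlap = len(fa & fb) / smaller if smaller else 0.0
            if (overlap >= overlap_threshold
                    and jaro_winkler(a["name"], b["name"]) >= name_threshold):
                pairs.append((a["name"], b["name"]))
    return pairs


# Hypothetical tables, for illustration only.
tables = [
    {"name": "ods_order_info", "fields": ["order_id", "user_id", "amount", "create_time"]},
    {"name": "dwd_order_info", "fields": ["order_id", "user_id", "amount", "pay_status"]},
    {"name": "dim_product", "fields": ["product_id", "category", "brand"]},
]
print(lineage_candidates(tables))
```

The cheap TF-IDF filter runs over all table pairs, and the costlier string comparison runs only on the pairs that survive it, which is one plausible reading of how the described method limits computational cost.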