Building Method for Data Lineage Based on Data Table Similarity Calculation
1. The Open University of China
2. Ministry of Education Engineering Research Center for Integration and Application of Digital Learning Technology
Abstract In the era of big data, it is widely agreed that business departments can unlock the value of data by
building on their accumulated business data. However, because different business systems lack unified data standards,
problems such as disorganized metadata, data silos, and low-quality data constantly emerge, hindering the
effective use of data and making governance necessary. Data lineage analysis is one of the key
tasks of metadata management and is of great significance for data traceability and data governance. However,
traditional methods for constructing data lineage often suffer from high computational complexity, poor accuracy, and high
execution costs. To overcome these issues, a data lineage construction method based on data table similarity
calculation is proposed: the three elements of a data table (table name, table structure, and data fields) are
represented as text features, TF-IDF is used to calculate the similarity between data tables, and the data table
lineage relationship is then constructed by using an improved Jaro-Winkler distance algorithm to verify field overlap
and table name similarity. The results show that the algorithm has a significant effect on the construction of data
table lineage, facilitating the smooth progress of data governance work.
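The pipeline outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it uses the standard Jaro-Winkler similarity rather than the improved variant the paper refers to, and the table names, field names, tokenization rule, and thresholds are all hypothetical.

```python
import math
from collections import Counter

def tokenize(table):
    # Assumption: a table's text features are its name parts plus its field names.
    return table["name"].lower().split("_") + [f.lower() for f in table["fields"]]

def tfidf_vectors(docs):
    # Term frequency times a smoothed inverse document frequency.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaro(s, t):
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_chars = [c for c, hit in zip(s, s_hit) if hit]
    t_chars = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_chars, t_chars)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, p=0.1):
    # Standard Jaro-Winkler: boost the Jaro score for a shared prefix (max 4 chars).
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def field_overlap(a, b):
    # Share of the smaller table's fields that also appear in the other table.
    fa, fb = set(a["fields"]), set(b["fields"])
    return len(fa & fb) / min(len(fa), len(fb)) if fa and fb else 0.0

def lineage_candidates(tables, sim_min=0.5, overlap_min=0.6, name_min=0.7):
    # Hypothetical thresholds: TF-IDF similarity shortlists pairs, then field
    # overlap and table-name similarity verify them.
    vecs = tfidf_vectors([tokenize(t) for t in tables])
    pairs = []
    for i in range(len(tables)):
        for j in range(i + 1, len(tables)):
            if (cosine(vecs[i], vecs[j]) >= sim_min
                    and field_overlap(tables[i], tables[j]) >= overlap_min
                    and jaro_winkler(tables[i]["name"], tables[j]["name"]) >= name_min):
                pairs.append((tables[i]["name"], tables[j]["name"]))
    return pairs

# Hypothetical tables, for illustration only.
tables = [
    {"name": "ods_student_info", "fields": ["student_id", "name", "enroll_date"]},
    {"name": "dw_student_info", "fields": ["student_id", "name", "enroll_date", "major"]},
    {"name": "ods_course_log", "fields": ["course_id", "viewed_at", "duration"]},
]
print(lineage_candidates(tables))
```

Using TF-IDF as a coarse filter before the string-level checks keeps the more expensive pairwise name comparison limited to tables that already share vocabulary. The sketch yields undirected candidate pairs only; determining which table is upstream would need additional information that the abstract does not specify.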
Published: 01 November 2024