Building Method for Data Lineage Based on Data Table Similarity Calculation 

PAN Qi CAI Si-bo WEI Fang-fang

Computer & Telecommunication ›› 2024, Vol. 1 ›› Issue (6) : 11.

DOI: 10.15966/j.cnki.dnydx.2024.06.015


Abstract

In the era of big data, it is widely agreed that business departments can unlock the value of data by building on their accumulated business data. However, because different business systems lack unified data standards, problems such as disorganized metadata, data silos, and low-quality data keep emerging; they hinder the effective use of data and make data governance necessary. Data lineage analysis is one of the key tasks of metadata management and is of great significance for data traceability and data governance. Traditional methods for constructing data lineage, however, often suffer from high computational complexity, poor accuracy, and high execution costs. To overcome these issues, a data lineage construction method based on data table similarity calculation is proposed. The three elements of a data table (table name, table structure, and data fields) are represented as text features, and TF-IDF is used to calculate the similarity between data tables. The lineage relationship between data tables is then constructed by using an improved Jaro-Winkler distance algorithm to verify field overlap and table name similarity. The results show that the algorithm has a significant effect on the construction of data table lineage and facilitates data governance work.
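
The two-stage idea in the abstract (TF-IDF similarity to find candidate table pairs, then verification by field overlap and table name similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard Jaro-Winkler similarity rather than the improved variant, whose details the abstract does not give, and the table names, column names, tokenization on underscores, and all three thresholds are hypothetical choices made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def table_tokens(name, columns):
    """Flatten a table name and its column names into lowercase tokens."""
    return [t for text in [name, *columns] for t in text.lower().split("_") if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per token list, using smoothed IDF."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    return [{tok: c / len(doc) * idf[tok] for tok, c in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a = [c for c, m in zip(s1, m1) if m]
    b = [c for c, m in zip(s2, m2) if m]
    transpositions = sum(x != y for x, y in zip(a, b)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Standard Jaro-Winkler: boost Jaro by a shared prefix of up to 4 chars."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def field_overlap(cols_a, cols_b):
    """Share of the smaller table's fields that also appear in the other table."""
    a, b = set(cols_a), set(cols_b)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def candidate_lineage(tables, sim_min=0.5, overlap_min=0.6, name_min=0.8):
    """Return undirected table pairs that pass all three (assumed) thresholds."""
    names = list(tables)
    vecs = dict(zip(names, tfidf_vectors([table_tokens(n, tables[n]) for n in names])))
    return [(a, b) for a, b in combinations(names, 2)
            if cosine(vecs[a], vecs[b]) >= sim_min
            and field_overlap(tables[a], tables[b]) >= overlap_min
            and jaro_winkler(a, b) >= name_min]


# Hypothetical schema used only to exercise the sketch.
tables = {
    "ods_order_info": ["order_id", "user_id", "amount", "create_time"],
    "dwd_order_info": ["order_id", "user_id", "amount", "create_time", "pay_status"],
    "dim_region": ["region_id", "region_name"],
}
candidates = candidate_lineage(tables)
print(candidates)
```

The sketch only proposes undirected candidate pairs; deciding which table is upstream of the other would need additional information that the abstract does not describe.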

Key words

data lineage / data governance / metadata / table similarity 

Cite this article

PAN Qi CAI Si-bo WEI Fang-fang. Building Method for Data Lineage Based on Data Table Similarity Calculation [J]. Computer & Telecommunication. 2024, 1(6): 11 https://doi.org/10.15966/j.cnki.dnydx.2024.06.015
