Building Method for Data Lineage Based on Data Table Similarity Calculation 

  • 1. The Open University of China; 2. Ministry of Education Engineering Research Center for Integration and Application of Digital Learning Technology

Online published: 2024-11-01

Abstract

In the era of big data, it is widely accepted that business departments can unlock the value of data by building on their accumulated business data. However, because different business systems lack unified data standards, problems such as disorganized metadata, data silos, and low-quality data keep emerging, hindering the effective use of data and making data governance necessary. Data lineage analysis is one of the key tasks of metadata management and is of great significance for data traceability and data governance. Traditional methods for constructing data lineage, however, often suffer from high computational complexity, poor accuracy, and high execution costs. To overcome these issues, a data lineage construction method based on data table similarity calculation is proposed: the three elements of a data table (table name, table structure, and data fields) are represented as text features, TF-IDF is used to calculate the similarity between data tables, and an improved Jaro-Winkler distance algorithm is then used to verify field overlap and table name similarity in order to construct the lineage relationships between data tables. The results show that the algorithm has a significant effect on the construction of data table lineage and facilitates the smooth progress of data governance work.
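The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not describe the "improved" Jaro-Winkler variant, so the standard Jaro-Winkler similarity is used in its place; field overlap is approximated by the Jaccard index of column names; and the table dictionaries, function names, and thresholds are all hypothetical.

```python
import math
from collections import Counter

def tokenize(table):
    """Flatten table name, column names, and column types into text tokens."""
    tokens = table["name"].lower().split("_")
    for col, typ in table["columns"]:
        tokens += [col.lower(), typ.lower()]
    return tokens

def tfidf_vectors(docs):
    """TF-IDF with a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len(s1)):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Standard Jaro-Winkler (NOT the paper's improved variant)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def field_overlap(a, b):
    """Jaccard index of column names (an assumed overlap measure)."""
    fa = {c for c, _ in a["columns"]}
    fb = {c for c, _ in b["columns"]}
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

def candidate_lineage(tables, sim_min=0.5, overlap_min=0.5, name_min=0.7):
    """Return table-name pairs passing all three checks (thresholds are hypothetical)."""
    vecs = tfidf_vectors([tokenize(t) for t in tables])
    pairs = []
    for i in range(len(tables)):
        for j in range(i + 1, len(tables)):
            a, b = tables[i], tables[j]
            if (cosine(vecs[i], vecs[j]) >= sim_min
                    and field_overlap(a, b) >= overlap_min
                    and jaro_winkler(a["name"], b["name"]) >= name_min):
                pairs.append((a["name"], b["name"]))
    return pairs

# Hypothetical example tables.
tables = [
    {"name": "ods_student_info",
     "columns": [("student_id", "int"), ("name", "varchar"), ("enroll_date", "date")]},
    {"name": "dwd_student_info",
     "columns": [("student_id", "int"), ("name", "varchar"),
                 ("enroll_date", "date"), ("campus", "varchar")]},
    {"name": "ods_course_log",
     "columns": [("log_id", "bigint"), ("course_id", "int"), ("event_time", "timestamp")]},
]
pairs = candidate_lineage(tables)
print(pairs)
```

The TF-IDF step acts as an inexpensive filter over all table pairs, and the field-overlap and name-similarity checks are applied only to pairs that pass it. The sketch reports undirected candidate pairs; determining which table is upstream would require additional information that the abstract does not specify.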

Cite this article

PAN Qi, CAI Si-bo, WEI Fang-fang. Building Method for Data Lineage Based on Data Table Similarity Calculation [J]. Computer & Telecommunication, 2024, 1(6): 11. DOI: 10.15966/j.cnki.dnydx.2024.06.015
