The Critical Role of Data Lineage: Improve Data Quality and Boost Efficiency in Modern Ecosystems

Discover the strategic importance of data lineage in modern business ecosystems. Learn how data lineage enhances data quality, ensures compliance with GDPR & CCPA, and boosts operational efficiency for Data & Analytics teams. Read our latest blog on implementing data lineage for improved governance and streamlined data management.

When we think of Data & Analytics (D&A), we think of dashboards, perhaps ad hoc analyses in Excel spreadsheets, machine learning and, of course, Generative AI as the new kid on the block. To make all these applications reliable and compliant, D&A professionals need tools that keep data flows transparent. We call this transparency "data lineage," and it drives the reliability and compliance of analytical applications in several ways. By tracing data from its origin through its various transformations, data lineage supports operational efficiency, allowing D&A teams to prevent and resolve quality issues. Lineage is essential for compliance with regulations such as GDPR and CCPA because it tracks data transformations as well as the flow and use of sensitive data such as personally identifiable information (PII). Data lineage also fosters collaboration by providing visibility into data processes, which builds trust in data products. In essence, CDOs and other D&A leaders should be aware of data lineage because it is a strategic tool: it enhances governance by making data flows visible across the organization, improves decision making and supports business agility.

Data Lineage: Supply-Chain Management in the Data World

Today, organizations use multiple systems for data and analytics. They not only use different software offerings for different purposes (e.g., a BI system for data analysis, a data warehouse offering for data management), but increasingly also use several pieces of software to solve similar challenges, especially in larger organizations. This may be the result of mergers and acquisitions, or simply of the need to cover different use cases. In addition, the number of source systems is increasing rather than decreasing. All these developments lead to decentralized and complex data landscapes that need to be integrated.

Integration is necessary because of another observation: On top of this data, organizations have developed highly relevant analytical use cases that differentiate them from their competitors. However, a minor oversight in upstream data processing can cause major problems wherever the data is used downstream. This is no different from complicated supply chains, e.g., in the car industry: Imagine a steel mill with a flaw in its process that produces one batch of poor-quality steel. That batch might bring a car production line further downstream to a complete standstill. A clear understanding of the supply chain helps to resolve the issue as fast as possible and to mitigate the risk if needed.

To manage and prevent such issues in data processing, organizations employ data lineage – the practice of tracing and documenting data’s journey from its origin to its endpoint. This post explores data lineage, its generation and the technologies that streamline this process.
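The supply-chain analogy can be made concrete with a small sketch. The Python below models a data landscape as a directed graph and answers the two questions lineage is typically used for: what is affected downstream of a faulty source, and where a given data product comes from. The system names (`crm`, `erp`, `staging` and so on) and the function names are hypothetical and chosen only for illustration; this is not how any particular lineage tool stores or queries its metadata.

```python
from collections import deque

# Hypothetical data landscape: each key feeds data into the listed assets.
FLOWS = {
    "crm": ["staging"],
    "erp": ["staging"],
    "staging": ["warehouse"],
    "warehouse": ["sales_dashboard", "churn_model"],
    "sales_dashboard": [],
    "churn_model": [],
}


def downstream(flows, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = []
    queue = deque(flows.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(flows.get(node, []))
    return seen


def upstream(flows, target):
    """Return every asset that `target` depends on, nearest first."""
    # Reverse all edges, then reuse the downstream traversal.
    reverse = {}
    for source, targets in flows.items():
        for t in targets:
            reverse.setdefault(t, []).append(source)
    return downstream(reverse, target)


# Impact analysis: which assets are affected by a faulty ERP extract?
print(downstream(FLOWS, "erp"))
# Root-cause analysis: which assets feed the churn model?
print(upstream(FLOWS, "churn_model"))
```

In the steel-mill picture, `downstream` identifies the production lines that receive the flawed batch, while `upstream` traces a defective car back to its suppliers.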

Figure 1: Simplified lineage diagram on the system level: Data flows from left to right through the data landscape and transformations are applied

Understanding Data Lineage

Data lineage refers to the process of tracking and documenting the flow of data from its origin through various transformations and processes until it reaches its destinations. It provides a comprehensive view of the data's journey, including where it originated, how it has been transformed and where it is ultimately stored or consumed. Lineage is often visualized as a flow chart that shows the flow of data from source (left) to destination (right). Some solutions offer this view on several layers: Starting at an abstract level (system-level lineage), one can drill down to table-level and column-level lineage.
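The relationship between the three layers can be illustrated with a minimal sketch. It assumes that lineage is recorded at the finest granularity as column-to-column edges and that each column is named `system.table.column`. Both the naming scheme and the example assets are assumptions made for this illustration, not a convention taken from any specific lineage product. The coarser views are then derived by truncating the names.

```python
# Hypothetical column-level lineage: (source, target) pairs,
# each written as "system.table.column".
COLUMN_EDGES = [
    ("crm.customers.email", "dwh.dim_customer.email"),
    ("crm.customers.name", "dwh.dim_customer.full_name"),
    ("erp.orders.amount", "dwh.fact_sales.revenue"),
    ("dwh.fact_sales.revenue", "bi.sales_report.total_revenue"),
    ("dwh.dim_customer.full_name", "bi.sales_report.customer"),
]


def roll_up(edges, level):
    """Collapse column-level edges to the 'table' or 'system' level."""
    # Number of leading name parts to keep for each level.
    parts = {"system": 1, "table": 2, "column": 3}[level]
    rolled = set()
    for source, target in edges:
        s = ".".join(source.split(".")[:parts])
        t = ".".join(target.split(".")[:parts])
        if s != t:  # drop edges that stay inside one system or table
            rolled.add((s, t))
    return sorted(rolled)


print(roll_up(COLUMN_EDGES, "system"))
print(roll_up(COLUMN_EDGES, "table"))
```

Storing only the most detailed edges and deriving the abstract views keeps the layers consistent with each other, which is one plausible way to support the drill-down described above.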
