Overview
An overview of data catalogs by BARC Analyst Timm Grosser, including tips on how to select the right data cataloging solution for your organization.
Data is essential for companies that want to keep up in the digital age. Everyone knows that by now. But extracting the desired value from data and building innovative, data-driven business applications is not so easy. Instead, we often see data chaos that has grown over years, in the form of fragmented data landscapes and distributed expert knowledge.
A hotly discussed technological approach to making knowledge of distributed data available is the data catalog, the “Yellow Pages” for business-relevant data. It stores information about data in the form of metadata and structures, and makes that information searchable.
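To make the “Yellow Pages” idea concrete, here is a minimal sketch of a catalog that stores metadata entries and makes them searchable by keyword. It is an illustration only, not how any particular product works: the class names, the metadata fields (`name`, `description`, `owner`, `tags`) and the sample data assets are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data asset (field names are illustrative)."""
    name: str
    description: str
    owner: str
    tags: list[str] = field(default_factory=list)


class DataCatalog:
    """Toy in-memory catalog: register metadata, then search it by keyword."""

    def __init__(self) -> None:
        self._entries: list[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        """Add the metadata of one data asset to the catalog."""
        self._entries.append(entry)

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Return entries whose name, description or tags contain the keyword
        (case-insensitive), in registration order."""
        needle = keyword.lower()
        return [
            e for e in self._entries
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


# Hypothetical data assets, invented for this example.
catalog = DataCatalog()
catalog.register(CatalogEntry("crm.customers", "Master data for all customers",
                              "Sales Ops", ["customer", "master data"]))
catalog.register(CatalogEntry("erp.invoices", "Invoices issued to customers",
                              "Finance", ["billing"]))
catalog.register(CatalogEntry("web.clickstream", "Raw page-view events",
                              "Marketing", ["behavior"]))

# Find every asset that mentions customers anywhere in its metadata.
hits = [e.name for e in catalog.search("customer")]
```

Real catalog tools go far beyond this sketch, for example by harvesting metadata automatically from source systems and ranking search results, but the core principle is the same: the catalog holds descriptions of data, not the data itself.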
A data catalog tool is useful primarily because it does three things:
- covering information needs quickly and easily
- capturing and curating metadata (knowledge) as efficiently as possible, ideally through automation
- providing a platform for the exchange of knowledge for “all”
In addition, functions for data governance and/or data access are valuable.
Finding the right tool can be more complicated than you might expect, because the market for data catalogs is anything but transparent. As in other trending areas, the range of products is exploding: we are now aware of more than 90 solutions worldwide with data cataloging functions. But not all data cataloging is the same. These offerings vary in focus, content, features and supported use cases. The following table provides an overview of the basic tool types for data cataloging. Broadly, there are options for specific use cases (as part of a BI or analytics user tool, or as part of an environment) and offerings that provide a comprehensive, independent solution (specialists, or catalogs that are part of a data governance (DG)/data management (DM) platform).
Data Cataloging tool types:
Catalog scenario | Characteristics | Tool examples |
---|---|---|
…homemade | Rudimentary catalog functions | Excel, Confluence, Wiki, … |
…as part of a BI/analytics tool | Catalog functions related to the data/artifacts in the environment | Alteryx, Qlik, Tableau, … |
…as part of an environment | Catalog functions related to technical data/artifacts in the environment | Amazon, Cloudera, Google, … |
…as specialist | Comprehensive catalog functions related to data and partly artifacts from different tools/environments, added functionality such as data governance | Alation, Waterline, Zeenea, … |
…as part of a data governance/DM platform | Comprehensive catalog functions related to data and partly artifacts from different tools/environments. Additional functionality from the portfolio (e.g., workflows, data quality, etc.) | ASG, Collibra, Infogix, Informatica, SAP, … |
Pay particular attention to interfaces and transparent, open metadata models for metadata exchange with other catalogs and systems when selecting a data catalog. This offers you a number of advantages:
- You avoid vendor lock-in and can use the tool’s capabilities in a targeted manner
- You can more easily transfer catalogs from different areas or environments to a parent catalog
- You can more easily migrate to, or integrate with, more powerful tools or tools with a different focus
When selecting a data catalog, check its functions carefully. A checklist should normally include:
- Adapters and functions for metadata integration and exchange
- Supported content (e.g., supported metadata types, openness and extensibility of the metadata model)
- Functions and machine support for the maintenance (curation) of metadata
- Functions and machine support for catalog use and search/navigation/analysis of metadata
- Ease of use
- Support for collaboration
- Further data management functions (e.g., for data governance, data preparation, data quality and data protection)
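One common way to compare shortlisted tools against a checklist like the one above is a weighted scoring matrix. The following is a minimal sketch under stated assumptions: the weights, the 1 to 5 rating scale, the criterion keys and the tool names “Tool A” and “Tool B” are all hypothetical placeholders, not recommendations or real evaluation results.

```python
# Checklist criteria from the list above, with hypothetical weights
# that sum to 1.0. Adjust them to your own priorities.
WEIGHTS = {
    "metadata_integration": 0.20,
    "supported_content": 0.15,
    "curation_support": 0.15,
    "search_and_analysis": 0.15,
    "ease_of_use": 0.15,
    "collaboration": 0.10,
    "further_data_management": 0.10,
}


def weighted_score(ratings: dict[str, int]) -> float:
    """Combine per-criterion ratings (assumed scale: 1 = poor, 5 = excellent)
    into a single weighted score."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)


# Placeholder ratings for two fictional candidates.
candidates = {
    "Tool A": {
        "metadata_integration": 4, "supported_content": 3,
        "curation_support": 5, "search_and_analysis": 4,
        "ease_of_use": 3, "collaboration": 2,
        "further_data_management": 4,
    },
    "Tool B": {
        "metadata_integration": 3, "supported_content": 4,
        "curation_support": 3, "search_and_analysis": 3,
        "ease_of_use": 5, "collaboration": 4,
        "further_data_management": 2,
    },
}

# Rank candidates from highest to lowest weighted score.
ranking = sorted(candidates, key=lambda t: weighted_score(candidates[t]),
                 reverse=True)

for tool in ranking:
    print(f"{tool}: {weighted_score(candidates[tool]):.2f}")
```

A score like this is only a starting point for discussion. It makes the weighting of requirements explicit, but it does not replace a proof of concept with your own data and users.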
We are also happy to support you directly through the entire selection process, drawing on our best-practice experience, established process models and numerous templates. This covers everything from requirements gathering and the creation of a shortlist to proof-of-concept support and the final tool decision. It gives you greater confidence in your decision, saves you time and resources, and provides you with a partner who can help create a data cataloging roadmap that is both transparent and acceptable to management and relevant stakeholders.