May 28, 2020

Briefing Insights: Dremio – The “Data Lake Engine”

BARC analyst Timm Grosser introduces the Data Lake Engine from Dremio.

Reflections on BARC’s recent briefing with Dremio by Timm Grosser, BARC’s Senior Analyst for Data Management

What is a data lake engine?

In short, it should help users find data in their (cloud) data lake quickly and easily, and analyze it with high query performance. Technically speaking, it is an SQL-based query engine with a semantic layer that enables queries across different data storage systems (on-premises or cloud-based). It acts as a central access point for JDBC/ODBC-compatible user tools.
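To make the idea of a central SQL access point more concrete, here is a minimal sketch in Python. It does not use Dremio: SQLite from the standard library stands in for the engine, and two separate in-memory databases stand in for two storage systems. The table names, the `crm` alias and the `revenue_per_customer` helper are illustrative assumptions only.

```python
import sqlite3

# One connection plays the role of the central access point.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for another storage system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.commit()


def revenue_per_customer(connection):
    """Join data from both stores with a single SQL statement."""
    sql = """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return connection.execute(sql).fetchall()


result = revenue_per_customer(conn)
```

The client only sees one connection and one SQL dialect, which is roughly the position a JDBC/ODBC tool would be in when talking to a query engine of this kind.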

About Dremio

Dremio was established in 2015 with headquarters in Santa Clara, USA. Currently, around 120 employees work for the technology supplier. Customers include companies such as Diageo, Microsoft, NCR, PayPal, Standard Chartered and TransUnion. In the DACH region, DATEV, DB Cargo, Henkel and Software AG (Cumulocity IoT) already use Dremio; DATEV, DB Cargo and Henkel are among its showcase customers there. Dremio is suitable for use by companies from all industry sectors. A dedicated team was established in early 2020 to focus on the German-speaking market. The company plans to expand this team in the future.

In 2018, Dremio Enterprise Edition was launched as a supplement to the open source Dremio Community Edition product. The Enterprise Edition primarily includes additional enterprise functions related to data protection and security as well as services. Dremio can be used on-premises and/or in your own cloud account (AWS, Azure).

Dremio is available in the AWS and Azure marketplaces and is a co-sell partner of both these providers.

Another strong global partner is Tableau. Tableau uses Dremio primarily for SQL data access to distributed file systems and has already convinced several customers to work with Dremio.

Dremio is venture-backed and recently received a US$70 million cash injection.

The mission

With its technology, Dremio aims to simplify and accelerate access to data for analytical workflows, and to do so more cost-effectively than other players in the market. The cost advantage extends beyond technology license fees: because data is not moved or duplicated within the overall architecture, that saves costs as well. Dremio follows the approach of providing fast, flexible access to (distributed) data via a user-friendly interface. The goal is to avoid additional persistent layers such as aggregations and to give users a platform for ad hoc analyses.

Its main users are business analysts, data scientists and data engineers. Dremio considers it important to be seen as an agnostic tool. It enables the querying of different data storage technologies at different locations (cross-cloud, on-premises/cloud, etc.).

The technology

Dremio's roots go back to Apache Drill, an SQL engine for Hadoop. Drill never really caught on for analytical workloads, mainly because of its performance and complex handling. Apache Arrow, a technology that Dremio makes use of, was co-developed by Dremio co-founder and CTO Jacques Nadeau and is still being developed. Apache Arrow provides a cross-language development platform for in-memory data and specifies a standardized, language-independent columnar memory format for flat and hierarchical data. This becomes interesting when different data stores with different formats have to be linked for analytical queries while still delivering good performance.
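The benefit of a column-oriented layout can be illustrated with a small sketch. The following Python snippet does not use Apache Arrow or its API; it only contrasts row records with a per-column layout, using the standard library's `array` type as a stand-in for a contiguous typed buffer. The sample data and the `to_columns` helper are hypothetical.

```python
from array import array

# Row-oriented records, as they might arrive from a source system.
rows = [
    {"sensor": "a", "temp": 21.5},
    {"sensor": "b", "temp": 19.0},
    {"sensor": "a", "temp": 22.5},
]


def to_columns(records):
    """Pivot row records into one container per column."""
    return {
        "sensor": [r["sensor"] for r in records],
        # array("d") stores the doubles contiguously, one after another.
        "temp": array("d", (r["temp"] for r in records)),
    }


columns = to_columns(rows)

# An aggregate over one column reads only that column's values
# and never touches the "sensor" data.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
```

Analytical queries typically read a few columns across many rows, which is why a shared columnar representation suits this kind of engine.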

Dremio provides connectors to various relational and non-relational storage technologies and distributed file systems. The next step is to execute queries on the connected systems. These queries are defined as virtual datasets and are executed live at query time. Each query uses acceleration mechanisms such as massively parallel processing (MPP), query optimization or push-down options. Push-down delegates part of the workload to the source systems. One of Dremio’s main performance features is called “Reflections”. These resemble materialized views and, if desired, persist physically optimized representations of the data in columnar Parquet files. With each query, Dremio checks whether a persisted (precalculated) intermediate result is available, thus saving computing time. An internal catalog is available for searching the technical metadata. The technical metadata can be enriched with wikis and tags, making the data findable for more business-oriented users. The solution does not replace an enterprise data catalog, but can be integrated with one.
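The Reflections lookup described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Dremio's implementation: a plain dictionary stands in for the persisted Parquet files, and `RAW`, `reflections`, `build_reflection` and `total_by_region` are hypothetical names invented for this illustration.

```python
from collections import defaultdict

# Raw data in a "source system": (region, amount) pairs per dataset.
RAW = {"sales": [("north", 10), ("south", 7), ("north", 5)]}

reflections = {}  # dataset name -> precalculated totals (stand-in for persisted files)
scans = []        # records every full scan of the raw data


def aggregate(dataset):
    """Compute totals per region by scanning the raw data."""
    scans.append(dataset)
    totals = defaultdict(int)
    for region, amount in RAW[dataset]:
        totals[region] += amount
    return dict(totals)


def build_reflection(dataset):
    """Precalculate the aggregation once and keep the result."""
    reflections[dataset] = aggregate(dataset)


def total_by_region(dataset):
    """Answer from a precalculated result if one exists, else scan."""
    if dataset in reflections:
        return reflections[dataset]
    return aggregate(dataset)


first = total_by_region("sales")   # no precalculated result yet: scans raw data
build_reflection("sales")          # one more scan to build the result
second = total_by_region("sales")  # served from the precalculated result
```

The second query returns the same answer without another scan, which is the computing time the precalculated result saves. A real engine would also have to decide when a stored result matches a query and when it must be refreshed; the sketch leaves both out.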

Further expansion of Reflections is a particularly exciting aspect of Dremio’s roadmap. Today, Reflections still have to be set up manually. In the future, the system is set to provide “intelligent” support based on observed query behavior.

Analyst opinion

Dremio is a query engine for analytical workloads, preferably on (cloud) data lakes. The technology offers an approach to virtually merge today’s complex, heterogeneous system landscapes. The idea of an access layer across different cloud offerings seems especially attractive and opens up the possibility of operating on more than one cloud platform, giving the analyst a lot of flexibility in data delivery. This avoids technology and vendor lock-in.

Dremio calls itself a query engine with a semantic layer and leaves the data processing to the specialists. This is how it successfully differentiates itself from providers such as Databricks and Denodo. The promise: high performance at low cost.

It remains to be seen to what extent its performance convinces customers. We are looking forward to finding out more in our upcoming reference customer meetings and technological deep dives.

Author(s)

Timm Grosser

Senior Analyst Data & Analytics

Timm Grosser is a Senior Analyst Data & Analytics at BARC with a focus on data strategy, data governance and data management. His core expertise is the definition and implementation of data & analytics strategy, organization, architecture and software selection.

He is a popular speaker at conferences and seminars and has authored numerous BARC studies and articles.
