STARBURST ENTERPRISE PRESTO: NATIVE DELTA LAKE READER

The secure, enterprise-grade distribution of the open-source Presto SQL query engine now includes a Native Delta Lake Reader.

Open-sourced by Databricks in 2019, Delta Lake enables data modification and optimizations in data lakes. The Native Delta Lake Reader helps Databricks customers take advantage of Presto’s speed, concurrency, and scalability to query their Delta Lake tables. Starburst and Databricks share many of the same enterprise customers, and this new tool gives those enterprises greater cost control, flexibility, and speed of access to the data in their data lakes.

Databricks Delta Lake

As object storage became increasingly popular over the last decade, a frustrating flaw became apparent: updating data such as customer or product information was a difficult, time-consuming process. Database engineers were constantly forced to modify, join, and overwrite entire tables. Databricks changed this with Delta Lake, a storage platform that lets users easily update and modify data stored in a cloud data lake. Delta Lake also provides performance and file management optimizations that previously didn’t exist in cloud data lakes.

Starburst Native Delta Lake Reader

On a mission to power analytics anywhere, Starburst recognized the need to support the leading technology that enables ACID transactions and performance optimizations on top of object storage. Starburst’s Native Delta Lake Reader was written from scratch, specifically for Delta Lake, to make it as efficient as possible.

Features include:

• Fast, efficient reads of Delta Lake transaction logs
• Support for data skipping to enhance performance
• Optimization of queries using Delta Lake file statistics

Starburst always supports the best platforms first, and Delta Lake is the industry leader and a favorite tool of some of our top enterprise customers.

DATASHEET


Starburst & Databricks: Complementary Platforms

Databricks and Starburst share many of the same customers, and these large global enterprises have been asking for a Native Delta Lake Reader. Our shared customers overturn a common industry misconception: the idea that Databricks and Starburst are competitors. In fact, the two are complementary. Enterprises use Spark and Databricks for machine learning, AI, ETL, and streaming ingestion, while Starburst Enterprise Presto serves as their high-concurrency SQL query engine, providing a single point of access to all of their data.

How the Native Delta Lake Reader Works

If your enterprise has distributed storage, whether in S3, ADLS, or another system, you probably have tables logically defined on top of it. Each table’s data lives in files, and Delta Lake stores these files in the open-source Parquet format. Each time the table’s data changes (when customer information is modified, for example), Delta Lake records the change in its transaction log, the Delta Log, along with an associated timestamp. Starburst’s Native Delta Lake Reader reads the Delta Log, and the data files when needed, and extends multiple benefits to end users.

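[Figure: Reference architecture. Data sources, including cloud data sources (AWS S3 / Azure Blob / ADLS), on-premises systems, and cloud data apps, feed streaming and batch data into a bronze, silver, and gold data lake used for Machine Learning / AI. A high-concurrency SQL engine serves SQL analysis and BI reporting across the data lake and other data sources.]

[Figure: A customer table, its table files, and the Delta Log, accessed by end users.]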

ACID Transactions

Previously you’d have to rewrite an entire table to add a new customer address or some other piece of information; now you can simply run an update or merge statement.
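Atomic updates like this depend on how commits are published to the log. This Python sketch illustrates the general idea of optimistic concurrency under simplifying assumptions. A dictionary stands in for the `_delta_log/` directory, and a commit succeeds only if no other writer has already claimed the next version number. The `LogStore` class and `update_customer` function are hypothetical. Real Delta Lake writers rely on the storage layer to make "create this log entry only if absent" atomic, and they check for conflicts before retrying.

```python
import threading


class LogStore:
    """Hypothetical in-memory stand-in for a table's transaction log."""

    def __init__(self):
        self._commits = {}             # version -> list of actions
        self._lock = threading.Lock()  # makes put-if-absent atomic in this sketch

    def latest_version(self):
        return max(self._commits, default=-1)

    def try_commit(self, version, actions):
        """Publish `actions` as `version` only if no one else has."""
        with self._lock:
            if version in self._commits:
                return False  # another writer won the race for this version
            self._commits[version] = list(actions)
            return True


def update_customer(store, old_file, new_file):
    """Replace one data file with a rewritten copy in a single commit."""
    while True:
        next_version = store.latest_version() + 1
        # Remove and add land together, so readers see both or neither.
        actions = [("remove", old_file), ("add", new_file)]
        if store.try_commit(next_version, actions):
            return next_version
        # Lost the race: re-read the log and retry. A real writer would also
        # check whether the winning commit conflicts with this update.


store = LogStore()
store.try_commit(0, [("add", "part-000.parquet")])
print(update_customer(store, "part-000.parquet", "part-001.parquet"))  # 1
```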

Governance

GDPR sometimes requires companies to remove specific customer data. This was impractical with Hadoop and previous storage platforms, but Delta Lake makes it as simple as a delete or update statement. As part of our commitment to fine-grained global security, Starburst’s Native Delta Lake Reader supports this functionality.

Data Skipping

Delta Lake stores minimum and maximum values, null counts, record counts, and other high-level statistics for each file. Starburst took advantage of this when designing the Native Delta Lake Reader, so queries can quickly narrow down which files they actually need to read. The Starburst connector feeds this information into the Presto cost-based optimizer, which greatly improves performance by reducing the number of files that actually need to be read for a query.
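The principle behind data skipping with per-file min/max statistics can be sketched in a few lines of Python. For a range predicate, any file whose [min, max] interval cannot overlap the predicate is pruned without being opened. The `FileStats` type, the `prune` function, the `customer_id` column, and the statistics values are all hypothetical. They are not the connector's API. A fuller implementation would also consult null counts.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    min_value: int  # minimum of the filtered column within this file
    max_value: int  # maximum of the filtered column within this file


def prune(files, low, high):
    """Keep only files that may contain rows with low <= value <= high."""
    return [f for f in files if f.max_value >= low and f.min_value <= high]


files = [
    FileStats("part-000.parquet", min_value=1, max_value=1000),
    FileStats("part-001.parquet", min_value=1001, max_value=2000),
    FileStats("part-002.parquet", min_value=2001, max_value=3000),
]

# e.g. WHERE customer_id BETWEEN 1500 AND 1600 needs only one of three files
print([f.path for f in prune(files, 1500, 1600)])  # ['part-001.parquet']
```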

Vacuum

Updating data leaves behind many files that the current version of the table no longer references, and although object storage is fairly affordable, it isn’t free. Delta Lake lets you run a vacuum command that cleans up the table by deleting these older files.

Optimize

The well-known small file problem can be a drag on performance, so Delta Lake added an optimize command that combines small files into larger ones, which greatly increases performance. Starburst’s Native Delta Lake Reader allows users to take advantage of this as well.
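At its core, compacting small files is a bin-packing problem. The goal is to group small files so that each group adds up to at most a target size, then rewrite each group as one file. This sketch uses a simple first-fit-decreasing heuristic in Python. The 128 MB target, the file sizes, and the `plan_compaction` function are hypothetical and are not taken from Delta Lake's implementation.

```python
def plan_compaction(sizes_mb, target_mb):
    """Group file sizes into bins of at most `target_mb` (first-fit decreasing)."""
    bins = []  # each bin lists the sizes of files to merge into one new file
    for size in sorted(sizes_mb, reverse=True):
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            bins.append([size])  # no existing bin has room: start a new one
    return bins


small_files = [40, 10, 60, 30, 70, 20, 50]  # sizes in MB
plan = plan_compaction(small_files, target_mb=128)
print(len(small_files), "files ->", len(plan), "files")  # 7 files -> 3 files
print(plan)  # [[70, 50], [60, 40, 20], [30, 10]]
```

Placing the largest files first tends to leave less wasted space per output file than packing them in arbitrary order. It stays a heuristic, though, not an optimal packing.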

Z-Ordering

After you optimize les, and are left with, say, 10,000

les instead of 100,000, you can order them by the

columns of your choosing. Included in the optimize

command, you can optionally choose to z-order

these les by one or more columns. This increases

performance for queries that include the z-ordered

columns in their predicate.
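Z-ordering is based on a space-filling curve. This Python sketch computes a two-column Z-order (Morton) value by interleaving the bits of the column values. It then sorts rows by that value and writes them out four rows per file. It assumes small non-negative integer values and hypothetical `region_id` / `product_id` columns. A production implementation must first map arbitrary column types to integers, which this sketch skips.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y: x fills even bit positions, y odd ones."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Hypothetical table: every (region_id, product_id) pair in a 4 x 4 grid.
rows = [(region, product) for region in range(4) for product in range(4)]

# Sort rows along the Z-order curve, then write them out 4 rows per file.
ordered = sorted(rows, key=lambda r: morton_key(*r))
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]

# Per-file (min, max) ranges for each column, as data skipping would see them.
ranges = [
    ((min(r for r, _ in f), max(r for r, _ in f)),
     (min(p for _, p in f), max(p for _, p in f)))
    for f in files
]
print(ranges)
# [((0, 1), (0, 1)), ((2, 3), (0, 1)), ((0, 1), (2, 3)), ((2, 3), (2, 3))]
```

Each file covers a 2 x 2 block, so a filter on either column, such as `product_id = 3`, can skip half of the files. Sorting by `region_id` alone would instead give every file the full `product_id` range of 0 to 3, and a filter on `product_id` could skip nothing.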

To operate the Native Delta Lake Reader in Starburst Enterprise Presto, you can either keep separate metastores or utilize a shared metastore.

STARBURSTDATA.COM

The Reader allows you to:

• Create tables in Starburst Presto that point to Delta Lake tables
• Execute queries on Delta Lake data as you would with normal tables
• Optimize and z-order your tables to improve performance

All of this is seamless to the end user. Overall, the Native Delta Lake Reader adds to the 27+ connectors included with the Starburst Enterprise platform and aligns with Starburst’s mission to give its customers a single point of access to all of their data, no matter where it resides.