Data Lakes without Hadoop

Migrating to the cloud has dominated the news, and many companies are shuttering their data centers and letting cloud providers run their infrastructure for them. Elasticity, simplicity, and infrastructure agility are all good reasons to move, but many companies continue to host their own infrastructure. Their reasons may be security, or a belief that the cloud doesn’t provide cost benefits in their scenario.

For these companies, building a data lake usually means…

True Separation of Storage and Compute

For the last few years, the hot topic in any organization has been the separation of storage and compute. Data volumes and the variety of data being stored grow daily. Placing this data on a flexible storage medium, such as HDFS or a cloud object store like Amazon's S3 or Azure's Blob storage, gives a company great flexibility in when and where it consumes this data.

Presto Join Enumeration

Welcome back to the series of blog posts (check out our previous post!) about Presto's first Cost-Based Optimizer! Today let's focus on the challenge of choosing the optimal join order. The order in which relations are joined substantially affects the performance of a query. A poor join order can introduce unnecessary CPU and network overhead. To overcome that, the Starburst Presto release includes a state-of-the-art join enumeration algorithm that will greatly benefit its users. Let’s start with a quick introduction to how Presto's join enumerator will speed up your common queries, and then we will discuss the algorithm in more detail.
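To make the effect of join order concrete, here is a minimal sketch that scores every left-deep join order for three tables with a toy cost model. It is not Presto's join enumeration algorithm. The row counts, the selectivities, and the names `ROWS`, `SELECTIVITY`, `plan_cost`, and `best_order` are all assumptions made up for illustration, and the cost is simply the sum of the estimated sizes of the join outputs.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical row counts (illustrative only, not real table statistics).
ROWS = {"customer": 150_000, "orders": 1_500_000, "nation": 25}

# Hypothetical join selectivities: the fraction of the cross product that
# survives the join condition. A pair with no entry has no join condition,
# so joining it directly is a cross join (selectivity 1).
SELECTIVITY = {
    frozenset({"orders", "customer"}): Fraction(1, 150_000),
    frozenset({"customer", "nation"}): Fraction(1, 25),
}


def selectivity(a, b):
    """Selectivity of the join condition between tables a and b."""
    return SELECTIVITY.get(frozenset({a, b}), Fraction(1))


def plan_cost(order):
    """Toy cost of a left-deep plan: sum of the estimated join output sizes."""
    joined = [order[0]]
    rows = Fraction(ROWS[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        # Start from the cross product, then apply every join condition
        # between the new table and the tables joined so far.
        rows *= ROWS[table]
        for other in joined:
            rows *= selectivity(other, table)
        joined.append(table)
        cost += rows
    return cost


def best_order(tables):
    """Brute-force search over all left-deep orders (fine for a few tables)."""
    return min(permutations(tables), key=plan_cost)
```

With these made-up numbers, joining `customer` to `nation` first keeps the intermediate result at 150,000 rows, while starting with `orders` and `nation` forces a cross join of 37,500,000 rows before `customer` can filter it down. Brute force grows factorially with the number of tables, which is why a real optimizer needs a smarter enumeration strategy.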

Introduction to Presto Cost-Based Optimizer

The Cost-Based Optimizer (CBO) we recently released achieves stunning results in industry-standard benchmarks (and not only in benchmarks)! The CBO makes decisions based on several factors, including the shape of the query, filters, and table statistics. I would like to tell you more about what table statistics are in Presto and what information can be derived from them.
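As a rough illustration of the kind of information an optimizer can derive from statistics, the sketch below estimates how many rows survive a filter from a row count and a few per-column figures. This is a textbook-style estimate that assumes values are uniformly distributed, not a description of Presto's actual estimation logic. The `ColumnStats` class and both functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics (names chosen for this example)."""
    distinct_values: int   # number of distinct non-null values
    nulls_fraction: float  # fraction of rows where the column is NULL
    low: float             # smallest non-null value
    high: float            # largest non-null value


def estimate_equality_rows(row_count, stats):
    """Estimated rows matching `col = constant`, assuming uniform values."""
    if stats.distinct_values == 0:
        return 0.0
    non_null = row_count * (1 - stats.nulls_fraction)
    return non_null / stats.distinct_values


def estimate_range_rows(row_count, stats, lower, upper):
    """Estimated rows matching `col BETWEEN lower AND upper`, assuming
    values are spread evenly between the column's low and high values."""
    overlap_low = max(lower, stats.low)
    overlap_high = min(upper, stats.high)
    if overlap_high <= overlap_low or stats.high == stats.low:
        return 0.0
    non_null = row_count * (1 - stats.nulls_fraction)
    return non_null * (overlap_high - overlap_low) / (stats.high - stats.low)
```

For a 150,000-row table whose column has 25 distinct values and no nulls, the equality estimate is 6,000 rows. Estimates like this are what allow an optimizer to compare the expected sizes of alternative plans before running anything.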

Presto Cost-Based Optimizer rocks the TPC benchmarks!

As mentioned in our previous blog post about the Starburst Presto release and its hottest addition - the Cost-Based Optimizer for Presto - we’re happy to share the results of the benchmarks we ran for this release (195e), comparing it to the ‘vanilla’ Presto release 195. Next, we will continue the process of getting all those CBO-related changes merged into the ‘vanilla’ Presto repository.

The benchmarks were performed using a standard set of TPC-H and TPC-DS queries. As a side note, I would like to highlight that, thanks to our team’s contributions over the last couple of years, Presto supports 100% of the TPC benchmark queries and executes them unmodified, that is, with no prohibited query modifications. You can find the queries in our repository.

Starburst Enterprise Distribution of Presto 195e Now Available!

Today, I am pleased to announce the availability of Presto 195e, including Presto’s first Cost-Based Optimizer! With the new optimizer you should expect to see significant improvements in Presto’s query performance. Our team, in collaboration with Facebook, spent the last year heads down working on it, so you can understand why we are pretty excited that this day has finally come! You can read more about Starburst’s state-of-the-art optimizer here.

Presto gets EVEN FASTER, with a 10-15x performance boost in upcoming release!

Next week, we will be releasing the Starburst Distribution of Presto 195e. Based on prestodb/presto 0.195, Starburst’s 195e will ship with Presto’s first cost-based optimizer! In our performance testing and in collaboration with customers in our beta program, we are measuring greater than an order of magnitude performance improvement for many analytical queries, such as those in TPC-H and TPC-DS.