Amazon Athena Execution Plans - Types and Formats

Amazon Athena Execution Plans - Types and Formats

Neil Addison
Neil Addison

Estimated Reading Time: 10 minutes
Last Updated: 2025-01-02

Disclaimer: The details described in this post are the results of my own research and investigations. While some effort has been expended in ensuring their accuracy - with ubiquitous references to source material - I cannot guarantee that accuracy. The views and opinions expressed on this blog are my own and do not necessarily reflect the views of any organization I am associated with, past or present.

Introduction

You click Run Query. 5 minutes go by, 10 minutes, 20. Hundreds of GBs have been scanned. And yet, it’s still running. Finally, at 29 minutes and 45 seconds: Query Completed.

Why though? Why did it take so long? What exactly was Athena doing for all that time?

Fortunately, Amazon Athena gives us a way to answer that very question: Execution Plans.

What is an Execution Plan?

An Execution Plan is a construct that details the intended execution steps of the Athena query engine to logically and physically calculate the results for a SQL query.

These execution steps are formulated as a hierarchy of ‘plan’ nodes, with each node representing an operation or transformation of the dataset being considered. More information about the various nodes can be found here and here. We will explore the different nodes/steps, which we will generally refer to as ‘operations’, in the next post.

Athena is based on - though note, not necessarily identical to - the open-source SQL engine Trino, which in turn is a fork of the Presto SQL engine. All three share the syntax for deriving execution plans through the EXPLAIN syntax[1][2]:

EXPLAIN [ (option [, ...]) ] statement;

where option is one of:

FORMAT { TEXT | GRAPHVIZ | JSON }
TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO }

Here are some example EXPLAIN statements:

A) EXPLAIN (FORMAT TEXT, TYPE LOGICAL) SELECT * FROM table;
B) EXPLAIN (TYPE DISTRIBUTED) SELECT * FROM table;
C) EXPLAIN (FORMAT JSON, TYPE DISTRIBUTED) SELECT * FROM table;
D) EXPLAIN SELECT * FROM table;

In this post, we will describe the types and formats of Execution Plans that can be generated for a query.

First, however, we need to set up some test tables.

Setting Up Test Data

To illustrate the different execution plans, I used ChatGPT to generate 5 tables of fake data as CSV files, representing sales within a supermarket. Here is a simplified entity relationship diagram for this data model, with the number of rows in each table:

ERD

Figure 1: Entity-Relationship Diagram (ERD) for a supermarket data model

These files were uploaded to an S3 bucket and crawled using an AWS Glue Crawler to add them to the AWS Glue Data Catalog. These tables:

  • are Apache Hive tables (i.e. not Apache Iceberg)
  • are not partitioned
  • are not indexed
  • do not have column statistics

The query that we will review through execution plans is:

EXPLAIN (TYPE <type>, FORMAT <format>)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

That is, for each day and product category, we want to calculate the total amount of money transacted by customers, and for the results to be sorted by transaction date in descending order and by category name.

With the data structures set up, we can review the available Execution Plan options.

Execution Plan Types

First, we will explore the different types of execution plans that Athena/Trino can create, including potential uses for them.

TYPE LOGICAL

A Logical execution plan[3] focuses on what logical operations need to be executed on the query engine to derive the result, rather than specifically how the operations will be executed as units of work on the physical architecture.

Query:

EXPLAIN (TYPE LOGICAL)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

Logical Execution Plan:

Query Plan Output[columnNames = [transaction_date, category_name, sum_total_amount]] │ Layout: [transaction_date:date, category:varchar, round:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} │ category_name := category │ sum_total_amount := round └─ RemoteMerge[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} └─ LocalMerge[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ PartialSort[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ RemoteExchange[type = REPARTITION] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?} └─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ round := round("sum", 2) └─ Aggregate[type = FINAL, keys = [transaction_date, category], hash = [$hashvalue]] │ Layout: [transaction_date:date, category:varchar, $hashvalue:bigint, sum:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum := sum("sum_11") └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue], arguments = ["transaction_date", "category"]] │ Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteExchange[type = REPARTITION, hashColumn = [$hashvalue_12]] │ Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue_12:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?} └─ Aggregate[type = PARTIAL, keys = [transaction_date, category], hash = [$hashvalue_25]] │ Layout: [transaction_date:date, category:varchar, $hashvalue_25:bigint, sum_11:varbinary] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum_11 := sum("total_amount") └─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_25:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ $hashvalue_25 := combine_hash(combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_date"), 0)), COALESCE("$operator$hash_code"("category"), 0)) └─ InnerJoin[criteria = ("category" = "category_name"), hash = [$hashvalue_21, $hashvalue_22], distribution = REPLICATED] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ Distribution: REPLICATED │ dynamicFilterAssignments = {category_name -> #df_816} ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_21:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category"), 0)) │ └─ InnerJoin[criteria = ("product_id" = "product_id_4"), hash = [$hashvalue_17, $hashvalue_18], distribution = REPLICATED] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ Distribution: REPLICATED │ │ dynamicFilterAssignments = {product_id_4 -> #df_817} │ ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar, $hashvalue_17:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id"), 0)) │ │ └─ InnerJoin[criteria = ("transaction_id" = "transaction_id_0"), hash = [$hashvalue_13, $hashvalue_14], distribution = REPLICATED] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ │ Distribution: REPLICATED │ │ │ dynamicFilterAssignments = {transaction_id_0 -> #df_818} │ │ ├─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transactions, dynamicFilters = {"transaction_id" = #df_818}, projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_id:varchar, transaction_date:date, total_amount:decimal(10,2), $hashvalue_13:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_13 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id"), 0)) │ │ │ transaction_id := transaction_id:string:REGULAR │ │ │ transaction_date := transaction_date:date:REGULAR │ │ │ total_amount := total_amount:decimal(10,2):REGULAR │ │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_14], arguments = ["transaction_id_0"]] │ │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_14:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ └─ RemoteExchange[type = REPLICATE] │ │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_15:bigint] │ │ │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} │ │ └─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transaction_details, dynamicFilters = {"product_id" = #df_817}, projectLocality = LOCAL, protectedBarrier = NONE] │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_16:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ $hashvalue_16 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id_0"), 0)) │ │ product_id := product_id:string:REGULAR │ │ transaction_id_0 := transaction_id:string:REGULAR │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_18], arguments = ["product_id_4"]] │ │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_18:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ └─ RemoteExchange[type = REPLICATE] │ │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_19:bigint] │ │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} │ └─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_products, dynamicFilters = {"category" = #df_816}, projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_20:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id_4"), 0)) │ product_id_4 := product_id:string:REGULAR │ category := category:string:REGULAR └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_22], arguments = ["category_name"]] │ Layout: [category_name:varchar, $hashvalue_22:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteExchange[type = REPLICATE] │ Layout: [category_name:varchar, $hashvalue_23:bigint] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} └─ ScanProject[table = awsdatacatalog:aept_db:supermarket_categories, projectLocality = LOCAL, protectedBarrier = NONE] Layout: [category_name:varchar, $hashvalue_24:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} $hashvalue_24 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category_name"), 0)) category_name := category_name:string:REGULAR

It is a translation of the query statement into a sequence of operations - such as Scan, Filter, Project, Join, Aggregate, Partition Pruning - that are understood by the query engine, in the order they need to be executed. Despite abstraction, the logical plan appears to still be cognizant of the underlying architecture, as it will include operations such as LocalExchange and RemoteExchange to transfer data within and between worker nodes.

Amazon Athena also uses both a Cost-Based Optimizer (CBO) and a Rule-Based Optimizer (RBO) to determine the optimal plan/sequence of operations. For example, a WHERE predicate could be applied early when scanning a table (predicate pushdown) to reduce the amount of data scanned. However, for the CBO to be used, column statistics need to be generated for the underlying tables[4].

Logical Plan Example

Figure 2: Operations within a Logical execution plan

In the above image, we can see ScanFilterProject, InnerJoin and Project operations executed in order within the logical plan. We will wade through what all of the operations mean in a future post.

TYPE DISTRIBUTED

A Distributed execution plan focuses on how the operations outlined in the logical plan will be executed on the physical Athena worker nodes allotted to your query. As such, it is sometimes referred to as the ‘physical’ plan.

Query:

EXPLAIN (TYPE DISTIBUTED)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

Distributed Execution Plan:

Query Plan Fragment 0 [SINGLE] Output layout: [transaction_date, category, round] Output partitioning: SINGLE [] Output[columnNames = [transaction_date, category_name, sum_total_amount]] │ Layout: [transaction_date:date, category:varchar, round:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} │ category_name := category │ sum_total_amount := round └─ RemoteMerge[sourceFragmentIds = [1]] Layout: [round:decimal(38,2), category:varchar, transaction_date:date] Fragment 1 [ROUND_ROBIN] Output layout: [round, category, transaction_date] Output partitioning: SINGLE [] LocalMerge[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ PartialSort[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ RemoteSource[sourceFragmentIds = [2]] Layout: [round:decimal(38,2), category:varchar, transaction_date:date] Fragment 2 [HASH] Output layout: [round, category, transaction_date] Output partitioning: ROUND_ROBIN [] Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ round := round("sum", 2) └─ Aggregate[type = FINAL, keys = [transaction_date, category], hash = [$hashvalue]] │ Layout: [transaction_date:date, category:varchar, $hashvalue:bigint, sum:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum := sum("sum_11") └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue], arguments = ["transaction_date", "category"]] │ Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteSource[sourceFragmentIds = [3]] Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue_12:bigint] Fragment 3 [SOURCE] Output layout: [transaction_date, category, sum_11, $hashvalue_25] Output partitioning: HASH [transaction_date, category][$hashvalue_25] Aggregate[type = PARTIAL, keys = [transaction_date, category], hash = [$hashvalue_25]] │ Layout: [transaction_date:date, category:varchar, $hashvalue_25:bigint, sum_11:varbinary] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum_11 := sum("total_amount") └─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_25:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ $hashvalue_25 := combine_hash(combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_date"), 0)), COALESCE("$operator$hash_code"("category"), 0)) └─ InnerJoin[criteria = ("category" = "category_name"), hash = [$hashvalue_21, $hashvalue_22], distribution = REPLICATED] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ Distribution: REPLICATED │ dynamicFilterAssignments = {category_name -> #df_816} ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_21:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category"), 0)) │ └─ InnerJoin[criteria = ("product_id" = "product_id_4"), hash = [$hashvalue_17, $hashvalue_18], distribution = REPLICATED] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ Distribution: REPLICATED │ │ dynamicFilterAssignments = {product_id_4 -> #df_817} │ ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar, $hashvalue_17:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id"), 0)) │ │ └─ InnerJoin[criteria = ("transaction_id" = "transaction_id_0"), hash = [$hashvalue_13, $hashvalue_14], distribution = REPLICATED] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ │ Distribution: REPLICATED │ │ │ dynamicFilterAssignments = {transaction_id_0 -> #df_818} │ │ ├─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transactions, dynamicFilters = {"transaction_id" = #df_818}, projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_id:varchar, transaction_date:date, total_amount:decimal(10,2), $hashvalue_13:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_13 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id"), 0)) │ │ │ transaction_id := transaction_id:string:REGULAR │ │ │ transaction_date := transaction_date:date:REGULAR │ │ │ total_amount := total_amount:decimal(10,2):REGULAR │ │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_14], arguments = ["transaction_id_0"]] │ │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_14:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ └─ RemoteSource[sourceFragmentIds = [4]] │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_15:bigint] │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_18], arguments = ["product_id_4"]] │ │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_18:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ └─ RemoteSource[sourceFragmentIds = [5]] │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_19:bigint] └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_22], arguments = ["category_name"]] │ Layout: [category_name:varchar, $hashvalue_22:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteSource[sourceFragmentIds = [6]] Layout: [category_name:varchar, $hashvalue_23:bigint] Fragment 4 [SOURCE] Output layout: [transaction_id_0, product_id, $hashvalue_16] Output partitioning: BROADCAST [] ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transaction_details, dynamicFilters = {"product_id" = #df_817}, projectLocality = LOCAL, protectedBarrier = NONE] Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_16:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} $hashvalue_16 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id_0"), 0)) product_id := product_id:string:REGULAR transaction_id_0 := transaction_id:string:REGULAR Fragment 5 [SOURCE] Output layout: [product_id_4, category, $hashvalue_20] Output partitioning: BROADCAST [] ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_products, dynamicFilters = {"category" = #df_816}, projectLocality = LOCAL, protectedBarrier = NONE] Layout: [product_id_4:varchar, category:varchar, $hashvalue_20:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id_4"), 0)) product_id_4 := product_id:string:REGULAR category := category:string:REGULAR Fragment 6 [SOURCE] Output layout: [category_name, $hashvalue_24] Output partitioning: BROADCAST [] ScanProject[table = awsdatacatalog:aept_db:supermarket_categories, projectLocality = LOCAL, protectedBarrier = NONE] Layout: [category_name:varchar, $hashvalue_24:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} $hashvalue_24 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category_name"), 0)) category_name := category_name:string:REGULAR

This plan is also the default plan generated for you in Amazon Athena if you simply use the clause EXPLAIN without format or type specifiers:

EXPLAIN <query>

A distributed plan splits the logical plan into sections of work called fragments, which are then represented as stages. A Stage is the “runtime incarnation of a plan fragment”[5, pg.54], essentially a collection of multiple worker tasks that need to be completed to fulfil the stage. This allows tasks from each stage to be submitted in parallel to the worker nodes, improving performance. The O’Reilly Trino Guide[5] gives a detailed explanation of how the Trino/Athena coordinator distributes work.

TYPE VALIDATE

The Validate execution plan checks that the issued query has the correct syntax and that the underlying data sources exist. This may be useful if you are using Prepared Statements/Parameterized Queries with dynamic values[6] and want to capture invalid queries before they are submitted to the query engine.

Query:

EXPLAIN (TYPE VALIDATE)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

Validation Execution Plan:

Valid true

TYPE IO

The I/O[7] execution plan focuses on the input and output operations involved in executing the query. That is, it estimates, for each data object (table/partition/view etc):

  • the number of rows that will be read
  • the number of bytes that will be read
  • the maximum memory usage
  • the CPU component of the query cost
  • the network component of the query cost

Query:

EXPLAIN (TYPE IO)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

I/O Execution Plan:

Query Plan { "inputTableColumnInfos" : [ { "table" : { "catalog" : "awsdatacatalog", "schemaTable" : { "schema" : "aept_db", "table" : "supermarket_transactions" } }, "constraint" : { "none" : false, "columnConstraints" : [ ] }, "estimate" : { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "maxMemory" : 0.0, "networkCost" : 0.0 } }, { "table" : { "catalog" : "awsdatacatalog", "schemaTable" : { "schema" : "aept_db", "table" : "supermarket_categories" } }, "constraint" : { "none" : false, "columnConstraints" : [ ] }, "estimate" : { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "maxMemory" : 0.0, "networkCost" : 0.0 } }, { "table" : { "catalog" : "awsdatacatalog", "schemaTable" : { "schema" : "aept_db", "table" : "supermarket_products" } }, "constraint" : { "none" : false, "columnConstraints" : [ ] }, "estimate" : { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "maxMemory" : 0.0, "networkCost" : 0.0 } }, { "table" : { "catalog" : "awsdatacatalog", "schemaTable" : { "schema" : "aept_db", "table" : "supermarket_transaction_details" } }, "constraint" : { "none" : false, "columnConstraints" : [ ] }, "estimate" : { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "maxMemory" : 0.0, "networkCost" : 0.0 } } ], "estimate" : { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "maxMemory" : "NaN", "networkCost" : "NaN" } }

This plan can be useful for manual query optimization. By identifying the volume of data and computational cost involved in reading a data object you may, for example, determine that the object needs to be repartitioned, bucketed or requires additional filtering criteria.

If you wanted to, you could also log the total query costs generated by the IO plan so that you can identify costly queries in need of refactoring.

Default Athena Execution Plans

Athena provides an Explain button in the Query Editor that can provide execution plans for a single SQL statement.

Explain Plan Button

Figure 3: Explain Button - Generates graphical execution plans

This opens up a new tab with a Distributed plan and a Logical plan in node graphs, with an Operators tab detailing the required Stages for the Distributed plan. These graphed plans are useful because they visually represent the required execution steps, and are potentially more practical than the GRAPHVIZ format we will see later.

Default Athena Plan - Logical Plan

Figure 4: Graph of the Logical Execution Plan, generated by Athena

Default Athena Plan - Distributed Plan

Figure 5: Stages within a Distributed execution plan

On a day-to-day basis, you will likely use these default plans more than any of the others.

Execution Plan Formats

Now that we have briefly talked through the different types of execution plans, let’s review the different formats in which these plans can be output.

FORMAT TEXT

The first format is TEXT, in which the execution plan is returned as a long string, with hierarchies represented using nesting and ASCII characters. Examining a TEXT plan for the first time can be a little overwhelming, but it is densely packed with information that can be read through quickly with practice.

In the Execution Plan Types section, all of the plans except for those generated in Default Athena Execution Plans were presented in the TEXT format.

If you are so inclined, TEXT plans can be saved and compared to a plan for another query to troubleshoot or optimize. For example, the structures or statistics of your tables may change over time, and saved TEXT plans could be compared to the plan of the same query later to understand why performance may have degraded.

Query:

EXPLAIN (TYPE LOGICAL, FORMAT TEXT)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

Text (Logical) Execution Plan:

Query Plan Output[columnNames = [transaction_date, category_name, sum_total_amount]] │ Layout: [transaction_date:date, category:varchar, round:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} │ category_name := category │ sum_total_amount := round └─ RemoteMerge[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} └─ LocalMerge[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ PartialSort[orderBy = [transaction_date DESC NULLS LAST, category ASC NULLS LAST]] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: 0B} └─ RemoteExchange[type = REPARTITION] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?} └─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [round:decimal(38,2), category:varchar, transaction_date:date] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ round := round("sum", 2) └─ Aggregate[type = FINAL, keys = [transaction_date, category], hash = [$hashvalue]] │ Layout: [transaction_date:date, category:varchar, $hashvalue:bigint, sum:decimal(38,2)] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum := sum("sum_11") └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue], arguments = ["transaction_date", "category"]] │ Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteExchange[type = REPARTITION, hashColumn = [$hashvalue_12]] │ Layout: [transaction_date:date, category:varchar, sum_11:varbinary, $hashvalue_12:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?} └─ Aggregate[type = PARTIAL, keys = [transaction_date, category], hash = [$hashvalue_25]] │ Layout: [transaction_date:date, category:varchar, $hashvalue_25:bigint, sum_11:varbinary] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ sum_11 := sum("total_amount") └─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_25:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ $hashvalue_25 := combine_hash(combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_date"), 0)), COALESCE("$operator$hash_code"("category"), 0)) └─ InnerJoin[criteria = ("category" = "category_name"), hash = [$hashvalue_21, $hashvalue_22], distribution = REPLICATED] │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ Distribution: REPLICATED │ dynamicFilterAssignments = {category_name -> #df_816} ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar, $hashvalue_21:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category"), 0)) │ └─ InnerJoin[criteria = ("product_id" = "product_id_4"), hash = [$hashvalue_17, $hashvalue_18], distribution = REPLICATED] │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), category:varchar] │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ Distribution: REPLICATED │ │ dynamicFilterAssignments = {product_id_4 -> #df_817} │ ├─ Project[projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar, $hashvalue_17:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id"), 0)) │ │ └─ InnerJoin[criteria = ("transaction_id" = "transaction_id_0"), hash = [$hashvalue_13, $hashvalue_14], distribution = REPLICATED] │ │ │ Layout: [transaction_date:date, total_amount:decimal(10,2), product_id:varchar] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: 0B} │ │ │ Distribution: REPLICATED │ │ │ dynamicFilterAssignments = {transaction_id_0 -> #df_818} │ │ ├─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transactions, dynamicFilters = {"transaction_id" = #df_818}, projectLocality = LOCAL, protectedBarrier = NONE] │ │ │ Layout: [transaction_id:varchar, transaction_date:date, total_amount:decimal(10,2), $hashvalue_13:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ │ $hashvalue_13 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id"), 0)) │ │ │ transaction_id := transaction_id:string:REGULAR │ │ │ transaction_date := transaction_date:date:REGULAR │ │ │ total_amount := total_amount:decimal(10,2):REGULAR │ │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_14], arguments = ["transaction_id_0"]] │ │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_14:bigint] │ │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ └─ RemoteExchange[type = REPLICATE] │ │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_15:bigint] │ │ │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} │ │ └─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_transaction_details, dynamicFilters = {"product_id" = #df_817}, projectLocality = LOCAL, protectedBarrier = NONE] │ │ Layout: [transaction_id_0:varchar, product_id:varchar, $hashvalue_16:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ │ $hashvalue_16 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("transaction_id_0"), 0)) │ │ product_id := product_id:string:REGULAR │ │ transaction_id_0 := transaction_id:string:REGULAR │ └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_18], arguments = ["product_id_4"]] │ │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_18:bigint] │ │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ └─ RemoteExchange[type = REPLICATE] │ │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_19:bigint] │ │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} │ └─ ScanFilterProject[table = awsdatacatalog:aept_db:supermarket_products, dynamicFilters = {"category" = #df_816}, projectLocality = LOCAL, protectedBarrier = NONE] │ Layout: [product_id_4:varchar, category:varchar, $hashvalue_20:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} │ $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("product_id_4"), 0)) │ product_id_4 := product_id:string:REGULAR │ category := category:string:REGULAR └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_22], arguments = ["category_name"]] │ Layout: [category_name:varchar, $hashvalue_22:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B} └─ RemoteExchange[type = REPLICATE] │ Layout: [category_name:varchar, $hashvalue_23:bigint] │ Estimates: {rows: ? (?), cpu: 0, memory: 0B, network: ?} └─ ScanProject[table = awsdatacatalog:aept_db:supermarket_categories, projectLocality = LOCAL, protectedBarrier = NONE] Layout: [category_name:varchar, $hashvalue_24:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B} $hashvalue_24 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("category_name"), 0)) category_name := category_name:string:REGULAR

The amount of information here is intriguing, and we will return to this plan in a future post.

FORMAT GRAPHVIZ

A GraphViz plan returns a string in DOT language, used for defining node graphs that can be visualized through the open source GraphViz software[8]. On its own the string is difficult to read, but by passing the plan to the software, a directed acyclic graph of the execution steps can be rendered.

Query:

EXPLAIN (TYPE LOGICAL, FORMAT GRAPHVIZ)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

GraphViz (Logical) Execution Plan:

Query Plan digraph logical_plan { subgraph cluster_graphviz_plan { label = "SINGLE" plannode_1[label="{Output[transaction_date, category_name, sum_total_amount]}", style="rounded, filled", shape=record, fillcolor=white]; plannode_2[label="{ExchangeNode[GATHER]|\"round\", \"category\", \"transaction_date\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_3[label="{ExchangeNode[GATHER]|\"round\", \"category\", \"transaction_date\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_4[label="{Sort[transaction_date, category]}", style="rounded, filled", shape=record, fillcolor=aliceblue]; plannode_5[label="{ExchangeNode[REPARTITION]}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_6[label="{Project|round := \"@round@52QIVVB4FS06A1O00872O8A0DMDG0M4RPK1AQ4P54MAQ6AG6BOL9R471S3HECO2OHMUJ430300VOU1N3S507F649K9968401J5J1JQ2MH90CFLBCLQ24GQPT0PU6RKCJBIBPLB71KBT7JGR2C5OU1859K15MT8HFSBJGQ887E63L2610044O014JQ1I8GQ8559VOSLJ2DL8S8S2TSE78SIR99989D59M4KVOVHGR5AS9K33KMU5EOA9DVKE12E040E59AM16L2CECGERBDM3ULVQIPR78P29TT0CFTHML69ET3GM00UH4I0HF3UHH908T0SOSPO7VFDOECS0VS1HCLNG81EIE2L67BMSQ89OOIM4J90JCCO82HTKC878BA69132U7E2E69PP1E8A\"(\"sum\", 2)\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_7[label="{Aggregate[FINAL]|sum := sum(\"sum_11\")\n}", style="rounded, filled", shape=record, fillcolor=chartreuse3]; plannode_8[label="{ExchangeNode[REPARTITION]|\"transaction_date\", \"category\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_9[label="{ExchangeNode[REPARTITION]|\"transaction_date\", \"category\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_10[label="{Aggregate[PARTIAL]|sum_11 := sum(\"total_amount\")\n}", style="rounded, filled", shape=record, fillcolor=chartreuse3]; plannode_11[label="{Project|$hashvalue_25 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(\"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BG00Q1O0I972Q830H59UHS2EOHM4ICABDH695CAG81QARTH7SAE4HOA8D4AG100J7S8SC8G2N6RKP2P6KFSI88KVD0BTI51UTQKQ4J9A7582JQ1046RFE9BD9ED1P5KRPI9IEA06P90Q3AA0AVD4B8E70KMC0G1A0SP7P75LRG529CSMOP1C61967NSIIT6OQ9PN4MUKFP7SAA16JAIN0SBP29GJO5SP5VN6QD8TL1GVHP5BL54OTNVJH4A98GCRTJ4GI408GT3458941HM2RCJSIRGVET2G1USIF0OMML4UR40F00UH460HJ3T1HT0KOUPO7T7DOCCH1V81DO9JOK6N05GVQTOKKC7EECF6D1IO6685J91VTN8=\"(\"transaction_date\"), 0)), COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"category\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_12[label="{InnerJoin|(\"category\" = \"category_name\")}", style="rounded, filled", shape=record, fillcolor=orange]; plannode_13[label="{Project|$hashvalue_21 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"category\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_14[label="{InnerJoin|(\"product_id\" = \"product_id_4\")}", style="rounded, filled", shape=record, fillcolor=orange]; plannode_15[label="{Project|$hashvalue_17 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"product_id\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_16[label="{InnerJoin|(\"transaction_id\" = \"transaction_id_0\")}", style="rounded, filled", shape=record, fillcolor=orange]; plannode_17[label="{Project|$hashvalue_13 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"transaction_id\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_18[label="{Filter|\"@$internal$dynamic_filter_function@52QIVVB4N803Q1O0M96II7J09FDMQOP8UFD34ETL5L3P5VPPC1KLGPCO3C0MLRKUG8GEL94DO9K1H12C6UQQKLI51DA5JNIGOPSI0L2UFPSHN4LP4LKF6HSOURCVCBMNAU2GE58I104O51EU0CU2HO1SO300NA65GE476UEID3KR719189ITAQC04IE9U2AFKCSEHQCG7GFT0G68NPQJBH1JIM9Q8GAP33DDNAV3O57FCKTNCFGVLF674BMSKNGKGAPLRFONG447AP93GD8BQV9MM869I6007K9AG4AOG2P90CL14SV9S78UT0V9O08F6H08E1GFQHQLPM316M1FE818GSEAM0G0IQAIAGTKCBHEF2G0B9FF2D5EQDLA15HI0FOSU4QV\"(\"transaction_id\", 'EQUAL', 'df_818', false)}", style="rounded, filled", shape=record, fillcolor=yellow]; plannode_19[label="{TableScan[awsdatacatalog:aept_db:supermarket_transactions]}", style="rounded, filled", shape=record, fillcolor=deepskyblue]; plannode_20[label="{ExchangeNode[REPARTITION]|\"transaction_id_0\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_21[label="{ExchangeNode[REPLICATE]|\"transaction_id_0\", \"product_id\", \"$hashvalue_15\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_22[label="{Project|$hashvalue_16 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"transaction_id_0\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_23[label="{Filter|\"@$internal$dynamic_filter_function@52QIVVB4N803Q1O0M96II7J09FDMQOP8UFD34ETL5L3P5VPPC1KLGPCO3C0MLRKUG8GEL94DO9K1H12C6UQQKLI51DA5JNIGOPSI0L2UFPSHN4LP4LKF6HSOURCVCBMNAU2GE58I104O51EU0CU2HO1SO300NA65GE476UEID3KR719189ITAQC04IE9U2AFKCSEHQCG7GFT0G68NPQJBH1JIM9Q8GAP33DDNAV3O57FCKTNCFGVLF674BMSKNGKGAPLRFONG447AP93GD8BQV9MM869I6007K9AG4AOG2P90CL14SV9S78UT0V9O08F6H08E1GFQHQLPM316M1FE818GSEAM0G0IQAIAGTKCBHEF2G0B9FF2D5EQDLA15HI0FOSU4QV\"(\"product_id\", 'EQUAL', 'df_817', false)}", style="rounded, filled", shape=record, fillcolor=yellow]; plannode_24[label="{TableScan[awsdatacatalog:aept_db:supermarket_transaction_details]}", style="rounded, filled", shape=record, fillcolor=deepskyblue]; plannode_25[label="{ExchangeNode[REPARTITION]|\"product_id_4\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_26[label="{ExchangeNode[REPLICATE]|\"product_id_4\", \"category\", \"$hashvalue_19\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_27[label="{Project|$hashvalue_20 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"product_id_4\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_28[label="{Filter|\"@$internal$dynamic_filter_function@52QIVVB4N803Q1O0M96II7J09FDMQOP8UFD34ETL5L3P5VPPC1KLGPCO3C0MLRKUG8GEL94DO9K1H12C6UQQKLI51DA5JNIGOPSI0L2UFPSHN4LP4LKF6HSOURCVCBMNAU2GE58I104O51EU0CU2HO1SO300NA65GE476UEID3KR719189ITAQC04IE9U2AFKCSEHQCG7GFT0G68NPQJBH1JIM9Q8GAP33DDNAV3O57FCKTNCFGVLF674BMSKNGKGAPLRFONG447AP93GD8BQV9MM869I6007K9AG4AOG2P90CL14SV9S78UT0V9O08F6H08E1GFQHQLPM316M1FE818GSEAM0G0IQAIAGTKCBHEF2G0B9FF2D5EQDLA15HI0FOSU4QV\"(\"category\", 'EQUAL', 'df_816', false)}", style="rounded, filled", shape=record, fillcolor=yellow]; plannode_29[label="{TableScan[awsdatacatalog:aept_db:supermarket_products]}", style="rounded, filled", shape=record, fillcolor=deepskyblue]; plannode_30[label="{ExchangeNode[REPARTITION]|\"category_name\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_31[label="{ExchangeNode[REPLICATE]|\"category_name\", \"$hashvalue_23\"}", style="rounded, filled", shape=record, fillcolor=gold]; plannode_32[label="{Project|$hashvalue_24 := \"@combine_hash@52QIVVB4BK08A1G0GB62C7RG96DG116UCQKPD4HHQ89CM9K1TID9G8454JD9F7HN31A3BG0VHE1R2Q7E1G9L2J0K2KS802J61LI0I8FEJ2D3NCSPU902JD83DHLUUI1E9D6RAS1HV4SO5GE0428A0MD09CNA43BH043I1ICVBT16KBBQ1ABJ3R3N9I4I40J595K13SLJ70C7FDKAL8KGNVBS6MAJOVI70MHQ1APDLQFIE62A5IGED88MSHRL8I6VRHORAHDA4HTH64O07K94G49OV8CA82787676E1VARE3J7079PC5AHLITMKV0U1ALGCEJ321SEIMPHOSDPAGEFKLGQOQ072SQBSGG====\"(bigint '0', COALESCE(\"@$operator$HASH_CODE@52QIVVB4BS02Q1O0O972S8AGH6DG0I597F4JBIFU286P6SHC31L40A7E6P8KG7FL289CNJ6C15VP61J3490NVACISQHDIP12JDJHJU8M8B876FDBIPTE0MME84T6TKCJRAJL8CCNJF5PA9OS1G78H1DA09MML4PQ20AB0003N4FEHC315MGHBI554L3U8OB4A5BOS5HIQSAB3N7NNIHUEMGK2L65NE4OLP61KTOO9471FJI0FULSI766S6AQUAI67M79TJD3K37DIT2N9Q840G1O2I5O7GN0RAHCN9GPHQF0PT8IUHQCT8HIT9EH43O07K91G4COV8CF8567MD1VBRE3350FQ0BH2CU51LO1C7UNK5031VJJRPJ8CM1HI1EJAOUIO===\"(\"category_name\"), 0))\n}", style="rounded, filled", shape=record, fillcolor=bisque]; plannode_33[label="{TableScan[awsdatacatalog:aept_db:supermarket_categories]}", style="rounded, filled", shape=record, fillcolor=deepskyblue]; } plannode_1 -> plannode_2; plannode_2 -> plannode_3; plannode_3 -> plannode_4; plannode_4 -> plannode_5; plannode_5 -> plannode_6; plannode_6 -> plannode_7; plannode_7 -> plannode_8; plannode_8 -> plannode_9; plannode_9 -> plannode_10; plannode_10 -> plannode_11; plannode_11 -> plannode_12; plannode_12 -> plannode_13; plannode_13 -> plannode_14; plannode_14 -> plannode_15; plannode_15 -> plannode_16; plannode_16 -> plannode_17; plannode_17 -> plannode_18; plannode_18 -> plannode_19; plannode_16 -> plannode_20; plannode_20 -> plannode_21; plannode_21 -> plannode_22; plannode_22 -> plannode_23; plannode_23 -> plannode_24; plannode_14 -> plannode_25; plannode_25 -> plannode_26; plannode_26 -> plannode_27; plannode_27 -> plannode_28; plannode_28 -> plannode_29; plannode_12 -> plannode_30; plannode_30 -> plannode_31; plannode_31 -> plannode_32; plannode_32 -> plannode_33; }

By copying the plan into a text file called ExecutionPlanGraphViz.txt, installing GraphViz locally, removing the “Query Plan” string at the top, and executing the following command:

dot -Tsvg "<absolute_file_path>\ExecutionPlanGraphViz.txt" > output.svg

we get our execution graph: GraphViz Output

Figure 6: Sub-section of a GraphViz-rendered execution plan

Of course, this is just part of the generated image. The hash codes that were provided in the plan produce extremely wide Project nodes, which makes this graph difficult to navigate.

Why might this be of use, then? As we saw earlier, Athena provides its own default execution graphs that are easier to read, but those graphs condense the operations down. For example, operations of ScanProject and ScanFilterProject in the default graphs are separated into TableScan, Filter and Project in GraphViz. GraphViz plans are also color-coded based on their type, perhaps making it easier to identify the different operations quickly. Finally, you may have a need to store generated plans in S3 or elsewhere. By using GraphViz in a Lambda function (e.g. pygraphviz for Python), you could use the Boto3 Athena API to generate a GraphViz plan and render the plan to a file for review later.

FORMAT JSON

A JSON plan, as you would expect, delivers the hierarchy of execution steps in a nested JSON format. The information conveyed seems to be the same as the TEXT plan, but it may be easier to systematically review complex execution plans if copied into an IDE that can hide/collapse nested JSON objects. JSON plans are also more easily parsed than TEXT plans, which may be helpful if you intend to process the metadata of the plan in some way or to render the plan within your own graphing software.

Query:

EXPLAIN (TYPE LOGICAL, FORMAT JSON)
SELECT t.transaction_date, 
       c.category_name,
       ROUND(SUM(t.total_amount), 2) AS sum_total_amount 
FROM supermarket_transactions t
INNER JOIN supermarket_transaction_details td ON t.transaction_id = td.transaction_id
INNER JOIN supermarket_products p ON p.product_id = td.product_id
INNER JOIN supermarket_categories c ON p.category = c.category_name
GROUP BY t.transaction_date,
         c.category_name
ORDER BY t.transaction_date DESC,
         c.category_name;

JSON (Logical) Execution Plan:

Query Plan { "id" : "30", "name" : "Output", "descriptor" : { "columnNames" : "[transaction_date, category_name, sum_total_amount]" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "round", "type" : "decimal(38,2)" } ], "details" : [ "category_name := category", "sum_total_amount := round" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "750", "name" : "RemoteMerge", "descriptor" : { "orderBy" : "[transaction_date DESC NULLS LAST, category ASC NULLS LAST]" }, "outputs" : [ { "symbol" : "round", "type" : "decimal(38,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "943", "name" : "LocalMerge", "descriptor" : { "orderBy" : "[transaction_date DESC NULLS LAST, category ASC NULLS LAST]" }, "outputs" : [ { "symbol" : "round", "type" : "decimal(38,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "751", "name" : "PartialSort", "descriptor" : { "orderBy" : "[transaction_date DESC NULLS LAST, category ASC NULLS LAST]" }, "outputs" : [ { "symbol" : "round", "type" : "decimal(38,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "749", "name" : "RemoteExchange", "descriptor" : { "partitionCount" : "", "type" : "REPARTITION", "isReplicateNullsAndAny" : "", "hashColumn" : "[]" }, "outputs" : [ { "symbol" : "round", "type" : "decimal(38,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "133", "name" : "Project", "descriptor" : { "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "round", "type" : "decimal(38,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" } ], "details" : [ "round := round(\"sum\", 2)" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "22", "name" : "Aggregate", "descriptor" : { "type" : "FINAL", "keys" : "[transaction_date, category]", "hash" : "[$hashvalue]" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue", "type" : "bigint" }, { "symbol" : "sum", "type" : "decimal(38,2)" } ], "details" : [ "sum := sum(\"sum_11\")" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : "NaN", "networkCost" : 0.0 } ], "children" : [ { "id" : "997", "name" : "LocalExchange", "descriptor" : { "partitioning" : "HASH", "isReplicateNullsAndAny" : "", "hashColumn" : "[$hashvalue]", "arguments" : "[\"transaction_date\", \"category\"]" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "sum_11", "type" : "varbinary" }, { "symbol" : "$hashvalue", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "1003", "name" : "RemoteExchange", "descriptor" : { "partitionCount" : "", "type" : "REPARTITION", "isReplicateNullsAndAny" : "", "hashColumn" : "[$hashvalue_12]" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "sum_11", "type" : "varbinary" }, { "symbol" : "$hashvalue_12", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "1001", "name" : "Aggregate", "descriptor" : { "type" : "PARTIAL", "keys" : "[transaction_date, category]", "hash" : "[$hashvalue_25]" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_25", "type" : "bigint" }, { "symbol" : "sum_11", "type" : "varbinary" } ], "details" : [ "sum_11 := sum(\"total_amount\")" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : "NaN", "networkCost" : 0.0 } ], "children" : [ { "id" : "1120", "name" : "Project", "descriptor" : { "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_25", "type" : "bigint" } ], "details" : [ "$hashvalue_25 := combine_hash(combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"transaction_date\"), 0)), COALESCE(\"$operator$hash_code\"(\"category\"), 0))" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "18", "name" : "InnerJoin", "descriptor" : { "criteria" : "(\"category\" = \"category_name\")", "hash" : "[$hashvalue_21, $hashvalue_22]", "distribution" : "REPLICATED" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "category", "type" : "varchar" } ], "details" : [ "Distribution: REPLICATED", "dynamicFilterAssignments = {category_name -> #df_816}" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : "NaN", "networkCost" : 0.0 } ], "children" : [ { "id" : "1118", "name" : "Project", "descriptor" : { "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_21", "type" : "bigint" } ], "details" : [ "$hashvalue_21 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"category\"), 0))" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "12", "name" : "InnerJoin", "descriptor" : { "criteria" : "(\"product_id\" = \"product_id_4\")", "hash" : "[$hashvalue_17, $hashvalue_18]", "distribution" : "REPLICATED" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "category", "type" : "varchar" } ], "details" : [ "Distribution: REPLICATED", "dynamicFilterAssignments = {product_id_4 -> #df_817}" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : "NaN", "networkCost" : 0.0 } ], "children" : [ { "id" : "1116", "name" : "Project", "descriptor" : { "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "product_id", "type" : "varchar" }, { "symbol" : "$hashvalue_17", "type" : "bigint" } ], "details" : [ "$hashvalue_17 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"product_id\"), 0))" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "6", "name" : "InnerJoin", "descriptor" : { "criteria" : "(\"transaction_id\" = \"transaction_id_0\")", "hash" : "[$hashvalue_13, $hashvalue_14]", "distribution" : "REPLICATED" }, "outputs" : [ { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "product_id", "type" : "varchar" } ], "details" : [ "Distribution: REPLICATED", "dynamicFilterAssignments = {transaction_id_0 -> #df_818}" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : "NaN", "networkCost" : 0.0 } ], "children" : [ { "id" : "1114", "name" : "ScanFilterProject", "descriptor" : { "table" : "awsdatacatalog:aept_db:supermarket_transactions", "filterPredicate" : "", "dynamicFilters" : "{\"transaction_id\" = #df_818}", "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "transaction_id", "type" : "varchar" }, { "symbol" : "transaction_date", "type" : "date" }, { "symbol" : "total_amount", "type" : "decimal(10,2)" }, { "symbol" : "$hashvalue_13", "type" : "bigint" } ], "details" : [ "$hashvalue_13 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"transaction_id\"), 0))", "transaction_id := transaction_id:string:REGULAR", "transaction_date := transaction_date:date:REGULAR", "total_amount := total_amount:decimal(10,2):REGULAR" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ ] }, { "id" : "939", "name" : "LocalExchange", "descriptor" : { "partitioning" : "HASH", "isReplicateNullsAndAny" : "", "hashColumn" : "[$hashvalue_14]", "arguments" : "[\"transaction_id_0\"]" }, "outputs" : [ { "symbol" : "transaction_id_0", "type" : "varchar" }, { "symbol" : "product_id", "type" : "varchar" }, { "symbol" : "$hashvalue_14", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "745", "name" : "RemoteExchange", "descriptor" : { "partitionCount" : "", "type" : "REPLICATE", "isReplicateNullsAndAny" : "", "hashColumn" : "[]" }, "outputs" : [ { "symbol" : "transaction_id_0", "type" : "varchar" }, { "symbol" : "product_id", "type" : "varchar" }, { "symbol" : "$hashvalue_15", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "1115", "name" : "ScanFilterProject", "descriptor" : { "table" : "awsdatacatalog:aept_db:supermarket_transaction_details", "filterPredicate" : "", "dynamicFilters" : "{\"product_id\" = #df_817}", "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "transaction_id_0", "type" : "varchar" }, { "symbol" : "product_id", "type" : "varchar" }, { "symbol" : "$hashvalue_16", "type" : "bigint" } ], "details" : [ "$hashvalue_16 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"transaction_id_0\"), 0))", "product_id := product_id:string:REGULAR", "transaction_id_0 := transaction_id:string:REGULAR" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ ] } ] } ] } ] } ] }, { "id" : "940", "name" : "LocalExchange", "descriptor" : { "partitioning" : "HASH", "isReplicateNullsAndAny" : "", "hashColumn" : "[$hashvalue_18]", "arguments" : "[\"product_id_4\"]" }, "outputs" : [ { "symbol" : "product_id_4", "type" : "varchar" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_18", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "746", "name" : "RemoteExchange", "descriptor" : { "partitionCount" : "", "type" : "REPLICATE", "isReplicateNullsAndAny" : "", "hashColumn" : "[]" }, "outputs" : [ { "symbol" : "product_id_4", "type" : "varchar" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_19", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "1117", "name" : "ScanFilterProject", "descriptor" : { "table" : "awsdatacatalog:aept_db:supermarket_products", "filterPredicate" : "", "dynamicFilters" : "{\"category\" = #df_816}", "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "product_id_4", "type" : "varchar" }, { "symbol" : "category", "type" : "varchar" }, { "symbol" : "$hashvalue_20", "type" : "bigint" } ], "details" : [ "$hashvalue_20 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"product_id_4\"), 0))", "product_id_4 := product_id:string:REGULAR", "category := category:string:REGULAR" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ ] } ] } ] } ] } ] }, { "id" : "941", "name" : "LocalExchange", "descriptor" : { "partitioning" : "HASH", "isReplicateNullsAndAny" : "", "hashColumn" : "[$hashvalue_22]", "arguments" : "[\"category_name\"]" }, "outputs" : [ { "symbol" : "category_name", "type" : "varchar" }, { "symbol" : "$hashvalue_22", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ { "id" : "747", "name" : "RemoteExchange", "descriptor" : { "partitionCount" : "", "type" : "REPLICATE", "isReplicateNullsAndAny" : "", "hashColumn" : "[]" }, "outputs" : [ { "symbol" : "category_name", "type" : "varchar" }, { "symbol" : "$hashvalue_23", "type" : "bigint" } ], "details" : [ ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : 0.0, "memoryCost" : 0.0, "networkCost" : "NaN" } ], "children" : [ { "id" : "1119", "name" : "ScanProject", "descriptor" : { "table" : "awsdatacatalog:aept_db:supermarket_categories", "projectLocality" : "LOCAL", "protectedBarrier" : "NONE" }, "outputs" : [ { "symbol" : "category_name", "type" : "varchar" }, { "symbol" : "$hashvalue_24", "type" : "bigint" } ], "details" : [ "$hashvalue_24 := combine_hash(bigint '0', COALESCE(\"$operator$hash_code\"(\"category_name\"), 0))", "category_name := category_name:string:REGULAR" ], "estimates" : [ { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 }, { "outputRowCount" : "NaN", "outputSizeInBytes" : "NaN", "cpuCost" : "NaN", "memoryCost" : 0.0, "networkCost" : 0.0 } ], "children" : [ ] } ] } ] } ] } ] } ] } ] } ] } ] } ] } ] } ] } ] } ] } ] }

EXPLAIN ANALYZE

Trino/Amazon Athena[2] also provides an EXPLAIN ANALYZE clause, given by the syntax:

EXPLAIN ANALYZE [ (option [, ...]) ] statement

where option is one of:

FORMAT { TEXT | JSON }

This executes the query and then produces an enhanced distributed plan with runtime statistics of each stage. The advantage of executing EXPLAIN ANALYZE is that you can more easily identify performance bottlenecks and tune more complex queries using accurate metrics captured during an execution. The disadvantage, of course, is that your query may take a long time to run, and thus may be a time-consuming endeavor.

Explain Analyze Difference

Figure 7: Differences between an execution plan generated by EXPLAIN and EXPLAIN ANALYZE

Note the difference between the plan generated by a normal EXPLAIN and an EXPLAIN ANALYZE. The latter contains information about the time taken in each part of the query during execution. We will go through all of the statistics generated in a future section.

Conclusion

In this post we set up tables in Amazon Athena, explored the different types of execution plan for the Athena query engine and the formats in which these plans can be displayed. In the next post, we will dive into the details of the Logical and Distributed plans, analysing each part step by step.

References

[1] - Trino Software Foundation. Trino Documentation - Explain. https://trino.io/docs/current/sql/explain.html
[2] - AWS documentation. Using EXPLAIN and EXPLAIN ANALYZE in Athena. https://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement.html
[3] - Apache DataFusion - Building Logical Plans. https://datafusion.apache.org/library-user-guide/building-logical-plans.html
[4] - AWS documentation. Use the cost-based optimizer https://docs.aws.amazon.com/athena/latest/ug/cost-based-optimizer.html
[5] - Traverso, M., Fuller, M., Moser, M (2023) Trino: The Definitive Guide. 2nd Edition. Publisher: O’Reilly Media, Inc. https://www.starburst.io/info/oreilly-trino-guide/
[6] - AWS documentation. Use parameterized queries https://docs.aws.amazon.com/athena/latest/ug/querying-with-prepared-statements.html
[7] - TrinoDB/Trino Github https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/cost/PlanCostEstimate.java
[8] - GraphViz - Open-source Graph Visualization software. Graphviz. https://graphviz.org/