300x Performance Improvement! PieCloudDB Database Optimizer "Dutch" New Feature "Agg Pushdown"

As the pace of digitization accelerates, global data volumes are experiencing exponential growth, posing significant challenges to database systems' performance. The optimizer, as a critical technology within database management systems, has a profound impact on database performance and efficiency. It achieves this by parsing and optimizing user query requests to generate efficient execution plans, thus rapidly returning query results. However, the same SQL statement, under different optimization decisions, can result in orders of magnitude difference in performance. With increasing data scale and query complexity, the ongoing development and innovation of the optimizer will be a vital direction for continually improving database system performance and a key competitive advantage.


For cloud-native and distributed scenarios, PieCloudDB Database, a cloud-native virtual data warehouse, has designed and built a new-generation optimizer named "Dutch." This name draws inspiration from an NPC character in the video game "Red Dead Redemption" whose catchphrase "I have a plan" aligns well with the optimizer's functionality.


The optimizer "Dutch" in PieCloudDB has implemented numerous optimization features that act as the "think tank" of the database system, ensuring user query performance and efficiency. "Dutch" achieves faster and more efficient query processing by intelligently analyzing query statements, optimizing query plans, and leveraging distributed data storage and computing capabilities.


The mascot of OpenPie: "Pai Pai"


Aggregation Operations and Aggregation Pushdown


Today, we will delve into the new feature of "Dutch" in detail: Aggregation Pushdown. In modern database systems, aggregation operations are very common. Through aggregation operations, multiple rows of data can be combined into one and various calculations like statistics, sum, and average can be performed.


In traditional query execution, aggregation operations typically occur after the query results are returned, requiring all data to be transferred to the query engine for aggregation calculations. However, for large-scale datasets and complex queries, this approach can lead to issues like poor query performance, high computational load, and expensive processing costs.


To address these challenges, PieCloudDB's optimizer "Dutch" has introduced the feature of Aggregation Pushdown, which, in many scenarios, has been tested to achieve performance improvements of hundreds or even thousands of times.


Principles of Aggregation Pushdown


Aggregation Pushdown is a database query optimization technique that reduces data transfer and processing overhead by pushing aggregation operations down to the data source, thereby enhancing query performance and efficiency.


The implementation principle behind Aggregation Pushdown is to perform aggregation operations as early as possible during query execution and push them down to the data source for processing. Specifically, when the query plan is generated, the optimizer pushes aggregation operations down to the data source to perform aggregation calculations there. This reduces data transfer, alleviates the burden on the query engine, and leverages the data source's computing capabilities for local aggregation.


Next, we will use a simple diagram to illustrate the main implementation principles of Aggregation Pushdown:


In the diagram above, the left side represents the execution flow without Aggregation Pushdown, while the right side depicts the execution flow with Aggregation Pushdown. In the scenario without Aggregation Pushdown (left side), tables T1 and T2 are first joined, and the aggregation operation is performed on the resulting dataset.


In contrast, in the scenario with Aggregation Pushdown (right side), an aggregation operation is performed on table T1 before joining it with table T2. This can significantly reduce the volume of data involved in table T1 during the join operation in certain scenarios, resulting in a significant improvement in query performance. To ensure accurate query optimization, both of these execution plans are generated in "Dutch" and the one with the lower estimated cost is selected based on a cost model.


When more tables are involved in the join, "Dutch" attempts to push the aggregation operation to different join levels and selects the optimal pushdown location based on cost comparisons.


Aggregation Pushdown Demonstration Example


Now let's look at the effect of Aggregation Pushdown through a query example. First, create a test table using the following statement:


CREATE TABLE t (x int, y int);
INSERT INTO t SELECT i % 30, i % 30 FROM generate_series(1, 10240) i;

The query used for testing is as follows:


SELECT t1.x, sum(t2.y + t3.y), count(*) FROM t t1
JOIN t t2 ON t1.x = t2.x JOIN t t3 ON t2.x = t3.x
GROUP BY t1.x;

When there is no aggregation pushdown, the query plan is as follows:

SELECT t1.x, sum(t2.y + t3.y), count(*) FROM t t1 JOIN t t2 ON t1.x = t2.x JOIN t t3 ON t2.x = t3.x GROUP BY t1.x; 

                                   QUERY PLAN 


Gather Motion 3:1  (slice1; segments: 3) (actual time=153884.859..274102.066 rows=30 loops=1) 
   ->  HashAggregate (actual time=274100.004..274100.011 rows=12 loops=1) 
         Group Key: t1.x 
         Peak Memory Usage: 0 kB 
         ->  Hash Join (actual time=38.717..100579.782 rows=477571187 loops=1) 
               Hash Cond: (t1.x = t3.x) 
               Extra Text: (seg0)   Hash chain length 341.4 avg, 342 max, using 12 of 131072 buckets. 
               ->  Hash Join (actual time=2.088..429.203 rows=1398787 loops=1) 
                     Hash Cond: (t1.x = t2.x) 
                     Extra Text: (seg0)   Hash chain length 341.4 avg, 342 max, using 12 of 131072 buckets. 
                     ->  Redistribute Motion 3:3  (slice2; segments: 3) (actual time=0.044..4.590 rows=4097 loops=1) 
                           Hash Key: t1.x 
                           ->  Seq Scan on t t1 (actual time=1.382..32.683 rows=3496 loops=1) 
                     ->  Hash (actual time=1.760..1.761 rows=4097 loops=1) 
                           Buckets: 131072  Batches: 1  Memory Usage: 1185kB 
                           ->  Redistribute Motion 3:3  (slice3; segments: 3) (actual time=0.049..0.922 rows=4097 loops=1) 
                                 Hash Key: t2.x 
                                 ->  Seq Scan on t t2 (actual time=1.628..32.837 rows=3496 loops=1) 
               ->  Hash (actual time=36.153..36.153 rows=4097 loops=1) 
                     Buckets: 131072  Batches: 1  Memory Usage: 1185kB 
                     ->  Redistribute Motion 3:3  (slice4; segments: 3) (actual time=3.918..35.169 rows=4097 loops=1) 
                           Hash Key: t3.x 
                           ->  Seq Scan on t t3 (actual time=1.380..30.316 rows=3496 loops=1) 

Planning Time: 8.810 ms 
   (slice0) Executor memory: 257K bytes. 
   (slice1) Executor memory: 2484K bytes avg x 3 workers, 2570K bytes max (seg0).  Work_mem: 1185K bytes max. 
   (slice2) Executor memory: 32840K bytes avg x 3 workers, 32841K bytes max (seg0). 
   (slice3) Executor memory: 32860K bytes avg x 3 workers, 32860K bytes max (seg0). 
   (slice4) Executor memory: 32860K bytes avg x 3 workers, 32860K bytes max (seg0). 

Memory used:  128000kB 
Optimizer: Postgres query optimizer 
Execution Time: 274130.589 ms
(32 rows) 

And when there is aggregation pushdown, the query plan is as follows:

SELECT t1.x, sum(t2.y + t3.y), count(*) FROM t t1 JOIN t t2 ON t1.x = t2.x JOIN t t3 ON t2.x = t3.x GROUP BY t1.x; 

                                   QUERY PLAN 


Gather Motion 3:1  (slice1; segments: 3) (actual time=835.755..836.406 rows=30 loops=1) 
   ->  Finalize GroupAggregate (actual time=834.227..835.432 rows=12 loops=1) 
         Group Key: t1.x 
         ->  Sort (actual time=834.031..834.441 rows=4097 loops=1) 
               Sort Key: t1.x 
               Sort Method:  quicksort  Memory: 1266kB 
               ->  Redistribute Motion 3:3  (slice2; segments: 3) (actual time=812.139..830.706 rows=4097 loops=1) 
                     Hash Key: t1.x 
                     ->  Hash Join (actual time=810.536..828.097 rows=3496 loops=1) 
                           Hash Cond: (t1.x = t2.x) 
                           Extra Text: (seg0)   Hash chain length 1.0 avg, 1 max, using 30 of 131072 buckets. 
                           ->  Seq Scan on t t1 (actual time=1.689..16.674 rows=3496 loops=1) 
                           ->  Hash (actual time=808.497..808.498 rows=30 loops=1) 
                                 Buckets: 131072  Batches: 1  Memory Usage: 1026kB 
                                 ->  Broadcast Motion 3:3  (slice3; segments: 3) (actual time=461.065..808.466 rows=30 loops=1) 
                                       ->  Partial HashAggregate (actual time=810.026..810.033 rows=12 loops=1) 
                                             Group Key: t2.x 
                                             Peak Memory Usage: 0 kB 
                                             ->  Hash Join (actual time=28.070..331.181 rows=1398787 loops=1) 
                                                   Hash Cond: (t2.x = t3.x) 
                                                   Extra Text: (seg0)   Hash chain length 341.4 avg, 342 max, using 12 of 262144 buckets. 
                                                   ->  Redistribute Motion 3:3  (slice4; segments: 3) (actual time=0.040..1.270 rows=4097 loops=1) 
                                                         Hash Key: t2.x 
                                                         ->  Seq Scan on t t2 (actual time=1.449..19.963 rows=3496 loops=1) 
                                                   ->  Hash (actual time=27.834..27.835 rows=4097 loops=1) 
                                                         Buckets: 262144  Batches: 1  Memory Usage: 2209kB 
                                                         ->  Redistribute Motion 3:3  (slice5; segments: 3) (actual time=3.836..27.025 rows=4097 loops=1) 
                                                               Hash Key: t3.x 
                                                               ->  Seq Scan on t t3 (actual time=1.537..23.654 rows=3496 loops=1) 

Planning Time: 14.425 ms 
   (slice0)    Executor memory: 328K bytes. 
   (slice1)    Executor memory: 408K bytes avg x 3 workers, 514K bytes max (seg0).  Work_mem: 450K bytes max. 
   (slice2)    Executor memory: 33951K bytes avg x 3 workers, 33952K bytes max (seg0).  Work_mem: 1026K bytes max. 
   (slice3)    Executor memory: 2298K bytes avg x 3 workers, 2341K bytes max (seg0).  Work_mem: 2209K bytes max. 
   (slice4)    Executor memory: 32860K bytes avg x 3 workers, 32860K bytes max (seg0). 
   (slice5)    Executor memory: 32860K bytes avg x 3 workers, 32860K bytes max (seg0). 

Memory used:  128000kB 
Optimizer: Postgres query optimizer 
Execution Time: 865.305 ms 
(39 rows) 

By comparing the two query plans, it can be seen that without aggregation pushdown (Query Plan 1), all join operations need to be completed first, followed by the aggregation operation on the joined dataset. In contrast, with aggregation pushdown (Query Plan 2), the aggregation operation is pushed down to the optimal join layer, reducing the amount of data for subsequent join operations, which significantly improves performance.


In this example, "Dutch" pushed the aggregation operation down after the t2 and t3 joins but before the t1 join. By comparing the execution times, we can see that the execution time has decreased from 274,130.589 ms to 865.305 ms, achieving a performance improvement of over three hundred times. As the data volume increases or more tables are involved in the query, the performance improvement from aggregation pushdown becomes even more pronounced.


PieCloudDB's new optimizer "Dutch" provides users with a comprehensive set of logical optimization features, including predicate pushdown, subquery and subjoin promotion, and outer join elimination, among others. As a database designed for online analytical processing (OLAP) scenarios, the optimizer "Dutch" not only supports aggregation pushdown but also comes with advanced features like Data Skipping and pre-computation to meet the demands of various complex queries.

Related Blogs:
no related blog