PieCloudDB Database, a cloud-native virtual data warehouse built on a storage-compute separation architecture, implements a new eMPP (elastic MPP) architecture that offers high elasticity, security, and availability to support enterprises' digital transformation. PieCloudDB includes components such as a vectorized execution engine, the new storage engine JANM, and the metadata management system MUNDO, and further improves query efficiency through features such as aggregate pushdown and data skipping. This article introduces how PieCloudDB accelerates query performance with parallel computing.
When a cluster is deployed, PieCloudDB configures a fixed number of executors on each physical node. This strategy keeps the cluster highly available under heavy concurrent load. However, a single query is limited by that number of executors, which can prevent CPU, disk I/O, and network bandwidth from being fully utilized.
To address this performance bottleneck and further improve the utilization of CPU, disk, and network within a single query, PieCloudDB has developed an executor-based parallel computing acceleration feature. This technology not only complements the parallel worker mechanism of PostgreSQL but also extends its capabilities, supporting parallel scan, parallel join, and parallel aggregation. This article focuses on how parallel scanning is optimized through the cache prefetching technology of the metadata management system MUNDO and the unified storage architecture of the storage engine JANM, significantly improving the utilization of disk I/O and network bandwidth.
PieCloudDB is a distributed database: each computing cluster consists of one coordinator and multiple executors. The overall architecture is shown in the figure below:
PieCloudDB Computing Cluster Architecture
For an ordinary query, each executor node starts one scan process (without parallel computing) or several (with parallel computing) to scan the required files. At the same time, the metadata management system MUNDO on the coordinator node sends the information about the files to be scanned to the MUNDO service process on each physical node. Each scan process then obtains the specific file information it needs from the MUNDO service process, and scans and processes the corresponding data files through the storage engine JANM.
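The scan flow above can be sketched conceptually in Python. This is a minimal illustration rather than PieCloudDB's actual code: the names MundoService, JanmReader, and request_next_file are hypothetical stand-ins for the MUNDO service process and the JANM storage engine.

```python
from typing import Iterator, List, Optional


class MundoService:
    """Hypothetical stand-in for the per-node MUNDO service that hands out files."""

    def __init__(self, files: List[str]) -> None:
        self._pending = list(files)

    def request_next_file(self) -> Optional[str]:
        # Return the next data file a scan process should read, or None when done.
        return self._pending.pop(0) if self._pending else None


class JanmReader:
    """Hypothetical stand-in for the JANM storage engine reading a data file."""

    def read(self, path: str) -> Iterator[dict]:
        # In reality this would fetch and decrypt the file from object storage;
        # here it just yields a placeholder row.
        yield {"file": path, "row": 0}


def scan_process(mundo: MundoService, janm: JanmReader) -> None:
    # One scan process per executor (or several, when parallel computing is enabled).
    while (path := mundo.request_next_file()) is not None:
        for row in janm.read(path):
            pass  # hand each row to the upstream operators of the query plan


if __name__ == "__main__":
    scan_process(MundoService(["part-0.dat", "part-1.dat"]), JanmReader())
```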
The following provides a detailed look at how parallel computing acceleration works in PieCloudDB. The process can be divided into three steps along the execution path:
The first step begins at the coordinator. Because PieCloudDB adopts a metadata-compute-storage separation architecture, user data is stored uniformly, with transparent encryption, in external object storage (S3, HDFS, etc.). When a query is initiated, the metadata management system MUNDO scans the query plan, identifies the list of files required from the storage engine JANM, analyzes the characteristics of that list, processes it efficiently in batches, and pushes the first batch of files to the MUNDO service process on each physical node.
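Roughly, the batching and pushing of the file list could look like the sketch below. The batch size and the round-robin placement of files onto nodes are assumptions made purely for illustration; MUNDO's real batching and placement logic is not described by this example.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(files: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    # Split the full file list identified from JANM into fixed-size batches.
    it = iter(files)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def assign_to_nodes(batch: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    # Round-robin placement of one batch across the physical nodes (an assumption;
    # the real placement policy may differ).
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, path in enumerate(batch):
        plan[nodes[i % len(nodes)]].append(path)
    return plan


if __name__ == "__main__":
    all_files = [f"seg-{i:04d}.dat" for i in range(10)]
    first_batch = next(batches(all_files, batch_size=4))
    print(assign_to_nodes(first_batch, nodes=["node-1", "node-2"]))
```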
In the second step, after receiving the first batch of the file list, the MUNDO service process immediately enters prefetching mode. The number of files to prefetch is calculated from the prefetch requirements of each scan process on the physical node together with the sizes of the cache and of the data files. Once the prefetch list is determined, the MUNDO service issues download requests to the storage engine JANM; in parallel, the scan processes request files from the MUNDO service. Since this is the first request each scan process makes, the MUNDO service allocates files directly from the prefetch list. Each scan process then reads its files, obtains the user data, and enters the query phase.
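One plausible way to derive the prefetch depth from these inputs is sketched below. The formula is an assumption for illustration only: it caps the combined prefetch demand of the node's scan processes by the number of files the local cache can hold.

```python
def prefetch_count(scan_processes: int,
                   prefetch_per_process: int,
                   cache_bytes: int,
                   avg_file_bytes: int) -> int:
    # Demand side: how many files the node's scan processes want prefetched.
    wanted = scan_processes * prefetch_per_process
    # Supply side: how many files of this size fit into the local cache.
    cache_capacity = cache_bytes // max(avg_file_bytes, 1)
    return max(1, min(wanted, cache_capacity))


if __name__ == "__main__":
    # e.g. 4 scan processes prefetching 2 files each, a 1 GiB cache, 64 MiB files
    print(prefetch_count(4, 2, 1 << 30, 64 << 20))  # -> 8
```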
In the third step, after finishing the file it is currently scanning, a scan process requests the next file from the MUNDO service. On receiving the request, the MUNDO service hands out a file with high affinity to the files that process has already scanned. Because PieCloudDB follows a pull-based execution model built on the Volcano model, data must be passed between operators; the goal of this strategy is to reduce data movement between operators, including operator processes on other compute nodes.
As the degree of parallelism increases, for example in parallel join and parallel aggregate operations, the amount of data that must be moved grows accordingly, which can limit the achievable parallelism and even degrade performance. By assigning data files with higher affinity to each scan process, PieCloudDB effectively reduces data movement and improves performance.
Similarly, during prefetching, when the MUNDO service process pulls remote data files through the unified storage engine JANM, it also passes the affinity characteristics of those files to JANM to further optimize data access.
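A conceptual sketch of affinity-based assignment is given below. It assumes affinity can be approximated by the overlap of partition or key characteristics between a candidate file and the files a scan process has already read; the actual affinity metric used by MUNDO and the hint passed to JANM are not specified here.

```python
from typing import Dict, Optional, Set


def affinity(candidate_keys: Set[str], scanned_keys: Set[str]) -> int:
    # Higher overlap means the file is more likely to feed the same downstream
    # operators, so less data has to move between executor processes.
    return len(candidate_keys & scanned_keys)


def pick_next_file(pending: Dict[str, Set[str]], scanned_keys: Set[str]) -> Optional[str]:
    # Choose the pending file with the highest affinity to what this scan
    # process has already read; ties fall back to insertion order.
    if not pending:
        return None
    return max(pending, key=lambda path: affinity(pending[path], scanned_keys))


if __name__ == "__main__":
    pending_files = {
        "seg-a.dat": {"region=eu", "date=2024-01"},
        "seg-b.dat": {"region=us", "date=2024-02"},
    }
    already_scanned = {"region=eu", "date=2024-03"}
    print(pick_next_file(pending_files, already_scanned))  # -> seg-a.dat
```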
For parallel computing, PieCloudDB uses the GUC parameter pdb_parallel_factor to control the degree of parallelism. Currently, due to technical limitations of the Dutch optimizer, the configured degree of parallelism must be a multiple of the number of executors. In the future, PieCloudDB will support more flexible and general ways of configuring parallelism that do not depend on the Dutch optimizer, further improving the system's performance and flexibility.
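As a minimal sketch, this constraint can be checked before applying the setting. pdb_parallel_factor is the GUC named above; the helper function, the assumption that the GUC value equals the desired degree of parallelism, and the PostgreSQL-style SET statement are illustrative assumptions.

```python
def parallel_factor_sql(degree_of_parallelism: int, executor_count: int) -> str:
    # The documented constraint: the degree of parallelism must be a multiple of
    # the number of executors in the cluster.
    if degree_of_parallelism % executor_count != 0:
        raise ValueError(
            f"degree of parallelism {degree_of_parallelism} must be a multiple "
            f"of the executor count ({executor_count})"
        )
    # Assumes the GUC value equals the desired degree of parallelism and is set
    # with an ordinary PostgreSQL-style SET statement.
    return f"SET pdb_parallel_factor = {degree_of_parallelism};"


if __name__ == "__main__":
    # With 3 executors in the cluster, 6x parallelism satisfies the constraint.
    print(parallel_factor_sql(6, 3))  # SET pdb_parallel_factor = 6;
```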
With parallelism enabled, single-query performance in PieCloudDB takes a significant leap: up to 32x parallelism, performance grows nearly linearly. As shown in the figure below, the improvement PieCloudDB achieves with 6x parallelism on the TPC-H benchmark demonstrates its strong processing capability under heavy parallel workloads.
PieCloudDB Achieves Linear Performance Improvement
Going forward, we will continue to optimize PieCloudDB's parallel computing capabilities and introduce more parallel techniques, so that PieCloudDB delivers excellent performance across complex scenarios, from data-intensive query workloads to real-time applications that demand high concurrency, providing users with a more reliable and efficient data computing experience.