SQL, Structured Query Language. Because it is easy to learn, secure, flexible, and highly scalable, SQL has now become the standard relational database management language.
When connected to the database, write a SQL query and click "Execute".
SELECT name, company FROM product WHERE id=12345;
You will get the result:
name | company
------------+---------
PieCloudDB | OpenPie
(1 row)
Are you curious about what happens in the database from when you click "Execute" to when you get the result?
First of all, the tool used to connect to the database and write SQL is the client. When we finish writing the SQL and click "Execute", the SQL query is transmitted from the client to the server. After the database server receives the SQL query, it begins its work.
The execution of SQL queries generally goes through three steps: parsing, optimization, and execution, and the functional modules in the database that perform these three operations are called: parser, optimizer, and executor. Responsible for lexical and syntactic analysis of SQL queries, query optimization and execution respectively.
First, the parser identifies the semantic meaning of the query string, such as in the following SQL:
SELECT name, company FROM product WHERE id=12345;
"SELECT", "FROM", "WHERE" will be recognized as keywords, "product" will be recognized as table name, "name" and "company" will be recognized as attribute names, and "id=12345" will be recognized as filter conditions.
Next, the parser will query the system table (catalog) of the database to determine whether the table name and attribute name exist. If the lexical syntax is correct, the parser generates an abstract syntax tree (AST) of SQL. The following is an overview of the AST tree corresponding to the above SQL example.
AST tree of SQL
The parser's task is completed, and now it's the optimizer's turn. A SQL query can be executed in many ways, and the optimizer will generate the optimal execution plan based on the SQL parse tree (AST). The optimizer will optimize from two aspects: computational complexity and IO. The optimization process is generally divided into two stages: logical optimization and physical optimization.
Logical optimization is to rewrite SQL queries into equivalent forms that are easier to execute, such as rewriting of filter conditions, subquery optimization, predicate pushdown, etc. While physical optimization is different, at this stage, the SQL execution path will be selected based on the distribution characteristics of the data in the table. The database regularly collects statistical information about the data in the table. Optimization methods include how to use indexes, how to determine the connection order when connecting multiple tables, etc.
OpenPie's cloud-native virtual data warehouse, PieCloudDB Database, has developed an optimizer called "Dutch" that focuses on optimizing complex query scenarios to meet users' performance needs for querying data in the cloud. Dutch incorporates numerous optimizations and improvements, including multi-stage aggregation, aggregate pushdown, and precomputation, delivering advanced levels of optimization.
Once the execution plan is generated, the next component in line is the executor. The executor carries out various operations based on the execution plan, such as data scanning and joining. It also collaborates with the storage layer, like PieCloudDB's storage engine 「JANM」, to handle a range of operations including transactions, locks, and caching.
Executing Operations According to The Plan
In addition, PieCloudDB's new generation vectorized execution engine will also have the ability to execute query plans in parallel and use instructions such as SIMD for acceleration. Currently, PieCloudDB vectorized executor has demonstrated impressive performance improvements on TPC-H and TPC-DS, two widely adopted decisions support benchmark in the industry.
PieCloudDB Vectorized Execution Engine
When the execution plan is completed, the server will return the SQL execution results to the client and display the execution results to us.