On February 17th, PostgreConf.CN 2022 was held, jointly organized by the China OSS Promotion Union PostgreSQL Branch, ISCAS and CSDN. OpenPie, a rising company in the field of cloud computing, was invited to attend, and PieCloudDB software engineer Haozhou Wang delivered a talk at the event.
The main contents of this speech include:
In recent years, data security has been the subject of many new policies and news stories, and governments, enterprises, and vendors have all placed great emphasis on it. Past cases have shown that the leakage or loss of sensitive data can lead to significant losses for both businesses and users. For cloud users, the security of their cloud data is one of their top concerns.
As a cloud-native virtual data warehouse, PieCloudDB separates storage and computing and has implemented security features in three major areas to provide end-to-end data security: cloud-native security, storage security, and compute security.
For cloud-native security, PieCloudDB implements transport layer encryption and cache data encryption. In PieCloudDB, data is transferred between different nodes, making data encryption during the transmission process crucial. When executing user queries, PieCloudDB generates corresponding cache data to accelerate the generation of query results. The cache data contains a portion of user data, and encrypting the cache data ensures better data security for users.
Regarding storage security, metadata serves as the "heart" of the database, and any loss or damage to it would render the database unusable. Therefore, PieCloudDB implements persistent storage for metadata. User data in PieCloudDB is stored encrypted with multiple copies to ensure thorough data security.
For compute security, PieCloudDB adopts an eMPP architecture that separates storage and computing, and it supports ACID (Atomicity, Consistency, Isolation, Durability). If one or more nodes in the cluster fail, user data is neither lost nor damaged.
The rest of this article focuses on how PieCloudDB uses its TDE (Transparent Data Encryption) module to encrypt cache data at the cloud-native security layer and to store user data encrypted at the storage security layer.
The TDE module of PieCloudDB moves data from traditional plaintext storage to encrypted storage. This prevents system administrators from reading the data directly and so protects the data stored in data files. With TDE, key generation, key management, and encryption/decryption are all handled automatically by the database management system, without the user needing to be aware of them. In other words, encryption and decryption are transparent to the user: data is automatically encrypted when it is written and automatically decrypted when it is read, so reading and writing data remains convenient.
Furthermore, PieCloudDB's TDE module is not dependent on public cloud/private cloud/system-provided encryption, truly achieving self-controllability. It also meets user compliance requirements such as data security auditing and business security auditing.
During the design and implementation process of PieCloudDB's TDE module, the development team fully considered and summarized user requirements and technical challenges, resulting in the following key features being implemented:
The TDE module adopts the widely used multi-level key mechanism and an effective key management system to prevent the risk of data leakage and ensure the security of user data. Each tenant's data is completely isolated and has its own independent key system.
Implementation of Key Management
In PieCloudDB, the master key is generated, stored, and controlled by the user within their trusted domain. The TDE module does not attempt to access the master key, ensuring its security.
Furthermore, PieCloudDB encrypts at the granularity of data pages, and each data page is encrypted with its own key. Encryption keys support rotation, either on a fixed schedule or when specific conditions are met, to maximize data security. Key rotation does not interrupt service and requires no downtime.
PieCloudDB employs a three-level key mechanism, which allows higher-level keys to be rotated without decrypting or re-encrypting the data itself. Only the keys are decrypted and re-encrypted during rotation. Additionally, the TDE module supports key rotation at the page level or table level, so the entire user cluster does not have to be brought down for key rotation and the impact on user operations is minimized.
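The rotation property described above can be illustrated with a minimal sketch. It is not PieCloudDB's implementation: the function names (`seal`, `unseal`, `rotate_table_key`) are hypothetical, and the cipher is a toy HMAC-SHA256 keystream used only as a stand-in for a real algorithm such as AES so that the example runs on the Python standard library alone.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode). Stand-in for a real cipher."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt under `key`; a fresh random nonce is prepended to the result."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    """Inverse of seal()."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def rotate_table_key(
    old_table_key: bytes, wrapped_page_keys: List[bytes]
) -> Tuple[bytes, List[bytes]]:
    """Re-wrap every page key under a fresh table key.

    Only the small wrapped keys are rewritten; the encrypted page data,
    which is protected by the unchanged page keys, is never touched.
    """
    new_table_key = secrets.token_bytes(32)
    rewrapped = [
        seal(new_table_key, unseal(old_table_key, wrapped))
        for wrapped in wrapped_page_keys
    ]
    return new_table_key, rewrapped
```

Because the page keys themselves do not change, any page encrypted before the rotation can still be read afterwards by unwrapping its page key with the new table key.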
Key storage is a crucial part of the TDE module. If the keys are lost, the data cannot be decrypted and becomes inaccessible. All subkeys in the TDE module are persistently stored to ensure their security. The underlying page-level keys coexist with the data and are backed up in multiple copies to prevent irretrievable loss due to the loss of page-level keys. Moreover, as page-level keys can be large, their coexistence with the data eliminates the need for frequent access to persistent storage during key retrieval, reducing query latency.
PieCloudDB Multi-Level Key
In the multi-level key mechanism, the user creates the master key within their trusted domain and stores it securely. The master key never leaves the user's trusted domain. When a new tenant is created, the TDE module in PieCloudDB generates a new tenant key, has it encrypted with the master key, and stores the encrypted key in the persistent storage area of PieCloudDB.
When a user creates a table, the TDE module in PieCloudDB automatically generates a table key, encrypts it with the tenant key, and stores the encrypted key in the persistent storage area of PieCloudDB. When data is written and a new data page needs to be created, the TDE module generates a new page key, encrypts it with the table key, and stores the encrypted page key together with the data in the data storage area. Storing the page key in the data storage area instead of the persistent storage area helps reduce the access density of the persistent storage area, lower its load, minimize user query latency, and improve performance. When a page is deleted, its corresponding key is also deleted.
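The key chain described above (master key, tenant key, table key, page key) can be sketched as follows. This is an illustration under stated assumptions, not the actual PieCloudDB code: the class `TenantKeyHierarchy`, its method names, and the two dictionaries standing in for the persistent storage area and the data storage area are all hypothetical, and the cipher is a toy HMAC-SHA256 keystream rather than a production algorithm. For brevity the master key is held by the same object, whereas in the described system it stays in the user's trusted domain.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode). Stand-in for a real cipher."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class TenantKeyHierarchy:
    """Hypothetical model of the multi-level key chain."""

    def __init__(self, master_key: bytes) -> None:
        self._master_key = master_key  # simplification: see the note above
        # Encrypted tenant and table keys live in the persistent storage area.
        self.persistent_area: Dict[Tuple[str, ...], bytes] = {}
        # Each page is stored next to its own encrypted page key.
        self.data_area: Dict[int, Tuple[bytes, bytes]] = {}

    def create_tenant(self, tenant: str) -> None:
        tenant_key = secrets.token_bytes(32)
        self.persistent_area[("tenant", tenant)] = seal(self._master_key, tenant_key)

    def create_table(self, tenant: str, table: str) -> None:
        table_key = secrets.token_bytes(32)
        wrapped = seal(self._tenant_key(tenant), table_key)
        self.persistent_area[("table", tenant, table)] = wrapped

    def write_page(self, tenant: str, table: str, page_id: int, payload: bytes) -> None:
        page_key = secrets.token_bytes(32)
        wrapped = seal(self._table_key(tenant, table), page_key)
        self.data_area[page_id] = (wrapped, seal(page_key, payload))

    def read_page(self, tenant: str, table: str, page_id: int) -> bytes:
        wrapped, blob = self.data_area[page_id]
        page_key = unseal(self._table_key(tenant, table), wrapped)
        return unseal(page_key, blob)

    def delete_page(self, page_id: int) -> None:
        # The page key is stored with the page, so it is deleted with it.
        del self.data_area[page_id]

    def _tenant_key(self, tenant: str) -> bytes:
        return unseal(self._master_key, self.persistent_area[("tenant", tenant)])

    def _table_key(self, tenant: str, table: str) -> bytes:
        wrapped = self.persistent_area[("table", tenant, table)]
        return unseal(self._tenant_key(tenant), wrapped)
```

Note that reading a page touches the persistent area only for the tenant and table keys; the page key arrives together with the page itself.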
Modular Implementation of TDE
In PieCloudDB, TDE is implemented in a modular manner. The optimizer and executor components are not aware of the existence of the TDE module, so ongoing development and optimization of core database modules such as the optimizer and executor does not need to take the TDE module into account. The two are also decoupled during upgrades, which fits the decoupled nature of cloud-native design.
Furthermore, the TDE module seamlessly supports PieCloudDB's storage engine, "JANM", maximizing storage performance and minimizing the performance loss caused by encryption. According to test data, the performance impact of TDE on user queries is negligible.
To meet different user requirements, PieCloudDB implements a pluggable encryption algorithm library that can use different encryption algorithms depending on the hardware and take advantage of hardware acceleration. The library offers a range of encryption algorithms, including DES, AES, RC4, and others. Support for domestic cryptographic standards has also been added to meet auditing and compliance requirements. To achieve true autonomy and controllability, users can choose which encryption algorithm is used.
PieCloudDB's TDE module integrates seamlessly with users' existing business processes. Enabling TDE has no impact on users' existing queries or operations, and no changes to existing business processes are required. Peripheral components do not need to be adapted to the TDE module either. For example, when an ETL (Extract, Transform, Load) tool imports data, the data is automatically encrypted; when data is exported to an ETL tool, it is automatically decrypted. The ETL tool never reads encrypted data, so tasks do not fail for that reason, and the tool is unaware of the TDE module's presence.
Implementation of TDE
First, let's look at the architecture of the TDE component in PieCloudDB. When a user initiates a query request, the PieCloudDB optimizer generates an execution plan tree, and the executor executes the query based on this plan tree. When data needs to be accessed, the executor calls the storage interface provided by PieCloudDB's storage engine, "JANM", to perform operations such as querying and inserting data. PieCloudDB adds the TDE component between the storage interface and data access. Because the component sits below the storage interface, the optimizer and executor are unaware of it.
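The layering described above can be sketched as a wrapper that exposes the same interface as the storage backend. The classes `PageStore` and `TdeLayer` and the function `executor_roundtrip` are hypothetical names chosen for illustration; a single key and a toy HMAC-SHA256 keystream cipher are used to keep the example short, whereas the described system uses per-page keys and real encryption algorithms.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode). Stand-in for a real cipher."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class PageStore:
    """Plain data access: keeps whatever bytes it is given."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}

    def write_page(self, page_id: int, data: bytes) -> None:
        self.pages[page_id] = data

    def read_page(self, page_id: int) -> bytes:
        return self.pages[page_id]


class TdeLayer:
    """Same interface as PageStore; encrypts on write and decrypts on read."""

    def __init__(self, backend: PageStore, key: bytes) -> None:
        self._backend = backend
        self._key = key

    def write_page(self, page_id: int, data: bytes) -> None:
        self._backend.write_page(page_id, seal(self._key, data))

    def read_page(self, page_id: int) -> bytes:
        return unseal(self._key, self._backend.read_page(page_id))


def executor_roundtrip(storage, page_id: int, rows: bytes) -> bytes:
    """Stands in for the executor: it only ever sees the storage interface."""
    storage.write_page(page_id, rows)
    return storage.read_page(page_id)
```

The caller behaves identically whether it is handed a `PageStore` or a `TdeLayer`, which is the sense in which the upper layers stay unaware of encryption.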
The TDE component in PieCloudDB consists of three modules: the encryption module, key management module, and function interface. To achieve pluggable encryption algorithm libraries, the encryption module is independent of the key management module. This allows PieCloudDB to provide different interfaces for different TDE algorithm libraries while ensuring the consistency of the internal encryption processes. This design reduces the likelihood of bugs and ensures a smoother development process, leading to a more unified user experience.
TDE Component Architecture
Tenant key generation involves several steps. When a tenant creation request is made, the PieCloudDB TDE module uses a strong random algorithm to generate a new key, which is passed to the user's trust domain. There, the master key is used to encrypt it, producing the encrypted tenant key. The entire encryption step is performed within the user's trust domain, and PieCloudDB does not attempt to access the master key. Once the tenant key has been encrypted within the user's trust domain, it is returned and stored in the persistent storage area.
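The trust-domain boundary can be modeled as an object that holds the master key and exposes only wrap and unwrap operations. This is a sketch under assumptions: `TrustDomain`, `create_tenant`, and the dictionary used as the persistent storage area are hypothetical, and the cipher is a toy HMAC-SHA256 keystream rather than a real key-wrapping algorithm.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher core (HMAC-SHA256 counter keystream XORed with the data)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        msg = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class TrustDomain:
    """Hypothetical user-side service. The master key never leaves this object."""

    def __init__(self) -> None:
        self.__master_key = secrets.token_bytes(32)

    def wrap(self, key: bytes) -> bytes:
        """Encrypt a key under the master key and return only the ciphertext."""
        nonce = secrets.token_bytes(16)
        return nonce + _xor_stream(self.__master_key, nonce, key)

    def unwrap(self, blob: bytes) -> bytes:
        """Decrypt a previously wrapped key."""
        return _xor_stream(self.__master_key, blob[:16], blob[16:])


def create_tenant(domain: TrustDomain, persistent_area: Dict[str, bytes], tenant_id: str) -> bytes:
    """Database side: generate a tenant key and persist only its wrapped form."""
    tenant_key = secrets.token_bytes(32)  # strong random source
    persistent_area[tenant_id] = domain.wrap(tenant_key)
    return tenant_key
```

The database side sees the plaintext tenant key and its wrapped form, but never the master key itself.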
Generation of Tenant Keys
When the tenant key is required, the PieCloudDB TDE module reads the encrypted key from the persistent storage area and transfers it to the user's trust domain. Using the master key, the encrypted key is decrypted, and the decrypted key is stored in the key storage area. Additionally, the PieCloudDB TDE module sets a timer at this stage. After a certain period of time, the key is automatically destroyed to avoid holding the decryption key for an extended period, thereby reducing the risk of key leakage.
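The time-limited holding of decrypted keys can be sketched with a small cache. This is an assumption-laden illustration: the class `ExpiringKeyCache` is hypothetical, and instead of a background timer it checks the deadline on access, which keeps the example deterministic. The zeroing step is best effort only, since Python cannot guarantee that no other copies of the key bytes exist in memory.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringKeyCache:
    """Holds decrypted keys for at most `ttl_seconds` (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytearray, float]] = {}

    def put(self, key_id: str, key: bytes) -> None:
        """Store a decrypted key together with its destruction deadline."""
        self._entries[key_id] = (bytearray(key), self._clock() + self._ttl)

    def get(self, key_id: str) -> Optional[bytes]:
        """Return the key, or None if it is absent or has expired."""
        entry = self._entries.get(key_id)
        if entry is None:
            return None
        buf, deadline = entry
        if self._clock() >= deadline:
            self._destroy(key_id)
            return None  # caller must fetch and decrypt the key again
        return bytes(buf)

    def _destroy(self, key_id: str) -> None:
        buf, _ = self._entries.pop(key_id)
        for i in range(len(buf)):
            buf[i] = 0  # best-effort overwrite of the cached copy
```

When `get` returns `None`, the caller would repeat the retrieval path described above: read the encrypted key from persistent storage and have it decrypted in the trust domain.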
Retrieval of Tenant Keys
The generation process of secondary keys, including table keys and page keys, is similar to that of tenant keys. When a user initiates a query request, the TDE module of PieCloudDB uses a strong random algorithm to generate a key. This key is then encrypted using the higher-level key and stored in the corresponding storage area.
Generation of Tenant Subkeys
When it comes to retrieving the secondary keys, such as table keys and page keys, the process is similar to that of retrieving the tenant keys. The PieCloudDB TDE module reads the encrypted keys from the corresponding storage area and decrypts them using the higher-level key.
However, the handling of page keys is slightly different. The decrypted page keys are kept in memory rather than in the storage area, to avoid adding load to the storage. A timer ensures that the keys are completely destroyed from memory after a certain period. While the keys are in memory, the TDE module also mixes random values into them so that they cannot be read directly from memory, reducing the risk of leakage. No decrypted keys are stored on disk or in the buffer area, which prevents key leakage due to uncontrollable factors.
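The source does not say how random values are combined with the in-memory keys. One possible reading, shown here purely as an assumption, is XOR masking: the key is stored only in masked form next to a random mask and is recombined at the moment of use. The class `MaskedKey` is hypothetical.

```python
import secrets


class MaskedKey:
    """Keeps a key in memory only as (key XOR mask) plus the mask itself.

    Assumed technique for illustration; not confirmed by the source.
    """

    def __init__(self, key: bytes) -> None:
        self._mask = bytearray(secrets.token_bytes(len(key)))
        self._masked = bytearray(k ^ m for k, m in zip(key, self._mask))

    def reveal(self) -> bytes:
        """Recombine the two halves to obtain the key just before it is used."""
        return bytes(a ^ b for a, b in zip(self._masked, self._mask))

    def destroy(self) -> None:
        """Overwrite both buffers with zeros (best effort in Python)."""
        for buf in (self._mask, self._masked):
            for i in range(len(buf)):
                buf[i] = 0
```

A plain memory scan for the key bytes would not find them in either buffer alone, although an attacker who can locate both buffers can still recombine them, so this raises the bar rather than eliminating the risk.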
Retrieval of Tenant Subkeys
With security features such as TDE, PieCloudDB has achieved file-level security protection, effectively preventing data leakage and creating a more reliable data security fortress for users.