With the development of information technology and the accelerated proliferation of internet applications, humanity has entered the digital economy era. Since the beginning of the 21st century, with the advancements in technologies such as mobile internet, Internet of Things (IoT), and 5G, the global datasphere has been exponentially expanding. IDC predicts that global data will reach 175 zettabytes by 2025, with China expected to experience explosive growth and become the world's largest data market. Data is often referred to as the "oil" of the digital economy, as it drives the progress of the intelligent and digital era, similar to how oil propelled the industrialization age.
To unlock the value of data, businesses face the challenge of storing and analyzing massive amounts of data, as well as dealing with hotspots and sudden surges in data traffic. In the face of the increasing demand for data computing, rising operational costs of data organization, and the diverse formats of data, enterprises undergoing digital transformation are confronted with significant challenges. They urgently need a database product to help maximize the utilization of data assets, reduce costs, and enable more intelligent and efficient data computing.
As early as 2019, Gartner predicted that the future of the database market lies in the cloud. In the "Market Guide for DBMS, China" released in 2022, Gartner once again pointed out that the Chinese database industry will experience accelerated growth and gradually shift towards the cloud. Over the next four years, the pace of database migration to public clouds in China will surpass the global average. In 2020, cloud databases accounted for 40% of the overall database market share, and by 2022, cloud database revenues are expected to exceed half of the total database market.
OpenPie believes that computing technology has gone through three generations of platforms: 1) the era of mainframes, 2) the era of PCs, and 3) the era of cloud computing. Each transition in computing platforms has brought the potential for breakthrough innovations in data computing technology. As computing shifted from the mainframe era to the PC era, PCs gradually replaced mainframes, significantly lowering the barrier to computing and making computing resources far more plentiful, which led to breakthrough innovations in data computing. The cloud computing era has in turn greatly reduced the cost of computing and made abundant computing resources available on demand, unlocking more opportunities for intelligent data computation.
Since the mid-1950s, data management technology has progressed from manual management to file system management and, as hardware advanced and prices fell, to database systems. Database systems have tamed the complexity of data management and provided strong support for general business systems (such as CRM, ERP, BI, reporting and visualization systems) and specialized business systems (such as e-commerce recommendation systems and bank customer profiling). They have freed developers to focus on business logic.
Many popular database systems are distributed databases, and most traditional distributed database systems adopt a Massively Parallel Processing (MPP) architecture. MPP databases use PC servers as their unit of deployment and distribute data storage and computation across a cluster, as shown in the diagram below. For example, if a wide table has 300 million records and the cluster has three PC servers, an MPP database would place 100 million records on the hard drive of each server. During data computation, all machines work in parallel, theoretically reducing the computation time to 1/n of a single-machine deployment (where n is the number of machines) and saving processing time for massive amounts of data.
Traditional MPP Database Architecture
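The partition-then-compute-in-parallel flow described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not any particular product's implementation: the function names (`distribute`, `local_sum`, `mpp_sum`) are hypothetical, round-robin placement is an assumed distribution policy, and 300 rows stand in for the 300 million records in the example. Threads are used here only to model independent nodes; the 1/n speedup is a theoretical ideal that a real cluster approaches only when data is evenly distributed and coordination overhead is small.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_nodes):
    """Place rows round-robin into n_nodes partitions (one per hypothetical server)."""
    partitions = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        partitions[i % n_nodes].append(row)
    return partitions


def local_sum(partition):
    """Work done independently by one node over its own local slice."""
    return sum(partition)


def mpp_sum(rows, n_nodes):
    """Compute a global SUM by merging the partial sums from every node."""
    partitions = distribute(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    # The coordinator merges the per-node partial results.
    return sum(partials), [len(p) for p in partitions]


total, sizes = mpp_sum(list(range(300)), 3)
print(total, sizes)  # 44850 [100, 100, 100]
```

Each node touches only its own third of the table, which is where the theoretical 1/n scan time comes from. It also shows why every node has to take part in every query in this design: the result is only correct once all partial sums have been merged.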
However, as the amount of data continues to rise, enterprises have increasingly higher demands on databases. Traditional database solutions have encountered a series of bottlenecks in their use:
With the explosive growth of data volume and computing power, the cloud-native era has emerged alongside the rapid development of cloud computing technology. In the cloud-native era, more and more enterprises are migrating their applications to the cloud, and a significant amount of data is flowing to the cloud. Public clouds bring numerous advantages:
These advantages enable cloud-native databases to reduce computing costs, provide unlimited and rich computing resources, achieve minute-level scalability and true high availability, and unleash more opportunities for data-driven insights and intelligence.
The following are typical use cases of digital enterprises in the cloud-native era:
Facing these ever-changing business demands and computing tasks, enterprises have higher requirements:
Fully integrating cloud computing and large-scale parallel processing technologies, cloud-native databases have emerged as the foundation for reliable and controllable cloud databases that can meet the requirements of the cloud-native digital era. Compared to traditional data warehouses, cloud-native databases offer clear advantages.
Traditional data warehouses tightly couple computation and storage: computing resources and storage resources are bound together in a fixed ratio. To ensure the correctness of query results, every computing node needs to participate in the execution of every query. This makes scaling, operations, and migration difficult. Because enterprise business growth is hard to predict, traditional big data systems often cannot be resized in time to analyze business data promptly, and opportunities to tap the full value of the data are missed.
Cloud-native databases separate computation and storage, avoiding resource waste. Enterprises can flexibly and cost-effectively scale storage or computing resources according to their resource requirements, improving resource utilization and saving space costs and energy consumption.
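To make the resource-waste argument concrete, here is a small back-of-the-envelope sketch comparing the two provisioning models. All figures (cores and terabytes per node, the workload size) are made-up assumptions for illustration, and the function names are hypothetical; they do not describe any specific product or pricing.

```python
import math


def coupled_plan(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Coupled model: every node brings a fixed bundle of cores and disk."""
    nodes = max(
        math.ceil(cores_needed / cores_per_node),
        math.ceil(tb_needed / tb_per_node),
    )
    return {
        "nodes": nodes,
        "idle_cores": nodes * cores_per_node - cores_needed,
        "idle_tb": nodes * tb_per_node - tb_needed,
    }


def decoupled_plan(cores_needed, tb_needed, cores_per_node):
    """Decoupled model: compute nodes and storage capacity are sized separately."""
    compute_nodes = math.ceil(cores_needed / cores_per_node)
    return {
        "compute_nodes": compute_nodes,
        "idle_cores": compute_nodes * cores_per_node - cores_needed,
        "storage_tb": tb_needed,
    }


# Assumed workload: storage-heavy, with modest compute demand.
coupled = coupled_plan(cores_needed=64, tb_needed=400, cores_per_node=32, tb_per_node=10)
decoupled = decoupled_plan(cores_needed=64, tb_needed=400, cores_per_node=32)
print(coupled)    # {'nodes': 40, 'idle_cores': 1216, 'idle_tb': 0}
print(decoupled)  # {'compute_nodes': 2, 'idle_cores': 0, 'storage_tb': 400}
```

Under these assumed numbers, the coupled cluster must grow to 40 nodes just to hold the data, leaving most of its cores idle, while the decoupled layout needs only two compute nodes on top of 400 TB of storage.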
Traditional data warehouses often require significant manpower for database installation and debugging. The "instant availability" of cloud-native databases saves enterprises a considerable amount of operational expense. Because compute nodes are deployed in the cloud, they are not bound to a physical location. They can be managed anytime, anywhere via the internet, without the enterprise having to maintain any hardware of its own. Data is readily available, and users do not need to deal with backend technical issues. This opens up shortcuts for cross-departmental and cross-regional data sharing and collaboration, facilitating the globalization process for enterprises.
Traditional data warehouses store files and resources on the same host and compensate for node downtime with primary and backup copies of each node's data, which severely affects data timeliness and increases operational cost and difficulty. Cloud-native databases provide high availability that is transparent to users, with automatic failover and automated disaster recovery. This avoids the impact of single-point failures on business operations and protects data. When services need to be upgraded or changed, nodes can be upgraded gradually without interrupting the service.
The black box nature of traditional databases prevents enterprises from promptly addressing scalability and node failure issues. Deployed in the cloud, cloud-native databases benefit from the agility and efficiency of cloud computing. They offer high elasticity and performance and can handle petabyte-scale data. Services are independent of each other, providing multiple layers of security protection and additional fault tolerance. Highly automated operations and maintenance tools support more frequent updates to business applications, giving enterprises greater agility and faster iteration. Thanks to the separation of storage and compute, enterprises can flexibly scale up or down and migrate workloads more easily.
The high costs of traditional databases, including software and hardware, result in significant upfront investments. Cloud computing provides cloud-native databases with nearly unlimited low-cost storage space, reducing costs associated with traditional database room planning and server procurement while eliminating management burdens for enterprises. With the subscription-based model and dynamic scalability of cloud-native databases, enterprises can expand resources based on their needs, avoiding resource waste and achieving higher cost-effectiveness compared to traditional databases.
OpenPie, with the mission of "Data Computing for New Discoveries" and the product concept of "Big Data Promises finally Come True," has launched the cloud-native virtual data warehouse PieCloudDB. PieCloudDB also introduces a new elastic Massively Parallel Processing (eMPP) distributed technology. It constructs a brand-new data computing platform with a cloud-native, analytics-oriented distributed database at its core. It aims to provide enterprises with capabilities such as real-time processing, elastic scalability, elastic computation, and integrated data analysis in the cloud, helping enterprises maximize data value and create new advantages for high-quality development. It also serves as a reliable and controllable cloud database foundation for new infrastructure construction.
Built on a cloud computing architecture, PieCloudDB's eMPP (elastic MPP) parallel computing is designed to address the limitations of traditional PC-based databases and to incorporate the advantages of cloud-native databases described above. PieCloudDB separates computation and storage, allowing each to scale independently and elastically in the cloud and avoiding resource waste. Enterprises can scale storage or compute resources separately based on their business requirements, resulting in improved resource utilization, space cost savings, and reduced energy consumption.
PieCloudDB's New Cloud-Native eMPP Database Architecture
What's even more exciting is that PieCloudDB allows users to simultaneously activate multiple clusters for data computation within the cloud. For example, if an airline's booking system has already activated a 3-node cluster for data analysis, their membership system can activate an additional 4-node cluster for data computation, and so on. Users can open clusters with any number of nodes anytime and anywhere for data computation in new applications. In PieCloudDB, users can continuously store all data in the cloud, enabling true data sharing for existing and future applications, thus helping users realize their big data dreams of "Big Data Promises finally Come True."
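The airline example can be modeled with a toy sketch in which two independently sized compute clusters query the same shared data. Everything here is hypothetical and for illustration only: `SharedStorage` and `ComputeCluster` are invented names, the fare values are made up, and none of this reflects PieCloudDB's actual interfaces or internals.

```python
class SharedStorage:
    """Stand-in for cloud storage that every cluster can read."""

    def __init__(self):
        self._tables = {}

    def write(self, name, rows):
        self._tables[name] = list(rows)

    def read(self, name):
        return self._tables[name]


class ComputeCluster:
    """A cluster of n_nodes workers that holds no data of its own."""

    def __init__(self, name, n_nodes, storage):
        self.name = name
        self.n_nodes = n_nodes
        self.storage = storage

    def total(self, table):
        rows = self.storage.read(table)
        # Node i scans every n-th row starting at offset i; partials are then merged.
        partials = [sum(rows[i::self.n_nodes]) for i in range(self.n_nodes)]
        return sum(partials)


storage = SharedStorage()
storage.write("bookings", [120, 80, 200, 50, 310, 95, 140])  # made-up fares

booking_cluster = ComputeCluster("booking", 3, storage)
membership_cluster = ComputeCluster("membership", 4, storage)

print(booking_cluster.total("bookings"))     # 995
print(membership_cluster.total("bookings"))  # 995
```

Because the data lives in one shared store rather than on the nodes themselves, a cluster of any size can be started for a new application without copying or redistributing the data first.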