PieCloudDB Database: the Cloud Native Virtual Data Warehouse

OCTOBER 14TH, 2022

With the development of information technology and the accelerated proliferation of internet applications, humanity has entered the digital economy era. Since the beginning of the 21st century, with the advancements in technologies such as mobile internet, Internet of Things (IoT), and 5G, the global datasphere has been exponentially expanding. IDC predicts that global data will reach 175 zettabytes by 2025, with China expected to experience explosive growth and become the world's largest data market. Data is often referred to as the "oil" of the digital economy, as it drives the progress of the intelligent and digital era, similar to how oil propelled the industrialization age.

To unlock the value of data, businesses face the challenge of storing and analyzing massive amounts of data, as well as dealing with hotspots and sudden surges in data traffic. In the face of the increasing demand for data computing, rising operational costs of data organization, and the diverse formats of data, enterprises undergoing digital transformation are confronted with significant challenges. They urgently need a database product to help maximize the utilization of data assets, reduce costs, and enable more intelligent and efficient data computing.

The Future of Databases Lies in the Cloud

As early as 2019, Gartner predicted that the future of the database market lies in the cloud. In the "Market Guide for DBMS, China" released in 2022, Gartner once again pointed out that the Chinese database industry will experience accelerated growth and gradually shift towards the cloud. Over the next four years, the pace of database migration to public clouds in China will surpass the global average. In 2020, cloud databases accounted for 40% of the overall database market share, and by 2022, cloud database revenues are expected to exceed half of the total database market.

OpenPie believes that computing technology has gone through three generations of platforms: 1) the era of mainframes, 2) the era of PCs, and 3) the era of cloud computing. Each transition between computing platforms has brought the potential for breakthrough innovations in data computing technology. As computing shifted from the mainframe era to the PC era, PCs gradually replaced mainframes, significantly lowering the barrier to computing and enriching computing resources, which led to breakthrough innovations in data computing. With the advent of the cloud computing era, the cost of computing has fallen sharply and abundant computing resources have become available, unlocking more opportunities for intelligent data computation.

Since the mid-1950s, data management technology has gone through stages of manual management, file system management, and eventually entered the era of database systems with the advancement of hardware technology and declining prices. Database systems have solved the complexity of data management and provided strong support for business systems (such as CRM, ERP, BI, reporting and visualization systems) and specialized business systems (such as e-commerce recommendation systems and bank customer profiling). They have liberated developers, allowing them to focus more on business logic.

Many popular database systems are distributed databases, and most traditional distributed database systems adopt a Massively Parallel Processing (MPP) architecture. MPP databases treat PC servers as units and distribute data storage and computation across a cluster, as shown in the diagram below. For example, given a wide table with 300 million records and a cluster of three PC servers, an MPP database would store 100 million records on the hard drive of each server. During computation, all machines work in parallel, theoretically reducing the computation time to 1/n of a single-machine deployment (where "n" is the number of machines) and saving processing time for massive amounts of data.

Traditional MPP Database Architecture
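
The scatter-and-combine pattern described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names (distribute, local_sum, mpp_sum) are hypothetical, rows are placed by key modulo the node count, and threads stand in for separate servers, so the sketch shows the structure of an MPP aggregation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_nodes):
    """Assign each (key, value) row to a node by key modulo n_nodes (assumed placement rule)."""
    shards = [[] for _ in range(n_nodes)]
    for key, value in rows:
        shards[key % n_nodes].append((key, value))
    return shards


def local_sum(shard):
    """Work done independently on one node: aggregate only its own shard."""
    return sum(value for _, value in shard)


def mpp_sum(rows, n_nodes):
    """Scatter rows to nodes, aggregate each shard concurrently, then combine the partial sums."""
    shards = distribute(rows, n_nodes)
    # Threads are a stand-in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, shards))
    return sum(partials), [len(shard) for shard in shards]


# 300 rows stand in for the 300 million records in the example.
rows = [(i, i * 2) for i in range(300)]
total, shard_sizes = mpp_sum(rows, n_nodes=3)
print(total, shard_sizes)
```

With three nodes, each shard holds one third of the rows, mirroring the split of 100 million records per server in the example. The combined result is the same for any number of nodes; only the amount of work per node changes.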

However, as the amount of data continues to rise, enterprises have increasingly higher demands on databases. Traditional database solutions have encountered a series of bottlenecks in their use:

Complexity and inefficiency: With the growth of business, data generated by various application systems has exponentially increased. The data comes from diverse sources, and the data types have become more varied, making the application scenarios more complex. Traditional databases require centralized storage and processing of data, which leads to complex permission management and troubleshooting when issues arise.

High costs: Traditional databases require expensive hardware and software as well as salaries for development and operations personnel, which means significant upfront investment for enterprises. As storage and workload requirements continue to grow, expanding and upgrading traditional databases becomes challenging. Because storage and computation are tightly coupled in the traditional architecture, enterprises often incur substantial maintenance and time costs, along with cumbersome operations.

Impeding innovation: Traditional database models fail to meet the flexible scalability needs of enterprises in the face of diverse business support and innovation requirements. They struggle to embrace the development directions of the new era, such as "Internet+" and "Industry 4.0."

High learning costs: Traditional databases require a considerable workforce for maintenance, and they have high requirements for operations and development personnel who need to master complex technical stacks. Technology updates iterate rapidly, and relevant personnel must actively update their knowledge. However, the talent market for such skills is small, leading to a scarcity of skilled individuals. The high learning costs result in poor performance, high failure rates, and long troubleshooting and repair times for users.

The Cloud-Native Database is Born for the Cloud

With the explosive growth of data volume and computing power, the cloud-native era has emerged alongside the rapid development of cloud computing technology. In the cloud-native era, more and more enterprises are migrating their applications to the cloud, and a significant amount of data is flowing to the cloud. Public clouds bring numerous advantages:

On-demand provision/release of computing resources
Unlimited computing resources
Unlimited storage pools
Cost-effective object storage

These advantages enable cloud-native databases to reduce computing costs, provide virtually unlimited computing resources, achieve minute-level scalability and true high availability, and unlock more opportunities for data-driven insights and intelligence.

The following are typical use cases of digital enterprises in the cloud-native era:

Several small-scale computing tasks per day requiring a few nodes
One medium-scale computing task per week requiring dozens of nodes
One large-scale computing task per month requiring thousands of nodes

Facing these ever-changing business demands and computing tasks, enterprises have higher requirements:

Unlimited Space: Ability to provide unlimited storage space
Flexible Scalability: Elastic scaling of clusters and worker nodes according to business needs
Resource Recycling: Resource reclamation when a cluster completes a computing task, saving costs
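
To see why resource recycling matters for a workload like the one above, the following sketch compares node-hours for a cluster permanently sized for the largest task with node-hours for clusters that exist only while a task runs. Every number in it is an assumption chosen for illustration (4 small tasks per day on 3 nodes, 4 medium tasks per month on 30 nodes, 1 large task per month on 1,000 nodes, and the run times); the article itself gives only rough node counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    runs_per_month: int   # assumed frequency
    nodes: int            # assumed cluster size for one run
    hours_per_run: float  # assumed duration of one run


HOURS_PER_MONTH = 24 * 30  # a 30-day month, for simplicity

# Hypothetical workload loosely following the three use cases listed above.
tasks = [
    Task("small", runs_per_month=4 * 30, nodes=3, hours_per_run=1.0),
    Task("medium", runs_per_month=4, nodes=30, hours_per_run=2.0),
    Task("large", runs_per_month=1, nodes=1000, hours_per_run=4.0),
]


def elastic_node_hours(tasks):
    """Nodes are allocated per run and released as soon as the run finishes."""
    return sum(t.runs_per_month * t.nodes * t.hours_per_run for t in tasks)


def fixed_node_hours(tasks):
    """A fixed cluster must be sized for the largest task and stays up all month."""
    peak_nodes = max(t.nodes for t in tasks)
    return peak_nodes * HOURS_PER_MONTH


elastic = elastic_node_hours(tasks)
fixed = fixed_node_hours(tasks)
print(f"elastic: {elastic:.0f} node-hours, fixed: {fixed} node-hours")
print(f"elastic usage is {elastic / fixed:.2%} of the fixed cluster")
```

Under these assumed numbers the large monthly task dominates the elastic total, yet the fixed cluster still consumes far more node-hours because it idles at peak size for the rest of the month.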

Fully integrating cloud computing and large-scale parallel processing technologies, cloud-native databases have emerged as the foundation for reliable and controllable cloud databases that can meet the requirements of the cloud-native digital era. Compared to traditional data warehouses, cloud-native databases offer clear advantages.

Flexibility with Agility

Traditional data warehouses tightly couple computation and storage: computing resources and storage resources are bound together in a fixed ratio. To ensure the correctness of query results, each computing node needs to participate in the execution of every query. This poses challenges in scaling, operations, and migration. Because business growth is hard to predict, a traditional big data system sized in advance often cannot analyze business data in time, and enterprises miss opportunities to tap the full value of their data.

Cloud-native databases separate computation and storage, avoiding resource waste. Enterprises can flexibly and cost-effectively scale storage or computing resources according to their resource requirements, improving resource utilization and saving space costs and energy consumption.
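
The resource waste caused by a fixed compute-to-storage ratio can be illustrated with a small sizing calculation. This is a sketch under assumed numbers: the per-node capacities (10 TB and 16 cores) and the workload (500 TB of data needing 64 cores) are hypothetical, as are the function names.

```python
import math


def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Coupled design: buy whole nodes until BOTH storage and compute needs are met."""
    nodes_for_storage = math.ceil(storage_tb / tb_per_node)
    nodes_for_compute = math.ceil(cores / cores_per_node)
    return max(nodes_for_storage, nodes_for_compute)


def decoupled_compute_nodes(cores, cores_per_node):
    """Decoupled design: compute nodes are sized by compute demand alone."""
    return math.ceil(cores / cores_per_node)


# Hypothetical storage-heavy workload: lots of data, modest compute.
storage_tb, cores = 500, 64
tb_per_node, cores_per_node = 10, 16

n_coupled = coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node)
n_compute = decoupled_compute_nodes(cores, cores_per_node)
idle_cores = n_coupled * cores_per_node - cores

print(f"coupled: {n_coupled} nodes, {idle_cores} idle cores")
print(f"decoupled: {n_compute} compute nodes plus {storage_tb} TB of shared storage")
```

In the coupled case the storage requirement alone dictates the node count, so most of the purchased cores sit idle. With storage and compute scaled separately, the same workload needs only enough compute nodes to cover its compute demand.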

Instant Availability

Traditional data warehouses often require significant manpower for database installation and debugging. The "instant availability" feature of cloud-native databases saves enterprises a considerable amount of operational expenses. With compute nodes deployed in the cloud, they are not limited by physical constraints or potential latency. They can be easily managed anytime, anywhere via the internet without the need for any hardware. Data is readily available, eliminating the need to deal with backend technical issues. This opens up shortcuts for cross-departmental and cross-regional data sharing and collaboration, facilitating the globalization process for enterprises.

Security and Stability

Traditional data warehouses store files and resources on the same host and compensate for node downtime with primary and backup copies of node data, which severely affects data timeliness and increases operational cost and difficulty. Cloud-native databases provide high availability that is transparent to users, with automatic failover and automated disaster recovery. This avoids the impact of single-point failures on business operations and ensures data security. When services need to be upgraded or changed, nodes can be upgraded gradually without interrupting the service.

Agility and Reliability

The black-box nature of traditional databases prevents enterprises from promptly addressing scalability and node failure issues. Deployed in the cloud, cloud-native databases benefit from the agility and efficiency of cloud computing. They offer high elasticity and performance and can handle petabyte-scale data. Services are independent of each other, providing multiple layers of security protection and additional fault tolerance. Highly automated operations and maintenance tools support more frequent updates to business applications, giving enterprises greater agility and the ability to iterate quickly. Thanks to the separation of storage and compute, enterprises can flexibly scale up or down and migrate workloads more easily.

Cost-saving and Profit-increasing

The high costs of traditional databases, including software and hardware, result in significant upfront investments. Cloud computing provides cloud-native databases with nearly unlimited low-cost storage space, reducing costs associated with traditional database room planning and server procurement while eliminating management burdens for enterprises. With the subscription-based model and dynamic scalability of cloud-native databases, enterprises can expand resources based on their needs, avoiding resource waste and achieving higher cost-effectiveness compared to traditional databases.

PieCloudDB Database - Cloud-native Virtual Data Warehouse

OpenPie, with the mission of "Data Computing for New Discoveries" and the product concept of "Big Data Promises finally Come True," has launched the cloud-native virtual data warehouse PieCloudDB. PieCloudDB also introduces a new elastic Massively Parallel Processing (eMPP) distributed technology. It builds a new data computing platform with a cloud-native, analytics-oriented distributed database at its core. It aims to provide enterprises with real-time processing, elastic scalability, elastic computation, and integrated data analysis in the cloud, helping them maximize the value of their data and create new advantages for high-quality development. It also serves as reliable and controllable cloud database infrastructure for new infrastructure construction.

Built on a cloud computing architecture, PieCloudDB Database's eMPP (elastic MPP) technology addresses the limitations of PC-based traditional databases and incorporates the advantages of cloud-native databases described above. PieCloudDB separates computation and storage, allowing each to scale independently and elastically in the cloud and avoiding resource waste. Enterprises can scale storage or compute resources separately based on their business requirements, resulting in improved resource utilization, space cost savings, and reduced energy consumption.

PieCloudDB's New Cloud-Native eMPP Database Architecture

What's even more exciting is that PieCloudDB allows users to run multiple clusters for data computation in the cloud at the same time. For example, if an airline's booking system is already running a 3-node cluster for data analysis, its membership system can start an additional 4-node cluster for data computation, and so on. Users can start clusters with any number of nodes anytime and anywhere for data computation in new applications. In PieCloudDB, users can keep all of their data in the cloud, enabling true data sharing across existing and future applications and helping them realize the promise of "Big Data Promises finally Come True."
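
The idea of several independently sized clusters computing over one shared copy of the data can be sketched as a toy model. The classes below (SharedStore, VirtualCluster) and their methods are hypothetical names invented for this illustration; they are not PieCloudDB's API and say nothing about how the product is actually implemented.

```python
class SharedStore:
    """Stand-in for cloud storage that every cluster can read."""

    def __init__(self):
        self._tables = {}

    def append(self, table, rows):
        self._tables.setdefault(table, []).extend(rows)

    def scan(self, table):
        return list(self._tables.get(table, []))


class VirtualCluster:
    """A compute cluster with its own node count that holds no data of its own."""

    def __init__(self, name, n_nodes, store):
        if n_nodes < 1:
            raise ValueError("a cluster needs at least one node")
        self.name = name
        self.n_nodes = n_nodes
        self.store = store

    def count_rows(self, table):
        rows = self.store.scan(table)
        # Each node counts an interleaved slice; the partial counts are then combined.
        partial_counts = [len(rows[i::self.n_nodes]) for i in range(self.n_nodes)]
        return sum(partial_counts)


store = SharedStore()
store.append("bookings", [{"booking_id": i} for i in range(10)])

# Two clusters of different sizes, as in the airline example, over the same data.
booking_cluster = VirtualCluster("booking-analytics", n_nodes=3, store=store)
membership_cluster = VirtualCluster("membership", n_nodes=4, store=store)

print(booking_cluster.count_rows("bookings"), membership_cluster.count_rows("bookings"))
```

Because neither cluster owns the data, adding a cluster or changing its node count requires no data movement in this model, and rows written to the shared store are visible to every cluster on its next scan.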
