A data warehouse (DW or DWH) is a central repository of organizational data that stores integrated data from multiple sources. Developing and managing such a centralized system requires considerable effort and development time. In addition, the traditional integration process always introduces some latency before the latest data becomes available for business analysis and reporting.
Today, the nature of data is changing, and business needs are changing with it. Data is generated in large volumes, at high velocity, and in many varieties: structured, semi-structured, and unstructured. Traditional DWH design and development methods struggle to support these new data characteristics and the growth of the business. Modern data warehouse design patterns, built and operated with the latest technology components, are needed.
Modern DWH design helps create a hub for all kinds of data (e.g. structured, unstructured, semi-structured, or data streams) to deliver integrated and transformative solutions such as Business Intelligence (BI) and reporting, real-time analytics, and predictive analytics. To achieve these goals and support modern designs, Microsoft has introduced a set of fully managed cloud services, including Azure Data Factory, Azure SQL Data Warehouse, Azure SQL Database, and Azure Databricks. These fully managed services not only support modern DWH design patterns but also offer built-in scalability, high availability, good performance, and flexibility.
Modern data warehouse model
The traditional design of DWH and BI systems was simple. It consisted mainly of a standard set of layers: data ingestion, data transformation and storage, and data consumption and presentation. A typical traditional DW design is shown in the image below:
Figure 1 – Traditional design of the DWH + BI system
A modern DWH brings together all kinds of data, at any scale, with far less effort and time, so that all users can gain insight through operational reports, analytical dashboards, and advanced analytics.
Microsoft Azure provides a set of fully managed services that allow you to build a modern DWH in minutes. These services support fully cloud-based solutions as well as hybrid solutions that combine cloud and on-premises components, depending on business needs.
A modern data warehouse can be designed to meet business needs and adapt to changing data behavior using the latest technology components such as scalable cloud-based data storage for big data, real-time analytics, predictive analytics and machine learning, global data distribution, high availability, etc. Some of the modern data warehouse design patterns are as follows:
Modern Data Warehouse: This is the most common design pattern in the modern data warehouse world. It allows you to create a hub that stores all kinds of data, at any scale, using fully managed Azure services.
Advanced Big Data Analytics: This design pattern delivers actionable insights by adding machine learning tools to the features of the Modern Data Warehouse pattern. It helps build and deploy custom machine learning models at scale.
Real-time Analytics: This design pattern derives insight from live streaming data. It allows you to capture streaming data from IoT devices or web logs and process it in near real time.
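To make the real-time pattern more concrete, here is a minimal sketch of a tumbling-window count, a grouping technique commonly used when processing event streams in near real time. It is plain Python rather than any Azure API; the function name, the event shape (timestamp in seconds, device id) and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per device in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, device_id) tuples
    (a hypothetical event shape). Returns {window_start: {device_id: count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, device_id in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][device_id] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical IoT events: (seconds since start, device id).
sample_events = [
    (0, "sensor-a"),
    (15, "sensor-b"),
    (59, "sensor-a"),
    (60, "sensor-a"),
    (125, "sensor-b"),
]

window_counts = tumbling_window_counts(sample_events)
for start, counts in window_counts.items():
    print(start, counts)
```

In a real deployment the events would arrive continuously from a stream rather than from a list, and each window would be emitted as soon as it closes.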
Design and components of a modern data warehouse
The Modern Data Warehouse combines all types of data, whether structured, unstructured, or semi-structured (sensor logs, IoT, and media streams), by using Azure Data Factory to load it into Azure Data Lake or Azure Blob Storage. Once the data is stored in Data Lake or Blob Storage, it can be cleaned, transformed, and analyzed at scale with Azure Databricks. These analyses help users and businesses understand how the data behaves. The cleansed and transformed data can then be moved to Azure SQL Data Warehouse, where it is merged with existing data to create an integrated data source.
Figure 2 – Modern Data Warehouse Design
Once the integrated data is available, it can be accessed and moved using Azure connectors. Additionally, operational reports and analytical dashboards can be built on top of Azure SQL Data Warehouse; they derive insights from the stored data and use Azure Analysis Services to surface data trends. Ad hoc queries can also be run directly against data in Azure Databricks, with the resulting dashboards published using Power BI.
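The flow described above (land raw data, clean and transform it, merge it with existing warehouse data, then report on it) can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the Azure services themselves: the function names, the record fields (customer_id, amount, region) and the sample data are all hypothetical.

```python
def collect(raw_sources):
    """Land records from every source in a single raw 'lake' list."""
    lake = []
    for source_name, records in raw_sources.items():
        for record in records:
            lake.append({"source": source_name, **record})
    return lake


def clean_and_transform(lake):
    """Drop records without a customer id and normalize field types."""
    cleaned = []
    for record in lake:
        if not record.get("customer_id"):
            continue  # unusable without a key to join on
        cleaned.append({
            "customer_id": str(record["customer_id"]).strip(),
            "amount": round(float(record.get("amount", 0)), 2),
            "source": record["source"],
        })
    return cleaned


def merge_with_warehouse(existing_customers, cleaned):
    """Join cleaned records with existing warehouse data keyed by customer id."""
    integrated = []
    for record in cleaned:
        customer = existing_customers.get(record["customer_id"])
        if customer is not None:
            integrated.append({**record, "region": customer["region"]})
    return integrated


def report_by_region(integrated):
    """A tiny 'operational report': total amount per region."""
    totals = {}
    for record in integrated:
        region = record["region"]
        totals[region] = round(totals.get(region, 0) + record["amount"], 2)
    return totals


# Hypothetical raw inputs and existing warehouse data.
raw_sources = {
    "web_logs": [{"customer_id": " 101 ", "amount": "19.99"},
                 {"customer_id": None, "amount": "5"}],
    "iot": [{"customer_id": 102, "amount": 7.5},
            {"customer_id": 101, "amount": 0.01}],
}
existing_customers = {"101": {"region": "EU"}, "102": {"region": "US"}}

cleaned = clean_and_transform(collect(raw_sources))
integrated = merge_with_warehouse(existing_customers, cleaned)
totals = report_by_region(integrated)
print(totals)
```

Keeping each step as a separate function mirrors the separation of storage, transformation, and serving in the design, so any step can be replaced without touching the others.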
Some of the key components of Azure technology that help design a modern data warehouse:
Azure Data Factory is a hybrid data integration service that can create, schedule, and orchestrate ETL/ELT workflows; such a workflow is called a pipeline. A pipeline consists of three stages: Connect & Collect, Transform & Enrich, and Publish.
Azure Data Lake Store or Azure Blob Storage is a low-cost, simple way to store any kind of data, including unstructured data.
Azure Databricks is an analytics platform based on Apache Spark.
Azure SQL Data Warehouse is a fast and flexible cloud data warehouse. It uses a massively parallel processing (MPP) architecture with elastic compute and storage.
Azure Analysis Services is an analytics-as-a-service offering used to govern, deploy, test, and deliver BI solutions.
Power BI is a suite of business analytics tools that connects to hundreds of data sources, simplifies data preparation, and provides ad hoc analysis.
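To illustrate the idea behind the parallel architecture mentioned for Azure SQL Data Warehouse, here is a minimal sketch of hash distribution followed by parallel partial aggregation. It is not how the service is implemented: threads stand in for compute nodes, and the function names, the four distributions, and the sample rows are assumptions chosen for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key, num_distributions):
    """Hash-distribute rows so that equal keys land in the same bucket."""
    buckets = [[] for _ in range(num_distributions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_distributions
        buckets[index].append(row)
    return buckets


def local_sum(bucket, key, value):
    """Partial aggregation performed independently on one distribution."""
    partial = {}
    for row in bucket:
        partial[row[key]] = partial.get(row[key], 0) + row[value]
    return partial


def parallel_sum(rows, key, value, num_distributions=4):
    """Aggregate each distribution in parallel, then combine the partials."""
    buckets = distribute(rows, key, num_distributions)
    with ThreadPoolExecutor(max_workers=num_distributions) as pool:
        partials = list(pool.map(lambda b: local_sum(b, key, value), buckets))
    combined = {}
    for partial in partials:
        for k, v in partial.items():
            combined[k] = combined.get(k, 0) + v
    return combined


# Hypothetical sales rows.
sales = [
    {"product_id": "p1", "quantity": 2},
    {"product_id": "p2", "quantity": 4},
    {"product_id": "p1", "quantity": 3},
    {"product_id": "p3", "quantity": 1},
    {"product_id": "p2", "quantity": 3},
]

quantity_by_product = parallel_sum(sales, "product_id", "quantity")
print(quantity_by_product)
```

Because rows with the same key always land in the same bucket, each bucket can be aggregated without coordinating with the others, and the final combine step only has to merge small partial results.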
A modern DWH is required to support growing business needs and changing data behavior. In addition, several other factors make today's DWH a "modern DWH": the use of Hadoop with machine learning, near real-time data processing using the Lambda architecture, hybrid solutions (cloud integration with an on-premises solution), global distribution of the solution, self-service deployment, etc.
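The Lambda architecture mentioned above combines a batch layer that periodically recomputes results from all historical data, a speed layer that covers the data that arrived since the last batch run, and a serving layer that merges both at query time. The following is a minimal in-memory sketch of that idea; the class and function names, the page-view events, and the sample numbers are hypothetical.

```python
def build_batch_view(master_dataset):
    """Batch layer: recompute totals from the full historical dataset."""
    view = {}
    for event in master_dataset:
        view[event["page"]] = view.get(event["page"], 0) + event["views"]
    return view


class SpeedLayer:
    """Speed layer: incrementally tracks events not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = {}

    def ingest(self, event):
        page = event["page"]
        self.realtime_view[page] = self.realtime_view.get(page, 0) + event["views"]

    def reset(self):
        """Called once a new batch view has absorbed these events."""
        self.realtime_view = {}


def serve(batch_view, realtime_view, page):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical historical data already processed by the batch layer.
historical = [
    {"page": "/home", "views": 100},
    {"page": "/pricing", "views": 40},
    {"page": "/home", "views": 20},
]
batch_view = build_batch_view(historical)

# Hypothetical events that arrived after the last batch run.
speed = SpeedLayer()
speed.ingest({"page": "/home", "views": 3})
speed.ingest({"page": "/docs", "views": 2})

home_views = serve(batch_view, speed.realtime_view, "/home")
docs_views = serve(batch_view, speed.realtime_view, "/docs")
print(home_views, docs_views)
```

The design trades some duplication (the same aggregation logic exists in two layers) for the ability to answer queries with both complete history and near real-time freshness.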
In this article, we discussed the design of the Modern Data Warehouse. In the next article, we will discuss the advanced analytics and real-time analytics designs of the Modern Data Warehouse.
See all articles from Anoop Kumar