In a previous article, we discussed Modern Data Warehouse models and design components. In this article, we’ll discuss two more modern design patterns: 1) advanced analytics on big data and 2) real-time analytics.
Advanced big data analytics and real-time analytics are major business needs these days, and they require a modern design built on current technology components. Microsoft Azure provides a set of technology components to meet these needs. Advanced analytics can be performed using Azure Machine Learning tools, and a modern design allows custom machine learning models to be built and deployed. These models can turn data into actionable insights. For analyzing and processing Internet of Things (IoT) data, web logs and live streaming data, a real-time analytics design can be used.
Designing advanced analytics on Big Data
Modern Big Data Analytics design integrates structured, semi-structured and unstructured data from various data sources using Azure Data Factory and stores it in Azure storage, either Azure Data Lake or Azure Blob Storage. This is the same data ingestion process used in other data warehouse design patterns. In the next part of the design, once the data is in Azure storage, Azure Databricks can be used to cleanse and transform the unstructured data and combine it with structured data from an operational database or data warehouse. Databricks provides the flexibility to build and deploy machine learning models on-premises or in the cloud. The notebook experiences built into Azure Databricks can then be used to derive deeper insights from the data using Python, R, or Scala. From here, the data can be used in several ways:
- Data can be moved to Azure SQL Data Warehouse using Azure native connectors and accessed from there
- Power BI users can leverage Databricks to perform root cause determination and raw data analysis
- Business users can run ad hoc queries directly on data in Azure Databricks
- Data can be moved from Databricks to Azure Cosmos DB so that the resulting insights are accessible to web and mobile applications
Figure 1 Advanced Analytics on Big Data Design
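The cleanse, transform and combine step described above can be illustrated with a minimal sketch. It uses plain Python rather than Spark so that it stays self-contained; in Azure Databricks the same logic would normally be written against Spark DataFrames in a notebook. The sample records, field names and the `cleanse`, `enrich` and `total_by_segment` helpers are all hypothetical and are not part of any Azure API.

```python
import json

# Hypothetical raw, semi-structured input: one JSON document per line,
# as it might land in Azure Data Lake or Azure Blob Storage.
RAW_LINES = [
    '{"customer_id": "C1", "amount": "19.90", "channel": " Web "}',
    '{"customer_id": "C2", "amount": "n/a", "channel": "mobile"}',
    'not valid json',
    '{"customer_id": "C1", "amount": "5.10", "channel": "WEB"}',
]

# Hypothetical structured reference data from an operational database.
CUSTOMERS = {"C1": {"segment": "retail"}, "C2": {"segment": "wholesale"}}


def cleanse(lines):
    """Parse raw lines, drop malformed records and normalize field values."""
    for line in lines:
        try:
            record = json.loads(line)
            cleaned = {
                "customer_id": record["customer_id"],
                "amount": float(record["amount"]),
                "channel": str(record.get("channel", "")).strip().lower(),
            }
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # skip unparseable lines and non-numeric amounts
        yield cleaned


def enrich(records, customers):
    """Inner-join each cleansed record with its customer segment."""
    for record in records:
        customer = customers.get(record["customer_id"])
        if customer is not None:
            yield {**record, "segment": customer["segment"]}


def total_by_segment(records):
    """Aggregate the enriched records into one total per segment."""
    totals = {}
    for record in records:
        segment = record["segment"]
        totals[segment] = totals.get(segment, 0.0) + record["amount"]
    return totals


curated = list(enrich(cleanse(RAW_LINES), CUSTOMERS))
totals = total_by_segment(curated)
```

Only the two well-formed records survive cleansing, and both belong to the retail segment, so `totals` contains a single entry. A curated result like this is what would then be pushed to Azure SQL Data Warehouse or Azure Cosmos DB.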
Real-time analysis design
The modern design for real-time analytics begins with a two-part data ingestion process: the first part ingests live streaming data using an Apache Kafka cluster in Azure HDInsight, and the second brings structured and semi-structured data into Azure Data Lake or Azure Blob Storage using Azure Data Factory. In the next part of the design, the data can be prepared and models trained using Azure Databricks. Azure Databricks can be used to cleanse and transform the live streaming data and combine it with structured data from an operational database or data warehouse. The notebook experiences built into Azure Databricks can be used to apply machine learning and deep learning techniques to derive deeper insights from this data. Thereafter, the data can be accessed in different ways using other Azure components.
Data can be moved at any scale between Azure SQL Data Warehouse and Azure Databricks using native connectors. Analytical dashboards and embedded reports can be built on top of Azure SQL Data Warehouse to share insights with business users within the organization. Additionally, Azure Analysis Services can serve this data to other users.
Power BI users can use Azure Databricks and Azure HDInsight to perform raw data analysis and root cause determination. In this design, insights are also moved from Azure Databricks to Azure Cosmos DB so that web and mobile applications can access them in real time.
Figure 2 Real-time analysis design
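To make the streaming part of this design more concrete, here is a minimal sketch of a tumbling-window aggregation that enriches each event with structured reference data. It is plain Python over a finite list of events; in the design above, this role would typically be played by a streaming job in Azure Databricks reading from Kafka. The event layout, the 60-second window, the device lookup table and the function names are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size

# Hypothetical events as they might arrive from a Kafka topic:
# (epoch_seconds, device_id, temperature)
EVENTS = [
    (0, "d1", 20.0),
    (30, "d1", 22.0),
    (59, "d2", 30.0),
    (60, "d1", 24.0),
    (125, "d2", 31.0),
]

# Hypothetical structured data from an operational database.
DEVICES = {"d1": "plant-a", "d2": "plant-b"}


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp."""
    return timestamp - (timestamp % size)


def average_by_window(events, devices):
    """Average the readings per (window start, site) pair."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, device_id, value in events:
        site = devices.get(device_id, "unknown")  # enrich with reference data
        key = (window_start(timestamp), site)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


averages = average_by_window(EVENTS, DEVICES)
```

Each event falls into exactly one window, so the two `d1` readings at 0 and 30 seconds are averaged together while the reading at 60 seconds starts a new window. A real streaming job would also need to handle late and out-of-order events, which this sketch ignores.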
Components of advanced analytics on big data and real-time analytics
The complete design consists of various technology components. We have leveraged many Azure components in both of the design patterns above. Some of them are commonly used and were covered in a previous article.
We leveraged Azure Cosmos DB in the Advanced Analytics on Big Data design to make insights accessible to web and mobile applications.
Azure Cosmos DB is a globally distributed, multi-model database service that provides native support for NoSQL choices. It offers turnkey global distribution across any number of Azure regions and elastically scales throughput and storage worldwide.
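As a small illustration of what "making data accessible to web and mobile applications" can involve, the sketch below shapes one aggregated row into a JSON document of the kind a document database stores. Cosmos DB items carry a string `id`; the other field names, the choice of `segment` as partition key property, the sample values and the `to_document` helper are assumptions, and the actual write through an SDK or connector is deliberately left out.

```python
import json


def to_document(segment, day, total):
    """Shape one aggregated row as a JSON-serializable item."""
    return {
        "id": f"{segment}:{day}",  # items need a string "id"
        "segment": segment,        # assumed partition key property
        "day": day,
        "total": round(total, 2),
    }


# Hypothetical aggregated row produced by an upstream analytics job.
doc = to_document("retail", "2020-01-15", 25.0)
payload = json.dumps(doc, sort_keys=True)
```

Keeping the document small and keyed by the values an application will query on is a design choice for this example, not a requirement stated in the article.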
We used Azure HDInsight with a Kafka cluster in the real-time analytics design to ingest live streaming data.
Azure HDInsight is a fully managed, full-spectrum, open-source analytics service for popular open-source frameworks such as Hadoop, Kafka, Storm, Spark, R and Hive.
Modern data warehouse designs support various types of business needs including changes in data behavior, real-time analysis on live streaming data, and lambda architecture to serve multiple purposes with the source data. In addition, the selection of technology components is important to meet the needs of your business to create a flexible, high performance and scalable solution. Microsoft Azure provides a comprehensive set of technological components to create the hub for all types of data (structured, unstructured or streaming) to develop transformative solutions such as Business Intelligence and reporting, advanced analytics on Big Data and real-time analytics.
See all articles from Anoop Kumar