Azure Synapse Analytics
Azure Synapse Analytics is the rebranded cloud-based Azure SQL Data Warehouse. More than just a name change, it adds capabilities that support big data; it now combines a modern Data Warehouse with a powerful, fully featured, scalable analytics service.
It is provided to users as a fully managed PaaS solution built on massively parallel processing (MPP) relational database technology. It also supports Apache Spark as a fully managed service, known as Spark-as-a-Service. Data is managed and interacted with through Azure Synapse Studio, a browser-based user interface.
Azure Synapse Analytics is intended for large-scale modern Data Warehouse and analytics scenarios in which complex queries run against petabytes of data.
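To make the MPP idea more concrete, the following is a minimal, single-process Python sketch of the general pattern: rows are spread across several distributions by hashing a key, each distribution computes a partial aggregate, and the partial results are combined into the final answer. It is only an illustration of the concept, not how the Synapse SQL engine is implemented; the function names, the sample `SALES` rows, and the value of `NUM_DISTRIBUTIONS` are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

# Illustrative value only; not a statement about the real service.
NUM_DISTRIBUTIONS = 4


def distribution_for(key, n=NUM_DISTRIBUTIONS):
    """Map a distribution-key value to a bucket using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % n


def distribute(rows, key_field, n=NUM_DISTRIBUTIONS):
    """Spread rows across n buckets based on the hash of key_field."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[distribution_for(row[key_field], n)].append(row)
    return buckets


def local_totals(bucket, group_field, value_field):
    """Work done independently on one distribution: a partial SUM ... GROUP BY."""
    totals = defaultdict(float)
    for row in bucket:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def combine(partials):
    """Merge the partial aggregates from every distribution into one result."""
    result = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            result[group] += value
    return dict(result)


def mpp_sum(rows, key_field, group_field, value_field, n=NUM_DISTRIBUTIONS):
    """Distribute the rows, aggregate each bucket, then combine the results."""
    buckets = distribute(rows, key_field, n)
    partials = [local_totals(b, group_field, value_field) for b in buckets]
    return combine(partials)


# Hypothetical sample data for the sketch.
SALES = [
    {"customer": "c1", "region": "EU", "amount": 10.0},
    {"customer": "c2", "region": "US", "amount": 5.0},
    {"customer": "c3", "region": "EU", "amount": 2.5},
    {"customer": "c1", "region": "US", "amount": 1.5},
]

if __name__ == "__main__":
    print(mpp_sum(SALES, "customer", "region", "amount"))
```

In a real MPP system, each bucket would live on separate compute and storage so that the per-distribution work runs in parallel; here the buckets are processed one after another purely to keep the sketch short.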
Azure Synapse is made up of the following components:
- Synapse SQL Engine
- Apache Spark integration service
- Data integration layer
- Azure Synapse Studio (browser-based user interface)
The following diagram outlines how Azure Synapse Analytics may be used in a solution architecture:
Figure 5.15 – Azure Synapse Analytics solution architecture
This section looked at Azure Synapse Analytics and how it can be used in a big data and analytics solution. The next section will look at how Azure HDInsight can be used in a big data and analytics solution.
Azure HDInsight
Azure HDInsight is a cloud distribution of Hadoop in Azure, offered as a fully managed analytics service for enterprise-scale organizations. It can be considered a scale-out cluster compute engine that allows you to efficiently process complex unstructured data that has been ingested into a data store (typically, a data lake).
It uses the Hadoop open source framework to process and analyze big datasets in a distributed manner across clusters. Running this workload on cloud services makes the processing and analysis more accessible, faster, and more cost-effective. You can think of HDInsight as Hadoop-as-a-Service.
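The distributed processing model that Hadoop popularized is MapReduce, and a small word count is the usual way to illustrate it. The following is a minimal sketch in plain Python that runs in a single process; it does not use any Hadoop or HDInsight API, and the function names are assumptions chosen for this example. It only shows the three logical phases (map, shuffle, and reduce) that a cluster would spread across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

On a real cluster, the map and reduce steps would run on different nodes against blocks of data held in distributed storage; the sketch keeps everything in memory to focus on the shape of the computation.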
Although the clusters primarily run Hadoop as a managed service, there is also cluster support for the HBase, Storm, Spark, Interactive Query, R Server, and Kafka open source technologies.
The following are some examples of how HDInsight is being used in the real world:
- Telecoms: Churn prediction, market offers, pricing, call detail records (CDRs), network monitoring, demand provisioning, and optimizations.
- Financial Services: Customer 360 and fraud detection.
- Health Care: Clinical trial selection, patient mining, vaccine effectiveness, personalized medication, and health plans.
- Industry: Predictive maintenance, supply chain, stock control, and inventory optimization.
- Utility: Demand prediction and smart metering.
The following diagram outlines how Azure HDInsight may be used in a solution architecture:
Figure 5.16 – Azure HDInsight solution architecture
This section looked at Azure HDInsight and its use in a big data and analytics solution. The next section will look at Azure Databricks and how it can be used in a big data and analytics solution.