Big data and analytics solutions – Core Azure Solutions

Big data and analytics solutions

This section covers the following exam objective: describe the benefits and usage of Azure Synapse Analytics, HDInsight, and Azure Databricks.

Before we look closer at the big data and analytics solutions that are part of the exam objectives, we will create a knowledge foundation and baseline to build from. This also aims to build an understanding of the bigger picture of how the different big data and analytics solutions are positioned and interrelated for technical and business personas.

We should first understand what we mean by big data and analytics. In a nutshell, it is about discovering information hidden in data and presenting it as actionable or acknowledgeable information that helps an organization make informed decisions and, depending on the context, gain a competitive advantage.

The challenge is that traditional Data Warehouse (DW) solutions and technologies cannot handle the massive volumes of complex, unstructured data that characterize big data. This leaves the traditional DW approach out of step with a cloud mindset.

Now that we understand what big data and analytics are, what are some of their use cases and scenarios?

There are three typical scenarios where we can utilize big data to arrive at better outcomes:

  • Modern Data Warehouses
  • Intelligent analytics
  • IoT

There are four types of analytics techniques for getting insights out of data; some are based on traditional business intelligence (BI) analysis techniques, while others are based on AI (machine learning) techniques. The following diagram visualizes these analytics techniques and their relationships:

Figure 5.11 – Analytics techniques

Here, we can see that we have always been able to get insights from data, but there was always an actions gap. This gap has meant that while insights are great and can be very powerful, if they don’t allow an organization to do something with that knowledge, people have just been busy fools analyzing data and presenting dashboards of charts without getting any value.

Only when we look at what modern techniques through AI can deliver, such as predictive analytics and self-actioning prescriptive analytics, can we start to see actual value creation from our data analysis work.

There is a very close relationship between BI and AI. BI uses analytics services, which can be considered a suite of business analytics tools for analyzing data and sharing insights, but what value do they provide? And what do we do with those insights?

This is where AI comes into the picture; AI refers to the ability to analyze large quantities of data, learn from the results, and then use this knowledge (insights) to optimize and change future processes, systems, and so on. It is often said that the most challenging part of AI isn’t AI… it’s data!

It is also said that you should not AI before you BI; that is, you should place data visibility and transformation ahead of data intelligence (garbage in, garbage out). BI is often the bridge between AI and the data. This approach can be visualized in the following diagram:

Figure 5.12 – Analytics as a bridge

The first stage in any big data or analytics project should be to assess the readiness of the data, ensuring that any data assets and sources are in good shape for analysis. We refer to this as preparing the data.

The following are the characteristics you should consider for big data:

  • Quantity (volume) of data being ingested: This is often on the petabyte scale.
  • Speed (velocity) of data being ingested: This could be real-time or near-real-time streamed data, data arriving on a schedule, or batch data.
  • Age (validity) of data being ingested: The data may have a life cycle after which its value is no longer valid. This is critical for decisions or actions based on that value.
  • Format (variety) of data being ingested: This could be structured, semi-structured, or unstructured data.
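
The four characteristics above can be sketched as fields on a single record, with validity expressed as a maximum age. This is only an illustration: the `IncomingData` class, its field names, and the five-minute limit are assumptions made for this example and do not come from any Azure SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncomingData:
    """Hypothetical description of one piece of ingested data."""
    size_bytes: int        # volume: how much data arrived
    arrival: str           # velocity: "stream", "scheduled", or "batch"
    produced_at: datetime  # validity: when the value was produced
    shape: str             # variety: "structured", "semi-structured", "unstructured"


def is_still_valid(item: IncomingData, now: datetime, max_age: timedelta) -> bool:
    # A value is treated as valid only while it is no older than max_age.
    return now - item.produced_at <= max_age


# Example: a streamed sensor reading that we assume expires after five minutes.
reading = IncomingData(
    size_bytes=512,
    arrival="stream",
    produced_at=datetime(2024, 1, 1, 12, 0, 0),
    shape="semi-structured",
)
```

Calling `is_still_valid(reading, now, timedelta(minutes=5))` four minutes after the reading was produced returns `True`; six minutes after, it returns `False`, so a decision based on that reading should not be taken.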

Traditional database systems and data stores were not designed to handle data with these characteristics. To move beyond the limitations of the traditional DW and toward more intelligent and actionable insights, we need a paradigm shift in both mindset and technology: a modern set of technologies and a new architecture that support this way of ingesting, processing, analyzing, and visualizing data to take action.

As we saw earlier in this section, big data is more than just the volume of data; when we combine the velocity of the data, its complexity, and its format (often unstructured), we get the term big data.

The traditional ETL toolset of extracting (E) the data from sources before transforming (T) and then loading (L) it into destinations cannot keep pace with the velocity and variety of this big data. Big data and the modern DW give rise to the need to shift from ETL to Extract, Load, Transform (ELT), which copes better with the changing volume of data and the nature of its complexity.
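
The difference in ordering between ETL and ELT can be sketched in a few lines. The following is a minimal in-memory illustration, not an Azure implementation: the sample rows and the `extract`, `transform`, `run_etl`, and `run_elt` functions are hypothetical names invented for this example, and plain Python lists stand in for the warehouse and the data lake.

```python
Record = dict


def extract() -> list[Record]:
    # Hypothetical source rows; the mixed shapes stand in for "variety".
    return [
        {"device": "a1", "temp_c": "21.5"},
        {"device": "b2", "temp_c": "19.0", "note": "recalibrated"},
        {"device": "c3"},  # no reading at all
    ]


def transform(rows: list[Record]) -> list[Record]:
    # Keep only rows that carry a reading and convert it to a float.
    return [
        {"device": r["device"], "temp_c": float(r["temp_c"])}
        for r in rows
        if "temp_c" in r
    ]


def run_etl() -> list[Record]:
    # ETL: transform before loading, so the destination only sees cleaned rows.
    warehouse: list[Record] = []
    warehouse.extend(transform(extract()))
    return warehouse


def run_elt() -> tuple[list[Record], list[Record]]:
    # ELT: load the raw rows first, then transform inside the destination.
    lake: list[Record] = []
    lake.extend(extract())
    curated = transform(lake)  # can be re-run later with different rules
    return lake, curated
```

In this sketch, `run_etl()` returns two cleaned rows and the row without a reading is lost before it ever reaches storage. `run_elt()` keeps all three raw rows in the lake and produces the same two curated rows from them, which is why ELT leaves room to reprocess the raw data as requirements change.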

Much like the DevOps world brought Development and Operations teams closer together in pursuit of a bigger goal, the big data era brings data scientists, data engineers, database administrators, IT pros, and business analysts closer together. This is the birth of unified analytics. The following diagram represents a big data and analytics architecture model:

Figure 5.13 – Big data and analytics architecture

A big data architecture can include one or more of the following components from the preceding diagram:

  • Data Sources: Databases, log files, file stores, IoT devices, and social media
  • Data Storage: Data lakes and blob storage
  • Real-Time Message Ingestion: Azure Event Hubs, Azure IoT Hub, and Kafka
  • Batch Processing: HDInsight and Databricks
  • Stream Processing: Azure Stream Analytics and HDInsight
  • Analytical Data Store: Azure Synapse Analytics (formerly SQL Data Warehouse) and HDInsight
  • Analysis and Reporting: Azure Synapse Analytics, Azure Analysis Services, Power BI, and Excel
  • Orchestration: Data Factory
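
The component list above can also be read as a simple lookup table. The sketch below is only a convenience for exploring it: the `ARCHITECTURE` dictionary and the `components_using` helper are hypothetical names, and the service names follow the list above rather than any official catalog.

```python
# Component -> example services, following the list above.
ARCHITECTURE = {
    "Data Sources": ["Databases", "Log files", "File stores", "IoT devices", "Social media"],
    "Data Storage": ["Data lakes", "Blob storage"],
    "Real-Time Message Ingestion": ["Azure Event Hubs", "Azure IoT Hub", "Kafka"],
    "Batch Processing": ["HDInsight", "Databricks"],
    "Stream Processing": ["Azure Stream Analytics", "HDInsight"],
    "Analytical Data Store": ["Azure Synapse Analytics", "SQL Data Warehouse", "HDInsight"],
    "Analysis and Reporting": ["Azure Synapse Analytics", "Azure Analysis Services", "Power BI", "Excel"],
    "Orchestration": ["Data Factory"],
}


def components_using(service: str) -> list[str]:
    # Return every architecture component that lists the given service.
    return [name for name, services in ARCHITECTURE.items() if service in services]
```

For example, `components_using("HDInsight")` returns `["Batch Processing", "Stream Processing", "Analytical Data Store"]`, which shows how one service can fill several roles in the same architecture.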

This architecture has several benefits, such as the range of technology choices it offers and the ability to use open source software (Kafka, Spark, Hadoop, and so on) alongside native Microsoft products and services. It also provides elastic scale and parallelism, interoperability, and integration with existing enterprise solutions. However, there are some challenges: its complexity, the skillset it demands, the maturity and evolving nature of the technology, and security, privacy, and compliance. The following diagram aims to position the Microsoft data landscape on Azure visually:

Figure 5.14 – Azure data landscape

The preceding diagram is by no means exhaustive, but it indicates the pillars of a data architecture and where common services and solutions typically sit within it.

Many of the services and solutions that are shown in the preceding diagram are beyond the scope of this book; we have included some links in the Further reading section of this chapter, should you wish to continue studying this area.

For the exam objectives, we will look at the following three Azure big data and analytics services:

  • Azure Synapse Analytics
  • Azure HDInsight
  • Azure Databricks

This section looked at the data landscape. In the next section, we will look at Azure Synapse Analytics.
