DP-900 Microsoft Azure Data Fundamentals

This certification demonstrates foundational knowledge of core data concepts and of the data services offered by Microsoft Azure. It is a good first course for learning the basics of working with data in the Azure cloud.

Introduction

You can classify data into structured, semi-structured, or unstructured data. A common format for semi-structured data is JavaScript Object Notation (JSON).
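
To make the idea of semi-structured data more concrete, here is a minimal Python sketch. The customer records and their field names are hypothetical and only serve as an illustration: both records are valid JSON, yet they do not carry the same set of fields, which a fixed relational table would not allow.

```python
import json

# Two hypothetical customer records: both are valid JSON, but they do not
# share an identical set of fields, which is typical of semi-structured data.
raw = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]}
]
"""

customers = json.loads(raw)

# Collect the field names each record actually carries.
fields_per_record = [sorted(record.keys()) for record in customers]

for record, fields in zip(customers, fields_per_record):
    print(record["id"], fields)
```

Each record describes its own structure through its keys, so an application reading this data has to check which fields are present rather than rely on a fixed schema.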

There are two categories of commonly used data stores:

  • file storage

  • database

Definitions

Avro is a row-based format that compresses data well, which minimizes storage and network bandwidth requirements.

ORC (Optimized Row Columnar) is a format that, like Parquet, organizes data into columns rather than rows.
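
The difference between a row-based and a columnar layout can be sketched with plain Python data structures. This is only an illustration of the two layouts, not of the actual Avro, ORC, or Parquet file formats; the sample data and the `to_columns` helper are hypothetical.

```python
# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"id": 1, "city": "Paris", "amount": 10.0},
    {"id": 2, "city": "Lyon", "amount": 25.5},
    {"id": 3, "city": "Paris", "amount": 7.5},
]


def to_columns(records):
    """Pivot a list of row dictionaries into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented access: reading one whole record touches a single row.
second_record = rows[1]

# Column-oriented access: an aggregate reads only the column it needs.
total_amount = sum(columns["amount"])

print(second_record)
print(total_amount)
```

A row layout suits workloads that read or write whole records at a time, while a column layout suits analytical queries that scan a few fields across many records.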

Azure SQL is the name for a family of relational database solutions built on the Microsoft SQL Server database engine. Specific Azure SQL services include:

  • Azure SQL Database - platform-as-a-service

  • Azure SQL Managed Instance - platform-as-a-service

  • SQL Server on Azure Virtual Machines - infrastructure-as-a-service

  • Azure SQL Edge - IoT scenarios

Azure includes managed services for popular open source relational database systems such as:

  • Azure Database for MySQL

  • Azure Database for MariaDB

  • Azure Database for PostgreSQL

Azure Cosmos DB is a global-scale non-relational (NoSQL) database system that supports multiple application programming interfaces (APIs), allowing you to store and manage data as JSON documents, key-value pairs, column families, and graphs.

Azure Storage is an essential Azure service that allows you to store data in:

  • Blob Containers

  • File Shares

  • Tables

Azure Data Factory is an Azure service that allows you to define and schedule data pipelines to transfer and transform data.

Azure Synapse Analytics is a unified data analytics solution that provides a single service interface for multiple analytical capabilities.

Azure Databricks is an Azure-integrated version of the popular Databricks platform, combining the Apache Spark data processing platform with SQL database semantics and an integrated management interface for large-scale data analytics.

Azure HDInsight is an Azure service that provides Azure-hosted clusters for popular Apache open source big data processing technologies, including Apache Spark, Apache Hadoop, Apache HBase, Apache Kafka, and Apache Storm.

Azure Stream Analytics is a real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.
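
The input, query, output pattern described above can be sketched in plain Python. This is not the Azure Stream Analytics query language or API: the event data, the `tumbling_average` function, and the 10-second window are assumptions chosen for illustration, and a real stream engine would process events incrementally as they arrive instead of reading a finite list.

```python
from collections import defaultdict

# Hypothetical input stream: (timestamp in seconds, sensor id, temperature).
events = [
    (0, "s1", 20.0),
    (3, "s1", 22.0),
    (7, "s2", 30.0),
    (11, "s1", 24.0),
    (14, "s2", 32.0),
]


def tumbling_average(stream, window_seconds):
    """Average the readings per sensor within fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for timestamp, sensor, value in stream:
        # Assign each event to the window that contains its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[(window_start, sensor)].append(value)
    # The "output": one result per window and sensor.
    return {
        key: sum(values) / len(values)
        for key, values in sorted(buckets.items())
    }


results = tumbling_average(events, window_seconds=10)
for (window_start, sensor), average in results.items():
    print(window_start, sensor, average)
```

The events play the role of the input, the windowed average plays the role of the query, and the `results` dictionary stands in for the output that would be written to a store or dashboard.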

Azure Data Explorer is a standalone service that provides the same high-performance querying of log and telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.

Microsoft Purview provides a solution for enterprise-wide data governance and discoverability. You can use Microsoft Purview to create a map of your data and track data lineage across multiple data sources and systems, helping you find trusted data for analysis and reporting.

Microsoft Power BI is an analytical data modeling and reporting platform that data analysts can use to create and share interactive data visualizations.

Azure Blob Storage is a service that allows you to store large volumes of unstructured data as binary large objects (blobs) in the cloud.

Azure Data Lake Store (Gen1) is a separate service that provides hierarchical data storage for analytical data lakes. It is often used by big data analytics solutions that work with structured, semi-structured, and unstructured data stored in files. Azure Data Lake Storage Gen2 is a newer version of this service that is integrated into Azure Storage and adds hierarchical file system capabilities to it.

Differences between Azure SQL Database, SQL Managed Instance, and SQL Server on virtual machines

SQL Server on virtual machines allows you to use full versions of SQL Server in the cloud without having to manage any on-premises hardware. This is an example of an IaaS approach.

Azure SQL Managed Instance runs a fully controllable instance of SQL Server in the cloud. You can install multiple databases on the same instance. You have total control over this instance, as if it were an on-premises server. SQL Managed Instance automates backups, software patching, database monitoring, and other general tasks.

Azure SQL Database is available as a single database or as part of an elastic pool. It gives you the best option for low cost and minimal administration. This service is not fully compatible with on-premises SQL Server installations, so it is most often used in new cloud projects where the application design can accommodate any required changes.
