Perform data science with Azure Databricks

By: Coursera

  • Introduction to Azure Databricks
    • In this module, you will discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. You will come to understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. You will also be introduced to the architecture of an Azure Databricks Spark Cluster and Spark Jobs.
  • Working with data in Azure Databricks
    • Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries. In this module, you will work with large amounts of data from multiple sources in different raw formats. You will learn to use the DataFrame Column class in Azure Databricks to apply column-level transformations, such as sorts, filters, and aggregations. You will also use advanced DataFrame functions to manipulate data, apply aggregates, and perform date and time operations.
  • Processing data in Azure Databricks
    • Azure Databricks supports a range of built-in SQL functions; however, sometimes you have to write a custom function, known as a user-defined function (UDF). In this module, you will learn how to register and invoke UDFs. You will also learn how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations.
  • Get started with Databricks and machine learning
    • In this module, you will learn how to use PySpark’s machine learning package to build key components of a machine learning workflow, including exploratory data analysis, model training, and model evaluation. You will also learn how to build pipelines for common data featurization tasks.
  • Manage machine learning lifecycles and fine tune models
    • In this module, you will learn how to use MLflow to track machine learning experiments and how to use modules from Spark’s machine learning library for hyperparameter tuning and model selection.
  • Train a distributed neural network and serve models with Azure Machine Learning
    • In this module, you will learn how to use Uber’s Horovod framework along with the Petastorm library to run distributed deep learning training jobs on Spark using training datasets in the Apache Parquet format. You will also learn how to use MLflow and the Azure Machine Learning service to register, package, and deploy a trained model to both Azure Container Instances and Azure Kubernetes Service as a scoring web service.

Platform