# Mastering Data Analysis in Excel

By: Coursera

## Excel Essentials for Beginners

In this module, you will explore the essential Excel skills needed to handle typical business situations you may encounter in the future. The Excel vocabulary and functions taught in this module will help you understand the explanatory Excel spreadsheets that accompany later videos in this course.


## Binary Classification

Separating collections into two categories, such as “buy this stock, don’t buy that stock” or “target this customer with a special offer, but not that one,” is the ultimate goal of most business data-analysis projects. A specialized vocabulary of measures exists for comparing and optimizing the performance of the algorithms that classify collections into two groups. You will learn how and why to apply these metrics, including how to calculate the all-important AUC: the area under the Receiver Operating Characteristic (ROC) curve.

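
The course performs this calculation in Excel. As a rough illustration outside the spreadsheet, the Python sketch below sweeps a decision threshold down through the model's scores, records the false positive rate and true positive rate at each step (the ROC curve), and adds up trapezoids to get the AUC. The function name `roc_auc`, the 0/1 label encoding, and the way tied scores are handled are assumptions made for this example, not the course's own spreadsheet layout.

```python
def roc_auc(scores, labels):
    """Build ROC points and the trapezoid-rule AUC for binary labels (1 = positive, 0 = negative)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from most to least confident that they are positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every example tied at this score at once,
        # so tied scores produce a single diagonal ROC segment.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))

    # Area under the piecewise-linear ROC curve (trapezoid rule).
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


if __name__ == "__main__":
    points, auc = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(auc)  # 0.75
```

An AUC of 0.5 matches random guessing, and 1.0 means every positive example is scored above every negative one. The 0.75 in the usage line equals the fraction of (positive, negative) pairs that the scores rank correctly, which is 3 of 4.
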

## Information Measures

In this module, you will learn how to calculate and apply the vitally useful uncertainty metric known as “entropy.” Whereas the more familiar “probability” represents the uncertainty that a single outcome will occur, “entropy” quantifies the aggregate uncertainty across all possible outcomes.

The entropy measure provides a framework for accountability in data-analytic work. Entropy lets you quantify the uncertainty of future outcomes relevant to your business twice: first using the best available estimates before you begin a project, and again after you have built a predictive model. The difference between the two measures is the Information Gain contributed by your work.

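
One standard way to make this before-and-after comparison concrete is sketched below in Python. It assumes the model's predictions and the actual outcomes have been tallied in a hypothetical table of counts, with rows for predictions and columns for outcomes. The "before" entropy uses only the overall outcome frequencies; the "after" entropy is the entropy of the outcome within each prediction, averaged with weights equal to how often that prediction occurs. Their difference is the information gain (also known as mutual information), in bits. The function names and the counts are illustrative and are not taken from the course.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(joint_counts):
    """Entropy before minus expected entropy after, from counts[prediction][outcome]."""
    total = sum(sum(row) for row in joint_counts)

    # "Before": outcome distribution ignoring the model (column totals).
    outcome_totals = [sum(col) for col in zip(*joint_counts)]
    h_before = entropy([c / total for c in outcome_totals])

    # "After": outcome entropy within each prediction, weighted by that prediction's frequency.
    h_after = 0.0
    for row in joint_counts:
        row_total = sum(row)
        if row_total:
            h_after += (row_total / total) * entropy([c / row_total for c in row])

    return h_before - h_after


if __name__ == "__main__":
    # Hypothetical counts: rows = model prediction, columns = actual outcome.
    counts = [[40, 10],
              [10, 40]]
    print(round(information_gain(counts), 3))  # 0.278
```

Here the outcomes start at 1 bit of uncertainty (a 50/50 split), and knowing the model's prediction leaves about 0.722 bits, so the model contributes roughly 0.278 bits.
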

## Linear Regression

The linear correlation measure is a much richer metric for evaluating associations than is commonly realized. You can use it to quantify how much a linear model reduces uncertainty. When used to forecast future outcomes, it can be converted into a “point estimate” plus a “confidence interval,” or into an information gain measure. You will develop a fluent knowledge of these concepts and of the many valuable uses of linear regression in business data analysis.

This module also teaches how to use the Central Limit Theorem (CLT) to solve practical problems. The two topics are closely related because regression and the CLT both make use of a special family of probability distributions called “Gaussians.” You will learn everything you need to know to work with Gaussians in these and other contexts.


## Additional Skills for Model Building

This module gives you additional valuable concepts and skills related to building high-quality models. As you know, a “model” is a description of a process that is applied to available data (inputs) and produces, as output, an estimate of a future, as-yet-unknown outcome.

Very often, a model's output takes the form of a probability distribution. This module covers how to estimate probability distributions from data (a “probability histogram”), and how to describe and generate the probability distributions that data scientists find most useful. It also covers in detail how to develop a binary classification model whose parameters are optimized to maximize the AUC, and how to apply linear regression models when your input consists of multiple types of data for each event.

The module concludes with an explanation of “over-fitting,” which is the main reason that apparently good predictive models often fail in real-life business settings. It closes with some tips for avoiding over-fitting in your own predictive model for the final project, and in real life.

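
A probability histogram, as described above, records the fraction of observations that fall into each bin. The Python sketch below computes one using half-open bins `[edge_i, edge_(i+1))`, with the last bin also including its right edge; that binning convention and the function name `probability_histogram` are assumptions for illustration. Values outside the outermost edges are ignored, so the fractions sum to less than 1 if any fall outside. The usage lines generate samples from a Gaussian with Python's `random` module, showing how a generated distribution can be compared with its histogram.

```python
import random
from bisect import bisect_right


def probability_histogram(values, bin_edges):
    """Fraction of values in each bin [edge_i, edge_(i+1)); the last bin also includes its right edge."""
    if not values:
        raise ValueError("need at least one value")
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for v in values:
        if v == bin_edges[-1]:
            i = n_bins - 1  # keep the top edge inside the last bin
        else:
            i = bisect_right(bin_edges, v) - 1  # bin whose left edge is the largest edge <= v
        if 0 <= i < n_bins:
            counts[i] += 1  # values outside the outer edges are ignored
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    samples = [rng.gauss(0, 1) for _ in range(10_000)]
    edges = [-3, -2, -1, 0, 1, 2, 3]
    print([round(p, 3) for p in probability_histogram(samples, edges)])
```
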

## Final Course Project

The final course project is a comprehensive assessment covering all of the course material, and consists of four quizzes and a peer review assignment. For quizzes one and two, there are learning points that explain components of the quiz; these learning points unlock only after you pass the quiz. Before you start, please read through the final project instructions. Based on past students' experience, the final project, including all the quizzes and the peer assessment, takes about 10 to 12 hours.