- Introduction and the Data
- The first lecture provides a broad overview of the field of data analysis and its two major components: the data and the analysis. Topics within the first lecture explain how these two concepts fit together. We start with definitions to clear up some of the confusion around terminology in the field. Then we discuss the contents of this course and map the field of data analysis. We also discuss the role that data play in our lives, what data are, their types and classifications, and sources of data. Finally, we address the issue of modeling: why we model and how analytics aids decision-making in business and in real life.
- Data Issues That Go Bump in the Night
- This lecture covers a topic that is rarely treated in detail in most data analytics programs: the problems we face when working with data. Each segment of the lecture covers a different aspect of the data issues that can arise with real-life data. These include data management, including cleaning and recoding; sources of data errors and how to fix them; and working with different data file structures. We also discuss detecting fake data and state-of-the-art missing data analysis.
- Descriptive Analytics
- This lecture covers the first steps of analysis, which should be performed on data that have been collected, cleaned, checked for issues and missing values, and otherwise prepared for analysis. These first steps, aimed at understanding “what happened” or gathering information, are collectively called “descriptive analytics.” We start with the definitions of population and sample and move on to basic graphical descriptions. We then discuss various numerical measures and how to select the best measure for a given dataset. Next, we talk about advanced graphs and charts and how to make descriptions meaningful. Finally, we apply everything we’ve learned to real cases from The Coca-Cola Company and McDonald’s Corporation.
- Inferential Analytics
- Linear regression, one of the most widely used analytical methods, belongs for the most part to the domain of inferential analytics. Inferential analytics is concerned with explaining “why something happened”; in other words, with making inferences from the data. At the heart of this approach is hypothesis testing, which we discuss first. Then we turn to the variables used to make inferences and the relationships among them, and discuss the data requirements for inferential analytics. Finally, we cover the basics of regression analysis and look at different examples of inferential analytics models.
- Predictive Analytics
- Predictive analytics is impossible without first establishing causal relationships. Therefore, we begin with the issue of causality, approaches to studying it, and causality in observational studies. Then we move a step up: from causality, where we establish the influence of one variable on another, to prediction, or the future relationships between these variables. We talk about predictive modeling of continuous and discrete outcomes and discuss some of the modeling issues that can arise with predictions.
- Prescriptive Analytics
- Prescriptive analytics is concerned with optimization, that is, with making the most desirable outcome happen. In this lecture, we look at the theoretical considerations of prescriptive analytics, then talk about optimization as an approach and discuss stochastic vs. mathematical optimization. Finally, we cover the specifics of one very common optimization method, linear programming, including problem setup and the simplex method for solving linear programming problems.
- Final Assignment