# Inferential Statistics

## Overview

Inferential statistics is concerned with generalizing from relations found in a sample to relations in the population. It helps us decide, for example, whether the differences between groups that we see in our data are strong enough to support our hypothesis that group differences exist in general, in the entire population.

We will start by considering the basic principles of significance testing: the sampling and test statistic distribution, p-value, significance level, power and type I and type II errors. Then we will consider a large number of statistical tests and techniques that help us make inferences for different types of data and different types of research designs. For each individual statistical test we will consider how it works, for what data and design it is appropriate and how results should be interpreted. You will also learn how to perform these tests using freely available software.

For those who are already familiar with statistical testing: We will look at z-tests for 1 and 2 proportions, McNemar's test for dependent proportions, t-tests for 1 mean (paired differences) and 2 means, the Chi-square test for independence, Fisher’s exact test, simple regression (linear and exponential) and multiple regression (linear and logistic), one-way and factorial analysis of variance, and non-parametric tests (Wilcoxon, Kruskal-Wallis, sign test, signed-rank test, runs test).

## Syllabus

Before we get started...

Comparing two groups

-In this second module of week 1 we dive right in with a quick refresher on statistical hypothesis testing. Since we're assuming you just completed the course Basic Statistics, our treatment is a little more abstract and we go really fast! We provide the relevant Basic Statistics videos in case you need a gentler introduction. After the refresher we discuss methods to compare two groups on a categorical or quantitative dependent variable. We use different tests for independent and dependent groups.
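As an illustration of comparing two independent groups on a categorical dependent variable, here is a minimal sketch of a two-sided z-test for two proportions. The function name and the counts are made up for this example and are not course data; the sketch uses the pooled proportion under the null hypothesis and assumes samples large enough for the normal approximation.

```python
import math


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test for the difference between two independent proportions."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    # Under H0 both groups share one population proportion, so we pool the data.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value: probability of |Z| >= |z| under the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 45 of 100 successes in group 1, 30 of 100 in group 2.
z, p_value = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

With a significance level of 0.05, a p-value below 0.05 would lead us to reject the null hypothesis of equal population proportions.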

Categorical association

-In this module we tackle categorical association. We'll mainly discuss the Chi-squared test that allows us to decide whether two categorical variables are related in the population. If two categorical variables are unrelated you would expect that categories of these variables don't 'go together'. You would expect the number of cases in each category of one variable to be proportionally similar at each level of the other variable. The Chi-squared test helps us to compare the actual number of cases for each combination of categories (the joint frequencies) to the expected number of cases if the variables are unrelated.
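The comparison of observed and expected joint frequencies can be sketched in a few lines. This is an illustrative example with a made-up 2x2 table, not course data; `chi_squared_statistic` is a hypothetical helper name. The expected count for each cell is the row total times the column total divided by the grand total.

```python
def chi_squared_statistic(observed):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected joint frequencies if the two variables are unrelated.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, degrees_of_freedom, expected


# Hypothetical 2x2 table: rows are groups, columns are response categories.
observed = [[30, 20], [20, 30]]
chi2, df, expected = chi_squared_statistic(observed)
print(chi2, df, expected)
```

For one degree of freedom the critical value at a significance level of 0.05 is 3.841, so a statistic above that value would suggest the two variables are related in the population.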

Simple regression

-In this module we’ll see how to describe the association between two quantitative variables using simple (linear) regression analysis. Regression analysis allows us to model the relation between two quantitative variables and, based on our sample, decide whether a 'real' relation exists in the population. Regression analysis is more useful than just calculating a correlation coefficient, since it allows us to assess how well our regression line fits the data, and it helps us to identify outliers and to predict scores on the dependent variable for new cases.
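To make the idea of fitting a line, assessing fit, and predicting new cases concrete, here is a minimal least-squares sketch. The data and the function name are invented for illustration; the sketch reports only the slope, intercept, and R-squared and leaves out the significance test on the slope.

```python
def simple_linear_regression(x, y):
    """Least-squares slope, intercept, and R-squared for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sum_cross / sum_sq_x
    intercept = mean_y - slope * mean_x
    # R-squared: proportion of variance in y explained by the regression line.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical sample of five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_squared = simple_linear_regression(x, y)

# Predicted score on the dependent variable for a new case with x = 6.
prediction = intercept + slope * 6
print(slope, intercept, r_squared, prediction)
```

Cases with large residuals (observed minus predicted score) are candidates for closer inspection as outliers.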

Multiple regression

-In this module we’ll see how we can use more than one predictor to describe or predict a quantitative outcome variable. In the social sciences relations between psychological and social variables are generally not very strong, since outcomes are generally influenced by complex processes involving many variables. So it really helps to be able to describe an outcome variable with several predictors, not just to increase the fit of the model, but also to assess the individual contribution of each predictor, while controlling for the others.

Analysis of variance

-In this module we'll discuss analysis of variance, a very popular technique that allows us to compare more than two groups on a quantitative dependent variable. We call it analysis of variance because we compare two estimates of the variance in the population: one based on the differences between the group means and one based on the variation within the groups. If the group means differ in the population, the between-groups estimate tends to be larger than the within-groups estimate. Just like in multiple regression, factorial analysis of variance allows us to investigate the influence of several independent variables.
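The two variance estimates can be sketched for the one-way case as follows. This is an illustrative example with invented scores for three groups; `one_way_anova_f` is a hypothetical helper name. The F statistic is the ratio of the between-groups estimate to the within-groups estimate.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: how far group means lie from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical scores for three groups of three cases each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic, df_between, df_within = one_way_anova_f(groups)
print(f_statistic, df_between, df_within)
```

An F value close to 1 is what we would expect if the population means are equal; a much larger value counts as evidence against that null hypothesis.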

Non-parametric tests

-In this module we'll discuss the last topic of this course: non-parametric tests. Until now we've mostly considered tests that require assumptions about the shape of the distribution (z-tests, t-tests and F-tests). Sometimes those assumptions don't hold. Non-parametric tests require fewer of those assumptions. There are several non-parametric tests that correspond to the parametric z-, t- and F-tests. These tests also come in handy when the response variable is an ordered categorical variable as opposed to a quantitative variable. There are also non-parametric equivalents to the correlation coefficient and some tests that have no parametric counterparts.
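As one example of a test that makes no assumption about the shape of the distribution, here is a minimal sketch of the sign test for paired observations. The data and function name are invented for illustration. Ties are dropped, and the two-sided p-value comes from the exact binomial distribution with success probability 0.5.

```python
import math


def sign_test(before, after):
    """Exact two-sided sign test for paired observations."""
    # Keep only the non-zero differences; ties carry no sign information.
    differences = [a - b for b, a in zip(before, after) if a != b]
    n = len(differences)
    positives = sum(1 for d in differences if d > 0)
    # Probability of a split at least this lopsided, in one direction.
    k = min(positives, n - positives)
    one_tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * one_tail)
    return positives, n, p_value


# Hypothetical paired scores for nine cases (the last pair is a tie).
before = [10, 12, 9, 14, 11, 13, 8, 15, 10]
after = [12, 14, 10, 15, 13, 12, 11, 17, 10]
positives, n, p_value = sign_test(before, after)
print(positives, n, p_value)
```

Because the sign test uses only the direction of each difference, it also works when the response is an ordered categorical variable.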

Exam time!

-In this final module there's no new material to study. We advise you to take some extra time to review the material from the previous modules and to practice for the final exam. We've provided a practice exam that you can take as many times as you like. The final exam is structured exactly like the practice exam, so you know what to expect. Please note that you can only take the final exam twice every seven days, so make sure you are fully prepared. Please follow the honor code and do not communicate or confer with others while taking this exam or after. In the open questions of the exam (i.e. those that are not multiple choice) you should report your answers to 3 decimal places, and use 5 decimal places in your calculations. Good luck!
