Introduction to Data Science and scikit-learn in Python

By: Coursera

  • Introduction to Python Programming for Hypothesis Testing
    • In this module, we'll get started with programming in Python. After becoming familiar with Python and the Jupyter Notebook interface, we'll dive into basic coding constructs such as variables, loops, and functions. We'll also cover data structures in the form of lists and dictionaries, and then one of the most useful skills in your Python arsenal: importing and using modules effectively. Finally, we'll introduce scikit-learn and walk through a classification problem that predicts the presence or absence of cancer from health data.
  • Creating a Hypothesis: Numpy, Pandas, and Scikit-Learn
    • In this module, we'll become familiar with the two most important packages for data science: NumPy and pandas. We'll begin by learning the differences between the two packages. Then we'll get familiar with NumPy arrays and their functionality. Adding text labels turns our arrays into tables, which is where pandas comes in. After a basic introduction, we'll end with a series of important data manipulation tools such as indexing, merging and combining datasets, and reshaping data.
  • Scikit-Learn Revisited: ML for Hypothesis Testing
    • In this module, we'll work from the ground up to build and test our hypothesis. Covering both the theory and the code, we'll learn to test our predictions with different types of machine learning algorithms. We'll start with some of the necessary data preprocessing steps to orient ourselves. Getting familiar with the scikit-learn library starts with its documentation. From there, we'll load a dataset and analyze some of its most basic properties. Finally, we'll import and use models to make a prediction.
  • Using Classification to Predict the Presence of Heart Disease
    • In the final project, we'll try to predict the presence of heart disease using patient data. We'll load the data, create new features, and apply a machine learning algorithm using scikit-learn.
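
To give a rough sense of the workflow the modules describe (load data, create a new feature, fit a classifier, make predictions), here is a minimal sketch. It is not taken from the course materials. It assumes scikit-learn's bundled breast cancer dataset as a stand-in for the course's health data, and the derived feature name `area_to_perimeter`, the choice of logistic regression, and the 75/25 split are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a bundled dataset as a pandas DataFrame (features) and Series (target).
# Assumption: this stands in for the course's own health data.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Create a new feature from existing columns (hypothetical, for illustration).
X = X.assign(area_to_perimeter=X["mean area"] / X["mean perimeter"])

# Hold out a quarter of the rows to check predictions on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (scaling) and the model are chained so both are fit on
# the training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Swapping in a different estimator, for example a decision tree, only changes the last step of the pipeline. The loading, feature creation, and splitting steps stay the same.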