Introduction to Text Mining with R

By: Coursera

  • R and RStudio Basics
    • In this module, you will learn how to work with R and RStudio, how to use RMarkdown for literate programming, and how to work with data using basic R data types and structures.
  • Working with Tidyverse
    • In this module, you will learn how to work with data using the Tidyverse set of packages. You will learn how to use tibbles (a Tidyverse alternative to data.frames), the pipe operator from the magrittr package, and how to clean and transform data using the powerful dplyr package. You will also learn how to efficiently work with strings using the stringr package.
  • Supervised machine learning with the bag-of-words approach
    • In this module, you will learn how to obtain text data from Project Gutenberg and how to prepare it for analysis. You will also learn how to use TF-IDF to find the most distinctive words in a corpus of texts and how to build, interpret, and evaluate supervised learning models for textual data.
  • Unsupervised machine learning
    • In this module, you will learn how to preprocess text data using the preText package, which can compare many types of preprocessing for a particular corpus. You will also learn how to train, interpret, and compare topic models.
  • Final Project
    • This module is dedicated entirely to the final project of the course, in which you will apply everything you have learned to carry out a real analysis of real texts on your own. You will download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then review and grade the work of your peers.
