Hadoop Platform and Application Framework
- Hadoop Basics
- Welcome to the first module of the Big Data Platform course. This first module will provide insight into Big Data Hype, its technologies opportunities and challenges. We will take a deeper look into the Hadoop stack and tool and technologies associated with Big Data solutions.
- Introduction to the Hadoop Stack
- In this module we will take a detailed look at the Hadoop stack ranging from the basic HDFS components, to application execution frameworks, and languages, services.
- Introduction to Hadoop Distributed File System (HDFS)
- In this module we will take a detailed look at the Hadoop Distributed File System (HDFS). We will cover the main design goals of HDFS, understand the read/write process to HDFS, the main configuration parameters that can be tuned to control HDFS performance and robustness, and get an overview of the different ways you can access data on HDFS.
- Introduction to Map/Reduce
- This module will introduce Map/Reduce concepts and practice. You will learn about the big idea of Map/Reduce and you will learn how to design, implement, and execute tasks in the map/reduce framework. You will also learn the trade-offs in map/reduce and how that motivates other tools.
- Welcome to module 5, Introduction to Spark, this week we will focus on the Apache Spark cluster computing framework, an important contender of Hadoop MapReduce in the Big Data Arena.
Spark provides great performance advantages over Hadoop MapReduce,especially for iterative algorithms, thanks to in-memory caching. Also, gives Data Scientists an easier way to write their analysis pipeline in Python and Scala,even providing interactive shells to play live with data.