The first module “Big Data Rankings & Products” focuses on the relation and market shares of big data hardware, software, and professional services. This information provides an insight to how future industry, products, services, schools, and government organizations will be influenced by big data technology. To have a deeper view into the world’s top big data products line and service types, the lecture provides an overview on the major big data company, which include IBM, SAP, Oracle, HPE, Splunk, Dell, Teradata, Microsoft, Cisco, and AWS. In order to understand the power of big data technology, the difference of big data analysis compared to traditional data analysis is explained. This is followed by a lecture on the 4 V big challenges of big data technology, which deal with issues in the volume, variety, velocity, and veracity of the massive data. Based on this introduction information, big data technology used in adding global insights on investments, help locate new stores and factories, and run real-time recommendation systems by Wal-Mart, Amazon, and Citibank is introduced.
Big Data & Hadoop
The second module “Big Data & Hadoop” focuses on the characteristics and operations of Hadoop, which is the original big data system that was used by Google. The lectures explain the functionality of MapReduce, HDFS (Hadoop Distributed FileSystem), and the processing of data blocks. These functions are executed on a cluster of nodes that are assigned the role of NameNode or DataNodes, where the data processing is conducted by the JobTracker and TaskTrackers, which are explained in the lectures. In addition, the characteristics of metadata types and the differences in the data analysis processes of Hadoop and SQL (Structured Query Language) are explained. Then the Hadoop Release Series is introduced which include the descriptions of Hadoop YARN (Yet Another Resource Negotiator), HDFS Federation, and HDFS HA (High Availability) big data technology.
Spark
The third module “Spark” focuses on the operations and characteristics of Spark, which is currently the most popular big data technology in the world. The lecture first covers the differences in data analysis characteristics of Spark and Hadoop, then goes into the features of Spark big data processing based on the RDD (Resilient Distributed Datasets), Spark Core, Spark SQL, Spark Streaming, MLlib (Machine Learning Library), and GraphX core units. Details of the features of Spark DAG (Directed Acyclic Graph) stages and pipeline processes that are formed based on Spark transformations and actions are explained. Especially, the definition and advantages of lazy transformations and DAG operations are described along with the characteristics of Spark variables and serialization. In addition, the process of Spark cluster operations based on Mesos, Standalone, and YARN are introduced.
Spark ML & Streaming
The fourth module “Spark ML & Streaming” focuses on how Spark ML (Machine Learning) works and how Spark streaming operations are conducted. The Spark ML algorithms include featurization, pipelines, persistence, and utilities which operate on the RDDs (Resilient Distributed Datasets) to extract information form the massive datasets. The lectures explain the characteristics of the DataFrame-based API, which is the primary ML API in the spark.ml package. Spark ML basic statistics algorithms based on correlation and hypothesis testing (P-value) are first introduced followed by the Spark ML classification and regression algorithms based on linear models, naive Bayes, and decision tree techniques. Then the characteristics of Spark streaming, streaming input and output, as well as streaming receiver types (which include basic, custom, and advanced) are explained, followed by how the Spark Streaming process and DStream (Discretized Stream) enable big data streaming operations for real-time and near-real-time applications.
Storm
The fifth module “Storm” focuses on the characteristics and operations of Storm big data systems. The lecture first covers the differences in data analysis characteristics of Storm, Spark, and Hadoop technology. Then the features of Storm big data processing based on the nimbus, spouts, and bolts are described followed by the Storm streams, supervisor, and ZooKeeper details. Further details on Storm reliable and unreliable spouts and bolts are provided followed by the advantages of Storm DAG (Directed Acyclic Graph) and data stream queue management. In addition, the advantages of using Storm based fast real-time applications, which include real-time analytics, online ML (Machine Learning), continuous computation, DRPC (Distributed Remote Procedure Call), and ETL (Extract, Transform, Load) are introduced.
IBM SPSS Statistics Project
The sixth and last module “IBM SPSS Statistics Project” focuses on providing experience on one of the most famous and widely used big data statistical analysis systems in the world. First, the lecture starts with how to setup and use IBM SPSS Statistics, and continues on to describe how IBM SPSS Statistics can be used to gain corporate data analysis experience. Then the data processing statistical results of two projects based on using the IBM SPSS Statistics big data system is conducted. The projects are conducted so the student can discover new ways to use, analyze, and draw charts of the relationship between datasets, and also compare the statistical results using IBM SPSS Statistics.
Utilizamos cookies propias y de terceros para ofrecerte el mejor servicio. Si continúas navegando, entendemos que aceptas su uso.AceptoRechazarAjustesLeer más
Privacy & Cookies Policy
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Cookie
Duración
Descripción
na_tc
This cookie is set by the provider Addthis. This cookie is used for social media sharing tracking service.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Cookie
Duración
Descripción
d
3 months
This cookie tracks anonymous information on how visitors use the website.
YSC
session
This cookies is set by Youtube and is used to track the views of embedded videos.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Cookie
Duración
Descripción
__gads
1 year 24 days
This cookie is set by Google and stored under the name dounleclick.com. This cookie is used to track how many times users see a particular advert which helps in measuring the success of the campaign and calculate the revenue generated by the campaign. These cookies can only be read from the domain that it is set on so it will not track any data while browsing through another sites.
_ga
2 years
This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_gid
1 day
This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Cookie
Duración
Descripción
ab
1 year
This domain of this cookie is owned by agkn. The cookie is used for targeting and advertising purposes.
CMID
1 year
The cookie is set by CasaleMedia. The cookie is used to collect information about the usage behavior for targeted advertising.
CMPRO
3 months
This cookie is set by Casalemedia and is used for targeted advertisement purposes.
CMPS
3 months
This cookie is set by Casalemedia and is used for targeted advertisement purposes.
CMST
1 day
The cookie is set by CasaleMedia. The cookie is used to collect information about the usage behavior for targeted advertising.
DSID
1 hour
This cookie is setup by doubleclick.net. This cookie is used by Google to make advertising more engaging to users and are stored under doubleclick.net. It contains an encrypted unique ID.
IDE
1 year 24 days
Used by Google DoubleClick and stores information about how the user uses the website and any other advertisement before visiting the website. This is used to present users with ads that are relevant to them according to the user profile.
KADUSERCOOKIE
The cookie is set by pubmatic.com for identifying the visitors' website or device from which they visit PubMatic's partners' website.
KTPCACOOKIE
1 day
This cookie is set by pubmatic.com for the purpose of checking if third-party cookies are enabled on the user's website.
mc
1 year 1 month
This cookie is associated with Quantserve to track anonymously how a user interact with the website.
test_cookie
15 minutes
This cookie is set by doubleclick.net. The purpose of the cookie is to determine if the user's browser supports cookies.
uuid
3 months
To optimize ad relevance by collecting visitor data from multiple websites such as what pages have been loaded.
VISITOR_INFO1_LIVE
5 months 27 days
This cookie is set by Youtube. Used to track the information of the embedded YouTube videos on a website.