- Data and Big Data Analysis: Approaches, Functions and Software Tools
- Module 1 introduces the concept of data analysis and its basic
techniques. It discusses the concept of big data and its possible
applications. It also considers the relationship between different
approaches to data processing, as well as basic software tools for data
analysis. Some useful functions for data analysis are presented. Finally,
the principles of big data processing are discussed, in particular the
MapReduce model.
- Basic Characteristics of Data: Distributions, Statistics and Regressions
- Module 2 discusses descriptive statistics and exploratory data analysis.
The main characteristics of data distributions are introduced, and their
calculation is illustrated with examples. Frequentist and Bayesian
approaches to hypothesis testing are explained. The basic concepts of
regression and correlation analysis are formulated, with a focus on
linear methods.
- Clustering and Dimensionality Reduction
- Module 3 discusses the clustering problem and algorithms for solving it.
Hierarchical clustering, the k-means algorithm and the CURE algorithm are
explained, and the specifics of running these algorithms in non-Euclidean
spaces are described. The module also covers dimensionality reduction: it
presents the basic facts of singular value decomposition (SVD) and
illustrates its applications. It also considers principal component
analysis and CUR decomposition as they apply to big data processing.
- Machine Learning and Artificial Neural Networks
- Module 4 covers models and methods of machine learning. The perceptron
model, how it works, and its advantages and disadvantages are discussed in
detail. The basic support vector machine and its generalizations are
considered. The module then turns to artificial neural networks, their
organization and training. It presents the main features of deep neural
networks, the problems that arise with such networks, and modern methods
for overcoming these problems. Convolutional and recurrent neural networks
are also considered.