Datasets are becoming increasingly large as people and devices around the world become ever more connected. Traditional data processing software and techniques struggle to handle datasets at this scale. This course teaches the essential basics of processing large-scale datasets using R and Python. It also teaches how to perform common data science tasks, such as data wrangling and building machine learning models, in R and Python.
In this course, students are introduced to correlation between variables, simple linear regression, multiple linear regression, generalised linear models, mixed-effects models, semi-parametric regression and survival analysis.
Several topics in machine learning and big data are also covered. These include classification trees, the basic elements of statistical learning theory, support vector machine classification, k-nearest neighbours learning, and the core ideas, principles and theory of machine learning.
The course includes intensive lab sessions in which students learn how to implement these concepts in R and Python and how to analyse various types of datasets.
January 23 to June 28, 2020