As part of the Capacity Building component of the Big Data for Development (BD4D) project, AIMS-NEI will run a series of short courses at AIMS Rwanda. To kick off this program, we will launch our first BD4D short course, “Big Data Analytics with Python”, from 11-15 March 2019. The course targets people with a passion for Data Science in general and Big Data in particular who hold at least a four-year undergraduate degree or have 2 to 3 years of work experience as a professional in Statistics or any other Data Science related topic.
Data sets are becoming increasingly large as the world's population and devices become more and more connected. Traditional data processing software and techniques cannot deal with these large-scale data sets, so specialized frameworks and tools such as Apache Spark are required. This course teaches the essentials of processing large-scale data sets using Python. In addition, it teaches how to perform common data science tasks such as data wrangling and building machine learning models in Python.
This course takes a practical approach to equip participants with the most essential tools in the shortest possible time. The lessons start with the absolute basics of Python, focusing mainly on data structures, and then quickly move on to the core libraries for doing data science in Python. Next, the course turns to Big Data, first providing brief theoretical concepts about the topic and then teaching Apache Spark, a state-of-the-art tool for processing large data sets. Thereafter, it provides introductory lectures in machine learning before showing how to build and evaluate machine learning models in Python. The course emphasizes learning by doing; as such, there are many exercises built into the course to give participants ample time to practice.
Summarized Learning Objectives
- Understand the basics of the Python language: syntax, data structures and more.
- Perform common data science tasks using Python: data ingestion, processing, visualization, web scraping, etc.
- Handle large-scale data sets (20 GB+) on a personal computer using Apache Spark, and utilize cloud computing platforms.
- Be familiar with the theoretical basis of common machine learning algorithms.
- Be able to build and evaluate machine learning models using the scikit-learn library.
Day 1: Introduction to Python: focuses on the Python language.
Day 2: Doing Common Data Science Tasks in Python: data ingestion, processing, analysis, visualization, web scraping and more.
Day 3: Processing Large Data Sets with Apache Spark: introduction to Big Data, multiprocessing in Python, Apache Spark and more.
Day 4: Machine Learning in Python: lectures in machine learning, building and evaluating models with scikit-learn.
Day 5: Putting It All Together: review, advanced tasks in Python and extended exercises.
This training will take place from 11-15 March 2019 at the AIMS Rwanda centre, and participation is free for the first round. AIMS-NEI will not provide any financial assistance that successful applicants may require to attend this short course. We encourage successful candidates to make their own provisions to cover all logistics costs related to their participation in this course, including transport.
This short course targets anyone with a genuine interest in the field of Data Science in general and Big Data in particular. This includes all students who have completed at least a four-year undergraduate degree in a discipline related to Data Science, and professionals with 2 to 3 years of work experience in any Data Science discipline.
Applicants must clearly demonstrate how their participation in this short course will build the personal capacity they need to contribute to solving African development issues using Big Data techniques. In addition, applicants must show how taking part in this training would help them achieve their career aspirations in Data Science in general and in Big Data in particular. Female applicants are strongly encouraged to apply.
Dr. Dunstan Matekenya is a consummate Data Scientist with over 10 years’ experience in both traditional statistics and modern machine learning methods. Currently, he works as a Data Scientist at the World Bank Group (WBG) Headquarters in Washington DC. Prior to joining the WBG, Dunstan completed his PhD at the University of Tokyo in 2016. His PhD research focused on the use of machine learning methods to extract insights from mobile phone data. Before re-orienting his career towards Data Science, Dunstan worked as a Statistician at the National Statistical Office in Malawi from 2007 until 2017. While there, he actively contributed to flagship projects such as the 2008 Malawi Population and Housing Census and also led the GIS unit. His passions include contributing to the modernization of official statistics in developing countries through the use of alternative data sources such as mobile phone data, as well as improving capacity in Data Science.
All candidates interested in applying for the Big Data Analytics with Python short course must use the AIMS-NEI online portal, which is the only accepted channel, to complete and submit their application with all supporting documents by the deadline shown below. Shortlisted candidates will be contacted to provide any additional information required to finalise their applications. Applications will be reviewed by a committee with members from the AIMS-NEI BD4D project team. Successful applicants will be contacted within a week of the deadline. This short course will take place from 11-15 March 2019.
Applications for this short course should be submitted via the online application system, stating clearly the title of the short course you are applying for.
Deadline for applications: February 25th, 2019 – 11:59 PM (EAT).
Any inquiries about this short course should be sent to: firstname.lastname@example.org.