Short Course Description
Many organizations collect massive amounts of data. This creates new opportunities for data scientists, but it also raises interesting challenges in extracting meaningful and actionable knowledge from that data. Building efficient and impactful data science processes is not easy: forming the right analysis questions is hard, data is messy, datasets are massive in both volume and dimensionality, and closing the loop in business and research operations is tough. The course aims to provide a comprehensive set of tools for extracting knowledge from data: data manipulation, extraction, and cleaning; efficient data analysis; and visualization of conclusions. We will learn technologies based on relational databases and on distributed data-processing frameworks (such as Spark). The course focuses on the unique challenges that arise from the practical side of the field, using business and research case studies to illustrate the full data science process. By the end of the course, successful students will understand both the theory and the practice of building automated data engineering pipelines that can handle massive datasets.
The full syllabus will be published later.