In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation. Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark. In this course, part of the Data Science MicroMasters program, you will learn where the bottlenecks lie in massively parallel computation and how to use Spark to minimize them. You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib). As in the other courses in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebooks environment.
An excellent online course offered by edX: how it works
edX courses consist of weekly learning sequences. Each learning sequence is composed of short videos interspersed with interactive learning exercises, where students can immediately practise the concepts from the videos. The courses often include tutorial videos that are similar to small on-campus discussion groups, an online textbook, and an online discussion forum where students can post questions and comments for each other and for teaching assistants, and review those posted by others. Where applicable, online laboratories are incorporated into the course.
edX offers certificates of successful completion, and some courses are credit-eligible. Whether a college or university offers credit for an online course is at the sole discretion of the school. edX offers a variety of ways to take courses, including verified courses, where students have the option to audit the course (no cost) or to work toward an edX Verified Certificate (fees vary by course). edX also offers XSeries Certificates for completing a bundled set of two to seven verified courses in a single subject (cost varies depending on the courses).