Course Starts on April 1, 2018
The University of California, San Diego is offering a free online course on Big Data Analytics Using Spark.
In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation. Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
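To give a feel for the map-and-reduce style of computation mentioned above, here is a minimal single-machine sketch in plain Python. It is not Spark or Hadoop code and is not taken from the course; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions made purely for illustration. On a real cluster, each chunk would live on a different machine and the framework would handle the grouping step.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text, as a mapper would."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, mimicking the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string stands in for data held on one machine.
chunks = ["big data needs clusters", "clusters process big data"]
counts = word_count(chunks)
```

Because each chunk is mapped independently, the map step can run in parallel across machines; only the grouping step needs to move data between them.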
Length: 10 weeks
Effort: 10 hours per week
Subject: Data Analysis & Statistics
Institution: University of California, San Diego and edX
Certificate Available: Yes (add a Verified Certificate for $350)
The University of California, San Diego (UC San Diego) is a student-centered, research-focused, service-oriented public institution that provides opportunity for all. This young university has made its mark regionally, nationally and internationally.
You will learn how to perform supervised and unsupervised machine learning on massive data sets using the Machine Learning Library (MLlib). In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.
- Programming Spark using PySpark
- Identifying the computational trade-offs in a Spark application
- Performing data loading and cleaning using Spark and Parquet
- Modeling data through statistical and machine learning methods
Dr. Freund is a Professor of Computer Science and Engineering at the University of California, San Diego.