At a Glance
Ignite your interest in Spark with an introduction to the core concepts that make this general-purpose processing engine an essential tool for working with Big Data.
About This Course
Learn the fundamentals of Spark:
- Learn how it performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining.
- Learn how it provides in-memory cluster computing for lightning-fast speed and supports Java, Python, R, and Scala APIs for ease of development.
- Learn how it can handle a wide range of data processing scenarios by combining SQL, streaming and complex analytics together seamlessly in the same application.
- Learn how it runs on top of Hadoop or Mesos, in standalone mode, or in the cloud, and how it can access diverse data sources such as HDFS, Cassandra, HBase, or S3.
Module 1 - Introduction to Spark - Getting started
- What is Spark and what is its purpose?
- Components of the Spark unified stack
- Resilient Distributed Dataset (RDD)
- Downloading and installing Spark standalone
- Scala and Python overview
- Launching and using Spark’s Scala and Python shells
Module 2 - Resilient Distributed Dataset and DataFrames
- Understand how to create parallelized collections and external datasets
- Work with Resilient Distributed Dataset (RDD) operations
- Utilize shared variables and key-value pairs
Module 3 - Spark application programming
- Understand the purpose and usage of the SparkContext
- Initialize Spark with the various programming languages
- Describe and run some Spark examples
- Pass functions to Spark
- Create and run a Spark standalone application
- Submit applications to the cluster
Module 4 - Introduction to Spark libraries
- Understand and use the various Spark libraries
Module 5 - Spark configuration, monitoring and tuning
- Understand components of the Spark cluster
- Configure Spark by modifying the Spark properties, environment variables, or logging properties
- Monitor Spark using the web UIs, metrics, and external instrumentation
- Understand performance tuning considerations
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
Recommended skills prior to taking this course
- Basic understanding of Apache Hadoop and Big Data.
- Basic Linux Operating System knowledge.
- Basic understanding of the Scala, Python, R, or Java programming languages.
- Have taken the Hadoop 101 course on BDU.
Henry L. Quach
This Course is part of the following Learning Paths
Beginner Learning Path: Big Data Fundamentals
Are you interested in understanding 'Big Data' beyond the terms used in headlines? Then select this learning path as an introduction to frameworks such as Apache Hadoop and Apache Spark, which enable data to be analyzed en masse, and start the journey towards your headline discovery.
3 Courses
Beginner Learning Path: Spark Fundamentals
A solid understanding of, and experience with, the core tools in any field promotes excellence and innovation. Apache Spark, as a general engine for large-scale data processing, is such a tool within the big data realm. This learning path addresses the fundamentals of this program's design and its everyday application.
5 Courses