- Module 1 - Introduction to Notebooks
- Understand how to use Zeppelin in your Spark projects
- Identify the various notebooks you can use with Spark
- Module 2 - Spark RDD Architecture
- Understand how Spark generates RDDs
- Manage partitions to improve RDD performance
- Module 3 - Optimizing Transformations and Actions
- Use advanced Spark RDD operations
- Identify which operations cause shuffling
- Module 4 - Caching and Serialization
- Understand how and when to cache RDDs
- Understand storage levels and their uses
- Module 5 - Development and Testing
- Understand how to use sbt to build Spark projects
- Understand how to use Eclipse and IntelliJ for Spark development