Cognitive Class

Spark Fundamentals II

  • Course Number
    BD0212EN
  • Classes Start
    Any time, Self-paced
  • Estimated Effort
    5 hours
  • Audience
  • Course Level
  • Language
  • Learning Path
  • Badge Earned
  • Tell Your Friends

ABOUT THIS COURSE

Expand your knowledge of the concepts discussed in Spark Fundamentals I with a focus on RDDs (Resilient Distributed Datasets). RDDs are the main abstraction Spark provides to enable parallel processing across the nodes of a Spark cluster.
  • Get in-depth knowledge of Spark’s architecture and of how data is distributed and tasks are parallelized.
  • Learn how to optimize your data for joins using Spark’s memory caching.
  • Learn how to use the more advanced operations available in the API.
  • The lab exercises for this course are performed exclusively in the cloud, using a notebook interface.
IBM Data Science Experience provides Jupyter notebooks that are already connected to Spark and support Python, R, and Scala, so you can start creating Spark projects and collaborating with other data scientists. When you sign up, you get free access to Data Science Experience and all other IBM services for 30 days. Start now and take advantage of this offer.
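To make the idea of partitioned, parallel processing concrete, here is a minimal pure-Python model. It is not Spark code: the functions `partition`, `square_and_sum`, and `run_job` are hypothetical, and a thread pool stands in for the executors of a cluster. The data is split into contiguous index ranges, roughly the way `SparkContext.parallelize` slices a local collection, one task runs per partition, and the per-partition results are merged at the end, as an RDD `reduce` action does.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a local list into num_partitions contiguous slices.

    Slice i covers positions [i*n//k, (i+1)*n//k), an assumption modeled
    on how parallelize() slices a local collection.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def square_and_sum(part):
    """One task per partition: a map (x -> x*x) followed by a local sum."""
    return sum(x * x for x in part)


def run_job(data, num_partitions):
    """Run one task per partition in a thread pool (standing in for
    executors), then merge the per-partition results on the 'driver'."""
    parts = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(square_and_sum, parts))
    return sum(partials), parts


total, parts = run_job(list(range(1, 11)), 4)
print(parts)  # [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]
print(total)  # 385
```

In PySpark the same job would be written with the real RDD API as `sc.parallelize(range(1, 11), 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`, and `rdd.glom().collect()` shows the contents of each partition; the exact slice boundaries Spark chooses may differ from this model.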

COURSE SYLLABUS

  • Module 1 - Introduction to Notebooks
    1. Understand how to use Zeppelin in your Spark projects
    2. Identify the various notebooks you can use with Spark
  • Module 2 - Spark RDD Architecture
    1. Understand how Spark generates RDDs
    2. Manage partitions to improve RDD performance
  • Module 3 - Optimizing Transformations and Actions
    1. Use advanced Spark RDD operations
    2. Identify what operations cause shuffling
  • Module 4 - Caching and Serialization
    1. Understand how and when to cache RDDs
    2. Understand storage levels and their uses
  • Module 5 - Development and Testing
    1. Understand how to use sbt to build Spark projects
    2. Understand how to use Eclipse and IntelliJ for Spark development
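Modules 2 and 3 deal with partitioning and with the operations that cause a shuffle. The following pure-Python model sketches why a key-based aggregation shuffles and why pre-aggregating inside each partition reduces the data that moves. All names (`hash_partition`, `combine_locally`, `reduce_by_key`) are hypothetical; the partitioning rule is modeled on Spark's `HashPartitioner` for integer keys and is an assumption for other key types. With `map_side_combine=True` the model behaves like `reduceByKey`, which combines values before the shuffle; with `False` it ships every raw record, as `groupByKey` does.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Target partition for an integer key.

    Spark's HashPartitioner takes a non-negative key.hashCode modulo
    numPartitions; for an Int key, hashCode is the value itself. Python's %
    already returns a non-negative result for a positive divisor.
    """
    return key % num_partitions


def combine_locally(records):
    """Sum the values for each key within a single partition."""
    acc = defaultdict(int)
    for key, value in records:
        acc[key] += value
    return list(acc.items())


def reduce_by_key(partitions, num_out, map_side_combine=True):
    """Model of a key-based sum that needs a shuffle.

    Every record is routed to the output partition chosen by hash_partition,
    so all values for one key end up in one place. Returns the aggregated
    dict and the number of records that crossed the shuffle.
    """
    out = [[] for _ in range(num_out)]
    shuffled = 0
    for part in partitions:
        # Optional map-side combine before anything leaves the partition.
        records = combine_locally(part) if map_side_combine else part
        for key, value in records:
            out[hash_partition(key, num_out)].append((key, value))
            shuffled += 1
    # Reduce side: each output partition now holds every record for its keys.
    result = {}
    for part in out:
        result.update(combine_locally(part))
    return result, shuffled


pairs = [[(1, 1), (2, 1), (1, 1)], [(2, 1), (3, 1), (1, 1)]]
combined, moved_combined = reduce_by_key(pairs, 2, map_side_combine=True)
raw, moved_raw = reduce_by_key(pairs, 2, map_side_combine=False)
print(sorted(combined.items()))  # [(1, 3), (2, 2), (3, 1)]
print(moved_combined, moved_raw)  # 5 6
```

Both runs produce the same totals, but the pre-aggregated run ships fewer records between partitions. That difference grows with the number of repeated keys per partition, which is the usual reason to prefer `rdd.reduceByKey(func)` over `rdd.groupByKey()` when only an aggregate is needed.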

GENERAL INFORMATION

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE

  • Basic understanding of Apache Hadoop and Big Data.
  • Have taken the Spark Fundamentals I course on Big Data University (BDU).
  • Basic understanding of the Scala, Python, R, or Java programming languages.
  • Basic Linux Operating System knowledge.

REQUIREMENTS

  • None

COURSE STAFF

Henry L. Quach

Henry L. Quach is the Technical Curriculum Developer Lead for Big Data. He has been with IBM for 9 years, focusing on education development. Among his other projects, he was part of the original team that developed and designed the concept for the IBM Open Badges program. He holds a Bachelor of Science in Computer Science and a Master of Science in Software Engineering from San Jose State University.

Alan Barnes

Alan Barnes is a Senior IBM Information Management Course Developer / Consultant. He has worked at several companies as a Senior Technical Consultant, Database Team Manager, Application Programmer, Systems Programmer, Business Analyst, DB2 Team Lead, and more. His career in IT spans more than 35 years.
