Offered By: IBM

Spark MLlib

Spark provides a machine learning library known as MLlib. MLlib provides various machine learning algorithms such as classification, regression, clustering, and collaborative filtering. It also provides tools such as featurization, pipelines, persistence, and utilities for handling linear algebra operations, statistics, and data handling.

Course

Big Data

1.3k+ Enrolled


ABOUT THIS SPARK MLLIB COURSE

Spark provides a machine learning library known as MLlib. Spark MLlib provides various machine learning algorithms such as classification, regression, clustering, and collaborative filtering. It also provides tools such as featurization, pipelines, persistence, and utilities for handling linear algebra operations, statistics, and data handling. This course starts you on your journey with MLlib, walking you through some of its machine learning components and how to use them.

COURSE SYLLABUS

  • Module 1 - Spark MLlib Datatypes

    1. Understand the difference between Dense and Sparse Data Types, and how they apply to LabeledPoints and matrices.


    2. Understand how to create and use the different matrices that are available in Spark MLlib.

  • Module 2 - Review of Algorithms

    1. Have a general understanding of each of the algorithms that will be discussed in the course and how they work.


    2. Learn how to instantiate simple Linear Regression and Classification models, including Linear Regression, Support Vector Machines, and Logistic Regression.

  • Module 3 - Spark MLlib Decision Trees and Random Forests

    1. Learn about the different input parameters used to create Decision Trees and Random Forests. 


    2. Understand the effects of tuning specific parameters for Decision Trees and Random Forests. 

  • Module 4 - Spark MLlib Clustering

    1. Learn about the parameters involved in creating K-Means Clustering models and Gaussian Mixture Clustering models.


    2. Describe the outputs and uses of the functions available to each clustering model.
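
As a taste of the Module 1 topic, here is a minimal sketch of the difference between dense and sparse storage. It is not taken from the course materials and does not use Spark at all: it is plain Python, and the helper names `to_sparse` and `to_dense` are hypothetical. The (size, indices, values) layout is assumed here because it mirrors how a sparse vector is commonly described: only the non-zero entries are stored, along with their positions.

```python
# Plain-Python illustration (no Spark required) of dense vs. sparse storage.
# The helper names below are hypothetical and exist only for this sketch.

def to_sparse(dense):
    """Return (size, indices, values), keeping only the non-zero entries."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


def to_dense(size, indices, values):
    """Rebuild the full list, filling every unlisted position with 0.0."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# A labeled example pairs a label with a feature vector (made-up data).
label = 1.0
features = [1.0, 0.0, 0.0, 3.0, 0.0]

size, indices, values = to_sparse(features)
print(label, size, indices, values)  # prints: 1.0 5 [0, 3] [1.0, 3.0]
```

The sparse form pays off when most entries are zero, since it stores two short lists instead of one long one; for mostly non-zero data the dense form is simpler and usually smaller.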

GENERAL INFORMATION

  • This course is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.
RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE

  • None
REQUIREMENTS

  • None
COURSE STAFF


Daniel Tran

Daniel Tran is an IBM Co-op Student working as a Technical Curriculum Developer in Toronto, Ontario. He develops courses to improve the education of customers who seek knowledge in the Big Data field. He has also reworked previously developed courses, updating them to be compatible with the newest software releases, and has worked at the forefront of recreating courses on a newly developed cloud environment. He has worked with various components that deal with Big Data, including Hadoop, Pig, Hive, HBase, MapReduce & YARN, Sqoop, Oozie, and Phoenix. He has also worked on separate courses involving Machine Learning. Daniel is from the University of Alberta, where he has completed his third year of traditional Computer Engineering Co-op.

Level

Intermediate

Language

English

Course Code

BD0221EN
