Lightbend

Data Science for Scala

Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, machine learning and graph processing. This course shows you how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.

About This Scala Course

In this course you will learn about basic statistics and data types, preparing data, feature engineering, fitting a model, and pipelines and grid search.

Course Syllabus

Module 1 - Basic Statistics and Data Types

        • Vectors and Labelled Points
        • Local and Distributed Matrices
        • Summary Statistics, Correlations, and Random Data
        • Sampling
        • Hypothesis Testing

Module 2 - Preparing Data

        • Statistics, Random data and Sampling on Data Frames
        • Handling Missing Data and Imputing Values
        • Transformers and Estimators
        • Data Normalization
        • Identifying Outliers

Module 3 - Feature Engineering

        • Feature Vectors
        • Categorical Features
        • Using Explode, User Defined Functions, and Pivot
        • Principal Component Analysis (PCA) in Feature Engineering
        • RFormulas

Module 4 - Fitting a Model

        • Decision Trees
        • Random Forests
        • Gradient-Boosting Trees
        • Linear Methods
        • Evaluation

Module 5 - Pipeline and Grid Search

        • Predicting Grant Applications: Introduction
        • Predicting Grant Applications: Creating Features
        • Predicting Grant Applications: Building a Pipeline
        • Predicting Grant Applications: Cross Validation and Model Tuning
        • Predicting Grant Applications: Wrapping up

General Information

      • This course is free.
      • It is self-paced.
      • It can be taken at any time.
      • It can be audited as many times as you wish.
      • There is only ONE chance to pass the course, but multiple attempts are allowed per question.

Recommended skills prior to taking this course

      • General understanding of Scala
      • Experience with Java (preferred), Python, or another object-oriented language
      • General understanding of machine learning

Course Staff

Petro Verkhogliad

Petro Verkhogliad is a Consulting Manager at Lightbend. He holds a Master's degree in Computer Science with a specialization in Intelligent Systems. He is passionate about functional programming and applications of AI.

Dr Priya Dev

Dr Priya Dev is a lecturer in statistics at ANU and UNSW and the founder of a mobile commerce startup, Qhopper. She completed a PhD in probability theory at ANU and Columbia University and has been a data analytics consultant to ASX-listed companies and global banks. Qhopper is a massively scalable mobile commerce platform built on the Lightbend platform using Scala and Spark. It bridges the technology gap for hospitality businesses, helping them create better experiences and connect with new and existing customers through their own online ordering, CRM and business intelligence suite.

Joseph Santarcangelo

Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.
