Offered By: LightBend
Data Science with Scala
Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing. This course shows how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.
6.47k+ Enrolled
ABOUT THIS SCALA COURSE
- Vectors and Labelled Points
- Local and Distributed Matrices
- Summary Statistics, Correlations, and Random Data
- Sampling
- Hypothesis Testing
Module 2 - Preparing Data
- Statistics, Random data and Sampling on Data Frames
- Handling Missing Data and Imputing Values
- Transformers and Estimators
- Data Normalization
- Identifying Outliers
Module 3 - Feature Engineering
- Feature Vectors
- Categorical Features
- Using Explode, User Defined Functions, and Pivot
- Principal Component Analysis (PCA) in Feature Engineering
- RFormulas
Module 4 - Fitting a Model
- Decision Trees
- Random Forests
- Gradient-Boosting Trees
- Linear Methods
- Evaluation
Module 5 - Pipeline and Grid Search
- Predicting Grant Applications: Introduction
- Predicting Grant Applications: Creating Features
- Predicting Grant Applications: Building a Pipeline
- Predicting Grant Applications: Cross Validation and Model Tuning
- Predicting Grant Applications: Wrapping up
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
- There is only ONE chance to pass the course, but multiple attempts are allowed per question.
- General understanding of Scala
- Experience with Java (preferred), Python, or another object-oriented language
- General understanding of machine learning
Petro Verkhogliad


Estimated Effort
6 Hours
Level
Beginner
Skills You Will Learn
Apache Spark, Data Science, Machine Learning, Scala
Course Code
SC0105EN