Offered By: IBM
Data Analysis with Python
Data Analysis has always been an important field and an in-demand skill. Until recently, it has been practiced using mostly closed, expensive, and limited tools like Excel or Tableau. Python, pandas, and other open-source libraries have changed Data Analysis forever and have become must-have tools for anyone looking to build a career as a Data Analyst.
DA0101EN
Data Analysis
40k+ Enrolled
At a Glance
You will learn how to:
- Import data sets
- Clean and prepare data for analysis
- Manipulate pandas DataFrames
- Summarize data
- Build machine learning models using scikit-learn
- Build data pipelines
Data Analysis with Python is delivered through lectures, hands-on labs, and assignments. It includes the following parts:
- Data Analysis libraries: You will learn to use pandas DataFrames, NumPy multi-dimensional arrays, and SciPy libraries to work with various datasets. We will introduce you to pandas, an open-source library, and use it to load, manipulate, analyze, and visualize cool datasets. Then we will introduce you to another open-source library, scikit-learn, and use some of its machine learning algorithms to build smart models and make cool predictions.
- Learning Objectives
- Understanding the Domain
- Understanding the Dataset
- Python packages for data science
- Importing and Exporting Data in Python
- Basic Insights from Datasets
Module 2 - Cleaning and Preparing the Data
- Identify and Handle Missing Values
- Data Formatting
- Data Normalization Sets
- Binning
- Indicator variables
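As a rough illustration of the Module 2 topics, the sketch below applies each step to a tiny made-up table with pandas. The column names and values are hypothetical, and filling missing values with the column mean is only one of several reasonable strategies, not necessarily the one the course uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0, 9980.0],
    "length": [168.8, 171.2, 176.6, 157.3],
    "fuel": ["gas", "diesel", "gas", "gas"],
})

# Identify and handle missing values: count them, then fill with the column mean.
missing_before = int(df["price"].isna().sum())
df["price"] = df["price"].fillna(df["price"].mean())

# Data normalization: min-max scale "length" into the range [0, 1].
length_range = df["length"].max() - df["length"].min()
df["length_scaled"] = (df["length"] - df["length"].min()) / length_range

# Binning: split "price" into three equal-width bins with readable labels.
df["price_bin"] = pd.cut(df["price"], bins=3, labels=["low", "medium", "high"])

# Indicator variables: one column per category of "fuel".
df = pd.concat([df, pd.get_dummies(df["fuel"], prefix="fuel")], axis=1)

print(df)
```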
Module 3 - Summarizing the Data Frame
- Descriptive Statistics
- Basics of Grouping
- ANOVA
- Correlation
- More on Correlation
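The Module 3 topics can be sketched in a few lines of pandas and SciPy. The data below is invented for illustration, and the column names are assumptions rather than anything taken from the course dataset.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "drive": ["fwd", "fwd", "rwd", "rwd", "4wd", "4wd"],
    "engine_size": [97, 109, 130, 152, 108, 110],
    "price": [7000.0, 9000.0, 16000.0, 18000.0, 8000.0, 10000.0],
})

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df["price"].describe()

# Grouping: average price per drive type.
group_means = df.groupby("drive")["price"].mean()

# ANOVA: one-way test of whether mean price differs across drive types.
groups = [g["price"].to_numpy() for _, g in df.groupby("drive")]
f_stat, p_value = stats.f_oneway(*groups)

# Correlation: Pearson coefficient between engine size and price.
r, r_p_value = stats.pearsonr(df["engine_size"], df["price"])

print(summary)
print(group_means)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, Pearson r = {r:.3f}")
```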
Module 4 - Model Development
- Simple and Multiple Linear Regression
- Model Evaluation Using Visualization
- Polynomial Regression and Pipelines
- R-squared and MSE for In-Sample Evaluation
- Prediction and Decision Making
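A minimal sketch of how the Module 4 pieces might fit together with scikit-learn: a pipeline that expands a single feature into polynomial terms, scales them, fits a linear regression, and then reports in-sample R-squared and MSE. The data is synthetic and noise-free, and the step names and polynomial degree are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data following y = 2x^2 - 3x + 1 exactly (no noise).
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 * x[:, 0] ** 2 - 3 * x[:, 0] + 1

# Pipeline: polynomial features -> scaling -> linear regression.
model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(x, y)

# In-sample evaluation on the same data the model was fit to.
y_hat = model.predict(x)
r2 = r2_score(y, y_hat)
mse = mean_squared_error(y, y_hat)

print(f"R-squared: {r2:.4f}, MSE: {mse:.6f}")
```

Because the evaluation uses the training data itself, these scores say nothing about how the model generalizes; that is the subject of Module 5.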
Module 5 - Model Evaluation
- Model Evaluation
- Over-fitting, Under-fitting, and Model Selection
- Ridge Regression
- Grid Search
- Model Refinement
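The Module 5 ideas can be illustrated with a held-out test set, ridge regression, and a grid search over the regularization strength. This is a sketch on synthetic data; the candidate `alpha` values, the split ratio, and the number of cross-validation folds are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data: a linear signal in three features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)

# Hold out a test set so evaluation is out-of-sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search: try each alpha with 4-fold cross-validation on the training set.
alphas = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=4)
search.fit(X_train, y_train)

best_alpha = search.best_params_["alpha"]
test_r2 = search.score(X_test, y_test)  # R-squared of the refit best model

print(f"best alpha: {best_alpha}, test R-squared: {test_r2:.4f}")
```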
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
- Python programming, Statistics
- Some Python experience is expected
- Python for Data Science
Estimated Effort
3 hours
Level
Beginner
Skills You Will Learn
Data Analysis, Python, Data Science
Language
English
Instructors
Mahdi Noorian
Ph.D., Machine Learning Engineer
Mahdi Noorian is a Postdoctoral Fellow at the Laboratory for Systems, Software, and Semantics (LS3) at Ryerson University. He holds a Ph.D. degree in Computer Science from the University of New Brunswick. As a Data Scientist, he is interested in the application of machine learning, data mining, optimization, and semantic data analysis for big data to solve real-world problems.
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.