About the Bootcamp
Welcome to the multi-day, intensive Data Science Bootcamp! This is a beginner-friendly, hands-on bootcamp, where you will learn the fundamentals of data science from IBM Data Scientists Saeed Aghabozorgi, PhD and Polong Lin. You will learn how to use a popular programming language for data analysis and data visualization, dive into several machine learning algorithms, and have a chance to apply your new skills to a data science capstone project.
Day 1 Morning: Introduction to Data Science
What is Data Science? Learn about the importance of data, machine learning, and big data. Find out about IBM’s online resource for data science education — Cognitive Class. And get a feel for popular open data science tools through IBM’s Data Scientist Workbench including Jupyter (IPython) Notebooks, RStudio IDE, Apache Spark, Apache Hadoop and others.
Day 1 Afternoon: Introduction to Programming for Data Science (Python/R)
Python and R are popular programming languages used in data science. Don’t know anything about either, or need a refresher? No problem. Choose a language and you’ll learn it from the basics. We cover the fundamentals of Python or R programming, which will provide a strong foundation for the data analysis, data visualization, machine learning, and big data topics covered later in this bootcamp.
- Getting started with the Python or R environment and libraries
- Numbers, variables, logical statements
- Arrays, Matrices, Lists and Dataframes
- Reading data from files
- Loops and conditional statements
- Custom functions
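As a rough taste of the topics above, here is a minimal Python sketch that touches variables, a logical statement, a list, a loop with a conditional, and a custom function. The names (temperature_c, readings, count_above) and the values are made up for illustration and are not taken from the bootcamp material.

```python
# Numbers, variables, and a logical statement
temperature_c = 21.5
is_warm = temperature_c > 20  # evaluates to a boolean

# A list of made-up readings (illustrative values only)
readings = [18.0, 21.5, 25.2, 19.9]


def count_above(values, threshold):
    """Custom function: count how many values exceed the threshold."""
    count = 0
    for value in values:          # loop over the list
        if value > threshold:     # conditional statement
            count += 1
    return count


warm_count = count_above(readings, 20)
print(is_warm, warm_count)
```

The same ideas carry over to R with different syntax.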
Day 2 Morning: Data Analysis
Learn how to analyze data using R or Python. This section will take you from the basics of your chosen language to exploring many different types of data. You will learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, predict future trends from data, and more!
- Importing Datasets
- Cleaning the Data
- Dataframe manipulation
- Summarizing the Data
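The import, clean, and summarize steps listed above can be sketched with Python's standard library alone. This is only an illustration under stated assumptions: the inline CSV text, the column names (city, price), and the helper functions are hypothetical, and in practice a dataframe library would usually do this work.

```python
import csv
import io
import statistics

# Made-up sample data; the second row has a missing price on purpose.
RAW_CSV = """city,price
Toronto,520000
Ottawa,
Toronto,480000
Montreal,410000
"""


def load_rows(text):
    """Importing: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: drop rows with a missing price and convert prices to floats."""
    cleaned = []
    for row in rows:
        if row["price"].strip():
            cleaned.append({"city": row["city"], "price": float(row["price"])})
    return cleaned


def summarize(rows):
    """Summarizing: basic descriptive statistics for the price column."""
    prices = [row["price"] for row in rows]
    return {
        "count": len(prices),
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
    }


rows = clean(load_rows(RAW_CSV))
summary = summarize(rows)
print(summary)
```

Dropping incomplete rows is only one possible cleaning choice; filling in missing values is another, and which is appropriate depends on the dataset.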
Day 2 Afternoon: Data Visualization
A picture is worth a thousand words – or should we say data points? In this section, we will go through how to create beautiful visualizations. Learn how to plot bar graphs, line graphs, histograms, and more. Then jump into visualizing text data with word clouds. Create visualizations of geographic maps. Finally, learn how to create an interactive visualization of earthquakes.
- Introduction to data visualization with R/Python
- Basic Plots (Bar graphs, histograms, pie graphs)
- Scatterplots and line graphs
- Mapping geospatial data
- (R only: Shiny Dashboard Tutorial – Earthquakes)
Day 3 Morning: Machine Learning
How can we get machines to learn from the data on their own? In this part you will get an overview of popular Machine Learning algorithms. To get hands-on practice with machine learning, you will work with real datasets and practice data mining techniques to predict house prices, classify food recipes, cluster weather stations, and also create a recommender system for books.
- Overview of Machine Learning
- Classification (Decision trees)
- Clustering (k-means)
- Recommender System (collaborative filtering)
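To give a feel for one of the algorithms listed above, here is a minimal sketch of k-means clustering in plain Python. It assumes two-dimensional points and caller-supplied starting centroids; the function name, the sample points, and the starting centroids are all made up for illustration, and real projects would normally rely on a machine learning library instead.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2D points by alternating assignment and centroid updates."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid

        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up points forming two loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result of k-means depends on the starting centroids, which is why libraries typically choose them randomly and run the algorithm several times.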
Day 3 Afternoon: Analyzing Big Data
You will learn how to work with Big Data using Apache Spark. Spark is an open-source engine for distributed processing of large datasets across a cluster of machines, with APIs for Python and R. You will read data from a big dataset, then select, filter, and aggregate it.
- Introduction to Apache Spark programming with Python/R
- Reading data from a big dataset
- Selecting data, filtering, and aggregating big data
Day 4 All Day: Data Science Capstone Project
Let’s solve real-world problems with data science! In this section, you will be given mentor-guided time to apply your newly acquired skills to a real problem. Your project will include finding a problem, searching for an open dataset, preprocessing the data, summarizing and visualizing the data, and applying machine learning to find insights in your data. Publish your findings online and present your results to your classmates. When the bootcamp is offered for a particular organization, capstone projects can be customized around use cases and problem sets relevant to that organization, and can use proprietary datasets in a secure and private environment.
- It is an in-person classroom bootcamp
- This bootcamp is scheduled at specific times
- You can participate as many times as you wish, but space may be limited and preference will be given to first-time attendees.
Certification & IBM Badge
After completing the bootcamp, you can take an optional, proctored final exam. If you achieve a passing grade, you will receive:
- A Certificate of Completion for “Data Science Bootcamp”
Saeed Aghabozorgi, PhD is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.
Polong Lin is a Data Scientist at IBM. Polong is responsible for educational content and the data science behind Cognitive Class. He is a regular speaker at conferences and holds an M.Sc. in Experimental Psychology.