Cognitive Class

Data Science Hands-on with Open Source Tools (Archived)

Learn and try out the most popular open data science tools like Jupyter Notebooks, RStudio IDE, Apache Zeppelin, OpenRefine, and more. Tools are available to use directly via Cognitive Class Labs (Data Scientist Workbench).


Cognitive Class is committed to offering quality, current content. This course is archived as of September 18th, 2017 (end date). A new version of this course is now available! The archived course remains fully available to those who enrolled before the course end date. They can continue with the course as usual and even obtain a completion certificate; however, we encourage you to take the updated course. Certificate information will stay in your profile even after the course is archived, though the course name will be changed to include the course version number.

About This Course

Get started with some of the most popular tools for collaborative data science, including RStudio IDE, Jupyter Notebooks, Apache Zeppelin notebooks, Seahorse, and OpenRefine. Use the tools directly on Cognitive Class Labs (also known as Data Scientist Workbench or BDU Labs), a free virtual lab environment that brings powerful open data science tools together so you can analyze, visualize, explore, and clean data, run models, and create apps.

Course Syllabus

  • Module 1 - Introducing Cognitive Class Labs (Data Scientist Workbench)
    • What is Cognitive Class Labs (Data Scientist Workbench)?
    • Data Scientist Workbench Account features
    • Creating a Data Scientist Workbench account
    • Managing data within My Data
    • LAB: Getting Started with Cognitive Class Labs
  • Module 2 - Introducing Jupyter Notebooks
    • What are Jupyter notebooks?
    • Getting started with Jupyter
    • Data and Notebooks in Jupyter
    • Sharing your Jupyter Notebooks and data
    • Apache Spark in Jupyter Notebooks
    • LAB: Getting Started with Jupyter Notebooks
  • Module 3 - Introducing Zeppelin Notebooks
    • What are Zeppelin Notebooks?
    • Zeppelin for Scala
    • Getting started with Zeppelin
    • Managing your Interpreters in Zeppelin
    • Apache Spark in Zeppelin Notebooks
    • LAB: Getting Started with Apache Zeppelin Notebooks
  • Module 4 - Introducing RStudio IDE
    • What is RStudio IDE?
    • Uploading files, Installing Packages and loading libraries in RStudio IDE
    • Getting started with RStudio IDE
    • RStudio Environment and History
    • Apache Spark in RStudio IDE
    • LAB: Getting Started with RStudio IDE
  • Module 5 - Introducing Seahorse
    • What is Seahorse?
    • A Glimpse of Seahorse's Features
    • Getting started with Seahorse on Cognitive Class Labs
    • Creating and uploading Seahorse Workflows on Cognitive Class Labs
    • Exporting and Cloning the Seahorse Examples on Cognitive Class Labs
    • LAB: Getting Started with Seahorse
  • Module 6 - Introducing OpenRefine
    • Preparing data with OpenRefine
    • LAB: Getting Started with OpenRefine

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

Recommended skills prior to taking this course

  • None

Course Staff

Polong Lin, Data Science Bootcamp instructor

Polong Lin

Polong Lin is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through BDU. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.

Dr. Saeed Aghabozorgi, Data Science Bootcamp instructor

Saeed Aghabozorgi

Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.