Cognitive Class

Moving Data into Hadoop

Open the door to moving data into Hadoop and put the platform to work for you. This course deliberately emphasizes Sqoop and Flume so that you can get where you need to be with as few obstacles as possible.

Start the Free Course

About This Course

Get an overview of how to move data into Hadoop.

  • Learn about the different options for importing or loading data into HDFS from common data sources such as relational databases, data warehouses, and web server logs. (Apache Sqoop and Flume are covered in greater detail.)
  • Learn how to import and export data between Hadoop and sources such as relational databases.
  • Learn how to move data with Data Click for IBM BigInsights.

Course Syllabus

  • Module 1 - Load Scenarios
    1. Understand how to load data at rest and data in motion
    2. Understand how to load data from common data sources, e.g. an RDBMS
  • Module 2 - Using Sqoop
    1. Import data from a relational database table into HDFS
    2. Use the Sqoop import and export commands (a minimal command sketch follows this syllabus)
  • Module 3 - Flume Overview
    1. Describe Flume and its uses
    2. Describe how Flume works
  • Module 4 - Using Flume
    1. List the Flume configuration components
    2. Describe how to start and configure a Flume agent (a minimal configuration sketch follows this syllabus)
  • Module 5 - Using Data Click 
    1. Describe Data Click for BigInsights
    2. List the major components of Data Click
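
As a preview of Module 2, here is a minimal sketch of the Sqoop import and export commands. The connection string, credentials, table names, and HDFS paths are placeholders for illustration only, not values used in the course.

    # Import a relational table into HDFS (placeholder connection details)
    sqoop import \
      --connect jdbc:mysql://dbserver/salesdb \
      --username dbuser --password dbpass \
      --table customers \
      --target-dir /user/hadoop/customers

    # Export HDFS data back into a relational table
    sqoop export \
      --connect jdbc:mysql://dbserver/salesdb \
      --username dbuser --password dbpass \
      --table customers_export \
      --export-dir /user/hadoop/customers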

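Likewise, Module 4 walks through starting and configuring a Flume agent. Below is a minimal sketch, assuming a hypothetical agent named a1 with a netcat source, a memory channel, and a logger sink; the configuration file name and paths are placeholders.

    # example.conf - a placeholder single-node Flume agent
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: listen for lines of text on a local port
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sources.r1.channels = c1

    # Channel: buffer events in memory
    a1.channels.c1.type = memory

    # Sink: write events to the log
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1

    # Start the agent (paths are placeholders)
    flume-ng agent --name a1 --conf ./conf --conf-file ./example.conf
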
General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.
  • There is only ONE chance to pass the course, but multiple attempts per question (see the Grading Scheme section for details).

Recommended skills prior to taking this course

  • Basic understanding of Big Data.
  • Basic understanding of the Scala, Python, or Java programming languages.


Course Staff

Glen R.J. Mules

Glen R.J. Mules is a Senior Instructor and Principal Consultant with IBM Information Management World-Wide Education and works from New Rochelle, NY. Glen joined IBM in 2001 as a result of IBM's acquisition of Informix Software. He has worked at IBM, and previously at Informix Software, as an instructor, a course developer, and in the enablement of instructors worldwide. He teaches courses in Big Data (BigInsights and Streams), Optim, Guardium, DB2, and Informix databases. He has a BSc in Mathematics from the University of Adelaide, South Australia; an MSc in Computer Science from the University of Birmingham, England; and has recently completed a PhD in Education (Educational Technology) at Walden University. His early work life was as a high school teacher in Australia. In the 1970s he designed, programmed, and managed banking systems in Manhattan and Boston. In the 1980s he was a VP in Electronic Payments for Bank of America in San Francisco and New York. In the early 1990s he was an EVP in Marketing for a software development company and chaired the ANSI X12C Standards Committee on Data Security for Electronic Data Interchange (EDI).

Warren Pettit

Warren Pettit has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education where he has been both an instructor and a course developer in the Data Warehouse and Big Data curriculums. For the nine years prior to his joining IBM, he was an application programmer and was responsible for developing a training program for newly hired programmers.


Art Walker

Art Walker has been with IBM for over nine years. During this time he has worked in Information Management education, mostly with the IBM InfoSphere Information Server set of products. He has worked both as an instructor and as a curriculum developer. Before coming to IBM he worked as a DataStage consultant (DataStage is a component of Information Server), instructor, and curriculum developer for Ascential Software.

Kevin Wong

Kevin Wong is a Technical Curriculum Developer. He enjoys developing courses that focus on education in the Big Data field. Kevin updates courses to be compatible with the newest software releases, recreates courses on the new cloud environment, and develops new courses such as Introduction to Machine Learning. In addition to contributing to the transition of BDU, he has worked with various components that deal with Big Data, including Hadoop, Pig, Hive, Phoenix, HBase, MapReduce and YARN, Sqoop, and Oozie. Kevin is from the University of Alberta, where he has completed his third year of the Computer Engineering Co-op program and is currently an IBM Co-op Student.