Offered By: Big Data University

Moving Data into Hadoop

This course describes techniques for moving data into Hadoop. There are a variety of ways to get data into Hadoop, from simple Hadoop shell commands to more sophisticated processes. Several techniques are presented, but three of them, Sqoop, Flume, and Data Click, are covered in greater detail.
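As a taste of the simplest approach mentioned above, data already at rest as local files can be copied into HDFS with the Hadoop shell. This is a sketch only; the local and HDFS paths are hypothetical placeholders, and the commands assume access to a running Hadoop cluster:

```shell
# Create a landing directory in HDFS and copy a local file into it.
# Paths are placeholders -- substitute your own.
hdfs dfs -mkdir -p /user/hadoop/landing
hdfs dfs -put /tmp/sales.csv /user/hadoop/landing/

# Verify the file arrived.
hdfs dfs -ls /user/hadoop/landing
```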

Course

28.5k+ Enrolled

About This Course

This course gives an overview of the different ways to move data into Hadoop, from simple Hadoop shell commands to more sophisticated processes. It begins by looking at common load scenarios: data at rest, data in motion, and data from a variety of sources. It then covers Sqoop for importing and exporting data between relational databases and HDFS, Flume for collecting and moving streaming data, and Data Click for BigInsights for offloading data to Hadoop.

Course Syllabus

  • Lesson 1: Load Scenarios
    • List different scenarios to load data into Hadoop
    • Understand how to load data at rest, data in motion, and data from a variety of sources
  • Lesson 2: Using Sqoop
    • Import data from a relational database into HDFS
    • Use the Sqoop import and export commands
  • Lesson 3: Flume Overview
    • Describe Flume and its uses
  • Lesson 4: Using Flume
    • List the Flume configuration components
    • Describe how to start and configure a Flume agent
  • Lesson 5: Using Data Click v10 for BigInsights to Offload Data to Hadoop
    • Describe Data Click for BigInsights
    • List the major components of Data Click
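The Sqoop import and export commands from Lesson 2 can be sketched as follows. This is a sketch only: the JDBC URL, credentials, table names, and HDFS paths are hypothetical placeholders, and the commands assume a running cluster with Sqoop installed:

```shell
# Import a relational table into HDFS (placeholders throughout).
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 4

# Export processed results from HDFS back to a relational table.
sqoop export \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username dbuser -P \
  --table customer_summary \
  --export-dir /user/hadoop/summary
```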
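Lesson 4's Flume configuration components (a source, a channel, and a sink, wired into an agent) can be sketched as a minimal properties file. The agent name, log path, and HDFS path here are hypothetical:

```properties
# Hypothetical agent "agent1": tail a log file and deliver events to HDFS.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: follow a log file as new lines arrive (data in motion).
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink.
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events into an HDFS directory.
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hadoop/flume/events
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1
```

An agent using this configuration would be started with the `flume-ng agent` command, naming the agent and the configuration file.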

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be taken as many times as you wish.

Recommended skills prior to taking this course

  • Basic understanding of Apache Hadoop and Big Data.
  • Basic Linux Operating System knowledge.
  • Basic understanding of the Scala, Python, or Java programming languages.

Grading scheme

  • The minimum passing mark for the course is 60%. The review questions are worth 40% of the course mark and the final exam is worth 60%.
  • You have one attempt at the exam, with multiple attempts allowed per question.
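As a worked example of how the weighting combines into a course mark (the student scores below are hypothetical):

```python
# Course mark = 40% review questions + 60% final exam; pass mark is 60%.
def course_mark(review_pct: float, exam_pct: float) -> float:
    """Combine the two graded components into an overall percentage."""
    return 0.4 * review_pct + 0.6 * exam_pct

# Hypothetical student: 70% on the review questions, 55% on the final exam.
mark = course_mark(70, 55)   # 0.4*70 + 0.6*55 = 28 + 33 = 61
print(f"Course mark: {mark:.0f}% ({'pass' if mark >= 60 else 'fail'})")
```

Note that a strong review-question score can offset a weaker exam, and vice versa, as long as the weighted total reaches 60%.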

Requirements

None.

Course Staff

Glen R.J. Mules

Glen R.J. Mules is a Senior Instructor and Principal Consultant with IBM Information Management World-Wide Education and works from New Rochelle, NY. Glen joined IBM in 2001 as a result of IBM's acquisition of Informix Software. He has worked at IBM, and previously at Informix Software, as an instructor, a course developer, and in the enablement of instructors worldwide. He teaches courses in Big Data (BigInsights and Streams), Optim, Guardium, and the DB2 and Informix databases. He has a BSc in Mathematics from the University of Adelaide, South Australia; an MSc in Computer Science from the University of Birmingham, England; and has just completed a PhD in Education (Educational Technology) at Walden University.

His early work life was as a high school teacher in Australia. In the 1970s he designed, programmed, and managed banking systems in Manhattan and Boston. In the 1980s he was a VP in Electronic Payments for Bank of America in San Francisco and New York. In the early 1990s he was an EVP in Marketing for a software development company and chaired the ANSI X12C Standards Committee on Data Security for Electronic Data Interchange (EDI).

Warren Pettit

Warren Pettit has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education, where he has been both an instructor and a course developer in the Data Warehouse and Big Data curricula. For the nine years before joining IBM, he was an application programmer and was responsible for developing a training program for newly hired programmers.

Art Walker

Art Walker has been with IBM for over 9 years. During this time he has worked in Information Management education, mostly with the IBM InfoSphere Information Server set of products. He has worked as both an instructor and a curriculum developer. Before coming to IBM, he worked as a DataStage consultant (DataStage is a component of Information Server), instructor, and curriculum developer for Ascential Software.

Frequently Asked Questions

What web browser should I use?

The Open edX platform works best with current versions of Chrome, Firefox or Safari, or with Internet Explorer version 9 and above.

See our list of supported browsers for the most up-to-date information.

Estimated Effort

4 Hours

Course Code

BD0131EN
