At a Glance
This beginner Apache Hadoop course introduces you to Big Data concepts and teaches you how to perform distributed processing of large data sets with Hadoop.
About This Course
- Learn about Hadoop's architecture and core components, such as MapReduce and the Hadoop Distributed File System (HDFS).
- Learn how to add and remove nodes from Hadoop clusters, how to check available disk space on each node, and how to modify configuration parameters.
- Learn about other Apache projects that are part of the Hadoop ecosystem, including Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, and Flume. BDU provides separate courses on these other projects, but we recommend you start here.
Module 1 - Introduction to Hadoop
- Understand what Hadoop is
- Understand what Big Data is
- Learn about other open source software related to Hadoop
- Understand how Big Data solutions can work on the Cloud
Module 2 - Hadoop Architecture
- Understand the main Hadoop components
- Learn how HDFS works
- List data access patterns for which HDFS is designed
- Describe how data is stored in an HDFS cluster
Module 3 - Hadoop Administration
- Add and remove nodes from a cluster
- Verify the health of a cluster
- Start and stop a cluster's components
- Modify Hadoop configuration parameters
- Set up a rack topology
Module 4 - Hadoop Components
- Describe the MapReduce philosophy
- Explain how Pig and Hive can be used in a Hadoop environment
- Describe how Flume and Sqoop can be used to move data into Hadoop
- Describe how Oozie is used to schedule and control Hadoop job execution
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
Recommended skills prior to taking this course
- Knowledge about Big Data concepts
Warren Pettit
Warren has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education, where he has been both an instructor and a course developer in the Data Warehouse and Big Data curriculums. For the nine years prior to joining IBM, he was an application programmer and was responsible for developing a training program for newly hired programmers.
November 19, 2020
April 27, 2022
This Course is part of the following Learning Paths
Beginner Learning Path
Big Data Fundamentals
Are you interested in understanding 'Big Data' beyond the terms used in headlines? Then select this learning path as an introduction to tools like the Apache Hadoop and Apache Spark frameworks, which enable data to be analyzed en masse, and start the journey towards your headline discovery.
3 Courses
Beginner Learning Path
Hadoop Fundamentals
Are you interested in moving beyond the elephant in the room and understanding Hadoop as a foundational tool set in your future? Then select this learning path to gain exposure to the tools used in Big Data, Hadoop's core components and supporting open source projects.
4 Courses