
Offered By: BigDataUniversity

Hadoop 101

This beginner Apache Hadoop course introduces you to Big Data concepts, and teaches you how to perform distributed processing of large data sets with Hadoop.


Course

Big Data

21.6k+ Enrolled
4.4
(2.54k+ Reviews)

At a Glance

This beginner Apache Hadoop course introduces you to Big Data concepts, and teaches you how to perform distributed processing of large data sets with Hadoop.


About This Course
 

Learn the basics of Apache Hadoop, a free, open-source, Java-based programming framework, and why it was invented.
  • Learn about Hadoop's architecture and core components, such as MapReduce and the Hadoop Distributed File System (HDFS).
  • Learn how to add and remove nodes from Hadoop clusters, check available disk space on each node, and modify configuration parameters (see the sketch after this list).
  • Learn about other Apache projects that are part of the Hadoop ecosystem, including Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, and Flume. BDU provides separate courses on these projects, but we recommend you start here.
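To give a flavour of the administration topics above, here is a minimal sketch (not course material) that checks cluster-wide disk space with Hadoop's Java FileSystem API. The class name and the "/user" path are purely illustrative; on a real cluster you would more often run the equivalent CLI commands, such as hdfs dfsadmin -report.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class HdfsSpaceCheck {
  public static void main(String[] args) throws Exception {
    // Reads core-site.xml / hdfs-site.xml from the classpath;
    // fs.defaultFS must point at your NameNode.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Aggregate capacity, used, and remaining bytes across the cluster.
    FsStatus status = fs.getStatus();
    System.out.printf("capacity:  %d bytes%n", status.getCapacity());
    System.out.printf("used:      %d bytes%n", status.getUsed());
    System.out.printf("remaining: %d bytes%n", status.getRemaining());

    // List the contents of an HDFS directory ("/user" is an example path).
    for (FileStatus f : fs.listStatus(new Path("/user"))) {
      System.out.println(f.getPath() + "  " + f.getLen() + " bytes");
    }
    fs.close();
  }
}
```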

Course Syllabus

  • Module 1 - Introduction to Hadoop
    1. Understand what Hadoop is
    2. Understand what Big Data is
    3. Learn about other open source software related to Hadoop
    4. Understand how Big Data solutions can work on the Cloud
  • Module 2 - Hadoop Architecture
    1. Understand the main Hadoop components
    2. Learn how HDFS works
    3. List data access patterns for which HDFS is designed
    4. Describe how data is stored in an HDFS cluster
  • Module 3 - Hadoop Administration
    1. Add and remove nodes from a cluster
    2. Verify the health of a cluster
    3. Start and stop a cluster's components
    4. Modify Hadoop configuration parameters
    5. Set up a rack topology
  • Module 4 - Hadoop Components
    1. Describe the MapReduce philosophy (illustrated in the sketch after this syllabus)
    2. Explain how Pig and Hive can be used in a Hadoop environment
    3. Describe how Flume and Sqoop can be used to move data into Hadoop
    4. Describe how Oozie is used to schedule and control Hadoop job execution
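To make the "MapReduce philosophy" in Module 4 concrete, here is the classic word-count job written against the org.apache.hadoop.mapreduce API. It follows the standard Apache WordCount example rather than anything specific to this course, and the input/output paths are supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You would package this as a JAR and submit it with hadoop jar; Pig and Hive, also covered in Module 4, generate comparable MapReduce jobs from higher-level scripts and SQL-like queries instead of hand-written Java.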

General Information

  • This course is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

Recommended skills prior to taking this course


Requirements

  • None

Course Staff


Warren Pettit
Warren has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education, where he has been both an instructor and a course developer in the Data Warehouse and Big Data curricula. For the nine years before joining IBM, he was an application programmer responsible for developing a training program for newly hired programmers.




Asma Desai

Asma Desai recently joined IBM. She has developed course content for introductory Java and graph theory. Before moving into course development, she worked as a consultant using Big Data to fight fraud.


Daniel Tran

Daniel Tran is an IBM co-op student working as a Technical Curriculum Developer in Toronto, Ontario. He develops courses for customers who want to build their knowledge of the Big Data field. He has also reworked previously developed courses, updating them for the newest software releases, and has helped recreate courses on a newly developed cloud environment. He has worked with various Big Data components, including Hadoop, Pig, Hive, HBase, MapReduce and YARN, Sqoop, Oozie, and Phoenix, as well as on separate courses involving Machine Learning. Daniel attends the University of Alberta, where he has completed his third year of the Computer Engineering co-op program.



Leons Petrazickis

Leons Petrazickis is the Ombud for Hadoop content on IBM Big Data U as well as the Platform Architect for Big Data U Labs. As a senior software developer at IBM, he uses Ruby, Python, and JavaScript to develop microservices and web applications, and manages containerized infrastructure.

Estimated Effort

20 Hours

Level

Beginner

Skills You Will Learn

Apache Hadoop, Big Data

Language

English

Course Code

BD0111EN
