Cognitive Class

Using HBase for Real-time Access to your Big Data

Has a new slant on your research landed in your lap? Are you too far down the rabbit hole to make the changes you need? HBase is an open source Hadoop database used for random, real-time reads and writes to your data. Here is your carrot: see how it works.

Start the Free Course

About This Course

Get an overview of HBase, learn how to use the HBase API and clients, and see how it integrates with Hadoop.

HBase is the open source Hadoop database used for random, real-time read/writes to your Big Data.

  • Learn how to set it up as a source or sink for MapReduce jobs, and learn the details of its architecture and administration, with labs for practice and hands-on learning.
  • Learn how HBase runs on a distributed architecture on top of commodity hardware, with practice and hands-on exercises covering the following features:
    • Linear and modular scalability
    • Strictly consistent reads and writes
    • Automatic and configurable sharding of tables
    • Automatic failover support between RegionServers
    • Easy to use Java API for client access
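The "automatic sharding" feature above means each table is split into regions, where every region owns a contiguous, sorted range of row keys and a row is served by the region whose start key is the greatest key not larger than the row key. A minimal pure-Java sketch of that lookup (the start keys `""`, `"g"`, `"n"`, `"t"` are made-up example region boundaries, not anything HBase prescribes):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of HBase-style region lookup: row keys are kept in sorted order,
// and a row belongs to the region whose start key is the greatest start key
// that is <= the row key. The first region always starts at the empty key.
public class RegionLookup {
    static int findRegion(List<String> startKeys, String rowKey) {
        int idx = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            // remember the last region whose start key is not greater than rowKey
            if (rowKey.compareTo(startKeys.get(i)) >= 0) {
                idx = i;
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        // hypothetical region boundaries after two splits
        List<String> startKeys = Arrays.asList("", "g", "n", "t");
        System.out.println(findRegion(startKeys, "hbase"));     // falls in region 1 ("g".."n")
        System.out.println(findRegion(startKeys, "apache"));    // falls in region 0 (""..."g")
        System.out.println(findRegion(startKeys, "zookeeper")); // falls in region 3 ("t"..end)
    }
}
```

Because regions are just key ranges, adding RegionServers and re-assigning regions is what gives HBase its linear, modular scalability.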

Course Syllabus

  • Module 1 - Introduction to HBase
    1. HBase Overview, CAP Theorem and ACID properties
    2. The role of HBase and how it differs from an RDBMS
    3. HBase Shell and Tables
  • Module 2 - HBase Client API - The Basics
    1. Use of the Java API for Batch and Scan operations
  • Module 3 - Client API: Administrative and Advanced Features
    1. Use of administrative operations and schemas
    2. Use of Filters, Counters, and ImportTSV tool
  • Module 4 - Available HBase Clients
    1. Understand how interactive and batch clients interact with HBase
  • Module 5 - HBase and MapReduce Integration
    1. Understand how MapReduce works in the Hadoop framework
    2. How to set up HBase as a source and a sink
  • Module 6 - HBase Configuration and Administration
    1. Configuring HBase for various environment optimizations
    2. Architecture and administrative tasks
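As a taste of the ImportTSV tool covered in Module 3, a bulk load of tab-separated data is driven from the command line roughly as follows (the table name `mytable`, column family `cf`, column qualifiers, and the HDFS input path are placeholders, and a running HBase/Hadoop cluster is assumed):

```
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 \
  mytable /user/hbase/input.tsv
```

The `-Dimporttsv.columns` option maps each TSV column to either the row key (`HBASE_ROW_KEY`) or a `family:qualifier` destination; the load itself runs as a MapReduce job, which is also how Module 5's source/sink integration works.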

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.
  • Labs can be performed on the Cloud or on a 64-bit system. If using a 64-bit system, you can install the required software (Linux only) or use the supplied VMware image. More details are provided in the "Labs setup" section.
  • There is only ONE chance to pass the course, but multiple attempts are allowed per question (see the Grading Scheme section for details).

Recommended skills prior to taking this course

Requirements

Course Staff

Henry Quach, Instructor of SQL Access for Hadoop

Henry L. Quach

Henry L. Quach is the Technical Curriculum Developer Lead for Big Data. He has been with IBM for 9 years, focusing on education development. Henry likes to dabble in a number of things, including being part of the original team that designed and developed the concept for the IBM Open Badges program. He has a Bachelor of Science in Computer Science and a Master of Science in Software Engineering from San Jose State University.

Kevin Wong

Kevin Wong is a Technical Curriculum Developer. He enjoys developing courses that focus on education in the Big Data field. Kevin updates courses to be compatible with the newest software releases, recreates courses on the new cloud environment, and develops new courses such as Introduction to Machine Learning. In addition to contributing to the transition of BDU, he has worked with various components that deal with Big Data, including Hadoop, Pig, Hive, Phoenix, HBase, MapReduce & YARN, Sqoop, and Oozie. Kevin is from the University of Alberta, where he has completed his third year of Computer Engineering Co-op and is currently an IBM Co-op Student.
