Cognitive Class

Simplifying Data Pipelines with Apache Kafka

When you hear the terms producer, consumer, topic, broker, and cluster used together to describe a messaging system, something is brewing in the pipelines. Get connected and learn what that is and what it means!

About This Course

Learn what many Big Data use cases have in common: the use of Apache Kafka somewhere in the mix.

Whether the distributed, partitioned, replicated commit log service is being used for messaging, website activity tracking, stream processing, or more, there's no denying it is a hot technology.

  • Learn how Kafka is used in the real world, along with its architecture and components.
  • Get up and running quickly, producing and consuming messages with both the command line tools and the Java APIs.
  • Get hands-on experience connecting Kafka to Spark and working with Kafka Connect.

Course Syllabus

  • Module 1 - Introduction to Apache Kafka
    1. What Kafka is and why it was created
    2. The Kafka Architecture
    3. The main components of Kafka
    4. Some of the use cases for Kafka
  • Module 2 - Kafka Command Line
    1. The contents of Kafka's /bin directory
    2. How to start and stop Kafka
    3. How to create new topics
    4. How to use Kafka command line tools to produce and consume messages
  • Module 3 - Kafka Producer Java API
    1. The Kafka producer client
    2. Some of the KafkaProducer configuration settings and what they do
    3. How to create a Kafka producer using the Java API and send messages both synchronously and asynchronously (see the producer sketch after this syllabus)
  • Module 4 - Kafka Consumer Java API
    1. The Kafka consumer client
    2. Some of the KafkaConsumer configuration settings and what they do
    3. How to create a Kafka consumer using the Java API (see the consumer sketch after this syllabus)
  • Module 5 - Kafka Connect and Spark Streaming
    1. Kafka Connect and how to use a pre-built connector
    2. Some of the components of Kafka Connect
    3. How to use Kafka and Spark Streaming together (see the Spark Streaming sketch after this syllabus)
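
Module 3 in practice: the sketch below sends messages with the KafkaProducer Java client, both synchronously and asynchronously. It is a minimal illustration only; the broker address (localhost:9092), topic name (test-topic), and string serializers are assumptions for a local, default installation rather than anything prescribed by the course.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class ProducerSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Assumed local broker and string serializers.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("test-topic", "key", "hello, kafka");

                // Synchronous send: block on the returned Future until the broker acknowledges the write.
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

                // Asynchronous send: return immediately and handle the result in a callback.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("async: partition=%d offset=%d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
                producer.flush();
            }
        }
    }

The difference between the two paths is a latency/throughput trade-off: the synchronous send waits for a per-message acknowledgement, while the asynchronous send lets the client batch records and report results through callbacks.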
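
Module 4 in practice: a matching sketch of a KafkaConsumer poll loop in Java. It assumes a recent Kafka client (2.x or later, where poll takes a Duration; older clients use poll(long)), the same local broker, and the hypothetical test-topic; the group id and offset-reset settings are illustrative defaults.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed local broker, illustrative group id, and string deserializers.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test-topic"));
                // Poll loop, bounded here so the sketch terminates; a real consumer usually loops indefinitely.
                for (int i = 0; i < 10; i++) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }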
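
Module 5 in practice: a minimal sketch of reading a Kafka topic from Spark Streaming in Java, assuming the spark-streaming-kafka-0-10 integration (the exact integration and versions used in the course labs may differ). The broker address, topic, group id, micro-batch interval, and local[2] master are illustrative assumptions.

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaSparkSketch {
        public static void main(String[] args) throws InterruptedException {
            // Local Spark context with a 5-second micro-batch interval (illustrative values).
            SparkConf conf = new SparkConf().setAppName("KafkaSparkSketch").setMaster("local[2]");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "spark-group");
            kafkaParams.put("auto.offset.reset", "earliest");

            Collection<String> topics = Arrays.asList("test-topic");

            // Direct stream: Spark executors read from the Kafka partitions in parallel.
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

            // Print the message values of each micro-batch.
            stream.map(ConsumerRecord::value).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }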

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

Recommended skills prior to taking this course

  • Basic understanding of Apache Hadoop and Big Data.
  • Basic Linux Operating System knowledge.
  • Basic understanding of the Scala, Python, R, or Java programming languages.


Course Staff

Aaron Ritchie

Aaron Ritchie has worked at IBM for over 11 years and has held a variety of roles within the Center of Excellence and Education groups. Aaron enjoys creating courses that educate and entertain, and working with a wide assortment of open-source technologies. In addition to being the lead curriculum developer for IBM Streams, Aaron writes courses for BDU and is one half of the dynamic duo in the "Big Data Dudes" YouTube series. Aaron holds a Bachelor of Science in Computer Science from Clarkson University and a Master of Science in Information Technology from WPI.