Big Data University

Apache Pig 101

  • Classes Start
    Any Time, Self-Paced
  • Estimated Effort
    4 Hours


Learn about the two major components of Apache Pig: the language itself, called Pig Latin, and the runtime environment in which Pig Latin programs are executed.

Apache Pig was initially developed at Yahoo! to allow people using Hadoop® to focus more on analyzing large data sets and spend less time writing mapper and reducer programs. Like actual pigs, which eat almost anything, the Pig programming language is designed to handle any kind of data, hence the name!

  • Get an overview of the data structures Pig supports and how to access data using the LOAD operator.
  • Learn Pig's relational operators.
  • Learn Pig's evaluation functions, as well as math and string functions.


  • Module 1 - Pig data types
  • Module 2 - Built-in functions used with the LOAD and STORE operators
  • Module 3 - The difference between storing and dumping a relation
  • Module 4 - The Pig operators
  • Module 5 - Pig evaluation functions
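
As a rough illustration of the topics the modules above cover, here is a minimal Pig Latin sketch. It is not taken from the course material: the file names, field names, and salary threshold are hypothetical and were chosen only for this example.

```pig
-- Hypothetical input: a tab-delimited file of (name, dept, salary) records.
-- LOAD reads the data; PigStorage is the built-in load/store function.
employees = LOAD 'employees.tsv' USING PigStorage('\t')
            AS (name:chararray, dept:chararray, salary:double);

-- Relational operators: FILTER keeps matching tuples, GROUP collects them by key.
well_paid = FILTER employees BY salary > 50000.0;
by_dept   = GROUP well_paid BY dept;

-- Evaluation functions: COUNT and AVG are applied to each group's bag.
summary   = FOREACH by_dept GENERATE group AS dept,
                                     COUNT(well_paid) AS headcount,
                                     AVG(well_paid.salary) AS avg_salary;

-- DUMP prints the relation to the console.
DUMP summary;

-- STORE writes the relation to the file system instead.
STORE summary INTO 'dept_summary' USING PigStorage(',');
```

In this sketch, DUMP is the kind of statement you would use while exploring data interactively, while STORE persists the result for later use.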


  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.


  • Basic understanding of Apache Hadoop and Big Data.
  • Basic Linux Operating System knowledge.
  • Basic programming skills.


  • None


Warren Pettit

Warren Pettit has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education, where he has been both an instructor and a course developer in the Data Warehouse and Big Data curriculums. For the nine years before joining IBM, he was an application programmer and was responsible for developing a training program for newly hired programmers.


Daniel Tran

Daniel Tran is an IBM Co-op Student working as a Technical Curriculum Developer in Toronto, Ontario. He develops courses for customers who want to learn about the Big Data field. He has also reworked previously developed courses, updating them to be compatible with the newest software releases, and has worked at the forefront of recreating courses in a newly developed cloud environment. He has worked with various Big Data components, including Hadoop, Pig, Hive, HBase, MapReduce & YARN, Sqoop, Oozie, and Phoenix. He has also worked on separate courses involving Machine Learning. Daniel is from the University of Alberta, where he has completed his third year of traditional Computer Engineering Co-op.


Leons Petrazickis

Leons Petrazickis is the Ombud for Hadoop content on IBM Big Data U as well as the Platform Architect for Big Data U Labs. As a senior software developer at IBM, he uses Ruby, Python, and JavaScript to develop microservices and web applications and to manage containerized infrastructure.