At a Glance

Name: Web Scraping for Python using Beautiful Soup
Price: Free
Rating: 4.5 (287 reviews)
Author: joseph_santarcangelo

Data is the fuel of Data Science. We can get data from databases and other data repositories, and a lot of data is also published as web pages. Web scraping is the process of harvesting data from web pages. BeautifulSoup is a Python library for parsing HTML and XML documents and extracting data from them, which makes it a common tool for web scraping. In this guided project, you will use BeautifulSoup to scrape the contents of a web page.
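
As a minimal sketch of creating a BeautifulSoup object and moving around its tree, the Python below parses a small made-up HTML string with Python's built-in "html.parser" and then steps to a tag, a child, its parent, and a sibling. The page content, names, and `id` value are hypothetical, and the example assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; a real page would be downloaded first.
html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h3><b id="boldest">Alice</b></h3>
    <p>Role: Engineer</p>
    <h3>Bob</h3>
    <p>Role: Designer</p>
  </body>
</html>
"""

# Build the BeautifulSoup object with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Tags: attribute-style access returns the first tag with that name.
title_tag = soup.title
print(title_tag.string)  # Example Page

# Children: the first <h3> has exactly one direct child, the <b> tag.
first_h3 = soup.h3
children = list(first_h3.children)
child = first_h3.b
print(child["id"])  # boldest

# Parents: step back up from <b> to the enclosing <h3>.
print(child.parent.name)  # h3

# Siblings: the next <p> at the same level as the first <h3>.
sibling = first_h3.find_next_sibling("p")
print(sibling.get_text())  # Role: Engineer
```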

Web scraping with BeautifulSoup is a popular method for extracting data from websites and transforming the scraped data into a structured format for analysis and manipulation. BeautifulSoup provides a simple and efficient way to parse HTML and XML documents, which makes it an essential tool for web scraping projects.
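
To illustrate turning scraped HTML into a structured format, here is a hedged sketch that reads a small hypothetical HTML table with BeautifulSoup and loads its rows into a pandas DataFrame. It assumes `beautifulsoup4` and `pandas` are installed and that the first table row holds the column headers.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML table; a real page would be downloaded first.
html = """
<table>
  <tr><th>Color</th><th>Hex</th></tr>
  <tr><td>Red</td><td>#FF0000</td></tr>
  <tr><td>Green</td><td>#00FF00</td></tr>
  <tr><td>Blue</td><td>#0000FF</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
rows = table.find_all("tr")

# Assumption: the first row contains the column headers.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

# Every remaining row becomes one record (a list of cell strings).
records = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

df = pd.DataFrame(records, columns=headers)
print(df)
```

pandas also provides `pandas.read_html`, which can parse tables directly but needs an additional parser backend such as lxml; the explicit loop keeps each step visible.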

In this guided project, you will learn how to download and scrape the contents of a web page, allowing you to extract and store specific information for further analysis.

First, you’ll create a BeautifulSoup object and learn how to navigate its HTML structure using tags, children, parents, and siblings. Next, you’ll extract information, or elements, from HTML files by using filters, find_all, and find. After you locate the elements you need, you’ll extract their text or attributes. Finally, you’ll download and scrape the contents of a web page, including images and data from HTML tables, and convert the table data into a pandas DataFrame for further analysis.
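
The filtering and extraction steps described above can be sketched as follows. The HTML, URLs, and class names are hypothetical; the calls used (`find`, `find_all` with an attribute filter, a list of tag names, or a compiled regular expression, plus `get_text`, `tag[...]`, and `tag.get`) are standard BeautifulSoup methods.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<div id="links">
  <a href="https://example.com/a" class="doc">Docs</a>
  <a href="https://example.com/b" class="blog">Blog</a>
  <a href="/relative/c" class="doc">Guide</a>
  <img src="logo.png" alt="Logo">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match (or None); find_all() returns a list of all matches.
first_link = soup.find("a")
all_links = soup.find_all("a")

# Filters: match on a CSS class (class_ avoids clashing with the Python keyword)...
doc_links = soup.find_all("a", class_="doc")
# ...on several tag names at once...
links_and_images = soup.find_all(["a", "img"])
# ...or with a regular expression applied to an attribute value.
absolute_links = soup.find_all("a", href=re.compile(r"^https://"))

# Extract text and attributes from the located elements.
print(first_link.get_text())          # Docs
print([a["href"] for a in doc_links])  # ['https://example.com/a', '/relative/c']
print(soup.find("img").get("alt"))     # Logo
# .get() returns None for a missing attribute, whereas tag["title"] would raise KeyError.
print(first_link.get("title"))         # None
```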

Complete this guided project and gain the experience you need to begin successfully scraping web pages using BeautifulSoup.

A Look at the Project Ahead

After completing this project, you'll be able to:

Create a BeautifulSoup object
Extract information from HTML files
Download and scrape the contents of a web page
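
A minimal sketch of the download-and-scrape step, assuming the target page is static HTML. It uses the standard library's `urllib.request` rather than the third-party `requests` package that is often used for this, so the only extra dependency is `beautifulsoup4`. The function names `fetch_html` and `image_urls` are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin resolves relative src values against the page address
    # and leaves absolute URLs unchanged.
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]
```

For example, `image_urls(fetch_html(url), url)` would list the image URLs found on the page at `url`. Keeping the download and the parsing in separate functions lets the parsing logic be tried on a saved HTML string without a network connection.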

What You'll Need

For this project, you will need:

Familiarity with Python fundamentals
Familiarity with the basics of HTML
A web browser

Everything else is provided through the IBM Skills Network Labs environment, which gives you access to a Python service. This platform works best with current versions of modern browsers.

IBM Skills Network Labs will provide you with everything you need to complete this project. However, if you are serious about Data Science, you should give IBM Watson® Studio a try. IBM Watson® Studio empowers data scientists, developers, and analysts to build, run, and manage AI models, and optimize decisions anywhere on IBM Cloud Pak® for Data. Unite teams, automate AI lifecycles, and speed time to value on an open multi-cloud architecture. Get started with IBM Watson Studio free of charge.

Offered By: IBMSkillsNetwork